Fisseha Berhane, PhD

Data Scientist

443-970-2353 fisseha@jhu.edu

Visualizing world cities using R

In this post, we will create a world map that shows world cities using ggplot2. We will get the cities and their attributes from Wikipedia. To scrape the data from Wikipedia, we will use the rvest R package, which makes web scraping straightforward.
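As a rough illustration of the rvest calls used later in this post, the sketch below parses a small hand-written HTML table instead of the live Wikipedia page, so it runs offline. The sample rows and the name cities_demo are assumptions made for this example only.

```r
library(rvest)

# A tiny stand-in page (hand-typed sample rows, not scraped data).
page <- read_html('
  <table>
    <tr><th>Latitude</th><th>Longitude</th><th>City</th></tr>
    <tr><td>64 08 N</td><td>21 56 W</td><td>Reykjavik</td></tr>
    <tr><td>33 52 S</td><td>151 12 E</td><td>Sydney</td></tr>
  </table>')

# Select every <table> node, take the first one, and convert it to a data frame.
cities_demo <- page %>%
  html_nodes("table") %>%
  .[[1]] %>%
  html_table()

names(cities_demo)
nrow(cities_demo)
```

The same three steps (select the table nodes, pick one by position, convert it) are what the scraping loop in this post repeats for each table on the Wikipedia page.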

After we get the world cities data from Wikipedia, we will clean up the coordinate strings with the string manipulation packages stringi and stringr.

Load Required Libraries

In [40]:
library(rvest)
library(stringi)
library(stringr)
library(ggplot2)
library(ggmap)
library(maptools)
library(maps)

Mine the data from Wikipedia and process it.

In [ ]:
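# Download and parse the page once
wiki= read_html("https://en.wikipedia.org/wiki/List_of_cities_by_latitude")

cities=data.frame(c())

# Tables 3 through 19 are taken to be the city tables; these indices depend on the page layout
for(i in seq(3,19)){
table=wiki %>%
  html_nodes("table") %>%
  .[[i]]%>%
  html_table(fill=TRUE)
# Keep the first five columns and give them consistent names
table=table[,1:5]
names(table)=c("Latitude","Longitude", "City","Province/State","Country")
cities=rbind(cities,table)
}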

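# Keep only the Latitude and Longitude columns
latlon=cities[,1:2]

# Replace every non-alphanumeric character (degree signs, quotes, commas) with a space.
# The data frame is coerced to one long string per column at this step.
latlon=str_replace_all(latlon, "[^[:alnum:]]", " ")

# Drop any remaining non-ASCII characters, then the first three characters of each string
latlon=iconv(latlon, "latin1", "ASCII", sub="")
latlon=stri_sub(latlon,4)
lat=latlon[1]
lon=latlon[2]
# Split each long string into one entry per city
lat=unlist(str_split(lat,"   "))
lon=unlist(str_split(lon,"   "))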

z=as.data.frame(list(degree=c(),minute=c(),NS=c()))

for(i in 1:length(lat)){
  a=lat[i]
  a= unlist(str_split(a,"\\s+"))
  if(length(a)>3){
    if(nchar(a[1])==0){
      a=a[2:length(a)]
    }
  }
  b= as.data.frame(list(degree=as.numeric(as.character(a[1])),minute=as.numeric(as.character(a[2])),NS=a[3]))
  z=rbind(z,b)
}


lat=z

z=as.data.frame(list(degree=c(),minute=c(),NS=c()))
for(i in 1:length(lon)){
  a=lon[i]
  a= unlist(str_split(a,"\\s+"))
  if(length(a)>3){
    if(nchar(a[1])==0){
      a=a[2:length(a)]
    }
  }
  a= as.data.frame(list(degree=as.numeric(as.character(a[1])),minute=as.numeric(as.character(a[2])),NS=a[3]))
  z=rbind(z,a)
}
lon=z




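# Convert minutes to fractions of a degree
lat[,2]=lat[,2]/60
lon[,2]=lon[,2]/60

# Decimal degrees = degrees + minutes/60
lat[,1]=lat[,1]+lat[,2]
lon[,1]=lon[,1]+lon[,2]

# Southern latitudes are negative
for(i in 1:nrow(lat)){
  if(lat[i,3]=="S"){
    lat[i,1]=lat[i,1]*-1
  }
}

# Any longitude not marked "E" is treated as west and made negative
for(i in 1:nrow(lon)){
  if(lon[i,3]!="E"){
    lon[i,1]=lon[i,1]*-1
  }
}

Latitude=lat[,1]
Longitude=lon[,1]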
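
The conversion above (degrees plus minutes/60, negated for S and W) can also be written as one vectorized function. The following is only a sketch in base R, not the code used to produce the maps in this post; the helper name dm_to_decimal and the sample inputs are hypothetical.

```r
# Hypothetical helper: convert strings such as "64 08 N" to signed decimal degrees.
dm_to_decimal <- function(x) {
  # Pull out the runs of digits: the first is degrees, the second (if any) is minutes.
  nums <- regmatches(x, gregexpr("[0-9]+", x))
  deg  <- vapply(nums, function(n) as.numeric(n[1]), numeric(1))
  mins <- vapply(nums, function(n) if (length(n) > 1) as.numeric(n[2]) else 0, numeric(1))
  # The hemisphere is the last N, S, E or W letter in the string.
  hemi <- sub(".*([NSEW])[^NSEW]*$", "\\1", x)
  # South and west are negative; everything else stays positive.
  sgn <- ifelse(hemi %in% c("S", "W"), -1, 1)
  sgn * (deg + mins / 60)
}

# "\u00b0" is the degree sign and "\u2032" the minute mark, as they appear on Wikipedia.
sample_coords <- c("64\u00b008\u2032N", "33\u00b052\u2032S", "151 12 E", "77 02 W")
round(dm_to_decimal(sample_coords), 2)
# 64.13 -33.87 151.20 -77.03
```

Because the function looks only at digits and the hemisphere letter, it does not depend on which separator characters survive the scraping step.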

Create base world map using ggplot2

In [36]:
mapWorld <- borders("world", colour="gray50", fill="lightblue") # create a layer of borders
wp<- ggplot() +   mapWorld
wp

Add cities to the base map

In [37]:
wp <- wp+ geom_point(aes(x=Longitude,y=Latitude) ,color="blue", size=3) 
wp
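
As an alternative sketch, the points can be supplied to geom_point() as a data frame rather than as free-standing vectors, which keeps each city's name next to its coordinates. The data frame pts and its two rows are assumptions made for illustration; borders("world") needs the maps package to be installed.

```r
library(ggplot2)

# Hypothetical sample points; in the post these would be the scraped cities.
pts <- data.frame(
  City      = c("Reykjavik", "Sydney"),
  Longitude = c(-21.93, 151.20),
  Latitude  = c(64.13, -33.87)
)

# Base map layer plus one point layer that reads its columns from pts.
p <- ggplot() +
  borders("world", colour = "gray50", fill = "lightblue") +
  geom_point(data = pts, aes(x = Longitude, y = Latitude),
             color = "blue", size = 3)

# Printing p draws the map.
```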

Add some enhancements

In [39]:
wp+ ggtitle("World Cities")+theme(axis.text.y   = element_blank(),
                      line = element_blank(),
                      axis.text.x   = element_blank(),
                      axis.title.y  = element_blank(),
                      axis.title.x  = element_blank(),
                      panel.grid.major = element_blank(), 
                      panel.grid.minor = element_blank(),
                      plot.title = element_text(vjust=1.5,size = 30,colour="Red"),
                      panel.border = element_rect(colour = "gray70", fill=NA, size=1))

Interactive Map

The Tableau visualization below was made with the data produced by the R code above. Each city is drawn as a point; hover over a point to read the city name and its other attributes.
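
This post does not show how the data was handed to Tableau. One plausible route, sketched here under that assumption, is to write the cleaned table to a CSV file, which Tableau can open as a text data source. The data frame cities_out, its sample rows, and the file name are hypothetical.

```r
# Hypothetical sample rows; in practice these would come from cities, Latitude and Longitude.
cities_out <- data.frame(
  City      = c("Reykjavik", "Sydney"),
  Country   = c("Iceland", "Australia"),
  Latitude  = c(64.13, -33.87),
  Longitude = c(-21.93, 151.20)
)

# Write to a temporary folder so the example leaves no stray files behind.
out_file <- file.path(tempdir(), "world_cities.csv")
write.csv(cities_out, out_file, row.names = FALSE)

# Read the file back to confirm what was written.
check <- read.csv(out_file)
check
```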

Summary

In this blog post, we scraped world cities with their latitude and longitude information from Wikipedia using the rvest package and used the packages stringi and stringr for string manipulation. Finally, we used the ggplot2 package to create a map of the world that shows these cities.




