# Fisseha Berhane, PhD

#### Data Scientist

443-970-2353 fisseha@jhu.edu

### Visualizing world cities using R

In this post, we will create a world map that shows world cities using ggplot2. We will get the cities and their attributes from Wikipedia. To scrape the data, we will use the rvest R package, which makes web scraping straightforward.

After we get the world cities data from Wikipedia, we will use the string manipulation packages stringi and stringr to clean the latitude and longitude strings.
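Before running the scraper against the live page, it can help to see what rvest returns for a single table. The following is a minimal sketch that parses a small inline HTML snippet instead of Wikipedia; the table contents and the names `page`, `tables`, and `demo` are assumptions made for illustration, not part of the original post.

```r
library(rvest)

# A tiny inline page standing in for the Wikipedia article (hypothetical data).
# \u00b0 is the degree sign and \u2032 is the prime (minute) sign.
page <- read_html(paste0(
  "<table>",
  "<tr><th>Latitude</th><th>Longitude</th><th>City</th></tr>",
  "<tr><td>64\u00b008\u2032N</td><td>21\u00b056\u2032W</td><td>Reykjavik</td></tr>",
  "<tr><td>33\u00b055\u2032S</td><td>18\u00b025\u2032E</td><td>Cape Town</td></tr>",
  "</table>"
))

tables <- html_nodes(page, "table")  # every <table> element on the page
demo   <- html_table(tables[[1]])    # the first table as a data frame
print(demo)
```

The same two calls, `html_nodes()` followed by `html_table()`, are what the scraping loop in the next section applies to each table on the real page.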

library(rvest)    # web scraping
library(stringi)  # string manipulation
library(stringr)  # string manipulation
library(ggplot2)  # plotting
library(maps)     # world map polygons used by borders()


### Mine the data from Wikipedia and process it

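# Download the Wikipedia page that lists cities by latitude.
wiki=read_html("https://en.wikipedia.org/wiki/List_of_cities_by_latitude")

cities=data.frame(c())

# The loop assumes the city lists are tables 3 to 19 on the page;
# these indices depend on the page layout and may need adjusting.
for(i in seq(3,19)){
  table=wiki %>%
    html_nodes("table") %>%
    .[[i]] %>%
    html_table(fill=TRUE)
  table=table[,1:5]  # keep the five columns named below
  names(table)=c("Latitude","Longitude","City","Province/State","Country")
  cities=rbind(cities,table)
}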

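latlon=cities[,1:2]

# Passing the two-column data frame coerces each column to one long string,
# then every non-alphanumeric character is replaced by a space.
latlon=str_replace_all(latlon, "[^[:alnum:]]", " ")

# Remove any remaining non-ASCII characters.
latlon=iconv(latlon, "latin1", "ASCII", sub="")
# Drop the first three characters left over from the coercion.
latlon=stri_sub(latlon,4)
lat=latlon[1]
lon=latlon[2]
# Entries are separated by runs of spaces; split them into one string per city.
lat=unlist(str_split(lat,"   "))
lon=unlist(str_split(lon,"   "))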

# Parse each latitude entry into degrees, minutes, and hemisphere (N or S).
z=as.data.frame(list(degree=c(),minute=c(),NS=c()))

for(i in 1:length(lat)){
  a=lat[i]
  a=unlist(str_split(a,"\\s+"))
  # Drop the empty first token produced by a leading space.
  if(length(a)>3){
    if(nchar(a[1])==0){
      a=a[2:length(a)]
    }
  }
  b=as.data.frame(list(degree=as.numeric(a[1]),minute=as.numeric(a[2]),NS=a[3]))
  z=rbind(z,b)
}

lat=z

# Repeat for longitude. The third column keeps the name NS but holds E or W.
z=as.data.frame(list(degree=c(),minute=c(),NS=c()))
for(i in 1:length(lon)){
  a=lon[i]
  a=unlist(str_split(a,"\\s+"))
  if(length(a)>3){
    if(nchar(a[1])==0){
      a=a[2:length(a)]
    }
  }
  a=as.data.frame(list(degree=as.numeric(a[1]),minute=as.numeric(a[2]),NS=a[3]))
  z=rbind(z,a)
}
lon=z

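# Convert minutes to a fraction of a degree and add them to the degrees.
lat[,2]=lat[,2]/60
lon[,2]=lon[,2]/60

lat[,1]=lat[,1]+lat[,2]
lon[,1]=lon[,1]+lon[,2]

# Southern latitudes are negative.
for(i in 1:nrow(lat)){
  if(lat[i,3]=="S"){
    lat[i,1]=lat[i,1]*-1
  }
}

# Longitudes not marked "E" (that is, western ones) are negative.
for(i in 1:nrow(lon)){
  if(lon[i,3]!="E"){
    lon[i,1]=lon[i,1]*-1
  }
}

Latitude=lat[,1]
Longitude=lon[,1]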
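The parsing steps above turn strings such as 64°08′N into signed decimal degrees: degrees plus minutes divided by 60, negated for the southern and western hemispheres. As a cross-check, the same arithmetic can be sketched in vectorized form with `stringr::str_match()`. The helper name `dms_to_decimal()` is hypothetical, and the sketch assumes every entry has degrees, minutes, and a hemisphere letter; it is not the code the post actually ran.

```r
library(stringr)

# Hypothetical helper: convert "degrees, minutes, hemisphere" strings
# (for example "64\u00b008\u2032N") to signed decimal degrees.
dms_to_decimal <- function(x) {
  # Capture groups: degrees, minutes, hemisphere letter.
  parts <- str_match(x, "(\\d+)\\D+(\\d+)\\D*([NSEW])")
  deg  <- as.numeric(parts[, 2])
  mins <- as.numeric(parts[, 3])
  hemi <- parts[, 4]
  # South and west are negative; north and east are positive.
  sgn <- ifelse(hemi %in% c("S", "W"), -1, 1)
  sgn * (deg + mins / 60)
}

example <- c("64\u00b008\u2032N", "33\u00b055\u2032S", "21\u00b056\u2032W")
decimal <- dms_to_decimal(example)
print(decimal)
```

Entries that do not match the pattern come back as `NA`, which makes malformed rows easy to spot before plotting.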


#### Create base world map using ggplot2

mapWorld <- borders("world", colour="gray50", fill="lightblue") # create a layer of borders
wp<- ggplot() +   mapWorld
wp


#### Add cities to the base map

wp <- wp+ geom_point(aes(x=Longitude,y=Latitude) ,color="blue", size=3)
wp
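The call above maps two free-standing vectors, `Longitude` and `Latitude`, inside `aes()`. A variation that is often easier to maintain is to keep the coordinates in a data frame and pass it through the `data` argument. The sketch below uses a small hypothetical sample (`cities_df`, with approximate coordinates) in place of the scraped data.

```r
library(ggplot2)
library(maps)  # supplies the "world" polygons used by borders()

# Hypothetical sample; in the post these values come from the scraped tables.
cities_df <- data.frame(
  City      = c("Reykjavik", "Cape Town", "Tokyo"),
  Latitude  = c(64.13, -33.92, 35.68),
  Longitude = c(-21.93, 18.42, 139.69)
)

p <- ggplot() +
  borders("world", colour = "gray50", fill = "lightblue") +
  geom_point(data = cities_df,
             aes(x = Longitude, y = Latitude),
             colour = "blue", size = 2, alpha = 0.6)
p
```

Keeping the city name in the same data frame also makes it straightforward to add labels or colour the points by another column later.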


wp + ggtitle("World Cities") + theme(
  axis.text.y = element_blank(),
  line = element_blank(),
  axis.text.x = element_blank(),
  axis.title.y = element_blank(),
  axis.title.x = element_blank(),
  panel.grid.major = element_blank(),
  panel.grid.minor = element_blank(),
  plot.title = element_text(vjust=1.5, size=30, colour="Red"),
  panel.border = element_rect(colour="gray70", fill=NA, size=1)
)


### Interactive Map

The Tableau visualization below was made with the data created by the R code above. Each city is represented by a point; hover over a point to read the city name and its associated attributes.
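The post does not show how the data reached Tableau. One plausible route, sketched here purely as an assumption, is to write the cities and their decimal coordinates to a CSV file, which Tableau can open as a text-file data source. The data frame `cities_out` and the file name `world_cities.csv` are hypothetical.

```r
# Hypothetical sample standing in for the scraped table plus decimal coordinates.
cities_out <- data.frame(
  City      = c("Reykjavik", "Cape Town", "Tokyo"),
  Country   = c("Iceland", "South Africa", "Japan"),
  Latitude  = c(64.13, -33.92, 35.68),
  Longitude = c(-21.93, 18.42, 139.69)
)

# Write to a temporary directory so the sketch leaves no files behind.
out_file <- file.path(tempdir(), "world_cities.csv")
write.csv(cities_out, out_file, row.names = FALSE)

# Read the file back to confirm that it round-trips.
check <- read.csv(out_file)
print(check)
```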

### Summary

In this blog post, we scraped world cities and their latitude and longitude from Wikipedia using the rvest package, and cleaned the coordinate strings with the stringi and stringr packages. Finally, we used the ggplot2 package to draw a world map that shows the cities.