# Fisseha Berhane, PhD

#### Data Scientist

443-970-2353 fisseha@jhu.edu

## Visualizing murder rates in the US

Let's visualize murder rates in the US using ggplot2 (a package I love). The data used here is provided by the U.S. Census Bureau and the FBI. Details about the data can be found here.

In [12]:
library(ggplot2)
library(maps)
library(ggmap)
require(downloader)

statesMap = map_data("state")  # use world in lieu of state for world map
str(statesMap)

'data.frame':	15537 obs. of  6 variables:
 $ long     : num  -87.5 -87.5 -87.5 -87.5 -87.6 ...
 $ lat      : num  30.4 30.4 30.4 30.3 30.3 ...
 $ group    : num  1 1 1 1 1 1 1 1 1 1 ...
 $ order    : int  1 2 3 4 5 6 7 8 9 10 ...
 $ region   : chr  "alabama" "alabama" "alabama" "alabama" ...
 $ subregion: chr  NA NA NA NA ...


We can now plot the states using ggplot2.

In [15]:
ggplot(statesMap, aes(x = long, y = lat, group = group)) +
geom_polygon(fill = statesMap$group, color = "black") +
theme(axis.title.y = element_text(colour="grey20",size=15,angle=90,hjust=.5,vjust=1,face="plain"),
axis.title.x = element_text(colour="grey20",size=15,angle=0,hjust=.5,vjust=1,face="plain"),
axis.text.y = element_text(colour="grey20",size=15,angle=0,hjust=1,vjust=0,face="plain"),
axis.text.x = element_text(colour="grey20",size=15,angle=60,hjust=.5,vjust=.5,face="plain"))

Out[15]:

Now, let's download the murder data.

In [16]:
url<-"http://courses.edx.org/asset-v1:MITx+15.071x_2a+2T2015+type@asset+block/murders.csv"
download(url,dest="murders.csv")
murders = read.csv("murders.csv")
str(murders)

'data.frame':	51 obs. of  6 variables:
 $ State            : Factor w/ 51 levels "Alabama","Alaska",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ Population       : int  4779736 710231 6392017 2915918 37253956 5029196 3574097 897934 601723 19687653 ...
 $ PopulationDensity: num  94.65 1.26 57.05 56.43 244.2 ...
 $ Murders          : int  199 31 352 130 1811 117 131 48 131 987 ...
 $ GunMurders       : int  135 19 232 93 1257 65 97 38 99 669 ...
 $ GunOwnership     : num  0.517 0.578 0.311 0.553 0.213 0.347 0.167 0.255 0.036 0.245 ...

Now, let's create a new variable called region that holds the lowercase state names, so that it matches the region column of statesMap.

In [17]:
murders$region = tolower(murders$State)

We have to join the statesMap data and the murders data into one data frame to use ggplot2.

In [18]:
murderMap = merge(statesMap, murders, by="region")
str(murderMap)

'data.frame':	15537 obs. of  12 variables:
 $ region           : chr  "alabama" "alabama" "alabama" "alabama" ...
 $ long             : num  -87.5 -87.5 -87.5 -87.5 -87.6 ...
 $ lat              : num  30.4 30.4 30.4 30.3 30.3 ...
 $ group            : num  1 1 1 1 1 1 1 1 1 1 ...
 $ order            : int  1 2 3 4 5 6 7 8 9 10 ...
 $ subregion        : chr  NA NA NA NA ...
 $ State            : Factor w/ 51 levels "Alabama","Alaska",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Population       : int  4779736 4779736 4779736 4779736 4779736 4779736 4779736 4779736 4779736 4779736 ...
 $ PopulationDensity: num  94.7 94.7 94.7 94.7 94.7 ...
 $ Murders          : int  199 199 199 199 199 199 199 199 199 199 ...
 $ GunMurders       : int  135 135 135 135 135 135 135 135 135 135 ...
 $ GunOwnership     : num  0.517 0.517 0.517 0.517 0.517 0.517 0.517 0.517 0.517 0.517 ...

Now, let's plot the number of murders on our map of the United States.

In [20]:
ggplot(murderMap, aes(x = long, y = lat, group = group, fill = Murders)) +
geom_polygon(color = "black") + scale_fill_gradient(low = "skyblue", high = "blue", guide = "legend")+
theme(axis.title.y = element_text(colour="grey20",size=15,angle=90,hjust=.5,vjust=1,face="plain"),
axis.title.x = element_text(colour="grey20",size=15,angle=0,hjust=.5,vjust=1,face="plain"),
axis.text.y = element_text(colour="grey20",size=15,angle=0,hjust=1,vjust=0,face="plain"),
axis.text.x = element_text(colour="grey20",size=15,angle=60,hjust=.5,vjust=.5,face="plain"))

Out[20]:

Now, let's see a map of the population.

In [21]:
ggplot(murderMap, aes(x = long, y = lat, group = group, fill = Population)) +
geom_polygon(color = "black") + scale_fill_gradient(low = "gray", high = "black", guide = "legend")+
theme(axis.title.y = element_text(colour="grey20",size=15,angle=90,hjust=.5,vjust=1,face="plain"),
axis.title.x = element_text(colour="grey20",size=15,angle=0,hjust=.5,vjust=1,face="plain"),
axis.text.y = element_text(colour="grey20",size=15,angle=0,hjust=1,vjust=0,face="plain"),
axis.text.x = element_text(colour="grey20",size=15,angle=60,hjust=.5,vjust=.5,face="plain"))

Out[21]:

Now, let's create a new variable which is the number of murders per 100,000 population.

In [22]:
murderMap$MurderRate = murderMap$Murders / murderMap$Population * 100000
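
To make the per-100,000 formula concrete, here is a small sketch that applies it to the first ten rows of the murders table, with the Population and Murders values copied from the `str(murders)` output above. The name `murders_head` is made up for this illustration, and the sketch works on the per-state rows rather than on the merged `murderMap`, which is an assumption of convenience and not what the cell above does.

```r
# First ten rows of the murders table, copied from the str(murders) output.
# `murders_head` is a hypothetical name used only in this sketch.
murders_head <- data.frame(
  Population = c(4779736, 710231, 6392017, 2915918, 37253956,
                 5029196, 3574097, 897934, 601723, 19687653),
  Murders    = c(199, 31, 352, 130, 1811, 117, 131, 48, 131, 987)
)

# Murders per 100,000 residents: the same formula used for murderMap$MurderRate.
murders_head$MurderRate <- murders_head$Murders / murders_head$Population * 100000

# Show the rates rounded to two decimals, and the row with the highest rate.
print(round(murders_head$MurderRate, 2))
print(which.max(murders_head$MurderRate))
```

Among these ten rows, the ninth has by far the highest rate (about 21.8 per 100,000) even though its murder count is modest, which is why the rate map can look quite different from the map of raw counts.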


Now, let's generate a plot of the murder rate.

In [23]:
ggplot(murderMap, aes(x = long, y = lat, group = group, fill = MurderRate)) +
geom_polygon(color = "black") + scale_fill_gradient(low = "skyblue", high = "blue", guide = "legend")+
theme(axis.title.y = element_text(colour="grey20",size=15,angle=90,hjust=.5,vjust=1,face="plain"),
axis.title.x = element_text(colour="grey20",size=15,angle=0,hjust=.5,vjust=1,face="plain"),
axis.text.y = element_text(colour="grey20",size=15,angle=0,hjust=1,vjust=0,face="plain"),
axis.text.x = element_text(colour="grey20",size=15,angle=60,hjust=.5,vjust=.5,face="plain"))

Out[23]:

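Let's restrict the color scale to murder rates between 0 and 10. States with rates above 10 fall outside the scale limits and are drawn in grey; they are not removed from the data.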

In [24]:
ggplot(murderMap, aes(x = long, y = lat, group = group, fill = MurderRate)) +
geom_polygon(color = "black") + scale_fill_gradient(low = "skyblue", high = "blue", guide = "legend", limits = c(0,10))+
theme(axis.title.y = element_text(colour="grey20",size=15,angle=90,hjust=.5,vjust=1,face="plain"),
axis.title.x = element_text(colour="grey20",size=15,angle=0,hjust=.5,vjust=1,face="plain"),
axis.text.y = element_text(colour="grey20",size=15,angle=0,hjust=1,vjust=0,face="plain"),
axis.text.x = element_text(colour="grey20",size=15,angle=60,hjust=.5,vjust=.5,face="plain"))

Out[24]:
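
The effect of `limits = c(0,10)` can be sketched in base R. As far as I understand ggplot2's defaults, a continuous scale censors values outside its limits, that is, it turns them into NA, and NA values are filled with the scale's NA colour. The helper `censor_to_limits` below is a hypothetical stand-in for that behaviour, not a ggplot2 function, and the rates are computed from the first ten rows shown in the `str(murders)` output.

```r
# Murder rates for the first ten rows of the murders table
# (Murders / Population * 100000, values taken from str(murders)).
rate <- c(199, 31, 352, 130, 1811, 117, 131, 48, 131, 987) /
  c(4779736, 710231, 6392017, 2915918, 37253956,
    5029196, 3574097, 897934, 601723, 19687653) * 100000

# Hypothetical helper mimicking the default out-of-bounds handling of a
# continuous scale: values outside the limits become NA.
censor_to_limits <- function(x, limits) {
  x[x < limits[1] | x > limits[2]] <- NA
  x
}

censored <- censor_to_limits(rate, limits = c(0, 10))

# Rows that would be drawn in the NA colour instead of a shade of blue.
print(which(is.na(censored)))
```

Only the ninth of these ten rows exceeds the limit, so the remaining states now share the full skyblue-to-blue range. If the goal were instead to keep such states on the map at the darkest shade, passing `oob = scales::squish` to the scale is one alternative worth trying.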
