Fisseha Berhane, PhD

Data Scientist


Visualizing murder rates in the US

Let’s visualize murder rates in the US using ggplot2 (a package I love). The data used here come from the U.S. Census Bureau and the FBI. Details about the data can be found here.

In [12]:
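library(ggplot2)    # plotting, including map_data()
library(maps)       # supplies the state outlines that map_data() reads
library(downloader) # download() for fetching the CSV file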

statesMap = map_data("state")  # use world in lieu of state for world map
str(statesMap)
'data.frame':	15537 obs. of  6 variables:
 $ long     : num  -87.5 -87.5 -87.5 -87.5 -87.6 ...
 $ lat      : num  30.4 30.4 30.4 30.3 30.3 ...
 $ group    : num  1 1 1 1 1 1 1 1 1 1 ...
 $ order    : int  1 2 3 4 5 6 7 8 9 10 ...
 $ region   : chr  "alabama" "alabama" "alabama" "alabama" ...
 $ subregion: chr  NA NA NA NA ...
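Each row of statesMap is one vertex of a state outline. ggplot2 draws a polygon by joining the vertices that share a group value, in the sequence given by order, while region only names the state. A state can consist of several polygons (Michigan, for instance, has a northern and a southern piece), which is why the plots below map group, not region, to the group aesthetic. The sketch below uses a small hypothetical data frame with the same columns to illustrate this in base R; it is not part of the original analysis.

```r
# Hypothetical toy data in the same shape as map_data("state"):
# one row per vertex; 'group' identifies a single closed polygon.
toyMap <- data.frame(
  long   = c(0, 1, 1, 0,  3, 4, 4, 3,  6, 7, 7),
  lat    = c(0, 0, 1, 1,  0, 0, 1, 1,  0, 0, 1),
  group  = c(1, 1, 1, 1,  2, 2, 2, 2,  3, 3, 3),
  order  = 1:11,
  region = c(rep("region_a", 8), rep("region_b", 3)),
  stringsAsFactors = FALSE
)

# Number of separate polygons per region: one region can own several groups.
polysPerRegion <- tapply(toyMap$group, toyMap$region,
                         function(g) length(unique(g)))
print(polysPerRegion)

# Number of vertices in each polygon (group).
verticesPerGroup <- table(toyMap$group)
print(verticesPerGroup)
```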

We can now plot the state outlines with ggplot2.

In [15]:
ggplot(statesMap, aes(x = long, y = lat, group = group)) + 
# map each polygon to its own fill colour; hide the legend, which would list every group
geom_polygon(aes(fill = factor(group)), color = "black", show.legend = FALSE) +
theme(axis.title.y = element_text(colour="grey20",size=15,angle=90,hjust=.5,vjust=1,face="plain"),
      axis.title.x = element_text(colour="grey20",size=15,angle=0,hjust=.5,vjust=1,face="plain"),
      axis.text.y = element_text(colour="grey20",size=15,angle=0,hjust=1,vjust=0,face="plain"),
      axis.text.x = element_text(colour="grey20",size=15,angle=60,hjust=.5,vjust=.5,face="plain"))

Now, let’s download the murder data.

In [16]:
url <- "http://courses.edx.org/asset-v1:[email protected]+block/murders.csv"

download(url, destfile = "murders.csv")

murders = read.csv("murders.csv")

str(murders)
'data.frame':	51 obs. of  6 variables:
 $ State            : Factor w/ 51 levels "Alabama","Alaska",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ Population       : int  4779736 710231 6392017 2915918 37253956 5029196 3574097 897934 601723 19687653 ...
 $ PopulationDensity: num  94.65 1.26 57.05 56.43 244.2 ...
 $ Murders          : int  199 31 352 130 1811 117 131 48 131 987 ...
 $ GunMurders       : int  135 19 232 93 1257 65 97 38 99 669 ...
 $ GunOwnership     : num  0.517 0.578 0.311 0.553 0.213 0.347 0.167 0.255 0.036 0.245 ...

Next, let’s create a variable called region that holds the lowercase state names, so that it matches the region column of statesMap.

In [17]:
murders$region = tolower(murders$State)
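Before joining, it is worth checking which names fail to match. A minimal sketch using base R's setdiff(), shown on hypothetical stand-in vectors; with the real data one would compare murders$region against unique(statesMap$region). Because the maps "state" database covers only the contiguous states plus the District of Columbia, Alaska and Hawaii would be expected to appear as unmatched.

```r
# Hypothetical stand-ins for murders$State and unique(statesMap$region).
stateNames <- c("Alabama", "Alaska", "District of Columbia", "Hawaii", "Texas")
mapRegions <- c("alabama", "district of columbia", "texas")

# Lowercase the names, as done for murders$region above.
dataRegions <- tolower(stateNames)

# Names present in the data but absent from the map outlines:
# these rows will be dropped by an inner join such as merge().
unmatched <- setdiff(dataRegions, mapRegions)
print(unmatched)
```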

To plot the murder data with ggplot2, we merge statesMap and murders into a single data frame, matching rows on region.

In [18]:
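murderMap = merge(statesMap, murders, by="region")
# merge() sorts rows by region and does not guarantee the vertex drawing order; restore it
murderMap = murderMap[order(murderMap$order), ]
str(murderMap)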
'data.frame':	15537 obs. of  12 variables:
 $ region           : chr  "alabama" "alabama" "alabama" "alabama" ...
 $ long             : num  -87.5 -87.5 -87.5 -87.5 -87.6 ...
 $ lat              : num  30.4 30.4 30.4 30.3 30.3 ...
 $ group            : num  1 1 1 1 1 1 1 1 1 1 ...
 $ order            : int  1 2 3 4 5 6 7 8 9 10 ...
 $ subregion        : chr  NA NA NA NA ...
 $ State            : Factor w/ 51 levels "Alabama","Alaska",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Population       : int  4779736 4779736 4779736 4779736 4779736 4779736 4779736 4779736 4779736 4779736 ...
 $ PopulationDensity: num  94.7 94.7 94.7 94.7 94.7 ...
 $ Murders          : int  199 199 199 199 199 199 199 199 199 199 ...
 $ GunMurders       : int  135 135 135 135 135 135 135 135 135 135 ...
 $ GunOwnership     : num  0.517 0.517 0.517 0.517 0.517 0.517 0.517 0.517 0.517 0.517 ...
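merge() performs an inner join by default (all = FALSE), so regions without a match are silently dropped, and it sorts the result by the key column. Since ggplot2 connects polygon vertices in the order the rows appear, re-sorting by the order column after merging is a common safeguard. The base R sketch below shows both effects on hypothetical toy data.

```r
# Hypothetical map vertices for two regions, stored out of key order.
toyMap <- data.frame(
  region = c("beta", "beta", "beta", "alpha", "alpha", "alpha"),
  order  = 1:6,
  long   = c(5, 6, 6, 0, 1, 1),
  lat    = c(0, 0, 1, 0, 0, 1),
  stringsAsFactors = FALSE
)
# Hypothetical per-region data; "gamma" has no polygon at all.
toyData <- data.frame(region  = c("alpha", "beta", "gamma"),
                      Murders = c(10, 20, 30),
                      stringsAsFactors = FALSE)

# Inner join on region: "gamma" is dropped and rows are sorted by region.
joined <- merge(toyMap, toyData, by = "region")
print(joined$region)   # "alpha" rows first, no "gamma"

# Restore the original drawing sequence of the vertices.
redrawn <- joined[order(joined$order), ]
print(redrawn$order)
```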

Now, let’s plot the number of murders on our map of the United States.

In [20]:
ggplot(murderMap, aes(x = long, y = lat, group = group, fill = Murders)) + 
geom_polygon(color = "black") + scale_fill_gradient(low = "skyblue", high = "blue", guide = "legend")+
theme(axis.title.y = element_text(colour="grey20",size=15,angle=90,hjust=.5,vjust=1,face="plain"),
      axis.title.x = element_text(colour="grey20",size=15,angle=0,hjust=.5,vjust=1,face="plain"),
      axis.text.y = element_text(colour="grey20",size=15,angle=0,hjust=1,vjust=0,face="plain"),
      axis.text.x = element_text(colour="grey20",size=15,angle=60,hjust=.5,vjust=.5,face="plain"))

Now, let’s see a map of the population.

In [21]:
ggplot(murderMap, aes(x = long, y = lat, group = group, fill = Population)) + 
geom_polygon(color = "black") + scale_fill_gradient(low = "gray", high = "black", guide = "legend")+
theme(axis.title.y = element_text(colour="grey20",size=15,angle=90,hjust=.5,vjust=1,face="plain"),
      axis.title.x = element_text(colour="grey20",size=15,angle=0,hjust=.5,vjust=1,face="plain"),
      axis.text.y = element_text(colour="grey20",size=15,angle=0,hjust=1,vjust=0,face="plain"),
      axis.text.x = element_text(colour="grey20",size=15,angle=60,hjust=.5,vjust=.5,face="plain"))

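Populous states tend to have more murders simply because they have more people, so a fairer comparison is the murder rate. Let’s create a new variable holding the number of murders per 100,000 residents.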

In [22]:
murderMap$MurderRate = murderMap$Murders / murderMap$Population * 100000
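As a sanity check on the formula, here is the same calculation for the first ten states, using the Population and Murders values printed by str(murders) above. A design alternative worth noting: computing the rate on murders, which has one row per state, before the merge avoids repeating the division for every map vertex; the resulting values are the same.

```r
# Values copied from the str(murders) output above (first ten rows).
states       <- c("Alabama", "Alaska", "Arizona", "Arkansas", "California",
                  "Colorado", "Connecticut", "Delaware",
                  "District of Columbia", "Florida")
popSample    <- c(4779736, 710231, 6392017, 2915918, 37253956,
                  5029196, 3574097, 897934, 601723, 19687653)
murderSample <- c(199, 31, 352, 130, 1811, 117, 131, 48, 131, 987)

# Murders per 100,000 residents, as in the MurderRate column.
murderRate <- murderSample / popSample * 100000
names(murderRate) <- states
print(round(murderRate, 2))

# Most murders in absolute terms vs. highest rate.
print(states[which.max(murderSample)])
print(names(which.max(murderRate)))
```

In this sample California has by far the most murders, yet its rate is only about 4.86 per 100,000, while the District of Columbia has the highest rate, at roughly 21.8.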

Now, let’s generate a plot of the murder rate.

In [23]:
ggplot(murderMap, aes(x = long, y = lat, group = group, fill = MurderRate)) + 
geom_polygon(color = "black") + scale_fill_gradient(low = "skyblue", high = "blue", guide = "legend")+
theme(axis.title.y = element_text(colour="grey20",size=15,angle=90,hjust=.5,vjust=1,face="plain"),
      axis.title.x = element_text(colour="grey20",size=15,angle=0,hjust=.5,vjust=1,face="plain"),
      axis.text.y = element_text(colour="grey20",size=15,angle=0,hjust=1,vjust=0,face="plain"),
      axis.text.x = element_text(colour="grey20",size=15,angle=60,hjust=.5,vjust=.5,face="plain"))

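A few very high values, such as the District of Columbia (about 21.8 murders per 100,000), stretch the color scale so that most states look alike. To keep the gradient informative, let’s limit the fill scale to rates between 0 and 10 with limits = c(0, 10). This does not remove any states from the data: regions with rates above 10 fall outside the scale limits and are drawn in the scale's NA color (grey by default).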

In [24]:
ggplot(murderMap, aes(x = long, y = lat, group = group, fill = MurderRate)) + 
geom_polygon(color = "black") + scale_fill_gradient(low = "skyblue", high = "blue", guide = "legend", limits = c(0,10))+
theme(axis.title.y = element_text(colour="grey20",size=15,angle=90,hjust=.5,vjust=1,face="plain"),
      axis.title.x = element_text(colour="grey20",size=15,angle=0,hjust=.5,vjust=1,face="plain"),
      axis.text.y = element_text(colour="grey20",size=15,angle=0,hjust=1,vjust=0,face="plain"),
      axis.text.x = element_text(colour="grey20",size=15,angle=60,hjust=.5,vjust=.5,face="plain"))
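With limits = c(0, 10), ggplot2's default out-of-bounds handling (scales::censor) turns any value outside the limits into NA, and those polygons take the scale's na.value. An alternative is oob = scales::squish, which clamps such values to the nearest limit so they are drawn in the darkest blue instead. The base R sketch below mimics both behaviours on a few rates from this dataset; it is an illustration, not the scales implementation.

```r
# A few murder rates (per 100,000) computed from the str(murders) output.
rates  <- c(Alabama = 4.16, Delaware = 5.35, "District of Columbia" = 21.77)
limits <- c(0, 10)

# Like scales::censor(): out-of-range values become NA (drawn in na.value).
censored <- ifelse(rates < limits[1] | rates > limits[2], NA, rates)

# Like scales::squish(): out-of-range values are clamped to the nearest limit.
squished <- pmin(pmax(rates, limits[1]), limits[2])

print(censored)
print(squished)

# In the plot, the squish variant would be requested with, e.g.:
# scale_fill_gradient(low = "skyblue", high = "blue", guide = "legend",
#                     limits = c(0, 10), oob = scales::squish)
```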