443-970-2353
[email protected]
CV Resume
How many of the drug related adverse events in the FDA database resulted in deaths, disabilities, hopspitalization, etc? To answer this question, we will download various datasets, join them, do some analysis and produce maps.
library(readr)
require(downloader)
library(dplyr)
library(ggplot2)
library(jsonlite)
library(lubridate)
library(grid) # for 'unit'
library(scales) # for 'percent'
library(rgdal) # working with shapefile
library(tidyr) # reshaping data
library(ggthemes) # theme_map
library(gridExtra)
year_start=2012
year_last=as.integer(format(Sys.Date(), "%Y"))
for (i in year_start:year_last){
if (i==2012){
url<-"http://www.nber.org/fda/faers/2012/outc2012q4.csv.zip"
download(url,dest="data.zip")
unzip ("data.zip")
}
else{
if (i<year_last){
j=c(1:4)
for (m in j){
url<-paste0("http://www.nber.org/fda/faers/",i,"/outc",i,"q",m,".csv.zip")
download(url,dest="data.zip")
unzip ("data.zip")
}
}
else if (i==year_last){
j=c(1:2)
for (m in j){
url<-paste0("http://www.nber.org/fda/faers/",i,"/outc",i,"q",m,".csv.zip")
download(url,dest="data.zip")
unzip ("data.zip")
}
}
}
}
Let's see the variables in the quarterly outcome data. Let's use the outcome report for the fourth quarter of 2012, for example.
names(outcome)
From the description of the data on the website, the differenet variables are explained below:
primaryid:
caseid:
outc_cod:
To know where a specific event occured, we will use the demography data on the same website which has country code. We will use "primaryid" to merge the two datasets. First, let's create a single dataset from the "outcome" data, then similarly, we will download the quarterly demography data and concatenate them to create a single demography dataset. Finally, we will concatenate the demography and coutcome data based on "primaryid".
year_start=2012
year_last=as.integer(format(Sys.Date(), "%Y"))
for (i in year_start:year_last){
if (i==2012){
url<-"http://www.nber.org/fda/faers/2012/demo2012q4.csv.zip"
download(url,dest="data.zip")
unzip ("data.zip")
}
else{
if (i<year_last){
j=c(1:4)
for (m in j){
url<-paste0("http://www.nber.org/fda/faers/",i,"/demo",i,"q",m,".csv.zip")
download(url,dest="data.zip")
unzip ("data.zip")
}
}
else if (i==year_last){
j=c(1:2)
for (m in j){
url<-paste0("http://www.nber.org/fda/faers/",i,"/demo",i,"q",m,".csv.zip")
download(url,dest="data.zip")
unzip ("data.zip")
}
}
}
}
demo2015q1<-read.csv("demo2015q1.csv")
names(demo2015q1)
Some of the variables are described below:
primaryid:
event_dt:
age:
age_cod:
sex:
wt:
wt_cod:
occr_country: The country where the event occurred.
After downloading the quartely data sets, let's concatenate them. The unzipped data set for the first quarter of 2015 is demo2015q1.csv. The demographic data contains various variables.
Let's select "primaryid", "occr_country" and "sex". "occr_country" is the country where the adverse event took place.
We can use the select function from the dplyr package.
demo = select(read.csv("demo2012q4.csv",stringsAsFactors=F),primaryid,occr_country,sex)
for (i in year_start+1:year_last){
if (i<year_last){
j=c(1:4)
for (m in j){
demo=rbind(demo,select(read.csv(paste0("demo",i,"q",m,".csv"),stringsAsFactors=F),primaryid,occr_country,sex))
}
}
else if (i==year_last){
j = c(1:2)
for (m in j){
demo=rbind(demo,select(read.csv(paste0("demo",i,"q",m,".csv"),stringsAsFactors=F),primaryid,occr_country,sex))
}
}
}
The "occr_country" variable in the FAERS data is two letter country code. We can map the code to the country name using the appropriate codes from this website.
Let's change the variable name "occr_country" to "Code" to merge it with the country names and codes data.
demo <- readRDS("demo2012q4_2015q2.rds")
Let's merge the demography data with the outcome data based on the primaryid.
names(outcome)
names(demo)
dim(demo)
dim(outcome)
outcome_demo<-merge(outcome, demo, by="primaryid",all.x=T)
names(outcome_demo)
Let's rename "occr_country" to Code to use it for merging with the code and full country name data.
outcome_demo=rename(outcome_demo,Code=occr_country)
c_codes<-read.csv("http://data.okfn.org/data/core/country-list/r/data.csv")
Now, let's merge outcome_demo with the c_codes, which contains country codes and full names.
outcome_demo=merge(outcome_demo, c_codes,by ="Code",all.x = TRUE) #left outer join
outcome_demo$id=outcome_demo$Name # we will use this variable later. Name is the full country name.
names(outcome_demo)
In the variable names above, "Code" is country code, "outc_code" is the outcome code for the adverse event , and "Name" is the name of the country.
Download world geograhy data in shape file.
try(invisible(GET("http://www.pewglobal.org/wp-content/lib/js/world-geo.json",
write_disk("world-geo.json"))), silent=TRUE)
world <- readOGR("world-geo.json", "OGRGeoJSON")
world_wt <- spTransform(world, CRS("+proj=robin"))
worldmap <- fortify(world_wt)
worldmap %>%
left_join(data_frame(id=rownames(world@data), name=world@data$name)) %>%
select(-id) %>%
rename(id=name) -> worldmap
Let's group outcomes by country and merge it with the geography data.
outcome_by_country<-outcome_demo%>%group_by(id,outc_code)%>%summarise(count = n())
names(outcome_by_country)
world_map_outcome = merge(worldmap, outcome_by_country,by ="id",all.x = TRUE)
names(world_map_outcome)
# Reorder the data
world_map_outcome = world_map_outcome[order(world_map_outcome$group, world_map_outcome$order),]
Now, using ggplot2, let's produce maps for the various outcomes.
unique(world_map_outcome$outc_code)
The codes are again as follows:
DE Death
LT Life-Threatening
HO Hospitalization - Initial or Prolonged
DS Disability
CA Congenital Anomaly
RI Required Intervention to Prevent
Death=world_map_outcome[world_map_outcome$outc_code=="DE",]
Death = merge(worldmap, Death,by ="id",all.x = TRUE)
Death =Death[order(Death$group.x, Death$order.x),]
Disability=world_map_outcome[world_map_outcome$outc_code=="DS",]
Disability = merge(worldmap, Disability,by ="id",all.x = TRUE)
Disability =Disability[order(Disability$group.x, Disability$order.x),]
Hospitalization=world_map_outcome[world_map_outcome$outc_code=="HO",]
Hospitalization = merge(worldmap, Hospitalization,by ="id",all.x = TRUE)
Hospitalization = Hospitalization[order(Hospitalization$group.x, Hospitalization$order.x),]
Congenital_Anomaly=world_map_outcome[world_map_outcome$outc_code=="CA",]
Congenital_Anomaly = merge(worldmap, Congenital_Anomaly,by ="id",all.x = TRUE)
Congenital_Anomaly = Congenital_Anomaly[order(Congenital_Anomaly$group.x, Congenital_Anomaly$order.x),]
names(Death)
Now, let's write a wrapper ggplot2 function that helps to map the outcomes.
breaks = m_breaks, labels = m_breaks,
options(jupyter.plot_mimetypes = 'image/png')
options(repr.plot.width = 12)
options(repr.plot.height = 8)
my_map<-function(data){
ggplot(data, aes(x=long.x, y=lat.x, group=group.x)) +
geom_polygon(aes(fill=count), color="gray70") +
scale_fill_gradient(name = "count", trans = "log",
low = "#FFAD33", high = "#3D5C00",guide = "colorbar",na.value="white")+
theme(axis.text.y = element_blank(),
line = element_blank(),
axis.text.x = element_blank(),
axis.title.y = element_blank(),
axis.title.x = element_blank(),
panel.background = element_blank(),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.border = element_rect(colour = "gray70", fill=NA, size=1))+
theme(aspect.ratio=0.4,legend.key.size = unit(1.5, "cm"),legend.title = element_text(size = 15, colour = "blue"),
legend.title.align=0.3,legend.text = element_text(size = 12))
}
my_map(Death)
my_map(Disability)
my_map(Hospitalization)