Fisseha Berhane, PhD

Data Scientist

443-970-2353 fisseha@jhu.edu

A huge number of searches are performed on Google every second. According to Google, Google Trends gives us a glimpse into the topics the world is searching for. We can use Google Trends to see how interest in a search term varies across space and time. Google Trends also lets us compare search terms and see their popularity over time.

Can Google Trends data be used to track disease outbreaks and natural hazards? At the very least, it can be a starting point. If there is a disease outbreak in a city, people will search for the disease, its symptoms and how to treat it. So, if we see an anomaly in the search pattern for a disease in a city, that is a signal for further investigation.
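One simple way to make "an anomaly in the search pattern" concrete is a trailing z-score. Each new value is compared with the mean and standard deviation of the points just before it. The sketch below is my illustration, not a method from this post. The helper `flag_anomalies`, the window of 8 points, the threshold of 3 and the series itself are all assumptions (toy data).

```r
# Flag anomalies in a relative-interest series with a trailing z-score.
# Assumptions: `hits` is a numeric vector (no NAs) in time order;
# `window` and `threshold` are illustrative defaults, not tuned values.
flag_anomalies <- function(hits, window = 8, threshold = 3) {
  n <- length(hits)
  flags <- rep(FALSE, n)
  if (n <= window) return(flags)          # not enough history to judge
  for (t in (window + 1):n) {
    baseline <- hits[(t - window):(t - 1)]  # the `window` points before t
    s <- sd(baseline)
    if (s == 0) s <- 1                      # avoid dividing by zero on a flat baseline
    z <- (hits[t] - mean(baseline)) / s
    flags[t] <- z > threshold               # only unusually high values are flagged
  }
  flags
}

# Toy weekly series with one spike
weekly <- c(10, 11, 9, 10, 12, 10, 11, 10, 9, 10, 45, 12)
which(flag_anomalies(weekly))
# 11
```

A flagged week is only a prompt to look closer. Seasonality, news coverage or a celebrity illness can produce the same spike.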

We can get Google Trends data for the whole world and for the entire period that the Google Trends API covers. Alternatively, we can restrict a query to a certain region (country or city) and time span.
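Choosing a time span comes down to turning a label such as "Past90Days" or "2013" into a start date and an end date. The app's server.R does this with an if/else chain. The sketch below packs the same idea into one function. The name `period_range` is hypothetical, and `today` is a parameter only so that results are reproducible.

```r
# Map a period label (as offered in the app's "Time Period" menu) to a
# start and end date. `period_range` is a hypothetical helper.
period_range <- function(period, today = Sys.Date()) {
  if (period == "2004-present") {
    c(start = as.Date("2004-01-01"), end = today)
  } else if (grepl("^Past[0-9]+Days$", period)) {
    days <- as.integer(gsub("[^0-9]", "", period))  # e.g. "Past90Days" -> 90
    c(start = today - days, end = today)
  } else if (period == "Past12Months") {
    lt <- as.POSIXlt(today)
    lt$year <- lt$year - 1                           # same day, one year earlier
    c(start = as.Date(lt), end = today)
  } else if (grepl("^[0-9]{4}$", period)) {
    c(start = as.Date(paste0(period, "-01-01")),     # a single calendar year
      end   = as.Date(paste0(period, "-12-31")))
  } else {
    stop("Unknown period: ", period)
  }
}

period_range("2013")
period_range("Past30Days", today = as.Date("2015-06-30"))
```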

Google Trends data begin in 2004. The API allows us to compare up to five search terms at a time. Using R, however, we can write a program that fetches the trends of many search terms and compares where the searches are performed. Further, once we have the data in R, we can perform any kind of analysis we see fit. For example, in a separate blog post, I used R to find the top searches associated with each nation.
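Working around the five-term limit starts with splitting a long list of terms into batches of at most five. Each batch can then be sent in its own request. The helper `batch_terms` below is hypothetical and not part of gtrendsR.

```r
# Split a long list of search terms into batches of at most `size` terms.
# `batch_terms` is a hypothetical helper; the limit of 5 comes from the text.
batch_terms <- function(terms, size = 5) {
  terms <- unique(trimws(terms))     # drop duplicates after trimming spaces
  terms <- terms[nzchar(terms)]      # drop empty entries
  split(terms, ceiling(seq_along(terms) / size))
}

symptoms <- c("headache", "nausea", "diarrhea", "fever", "cough",
              "rash", "fatigue")
batches <- batch_terms(symptoms)
lengths(batches)   # first batch holds 5 terms, second holds 2
```

Google Trends scales each request separately, so values from different batches are not directly comparable. A common workaround is to include the same reference term in every batch and compare the other terms against it.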

In this post, I will show how we can use Shiny to analyse Google Trends data and create a dashboard. Shiny is an R package that makes it easy to build interactive web apps straight from R. For a nice look and feel, we will use the shinydashboard package.

The gtrendsR package is used to get data from Google Trends.
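Before gtrendsR is called, the app turns the comma-separated text typed into the search box into a character vector of terms. The server.R below does this inside its `out()` reactive. The standalone version here also trims surrounding spaces and drops empty entries. The name `parse_terms` is hypothetical.

```r
# Split the comma-separated search box text into individual terms,
# trimming surrounding whitespace and dropping empty pieces.
# `parse_terms` is a hypothetical helper name.
parse_terms <- function(text) {
  terms <- trimws(unlist(strsplit(text, ",", fixed = TRUE)))
  terms[nzchar(terms)]
}

parse_terms("headache, nausea,diarrhea, ")
# returns c("headache", "nausea", "diarrhea")
```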

In the screenshot below, I launched the app and entered 'headache', 'nausea' and 'diarrhea' as search terms, with Worldwide as the geography and the whole time period (2004-present). I could instead zoom in to a specific region or city and select a time span of interest. The time series plot shows the relative trend of each term.
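"Relative" means that Google Trends does not report raw search counts. After normalizing by total search volume, it scales values so that the highest point among all compared terms, in the chosen region and period, is 100. The sketch below illustrates only that final scaling step. The counts are made-up toy numbers, and `scale_relative` is a hypothetical name.

```r
# Scale a matrix of (toy) search counts, one column per term, so the
# single largest value across ALL terms becomes 100.
scale_relative <- function(m) {
  round(100 * m / max(m, na.rm = TRUE))
}

toy <- cbind(headache = c(200, 400, 300),
             nausea   = c(100, 150, 800))
scaled <- scale_relative(toy)
scaled
```

Because all terms share one maximum, a term with a single large spike (here `nausea`) compresses every other line toward the bottom of the plot.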


In the figures below, the top searches associated with each search term we entered are shown.
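Each bar chart orders a term's related searches by hits, most popular first. The sketch below picks the top `n` related queries per term from a long table. The column names (`term`, `query`, `hits`), the helper name `top_related` and the data are all assumptions (toy values).

```r
# Keep the n most-searched related queries for each term, highest hits first.
# `related` is toy data with one row per (term, related query).
top_related <- function(related, n = 3) {
  per_term <- split(related, related$term)
  top <- lapply(per_term, function(d) head(d[order(d$hits, decreasing = TRUE), ], n))
  do.call(rbind, top)
}

related <- data.frame(
  term  = c("headache", "headache", "headache", "headache", "nausea", "nausea"),
  query = c("q1", "q2", "q3", "q4", "q5", "q6"),
  hits  = c(100, 60, 35, 20, 100, 45),
  stringsAsFactors = FALSE
)
res <- top_related(related, n = 2)
res
```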


The figures below show the spatial distribution of the searches.
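The world maps are drawn by joining each country's interest onto map polygons by country name. The join only works when the names match exactly. That is why server.R renames "United States" to "USA", the name used by `map_data("world")`. The sketch below shows the join in base R. Both tables are toy data, and the polygon table is reduced to the `region` and `group` columns.

```r
# Join per-country interest (toy values) onto map polygons by name.
polygons <- data.frame(region = c("USA", "USA", "Canada", "Mexico"),
                       group  = c(1, 1, 2, 3),
                       stringsAsFactors = FALSE)
interest <- data.frame(region = c("United States", "Canada"),
                       hits   = c(100, 40),
                       stringsAsFactors = FALSE)

# Harmonize names before merging; otherwise the US polygons get no value
interest$region[interest$region == "United States"] <- "USA"

# all.x = TRUE keeps countries without data (hits = NA, drawn as white)
joined <- merge(polygons, interest, by = "region", all.x = TRUE)
joined
```

`merge()` reorders the rows. That is why the app sorts by `group` and `order` again before drawing the polygons.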


A Shiny app usually has two parts: server.R, an R script that holds the server logic (fetching the data and building the plots), and ui.R, which controls the look and feel of the user interface.

My server.R and ui.R files are given below. Save them in the same folder, replace AAAAA with your Gmail address and BBBBB with your Gmail password (both appear in server.R), and run the app, for example with runApp() pointed at that folder.
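Typing a password into server.R makes it easy to leak it by sharing the file. A safer pattern is to read the credentials from environment variables when the app starts. In the sketch below, the helper `google_credentials` and the variable names `GTRENDS_USER` and `GTRENDS_PASSWORD` are hypothetical. You would set them yourself, for example in `~/.Renviron`.

```r
# Read Google account credentials from environment variables.
# Variable names are hypothetical; set them before starting the app.
google_credentials <- function(user_var = "GTRENDS_USER",
                               password_var = "GTRENDS_PASSWORD") {
  user <- Sys.getenv(user_var)          # "" when the variable is unset
  password <- Sys.getenv(password_var)
  if (!nzchar(user) || !nzchar(password)) {
    stop("Set ", user_var, " and ", password_var, " before running the app.")
  }
  list(user = user, password = password)
}

# In server.R this would replace gconnect('AAAAA','BBBBB'):
# creds <- google_credentials()
# gconnect(creds$user, creds$password)
```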

ui.R

# start user interface ----

library(shiny)
library(shinydashboard)

dashboardPage(
  dashboardHeader(title="By Fish"),
  
  dashboardSidebar(
    br(),
    
    
    h6(" Search Term(s)",style="text-align:center;color:#FFA319;font-size:150%"),
    
    helpText("Enter one or more terms for which R should retrieve data from the Google Trends API.
             Separate terms with commas.", style="text-align:center"),
    
    textInput('terms',''),
    
    
    selectInput("geography",
                label = tags$h4(strong(em("Geography")),style="text-align:center;color:#FFA319;font-size:150%"),
                choices = c("Worldwide","Afghanistan","Albania","Algeria","Angola","Argentina","Armenia","Australia","Austria",  "Azerbaijan","Bahamas","Bahrain","Bangladesh","Belarus","Belgium","Botswana", "Brazil","Bulgaria","Burkina Faso","Burundi","Cambodia","Cameroon","Canada","Chad","Chile","China","Colombia","Cuba","Cyprus","Czech Republic","Denmark","Djibouti","Ecuador","Egypt","Equatorial Guinea","Eritrea","Estonia","Ethiopia","Finland","France","Gabon","Gambia","Georgia","Germany","Ghana","Greece","Hong Kong","Hungary","Iceland","India","Indonesia","Iran","Iraq","Ireland","Israel","Italy","Jamaica","Japan","Jordan","Kazakhstan","Kenya","Kiribati","Korea (North)","Korea (South)","Kuwait","Kyrgyzstan","Lebanon","Liberia","Libya","Macedonia","Madagascar","Malawi","Malaysia","Mali","Malta","Mexico","Morocco","Mozambique","Namibia","Nepal","Netherlands","New Zealand","Niger","Nigeria","Norway","Oman","Pakistan","Paraguay","Peru","Philippines","Poland","Portugal","Qatar","Romania","Russian Federation","Rwanda","Saudi Arabia","Senegal","Serbia","Sierra Leone","Singapore","Somalia","South Africa","Spain","Sudan","Swaziland","Sweden","Switzerland","Syria","Taiwan","Tajikistan","Tanzania","Thailand","Togo","Tunisia","Turkey","Turkmenistan","Uganda","Ukraine","United Arab Emirates","United Kingdom","United States","Uzbekistan","Venezuela","Viet Nam","Yemen","Zaire","Zambia","Zimbabwe"),
                selected = "Worldwide"),
    selectInput("period",
                label = tags$h4(strong(em("Time Period")),style="text-align:center;color:#FFA319;font-size:150%"),
                choices = c("2004-present",
                            "Past30Days",
                            "Past90Days",
                            "Past12Months",
                            "2011",
                            "2012",
                            "2013",
                            "2014",
                            "2015"
                ),
                selected = "2004-present"),
    
    checkboxInput("corr", 
                  label = strong("Correlation",style="text-align:center;color:#FFA319;font-size:150%")),
    br(),
    
    tags$h1(submitButton("Update!"),style="text-align:center"),
    helpText("To get results, click the 'Update!' button",style="text-align:center"),
    
    br(),
    br(),
    br(),
    br(),
    br(),
    br()
    
    
    
    ),
  
  
  # Main panel ----
  dashboardBody(    
    fluidRow(
      br(),
      h5(em(strong("Google Trends Analytics", style="color:darkblue;font-size:210%")),align = "center"),

      plotOutput("myplot"),
      br(),
      plotOutput("myplot3"),
      plotOutput("myplot2")
      
      
    )
  ))


server.R

# Load libraries ====
# Install any missing packages, then attach them. maps provides the
# world polygons used by ggplot2::map_data("world") further down.
pkgs <- c("shiny", "gtrendsR", "reshape2", "ggplot2", "maps")
for (p in pkgs) {
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
    library(p, character.only = TRUE)
  }
}

# Country names and codes, used below to build the geo argument
data(countries)

# Start shiny application

shinyServer(function(input, output) {
  
  
  # Log in to Google: replace AAAAA with your Gmail address and BBBBB with your password
  gconnect('AAAAA','BBBBB')
  
  
  # Split the comma-separated search box into individual terms (NULL when empty)
  out <- reactive({
    if(nzchar(input$terms)){
      trimws(unlist(strsplit(input$terms,",")))
    }
  })
  
  # First day of the selected period
  start_date<-reactive({
    if(input$period=="2004-present"){as.Date("2004-01-01")}
    else if (input$period=="Past30Days"){Sys.Date()-30}
    else if (input$period=="Past90Days"){Sys.Date()-90}
    else if (input$period=="Past12Months"){
      m=as.POSIXlt(Sys.Date())
      m$year=m$year-1
      as.Date(m)}
    else {as.Date(paste0(input$period,"-01-01"))}   # single years "2011" ... "2015"
  })
  
  
  # Last day of the selected period; open-ended periods and 2015 end today
  end_date<-reactive({
    if(input$period %in% c("2004-present","Past30Days","Past90Days",
                           "Past12Months","2015")){
      Sys.Date()}
    else {as.Date(paste0(input$period,"-12-31"))}   # "2011" ... "2014"
  })
  
  # Country code passed to gtrends(); "" means worldwide
  geo<-reactive({
    if(input$geography=="Worldwide"){""}
    else{
      countries$CODE[countries$COUNTRY==input$geography]
    }
  })
  
  # Query Google Trends once per input change; all plots share this result
  data<-reactive({
    if(length(out()) > 0){
      gtrends(query=out(),start_date=start_date(),end_date=end_date(),geo=geo())
    }
  })
  
  
  
  
  # Interest over time: one line per search term
  output$myplot <- renderPlot({
    if(length(out()) > 0){
      z=data()
      trend=z$trend
      
      # Drop the period-end column if present, so 'start' alone identifies each row
      trend$end <- NULL
      
      trend <- melt(trend, id='start')
      
      ggplot(trend, aes(start,value, color=variable)) + geom_line()+ggtitle("Interest over time")+
        ylab("Relative Trend")+
        theme(plot.title = element_text(size = 18,colour="black"))+
        xlab('')+theme(axis.title.y = element_text(colour="#00007A",size=14,angle=90,hjust=.5,vjust=1),
                       axis.text.y = element_text(colour="darkred",size=14,angle=0,hjust=1,vjust=0),
                       axis.text.x = element_text(colour="darkred",size=14,angle=0,hjust=1,vjust=0))+
        theme(legend.title = element_text(colour="black", size=15, 
                                          face="bold"))+
        theme(legend.text = element_text(colour="blue", size=14, 
                                         face="bold"))
    }
  })
  
  
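  # Correlation matrix of the term series (needs at least two terms)
  corr<-reactive({
    if(isTRUE(input$corr) && length(out()) > 1){
      z=data()
      trend=z$trend
      # keep only the term columns, dropping the date columns
      trend=trend[, setdiff(names(trend), c("start","end")), drop=FALSE]
      cor(trend)
    }
  })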
  
  
  # Heat map of the correlation matrix computed in corr()
  output$myplot3 <- renderPlot({
    if(!is.null(corr())){
      data=corr()
      
      qplot(x=Var1, y=Var2, data=melt(data), fill=value, geom="tile")+
        ggtitle('Correlation Matrix')+theme(axis.title.y =element_blank(),axis.title.x =element_blank(),
                                            axis.text.y = element_text(colour="darkred",size=14,angle=0,hjust=1,vjust=0),
                                            axis.text.x = element_text(colour="darkred",size=14,angle=0,hjust=1,vjust=0))+
        theme(legend.title=element_blank())+
        theme(legend.text = element_text(colour="black", size=14))+scale_fill_gradient2(limits=c(-1, 1),low="skyblue", high="blue")+
        theme(plot.title = element_text(size = 20,colour="black"))
    }
  })
     
  
  output$myplot2 <- renderPlot({
    if(length(out()) > 0){
      data=data()
      
      
      z=data$searches
      rr=data$regions
      
      # One bar chart (and, for worldwide queries, one map) per search term
      for (i in seq_along(z)){
        n=z[i]
        n=as.data.frame(n)
        names(n)=c("searches","hits")
        n$searches <- factor(n$searches, levels = n$searches[order(n$hits,decreasing =T)])
        
        colors=c("orange","skyblue","#999966")
        
        col=sample(c(1,2,3),1,replace=T)
        
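        # x11() opens a new graphics window on the local machine, so each
        # chart appears in a separate window rather than in the dashboard
        x11()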
        
        print(ggplot(n, aes(searches,hits))+  
                geom_bar(stat='identity',fill=colors[col],color='black')+
                ggtitle(data$headers[2+2*length(z)+i])+ylab('Hits')+
                theme(plot.title = element_text(size = 18,colour="blue"))+
                theme(axis.title.x=element_blank(),axis.title.y = element_text(colour="blue",size=14),axis.text.x = element_text(colour="grey20",size=14,angle=60,hjust=.5,vjust=.5,face="plain"))
              
              
        )
        
        
        if(geo()=='')
        {
        x11()   # separate window for the world map of this term
        
        
        regions = as.data.frame(rr)[c(1,i+1)]
        
        names(regions)=c('region','hits')
        
        regions$region[regions$region=="United States"] = "USA"
        
        world_map = map_data("world")
        
        world_map =merge(world_map, regions, by="region",all.x = TRUE)
        
        world_map = world_map[order(world_map$group, world_map$order),]
        
        g=ggplot(world_map, aes(x=long, y=lat, group=group))+
          geom_polygon(aes(fill=hits), color="gray70") 
        
        print(g+theme(axis.text.y   = element_blank(),
                      axis.text.x   = element_blank(),
                      axis.title.y  = element_blank(),
                      axis.title.x  = element_blank(),
                      panel.background = element_blank(),
                      panel.grid.major = element_blank(), 
                      panel.grid.minor = element_blank())+
                scale_fill_gradient(low = "skyblue", high = "blue", guide = "colorbar",na.value="white")+ggtitle(data$headers[2+2*length(z)+i])+ylab('Hits')+
                theme(legend.key.size = unit(1, "cm"),
                      legend.title = element_text(size = 12, colour = "blue"),
                      legend.title.align=0.3,legend.text = element_text(size = 10))+
                theme(panel.border = element_rect(colour = "gray70", fill=NA, size=0.5))
        )
        }
      }
      
    }
    
  }) 
  
  }) 
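The "Correlation" checkbox computes pairwise Pearson correlations between the terms' interest-over-time series, one column per term. It then draws them as a heat map from a long (Var1, Var2, value) table. The sketch below reproduces both steps in base R on toy values. `as.data.frame(as.table(...))` stands in for `reshape2::melt()` here.

```r
# Toy interest-over-time values, one column per term
trend <- data.frame(headache = c(50, 55, 60, 58, 70),
                    nausea   = c(20, 22, 25, 24, 30),
                    diarrhea = c(40, 35, 38, 30, 28))

# Pairwise Pearson correlations between the term columns
r <- cor(trend)
round(r, 2)

# Long format for a tile plot: one row per pair of terms
long <- as.data.frame(as.table(r))
names(long)[3] <- "value"   # columns are now Var1, Var2, value
head(long)
```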



