Fisseha Berhane, PhD

Data Scientist

443-970-2353 fisseha@jhu.edu

Grammar of Graphics in Python and R

ggplot2, which implements the grammar of graphics, is one of the most widely used data visualization libraries in R. The same concept is implemented in Python by the ggplot library, whose commands closely mirror those of ggplot2.

Let's see some examples.

The data used is the climate_change.csv file that the code below reads.

Grammar of Graphics in Python

ggplot can be installed by simply using this command:

pip install ggplot

Learning ggplot is easy, especially for people who already know how to use ggplot2 in R.
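To make the "grammar" idea concrete before the real examples, here is a minimal sketch of how a plot can be built by adding components with `+`. The `Plot` and `Layer` classes are hypothetical toy classes written for this illustration; they are not how the ggplot library is implemented, and they only record what was added instead of drawing anything.

```python
# Toy model of the "grammar": a plot is data + aesthetic mappings + layers.
# Plot and Layer are hypothetical; they are NOT the ggplot library's internals.

class Layer:
    """One component added to a plot, e.g. a geom or a title."""

    def __init__(self, kind, **params):
        self.kind = kind
        self.params = params

    def __repr__(self):
        args = ", ".join(f"{k}={v!r}" for k, v in self.params.items())
        return f"{self.kind}({args})"


class Plot:
    """Holds the data, the aesthetic mapping, and the layers added so far."""

    def __init__(self, data, mapping, layers=()):
        self.data = data
        self.mapping = mapping
        self.layers = tuple(layers)

    def __add__(self, layer):
        # "+" returns a new Plot with one more layer; the original is unchanged.
        return Plot(self.data, self.mapping, self.layers + (layer,))

    def describe(self):
        parts = [f"plot({self.mapping})"] + [repr(layer) for layer in self.layers]
        return " + ".join(parts)


# Two (Year, Temp) rows from the climate data used in this post
rows = [{"Year": 1983, "Temp": 0.109}, {"Year": 1983, "Temp": 0.118}]

base = Plot(rows, {"x": "Year", "y": "Temp"})
p = base + Layer("geom_line", color="green") + Layer("geom_point")
print(p.describe())
```

Because each `+` returns a new object, a base plot can be stored in a variable and reused with different layers, which is the pattern used with `g` and `p` later in this post.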

In [5]:
from ggplot import *
In [8]:
import pandas as pd
%matplotlib inline
import matplotlib.pyplot as plt
clim = pd.read_csv(r'https://courses.edx.org/asset-v1:MITx+15.071x_2a+2T2015+type@asset+block/climate_change.csv')

A quick look at the data

In [3]:
clim.head(3)  
Out[3]:
   Year  Month    MEI     CO2      CH4      N2O   CFC-11   CFC-12        TSI  Aerosols   Temp
0  1983      5  2.556  345.96  1638.59  303.677  191.324  350.113  1366.1024    0.0863  0.109
1  1983      6  2.167  345.52  1633.71  303.746  192.057  351.848  1366.1208    0.0794  0.118
2  1983      7  1.741  344.15  1633.22  303.795  192.818  353.725  1366.2850    0.0731  0.137
In [6]:
ggplot(clim, aes('Year', 'Temp')) + geom_line(color='green') + geom_point() +\
    ggtitle('Temperature Change') + xlab('') + ylab('Temperature') +\
    stat_smooth(colour='blue', span=0.2)
Out[6]:
<ggplot: (33274821)>
In [5]:
ggplot(clim, aes('Year', 'CO2'))+geom_line(color='black')+geom_point(color='red') +ggtitle("Carbon Dioxide Concentration")+xlab('')+ylab('ppm')+\
geom_vline(xintercept=[1990, 2000],linetype='dashed',color='green')
Out[5]:
<ggplot: (35118803)>
In [6]:
ggplot(clim, aes('CO2'))+geom_density(fill='orange')+ggtitle("Carbon Dioxide Concentration")+xlab('')
Out[6]:
<ggplot: (33196734)>

Let's create a new variable that flags whether the temperature anomaly is negative (True) or not (False).

In [7]:
clim['below_zero']=clim.Temp < 0
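As a small self-contained check of this step, the sketch below builds the same kind of flag on a tiny hand-made frame. The three Temp values and the name `demo` are made up for illustration; they are not taken from the climate data.

```python
import pandas as pd

# Illustrative values only (not from climate_change.csv)
demo = pd.DataFrame({"Temp": [-0.20, 0.00, 0.15]})

# Comparing a column with a scalar yields a boolean Series, one value per row
demo["below_zero"] = demo["Temp"] < 0
print(demo)

# True counts as 1 when summed, so this is the number of negative anomalies
n_below = int(demo["below_zero"].sum())
print(n_below)
```

Note that an anomaly of exactly 0 is not flagged, because the comparison is strict.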
In [8]:
ggplot(clim, aes('CO2',fill='below_zero'))+geom_density(alpha=0.5)+ggtitle("Carbon Dioxide Concentration")
Out[8]:
<ggplot: (35690602)>

We can also use the 'meat' dataset that comes with ggplot.

In [9]:
ggplot(pd.melt(meat, id_vars=['date']), aes(x='date', y='value', color='variable')) +\
    geom_line()
Out[9]:
<ggplot: (35162965)>
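The call to `pd.melt` above is what lets a single `color='variable'` mapping draw one line per meat type: it reshapes the table from wide (one column per series) to long (one row per date and series). Here is a minimal sketch of that reshaping on a hypothetical two-row table; the numbers are made up and are not from the meat dataset.

```python
import pandas as pd

# Hypothetical wide table: one column per meat type (values are made up)
wide = pd.DataFrame({
    "date": ["2000-01", "2000-02"],
    "beef": [10, 11],
    "pork": [20, 21],
})

# id_vars stay as identifier columns; every other column is stacked into
# a 'variable' column (the old column name) and a 'value' column.
long = pd.melt(wide, id_vars=["date"])
print(long)
```

The long table has the columns date, variable and value, which are exactly the names used in the `aes(...)` call above.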
In [ ]:
meat_lng = pd.melt(meat[['date', 'beef', 'broilers', 'pork']], id_vars=['date'])
p = ggplot(aes(x='value', colour='variable', fill=True, alpha=0.3), data=meat_lng)
p + geom_density()

Grammar of Graphics in R

Let's regenerate the figures above in R. The ggplot2 code needs only minor changes from the Python ggplot code.

In [1]:
library(ggplot2)
In [3]:
setwd("C:/Fish/Python/Python_vs_R")
options(jupyter.plot_mimetypes = 'image/png')
In [4]:
clim<-read.csv("climate_change.csv")
In [5]:
names(clim)
Out[5]:
  1. "Year"
  2. "Month"
  3. "MEI"
  4. "CO2"
  5. "CH4"
  6. "N2O"
  7. "CFC.11"
  8. "CFC.12"
  9. "TSI"
  10. "Aerosols"
  11. "Temp"
In [22]:
options(repr.plot.width = 8)
options(repr.plot.height = 6)

ggplot(clim, aes(Year, Temp)) +
  geom_line(color='green') + geom_point() +
  ggtitle('Temperature Change') + xlab('') + ylab('Temperature') +
  stat_smooth(colour='blue', span=0.2)
Out[22]:

In [23]:
options(repr.plot.width = 8)
options(repr.plot.height = 6)

ggplot(clim, aes(Year, CO2))+geom_line(color='black')+geom_point(color='red') +ggtitle("Carbon Dioxide Concentration")+xlab('')+ylab('ppm')+
geom_vline(xintercept = c(1990,2000),colour="green", linetype = "longdash")
Out[23]:

In [14]:
clim$below_zero <- clim$Temp < 0
In [27]:
options(repr.plot.width = 6)
options(repr.plot.height = 4)

ggplot(clim, aes(CO2,fill=below_zero))+geom_density(alpha=0.5)+ggtitle("Carbon Dioxide Concentration")
Out[27]:

In [24]:
data(mpg)
In [28]:
g<-ggplot(mpg, aes(displ, hwy, color=factor(year)))

g+geom_point()
Out[28]:

In [29]:
g+geom_point()+facet_grid(drv~cyl, margins=TRUE)
Out[29]:

In [31]:
g+geom_point()+facet_grid(drv~cyl, margins=TRUE)+
  geom_smooth(method="lm", se=FALSE,size=2, color="black")+
  labs(x="Displacement",y="Highway Mileage")
Out[31]:

In [32]:
data(diamonds)
In [33]:
g<-ggplot(diamonds, aes(depth, price))

g+geom_point(alpha=1/3)
Out[33]:

In [34]:
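# Tertile cut points of carat (0%, 33.3%, 66.7%, 100%)
cutpoints<-quantile(diamonds$carat,seq(0,1,length.out=4),na.rm=TRUE)

# include.lowest=TRUE keeps the smallest carat values in the first bin;
# without it, cut() returns NA for them
diamonds$car2<-cut(diamonds$carat,cutpoints,include.lowest=TRUE)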

g<-ggplot(diamonds, aes(depth, price))
g+geom_point(alpha=1/3)+facet_grid(cut~car2)
Out[34]:
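The binning step above (quantile cut points, then intervals that are closed on the right) can be sketched in plain Python to see what happens to the smallest value. This is only an illustration under stated assumptions: the `carat` list is made up, `bin_index` is a hypothetical helper, and `method="inclusive"` is used on the assumption that it matches the default interpolation of R's `quantile`.

```python
from bisect import bisect_left
from statistics import quantiles

# Illustrative carat-like values (made up, not from the diamonds dataset)
carat = [0.2, 0.3, 0.4, 0.5, 0.7, 0.9, 1.0, 1.2, 1.5, 2.0]

# Interior cut points at 1/3 and 2/3 of the sorted data
inner = quantiles(carat, n=3, method="inclusive")
breaks = [min(carat)] + inner + [max(carat)]


def bin_index(x, breaks):
    """Return 0, 1 or 2 for the right-closed interval (lo, hi] containing x.

    The minimum equals the first break, so a strict (lo, hi] rule would
    leave it outside every interval; max(..., 0) puts it in the first bin.
    """
    i = bisect_left(breaks, x) - 1
    return max(i, 0)


bins = [bin_index(x, breaks) for x in carat]
print(breaks)
print(bins)
```

R's `cut` also uses right-closed intervals by default, so values equal to the lowest break get NA unless `include.lowest=TRUE` is passed, and those rows then end up in their own NA panel when faceting.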

In [35]:
g+geom_point(alpha=1/3)+facet_grid(cut~car2)+geom_smooth(method="lm",size=3,color="pink")
Out[35]:

In [37]:
ggplot(diamonds,aes(carat,price))+geom_boxplot()+facet_grid(.~cut)
Out[37]:


