Fisseha Berhane, PhD

Data Scientist

443-970-2353 CV Resume Linkedin GitHub twitter twitter

Correlation map of climate variables

To understand physical mechanisms and to develop statistical rainfall prediction models, correlation analysis is used as a first step. Here, I show how to use R to generate correlation map between rainfall and sea surface temperature (SST).

NetCDF (Network Common Data Form) data is a self-describing, portable, scalable and appendable machine-independent data format. More information can be found from Wikipedia. This data format is commonly used in Atmospheric science and Oceanography as it is convinient to store various variables of many dimensions.

The datasets I am using are in NetCDF and I am using ncdf package to analyse the NetCDF data.

In [1]:
options(jupyter.plot_mimetypes = 'image/png') 
# I am using anaconda and this command helps me to make my figures inline

Set the working directory

In [2]:

Install required Packages

In [6]:



I am using rainfall data from the Climatic Research Unit (CRU). The SST data is Kaplan Extended SST V2. Both datasets are at monthly resolution.

In [83]:
# Open data
# Both datasets are at monthly resolution <- open.ncdf("cru_monthly_1981_2013_global.cdf") <- open.ncdf("sst.mon.anom_1981_2013.cdf")
In [5]:
# see variables and dimension names
[1] "file cru_monthly_1981_2013_global.cdf has 4 dimensions:"
[1] "LON   Size: 720"
[1] "LAT   Size: 360"
[1] "TIME   Size: 396"
[1] "bnds   Size: 2"
[1] "------------------------"
[1] "file cru_monthly_1981_2013_global.cdf has 2 variables:"
[1] "double TIME_bnds[bnds,TIME]  Longname:TIME_bnds Missval:1e+30"
[1] "double PRE[LON,LAT,TIME]  Longname:precipitation Missval:9.96920996838687e+36"
[1] "file sst.mon.anom_1981_2013.cdf has 4 dimensions:"
[1] "LON   Size: 72"
[1] "LAT   Size: 36"
[1] "TIME1   Size: 396"
[1] "bnds   Size: 2"
[1] "------------------------"
[1] "file sst.mon.anom_1981_2013.cdf has 2 variables:"
[1] "double TIME1_bnds[bnds,TIME1]  Longname:TIME1_bnds Missval:1e+30"
[1] "float SST[LON,LAT,TIME1]  Longname:Monthly Anomalies of SST Missval:-9.96920996838687e+36"

We see that CRU rainfall data is at 0.5 degree lat/lon resolution (lon size is 720 and lat size is 360) and the CRU rainfall variable name is PRE. PRE is 3D (lon, lat, time).

The SST data is at 5 degree lat/lon resolution (lon size =72 and lat size = 36). SST is also 3D (lon, lat, time). Both datsets cover the period from 1981-2013 (396 months).

Let's get the rainfall and SST data.

In [7]:
pre <- get.var.ncdf(, "PRE") 

sst<-get.var.ncdf(, "SST")

Next, get latitude and longitude of both variables.

In [8]:$dim$LON$vals -> lon_sst$dim$LAT$vals -> lat_sst$dim$LON$vals -> lon_pre$dim$LAT$vals -> lat_pre
In [85]:
# Let's close and


We can have a look at the latitude and longitude values.

In [10]:

-89.75, -89.25, -88.75, -88.25, -87.75


-179.75, -179.25, -178.75, -178.25, -177.75

We see tha rainfall is at 0.5 degrees resolution. We can also check the resolution of the SST in the same way.

Now, let's extract rainfall over a region. I am considering the Blue Nile river basin which is the main source of the Nile river, the longest river in the world. Let's consider the region from 7.75-12.25 degree North and 33.75-39.75 degree east.

In [34]:
west= lon_pre[max(which(lon_pre<34))]   # gives 33.75
east=lon_pre[max(which(lon_pre<40))]    # gives 39.75

south=lat_pre[max(which(lat_pre<8))]    # gives 7.75
north=lat_pre[min(which(lat_pre>12))]   # gives 12.25

In [35]:
  1. 13
  2. 10
  3. 396

Now, let's area average it.

In [37]:

Now, we can calculate monthly average rainfall over the region to see when the main rainy season is.

In [46]:

Let's plot the monthly average rainfall to see the main rainy season.

In [47]:
labels<-c("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec",sep = " ")

title='Monthly Average rainfall'

           xaxt = "n",  xlab =" ")

axis(1, labels = FALSE)
      text(1:12, par("usr")[1] +(min(monthly_ave)+max(monthly_ave))/2, srt = 45, adj = 1,
           labels = labels, xpd = TRUE)

We see from the plot that the main rainy season is from June-September. To demonstarate how to calculate correlation analysis in R, let's consider September and calculate concurrent correlation (rainfall in September with SST in September).

In [86]:
In [87]:
  1. 72
  2. 36
  3. 396

Now, let's extract SST in September. We can do it using dplyr but for now let's use for loop, which is clear to the majority. There are 396 months (33*12). First, get the indices for September.

In [59]:
sep_index <-c()
for (i in 1:33){

Now, we can extract sst in September using sep_index.

In [61]:
sep_sst<-sst[ , ,sep_index]
dim(sep_sst)  # we are considering the month of September over 33 years
  1. 72
  2. 36
  3. 33

Again, we can calculate correlation between sst and rainfall in September using a for loop.

In [66]:
corr_pcp_sst<-matrix(0,72,36) # created a container

for (i in 1:72){
  for (j in 1:36){
In [88]:
lat_sst # let's see the latitude values to select region for map display.

-87.5, -82.5, -77.5, -72.5, -67.5, -62.5, -57.5, -52.5, -47.5, -42.5, -37.5, -32.5, -27.5, -22.5, -17.5, -12.5, -7.5, -2.5, 2.5, 7.5, 12.5, 17.5, 22.5, 27.5, 32.5, 37.5, 42.5, 47.5, 52.5, 57.5, 62.5, 67.5, 72.5, 77.5, 82.5, 87.5

Ok, let's generate the correlation map from 42.5 S to 42.5 N.

In [89]:
image.plot(lon_sst,lat_sst[10:27],corr_pcp_sst[ ,10:27], col = mypalette,
 xlab ="Longitude", ylab="Latitude",
 main ="Correlation between  Blue Nile rainfall and SST in September")

plot(wrld_simpl,add=TRUE) # overlay world map

We see from the correlation map that the Blue Nile river basin gets less rainfall when the eastern Pacific ocean is anomalously warm (in this phase it is called El Nino) since the correlation is negative. On the other hand, when the eastern Pacific ocean is anomalously cold (called La Nina), the Blue Nile river basin enjoys more rainfall and Ethiopia, Sudan and Egypt can get enough water.

comments powered by Disqus