# Fisseha Berhane, PhD

#### Data Scientist

443-970-2353 [email protected] CV Resume

### Correlation map of climate variables¶

To understand physical mechanisms and to develop statistical rainfall prediction models, correlation analysis is used as a first step. Here, I show how to use R to generate correlation map between rainfall and sea surface temperature (SST).

NetCDF (Network Common Data Form) data is a self-describing, portable, scalable and appendable machine-independent data format. More information can be found from Wikipedia. This data format is commonly used in Atmospheric science and Oceanography as it is convinient to store various variables of many dimensions.

The datasets I am using are in NetCDF and I am using ncdf package to analyse the NetCDF data.

In [1]:
options(jupyter.plot_mimetypes = 'image/png')
# I am using anaconda and this command helps me to make my figures inline


Set the working directory

In [2]:
setwd("C:/Fish/Rpackage/version1//data")


Install required Packages

In [6]:
library(chron);
library(RColorBrewer);
library(lattice);

library(ncdf);

library(maptools);
data(wrld_simpl);
library(fields);


I am using rainfall data from the Climatic Research Unit (CRU). The SST data is Kaplan Extended SST V2. Both datasets are at monthly resolution.

In [83]:
# Open data
# Both datasets are at monthly resolution

pre.nc <- open.ncdf("cru_monthly_1981_2013_global.cdf")

sst.nc <- open.ncdf("sst.mon.anom_1981_2013.cdf")

In [5]:
# see variables and dimension names
pre.nc

sst.nc

Out[5]:
[1] "file cru_monthly_1981_2013_global.cdf has 4 dimensions:"
[1] "LON   Size: 720"
[1] "LAT   Size: 360"
[1] "TIME   Size: 396"
[1] "bnds   Size: 2"
[1] "------------------------"
[1] "file cru_monthly_1981_2013_global.cdf has 2 variables:"
[1] "double TIME_bnds[bnds,TIME]  Longname:TIME_bnds Missval:1e+30"
[1] "double PRE[LON,LAT,TIME]  Longname:precipitation Missval:9.96920996838687e+36"
Out[5]:
[1] "file sst.mon.anom_1981_2013.cdf has 4 dimensions:"
[1] "LON   Size: 72"
[1] "LAT   Size: 36"
[1] "TIME1   Size: 396"
[1] "bnds   Size: 2"
[1] "------------------------"
[1] "file sst.mon.anom_1981_2013.cdf has 2 variables:"
[1] "double TIME1_bnds[bnds,TIME1]  Longname:TIME1_bnds Missval:1e+30"
[1] "float SST[LON,LAT,TIME1]  Longname:Monthly Anomalies of SST Missval:-9.96920996838687e+36"

We see that CRU rainfall data is at 0.5 degree lat/lon resolution (lon size is 720 and lat size is 360) and the CRU rainfall variable name is PRE. PRE is 3D (lon, lat, time).

The SST data is at 5 degree lat/lon resolution (lon size =72 and lat size = 36). SST is also 3D (lon, lat, time). Both datsets cover the period from 1981-2013 (396 months).

Let's get the rainfall and SST data.

In [7]:
pre <- get.var.ncdf(pre.nc, "PRE")

sst<-get.var.ncdf(sst.nc, "SST")


Next, get latitude and longitude of both variables.

In [8]:
sst.nc$dim$LON$vals -> lon_sst sst.nc$dim$LAT$vals -> lat_sst

pre.nc$dim$LON$vals -> lon_pre pre.nc$dim$LAT$vals -> lat_pre

In [85]:
# Let's close sst.nc and pre.nc

close(sst.nc)
close(pre.nc)


We can have a look at the latitude and longitude values.

In [10]:
lat_pre[1:5]
lon_pre[1:5]

Out[10]:

-89.75, -89.25, -88.75, -88.25, -87.75

Out[10]:

-179.75, -179.25, -178.75, -178.25, -177.75

We see tha rainfall is at 0.5 degrees resolution. We can also check the resolution of the SST in the same way.

Now, let's extract rainfall over a region. I am considering the Blue Nile river basin which is the main source of the Nile river, the longest river in the world. Let's consider the region from 7.75-12.25 degree North and 33.75-39.75 degree east.

In [34]:

west= lon_pre[max(which(lon_pre<34))]   # gives 33.75
east=lon_pre[max(which(lon_pre<40))]    # gives 39.75

south=lat_pre[max(which(lat_pre<8))]    # gives 7.75
north=lat_pre[min(which(lat_pre>12))]   # gives 12.25

pre_bl<-pre[west:east,south:north,]

In [35]:
dim(pre_bl)

Out[35]:
1. 13
2. 10
3. 396

Now, let's area average it.

In [37]:
pre_areaAve<-apply(pre_bl,3,mean,na.rm=TRUE)


Now, we can calculate monthly average rainfall over the region to see when the main rainy season is.

In [46]:
monthly_ave<-t(matrix(pre_areaAve,12,33))
monthly_ave<-apply(monthly_ave,2,mean)


Let's plot the monthly average rainfall to see the main rainy season.

In [47]:
labels<-c("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec",sep = " ")

title='Monthly Average rainfall'
ylab1="mm/month"

plot(monthly_ave,type='o',col="blue",lwd=3,
ylab=ylab1,main=title,
xaxt = "n",  xlab =" ")

axis(1, labels = FALSE)

text(1:12, par("usr")[1] +(min(monthly_ave)+max(monthly_ave))/2, srt = 45, adj = 1,
labels = labels, xpd = TRUE)


We see from the plot that the main rainy season is from June-September. To demonstarate how to calculate correlation analysis in R, let's consider September and calculate concurrent correlation (rainfall in September with SST in September).

In [86]:
sep_pre<-matrix(pre_areaAve,12,33)[9,]

In [87]:
dim(sst)

Out[87]:
1. 72
2. 36
3. 396

Now, let's extract SST in September. We can do it using dplyr but for now let's use for loop, which is clear to the majority. There are 396 months (33*12). First, get the indices for September.

In [59]:
sep_index <-c()
for (i in 1:33){
sep_index<-c(sep_index,9+(i-1)*12)
}


Now, we can extract sst in September using sep_index.

In [61]:
sep_sst<-sst[ , ,sep_index]
dim(sep_sst)  # we are considering the month of September over 33 years

Out[61]:
1. 72
2. 36
3. 33

Again, we can calculate correlation between sst and rainfall in September using a for loop.

In [66]:
corr_pcp_sst<-matrix(0,72,36) # created a container

for (i in 1:72){
for (j in 1:36){
corr_pcp_sst[i,j]<-cor(sep_sst[i,j,],sep_pre)
}
}

In [88]:
lat_sst # let's see the latitude values to select region for map display.

Out[88]:

-87.5, -82.5, -77.5, -72.5, -67.5, -62.5, -57.5, -52.5, -47.5, -42.5, -37.5, -32.5, -27.5, -22.5, -17.5, -12.5, -7.5, -2.5, 2.5, 7.5, 12.5, 17.5, 22.5, 27.5, 32.5, 37.5, 42.5, 47.5, 52.5, 57.5, 62.5, 67.5, 72.5, 77.5, 82.5, 87.5

Ok, let's generate the correlation map from 42.5 S to 42.5 N.

In [89]:
mypalette<-c("#67000d","#a50f15","#cb181d","#ef3b2c","#fb6a4a","#fc9272",
"#fcbba1","#fee0d2","#fff5f0")
image.plot(lon_sst,lat_sst[10:27],corr_pcp_sst[ ,10:27], col = mypalette,
xlab ="Longitude", ylab="Latitude",
main ="Correlation between  Blue Nile rainfall and SST in September")