443-970-2353
[email protected]
CV Resume
To understand physical mechanisms and to develop statistical rainfall prediction models, correlation analysis is used as a first step. Here, I show how to use R to generate correlation map between rainfall and sea surface temperature (SST).
NetCDF (Network Common Data Form) data is a self-describing, portable, scalable and appendable machine-independent data format. More information can be found from Wikipedia. This data format is commonly used in Atmospheric science and Oceanography as it is convinient to store various variables of many dimensions.
The datasets I am using are in NetCDF and I am using ncdf package to analyse the NetCDF data.
options(jupyter.plot_mimetypes = 'image/png')
# I am using anaconda and this command helps me to make my figures inline
Set the working directory
setwd("C:/Fish/Rpackage/version1//data")
Install required Packages
library(chron);
library(RColorBrewer);
library(lattice);
library(ncdf);
library(maptools);
data(wrld_simpl);
library(fields);
I am using rainfall data from the Climatic Research Unit (CRU). The SST data is Kaplan Extended SST V2. Both datasets are at monthly resolution.
# Open data
# Both datasets are at monthly resolution
pre.nc <- open.ncdf("cru_monthly_1981_2013_global.cdf")
sst.nc <- open.ncdf("sst.mon.anom_1981_2013.cdf")
# see variables and dimension names
pre.nc
sst.nc
We see that CRU rainfall data is at 0.5 degree lat/lon resolution (lon size is 720 and lat size is 360) and the CRU rainfall variable name is PRE. PRE is 3D (lon, lat, time).
The SST data is at 5 degree lat/lon resolution (lon size =72 and lat size = 36). SST is also 3D (lon, lat, time). Both datsets cover the period from 1981-2013 (396 months).
Let's get the rainfall and SST data.
pre <- get.var.ncdf(pre.nc, "PRE")
sst<-get.var.ncdf(sst.nc, "SST")
Next, get latitude and longitude of both variables.
sst.nc$dim$LON$vals -> lon_sst
sst.nc$dim$LAT$vals -> lat_sst
pre.nc$dim$LON$vals -> lon_pre
pre.nc$dim$LAT$vals -> lat_pre
# Let's close sst.nc and pre.nc
close(sst.nc)
close(pre.nc)
We can have a look at the latitude and longitude values.
lat_pre[1:5]
lon_pre[1:5]
We see tha rainfall is at 0.5 degrees resolution. We can also check the resolution of the SST in the same way.
Now, let's extract rainfall over a region. I am considering the Blue Nile river basin which is the main source of the Nile river, the longest river in the world. Let's consider the region from 7.75-12.25 degree North and 33.75-39.75 degree east.
west= lon_pre[max(which(lon_pre<34))] # gives 33.75
east=lon_pre[max(which(lon_pre<40))] # gives 39.75
south=lat_pre[max(which(lat_pre<8))] # gives 7.75
north=lat_pre[min(which(lat_pre>12))] # gives 12.25
pre_bl<-pre[west:east,south:north,]
dim(pre_bl)
Now, let's area average it.
pre_areaAve<-apply(pre_bl,3,mean,na.rm=TRUE)
Now, we can calculate monthly average rainfall over the region to see when the main rainy season is.
monthly_ave<-t(matrix(pre_areaAve,12,33))
monthly_ave<-apply(monthly_ave,2,mean)
Let's plot the monthly average rainfall to see the main rainy season.
labels<-c("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec",sep = " ")
title='Monthly Average rainfall'
ylab1="mm/month"
plot(monthly_ave,type='o',col="blue",lwd=3,
ylab=ylab1,main=title,
xaxt = "n", xlab =" ")
axis(1, labels = FALSE)
text(1:12, par("usr")[1] +(min(monthly_ave)+max(monthly_ave))/2, srt = 45, adj = 1,
labels = labels, xpd = TRUE)
We see from the plot that the main rainy season is from June-September. To demonstarate how to calculate correlation analysis in R, let's consider September and calculate concurrent correlation (rainfall in September with SST in September).
sep_pre<-matrix(pre_areaAve,12,33)[9,]
dim(sst)
Now, let's extract SST in September. We can do it using dplyr but for now let's use for loop, which is clear to the majority. There are 396 months (33*12). First, get the indices for September.
sep_index <-c()
for (i in 1:33){
sep_index<-c(sep_index,9+(i-1)*12)
}
Now, we can extract sst in September using sep_index.
sep_sst<-sst[ , ,sep_index]
dim(sep_sst) # we are considering the month of September over 33 years
Again, we can calculate correlation between sst and rainfall in September using a for loop.
corr_pcp_sst<-matrix(0,72,36) # created a container
for (i in 1:72){
for (j in 1:36){
corr_pcp_sst[i,j]<-cor(sep_sst[i,j,],sep_pre)
}
}
lat_sst # let's see the latitude values to select region for map display.
Ok, let's generate the correlation map from 42.5 S to 42.5 N.
mypalette<-c("#67000d","#a50f15","#cb181d","#ef3b2c","#fb6a4a","#fc9272",
"#fcbba1","#fee0d2","#fff5f0")
image.plot(lon_sst,lat_sst[10:27],corr_pcp_sst[ ,10:27], col = mypalette,
xlab ="Longitude", ylab="Latitude",
main ="Correlation between Blue Nile rainfall and SST in September")
plot(wrld_simpl,add=TRUE) # overlay world map
We see from the correlation map that the Blue Nile river basin gets less rainfall when the eastern Pacific ocean is anomalously warm (in this phase it is called El Nino) since the correlation is negative. On the other hand, when the eastern Pacific ocean is anomalously cold (called La Nina), the Blue Nile river basin enjoys more rainfall and Ethiopia, Sudan and Egypt can get enough water.