Fisseha Berhane, PhD

Data Scientist

443-970-2353 fisseha@jhu.edu CV Resume Linkedin GitHub twitter twitter

Composite analysis to capture non-linear relationships

Though correlation is able to capture linear relationships, since it does not handle non-linear relationships, composite analysis is widely used instead to understand physical mechanisms and develop statistical rainfall prediction models. Here, I show how to use R to generate composites of rainfall based on sea surface temperature (SST).

NetCDF (Network Common Data Form) data is a self-describing, portable, scalable and appendable machine-independent data format. More information can be found from Wikipedia. This data format is commonly used in Atmospheric science and Oceanography as it is convinient to store various variables of many dimensions.

The rainfall dataset I am using is in NetCDF and I am using ncdf package to analyse NetCDF data. I am considering the period 1981-2013 and my data are at monthly resolution.

In [100]:
options(jupyter.plot_mimetypes = 'image/png') 
# I am using anaconda and this command helps me to make my figures inline

Set the working directory

In [1]:
setwd("C:/Fish/Ranalysis")

Install required Packages

In [3]:
library(chron);
library(RColorBrewer);
library(lattice);

library(ncdf);

library(maptools);
data(wrld_simpl);
library(fields);

I am using rainfall data from the Climatic Research Unit (CRU). The SST index I am using here is NINA3.4, which is a measure of how warm the eastern Pacific Ocean is.

In [101]:
# Open data
# Both datasets are at monthly resolution

pre.nc <- open.ncdf("cru_monthly_1981_2013_global.cdf")

nina <- read.table("nana34_1981_2013.txt", sep='', header=F)
In [74]:
# see variable and dimension names 
pre.nc
Out[74]:
[1] "file cru_monthly_1981_2013_global.cdf has 4 dimensions:"
[1] "LON   Size: 720"
[1] "LAT   Size: 360"
[1] "TIME   Size: 396"
[1] "bnds   Size: 2"
[1] "------------------------"
[1] "file cru_monthly_1981_2013_global.cdf has 2 variables:"
[1] "double TIME_bnds[bnds,TIME]  Longname:TIME_bnds Missval:1e+30"
[1] "double PRE[LON,LAT,TIME]  Longname:precipitation Missval:9.96920996838687e+36"

We see that CRU rainfall data is at 0.5 degree lat/lon resolution (lon size is 720 and lat size is 360) and the CR rainfall variable name is PRE. PRE is 3D (lon, lat, time).

In [75]:
dim(nina)
Out[75]:
  1. 33
  2. 12

Let's get the rainfall data

In [76]:
pre <- get.var.ncdf(pre.nc, "PRE")

Next, get latitudes and longitudes.

In [77]:
pre.nc$dim$LON$vals -> lon
pre.nc$dim$LAT$vals -> lat
In [102]:
# Let's close pre.nc


close(pre.nc)

We can have a look at the latitude bins.

In [103]:
lat
Out[103]:

Longitude bins

In [104]:
lon
Out[104]:

To demonstrate how composite analysis is used to analyse climate data, let's investigate if the sea surface temperature over the eastern Pacific Ocean impacts rainfall in the Indian Monsoon region. The Indian Monsoon occurs from June-September. The region of our interest covers 68-90 E and 5-25 N.

In [120]:
lon_start<-max(which(lon<=68))
lon_end<-max(which(lon<=90))
lon[lon_start]
lon[lon_end]
Out[120]:
67.75
Out[120]:
89.75
In [121]:
lat_start<-max(which(lat<=5))
lat_end<-max(which(lat<=25))
lat[lat_start]
lat[lat_end]
Out[121]:
4.75
Out[121]:
24.75

Now, let's get rainfall over our region.

In [122]:
pre_India<-pre[lon_start:lon_end,lat_start:lat_end,]
dim(pre_India)
lat_india<-lat[lat_start:lat_end]
lon_india<-lon[lon_start:lon_end]
Out[122]:
  1. 45
  2. 41
  3. 396

Now, let's get rainfall for June-September

In [123]:
index <-c()
for (i in 1:33){
    index <-c(index,(6+(i-1)*12):(9+(i-1)*12))
}
pre_summer<-pre_India[ , ,index]
dim(pre_summer)  # 33 years, 4 months in each year =33*4 =132
Out[123]:
  1. 45
  2. 41
  3. 132

Each month is a separete column in the Nina index data. Let's extract summer season index values.

In [124]:
dim(nina)
nina_summer<-nina[,6:9]
dim(nina_summer)
Out[124]:
  1. 33
  2. 12
Out[124]:
  1. 33
  2. 4

Let's change nina_summer to a vector

In [125]:
nina_summer<-as.vector(t(nina_summer))

Now, let's standardaize the index

In [126]:
nina_summer<-(nina_summer-mean(nina_summer))/sd(nina_summer)
mean(nina_summer)
sd(nina_summer)   # the mean is 0 and the standard deviation is 1.
Out[126]:
-1.94138965872347e-15
Out[126]:
1

Let's plot the index

In [127]:
plot(nina_summer,type='l',ylab='nana_summer_standardaized', 
     xlab='',main="Standardized Index")
abline(h=1,col='skyblue')
abline(h=-1,col='skyblue')
>

Now, to understand the impacts of anomalously high and low SST on rainfall over India, we will take the average precipitation when the index is above one standard deviation (STD) and when the index is below negative one STD separately and subtract the mean rainfall to get the deviation in rainfall average during warmer and colder SST over the eastern Pacific Ocean.

In [128]:
above_1STD <-which(nina_summer>1)
below_negative1STD<-which(nina_summer< -1)
In [129]:
length(below_negative1STD)
Out[129]:
21
In [130]:
length(above_1STD)
Out[130]:
21

So, I have 21 months with SST over the eastern pacific greater than one STD. The months with SST less than negative one STD are also 21 for this example.

Now, let's get rainfall averages during the anomalousy warm and cold months.

In [131]:
warm_pre<-pre_summer[ , ,above_1STD]
cold_pre<-pre_summer[ , ,below_negative1STD]

dim(warm_pre)
dim(cold_pre)
Out[131]:
  1. 45
  2. 41
  3. 21
Out[131]:
  1. 45
  2. 41
  3. 21

Now, let's average the rainfall in the summer season (June-September) and in the warm and cold months.

In [132]:
pre_summer_ave<-apply(pre_summer,c(1,2),mean)
warm_pre_ave<-apply(warm_pre,c(1,2),mean)
cold_pre_ave<-apply(cold_pre,c(1,2),mean)

dim(pre_summer_ave)
dim(warm_pre_ave)
dim(cold_pre_ave)
Out[132]:
  1. 45
  2. 41
Out[132]:
  1. 45
  2. 41
Out[132]:
  1. 45
  2. 41

Ok, let's plot the anomalies.

In [ ]:
par(mfrow=c(1,2))
mypalette<-brewer.pal(9,"BrBG")


image.plot(lon_india,lat_india,warm_pre_ave-pre_summer_ave, col = mypalette,
 xlab ="Longitude", ylab="Latitude",lab.breaks=seq(-125,100,by=25),
 main ="During El Nino")

plot(wrld_simpl,add=TRUE) # overlay world map


image.plot(lon_india,lat_india,cold_pre_ave-pre_summer_ave, col = mypalette,
 xlab ="Longitude", ylab="",lab.breaks=seq(-125,100,by=25),
 main ="During La Nina",legend.width=0.001, legend.shrink=.001,
 legend.mar=0.001,legend.cex=0.0,horizontal=T)

plot(wrld_simpl,add=TRUE) # overlay world map

Summary

We can see from the composite maps that Indian summer rainfall is influenced by SST over the eastern Pacific Ocean. When the eastern Pacific Ocean is anomalously warm (called El Nino), large parts of India get less rainfall. On the other hand, when the eastern Pacific Ocean is anomalously cold (called La Nina), large parts of India get enhanced rainfall. Further, we can use significance tests to support our conclusions.

comments powered by Disqus