
In [504]:

```
anscombe
```

Load the required packages, then create four groups: setA, setB, setC and setD. Each group holds one x/y pair of columns from `anscombe`.

In [498]:

```
library(ggplot2)
library(dplyr)
library(reshape2)
```

In [500]:

```
setA=select(anscombe, x=x1,y=y1)
setB=select(anscombe, x=x2,y=y2)
setC=select(anscombe, x=x3,y=y3)
setD=select(anscombe, x=x4,y=y4)
```

Add a third column which can help us to identify the four groups.

In [516]:

```
setA$group ='SetA'
setB$group ='SetB'
setC$group ='SetC'
setD$group ='SetD'
head(setA,4) # showing sample data points from setA
```

Now, let's merge the four datasets.

In [515]:

```
all_data=rbind(setA,setB,setC,setD) # merging all four data sets (4 x 11 = 44 rows)
all_data[c(1,13,23,43),] # showing one sample row from each set
```
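
As an aside, the split, label and merge steps above can also be done in one pass. The following is a minimal sketch using base R only; the name `all_data_alt` is hypothetical and is not used elsewhere in this notebook.

```r
# Build the long-format table directly from anscombe (base R only).
# all_data_alt is a hypothetical name chosen for this sketch.
all_data_alt <- do.call(rbind, lapply(1:4, function(i) {
  data.frame(
    x = anscombe[[paste0("x", i)]],      # columns x1..x4
    y = anscombe[[paste0("y", i)]],      # columns y1..y4
    group = paste0("Set", LETTERS[i]),   # "SetA".."SetD"
    stringsAsFactors = FALSE
  )
}))
dim(all_data_alt)
```

The result should have the same columns and row order as `all_data`, so either table can feed the steps that follow.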

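Next, compute the mean, sample variance and correlation for each group, and fit a linear regression of y on x within each group.

In [518]: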

```
# Per-group summary statistics
summary_stats = all_data %>%
  group_by(group) %>%
  summarize("mean x" = mean(x),
            "Sample variance x" = var(x),
            "mean y" = round(mean(y), 2),
            "Sample variance y" = round(var(y), 1),
            "Correlation between x and y" = round(cor(x, y), 2))
# Fit y ~ x within each group and collect the rounded coefficients
models = all_data %>%
  group_by(group) %>%
  do(mod = lm(y ~ x, data = .)) %>%
  do(data.frame(var = names(coef(.$mod)),
                coef = round(coef(.$mod), 2),
                group = .$group)) %>%
  dcast(., group ~ var, value.var = "coef")
# data.frame() replaces the deprecated data_frame();
# check.names = FALSE keeps the column label "Linear regression" intact
summary_stats_and_linear_fit = cbind(summary_stats,
  data.frame("Linear regression" =
               paste0("y = ", models$"(Intercept)", " + ", models$x, "x"),
             check.names = FALSE))
summary_stats_and_linear_fit
```

**If we looked only at the summary statistics shown above (means, sample variances, correlation and fitted regression line), we would conclude that these four data sets are essentially identical**.
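
The agreement between the four sets can be cross-checked without dplyr or reshape2. The sketch below recomputes the same quantities with base R; the matrix name `stats` and its row labels are chosen for this example only.

```r
# Recompute the per-set statistics with base R as a cross-check.
# `stats` and its row names are hypothetical labels for this sketch.
stats <- sapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  fit <- coef(lm(y ~ x))               # intercept and slope of y ~ x
  c(mean_x    = mean(x),
    var_x     = var(x),
    mean_y    = mean(y),
    var_y     = var(y),
    cor_xy    = cor(x, y),
    intercept = fit[["(Intercept)"]],
    slope     = fit[["x"]])
})
colnames(stats) <- paste0("Set", LETTERS[1:4])
round(stats, 2)
```

Each row of the rounded matrix should be constant, or very nearly so, across the four columns.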

In [503]:

```
ggplot(all_data, aes(x = x, y = y)) +
  geom_point(shape = 21, colour = "red", fill = "orange", size = 3) +
  ggtitle("Anscombe's data sets") +
  geom_smooth(method = "lm", se = FALSE, color = "blue") +
  facet_wrap(~group, scales = "free")
```
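
The differences that the plots reveal can also be picked up numerically by statistics other than the mean, variance and correlation. As a rough sketch, the code below computes the median and maximum of y and the number of distinct x values for each set, using base R only; the name `shape_stats` is hypothetical.

```r
# Statistics that are not matched across the four sets.
# shape_stats is a hypothetical name used only in this sketch.
shape_stats <- sapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  c(median_y   = median(y),
    max_y      = max(y),
    distinct_x = length(unique(x)))   # how many different x values occur
})
colnames(shape_stats) <- paste0("Set", LETTERS[1:4])
shape_stats
```

For instance, the `distinct_x` row shows that one of the sets uses only two different x values, which the matching means and variances give no hint of.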