
First, let’s clear the workspace and load dplyr, which provides mutate() for adding the hour and minute columns later on.

In [2]:

```
rm(list=ls()) # clear workspace
library(dplyr)
```

Then, let’s load the data and understand its contents

In [4]:

```
data<-read.csv("activity.csv", sep=',', header=TRUE) # loading data
# Let us have a look at the data dimensions, variables
names(data)
```

Out[4]:

In [5]:

```
dim(data)
```

Out[5]:

In [7]:

```
str(data)
```

In [8]:

```
head(data)
```

Out[8]:

In [9]:

```
tail(data)
```

Out[9]:

In [10]:

```
# Let's add hour and minute
data <- mutate(data, hour = interval %/% 100, minute = interval %% 100)
```
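As a quick check of the arithmetic above, here is a minimal sketch on a few made-up interval labels (the vector below is hypothetical, not taken from activity.csv). It assumes, as the mutate() call does, that the interval is encoded as hours times 100 plus minutes:

```r
# Hypothetical interval labels in the hours*100 + minutes encoding
interval <- c(0, 5, 55, 100, 835, 2355)

hour   <- interval %/% 100  # integer division keeps the hour part
minute <- interval %% 100   # the remainder keeps the minute part

# Show the three columns side by side
print(data.frame(interval, hour, minute))
```

Under this encoding, a label such as 835 splits into hour 8 and minute 35.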

For this part of the assignment, missing values will be ignored!

**Calculate the total number of steps taken per day**

In [15]:

```
daily<-c() # This will be the total number of steps taken per day
for (i in 1:61){ # total number of days in October and November is 31+30=61
start<-(i-1)*288+1 # 288 five-minute steps in a day; 24*60/5=288
last<-(i-1)*288+288
temp<-data[start:last,1] # extracting all 5-minute steps for each day
daily<-c(daily,sum(temp)) # concatenating the daily totals
}
```
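The loop above relies on every day occupying exactly 288 consecutive rows. A possible alternative, sketched here on a small made-up data frame (the dates and step counts are hypothetical), is to group by the date column with tapply(), which does not depend on row order:

```r
# Toy data: 2 days x 3 intervals (the real file has 288 intervals per day)
toy <- data.frame(
  steps    = c(1, 2, 3, NA, NA, NA),
  date     = rep(c("2020-01-01", "2020-01-02"), each = 3),  # hypothetical dates
  interval = rep(c(0, 5, 10), times = 2)
)

# Sum the steps within each date; a day containing NA yields NA,
# which matches the behaviour of sum(temp) in the loop
daily_toy <- tapply(toy$steps, toy$date, sum)
print(daily_toy)
```

Days with missing readings still come out as NA, so they can be dropped afterwards in the same way as in the loop version.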

**Make a histogram of the total number of steps taken each day**

In [18]:

```
daily_noNA<-daily[!is.na(daily)] # 8 NA's are removed
hist(daily_noNA, xlab="steps",ylab="Frequency",col="skyblue",border="red",
main="Histogram of the total number of steps taken each day")
```

**Calculate and report the mean and median of the total number of steps taken per day**

The mean of total number of steps taken per day is:

In [19]:

```
mean(daily,na.rm=T)
```

Out[19]:

The median of total number of steps taken per day is:

In [20]:

```
median(daily,na.rm=T)
```

Out[20]:

In [23]:

```
x<-data[,1] # number of steps in 5-minute intervals
y<-matrix(x,288,61) # one column per day, so each row holds one 5-minute interval across all days
five_average<-apply(y,1,mean,na.rm=TRUE) # 5-minute interval average number of steps taken,
# averaged across all days
plot(data$interval[1:288],five_average, type='l',col='darkred',
xlab='Intervals',lwd=3,
ylab='Average number of steps',
main ='Average number of steps taken in 5-minute interval, averaged across all days')
```
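To see why reshaping into a 288 x 61 matrix gives the per-interval average, here is a minimal sketch on made-up numbers (3 intervals and 2 days, all values hypothetical). It also shows that grouping by the interval column with tapply() gives the same result:

```r
# Toy data: 2 days x 3 intervals, stored day after day like the real file
steps    <- c(10, 20, NA, 30, NA, 50)
interval <- rep(c(0, 5, 10), times = 2)

# Matrix approach: matrix() fills column by column, so each column is a day
# and each row is one interval across days
by_matrix <- rowMeans(matrix(steps, nrow = 3), na.rm = TRUE)

# Grouping approach: average the steps within each interval label
by_group <- tapply(steps, interval, mean, na.rm = TRUE)

print(by_matrix)
print(by_group)
```

The matrix approach only works when every day has the same number of rows in the same order; the grouping approach does not need that assumption.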

In [24]:

```
hr<-data$hour[1:288]
mins<-data$minute[1:288] # named mins to avoid masking base::min()
idx<-which.max(five_average) # index of the interval with the highest average
hr_max<-hr[idx]
min_max<-mins[idx]
cat(sprintf('The maximum average number of steps occurs in the interval starting at %02d:%02d\n',hr_max,min_max))
```

The total number of missing values is:

In [25]:

```
sum(is.na(data[,1]))
```

Out[25]:

I will fill in missing values using the mean of the 5-minute interval

In [26]:

```
# five_average is the 5-minute interval average across all days, as plotted in the line chart above
# We fill in each missing value with the average for that 5-minute interval across all days
# Replicate the 5-minute interval averages over the 61 days so they line up with the rows of data
five_average_rep<- rep(five_average,61)
data1<-data # creating a copy of the dataset so as to not modify the original data
for (i in 1:length(data1[,1])){ # loop over every row (61 days x 288 intervals)
if(is.na(data1[i,1])){
data1[i,1]<- five_average_rep[i] # missing value replaced
}}
```
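The same fill can be written without an explicit loop. The sketch below uses made-up numbers (all values hypothetical) and base R's ave() to compute each row's interval mean, then overwrites only the missing entries:

```r
# Toy data: 2 days x 3 intervals with two missing readings
toy <- data.frame(
  steps    = c(10, NA, 30, 30, 40, NA),
  interval = rep(c(0, 5, 10), times = 2)
)

# For every row, the mean of its interval across days (NA's ignored)
interval_mean <- ave(toy$steps, toy$interval,
                     FUN = function(v) mean(v, na.rm = TRUE))

# Copy the data and replace only the missing values
filled <- toy
missing <- is.na(filled$steps)
filled$steps[missing] <- interval_mean[missing]

print(filled)
```

Observed values are left untouched; only the rows flagged in missing receive the interval average.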

**Create a new dataset that is equal to the original dataset but with the missing data filled in.**

In [27]:

```
# Calculate the total number of steps taken per day using the data with filled NA's
daily1<-c()
for (i in 1:61){ # the total number of days in October and November is 31+30=61
start<-(i-1)*288+1 # there are 288 five-minute steps in a day; 24*60/5=288
last<-(i-1)*288+288
temp<-data1[start:last,1] # extracting all 5-minute steps for each day
daily1<-c(daily1,sum(temp)) # concatenating the daily totals
}
```

In [33]:

```
par(mfrow=c(2,1))
hist(daily1, xlab="steps",ylab="Frequency",
main="Data with NA's filled in",border='green',col="skyblue")
hist(daily_noNA, xlab="steps",ylab="Frequency",
main="NA's not filled in",border='purple',col="gray70")
```

In [34]:

```
# The mean of total number of steps taken per day is:
mean(daily1)
```

Out[34]:

In [35]:

```
# The median of total number of steps taken per day is:
median(daily1)
```

Out[35]:

In [37]:

```
data1$date<-as.Date(data1$date)
data1$day<-weekdays(data1$date)
data1_weekdays<-data1[(!data1$day %in% c("Saturday","Sunday")),] # weekdays
data1_weekend<-data1[(data1$day %in% c("Saturday","Sunday")),] # weekend
weekday_steps<-data1_weekdays[,1]
temp<-matrix(weekday_steps,nrow=288)
weekday_steps_average<-apply(temp,1,mean)
weekend_steps<-data1_weekend[,1]
temp<-matrix(weekend_steps,nrow=288)
weekend_steps_average<-apply(temp,1,mean)
```
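One caveat: weekdays() returns day names in the current locale, so the comparison with "Saturday" and "Sunday" assumes an English locale. A minimal locale-independent sketch, using a few hypothetical dates that are not from the dataset, classifies days by the numeric weekday from as.POSIXlt() instead:

```r
# Hypothetical dates: a Friday, Saturday, Sunday and Monday
dates <- as.Date(c("2024-01-05", "2024-01-06", "2024-01-07", "2024-01-08"))

# wday is 0 for Sunday through 6 for Saturday, regardless of locale
wday <- as.POSIXlt(dates)$wday

# Label each date as weekday or weekend
day_type <- factor(ifelse(wday %in% c(0, 6), "weekend", "weekday"),
                   levels = c("weekday", "weekend"))

print(data.frame(dates, wday, day_type))
```

The resulting factor could then be used to split the filled-in data in place of the day-name test.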