In this analysis, inferential statistics are used to investigate whether the length of odontoblasts (cells responsible for tooth growth) in guinea pigs is influenced by the dose of vitamin C (0.5, 1, and 2 mg/day). The effect of the two delivery methods (orange juice or ascorbic acid) on odontoblast length is also studied.
data(ToothGrowth)
str(ToothGrowth)
library(ggplot2)
We see that the data frame has 60 observations of 3 variables: len, supp, and dose. From the documentation, "len" is the length of odontoblasts, "supp" is the supplement type (ascorbic acid, coded as VC, or orange juice, coded as OJ), and "dose" is the dose in milligrams/day.
message('Table 1: Sample Size')
size=list()
size$dose0.5=sum(ToothGrowth$dose==0.5)
size$dose1=sum(ToothGrowth$dose==1)
size$dose2=sum(ToothGrowth$dose==2)
size$OJ=sum(ToothGrowth$supp=="OJ")
size$VC=sum(ToothGrowth$supp=="VC")
size=as.data.frame(size)
row.names(size)=c("Sample Size")
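# Labels must follow the order in which the counts were added above: doses first, then OJ, then VC.
names(size)=c('0.5 mg/day','1 mg/day','2 mg/day','orange juice (OJ)','ascorbic acid (VC)')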
size
As shown in Table 1, the sample sizes are balanced: 20 observations for each dose amount and 30 for each supplement type.
Let's visualize the data using ggplot2.
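The same counts can be obtained more compactly with a cross-tabulation. This is a minimal sketch, not the original author's code; the names `tg`, `cell_counts`, `supp_totals` and `dose_totals` are illustrative, and `tg` is a local copy so that the `ToothGrowth` object used elsewhere is left untouched.

```r
# Local copy of the built-in dataset (illustrative name).
tg <- datasets::ToothGrowth

# Cross-tabulate supplement type against dose: one count per combination.
cell_counts <- table(supp = tg$supp, dose = tg$dose)
cell_counts

# Row totals give the sample size per supplement type,
# column totals give the sample size per dose amount.
supp_totals <- margin.table(cell_counts, 1)
dose_totals <- margin.table(cell_counts, 2)
supp_totals
dose_totals
```

The cross-tabulation also shows the size of each supplement and dose combination, which Table 1 does not report.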
ToothGrowth$dose=as.factor(ToothGrowth$dose)
ggplot(ToothGrowth, aes(x=dose,y=len))+geom_boxplot(aes(fill = dose))+
ggtitle('Fig.1. Tooth Growth Dependence on Dose and Supplement ')+ facet_grid(.~supp)+
theme(axis.title.y = element_text(colour="gray20",size=12,angle=90,hjust=.5,vjust=1),
axis.title.x = element_text(colour="gray20"),
plot.title = element_text(vjust=1.5,size = 12,colour="purple"),
axis.text.x = element_text(colour="red",size=10,angle=45,hjust=.5, vjust=.5))
From Fig. 1, we observe that teeth length increases with dose for both supplement types. Further, the supplement type appears to influence teeth length when the dose is 0.5 or 1.0 mg/day. With a dose of 2.0 mg/day, however, the teeth lengths for the two supplement types seem similar. We will use hypothesis tests to investigate whether these observations from Fig. 1 are statistically significant.
Let's use hypothesis tests to explore the significance of the impacts of dose amount and supplement type on teeth growth. We assume unequal variances because that is the safer choice when in doubt. Moreover, we assume that the samples are independent. We therefore use the Welch Two Sample t-test. The null hypothesis of this test is that there is no difference between the population means. If the p-value of the test is smaller than 0.05, we reject the null hypothesis in favor of the alternative hypothesis, which states that the two means are different.
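To put numbers behind the figure, the mean length in each supplement and dose cell can be tabulated. This is a minimal sketch; `tg` and `cell_means` are illustrative names, and the boxplots show medians rather than means, so the values will not match the box centre lines exactly.

```r
# Local copy of the built-in dataset (illustrative name).
tg <- datasets::ToothGrowth

# Mean odontoblast length for every supplement/dose combination.
# Rows are supplement types (OJ, VC); columns are dose amounts (0.5, 1, 2).
cell_means <- tapply(tg$len, list(supp = tg$supp, dose = tg$dose), mean)
round(cell_means, 2)

# Difference OJ - VC at each dose amount.
round(cell_means["OJ", ] - cell_means["VC", ], 2)
```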
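To make the test concrete, the Welch statistic, its approximate degrees of freedom (Welch-Satterthwaite), and the two-sided p-value can be computed by hand and compared with `t.test`. This is a sketch for one comparison (0.5 vs 1 mg/day); all variable names are illustrative and not part of the original analysis.

```r
# Local copy of the built-in dataset (illustrative name).
tg <- datasets::ToothGrowth
x <- tg$len[tg$dose == 0.5]
y <- tg$len[tg$dose == 1]

# Squared standard error of each sample mean.
se2_x <- var(x) / length(x)
se2_y <- var(y) / length(y)

# Welch t statistic: difference in means over the unpooled standard error.
t_stat <- (mean(x) - mean(y)) / sqrt(se2_x + se2_y)

# Welch-Satterthwaite approximation to the degrees of freedom.
df_welch <- (se2_x + se2_y)^2 /
  (se2_x^2 / (length(x) - 1) + se2_y^2 / (length(y) - 1))

# Two-sided p-value from the t distribution.
p_manual <- 2 * pt(-abs(t_stat), df = df_welch)

# t.test() uses var.equal = FALSE by default, i.e. the Welch test.
builtin <- t.test(x, y)
all.equal(p_manual, builtin$p.value)
```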
message('Table 2: 95% confidence interval for mean teeth \n length for each dose amount and supplement')
conf_int95=list()
conf_int95$len_dose0.5=round(t.test(ToothGrowth$len[ToothGrowth$dose==0.5])$conf.int,2)
conf_int95$len_dose1=round(t.test(ToothGrowth$len[ToothGrowth$dose==1.0])$conf.int,2)
conf_int95$len_dose2=round(t.test(ToothGrowth$len[ToothGrowth$dose==2.0])$conf.int,2)
conf_int95$OJ=round(t.test(ToothGrowth$len[ToothGrowth$supp=='OJ'])$conf.int,2)
conf_int95$VC=round(t.test(ToothGrowth$len[ToothGrowth$supp=='VC'])$conf.int,2)
conf_int95=as.data.frame(conf_int95)
row.names(conf_int95)=c('Lower Limit','Upper Limit')
names(conf_int95)=c('0.5 mg/day','1.0 mg/day','2.0 mg/day','OJ','VC')
conf_int95
From the 95% confidence intervals for the means shown in Table 2 above, we see that the intervals for the three dose amounts do not overlap, i.e., as the dose increases, the length increases significantly. For the supplement types, however, the intervals overlap, which suggests that the length is not significantly different between the two supplement types overall. The effect of the supplement type nevertheless differs by dose, as shown in Table 3: it has a significant impact on teeth length at doses of 0.5 and 1.0 mg/day but not at 2.0 mg/day. At doses of 0.5 and 1.0 mg/day, teeth are significantly longer with orange juice than with ascorbic acid.
message('Table 3: 95% confidence interval for mean\n teeth length for each supplement and dose combination')
conf_int95=list()
conf_int95$OJ0.5=round(t.test(ToothGrowth$len[ToothGrowth$supp=="OJ" & ToothGrowth$dose==0.5])$conf.int,2)
conf_int95$OJ1=round(t.test(ToothGrowth$len[ToothGrowth$supp=="OJ" & ToothGrowth$dose==1])$conf.int,2)
conf_int95$OJ2=round(t.test(ToothGrowth$len[ToothGrowth$supp=="OJ" & ToothGrowth$dose==2])$conf.int,2)
conf_int95$VC0.5=round(t.test(ToothGrowth$len[ToothGrowth$supp=="VC" & ToothGrowth$dose==0.5])$conf.int,2)
conf_int95$VC1=round(t.test(ToothGrowth$len[ToothGrowth$supp=="VC" & ToothGrowth$dose==1])$conf.int,2)
conf_int95$VC2=round(t.test(ToothGrowth$len[ToothGrowth$supp=="VC" & ToothGrowth$dose==2])$conf.int,2)
conf_int95=as.data.frame(conf_int95)
row.names(conf_int95)=c('Lower Limit','Upper Limit')
names(conf_int95)=c('OJ and 0.5 mg/day','OJ and 1 mg/day','OJ and 2 mg/day','VC and 0.5 mg/day','VC and 1 mg/day','VC and 2 mg/day')
conf_int95
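Checking whether separate intervals overlap is only an informal comparison. A more direct check is the confidence interval for the difference in means, which `t.test` reports for two-sample tests. This is a sketch under the same Welch assumptions; `tg`, `diff_ci` and `ci_table` are hypothetical names introduced for illustration.

```r
# Local copy of the built-in dataset (illustrative name).
tg <- datasets::ToothGrowth

# 95% Welch confidence interval for mean(OJ) - mean(VC) in a data subset.
# With the formula interface, the difference is first factor level minus second.
diff_ci <- function(d) {
  tt <- t.test(len ~ supp, data = d)
  c(lower = tt$conf.int[1], upper = tt$conf.int[2])
}

ci_table <- rbind(
  "all doses"  = diff_ci(tg),
  "0.5 mg/day" = diff_ci(tg[tg$dose == 0.5, ]),
  "1 mg/day"   = diff_ci(tg[tg$dose == 1, ]),
  "2 mg/day"   = diff_ci(tg[tg$dose == 2, ])
)
round(ci_table, 2)
```

An interval that contains zero means the difference between the supplement types is not significant at the 5% level; an interval entirely above zero indicates longer teeth with orange juice.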
message("Table 4: P-values for Welch's Two Sample t-test\n comparing teeth length between dose amounts and between supplement types")
p_values=list()
p_values$dose0.5vs1=t.test(ToothGrowth$len[ToothGrowth$dose==0.5],ToothGrowth$len[ToothGrowth$dose==1])$p.value
p_values$dose0.5vs2=t.test(ToothGrowth$len[ToothGrowth$dose==0.5],ToothGrowth$len[ToothGrowth$dose==2])$p.value
p_values$dose1vs2=t.test(ToothGrowth$len[ToothGrowth$dose==1.0],ToothGrowth$len[ToothGrowth$dose==2])$p.value
p_values$VCvsOJ=t.test(ToothGrowth$len[ToothGrowth$supp=="VC"],ToothGrowth$len[ToothGrowth$supp=="OJ"])$p.value
p_values=as.data.frame(p_values)
row.names(p_values)=c('P-value')
names(p_values)=c('0.5 mg/day vs 1.0 mg/day','0.5 mg/day vs 2.0 mg/day','1.0 mg/day vs 2.0 mg/day','VC vs OJ')
p_values
message("Table 5: P-values for Welch's Two Sample t-test\n comparing teeth length between supplement types at each dose amount")
p_values=list()
p_values$OJ_VC_0.5 = t.test(ToothGrowth$len[ToothGrowth$supp=="OJ" & ToothGrowth$dose==0.5],ToothGrowth$len[ToothGrowth$supp=="VC" & ToothGrowth$dose==0.5])$p.value
p_values$OJ_VC_1 = t.test(ToothGrowth$len[ToothGrowth$supp=="OJ" & ToothGrowth$dose==1],ToothGrowth$len[ToothGrowth$supp=="VC" & ToothGrowth$dose==1])$p.value
p_values$OJ_VC_2 = t.test(ToothGrowth$len[ToothGrowth$supp=="OJ" & ToothGrowth$dose==2],ToothGrowth$len[ToothGrowth$supp=="VC" & ToothGrowth$dose==2])$p.value
p_values=as.data.frame(p_values)
row.names(p_values)=c('P-value')
names(p_values)=c('VC vs OJ with 0.5 mg/day','VC vs OJ with 1.0 mg/day','VC vs OJ with 2.0 mg/day')
p_values
Consistent with the confidence intervals in Table 2, the p-values for all pairwise dose comparisons are very small (less than 0.05; Table 4), so the differences in mean teeth length between dose amounts are significant. Together with the increasing lengths seen in Fig. 1 and Table 2, this shows that teeth length increases with dose. The p-value for the comparison of supplement types in Table 4 is greater than 0.05, so we cannot reject the null hypothesis. From Table 5, however, we observe that the supplement type has a significant impact on teeth length at doses of 0.5 mg/day and 1.0 mg/day but not at 2.0 mg/day.
In this analysis, basic statistical inference is used to investigate the impacts of different doses of vitamin C (0.5, 1.0, and 2.0 mg/day) and supplement types (orange juice and ascorbic acid) on teeth length. Welch's Two Sample t-test, which does not assume equal variances, is used for the hypothesis tests, because unequal variance is the safer assumption when in doubt. Moreover, we assume that the samples are independent. The basic conclusions from this analysis are:
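A two-sided p-value says nothing about the direction of an effect, so it helps to report the difference in means next to each p-value. The sketch below does this for the three pairwise dose comparisons; `tg`, `dose_pairs` and `dose_cmp` are illustrative names, not part of the original analysis.

```r
# Local copy of the built-in dataset (illustrative name).
tg <- datasets::ToothGrowth

# All pairs of dose amounts: (0.5, 1), (0.5, 2), (1, 2).
dose_pairs <- combn(c(0.5, 1, 2), 2, simplify = FALSE)

# For each pair, record the higher-minus-lower difference in mean length
# and the Welch two-sample p-value.
dose_cmp <- do.call(rbind, lapply(dose_pairs, function(pair) {
  low  <- tg$len[tg$dose == pair[1]]
  high <- tg$len[tg$dose == pair[2]]
  data.frame(
    comparison = paste(pair[1], "vs", pair[2], "mg/day"),
    mean_diff  = mean(high) - mean(low),
    p_value    = t.test(low, high)$p.value
  )
}))
dose_cmp
```

A positive `mean_diff` together with a p-value below 0.05 supports the statement that teeth length increases with dose.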
- The dose amount has a significant impact on teeth length. As the dose increases, the teeth length increases.
- Across all doses, the difference between the mean teeth lengths for the two supplement types is not significantly different from zero at the 95% confidence level.
- The supplement type has a significant impact on teeth length at doses of 0.5 and 1.0 mg/day. At these doses, teeth are significantly longer with orange juice than with ascorbic acid.
- With a dose of 2.0 mg/day, however, the mean teeth lengths for the two supplement types are not significantly different from each other. Because the p-value is above the 0.05 significance level, we do not reject the null hypothesis, which states that the difference of the true means is zero.