This tutorial covers the repeated measures analysis of variance (ANOVA), which has traditionally been used as the multi-group counterpart of the paired-samples t-test, that is, for comparing the same participants across three or more time points or conditions. Note that newer, arguably better methods are now in common use as well, namely linear mixed-effects models (these will be covered in the next tutorial). Accordingly, this tutorial will be rather brief.
The data comprise L2 English essays written over a two-year period by nine middle-school-aged Dutch children studying at an English/Dutch bilingual school in the Netherlands. Essays were collected three times a year (roughly every four months) over two academic years, giving six time points per child and 54 essays in total. Included in the dataset are holistic scores for each essay (“Score”) and mean length of T-unit (MLT) values. In this tutorial, we will explore the relationship between holistic scores and time spent studying English, with the alternative hypothesis that holistic essay scores will increase as a function of time. For further reference, see Kyle (2016).
mydata <- read.csv("data/RM_sample.csv", header = TRUE)
# First, we create a new variable that is the categorical version of Time
mydata$FTime <- factor(mydata$Time)
summary(mydata)
##  Participant             Time         Score           MLT          FTime
##  Length:54          Min.   :1.0   Min.   :1.00   Min.   : 6.895   1:9
##  Class :character   1st Qu.:2.0   1st Qu.:3.00   1st Qu.: 9.438   2:9
##  Mode  :character   Median :3.5   Median :4.00   Median :10.976   3:9
##                     Mean   :3.5   Mean   :4.13   Mean   :11.517   4:9
##                     3rd Qu.:5.0   3rd Qu.:5.00   3rd Qu.:12.906   5:9
##                     Max.   :6.0   Max.   :7.00   Max.   :18.889   6:9
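The summary suggests a balanced design: 54 rows, with nine observations at each of the six time points. A minimal sketch of how that could be checked is given here. It rebuilds the design with `expand.grid()` rather than reading the real file, and the participant labels (`P1` to `P9`) are hypothetical placeholders, not the actual IDs in the dataset.

```r
# Stand-in for the tutorial's design: 9 participants x 6 time points.
# The participant labels are hypothetical, not the real IDs in RM_sample.csv.
design <- expand.grid(Participant = paste0("P", 1:9), Time = 1:6)
design$FTime <- factor(design$Time)

# Cross-tabulate participants by time point. In a balanced repeated
# measures design every participant contributes exactly one row per time.
cell_counts <- table(design$Participant, design$FTime)
is_balanced <- all(cell_counts == 1)

print(nrow(design))
print(is_balanced)
```

With the tutorial data loaded, the same check would be `all(table(mydata$Participant, mydata$FTime) == 1)`.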
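First, we can look at the distribution of scores at each time point. Note that a boxplot displays the median and quartiles of each group rather than the mean.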
library(ggplot2)
# One box per time point, using the categorical version of Time on the x-axis
ggplot(data = mydata, aes(x = FTime, y = Score, group = Time)) +
  geom_boxplot()
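To see the means themselves, one option is base R's `aggregate()`. The sketch below uses a small toy data frame with made-up scores (three hypothetical participants, three time points) that only mirrors the column names of `mydata`; it is not the tutorial dataset.

```r
# Toy data with the same column names as mydata.
# All values are made up for illustration only.
toy <- data.frame(
  Participant = rep(c("A", "B", "C"), each = 3),
  Time        = rep(1:3, times = 3),
  Score       = c(2, 3, 4,  3, 4, 5,  4, 5, 6)
)
toy$FTime <- factor(toy$Time)

# Mean Score at each level of FTime; returns a data frame with
# one row per time point and columns FTime and Score.
time_means <- aggregate(Score ~ FTime, data = toy, FUN = mean)
print(time_means)
```

With the tutorial data, the equivalent call would be `aggregate(Score ~ FTime, data = mydata, FUN = mean)`.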