This tutorial covers the repeated measures analysis of variance (ANOVA) test, which has traditionally been used in place of the paired-samples t-test when the same participants are measured on more than two occasions (or under more than two conditions). Note that newer, arguably better methods are now also in use, namely linear mixed-effects models (these will be covered in the next tutorial). Accordingly, this tutorial will be rather brief.
The data comprise L2 English essays written over a two-year period by nine middle-school-aged Dutch children studying at an English/Dutch bilingual school in the Netherlands. Essays were collected three times a year (roughly every four months) over two academic years, giving six time points and 54 essays in total. The dataset includes a holistic score for each essay (“Score”) and mean length of T-unit (“MLT”) values. In this tutorial, we will explore the relationship between holistic scores and time spent studying English, with the alternative hypothesis that holistic essay scores will increase as a function of time. For further reference, see Kyle (2016).
mydata <- read.csv("data/RM_sample.csv", header = TRUE)
# Create a new variable that is the categorical (factor) version of Time
mydata$FTime <- factor(mydata$Time)
summary(mydata)
##  Participant             Time         Score           MLT          FTime
##  Length:54          Min.   :1.0   Min.   :1.00   Min.   : 6.895   1:9
##  Class :character   1st Qu.:2.0   1st Qu.:3.00   1st Qu.: 9.438   2:9
##  Mode  :character   Median :3.5   Median :4.00   Median :10.976   3:9
##                     Mean   :3.5   Mean   :4.13   Mean   :11.517   4:9
##                     3rd Qu.:5.0   3rd Qu.:5.00   3rd Qu.:12.906   5:9
##                     Max.   :6.0   Max.   :7.00   Max.   :18.889   6:9
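The summary shows nine observations at each level of FTime, which is consistent with nine participants each contributing one essay per time point. One way to confirm this directly is to cross-tabulate participants by time point. The sketch below does so on a small data frame, `toy`, whose values are invented purely for illustration (it is not the tutorial data); only the column names mirror those in `mydata`.

```r
# Toy data with invented values (NOT the tutorial data):
# 3 participants, each measured at 2 time points
toy <- data.frame(
  Participant = rep(c("P1", "P2", "P3"), each = 2),
  Time = rep(1:2, times = 3),
  Score = c(3, 4, 2, 4, 5, 5)
)

# Cross-tabulate participants by time point; each cell counts essays
cell_counts <- table(toy$Participant, toy$Time)
print(cell_counts)

# TRUE when every participant has exactly one observation per time point
is_balanced <- all(cell_counts == 1)
print(is_balanced)
```

With the tutorial data, the equivalent call would be `table(mydata$Participant, mydata$Time)`.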
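Next, we can look at the distribution of scores at each time point using boxplots (note that boxplots display medians and quartiles rather than means).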
library(ggplot2)
ggplot(data = mydata, aes(x = FTime, y = Score, group = Time)) +
  geom_boxplot()
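Because a boxplot displays medians and quartiles, it can also help to compute the mean score at each time point directly. The following is a minimal sketch using base R's `aggregate()`; the `toy` data frame and its values are invented for illustration and are not the tutorial data.

```r
# Toy data with invented values (NOT the tutorial data):
# 3 time points with 2 scores each
toy <- data.frame(
  FTime = factor(rep(1:3, each = 2)),
  Score = c(2, 4, 3, 5, 5, 7)
)

# Mean Score within each level of FTime
time_means <- aggregate(Score ~ FTime, data = toy, FUN = mean)
print(time_means)
```

Applied to the tutorial data, the same call would be `aggregate(Score ~ FTime, data = mydata, FUN = mean)`.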