This tutorial will cover the repeated measures analysis of variance (ANOVA) test, which has traditionally been used as the multivariate (or multi-group) alternative to the paired samples t-test. Note that newer, arguably better methods are currently being used as well, namely linear mixed-effects models (these will be covered in the next tutorial). Accordingly, this tutorial will be rather brief.
This data comprises L2 English essays written over a two year period by nine middle-school aged Dutch children studying at an English/Dutch bilingual school in the Netherlands. Essays were collected three times a year (roughly every four months) over two academic years. Included in the dataset are holistic scores for each essay (“Score”) and mean length of T-unit (MLT) values. In this tutorial, we will explore the relationship between holistic scores and time spent studying English, with the alternative hypothesis that holistic essay scores will increase as a function of time. For further reference, see Kyle (2016).
mydata <- read.csv("data/RM_sample.csv", header = TRUE)
#First, we create a new variable that is the categorical version of Time
mydata$FTime <- factor(mydata$Time)
summary(mydata)
## Participant Time Score MLT FTime
## Length:54 Min. :1.0 Min. :1.00 Min. : 6.895 1:9
## Class :character 1st Qu.:2.0 1st Qu.:3.00 1st Qu.: 9.438 2:9
## Mode :character Median :3.5 Median :4.00 Median :10.976 3:9
## Mean :3.5 Mean :4.13 Mean :11.517 4:9
## 3rd Qu.:5.0 3rd Qu.:5.00 3rd Qu.:12.906 5:9
## Max. :6.0 Max. :7.00 Max. :18.889 6:9
First, we can look at the means at each time point.
library(ggplot2)
library(viridis) #color-friendly palettes
g1 <- ggplot(data = mydata, aes(x = FTime, y = Score, group = Time)) +
geom_boxplot()
#print(g1)
Then, we can get a clearer view by looking at individual trajectories.
g2 <- ggplot(data = mydata, aes(x = FTime, y = Score, group = Participant)) +
geom_boxplot(aes(group=Time)) +
geom_line(aes(color=Participant)) +
geom_point(aes(color=Participant)) +
scale_color_viridis(discrete = TRUE)
#print(g2)
Sometimes, we get a clearer view of individual trajectories when we use facet wrap:
# customizing colors
colors <- c("orange", "#56B4E9", "#009E73", "gold2", "#0072B2", "purple", "#CC79A7", "#999999", "#000000")
g3 <- ggplot(data = mydata, aes(x = FTime, y = Score, group = Participant)) +
geom_line(aes(color=Participant)) +
geom_point(aes(color=Participant)) +
scale_color_manual(values = colors) +
facet_wrap(~Participant)
#print(g3)
Below, we conduct the repeated measures ANOVA. Note that this is almost the same as an independent one-way ANOVA, but there is one addition. We account for that fact that each participant wrote multiple essays in our sample by adding an “Error” term (in this case, “Participant”). As we can see, there are significant differences between at least two of our groups (p = 7.2e-09).
rm_anova <- aov(Score~FTime + Error(Participant), data=mydata)
summary(rm_anova)
##
## Error: Participant
## Df Sum Sq Mean Sq F value Pr(>F)
## Residuals 8 2.593 0.3241
##
## Error: Within
## Df Sum Sq Mean Sq F value Pr(>F)
## FTime 5 50.98 10.196 16.63 7.2e-09 ***
## Residuals 40 24.52 0.613
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
For this analysis, we will cheat just a bit with the effect size, and use R^2 (which, as we recall is the same as eta squared). As we see below, our R^2 value (and our eta-squared value) indicates a large (R^2 = .581) effect. There may be more appropriate effect sizes to run here, but RM ANOVAs are one of the few analyses that are difficult to run in R. The good news is that there is a reason for this, namely that linear mixed effects models are superior in most cases to RM ANOVAs, and are extensively supported in R.
cor(mydata$Time, mydata$Score)^2
## [1] 0.5805278
We will have to do a fair bit of work here to get all of our pairwise comparisons. Below is the pairwise contrast between Time 1 and 2. We see that there is a significant difference (p = 0.012) in Score between Time 1 and 2 and that there is a large effect (d = 1.8). Note that if we do multiple pairwise comparisons, we will need to adjust our alpha for the number of tests conducted. Because we are going to look at a better method (linear mixed effects models) soon, we will skip the extra work here.
library(dplyr)
library(psych)
#create new df with only Time 1 and 2
Time_12 <- mydata %>%
filter(FTime == "1" | FTime == "2") %>%
select(FTime, Score, Participant)
summary(Time_12)
## FTime Score Participant
## 1:9 Min. :1.000 Length:18
## 2:9 1st Qu.:2.250 Class :character
## 3:0 Median :3.000 Mode :character
## 4:0 Mean :3.167
## 5:0 3rd Qu.:4.000
## 6:0 Max. :5.000
# Conduct paired samples t-test:
t.test(Score~FTime, paired = TRUE, data = Time_12)
##
## Paired t-test
##
## data: Score by FTime
## t = -3.25, df = 8, p-value = 0.0117
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -2.4693352 -0.4195537
## sample estimates:
## mean of the differences
## -1.444444
# Get effect size
cohen.d(Time_12, "FTime")
## Call: cohen.d(x = Time_12, group = "FTime")
## Cohen d statistic of difference between two means
## lower effect upper
## Score 0.53 1.84 3.09
## Participant -0.92 0.00 0.92
##
## Multivariate (Mahalanobis) distance between groups
## [1] 1.9
## r equivalent of difference between two means
## Score Participant
## 0.68 0.00
We will now use some sample data to conduct a two way repeated measures ANOVA. In the sample data, we have two groups (A and B) and vocabulary test scores that represent pre, post, and delayed post tests.
mydata2 <- read.csv("data/factorial_rm_anova_sample.csv", header = TRUE)
#First, we create a new variable that is the categorical version of Time
mydata2$Participant_ID <- factor(mydata2$Participant_ID)
# then, we will order our tests (pre, post, delayed-post)
mydata2$Time = factor(mydata2$Time, levels = c("Pre","Post","D_Post"))
summary(mydata2)
## Participant_ID Group Time V_Test
## 1 : 3 Length:900 Pre :300 Min. : 6.181
## 2 : 3 Class :character Post :300 1st Qu.:16.382
## 3 : 3 Mode :character D_Post:300 Median :20.502
## 4 : 3 Mean :20.261
## 5 : 3 3rd Qu.:24.177
## 6 : 3 Max. :32.204
## (Other):882
First, we can create boxplots.
library("ggpubr")
g4 <- ggboxplot(mydata2, x = "Time", y = "V_Test", color = "Group",
palette = c("#377eb8", "orange"))
#print(g4)
We can also visualize the data per participant:
library("ggplot2")
g5 <- ggplot(mydata2, aes(x=Time, y=V_Test)) +
geom_boxplot() +
geom_point(aes(color = Participant_ID), show.legend = FALSE) +
geom_line(aes(group = Participant_ID, color = Participant_ID), show.legend = FALSE) +
scale_color_viridis(discrete = TRUE) +
facet_wrap(~Group)
#print(g5)
<img src=“plot5.png” alt=A line plot with boxplot showing three time points of Pre, Post, and D_Post. Two treatment groups are shown: Treatment_A on the leftand Treatment_B on the right. Each line represents an individual progression over time, with points indicating their scores at each time point. The boxplots overlay the lines, summarizing the distribution of scores with medians, interquartile ranges, and outliers.”>
Finally, we will run the analysis (and interpret our results).
library(car)
rmf_anova <- aov(V_Test ~ Time * Group + Error(Participant_ID/(Time * Group)), data = mydata2)
summary(rmf_anova)
##
## Error: Participant_ID
## Df Sum Sq Mean Sq F value Pr(>F)
## Group 1 1493 1493.4 50.89 7.45e-12 ***
## Residuals 298 8745 29.3
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Error: Participant_ID:Time
## Df Sum Sq Mean Sq F value Pr(>F)
## Time 2 12352 6176 54922 <2e-16 ***
## Time:Group 2 1184 592 5264 <2e-16 ***
## Residuals 596 67 0
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#Anova(rmf_anova, type = "III")
So, as we can see, there is a significant effect for participants within each group (they seem to behave differently), a significant main effect for Time, and a significant interaction between Time and Group. In other words, there are significant differences between the tests, and that there are differences between the groups (somewhere).
Because the general consensus has been to use linear mixed effects models, we are not going to go into the post-hoc analyses here. We could do it (e.g., with dplyr and by calculating our own p-value corrections), but it would be fairly burdensome.
See the following tutorials for more information about completing a two-way repeated measures ANOVA: https://datascienceplus.com/two-way-anova-with-repeated-measures/ https://sapa-project.org/blog/2013/06/28/repeated-measures-anova-in-r/