Back to Homepage

Repeated Measures ANOVA

This tutorial will cover the repeated measures analysis of variance (ANOVA) test, which has traditionally been used as the multivariate (or multi-group) alternative to the paired samples t-test. Note that newer, arguably better methods are currently being used as well, namely linear mixed-effects models (these will be covered in the next tutorial). Accordingly, this tutorial will be rather brief.

Description of sample data

This data comprises L2 English essays written over a two year period by nine middle-school aged Dutch children studying at an English/Dutch bilingual school in the Netherlands. Essays were collected three times a year (roughly every four months) over two academic years. Included in the dataset are holistic scores for each essay (“Score”) and mean length of T-unit (MLT) values. In this tutorial, we will explore the relationship between holistic scores and time spent studying English, with the alternative hypothesis that holistic essay scores will increase as a function of time. For further reference, see Kyle (2016).

mydata <- read.csv("data/RM_sample.csv", header = TRUE)
#First, we create a new variable that is the categorical version of Time
mydata$FTime <- factor(mydata$Time)
summary(mydata)
##  Participant             Time         Score           MLT         FTime
##  Length:54          Min.   :1.0   Min.   :1.00   Min.   : 6.895   1:9  
##  Class :character   1st Qu.:2.0   1st Qu.:3.00   1st Qu.: 9.438   2:9  
##  Mode  :character   Median :3.5   Median :4.00   Median :10.976   3:9  
##                     Mean   :3.5   Mean   :4.13   Mean   :11.517   4:9  
##                     3rd Qu.:5.0   3rd Qu.:5.00   3rd Qu.:12.906   5:9  
##                     Max.   :6.0   Max.   :7.00   Max.   :18.889   6:9

Visualizing the data

First, we can look at the means at each time point.

library(ggplot2)
library(viridis) #color-friendly palettes

g1 <- ggplot(data = mydata, aes(x = FTime, y = Score, group = Time)) +
  geom_boxplot()

#print(g1)

Boxplot showing score distribution across six time points. Each box represents the interquartile range, which contains the horizontal line within each box marks the median score. The whiskers extend to the furthest points that are not considered outliers. Potential outliers are represented as individual black dots outside the whiskers.

Then, we can get a clearer view by looking at individual trajectories.

g2 <- ggplot(data = mydata, aes(x = FTime, y = Score, group = Participant)) +
  geom_boxplot(aes(group=Time)) +
  geom_line(aes(color=Participant)) +
  geom_point(aes(color=Participant)) +
  scale_color_viridis(discrete = TRUE)

#print(g2)

A multi-colored line and boxplot chart representing scores over six time points for nine participant labeled EFL_1 to EFL_9. Each participant's scores are connected by lines, with distinct colors. The boxplot overlaid on the line chart shows the distribution of scores at each time point, with the median score indicated by a horizontal line within each box. The whiskers extend to the furthest points that are not considered outliers. Potential outliers are represented as individual dots outside the whiskers.

Sometimes, we get a clearer view of individual trajectories when we use facet wrap:

# customizing colors
colors <- c("orange", "#56B4E9", "#009E73", "gold2", "#0072B2", "purple", "#CC79A7", "#999999", "#000000") 

g3 <- ggplot(data = mydata, aes(x = FTime, y = Score, group = Participant)) +
  geom_line(aes(color=Participant)) +
  geom_point(aes(color=Participant)) +
  scale_color_manual(values = colors) +    
  facet_wrap(~Participant)

#print(g3)

A faceted line chart displaying individual score trajectories for nine participants across six time points. The chart is divided into nine panels, one for each participant, showing the progression of scores from time point 1 to 6. The trend in the scores is mostly upwards as time progresses.

Conducting repeated measures ANOVA

Below, we conduct the repeated measures ANOVA. Note that this is almost the same as an independent one-way ANOVA, but there is one addition. We account for that fact that each participant wrote multiple essays in our sample by adding an “Error” term (in this case, “Participant”). As we can see, there are significant differences between at least two of our groups (p = 7.2e-09).

rm_anova <- aov(Score~FTime + Error(Participant), data=mydata)
summary(rm_anova)
## 
## Error: Participant
##           Df Sum Sq Mean Sq F value Pr(>F)
## Residuals  8  2.593  0.3241               
## 
## Error: Within
##           Df Sum Sq Mean Sq F value  Pr(>F)    
## FTime      5  50.98  10.196   16.63 7.2e-09 ***
## Residuals 40  24.52   0.613                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

For this analysis, we will cheat just a bit with the effect size, and use R^2 (which, as we recall is the same as eta squared). As we see below, our R^2 value (and our eta-squared value) indicates a large (R^2 = .581) effect. There may be more appropriate effect sizes to run here, but RM ANOVAs are one of the few analyses that are difficult to run in R. The good news is that there is a reason for this, namely that linear mixed effects models are superior in most cases to RM ANOVAs, and are extensively supported in R.

cor(mydata$Time, mydata$Score)^2
## [1] 0.5805278

Post-hoc analyses

We will have to do a fair bit of work here to get all of our pairwise comparisons. Below is the pairwise contrast between Time 1 and 2. We see that there is a significant difference (p = 0.012) in Score between Time 1 and 2 and that there is a large effect (d = 1.8). Note that if we do multiple pairwise comparisons, we will need to adjust our alpha for the number of tests conducted. Because we are going to look at a better method (linear mixed effects models) soon, we will skip the extra work here.

library(dplyr)
library(psych)
#create new df with only Time 1 and 2
Time_12 <- mydata %>%
  filter(FTime == "1" | FTime == "2") %>%
  select(FTime, Score, Participant)
summary(Time_12)
##  FTime     Score       Participant       
##  1:9   Min.   :1.000   Length:18         
##  2:9   1st Qu.:2.250   Class :character  
##  3:0   Median :3.000   Mode  :character  
##  4:0   Mean   :3.167                     
##  5:0   3rd Qu.:4.000                     
##  6:0   Max.   :5.000
# Conduct paired samples t-test:
t.test(Score~FTime, paired = TRUE, data = Time_12)
## 
##  Paired t-test
## 
## data:  Score by FTime
## t = -3.25, df = 8, p-value = 0.0117
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -2.4693352 -0.4195537
## sample estimates:
## mean of the differences 
##               -1.444444
# Get effect size
cohen.d(Time_12, "FTime")
## Call: cohen.d(x = Time_12, group = "FTime")
## Cohen d statistic of difference between two means
##             lower effect upper
## Score        0.53   1.84  3.09
## Participant -0.92   0.00  0.92
## 
## Multivariate (Mahalanobis) distance between groups
## [1] 1.9
## r equivalent of difference between two means
##       Score Participant 
##        0.68        0.00

Two-Way RM ANOVA

We will now use some sample data to conduct a two way repeated measures ANOVA. In the sample data, we have two groups (A and B) and vocabulary test scores that represent pre, post, and delayed post tests.

mydata2 <- read.csv("data/factorial_rm_anova_sample.csv", header = TRUE)
#First, we create a new variable that is the categorical version of Time
mydata2$Participant_ID <- factor(mydata2$Participant_ID)
# then, we will order our tests (pre, post, delayed-post)
mydata2$Time = factor(mydata2$Time, levels = c("Pre","Post","D_Post"))
summary(mydata2)
##  Participant_ID    Group               Time         V_Test      
##  1      :  3    Length:900         Pre   :300   Min.   : 6.181  
##  2      :  3    Class :character   Post  :300   1st Qu.:16.382  
##  3      :  3    Mode  :character   D_Post:300   Median :20.502  
##  4      :  3                                    Mean   :20.261  
##  5      :  3                                    3rd Qu.:24.177  
##  6      :  3                                    Max.   :32.204  
##  (Other):882

Visualizing the data

First, we can create boxplots.

library("ggpubr")

g4 <- ggboxplot(mydata2, x = "Time", y = "V_Test", color = "Group",
          palette = c("#377eb8", "orange"))

#print(g4)

Boxplot visualization representing three time points of Pre, Post, and D_Post. Two groups are depicted, Treatment_A in on left and Treatment_B on right. Each time point displays a pair of boxplots corresponding to the two treatments. The boxplots illustrate the median, interquartile range, and any outliers for the V_Test. Outliers are depicted as points beyond the whiskers.

We can also visualize the data per participant:

library("ggplot2")

g5 <- ggplot(mydata2, aes(x=Time, y=V_Test)) +
  geom_boxplot() +
  geom_point(aes(color = Participant_ID), show.legend = FALSE) +
  geom_line(aes(group = Participant_ID, color = Participant_ID), show.legend = FALSE) +
  scale_color_viridis(discrete = TRUE) +
  facet_wrap(~Group)

#print(g5)

<img src=“plot5.png” alt=A line plot with boxplot showing three time points of Pre, Post, and D_Post. Two treatment groups are shown: Treatment_A on the leftand Treatment_B on the right. Each line represents an individual progression over time, with points indicating their scores at each time point. The boxplots overlay the lines, summarizing the distribution of scores with medians, interquartile ranges, and outliers.”>

Finally, we will run the analysis (and interpret our results).

library(car)
rmf_anova <- aov(V_Test ~ Time * Group + Error(Participant_ID/(Time * Group)), data = mydata2)
summary(rmf_anova)
## 
## Error: Participant_ID
##            Df Sum Sq Mean Sq F value   Pr(>F)    
## Group       1   1493  1493.4   50.89 7.45e-12 ***
## Residuals 298   8745    29.3                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Error: Participant_ID:Time
##             Df Sum Sq Mean Sq F value Pr(>F)    
## Time         2  12352    6176   54922 <2e-16 ***
## Time:Group   2   1184     592    5264 <2e-16 ***
## Residuals  596     67       0                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#Anova(rmf_anova, type = "III")

So, as we can see, there is a significant effect for participants within each group (they seem to behave differently), a significant main effect for Time, and a significant interaction between Time and Group. In other words, there are significant differences between the tests, and that there are differences between the groups (somewhere).

For later…

Because the general consensus has been to use linear mixed effects models, we are not going to go into the post-hoc analyses here. We could do it (e.g., with dplyr and by calculating our own p-value corrections), but it would be fairly burdensome.

See the following tutorials for more information about completing a two-way repeated measures ANOVA: https://datascienceplus.com/two-way-anova-with-repeated-measures/ https://sapa-project.org/blog/2013/06/28/repeated-measures-anova-in-r/