Linear mixed effects models are mathematically and conceptually related to a linear regression (and accordingly to an ANOVA). The biggest difference between and LME and a linear regression is that an LME can adjust the line of best fit based on trajectories of particular individuals (or groups). In this tutorial, we will use linear mixed-effects models to examine the relationship between time spent learning English as an L2 and writing development (measured via an index of syntactic complexity).
We will first treat Time as a categorical variable to see the degree to which changes in syntactic complexity occur between data collection points. This analysis is related to a repeated measures ANOVA both conceptually and mathematically. However, an LME is arguably more precise than an ANOVA because it takes into account the individual trajectories of each participant.
We will then treat Time as a continuous variable to see the degree to which changes in syntactic complexity are linear (i.e., follow a reasonably consistent trajectory).
This data comprises L2 English essays written over a two year period by nine middle-school aged Dutch children studying at an English/Dutch bilingual school in the Netherlands. Essays were collected three times a year (every four months) over two academic years. Included in the dataset are holistic scores for each essay (“Score”) and mean length of T-unit (MLT) values. In this tutorial, we will explore the relationship between MLT and time spent studying English, with the alternative hypothesis that MLT scores will increase as a function of time. For further reference, see Kyle, Crossley, & Verspoor (2021) and Kyle (2016). In other words, we will be attempting to determine whether (and the degree to which) Dutch EFL middle school students write more words per T-unit (a T-unit is an independent clause and all connected depedent clauses) as a function of the time they spend studying English.
mydata <- read.csv("data/RM_sample.csv", header = TRUE) #First, we create a new variable that is the categorical version of Time mydata$FTime <- factor(mydata$Time) summary(mydata)
## Participant Time Score MLT FTime ## Length:54 Min. :1.0 Min. :1.00 Min. : 6.895 1:9 ## Cl