Description of sample data

The data comprise L2 English essays written over a two-year period by nine middle-school-aged Dutch children studying at an English/Dutch bilingual school in the Netherlands. Essays were collected three times a year (roughly every four months) over two academic years, giving six essays per participant. The dataset includes a holistic score for each essay (“Score”) and a mean length of T-unit (MLT) value, where a T-unit is an independent clause and all dependent clauses connected to it. For further reference, see Kyle (2016). In this tutorial, we will explore the relationship between MLT and time spent studying English, with the alternative hypothesis that MLT will increase as a function of time. In other words, we will attempt to determine whether (and to what degree) Dutch EFL middle-school students write more words per T-unit the longer they study English.
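To make the MLT measure concrete, here is a minimal sketch of the calculation in R. The three T-units are hypothetical examples segmented by hand, not essays from the dataset, and the whitespace-based word count is a simplifying assumption rather than the procedure used to produce the MLT column.

```r
# Hypothetical T-units, already segmented by hand (NOT taken from the dataset).
# Each one is an independent clause plus any dependent clauses attached to it.
t_units <- c(
  "The dog barked because it was hungry",
  "I went home",
  "She reads books that her teacher recommends"
)

# Count whitespace-separated words in each T-unit (a simplifying assumption).
words_per_unit <- lengths(strsplit(trimws(t_units), "\\s+"))

# MLT = total number of words / number of T-units.
mlt <- sum(words_per_unit) / length(t_units)
mlt
```

A higher MLT means that, on average, each T-unit contains more words, which is why it is often treated as a rough index of syntactic complexity.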

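# Load the sample data; stringsAsFactors = TRUE keeps Participant as a factor in R >= 4.0, as in the summary output
mydata <- read.csv("data/RM_sample.csv", header = TRUE, stringsAsFactors = TRUE)
# First, we create a new variable that is the categorical (factor) version of Time
mydata$FTime <- factor(mydata$Time)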
summary(mydata)
##   Participant      Time         Score           MLT         FTime
##  EFL_1  : 6   Min.   :1.0   Min.   :1.00   Min.   : 6.895   1:9  
##  EFL_2  : 6   1st Qu.:2.0   1st Qu.:3.00   1st Qu.: 9.438   2:9  
##  EFL_3  : 6   Median :3.5   Median :4.00   Median :10.976   3:9  
##  EFL_4  : 6   Mean   :3.5   Mean   :4.13   Mean   :11.517   4:9  
##  EFL_5  : 6   3rd Qu.:5.0   3rd Qu.:5.00   3rd Qu.:12.906   5:9  
##  EFL_6  : 6   Max.   :6.0   Max.   :7.00   Max.   :18.889   6:9  
##  (Other):18
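Before modeling, it can help to look at MLT separately at each collection point. The sketch below shows one way to do this with base R. The `toy` data frame is made up for illustration and only borrows the column names (`Participant`, `Time`, `MLT`, `FTime`) from the data above; its values are not from the study.

```r
# Hypothetical toy data using the tutorial's column names (values are made up).
toy <- data.frame(
  Participant = rep(c("P1", "P2", "P3"), times = 2),
  Time        = rep(1:2, each = 3),
  MLT         = c(8, 10, 12, 9, 12, 15)
)
toy$FTime <- factor(toy$Time)

# Mean and standard deviation of MLT at each time point.
mlt_mean <- tapply(toy$MLT, toy$FTime, mean)
mlt_sd   <- tapply(toy$MLT, toy$FTime, sd)

# Collect the results in one table for easier reading.
mlt_by_time <- data.frame(
  Time = names(mlt_mean),
  Mean = as.vector(mlt_mean),
  SD   = as.vector(mlt_sd)
)
mlt_by_time
```

To apply the same steps to the sample data, replace `toy` with `mydata` once the CSV file has been read in.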