For this exercise you will need the lme4 and ggplot2 packages.

library(lme4)
library(ggplot2)

The sleepstudy data

We will use the sleepstudy dataset, which ships with the lme4 package. The data come from an experiment that examined the reaction times of sleep-deprived individuals over ten days (participants were allowed only 3 hours of sleep per night). You can get an overview of the data using the function str(), look at the first few rows of the data frame with head(), and plot it with ggplot().

str(sleepstudy)
## 'data.frame':    180 obs. of  3 variables:
##  $ Reaction: num  250 259 251 321 357 ...
##  $ Days    : num  0 1 2 3 4 5 6 7 8 9 ...
##  $ Subject : Factor w/ 18 levels "308","309","310",..: 1 1 1 1 1 1 1 1 1 1 ...
head(sleepstudy,20)
##    Reaction Days Subject
## 1  249.5600    0     308
## 2  258.7047    1     308
## 3  250.8006    2     308
## 4  321.4398    3     308
## 5  356.8519    4     308
## 6  414.6901    5     308
## 7  382.2038    6     308
## 8  290.1486    7     308
## 9  430.5853    8     308
## 10 466.3535    9     308
## 11 222.7339    0     309
## 12 205.2658    1     309
## 13 202.9778    2     309
## 14 204.7070    3     309
## 15 207.7161    4     309
## 16 215.9618    5     309
## 17 213.6303    6     309
## 18 217.7272    7     309
## 19 224.2957    8     309
## 20 237.3142    9     309
ggplot(sleepstudy, aes(Days, Reaction, group=Subject, colour=Subject)) +
  geom_point() + 
  facet_wrap(~Subject, ncol=9) + 
  scale_x_continuous(limits=c(0, 10),breaks=c(0,10)) +
  theme_minimal()

From the look of it, people showed slower reaction times as they became increasingly sleep deprived. But how should we quantify the effect of sleep deprivation, and possible individual differences in that effect?

The relation between reaction times and days looks like it could be captured by a regression model. We will try out different models sequentially to see how different assumptions capture the link between sleep deprivation and reaction times, as well as individual differences in that link.

A bunch of models…

The first model below assumes that people are all the same and that there is no relation between Days and Reaction. Take a moment to look at the equation below and note that this model includes only an intercept (the 1 in the formula). Consequently, in the figure below, all participants are modeled with the same single value: the overall average reaction time.
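
With i indexing participants and j observations, the model is simply:

$$\text{Reaction}_{ij} = \beta_0 + e_{ij}$$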

model1 <- lm(Reaction ~ 1, sleepstudy)
summary(model1)
## 
## Call:
## lm(formula = Reaction ~ 1, data = sleepstudy)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -104.176  -43.132   -9.857   38.244  167.846 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  298.508      4.198    71.1   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 56.33 on 179 degrees of freedom
datafit <- fitted(model1)
ggplot(sleepstudy, aes(Days, Reaction, group=Subject, colour=Subject)) + 
  geom_point() + 
  geom_line(aes(y=datafit), linetype=2) + 
  facet_wrap(~Subject, ncol=9) + 
  scale_x_continuous(limits=c(0, 10),breaks=c(0,10)) +
  theme_minimal()

The second model below assumes that people differ in their initial reaction times but still, like the previous model, that there is no relation between Days and Reaction Times.
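
In equation form, each participant i now gets their own intercept deviation $u_{0i}$ around the grand mean:

$$\text{Reaction}_{ij} = \beta_0 + u_{0i} + e_{ij}, \qquad u_{0i} \sim \mathcal{N}(0, \tau_0^2)$$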

model2 <- lmer(Reaction ~ 1 + (1 | Subject), sleepstudy)
summary(model2)
## Linear mixed model fit by REML ['lmerMod']
## Formula: Reaction ~ 1 + (1 | Subject)
##    Data: sleepstudy
## 
## REML criterion at convergence: 1904.3
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.4983 -0.5501 -0.1476  0.5123  3.3446 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Subject  (Intercept) 1278     35.75   
##  Residual             1959     44.26   
## Number of obs: 180, groups:  Subject, 18
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)   298.51       9.05   32.98

One thing you see in the model summary above (sometimes called an unconditional means model) is that the between-person variance, 1278, is about 39% of the total variance (i.e., the sum of the between- and within-person variances, 1278 + 1959). This ratio is called the intraclass correlation coefficient (ICC), and here it suggests that a substantial portion of the variance is attributable to differences between participants.
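
The ICC can be computed directly from the model's variance components; a quick sketch using lme4's VarCorr():

vc <- as.data.frame(VarCorr(model2))
vc$vcov[1] / sum(vc$vcov)  # between-person / total variance, about 0.39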

[NOTE: The ICC can help you determine whether a mixed model is warranted, as well as the nature of your construct. If the ICC is close to 0, observations within clusters are no more similar than observations from different clusters, so you could perhaps ignore the clustering altogether. In contrast, an ICC close to 1 suggests that the construct varies between people but not within a person, so you can think of it as a trait (e.g., a personality characteristic).]

Note that, in the figure below, each participant now gets his/her own intercept (but still no effect of Days).
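
To produce that figure we can reuse the plotting code from model 1, now with model2's fitted values:

datafit <- fitted(model2)
ggplot(sleepstudy, aes(Days, Reaction, group=Subject, colour=Subject)) + 
  geom_point() + 
  geom_line(aes(y=datafit), linetype=2) + 
  facet_wrap(~Subject, ncol=9) + 
  scale_x_continuous(limits=c(0, 10), breaks=c(0,10)) +
  theme_minimal()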

An obvious extension of the previous models is to consider the effect of Days. Please note that the model below estimates an overall effect of Days that does NOT differ between participants…
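
In equation form, the slope $\beta_1$ is shared by all participants:

$$\text{Reaction}_{ij} = \beta_0 + u_{0i} + \beta_1\,\text{Days}_{ij} + e_{ij}$$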

model3 <- lmer(Reaction ~ 1 + Days + (1 | Subject), sleepstudy)
summary(model3)
## Linear mixed model fit by REML ['lmerMod']
## Formula: Reaction ~ 1 + Days + (1 | Subject)
##    Data: sleepstudy
## 
## REML criterion at convergence: 1786.5
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.2257 -0.5529  0.0109  0.5188  4.2506 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Subject  (Intercept) 1378.2   37.12   
##  Residual              960.5   30.99   
## Number of obs: 180, groups:  Subject, 18
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept) 251.4051     9.7467   25.79
## Days         10.4673     0.8042   13.02
## 
## Correlation of Fixed Effects:
##      (Intr)
## Days -0.371
datafit <- fitted(model3)
ggplot(sleepstudy, aes(Days, Reaction, group=Subject, colour=Subject)) + 
  geom_point() + 
  geom_line(aes(y=datafit), linetype=2) + 
  facet_wrap(~Subject, ncol=9) + 
  scale_x_continuous(limits=c(0, 10),breaks=c(0,10)) +
  theme_minimal()

Naturally, we may expect the effect of Days on reaction times to differ between participants (we get that feeling from the raw data; check out participant 335). Model 4 below therefore includes the effect of Days as a fixed effect (the average effect across participants) but also asks the model to estimate differences between participants (Days is introduced as a fixed AND as a random effect).
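
In equation form, each participant now also gets their own slope deviation $u_{1i}$:

$$\text{Reaction}_{ij} = (\beta_0 + u_{0i}) + (\beta_1 + u_{1i})\,\text{Days}_{ij} + e_{ij}$$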

model4 <- lmer(Reaction ~ 1 + Days + (1 + Days | Subject), sleepstudy)
summary(model4)
## Linear mixed model fit by REML ['lmerMod']
## Formula: Reaction ~ 1 + Days + (1 + Days | Subject)
##    Data: sleepstudy
## 
## REML criterion at convergence: 1743.6
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.9536 -0.4634  0.0231  0.4634  5.1793 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr
##  Subject  (Intercept) 612.10   24.741       
##           Days         35.07    5.922   0.07
##  Residual             654.94   25.592       
## Number of obs: 180, groups:  Subject, 18
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  251.405      6.825  36.838
## Days          10.467      1.546   6.771
## 
## Correlation of Fixed Effects:
##      (Intr)
## Days -0.138

It is helpful to see what the fixed and random effects coefficients look like for this model…

fixef(model4)
## (Intercept)        Days 
##   251.40510    10.46729
ranef(model4)
## $Subject
##     (Intercept)        Days
## 308   2.2585509   9.1989758
## 309 -40.3987381  -8.6196806
## 310 -38.9604090  -5.4488565
## 330  23.6906196  -4.8143503
## 331  22.2603126  -3.0699116
## 332   9.0395679  -0.2721770
## 333  16.8405086  -0.2236361
## 334  -7.2326151   1.0745816
## 335  -0.3336684 -10.7521652
## 337  34.8904868   8.6282652
## 349 -25.2102286   1.1734322
## 350 -13.0700342   6.6142178
## 351   4.5778642  -3.0152621
## 352  20.8636782   3.5360011
## 369   3.2754656   0.8722149
## 370 -25.6129993   4.8224850
## 371   0.8070461  -0.9881562
## 372  12.3145921   1.2840221
## 
## with conditional variances for "Subject"
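
Note that the values returned by ranef() are deviations from the fixed effects. lme4's coef() combines the two for you, returning the per-subject coefficients directly:

# per-subject coefficients = fixed effects + random-effect deviations
head(coef(model4)$Subject, 3)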

Comparing Mixed effects vs. No-pooling: Shrinkage

Does model 4 (random intercepts and slopes) simply provide the same estimates as running a separate regression for each person? Not quite… As you can see below, the regression lines fitted for each individual separately (solid lines) differ from those of model 4 (dashed lines). For example, for subject 335, the mixed-effects model is less willing to expect reaction times to drop across days: the upward patterns of the other subjects influence the model's fit to subject 335's data. This statistical phenomenon is called “shrinkage”.

datafit <- fitted(model4)  # update to model 4's fitted values before plotting
ggplot(sleepstudy, aes(Days, Reaction, group=Subject, colour=Subject)) + 
  geom_point() + 
  geom_line(aes(y=datafit), linetype=2) + 
  facet_wrap(~Subject, ncol=9) + 
  scale_x_continuous(limits=c(0, 10), breaks=c(0,10)) + 
  geom_smooth(method=lm, se=FALSE) +
  theme_minimal()
## `geom_smooth()` using formula 'y ~ x'

To further illustrate the principle of shrinkage (or borrowing strength), we can compare the estimates from a model that fits each participant separately (the solid lines in the figure above; model0 below) with those from the mixed-effects model (the dashed lines; model4 from above).

model0 <- lmList(Reaction ~ Days | Subject, sleepstudy)  # a separate lm per subject

coefs_m1 <- coef(model0)         # per-subject (no-pooling) estimates
coefs_m2 <- ranef(model4)[[1]]   # per-subject deviations from the fixed effects

coefs_m1
##     (Intercept)      Days
## 308    244.1927 21.764702
## 309    205.0549  2.261785
## 310    203.4842  6.114899
## 330    289.6851  3.008073
## 331    285.7390  5.266019
## 332    264.2516  9.566768
## 333    275.0191  9.142045
## 334    240.1629 12.253141
## 335    263.0347 -2.881034
## 337    290.1041 19.025974
## 349    215.1118 13.493933
## 350    225.8346 19.504017
## 351    261.1470  6.433498
## 352    276.3721 13.566549
## 369    254.9681 11.348109
## 370    210.4491 18.056151
## 371    253.6360  9.188445
## 372    267.0448 11.298073
coefs_m2
##     (Intercept)        Days
## 308   2.2585509   9.1989758
## 309 -40.3987381  -8.6196806
## 310 -38.9604090  -5.4488565
## 330  23.6906196  -4.8143503
## 331  22.2603126  -3.0699116
## 332   9.0395679  -0.2721770
## 333  16.8405086  -0.2236361
## 334  -7.2326151   1.0745816
## 335  -0.3336684 -10.7521652
## 337  34.8904868   8.6282652
## 349 -25.2102286   1.1734322
## 350 -13.0700342   6.6142178
## 351   4.5778642  -3.0152621
## 352  20.8636782   3.5360011
## 369   3.2754656   0.8722149
## 370 -25.6129993   4.8224850
## 371   0.8070461  -0.9881562
## 372  12.3145921   1.2840221

As you can see above, the coefficients from model0 do not have the same meaning as those from model4: the latter represent per-subject deviations from the fixed effects (the sample-level estimates). We therefore have to add the fixed effects to these coefficients before comparing the estimates from the two models…

# turn deviations into per-subject coefficients by adding the fixed effects
coefs_m2[,1] <- coefs_m2[,1] + fixef(model4)[1]
coefs_m2[,2] <- coefs_m2[,2] + fixef(model4)[2]

We can now plot these values to see how the estimates differ for each person. In the figure below, each pair of points connected by an arrow represents estimates for the same individual, and the arrowhead points towards the estimate from the mixed-effects model. What can you conclude?
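
The code for this figure is not shown above; here is a minimal sketch that draws it, assuming the coefs_m1 and coefs_m2 objects created earlier (arrow() and unit() are re-exported by ggplot2):

# arrows run from the no-pooling estimates (model0) to the
# mixed-model estimates (model4) for each subject
shrink <- data.frame(int_np  = coefs_m1[, 1], slope_np  = coefs_m1[, 2],
                     int_mix = coefs_m2[, 1], slope_mix = coefs_m2[, 2])
ggplot(shrink) +
  geom_point(aes(int_np, slope_np)) +
  geom_point(aes(int_mix, slope_mix)) +
  geom_segment(aes(x = int_np, y = slope_np, xend = int_mix, yend = slope_mix),
               arrow = arrow(length = unit(0.2, "cm"))) +
  labs(x = "Intercept estimate", y = "Days (slope) estimate") +
  theme_minimal()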

Model Comparison

Overall, model 4 seems to capture the data well (and better than the simpler ones). However, the models above differ not only in how well they fit the data but also in their complexity (i.e., number of parameters). At every step of model building we may want to test whether the increased complexity (i.e., the addition of parameters) is justified, that is, whether two models differ in their fit to the data given their complexity. The anova() command reports fit indices that do just that (e.g., AIC, BIC) along with a statistical test of the difference between models. For example, as would be expected from the model fits in the figures above, model 4 provides a significantly better fit than the simpler models 2 and 3 (it has lower AIC and BIC values). Note that we can't compare models from different families this way (lm vs. lmer), which excludes model1.

anova(model2,model3,model4)
## refitting model(s) with ML (instead of REML)
## Data: sleepstudy
## Models:
## model2: Reaction ~ 1 + (1 | Subject)
## model3: Reaction ~ 1 + Days + (1 | Subject)
## model4: Reaction ~ 1 + Days + (1 + Days | Subject)
##        npar    AIC    BIC  logLik deviance   Chisq Df Pr(>Chisq)    
## model2    3 1916.5 1926.1 -955.27   1910.5                          
## model3    4 1802.1 1814.8 -897.04   1794.1 116.462  1  < 2.2e-16 ***
## model4    6 1763.9 1783.1 -875.97   1751.9  42.139  2  7.072e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1