Multilevel models

# Multilevel models
## Estimation + Interpretation
### 🥧 day 2022

---

## [Click here for PDF of slides](17-multilevel-models-pt3.pdf)

---

## Announcements

- Quiz 03: Wed, Mar 16 - Fri, Mar 18 
  - Feb 21 - Mar 14 lectures

- DataFest: April 1 - 3 in Penn Pavilion 
  - [Click here](https://www2.stat.duke.edu/datafest/signup.html) to sign up

---

## Mini-project 02

[Click here](https://sta310-sp22.github.io/assignments/projects/mini-project-02.html) for project instructions

- **Draft for peer review**: due Thu, Mar 24 at 12pm (noon)
- **Peer reviews**: due Thu, Mar 24 at 11:59pm
- **Presentation**: due Mon, Mar 28 at 3:30pm
- **Written report**: due Mon, Mar 28 at 11:59pm

---

## Final project

[Click here](https://sta310-sp22.github.io/assignments/projects/final-project.html) for project instructions

- **Round 1 submission (optional)**: Wednesday, April 13 at 11:59pm
- **Final submission**: Wednesday, April 27 at 11:59pm

---

## Mid-semester feedback

Thank you to everyone who filled out the mid-semester survey!

**What has helped with learning**

- Working through interpretations and code in class
- HW and projects to extend learning beyond lectures

**How can teaching team help with learning**

- More practice!
- Use some lab time for review and additional practice
]

- Engaging with the textbook
- Attending office hours
- Asking questions! 
]
  
---

## Learning goals

- Fit and interpret multilevel models

- Compare maximum likelihood (ML) and restricted maximum likelihood (REML) estimation approaches

- Understand general process for fitting and comparing multilevel models

---

## Data: Music performance anxiety

.midi[The data [`musicdata.csv`](data/musicdata.csv) come from the Sadler and Miller (2010) study of the emotional state of musicians before performances. The dataset contains information collected from 37 undergraduate music majors who completed the Positive Affect Negative Affect Schedule (PANAS), an instrument produces a measure of anxiety (negative affect) and a measure of happiness (positive affect). This analysis will focus on negative affect as a measure of performance anxiety.]

The primary variables we'll use are

- **`na`**: negative affect score on PANAS (the response variable)
- **`LargeEnsemble`**: 1: performance is large ensemble; 0: performance is solo or small ensemble
- **`Orchestra`**: 1: orchestral instrument; 0: voice or piano
- **`mpqnem`**: negative emotionality (NEM) composite scale from MPQ

---

## Look at data

.small[
<table>
 <thead>
  <tr>
   <th style="text-align:right;"> id </th>
   <th style="text-align:right;"> diary </th>
   <th style="text-align:left;"> large_ensemble </th>
   <th style="text-align:right;"> mpqnem </th>
   <th style="text-align:left;"> orchestra </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:left;"> 0 </td>
   <td style="text-align:right;"> 16 </td>
   <td style="text-align:left;"> 0 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 2 </td>
   <td style="text-align:left;"> 1 </td>
   <td style="text-align:right;"> 16 </td>
   <td style="text-align:left;"> 0 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 3 </td>
   <td style="text-align:left;"> 1 </td>
   <td style="text-align:right;"> 16 </td>
   <td style="text-align:left;"> 0 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 43 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:left;"> 0 </td>
   <td style="text-align:right;"> 17 </td>
   <td style="text-align:left;"> 0 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 43 </td>
   <td style="text-align:right;"> 2 </td>
   <td style="text-align:left;"> 0 </td>
   <td style="text-align:right;"> 17 </td>
   <td style="text-align:left;"> 0 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 43 </td>
   <td style="text-align:right;"> 3 </td>
   <td style="text-align:left;"> 0 </td>
   <td style="text-align:right;"> 17 </td>
   <td style="text-align:left;"> 0 </td>
  </tr>
</tbody>
</table>
]

- `diary`, `na`, `large_ensemble` are **Level One** variables (performance-level variables)
- `orchestra`, `mpqnem` are **Level Two** variables (musician-level variables)

---

## The model

### Model using `Orchestra` and `LargeEnsemble` to understand variability in performance anxiety (`na`)

---

## Two-stage model

**Level One Model**

`$$na_{ij} = a_i + b_i ~ LargeEnsemble_{ij} + \epsilon_{ij}, \hspace{5mm} \epsilon_{ij} \sim N(0,\sigma^2)$$`

**Level Two Models**

`$$\begin{aligned}&a_i = \alpha_0 + \alpha_1 ~ Orchestra_i + u_i \\
&b_i = \beta_0 + \beta_1 ~ Orchestra_i + v_i\end{aligned}$$`

---

## Composite model

Plug in the equations for `$a_i$` and `$b_i$` to get the **composite model**

.eq[
`$$\begin{aligned}Y_{ij} &= (\alpha_0 + \alpha_1 ~ Orchestra_i + \beta_0 ~ LargeEnsemble_{ij} \\ 
&+ \beta_1 ~ Orchestra_i:LargeEnsemble_{ij})\\
&+ (u_i + v_i ~ LargeEnsemble_{ij} + \epsilon_{ij})\end{aligned}$$`
]

- The **fixed effects** to estimate are `$\alpha_0, \alpha_1, \beta_0, \beta_1$`
- The **error terms** are `$u_i, v_i, \epsilon_{ij}$`
  - These describe the behavior of the random effects
--

When we use the composite model, we no longer need to estimate `$a_i$` and `$b_i$` directly as we did with the two-stage approach.

---

## Notation

- Greek letters denote the fixed effect model parameters that will be estimated
  - e.g., `$\alpha_0, \alpha_1, \beta_0, \beta_1$`
  
- English letters denote the preliminary effects at lower levels that will not be estimated directly
  - e.g. `$a_i, b_i$`

- `$\sigma$` and `$\rho$` denote variance components that will be estimated

- `$\epsilon_{ij}, u_i, v_i$` denote error terms

---

## Error terms

- Assume random effects are normally distributed with mean 0 and variance that will be estimated

- **Level One**: Errors associated with each performance of a given musician are are normally distributed, i.e., `$\epsilon_{ij} \sim N(0, \sigma^2)$`

- **Level Two**: Errors are 
  - `$u_i$`: deviation of musician `$i$` from the mean performance anxiety before solos and small ensembles after accounting for the instrument (musician-to-musician differences in intercept)
  - `$v_i$`: deviance of musician `$i$` from the mean difference in performance anxiety between large ensembles and other performance types after accounting for instrument (musician-to-musician differences in slope)

- Need to account for correlation between `$u_i$` and `$v_i$` for the `$i^{th}$` musician

---

<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#17-multilevel-models-pt3_files/figure-html/unnamed-chunk-3-1.png" alt="Recreated from Figure 8.11" width="70%" />
<p class="caption">Recreated from Figure 8.11</p>
</div>

---

## Distribution of Level Two errors

The Level Two error terms follow a **multivariate normal** distribution

.eq[
`$$\left[ \begin{array}{c}
            u_{i} \\ v_{i}
          \end{array}  \right] \sim N \left( \left[
          \begin{array}{c}
            0 \\ 0
          \end{array} \right], \left[
          \begin{array}{cc}
            \sigma_{u}^{2} & \rho_{uv}\sigma_{u}\sigma_v \\
            \rho_{uv}\sigma_{u}\sigma_v & \sigma_{v}^{2}
          \end{array} \right] \right)$$`
          
where `$\sigma^2_u$` and `$\sigma^2_v$` are the variances of `$u_i$`'s and `$v_i$`'s respectively, `$\rho_{uv}$` is the correlation between `$u_i$` and `$v_i$` 
]

Covariance:  `$\sigma_{uv} =  \rho_{uv}\sigma_{u}\sigma_v$`

- Indicates how those terms vary together
---

## Visualizing multivariate normal distribution

<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#17-multilevel-models-pt3_files/figure-html/contour-boundary-1.png" alt="Recreated from Figure 8.12" width="60%" />
<p class="caption">Recreated from Figure 8.12</p>
</div>

---

## Fit the model in R

Fit multilevel model using the `lmer` ("linear mixed effects in R") function from the **lme4** package.

```r
library(lme4)
music_model <- lmer(na ~ orchestra + large_ensemble + 
                      orchestra:large_ensemble + (large_ensemble|id), 
                    data = music)
```

`na ~ orchestra + large_ensemble + orchestra:large_ensemble`: Represents the fixed effects + intercept

`(large_ensemble|id)`: Represents the error terms and associated variance components
  - Specifies two error terms: `$u_i$` corresponding to the intercepts, `$v_i$` corresponding to effect of large ensemble
  
---

## Tidy output

Display results using the `tidy` function from the **broom.mixed** package.

```r
library(broom.mixed)
tidy(music_model) 
```

- Get fixed effects only

`tidy(music_model) %>% filter(effect == "fixed")`

- Get errors and variance components only

`tidy(music_model) %>% filter(effect == "ran_pars")`

---

## Model output: Fixed effects

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> effect </th>
   <th style="text-align:left;"> group </th>
   <th style="text-align:left;"> term </th>
   <th style="text-align:right;"> estimate </th>
   <th style="text-align:right;"> std.error </th>
   <th style="text-align:right;"> statistic </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> fixed </td>
   <td style="text-align:left;"> NA </td>
   <td style="text-align:left;"> (Intercept) </td>
   <td style="text-align:right;"> 15.930 </td>
   <td style="text-align:right;"> 0.641 </td>
   <td style="text-align:right;"> 24.833 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> fixed </td>
   <td style="text-align:left;"> NA </td>
   <td style="text-align:left;"> orchestra1 </td>
   <td style="text-align:right;"> 1.693 </td>
   <td style="text-align:right;"> 0.945 </td>
   <td style="text-align:right;"> 1.791 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> fixed </td>
   <td style="text-align:left;"> NA </td>
   <td style="text-align:left;"> large_ensemble1 </td>
   <td style="text-align:right;"> -0.911 </td>
   <td style="text-align:right;"> 0.845 </td>
   <td style="text-align:right;"> -1.077 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> fixed </td>
   <td style="text-align:left;"> NA </td>
   <td style="text-align:left;"> orchestra1:large_ensemble1 </td>
   <td style="text-align:right;"> -1.424 </td>
   <td style="text-align:right;"> 1.099 </td>
   <td style="text-align:right;"> -1.295 </td>
  </tr>
</tbody>
</table>

---

## Model output: Random effects

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> effect </th>
   <th style="text-align:left;"> group </th>
   <th style="text-align:left;"> term </th>
   <th style="text-align:right;"> estimate </th>
   <th style="text-align:right;"> std.error </th>
   <th style="text-align:right;"> statistic </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> ran_pars </td>
   <td style="text-align:left;"> id </td>
   <td style="text-align:left;"> sd__(Intercept) </td>
   <td style="text-align:right;"> 2.378 </td>
   <td style="text-align:right;"> NA </td>
   <td style="text-align:right;"> NA </td>
  </tr>
  <tr>
   <td style="text-align:left;"> ran_pars </td>
   <td style="text-align:left;"> id </td>
   <td style="text-align:left;"> cor__(Intercept).large_ensemble1 </td>
   <td style="text-align:right;"> -0.635 </td>
   <td style="text-align:right;"> NA </td>
   <td style="text-align:right;"> NA </td>
  </tr>
  <tr>
   <td style="text-align:left;"> ran_pars </td>
   <td style="text-align:left;"> id </td>
   <td style="text-align:left;"> sd__large_ensemble1 </td>
   <td style="text-align:right;"> 0.672 </td>
   <td style="text-align:right;"> NA </td>
   <td style="text-align:right;"> NA </td>
  </tr>
  <tr>
   <td style="text-align:left;"> ran_pars </td>
   <td style="text-align:left;"> Residual </td>
   <td style="text-align:left;"> sd__Observation </td>
   <td style="text-align:right;"> 4.670 </td>
   <td style="text-align:right;"> NA </td>
   <td style="text-align:right;"> NA </td>
  </tr>
</tbody>
</table>

---

## Interpret the effects

- Split into 4 groups.

- Each group will write the interpretation for one main effect and one variance component. The terms on each slide do not necessarily have a direct correspondence to each other.

- One person write the group's interpretations on the slide.

- [Click here](https://docs.google.com/presentation/d/17XvVYSG2NvhS-PJ8dwjdN8OodvmuCLE52r9-aGIJETw/edit?usp=sharing) for the slides.

---

## Fitting the model

---

## Comparing fixed effects estimates

Below are the estimated coefficients and standard errors for four modeling approaches.

<table>
<caption>from BMLR Table 8.3</caption>
 <thead>
  <tr>
   <th style="text-align:left;"> Variable </th>
   <th style="text-align:left;"> Independence </th>
   <th style="text-align:left;"> Two.Stage </th>
   <th style="text-align:left;"> LVCF </th>
   <th style="text-align:left;"> Multilevel </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Intercept </td>
   <td style="text-align:left;"> 15.72(0.36) </td>
   <td style="text-align:left;"> 16.28(0.67) </td>
   <td style="text-align:left;"> 15.20(1.25) </td>
   <td style="text-align:left;"> 15.93(0.64) </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Orch </td>
   <td style="text-align:left;"> 1.79(0.55) </td>
   <td style="text-align:left;"> 1.41(0.99) </td>
   <td style="text-align:left;"> 1.45(1.84) </td>
   <td style="text-align:left;"> 1.69(0.95) </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Large </td>
   <td style="text-align:left;"> -0.28(0.79) </td>
   <td style="text-align:left;"> -0.77(0.85) </td>
   <td style="text-align:left;"> - </td>
   <td style="text-align:left;"> -0.91(0.85) </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Orch*Large </td>
   <td style="text-align:left;"> -1.71(1.06) </td>
   <td style="text-align:left;"> -1.41(1.20) </td>
   <td style="text-align:left;"> - </td>
   <td style="text-align:left;"> -1.42(1.10) </td>
  </tr>
</tbody>
</table>

---

## ML and REML

Maximum Likelihood (ML) and Restricted (Residual) Maximum Likelihood (REML) are the two most common methods for estimating the fixed effects and variance components

**Maximum Likelihood (ML)**

- Jointly estimate the fixed effects and variance components using all the <u>sample data</u>
- Can be used to draw conclusions about fixed and random effects
- *Issue*: Fixed effects are treated as known values when estimating variance components
  - Results in biased estimates of variance components (especially when sample size is small)

---

## ML and REML

**Restricted Maximum Likelihood (REML)**

- Estimate the variance components using the <u>sample residuals</u> not the sample data
- It is conditional on the fixed effects, so it accounts for uncertainty in fixed effects estimates. This results in unbiased estimates of variance components.

.footnote[.small[See the post [Maximum Likelihood (ML) vs. REML](https://towardsdatascience.com/maximum-likelihood-ml-vs-reml-78cf79bef2cf) for details and illustration of ML vs. REML.]]

---

## ML or REML?

- Research has not determined one method absolutely superior to the other

- **REML** (`REML = TRUE`; default in `lmer`) is preferable when 
  - the number of parameters is large or primary, or 
  - primary objective is to obtain estimates of the model parameters

- **ML** (`REML = FALSE`) <u>must</u> be used if you want to compare nested fixed effects models using a likelihood ratio test (e.g., a drop-in-deviance test).
  - For REML, the goodness-of-fit and likelihood ratio tests can only be used to draw conclusions about variance components

<br>

.footnote[.small[Source: Singer, J. D. & Willett, J. B. (2003). *Applied longitudinal data analysis: Modeling change and event occurrence*. Oxford university press.]]
---

## Comparing ML and REML

---

## Strategy for building multilevel models

- Conduct exploratory data analysis for Level One and Level Two variables

- Fit model with no covariates to assess variability at each level

- Create Level One models. Start with a single term, then add terms as needed.

- Create Level Two models. Start with a single term, then add terms as needed. Start with equation for intercept term.

- Begin with the full set of variance components, then remove variance terms as needed.

*Alternate model building strategy in BMLR Section 8.6*

---

See BMLR Sections 8.6 - 8.11 for full step-by-step analysis.
---

## Acknowledgements

The content in the slides is from 
 
- [BMLR: Chapter 8 - Introduction to Multilevel Models](https://bookdown.org/roback/bookdown-BeyondMLR/ch-multilevelintro.html)
  - Sections 8.5 - 8.12
  
 - Singer, J. D. & Willett, J. B. (2003). *Applied longitudinal data analysis: Modeling change and event occurrence*. Oxford university press.