Modeling two-level longitudinal data

# Modeling two-level longitudinal data
### 03.16.22

---

## [Click here for PDF of slides](18-longitudinal.pdf)

---

## Announcements

- Quiz 03: Wed, Mar 16 at 5pm - Sun, Mar 20 at 11:59pm
  - Feb 21 - Mar 14 lectures
  - Chapters 6 - 8 of the textbook 
  
- DataFest: April 1 - 3 in Penn Pavilion 
  - [Click here](https://www2.stat.duke.edu/datafest/signup.html) to register

---

## Learning goals

- Compare maximum likelihood (ML) and restricted maximum likelihood (REML) estimation approaches

- Describe general process for fitting and comparing multilevel models

- Fit and interpret multilevel models for longitudinal data
---

## Fitting multilevel models

---

## ML and REML

Maximum Likelihood (ML) and Restricted (Residual) Maximum Likelihood (REML) are the two most common methods for estimating the fixed effects and variance components

**Maximum Likelihood (ML)**

- Jointly estimate the fixed effects and variance components using all the <u>sample data</u>
- *Issue*: Fixed effects are treated as known values when estimating variance components
  - Results in biased estimates of variance components (especially when sample size is small)

.footnote[.small[See the post [Maximum Likelihood (ML) vs. REML](https://towardsdatascience.com/maximum-likelihood-ml-vs-reml-78cf79bef2cf) for details and illustration of ML vs. REML.]]

---

## ML and REML

**Restricted Maximum Likelihood (REML)**

- Estimate the variance components using the <u>sample residuals</u>, resulting in unbiased estimates of the variance components
- .vocab2[Process:] 
  - Fit regression model of fixed effects using ordinary least squares (OLS)
  - Take the residuals from this model and estimate the variance components by maximizing the likelihood of the residuals.
  - Obtain generalized least squares (GLS) estimates for fixed effects (estimates that take into account variance components from previous step). Retain the GLS estimates of the fixed effects. *Note: The GLS and OLS estimates can be equivalent*

## ML vs. REML

Researchers have not definitively determined one method superior to the other

- Use ML estimates to compare models with different fixed effects

- Can get biased estimates of variance components when number of groups is small

.small[Source: [Hierarchical Linear Modeling with Maximum Likelihood, Restricted Maximum Likelihood, and Fully Bayesian Estimation](https://scholarworks.umass.edu/cgi/viewcontent.cgi?article=1353&context=pare) by Peter Boedeker]
]

- Default in `lmer`

- Get more accurate estimates of variance components when there are a small number of groups or large number of parameters

- Can only compare models that have the same fixed effects and only differ in the variance components
]

---

## Comparing ML and REML estimates

Below are estimates for the fixed effects and variance components for the [music model](https://sta310-sp22.github.io/slides/17-multilevel-models-pt3/17-multilevel-models-pt3.html#12) from the previous lecture.

---

## Modeling two-level longitudinal data

---

## Data: Charter schools in MN

.midi[The data set [`charter-long.csv`](data/charter-long.csv) contains standardized test scores and demographic information for schools in Minneapolis, MN from 2008 to 2010. The data were collected by the Minnesota Department of Education. Understanding the effectiveness of charter schools is of particular interest, since they often incorporate unique methods of instruction and learning that differ from public schools.]

- **`MathAvgScore`**: Average MCA-II score for all 6th grade students in a school (response variable)
- **`urban`**: urban (1) or rural (0) location school location
- **`charter`**: charter school (1) or a non-charter public school (0)
- **`schPctfree`**: proportion of students who receive free or reduced lunches in a school (based on 2010 figures).
- **`year08`**: Years since 2008

---

## Data

```r
charter <- read_csv("data/charter-long.csv")
```

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> schoolName </th>
   <th style="text-align:right;"> year08 </th>
   <th style="text-align:right;"> urban </th>
   <th style="text-align:right;"> charter </th>
   <th style="text-align:right;"> schPctfree </th>
   <th style="text-align:right;"> MathAvgScore </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> RIPPLESIDE ELEMENTARY </td>
   <td style="text-align:right;"> 0 </td>
   <td style="text-align:right;"> 0 </td>
   <td style="text-align:right;"> 0 </td>
   <td style="text-align:right;"> 0.363 </td>
   <td style="text-align:right;"> 652.8 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> RIPPLESIDE ELEMENTARY </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 0 </td>
   <td style="text-align:right;"> 0 </td>
   <td style="text-align:right;"> 0.363 </td>
   <td style="text-align:right;"> 656.6 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> RIPPLESIDE ELEMENTARY </td>
   <td style="text-align:right;"> 2 </td>
   <td style="text-align:right;"> 0 </td>
   <td style="text-align:right;"> 0 </td>
   <td style="text-align:right;"> 0.363 </td>
   <td style="text-align:right;"> 652.6 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> RICHARD ALLEN MATH&amp;SCIENCE ACADEMY </td>
   <td style="text-align:right;"> 0 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 0.545 </td>
   <td style="text-align:right;"> NA </td>
  </tr>
  <tr>
   <td style="text-align:left;"> RICHARD ALLEN MATH&amp;SCIENCE ACADEMY </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 0.545 </td>
   <td style="text-align:right;"> NA </td>
  </tr>
  <tr>
   <td style="text-align:left;"> RICHARD ALLEN MATH&amp;SCIENCE ACADEMY </td>
   <td style="text-align:right;"> 2 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 0.545 </td>
   <td style="text-align:right;"> 631.2 </td>
  </tr>
</tbody>
</table>

---

## Assess missingness

Missing data is common in longitudinal data. Before starting the analysis, it is important to understand the missing data patterns. Use the `skim` function from the **skimr** R package to get a quick view of the missingness.

```r
library(skimr)
charter %>% skim() %>% select(skim_variable, n_missing, complete_rate)
```

```
## # A tibble: 9 × 3
##   skim_variable n_missing complete_rate
##   <chr>             <int>         <dbl>
## 1 schoolid              0         1    
## 2 schoolName            0         1    
## 3 urban                 0         1    
## 4 charter               0         1    
## 5 schPctnonw            0         1    
## 6 schPctsped            0         1    
## 7 schPctfree            0         1    
## 8 year08                0         1    
## 9 MathAvgScore        121         0.935
```
]

---

## Closer look at missing data pattern
.panelset[
.panel[.panel-name[Output]

<table>
 <thead>
  <tr>
   <th style="text-align:right;"> MathAvgScore0_miss </th>
   <th style="text-align:right;"> MathAvgScore1_miss </th>
   <th style="text-align:right;"> MathAvgScore2_miss </th>
   <th style="text-align:right;"> n </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:right;"> 0 </td>
   <td style="text-align:right;"> 0 </td>
   <td style="text-align:right;"> 0 </td>
   <td style="text-align:right;"> 540 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 0 </td>
   <td style="text-align:right;"> 0 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 6 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 0 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 0 </td>
   <td style="text-align:right;"> 4 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 0 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 7 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 0 </td>
   <td style="text-align:right;"> 0 </td>
   <td style="text-align:right;"> 25 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 0 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 1 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 0 </td>
   <td style="text-align:right;"> 35 </td>
  </tr>
</tbody>
</table>
]

```r
charter %>%
  select(schoolid, schoolName, year08, MathAvgScore) %>%
  pivot_wider(id_cols = c(schoolid, schoolName), names_from = year08,
              names_prefix = "MathAvgScore.", values_from = MathAvgScore) %>% 
  mutate(MathAvgScore0_miss = if_else(is.na(MathAvgScore.0), 1, 0),
         MathAvgScore1_miss = if_else(is.na(MathAvgScore.1), 1, 0),
         MathAvgScore2_miss = if_else(is.na(MathAvgScore.2), 1, 0)) %>% 
  count(MathAvgScore0_miss, MathAvgScore1_miss, MathAvgScore2_miss) %>% 
  kable()
```

]
]

---

## Dealing with missing data

- **Complete case analysis**: Only include schools with complete data for all three years. 
  - Would remove 12.6% of observations in this data.

- **Last observation carried forward**: Keep the last observation from each group (school) and conduct analysis for independent observations.

- **Impute missing observations**: "Fill in" values of missing observations using the typical observed trends from groups with similar covariates.

--
- **Apply multilevel methods**: Estimate patterns using available data recognizing that trends for groups with complete data are more precise than for those with fewer measurements. This is under the condition that the probability of missingness does not depend on unobserved predictors or the response.

---

---

## Strategy for building multilevel models

- Conduct exploratory data analysis for Level One and Level Two variables.

- Fit model with no covariates to assess variability at each level.

- Create Level One models. Start with a single term, then add terms as needed.

- Create Level Two models. Start with a single term, then add terms as needed. Start with equation for intercept term.

- Begin with the full set of variance components, then remove variance terms as needed.

*Alternate model building strategies in BMLR Section 8.6*

---

## Exploratory data analysis

Given the longitudinal structure of the data, we are able to answer questions at two levels

- **Level One (within school)**: How did average math scores for a given school change over time?

- **Level Two (between schools)**: What is the effect of school-specific covariates on the average math scores in 2008 and the rate of change from 2008 to 2010?

We can conduct exploratory data analysis at both levels, e.g.,

- Univariate and bivariate EDA
- Lattice plots
- Spaghetti plots

---

See BMLR Section 9.3 for full exploratory data analysis.

---

## Unconditional means model

Start with the **unconditional means model**, a model with no covariates at any level. This is also called the random intercepts model.

**Level One** : `$Y_{ij} = a_{i} + \epsilon_{ij}, \hspace{5mm} \epsilon_{ij} \sim N(0, \sigma^2)$`

**Level Two**: `$a_i = \alpha_0 + u_i, \hspace{5mm} u_{i} \sim N(0, \sigma^2_u)$`

---

## Intraclass correlation

The **intraclass correlation** is the relative variability between groups

`$$\hat{\rho} = \frac{\text{Between variability}}{\text{Total variability}} = \frac{\hat{\sigma}_u^2}{\hat{\sigma}^2_u + \hat{\sigma}^2}$$`
- What is the meaning of `$\hat{\rho}$` close to 0?

- What is the meaning of `$\hat{\rho}$` close to 1?

---

## Unconditional growth model

A next step in the model building is the **unconditional growth model**, a  model with Level One predictors but no Level Two predictors.

**Level One**: `$Y_{ij} = a_i + b_iYear08_{ij} + \epsilon_{ij}$`

**Level Two**: 
  - `$a_i = \alpha_0 + u_i$`
  - `$b_i = \beta_0 + v_i$`

.question[
- Write the composite model.
- What can we learn from this model? 
- Fit the unconditional growth model.
]
---

## Pseudo `$R^2$`

We can use **Pseudo R<sup>2</sup>** to explain changes in variance components between two models

- **Note**: This should only be used when the definition of the variance component is the same between the two models

`$$\text{Pseudo }R^2 = \frac{\hat{\sigma}^2(\text{Model 1})  -  \hat{\sigma}^2(\text{Model 2})}{\hat{\sigma}^2(\text{Model 1})}$$`

.question[
Calculate the `$\text{Pseudo }R^2$` to estimate the change of within school variance between the unconditional means and unconditional growth models.
]

---

## Model with school-level covariates

Fit a model with school-level covariates that takes the following form:

**Level One**

`\begin{equation*}
Y_{ij}= a_{i} + b_{i}Year08_{ij} + \epsilon_{ij}
\end{equation*}`

**Level Two**

`\begin{align*}
a_{i} & = \alpha_{0} + \alpha_{1}Charter_i + \alpha_{2}urban_i + \alpha_{3}schpctfree_i + u_{i} \\
b_{i} & = \beta_{0} + \beta_{1}Charter_i + \beta_{2}urban_i + v_{i}
\end{align*}`

.question[
- Write out the composite model. 
- Fit the model in R. 
- Use the model to describe how the average math scores differed between charter and non-charter schools.

]

---

## Acknowledgements

The content in the slides is from 
 
- [BMLR: Chapter 9 - Two-level Longitudinal Data](https://bookdown.org/roback/bookdown-BeyondMLR/ch-lon.html)
  - Sections 9.1 - 9.6.3

- [Hierarchical Linear Modeling with Maximum Likelihood, Restricted Maximum Likelihood, and Fully Bayesian Estimation](https://scholarworks.umass.edu/cgi/viewcontent.cgi?article=1353&context=pare) by Peter Boedeker

- *Applied longitudinal data analysis: Modeling change and event occurrence*. Oxford university press. by J.D. Singer and J.B. Willett 
  - Online copy available through Duke library