Multilevel Generalized Linear Models

# Multilevel Generalized Linear Models
### 04.11.22

---

## [Click here for PDF of slides](25-multilevel-glms.pdf)

---

## Announcements

- Quiz 04 open **Tue, Apr 12 - Thu, Apr 14**

- Final project - optional draft due 
  - **Fri, Apr 15** 
  - final report due **Wed, Apr 27**

- Please fill out course evaluations!

- [Click here](https://sta310-sp22.github.io/slides/supplemental-notes/multilevel-model-questions.html) for answers to questions about multilevel models submitted on Quiz 03. Thanks to Jose for putting this document together!

---

## Learning goals

- Exploratory data analysis for multilevel data with non-normal response variable

- Use a two-stage modeling approach to understand conclusions at each level 
- Write Level One, Level Two and composite models for mutlilevel GLM

- Fit and interpret multilevel GLM

---

## Data: College Basketball referees

The dataset [`basketball0910.csv`](data/basketball0910.csv) contains data on 4972 fouls in 340 NCAA basketball games from the Big Ten, ACC, and Big East conferences during the 2009-2010 season. The goal is to determine whether the data from this season support a conclusion from [Anderson and Pierce (2009)](https://www.tandfonline.com/doi/abs/10.1080/02640410902729733) that referees tend to "even out" foul calls in a game. The variables we'll focus on are

- **`foul.home`**: foul was called on home team (1: yes, 0: no)
- **`foul.diff`**: difference in fouls before current foul was called (home - visitor)
- **`game`**: Unique game ID number
- **`visitor`**: visiting team abbreviation 
- **`home`**: home team abbreviation

See [BMLR: Section 11.3.1](https://bookdown.org/roback/bookdown-BeyondMLR/ch-GLMM.html#explore-glmm) for full codebook.

---

## Data: College basketball referees

<table>
 <thead>
  <tr>
   <th style="text-align:right;"> game </th>
   <th style="text-align:left;"> visitor </th>
   <th style="text-align:left;"> hometeam </th>
   <th style="text-align:right;"> foul.num </th>
   <th style="text-align:right;"> foul.home </th>
   <th style="text-align:right;"> foul.vis </th>
   <th style="text-align:right;"> foul.diff </th>
   <th style="text-align:left;"> foul.type </th>
   <th style="text-align:right;"> time </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:left;"> IA </td>
   <td style="text-align:left;"> MN </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 0 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 0 </td>
   <td style="text-align:left;"> Personal </td>
   <td style="text-align:right;"> 14.167 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:left;"> IA </td>
   <td style="text-align:left;"> MN </td>
   <td style="text-align:right;"> 2 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 0 </td>
   <td style="text-align:right;"> -1 </td>
   <td style="text-align:left;"> Personal </td>
   <td style="text-align:right;"> 11.433 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:left;"> IA </td>
   <td style="text-align:left;"> MN </td>
   <td style="text-align:right;"> 3 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 0 </td>
   <td style="text-align:right;"> 0 </td>
   <td style="text-align:left;"> Personal </td>
   <td style="text-align:right;"> 10.233 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:left;"> IA </td>
   <td style="text-align:left;"> MN </td>
   <td style="text-align:right;"> 4 </td>
   <td style="text-align:right;"> 0 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:left;"> Personal </td>
   <td style="text-align:right;"> 9.733 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:left;"> IA </td>
   <td style="text-align:left;"> MN </td>
   <td style="text-align:right;"> 5 </td>
   <td style="text-align:right;"> 0 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 0 </td>
   <td style="text-align:left;"> Shooting </td>
   <td style="text-align:right;"> 7.767 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:left;"> IA </td>
   <td style="text-align:left;"> MN </td>
   <td style="text-align:right;"> 6 </td>
   <td style="text-align:right;"> 0 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> -1 </td>
   <td style="text-align:left;"> Shooting </td>
   <td style="text-align:right;"> 5.567 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:left;"> IA </td>
   <td style="text-align:left;"> MN </td>
   <td style="text-align:right;"> 7 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 0 </td>
   <td style="text-align:right;"> -2 </td>
   <td style="text-align:left;"> Shooting </td>
   <td style="text-align:right;"> 2.433 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:left;"> IA </td>
   <td style="text-align:left;"> MN </td>
   <td style="text-align:right;"> 8 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 0 </td>
   <td style="text-align:right;"> -1 </td>
   <td style="text-align:left;"> Offensive </td>
   <td style="text-align:right;"> 1.000 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 2 </td>
   <td style="text-align:left;"> MI </td>
   <td style="text-align:left;"> MIST </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 0 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 0 </td>
   <td style="text-align:left;"> Shooting </td>
   <td style="text-align:right;"> 18.983 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 2 </td>
   <td style="text-align:left;"> MI </td>
   <td style="text-align:left;"> MIST </td>
   <td style="text-align:right;"> 2 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 0 </td>
   <td style="text-align:right;"> -1 </td>
   <td style="text-align:left;"> Personal </td>
   <td style="text-align:right;"> 17.200 </td>
  </tr>
</tbody>
</table>

---

## Modeling approach

- `foul.home` is a binary response variable (1: foul called on home team, 0: foul called on visiting team)
    
    ✅ Use a generalized linear model, specifically one with the logit link function

- Data has a multilevel structure
  - Covariates at individual foul level and game level

✅ Use a multilevel model 
    
We will combine these and fit a **multilevel generalized linear model** with a logit link function

---

## Exploratory data analysis

---

## Exploratory data analysis

**Univariate**

- Visualizations and summary statistics for Level One and Level Two variables

**Bivariate**

- Segmented bar plots or mosaic plots for response vs. categorical predictors

- Conditional density plot for response vs. quantitative predictors

- Empirical logit plot for quantitative predictors 
  - Is relationship reasonably linear? 
  
---

## Complete *Univariate EDA* in `lecture-25.Rmd`

---

## Complete *Bivariate EDA* in `lecture-25.Rmd`

---

## Logistic regression

Start with a logistic regression model treating all observations as independent

Quick analysis to get initial intuition about research question sand important variables, not be used for final conclusions

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> term </th>
   <th style="text-align:right;"> estimate </th>
   <th style="text-align:right;"> std.error </th>
   <th style="text-align:right;"> statistic </th>
   <th style="text-align:right;"> p.value </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> (Intercept) </td>
   <td style="text-align:right;"> -0.103 </td>
   <td style="text-align:right;"> 0.101 </td>
   <td style="text-align:right;"> -1.023 </td>
   <td style="text-align:right;"> 0.306 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> foul.diff </td>
   <td style="text-align:right;"> -0.077 </td>
   <td style="text-align:right;"> 0.025 </td>
   <td style="text-align:right;"> -3.078 </td>
   <td style="text-align:right;"> 0.002 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> score.diff </td>
   <td style="text-align:right;"> 0.020 </td>
   <td style="text-align:right;"> 0.007 </td>
   <td style="text-align:right;"> 3.012 </td>
   <td style="text-align:right;"> 0.003 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> lead.home </td>
   <td style="text-align:right;"> -0.093 </td>
   <td style="text-align:right;"> 0.160 </td>
   <td style="text-align:right;"> -0.582 </td>
   <td style="text-align:right;"> 0.560 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> time </td>
   <td style="text-align:right;"> -0.013 </td>
   <td style="text-align:right;"> 0.008 </td>
   <td style="text-align:right;"> -1.653 </td>
   <td style="text-align:right;"> 0.098 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> foul.diff:time </td>
   <td style="text-align:right;"> -0.007 </td>
   <td style="text-align:right;"> 0.003 </td>
   <td style="text-align:right;"> -2.485 </td>
   <td style="text-align:right;"> 0.013 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> lead.home:time </td>
   <td style="text-align:right;"> 0.022 </td>
   <td style="text-align:right;"> 0.011 </td>
   <td style="text-align:right;"> 1.957 </td>
   <td style="text-align:right;"> 0.050 </td>
  </tr>
</tbody>
</table>

---

## Two-stage modeling

---

## Two-stage modeling

For now, let's fit a model focusing on the question: **Do the data provide evidence that referees tend to "even out" fouls?**

To explore this, we will fit a model with the Level One predictor `foul.diff` using a two-stage modeling approach.

**Two-stage modeling approach**

- Fit a separate model of the association between `foul.diff` and `foul.home` for each game (Level One models)

- Fit models to explain game-to-game variability in the estimated slopes and intercepts (Level Two models)

---

## Model set up

- `$Y_{ij}$`: 1 if `$j^{th}$` foul in Game `$i$` was called on home team; 0 otherwise. 
- `$p_{ij}$`: True probability the `$j^{th}$` foul in Game `$i$` was called on home team

`$$Y_{ij} \sim \text{Bernoulli}(p_{ij})$$`
---

## Model Set Up

**Level One models**

`$$\log\Big(\frac{p_{ij}}{1 - p_{ij}}\Big) = a_i + b_i ~ \text{foul.diff}_{ij}$$`
**Level Two models**

`$$a_i = \alpha_0 + u_i\hspace{20mm} b_i = \beta_0 + v_i$$`

`$$\left[ \begin{array}{c}
            u_i \\ v_i
          \end{array}  \right] \sim N \left( \left[
          \begin{array}{c}
            0 \\ 0
          \end{array} \right], \left[
          \begin{array}{cc}
            \sigma_u^{2} & \sigma_{uv} \\
            \sigma_{uv} & \sigma_{v}^{2}
          \end{array} \right] \right)$$`

---

## Level One Models

.pull-left[
<img src="data:image/png;base64,#25-multilevel-glms_files/figure-html/unnamed-chunk-7-1.png" width="70%" style="display: block; margin: auto;" /><table>
 <thead>
  <tr>
   <th style="text-align:right;"> alpha0 </th>
   <th style="text-align:right;"> sigma2_u </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:right;"> -0.301 </td>
   <td style="text-align:right;"> 128.858 </td>
  </tr>
</tbody>
</table>
]

.pull-right[
<img src="data:image/png;base64,#25-multilevel-glms_files/figure-html/unnamed-chunk-8-1.png" width="70%" style="display: block; margin: auto;" /><table>
 <thead>
  <tr>
   <th style="text-align:right;"> beta0 </th>
   <th style="text-align:right;"> sigma2_v </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:right;"> -3.328 </td>
   <td style="text-align:right;"> 60.049 </td>
  </tr>
</tbody>
</table>
]

---

## Number of fouls per game

⚠️ In the two-stage approach, games with 3 fouls are treated with equal weight as games with 20 + fouls

---

## Unified multilevel model

---

## Composite model

**Composite model**

.eq[
`$$\log\Big(\frac{p_{ij}}{1 - p_{ij}}\Big) = \alpha_0 + \beta_0 ~ \text{foul.diff}_{ij} + [u_i + v_i ~ \text{foul.diff}_{ij}]$$`
`$$\left[ \begin{array}{c}
            u_i \\ v_i
          \end{array}  \right] \sim N \left( \left[
          \begin{array}{c}
            0 \\ 0
          \end{array} \right], \left[
          \begin{array}{cc}
            \sigma_u^{2} & \sigma_{uv} \\
            \sigma_{uv} & \sigma_{v}^{2}
          \end{array} \right] \right)$$`
          
]

---

## Fit the model in R

Use the `glmer` function in the **lme4** package to fit multilevel GLMs.

```r
model1 <- glmer(foul.home ~ foul.diff + (foul.diff|game), 
                data = basketball, family = binomial)
```

```
## boundary (singular) fit: see help('isSingular')
```

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> effect </th>
   <th style="text-align:left;"> group </th>
   <th style="text-align:left;"> term </th>
   <th style="text-align:right;"> estimate </th>
   <th style="text-align:right;"> std.error </th>
   <th style="text-align:right;"> statistic </th>
   <th style="text-align:right;"> p.value </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> fixed </td>
   <td style="text-align:left;"> NA </td>
   <td style="text-align:left;"> (Intercept) </td>
   <td style="text-align:right;"> -0.157 </td>
   <td style="text-align:right;"> 0.046 </td>
   <td style="text-align:right;"> -3.382 </td>
   <td style="text-align:right;"> 0.001 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> fixed </td>
   <td style="text-align:left;"> NA </td>
   <td style="text-align:left;"> foul.diff </td>
   <td style="text-align:right;"> -0.285 </td>
   <td style="text-align:right;"> 0.038 </td>
   <td style="text-align:right;"> -7.440 </td>
   <td style="text-align:right;"> 0.000 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> ran_pars </td>
   <td style="text-align:left;"> game </td>
   <td style="text-align:left;"> sd__(Intercept) </td>
   <td style="text-align:right;"> 0.542 </td>
   <td style="text-align:right;"> NA </td>
   <td style="text-align:right;"> NA </td>
   <td style="text-align:right;"> NA </td>
  </tr>
  <tr>
   <td style="text-align:left;"> ran_pars </td>
   <td style="text-align:left;"> game </td>
   <td style="text-align:left;"> cor__(Intercept).foul.diff </td>
   <td style="text-align:right;"> -1.000 </td>
   <td style="text-align:right;"> NA </td>
   <td style="text-align:right;"> NA </td>
   <td style="text-align:right;"> NA </td>
  </tr>
  <tr>
   <td style="text-align:left;"> ran_pars </td>
   <td style="text-align:left;"> game </td>
   <td style="text-align:left;"> sd__foul.diff </td>
   <td style="text-align:right;"> 0.035 </td>
   <td style="text-align:right;"> NA </td>
   <td style="text-align:right;"> NA </td>
   <td style="text-align:right;"> NA </td>
  </tr>
</tbody>
</table>

---

## Boundary constraints

- The estimates of the parameters `$\alpha_0, \beta_0, \sigma_u, \sigma_v, \rho_{uv}$` are those that maximize the likelihood of observing the data

- The fixed effects, e.g., `$\alpha_0$` and `$\beta_0$`, can take any values, but the terms associated with the error terms are constrained to a set of "allowable" values

`$$\sigma_u \geq 0 \hspace{10mm} \sigma_v \geq 0 \hspace{10mm} -1 \leq \rho_{uv} \leq 1$$`

- Because of these boundaries, a "constrained" search is used to find the MLEs.

- The error message `"## boundary (singular) fit"`, means the estimate of one or more terms was set at the maximum (or minimum) allowable value, not the value it would've been if an unconstrained search were allowable

---

## Illustrating boundary constraints

Contour plots from a hypothetical likelihood `$L(\beta_0, \sigma^2)$`

<br>

<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#25-multilevel-glms_files/figure-html/boundary-1.png" alt="From BMLR Figure 10.14" width="100%" />
<p class="caption">From BMLR Figure 10.14</p>
</div>
]

- In the unconstrained search, the likelihood `$L(\beta_0, \sigma^2)$` is maximized at `$\hat{\beta}_0 = 4, \hat{\sigma}^2 = -2$`

- In reality `$\sigma^2$` must be non-negative, so the search for the MLE is restricted to the region such that `$\sigma^2 \geq 0$`.

- The constrained likelihood is maximized at `$\hat{\beta}_0 = 4, \hat{\sigma}^2 = 0$`
]

---

## Addressing boundary constraints

Address boundary constraints by reparameterizing the model

- Remove variance and/or correlation terms estimated at the boundary

- Reduce the number of parameters to be estimated by fixing the values of some parameters

- Apply transformation to covariates, such as centering variables, standardizing variables, or changing units. Extreme values, outliers, or highly correlated covariates can sometimes cause issues with MLEs.

---

## Is it OK to use model that faces boundary constraints?

Best to try to deal with boundary constraints, but you can sometimes leave the model as is if...

- You're not interested in estimating parameters with boundary issues

- Removing the parameters does not impact conclusions about the variables of interest 
---

**Original model**

.question[
1. Which term(s) should we remove to try to address boundary constraint? 
2. Refit the model with these terms removed. 
3. Which model do you choose? Explain. 
]

---

## Interpret model coefficients

Interpret the following coefficients in the selected model (if applicable):

- `$\hat{\alpha}_0$`

- `$\hat{\beta}_0$`

- `$\hat{\sigma}_u$`

- `$\hat{\sigma}_v$`

- `$\hat{\rho}_{uv}$`

---

## Looking ahead

So far we've only considered random effects within a nested structure, but sometimes we may want to consider random effects that aren't nested.

- For example, we want to consider a random effect for the home team, but home team is not nested within game (since teams can be the home team for multiple games)

- We will deal with this using **crossed random effects**

---

## Acknowledgements

- [BMLR: Section 10.5 - Encountering boundary constraints](https://bookdown.org/roback/bookdown-BeyondMLR/ch-3level.html#sec:boundary)

- [Chapter 11: Multilevel generalized linear models  structure](https://bookdown.org/roback/bookdown-BeyondMLR/ch-GLMM.html) 
  - Sections 11.1 - 11.4
  
- Anderson, Kyle J., and David A. Pierce. 2009. “Officiating Bias: The Effect of Foul Differential on Foul Calls in NCAA Basketball.” Journal of Sports Sciences 27 (7): 687–94. https://doi.org/10.1080/02640410902729733.