Poisson Regression

# Poisson Regression
### Prof. Maria Tackett
### 01.26.22

---

##[Click for PDF of slides](06-poisson-pt1.pdf)

---

## Announcements

- Week 03 & 04 reading: 
  - [BMLR: Chapter 4 - Poisson Regression](https://bookdown.org/roback/bookdown-BeyondMLR/ch-poissonreg.html)

- Quiz 01 due Thu, Jan 27 at 3:30pm (start of lab)

- [Mini-project 01: Analysis plan](https://sta310-sp22.github.io/assignments/projects/mini-project-01.html#Analysis_plan) due Thu, Jan 27 at 11:59pm for feedback (optional)

---

## Learning goals

- Describe properties of the Poisson random variable

- Write the Poisson regression model

- Describe how the Poisson regression differs from least-squares regression

- Interpret the coefficients for the Poisson regression model

- Compare two Poisson regression models

---

## Scenarios to use Poisson regression

- Does the number of employers conducting on-campus interviews during a year differ for public and private colleges?

- Does the daily number of asthma-related visits to an Emergency Room differ depending on air pollution indices?

- Does the number of paint defects per square foot of wall differ based on the years of experience of the painter?

---

## Scenarios to use Poisson regression

- Does the .vocab[number of employers conducting on-campus interviews during a year] differ for public and private colleges?

- Does the .vocab[daily number of asthma-related visits to an Emergency Room] differ depending on air pollution indices?

- Does the .vocab[number of paint defects per square foot of wall] differ based on the years of experience of the painter?

<br>

Each response variable is a .vocab[count per a unit of time or space].

---

## Poisson distribution

Let `$Y$` be the number of events in a given unit of time or space. Then `$Y$` can be modeled using a .vocab[Poisson distribution]

`$$P(Y=y) = \frac{e^{-\lambda}\lambda^y}{y!} \hspace{10mm} y=0,1,2,\ldots, \infty$$`

- `$E(Y) = Var(Y) = \lambda$` 
- The distribution is typically skewed right, particularly if `$\lambda$` is small
- The distribution becomes more symmetric as `$\lambda$` increases
  - If `$\lambda$` is sufficiently large, it can be approximated using a normal distribution ([Click here](https://online.stat.psu.edu/stat414/lesson/28/28.2) for an example.)
  
---

<table>
 <thead>
  <tr>
   <th style="text-align:left;">   </th>
   <th style="text-align:right;"> Mean </th>
   <th style="text-align:right;"> Variance </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> lambda = 1 </td>
   <td style="text-align:right;"> 0.99351 </td>
   <td style="text-align:right;"> 0.9902178 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> lambda = 5 </td>
   <td style="text-align:right;"> 4.99367 </td>
   <td style="text-align:right;"> 4.9865798 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> lambda = 50 </td>
   <td style="text-align:right;"> 49.99288 </td>
   <td style="text-align:right;"> 49.8962683 </td>
  </tr>
</tbody>
</table>
  
---

## Example

The annual number of earthquakes registering at least 2.5 on the Richter Scale and having an epicenter within 40 miles of downtown Memphis follows a Poisson distribution with mean 6.5. **What is the probability there will be at 3 or fewer such earthquakes next year?**

`$$P(Y \leq 3) = P(Y = 0) + P(Y = 1) + P(Y = 2) + P(Y = 3)$$`

$$ = \frac{e^{-6.5}6.5^0}{0!} + \frac{e^{-6.5}6.5^1}{1!} + \frac{e^{-6.5}6.5^2}{2!} + \frac{e^{-6.5}6.5^3}{3!}$$
--

$$ = 0.112$$

```r
ppois(3, 6.5)
```

```
## [1] 0.1118496
```

<br>

.footnote[Example adapted from [Introduction to Probability Theory Example 28-2](https://online.stat.psu.edu/stat414/lesson/28/28.2)]

---

## Poisson regression

---

## The data: Household size in the Philippines

The data [fHH1.csv](data/fHH1.csv) come from the 2015 Family Income and Expenditure Survey conducted by the Philippine Statistics Authority.

**Goal**: Understand the association between household size and various characteristics of the household

**Response**: 
- `total`: Number of people in the household other than the head

.left[
**Predictors**: 
- `location`: Where the house is located
- `age`: Age of the head of household
- `roof`: Type of roof on the residence (proxy for wealth)
]

---

## The data

```r
hh_data <- read_csv("data/fHH1.csv")
hh_data %>% slice(1:5) %>% kable()
```

|location     | age| total| numLT5|roof                          |
|:------------|---:|-----:|------:|:-----------------------------|
|CentralLuzon |  65|     0|      0|Predominantly Strong Material |
|MetroManila  |  75|     3|      0|Predominantly Strong Material |
|DavaoRegion  |  54|     4|      0|Predominantly Strong Material |
|Visayas      |  49|     3|      0|Predominantly Strong Material |
|MetroManila  |  74|     3|      0|Predominantly Strong Material |

---

## Response variable

.pull-left[
<img src="data:image/png;base64,#06-poisson-pt1_files/figure-html/unnamed-chunk-6-1.png" width="100%" style="display: block; margin: auto;" />
]

|  mean|   var|
|-----:|-----:|
| 3.685| 5.534|
]
---

## Why the least-squares model doesn't work

The goal is to model `$\lambda$`, the expected number of people in the household (other than the head), as a function of the predictors (covariates)

We might be tempted to try a linear model `$$\lambda_i = \beta_0 + \beta_1x_{i1} + \beta_2x_{i2} + \dots + \beta_px_{ip}$$`

This model won't work because...

- It could produce negative values of `$\lambda$` for certain values of the predictors 
- The equal variance assumption required to conduct inference for linear regression is violated.

---

## Poisson regression model

If `$Y_i \sim Poisson$` with `$\lambda = \lambda_i$` for the given values `$x_{i1}, \ldots, x_{ip}$`, then

- Each observation can have a different value of `$\lambda$` based on its value of the predictors `$x_1, \ldots, x_p$`

- `$\lambda$` determines the mean and variance, so we don't need to estimate a separate error term

---

## Poisson vs. multiple linear regression

<div class="figure" style="text-align: center">
<img src="data:image/png;base64,#06-poisson-pt1_files/figure-html/OLSpois-1.png" alt="Regression models: Linear regression (left) and Poisson regression (right)." width="60%" />
<p class="caption">Regression models: Linear regression (left) and Poisson regression (right).</p>
</div>

.footnote[From [BMLR Figure 4.1](https://bookdown.org/roback/bookdown-BeyondMLR/ch-poissonreg.html#a-graphical-look-at-poisson-regression)]

---

## Assumptions for Poisson regression

.pull-left[
**Poisson response**: The response variable is a count per unit of time or space, described by a Poisson distribution, at each level of the predictor(s)

**Independence**: The observations must be independent of one another

**Mean = Variance**: The mean must equal the variance

**Linearity**: The log of the mean rate, `$\log(\lambda)$`, must be a linear function of the predictor(s)
]

.pull-right[
<img src="data:image/png;base64,#06-poisson-pt1_files/figure-html/unnamed-chunk-8-1.png" width="100%" style="display: block; margin: auto;" />

]

---

## Model 1: Number in household vs. age

---

## Model 1: Household vs. Age

```r
model1 <- glm(total ~ age, data = hh_data, family = poisson)

tidy(model1) %>% 
  kable(digits = 4)
```

|term        | estimate| std.error| statistic| p.value|
|:-----------|--------:|---------:|---------:|-------:|
|(Intercept) |   1.5499|    0.0503|   30.8290|       0|
|age         |  -0.0047|    0.0009|   -5.0258|       0|

`$$\log(\hat{\lambda}) = 1.5499  - 0.0047 ~ age$$`

---

.question[
The coefficient for `age` is -0.0047. Interpret this coefficient in context. Select all that apply. [Click here](https://forms.gle/L6oTrTQzgh27FzVK9) to submit your response.

.small[a. The mean household size is predicted to decrease by 0.0047 for each year older the head of the household is.]

.small[b. The mean household size is predicted to multiply by a factor of 0.9953 for each year older the head of the household is.]

.small[c. The mean household size is predicted to decrease by 0.9953 for each year older the head of the household is.]

.small[d. The mean household size is predicted to multiply by a factor of 0.47% for each year older the head of the household is.]

.small[e. The mean household size is predicted to decrease by 0.47% for each year older the head of the household is.]

]
]

---

.question[
The coefficient for `age` is -0.0047. Interpret this coefficient in context. Select all that apply. [Click here](https://forms.gle/L6oTrTQzgh27FzVK9) to submit your response.

.small[a. The mean household size is predicted to decrease by 0.0047 for each year older the head of the household is.]

.small[b. The mean household size is predicted to multiply by a factor of 0.9953 for each year older the head of the household is.]

.small[c. The mean household size is predicted to decrease by 0.9953 for each year older the head of the household is.]

.small[d. The mean household size is predicted to multiply by a factor of 0.47% for each year older the head of the household is.]

.small[e. The mean household size is predicted to decrease by 0.47% for each year older the head of the household is.]

]
]

---

## Understanding the interpretation

Let's derive the change in predicted mean when we go from `$x$` to `$x+1$`

(see boardwork)

---

## Is the coefficient of `age` statistically significant?

|term        | estimate| std.error| statistic| p.value| conf.low| conf.high|
|:-----------|--------:|---------:|---------:|-------:|--------:|---------:|
|(Intercept) |   1.5499|    0.0503|   30.8290|       0|   1.4512|    1.6482|
|age         |  -0.0047|    0.0009|   -5.0258|       0|  -0.0065|   -0.0029|

`$$H_0: \beta_1 = 0 \hspace{2mm} \text{ vs. } \hspace{2mm} H_a: \beta_1 \neq 0$$`

**Test statistic**

`$$Z = \frac{\hat{\beta}_1 - 0}{SE(\hat{\beta}_1)} = \frac{-0.0047 - 0}{0.0009} = -5.026 \text{ (using exact values)}$$`

**P-value**

`$$P(|Z| > |-5.026|) = 5.01 \times 10^{-7} \approx 0$$`

---

## What are plausible values for the coefficient of `age`?

**95% confidence interval for the coefficient of `age`**

`$$\hat{\beta}_1 \pm Z^{*}\times SE(\hat{\beta}_1)$$`
`$$-0.0047 \pm 1.96 \times 0.0009 = \mathbf{(-.0065, -0.0029)}$$`

---

**Which plot can best help us determine whether Model 1 is a good fit?**

---

## Model 2: Add a quadratic effect for `age`

---

## Model 2: Add a quadratic effect for `age`

```r
hh_data <- hh_data %>% 
  mutate(age2 = age*age)

model2 <- glm(total ~ age + age2, data = hh_data, family = poisson)
tidy(model2, conf.int = T) %>% 
  kable(digits = 4)
```

|term        | estimate| std.error| statistic| p.value| conf.low| conf.high|
|:-----------|--------:|---------:|---------:|-------:|--------:|---------:|
|(Intercept) |  -0.3325|    0.1788|   -1.8594|   0.063|  -0.6863|    0.0148|
|age         |   0.0709|    0.0069|   10.2877|   0.000|   0.0575|    0.0845|
|age2        |  -0.0007|    0.0001|  -11.0578|   0.000|  -0.0008|   -0.0006|

---

## Model 2: Add a quadratic effect for `age`

|term        | estimate| std.error| statistic| p.value| conf.low| conf.high|
|:-----------|--------:|---------:|---------:|-------:|--------:|---------:|
|(Intercept) |  -0.3325|    0.1788|   -1.8594|   0.063|  -0.6863|    0.0148|
|age         |   0.0709|    0.0069|   10.2877|   0.000|   0.0575|    0.0845|
|age2        |  -0.0007|    0.0001|  -11.0578|   0.000|  -0.0008|   -0.0006|

We can determine whether to keep `$age^2$` in the model in two ways:

1️⃣ Use the p-value (or confidence interval) for the coefficient (since we are adding a single term to the model)

2️⃣ Conduct a drop-in-deviance test

---

## Deviance

A **deviance** is a way to measure how the observed data deviates from the model predictions.

- It's a measure unexplained variability in the response variable (similar to SSE in linear regression )

- Lower deviance means the model is a better fit to the data

We can calculate the "deviance residual" for each observation in the data (more on the formula later). Let `$(\text{deviance residual}_i$` be the deviance residual for the `$i^{th}$` observation, then

`$$\text{deviance} = \sum(\text{deviance residual})_i^2$$`

*Note: Deviance is also known as the "residual deviance"*

---

## Drop-in-Deviance Test

We can use a **drop-in-deviance test** to compare two models. To conduct the test

1️⃣ Compute the deviance for each model

2️⃣ Calculate the drop in deviance

`$$\text{drop-in-deviance =  Deviance(reduced model) - Deviance(larger model)}$$`

3️⃣ Given the reduced model is the true model `$(H_0 \text{ true})$`, then `$$\text{drop-in-deviance} \sim \chi_d^2$$`

where `$d$` is the difference in degrees of freedom between the two models (i.e., the difference in number of terms)

---

## Drop-in-deviance to compare Model1 and Model2

```r
anova(model1, model2, test = "Chisq") %>%
  kable(digits = 3)
```

| Resid. Df| Resid. Dev| Df| Deviance| Pr(>Chi)|
|---------:|----------:|--:|--------:|--------:|
|      1498|   2337.089| NA|       NA|       NA|
|      1497|   2200.944|  1|  136.145|        0|

.question[
- Write the null and alternative hypotheses.
- What does the value 2337.089	tell you? 
- What does the value 1 tell you? 
- What is your conclusion? 
]

---

## Add `location` to the model?

Suppose we want to add `location` to the model, so we compare the following models:

**Model A**: `$\lambda_i = \beta_0 + \beta_1 ~ age_i + \beta_2 ~ age_i^2$`

**Model B**: `$\lambda_i =  \beta_0 + \beta_1 ~ age_i + \beta_2 ~ age_i^2 + \beta_3 ~ Loc1_i + \beta_4 ~ Loc2_i + \beta_5 ~ Loc3_i + \beta_6 ~ Loc4_i$`

.question[
.small[Which of the following are reliable ways to determine if `location` should be added to the model? Select all that apply. [Click here](https://forms.gle/G3oDWo8ifhTPqCf67) to submit your response.]

Drop-in-deviance test,  Use the p-value for each coefficient, Likelihood ratio test, Nested F Test, BIC
]

---

## Looking ahead

- For next time - [Chapter 4 - Poisson Regression](https://bookdown.org/roback/bookdown-BeyondMLR/ch-poissonreg.html)
  - Sections 4.6, 4.10

---

## Acknowledgements

These slides are based on content in [BMLR Chapter 4 - Poisson regression](https://bookdown.org/roback/bookdown-BeyondMLR/ch-poissonreg.html)
   - Focused on sections 4.4 - 4.5, 4.9