`tidy` and `kable` functions can help!)

Age | Time Online | Number of Dates Arranged Online |
---|---|---|
19 | 35 | 3 |
29 | 20 | 5 |
38 | 15 | 0 |
55 | 10 | 0 |
a. Identify the response, predictor, and offset in this context. Does using an offset make sense? Briefly explain.
b. Write out a model for this data. As part of your model description, define the parameter, \(\lambda\).
c. Consider a zero-inflated Poisson model for this data. Describe what the “true zeros” would be in this setting.
Use the following scenario for Exercises 2 - 3.
Brockmann (1996) carried out a study of nesting female horseshoe crabs. Female horseshoe crabs often have male crabs, known as satellites, attached to their nests. One objective of the study was to determine which characteristics of the female were associated with the number of satellites. Of particular interest is the relationship between the width of the female carapace and the number of satellites. (Questions adapted from Sec. 4.11.2, Ex. 2.)
The data can be found in `crab.csv`. It includes:

- `Satellite` = number of satellites
- `Width` = carapace width (cm)
- `Weight` = weight (kg)
- `Spine` = spine condition (1 = both good, 2 = one worn or broken, 3 = both worn or broken)
- `Color` = color (1 = light medium, 2 = medium, 3 = dark medium, 4 = dark)

Make sure to convert `Spine` and `Color` to the appropriate data types in R before doing the analysis.
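A minimal sketch of the type conversion, assuming the two columns are read in as integer codes. The data frame `crab_toy` and its values are hypothetical stand-ins for `crab.csv`, used only so the snippet runs on its own.

```r
# Hypothetical stand-in for crab.csv: three made-up rows, not real data.
crab_toy <- data.frame(
  Satellite = c(4, 0, 2),
  Width     = c(25, 27, 24),
  Spine     = c(1, 3, 2),
  Color     = c(2, 4, 1)
)

# Spine and Color are category codes, so store them as factors.
# Left as numbers, glm() would fit a single numeric slope for each.
crab_toy$Spine <- factor(crab_toy$Spine, levels = 1:3)
crab_toy$Color <- factor(crab_toy$Color, levels = 1:4)

str(crab_toy)
```

Listing the levels explicitly keeps all categories from the codebook, even if a level happens not to appear in the rows at hand.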
a. Create a histogram of `Satellite`. Is there preliminary evidence the number of satellites could be modeled as a Poisson response? Briefly explain.
b. Fit a Poisson regression model including `Width`, `Weight`, and `Spine` as predictors. Display the model with the 95% confidence interval for each coefficient.
c. Interpret the coefficient of `Weight` and its 95% confidence interval in terms of the mean number of satellites.
d. Describe the effect of `Spine` in terms of the mean number of satellites.
e. Should `Color` be added to the model? Conduct the appropriate test to investigate this question. State the hypotheses, display the relevant output, and state your conclusion in the context of the data.
We would like to fit a quasi-Poisson regression model for this data.
a. Briefly explain why we may want to consider fitting a quasi-Poisson regression model for this data.
b. Fit a quasi-Poisson regression model that corresponds to the model chosen in the previous exercise. Display the model.
c. What is the estimated dispersion parameter? Show how this value is calculated.
d. How do the estimated coefficients change compared to the model chosen in the previous exercise? How do the standard errors change?
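As a rough illustration of the mechanics only (not of the crab analysis), the sketch below fits a quasi-Poisson model to R's built-in `warpbreaks` data. That dataset is an assumption made here purely to get a runnable example. The code compares the dispersion reported by `summary()` with the usual estimate: the sum of squared Pearson residuals divided by the residual degrees of freedom.

```r
# Built-in dataset used purely for illustration; not part of the assignment.
fit_quasi <- glm(breaks ~ wool + tension,
                 family = quasipoisson, data = warpbreaks)

# Dispersion estimate as reported by R
phi_reported <- summary(fit_quasi)$dispersion

# Same quantity by hand: sum of squared Pearson residuals / residual df
pearson    <- residuals(fit_quasi, type = "pearson")
phi_manual <- sum(pearson^2) / df.residual(fit_quasi)

c(reported = phi_reported, manual = phi_manual)
```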
a. Use the R function `rpois()` to generate 10,000 \(x_i\) from a regular Poisson random variable, \(X \sim \textrm{Poisson}(\lambda=1.5)\). Plot a histogram of this distribution and note its mean and variance. Next, let \(Y \sim \textrm{Gamma}(r = 3, \lambda = 2)\) and use `rgamma()` to generate 10,000 random \(y_i\) from this distribution. Now, consider 10,000 different Poisson distributions where \(\lambda_i = y_i\). Randomly generate one \(z_i\) from each Poisson distribution. Plot a histogram of these \(z_i\) and compare it to your original histogram of \(X\) (where \(X \sim \textrm{Poisson}(1.5)\)). How do the means and variances compare? (From Sec. 3.7.2, Ex. 2.)
Hint: Remember to set a seed, so your simulations are reproducible!
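A minimal sketch of the simulation mechanics, assuming that \(\textrm{Gamma}(r, \lambda)\) uses the shape-rate parameterization (so `shape = 3` and `rate = 2` in `rgamma()`, giving a mean of 3/2). The seed value and object names are arbitrary. `rpois()` is vectorized over `lambda`, so passing the vector of gamma draws yields one Poisson draw per rate.

```r
set.seed(310)  # arbitrary seed so the draws are reproducible
n <- 10000

x <- rpois(n, lambda = 1.5)           # ordinary Poisson draws
y <- rgamma(n, shape = 3, rate = 2)   # assumed shape-rate form
z <- rpois(n, lambda = y)             # one draw per lambda_i = y_i

# Side-by-side summaries for the comparison of X and Z
rbind(X = c(mean = mean(x), var = var(x)),
      Z = c(mean = mean(z), var = var(z)))
```

Calling `hist(x)` and `hist(z)` afterwards gives the two histograms to compare.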
b. A negative binomial distribution can actually be expressed as a gamma-Poisson mixture. In Part a, you looked at a gamma-Poisson mixture \(Z \sim \textrm{Poisson}(\lambda)\) where \(\lambda \sim \textrm{Gamma}(r = 3, \lambda' = 2)\).
Find the parameters of a negative binomial distribution \(X \sim \textrm{NegBinom}(r, p)\) such that \(X\) is equivalent to \(Z\). As a hint, the means of both distributions must be the same, so \(r(1-p)/p = 3/2\). Show through histograms and summary statistics that your negative binomial distribution is equivalent to the gamma-Poisson mixture. You can use `rnbinom()` in R.
Argue that if you want a \(\textrm{NegBinom}(r, p)\) random variable, you can instead sample from a Poisson distribution, where the \(\lambda\) values are themselves sampled from a gamma distribution with parameters \(r\) and \(\lambda' = \frac{p}{1-p}\). (From Sec. 3.7.2, Ex. 3.)
Hint: Remember to set a seed, so your simulations are reproducible!
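For the comparison in part b, it may help to recall how `rnbinom()` is parameterized: `size` plays the role of \(r\) and `prob` the role of \(p\), giving mean \(r(1-p)/p\). The sketch below checks this with arbitrary illustrative values (\(r = 5\), \(p = 0.4\)), which are not the answer to the exercise; the seed and object names are likewise arbitrary.

```r
set.seed(310)  # arbitrary seed
r <- 5         # illustrative value only
p <- 0.4       # illustrative value only

w <- rnbinom(10000, size = r, prob = p)

# Compare simulated moments with the theoretical ones
c(theoretical_mean = r * (1 - p) / p,
  sample_mean      = mean(w),
  theoretical_var  = r * (1 - p) / p^2,
  sample_var       = var(w))
```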
Awad, Lebo, and Linden (2017) scraped 40,628 Airbnb listings from New York City in March 2017 and put together the data set `NYCairbnb.csv`. The codebook is in the `data` folder of the `hw-02` repo.
Perform the EDA and build a model, considering offset and accounting for overdispersion, if needed. Then, use the model to describe the characteristics of Airbnbs that are expected to have a high number of reviews. (Adapted from Sec. 4.11.3, Ex. 1.)
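As a reminder of the syntax for an offset, here is a small sketch on a different dataset: `Insurance` from the `MASS` package, chosen here only because it ships with R. The counts are claims and the exposure is the number of policyholders; neither has anything to do with the Airbnb data, and the choice of offset for this exercise is left to you.

```r
library(MASS)  # provides the Insurance data used for illustration

# log(Holders) enters with its coefficient fixed at 1, so the model
# describes claims per policyholder rather than raw claim counts.
fit_offset <- glm(Claims ~ District + Group + Age + offset(log(Holders)),
                  family = poisson, data = Insurance)

summary(fit_offset)
```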
Before you wrap up the assignment, make sure all documents are updated in your GitHub repo.
To submit your assignment:
Go to http://www.gradescope.com and click Log in in the top right corner.
Click School Credentials ➡️ Duke NetID and log in using your NetID credentials.
Click on your STA 310 course.
Click on the assignment, and you’ll be prompted to submit it.
Mark the pages associated with each exercise. All of the pages of your assignment should be associated with at least one question (i.e., should be “checked”).
Select the first page of your .PDF submission to be associated with the “Workflow & formatting” section.
The PDF must be submitted to Gradescope by the deadline to be considered on time.
Component | Points |
---|---|
Ex 1 | 5 |
Ex 2 | 8 |
Ex 3 | 9 |
Ex 4 | 10 |
Ex 5 | 15 |
Workflow & formatting | 3 |
Total | 50 |
The “Workflow & formatting” grade is based on the organization of the assignment write-up and on the reproducible workflow. This includes an organized write-up with neat and readable headers, code, and narrative, including properly rendered mathematical notation. It also includes a reproducible R Markdown document that can be knitted to reproduce the submitted PDF, and version control with multiple commits that have informative commit messages.
Exercises are pulled or adapted from Beyond Multiple Linear Regression.