For this project you and your team will be (1) evaluating a published article on research that incorporates generalized linear models (GLMs) in the analysis and (2) using data from the article to replicate the analysis for one model.
The goals of the project are to better understand how GLMs are used in research and to apply what you’ve learned to analyze real-world data using GLMs.
You will work in teams of 3 - 4 students for this project. Click here for your team assignment.
Before getting started on the proposal:
Project Week 01 (Jan 20): Select the article and accompanying data set. Write and submit the proposal.
Project Week 02 (week of Mon, Jan 24): Read and evaluate the article. Develop an analysis plan.
Project Week 03 (week of Mon, Jan 31): Work on the analysis. Peer review two teams’ initial analysis results.
Project Week 04 (week of Mon, Feb 07): Finalize the analysis, presentation, and write up. Present in class on Wed, Feb 09.
All work for the project will be submitted on GitHub.
Proposal: due Fri, Jan 21 at 11:59pm
Article evaluation + analysis plan (optional): due Thu, Jan 27 at 11:59pm. Optional submission to receive feedback from the teaching team.
Initial analysis results: due Wed, Feb 02 at 12pm (noon)
Peer reviews: due Wed, Feb 02 at 11:59pm
Presentation: due Wed, Feb 09 at 3:30pm
Write-up: due Fri, Feb 11 at 5pm
The article for this project may be published in an academic journal or in a reputable non-academic publication/website. The article must meet the following criteria:
See Tips on finding articles for advice on searching databases to find articles and data.
The proposal should include the following:
The citation for the article. If you’re using a .bib file, you can use the default citation format in R Markdown (Chicago author-date format). Otherwise, use MLA format.
Brief summary about why you chose this article.
Brief summary of the article’s primary research objective.
Name of the GLM(s) used in the article and a short description of the response variable for each model.
A glimpse of the data set
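One way to produce the glimpse of the data set in proposal.Rmd is sketched below. It assumes the dplyr package is installed; the file name `article-data.csv` is hypothetical, and the built-in `warpbreaks` data set is used only as a stand-in so the chunk runs before your own data is in place.

```r
# Minimal sketch: print a glimpse of the data set in proposal.Rmd.
library(dplyr)

# Hypothetical file name; replace with the file in your repo's data folder.
data_path <- "data/article-data.csv"

# Stand-in: fall back to a built-in data set if the file is not there yet.
article_data <- if (file.exists(data_path)) read.csv(data_path) else warpbreaks

# Shows the number of rows and columns, plus each column's type and first values.
glimpse(article_data)
```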
You are only required to write the proposal for one article and data set. Write the proposal in the file proposal.Rmd, then push the .Rmd and knitted PDF to the GitHub repo by the due date for submission.
The proposal will be graded based on the following:
All required components of the proposal are included and accurate (6 pts)
Data set is in the data folder of the GitHub repo (2 pts)
All team members have contributed (2 pts)
The goal of the analysis plan is for your team to outline your approach for the two components of the project analysis. These components are
Below are a few questions to consider as you outline the approach to replicate the modeling process for one GLM in the paper:
Below are a few questions to consider as you outline the approach for the original analysis:
You may turn in the analysis plan along with any initial results to receive feedback from the teaching team. To do so:
Add the analysis plan to the proposal.Rmd document. Knit and push the updated proposal to your team’s GitHub repo.
Open an issue in the repo and tag @sta310-sp22/teaching-team. If you have any specific questions you’d like the teaching team to address, add those in the body of the issue as well.
The analysis plan is optional. If you would like to receive feedback from the teaching team, you must open the issue and submit the analysis plan on GitHub by Thursday, January 27 at 11:59pm. I suggest your group sketch an analysis plan before diving into the analysis even if you don’t turn it in for feedback.
The draft of your final analysis and report is due on Wednesday, February 2 at 12pm (noon). You should write the draft in the writeup.Rmd document.
At a minimum, the draft should include the following:
Each team will review the drafts of two other teams. You will work on the peer review during the class period on Wed, Feb 02 and it is due no later than 11:59pm that day. When you log into GitHub, you will have read access to the two repos you’re reviewing.
Click here for the peer review assignments.
You should discuss the peer review as a team, but only one team member needs to submit the review on GitHub. Every team member should contribute to the discussion and the team’s responses to the peer review questions.
You will submit the peer review as an Issue in each team’s repo. To do so:
The draft and peer review will be graded based on the following:
Draft is comprehensive and includes an attempt at each component mentioned above (7 pts)
Peer review thoroughly addresses the questions in the template. The feedback is comprehensive and accurate. (8 pts)
The final write up is due on Friday, February 11 at 5pm. It should include the following sections:
This section includes a brief summary of the article and its primary research objective. It will also include a description of the data set and relevant variables. This section should be written as if the reader has neither read the article nor seen the data dictionary in your GitHub repo. You do not need to describe every variable, but you should provide enough information that the reader has an idea of the type of information in the data set.
This section will include a description of the model you’re reproducing along with a description of the response variable and any relevant descriptive statistics and visualizations. Describe the process you used to reproduce the model (data cleaning or preparation, model selection, etc.) and note any places where your process differed from that in the original article.
Include the output from the model and the conclusions from the model. These conclusions can include those from the original paper and/or any conclusions your group derived that were not in the paper.
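As a rough illustration of the kind of model output to include, the sketch below fits a Poisson regression with base R's `glm()`. The built-in `warpbreaks` data set and the formula `breaks ~ wool + tension` are stand-ins chosen only so the chunk runs; they are assumptions, not part of any article, and should be replaced by the article's data and the model it reports.

```r
# Stand-in data: built-in warpbreaks (count of warp breaks per loom).
# Replace with the article's data and the model formula reported there.
fit <- glm(breaks ~ wool + tension, family = poisson, data = warpbreaks)

# Coefficient table (estimates are on the log scale for a Poisson model)
summary(fit)$coefficients

# Exponentiate to get multiplicative effects on the expected count,
# with Wald 95% confidence intervals
exp(cbind(estimate = coef(fit), confint.default(fit)))
```

Reporting exponentiated coefficients alongside the raw output usually makes it easier to compare your estimates with the values published in the article.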
This section will include a summary and results from your original analysis. Describe the question you’re exploring in this analysis and your motivation for choosing this question. Describe the analysis process (data cleaning, model selection, model evaluation, etc.). Include the relevant output from your results and a summary of the conclusions from this analysis. Note any conclusions that may have differed from those in the original article.
This section will include a summary of your conclusions along with any limitations to the data or analysis. Also include any challenges your group may have faced with reproducing the model and suggestions to improve the reproducibility of the analysis.
Each section will be assessed on whether the components of the section are clearly, comprehensively, and accurately discussed in the report. The point allocation is as follows:
The report will also be assessed based on the following:
You will present on Wednesday, February 09 during lecture. Each team will have 6 minutes for the presentation along with a few minutes for questions, and every team member should speak for about an equal amount of time during the presentation.
You can make the presentation slides using the software of your choice. You can use as many slides as you wish, but be mindful of what can reasonably be presented in 6 minutes. A suggested outline is
You will be assigned two presentations to peer review. You must submit the peer review scores for both presentations to have the “Peers” scores for your team’s presentation included in your presentation grade.
The presentation order is as follows:
This portion of the grade will be the average of the scores from the members of the teaching team.
This portion of the grade will be the average of the scores from the peer reviewers.
You should have the following files and folders in the project repo. The repo and brief summary in the README should be updated by Friday, February 11 at 5pm.
/data/: The data set
/data/*: File containing data set
/data/README.md: Codebook for data set
README.md: 3 - 5 sentence summary of the project
/proposal: Folder for project proposal
/proposal/proposal.Rmd: R Markdown file for proposal
/proposal/proposal.pdf: Knitted PDF of proposal
/writeup/: Folder for write up
/writeup/writeup.Rmd: R Markdown file for write up
/writeup/writeup.pdf: Knitted PDF of write up
/presentation: Folder for presentation
/presentation/*: Presentation file (if not linked in README)
/presentation/README.md: Link to project (if not in presentation folder)

Component | Points |
---|---|
Proposal | 10 pts |
Peer review | 15 pts |
Written report | 35 pts |
Presentation | 25 pts |
Organization | 5 pts |
Teamwork evaluation | 10 pts |
Below are tips to help you find articles based on information from Jodi Psoter, the Librarian for Chemistry and Statistical Science at Duke Libraries.
Articles in health-related fields
The PubMed heading tree lets you search by topic. The link will direct you to the results under the category of “Statistics as a Topic.”
You can use the other search options to narrow down results based on your team’s interests.
Articles in psychology
PsycInfo will allow users to search by analysis type.
Put the name of the model in the search bar, e.g., “Poisson Regression.” Then, in the drop down menu next to the search bar, select “DE Subjects [exact].” Click Search.
On the left-hand side, under Limit To, check “Open Access.” This will not guarantee the article has an associated data set, but a lot of open access articles will make the data available or utilize publicly accessible data you could pull from another source.
Articles on all topics
Web of Science Data Citation Index lets you search for data sets based on the topic of interest.
Use the search bar to search based on a topic of interest. You can also search for the model / analysis type.
On the left-hand side, check “Data Set” under Content Type and check “Dataset” under Data Types. Click “Refine” to limit the results.
Click on the article of interest.
Grading criteria and the repo organization for this project were adapted from Project 1 on vizdata.org.