Session 5: Derive Information with dplyr
In this session, you will continue learning data manipulation with dplyr.
Go through the RStudio Primer on Derive Information with dplyr, and complete the assignments below.
Using the taxation dataset, complete the following tasks:
Create a new column called income_labels from income_mean, with 4 income categories: below 50000, 50000-79999, 80000-109999, and 110000 or above. You can code the categories as numbers from 1 to 4. (Hint: use case_when() inside mutate() when you make the new column.)
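As a minimal sketch of the case_when() pattern, the example below bins a small made-up data frame. The `toy` data, its `quarter` column, and the income values are hypothetical and only illustrate the idea; the real taxation dataset may be structured differently. The upper bounds are assumed to be exclusive, so 50000 falls in category 2 and 110000 in category 4.

```r
library(dplyr)

# Hypothetical toy data, not the real taxation dataset.
toy <- tibble(
  quarter     = c("A", "B", "C", "D"),
  income_mean = c(42000, 65000, 95000, 120000)
)

# case_when() checks the conditions in order and uses the first one that is TRUE,
# so each later condition only sees values not matched above it.
toy_labelled <- toy %>%
  mutate(
    income_labels = case_when(
      income_mean < 50000   ~ 1,
      income_mean < 80000   ~ 2,
      income_mean < 110000  ~ 3,
      income_mean >= 110000 ~ 4
    )
  )

print(toy_labelled)
```

Rows with a missing income_mean match none of the conditions and therefore get NA as their label.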
How many quarters fall into each of the four categories in each year? Use group_by() and summarize() to answer. You can make a table like this:
| year | income_labels | number_quarters |
|---|---|---|
| 2001 | 1 | 7 |
| 2001 | 2 | 8 |
| 2001 | 3 | 6 |
| 2002 | 1 | 6 |
| 2002 | 2 | 9 |
| 2002 | 3 | 6 |
| 2003 | 1 | 6 |
| 2003 | 2 | 9 |
| 2003 | 3 | 6 |
| 2004 | 1 | 6 |
| 2004 | 2 | 10 |
| 2004 | 3 | 5 |
| 2005 | 1 | 6 |
| 2005 | 2 | 9 |
| 2005 | 3 | 6 |
| 2006 | 1 | 6 |
| 2006 | 2 | 9 |
| 2006 | 3 | 6 |
| 2007 | 1 | 6 |
| 2007 | 2 | 9 |
| 2007 | 3 | 5 |
| 2007 | 4 | 1 |
| 2008 | 1 | 6 |
| 2008 | 2 | 9 |
| 2008 | 3 | 5 |
| 2008 | 4 | 1 |
| 2009 | 1 | 6 |
| 2009 | 2 | 9 |
| 2009 | 3 | 5 |
| 2009 | 4 | 1 |
| 2010 | 1 | 6 |
| 2010 | 2 | 9 |
| 2010 | 3 | 5 |
| 2010 | 4 | 1 |
| 2011 | 1 | 6 |
| 2011 | 2 | 9 |
| 2011 | 3 | 6 |
| 2012 | 1 | 6 |
| 2012 | 2 | 9 |
| 2012 | 3 | 6 |
| 2013 | 1 | 4 |
| 2013 | 2 | 11 |
| 2013 | 3 | 5 |
| 2013 | 4 | 1 |
| 2014 | 1 | 4 |
| 2014 | 2 | 11 |
| 2014 | 3 | 5 |
| 2014 | 4 | 1 |
| 2015 | 1 | 3 |
| 2015 | 2 | 12 |
| 2015 | 3 | 4 |
| 2015 | 4 | 2 |
| 2016 | 1 | 2 |
| 2016 | 2 | 13 |
| 2016 | 3 | 4 |
| 2016 | 4 | 2 |
| 2017 | 1 | 2 |
| 2017 | 2 | 12 |
| 2017 | 3 | 6 |
| 2017 | 4 | 1 |
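A table of this shape can be produced by grouping on both columns and counting rows. The sketch below uses a small hypothetical data frame, not the taxation data, and assumes that each row represents one quarter in one year. The `.groups = "drop"` argument requires dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical toy data: one row per quarter per year, already labelled.
toy <- tibble(
  year          = c(2001, 2001, 2001, 2002, 2002),
  income_labels = c(1, 1, 2, 2, 2)
)

# Group by year and category, then count the rows in each group with n().
counts <- toy %>%
  group_by(year, income_labels) %>%
  summarize(number_quarters = n(), .groups = "drop")

print(counts)
```

Combinations that do not occur in the data produce no row at all, which is consistent with category 4 being absent for some years in the table above.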