@extends('template')
@section('title')
Lab 9, Part 2: Plotting Baby Name data
@stop
@section('content')
# Lab 9, Part 2: Plotting Baby Name data
In this part of lab, we are working with a subset of data from the [US SSA (Social Security Administration)](https://www.ssa.gov/oact/babynames/), which tracks names of babies born in the US.
## Set up for plotting
Running your plotting code in Canopy places your plots in the inline Python window.
To have your plots pop up in a separate window, do this:
1. _Canopy_ > _Preferences..._
2. Click on the **Python** tab
3. In the _PyLab backend_ drop-down menu, select **Interactive (wx)**
## Task 0. Familiarize yourself with the data
In your `lab09` folder there is a provided file called `topNamesData.py` that contains 6 variables. Each variable holds a list of `(count, name)` tuples: the number of babies given a name that year, and the name itself. The variables are named `yearYYYYG`, where `YYYY` is one of the following years: 1950, 1980 or 2000, and `G` is either `F` for female or `M` for male (for example, `year1950F`). To keep things simple, we are only working with these 6 variables in this part of lab.
The variable `year1950F` contains the number of female babies born in 1950 with the top 10 most popular names.
```py
year1950F = [(80412, 'Linda'),
(65443, 'Mary'),
(47920, 'Patricia'),
(41560, 'Barbara'),
(38019, 'Susan'),
(29618, 'Nancy'),
(29074, 'Deborah'),
(28885, 'Sandra'),
(26159, 'Carol'),
(25693, 'Kathleen')]
```
Similarly, the variable `year1950M` contains the number of male babies born in 1950 with the top 10 most popular names.
```py
year1950M = [(86139, 'James'),
(83534, 'Robert'),
(79396, 'John'),
(65141, 'Michael'),
(60704, 'David'),
(60699, 'William'),
(50969, 'Richard'),
(45608, 'Thomas'),
(39107, 'Charles'),
(33726, 'Gary')]
```
**Create a new file called `babynames.py`**. At the top of this file, import `topNamesData` so you can use the predefined lists of tuples:
```py
from topNamesData import *
```
## Task 1. Plot bar charts of counts of the names
Write a function called `makePlot1` that will take a list of tuples and produce a plot as shown.
Consider the variable `year1950F`: to make a bar chart, the names (e.g. Linda, Mary, Patricia, etc) must be separated from their respective counts (e.g. 80412, 65443, 47920, etc).
### Extract what you need to plot
Given the list of tuples in `year1950F`, figure out how to extract the counts into a list of counts. Hint: a list comprehension would work.
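As a minimal sketch of the hint, the comprehensions below pull each half of the tuples into its own list. The name `sample` is a hypothetical stand-in holding the first three entries of `year1950F`; your function would receive the full list as its parameter.

```py
# Hypothetical sample: the first three (count, name) tuples of year1950F.
sample = [(80412, 'Linda'), (65443, 'Mary'), (47920, 'Patricia')]

# Each tuple unpacks into (count, name); keep only the piece you need.
counts = [count for (count, name) in sample]
names = [name for (count, name) in sample]

print(counts)  # [80412, 65443, 47920]
print(names)   # ['Linda', 'Mary', 'Patricia']
```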
To make a horizontal bar chart, you will need to plot numbers against numbers.
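One way to read "numbers against numbers": each bar needs a numeric position on the y axis and a numeric length along the x axis. The sketch below assumes matplotlib's `pyplot.barh`; the helper names `barPositions` and `sketchBarh` are hypothetical and are not the `makePlot1` you are asked to write.

```py
def barPositions(data):
    """Return one integer y position per tuple: 0, 1, 2, ..."""
    return list(range(len(data)))

def sketchBarh(data):
    """Draw a bare horizontal bar chart of the counts in data."""
    # Imported here so barPositions can be used without matplotlib.
    import matplotlib.pyplot as plt
    counts = [count for (count, name) in data]
    plt.figure()
    # First argument: y positions; second argument: bar lengths.
    plt.barh(barPositions(data), counts)
    plt.show()
```

Calling `sketchBarh([(3, 'a'), (5, 'b')])` should draw two unlabeled bars at y positions 0 and 1.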
Here is a simple plot of the female baby names in 1950.
## Goal: Get a plot to show up
```py
makePlot1(year1950F)
```
## Task 2. Better bar charts
There are several ways in which we can improve the bar graph shown in the task above.
Write a new version of your function (call it `makePlot2`) by adding these improvements (add them in one at a time):
+ Add a title to your plot
+ Label the x and y axes
+ Change the thickness of the bars
+ Change the bar color ([click here for predefined colors in matplotlib](https://matplotlib.org/examples/color/named_colors.html))
+ Add tick labels to your axes (here is an xtick label example)
Hint: Use list comprehensions to get ytick markers at half increments.
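The hint can be sketched as below, assuming your bars sit at integer positions 0, 1, 2, ... and are aligned by their edge, so that each bar's middle lies half a unit above its position. If your matplotlib version centers bars on their positions by default, the integer positions themselves may already be the right tick locations. The name `halfTicks` is hypothetical.

```py
def halfTicks(n):
    """Return n tick positions at half increments: 0.5, 1.5, 2.5, ..."""
    return [i + 0.5 for i in range(n)]

ticks = halfTicks(3)
print(ticks)  # [0.5, 1.5, 2.5]
```

These positions can then be passed, together with your list of names, to `plt.yticks(ticks, names)`.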
```py
makePlot2(year1950F)
```
## Task 3: Most popular name on top
Write another function called `makePlot3` that will plot the names with the most frequently occurring name on top (as shown below).
```py
makePlot3(year1950F)
```
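Because a horizontal bar chart draws position 0 at the bottom of the y axis, the largest count needs to come last in the list to end up on top. A minimal sketch, assuming `(count, name)` tuples as in `topNamesData.py`; `sample` is a hypothetical, deliberately unordered stand-in:

```py
sample = [(65443, 'Mary'), (80412, 'Linda'), (47920, 'Patricia')]

# Tuples compare by their first element first, so sorted() orders by count.
ascending = sorted(sample)

print(ascending)
# [(47920, 'Patricia'), (65443, 'Mary'), (80412, 'Linda')]
```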
## Task 4: Names sorted alphabetically
Write a function called `makePlot4` that plots the names in alphabetical order from top to bottom (as shown below).
```py
makePlot4(year1950F)
```
```py
makePlot4(year2000F)
```
```py
makePlot4(year2000M)
```
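To sort by name rather than by count, `sorted` accepts a `key` function. The sketch below assumes that position 0 is drawn at the bottom of the chart, so reverse alphabetical order in the list reads A to Z from the top; `sample` is a hypothetical stand-in.

```py
sample = [(80412, 'Linda'), (65443, 'Mary'), (41560, 'Barbara')]

# key picks the name (index 1) out of each (count, name) tuple;
# reverse=True puts the alphabetically first name last in the list.
byName = sorted(sample, key=lambda pair: pair[1], reverse=True)

print(byName)
# [(65443, 'Mary'), (80412, 'Linda'), (41560, 'Barbara')]
```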
## Task 5: Plot last letter percentages of top 10 names
### Task 5A: `lastLetterNames()`
For this task, you'll need to create a new dictionary. Write a new function called `lastLetterNames` that takes a data variable and **returns** a dictionary. The keys in the dictionary are the last letters of the names and the values are the names that end with that letter. Here are some examples:
```py
In []: lastLetterNames(year2000F)
Out[]:
{'a': ['Samantha', 'Jessica'],
'h': ['Hannah', 'Sarah', 'Elizabeth'],
'n': ['Madison'],
'r': ['Taylor'],
's': ['Alexis'],
'y': ['Emily', 'Ashley']}
In []: lastLetterNames(year1950M)
Out[]:
{'d': ['David', 'Richard'],
'l': ['Michael'],
'm': ['William'],
'n': ['John'],
's': ['James', 'Thomas', 'Charles'],
't': ['Robert'],
'y': ['Gary']}
In []: lastLetterNames(year1980F)
Out[]:
{'a': ['Amanda', 'Jessica', 'Melissa'],
'e': ['Nicole', 'Michelle'],
'h': ['Sarah', 'Elizabeth'],
'r': ['Jennifer', 'Heather'],
'y': ['Amy']}
```
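The dictionaries above follow a common grouping pattern: look up a key, create an empty list the first time that key is seen, then append. The sketch below shows the pattern on a different key (the first letter instead of the last), so the task itself is left to you; `groupByFirstLetter` and `sample` are hypothetical names.

```py
def groupByFirstLetter(data):
    """Map each first letter to the list of names starting with it."""
    groups = {}
    for (count, name) in data:
        letter = name[0]
        if letter not in groups:
            groups[letter] = []  # first name seen for this letter
        groups[letter].append(name)
    return groups

sample = [(65443, 'Mary'), (65141, 'Michael'), (80412, 'Linda')]
print(groupByFirstLetter(sample))
```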
### Task 5B: `plotLastLetterPercents(data)`
Now that you have the last letter dictionaries, you can plot the last letters on the `Y` axis and the percentages on the `X` axis in a horizontal bar chart.
Write a function called `plotLastLetterPercents()` that takes in a list of tuples and displays a plot showing the last letters and the percentage representation within the top 10 names.
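A minimal sketch of the percentage step, assuming "percentage representation" means the share of the ten names that end in each letter (not the share of babies). It starts from the dictionary shown in Task 5A for `year2000F`; `lastLetterPercents` is a hypothetical helper name.

```py
def lastLetterPercents(letterDict):
    """Map each letter to the percent of all names filed under it."""
    total = sum(len(names) for names in letterDict.values())
    return dict((letter, 100.0 * len(names) / total)
                for (letter, names) in letterDict.items())

# The lastLetterNames(year2000F) result from Task 5A.
letters2000F = {'a': ['Samantha', 'Jessica'],
                'h': ['Hannah', 'Sarah', 'Elizabeth'],
                'n': ['Madison'],
                'r': ['Taylor'],
                's': ['Alexis'],
                'y': ['Emily', 'Ashley']}

percents = lastLetterPercents(letters2000F)
print(percents['h'])  # 30.0
```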
Note:
1. We are plotting percentages rather than counts
2. If you would like to change the background color of the figure,
you can use the `facecolor` attribute of figure:
```py
plt.figure(facecolor='lightsteelblue')
```
### Example 1:
```py
plotLastLetterPercents(year2000F)
```
### Example 2:
```py
plotLastLetterPercents(year1950M)
```
### Example 3:
```py
plotLastLetterPercents(year1980F)
```
---
@include('/labs/lab09/_toc')
@stop