Part 2: Plotting Baby Name data

In this part of the lab, we work with a subset of data from the US Social Security Administration (SSA), which tracks the names of babies born in the US.

Set up for plotting

Running your plotting code in Canopy places your plots in the inline Python window.

To have your plots pop up in a separate window, do this:

  1. Canopy > Preferences...
  2. Click on the Python tab
  3. In the PyLab backend drop-down menu, select Interactive (wx)

Task 0. Familiarize yourself with the data

Your plotting folder contains a provided file called topNamesData.py that defines 6 variables. Each variable is a list of (count, name) tuples, where count is the number of babies born that year who were given that name. The variables are named year<XXXX><S>, where <XXXX> is one of the years 1950, 1980, or 2000 and <S> is either F for female or M for male. (To keep things simple, we only work with these 6 variables in this part of the lab.)

The variable year1950F holds the 10 most popular names for female babies born in 1950, each paired with the number of babies given that name.

year1950F = [(80412, 'Linda'),
 (65443, 'Mary'),
 (47920, 'Patricia'),
 (41560, 'Barbara'),
 (38019, 'Susan'),
 (29618, 'Nancy'),
 (29074, 'Deborah'),
 (28885, 'Sandra'),
 (26159, 'Carol'),
 (25693, 'Kathleen')]

Similarly, the variable year1950M holds the 10 most popular names for male babies born in 1950, each paired with its count.

year1950M = [(86139, 'James'),
 (83534, 'Robert'),
 (79396, 'John'),
 (65141, 'Michael'),
 (60704, 'David'),
 (60699, 'William'),
 (50969, 'Richard'),
 (45608, 'Thomas'),
 (39107, 'Charles'),
 (33726, 'Gary')]
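Because each element is a (count, name) tuple, a for loop can unpack both parts at once. The minimal sketch below is one way to explore the data; it uses a shortened copy of year1950M (its first three tuples) so it runs on its own, whereas babynames.py would use the full lists imported from topNamesData.

```python
# Shortened copy of year1950M (first three tuples) so this sketch runs
# without topNamesData.py; in babynames.py use the imported list instead.
year1950M = [(86139, 'James'), (83534, 'Robert'), (79396, 'John')]

# Each element is a (count, name) tuple, so the loop can unpack it directly.
for count, name in year1950M:
    print(name, count)

# Add up the counts with a generator expression.
total = sum(count for count, name in year1950M)
print(total)  # 249069
```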

Create a new file called babynames.py. At the top of this file, import topNamesData so you can use the predefined lists of tuples:

from topNamesData import *

Task 1. Plot bar charts of counts of the names

Write a function called makePlot1 that takes a list of tuples and produces a plot as shown. Consider the variable year1950F: to make a bar chart, the names (e.g., Linda, Mary, Patricia) must be separated from their respective counts (e.g., 80412, 65443, 47920).

Extract what you need to plot

Given the list of tuples in year1950F, figure out how to extract the counts into their own list. Hint: a list comprehension works well here. To make a horizontal bar chart, you will need to plot numbers against numbers: each bar needs a numeric position on the y axis (for example 0 through 9) as well as its count.
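A minimal sketch of the extraction step, using list comprehensions that unpack each (count, name) tuple. To keep it self-contained it uses only the first three tuples of year1950F; the names positions, counts, and names are illustrative.

```python
# First three tuples of year1950F, copied so this sketch runs on its own.
year1950F = [(80412, 'Linda'), (65443, 'Mary'), (47920, 'Patricia')]

# Pull each part of the (count, name) tuples into its own list.
counts = [count for count, name in year1950F]
names = [name for count, name in year1950F]

# Numeric bar positions, one per name: 0, 1, 2, ...
positions = list(range(len(counts)))

print(counts)     # [80412, 65443, 47920]
print(names)      # ['Linda', 'Mary', 'Patricia']
print(positions)  # [0, 1, 2]
```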
Here is a simple plot of the female baby names in 1950.

Goal: Get a plot to show up

 makePlot1(year1950F)
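One possible makePlot1, sketched under the assumption that you plot with matplotlib's pyplot interface. It draws one bar per tuple at positions 0, 1, 2, ... with no name labels yet. Because barh draws position 0 at the bottom, the first tuple (Linda for year1950F) becomes the lowest bar. pyplot is imported inside the function only so the file loads even where matplotlib is missing; a top-level import works just as well.

```python
def makePlot1(data):
    """Plot the counts in data, a list of (count, name) tuples, as horizontal bars.

    Sketch only: bars sit at positions 0..n-1 and are not labeled yet.
    """
    import matplotlib.pyplot as plt  # local import; a top-level import also works

    counts = [count for count, name in data]
    positions = list(range(len(counts)))

    plt.figure()
    plt.barh(positions, counts)  # numbers (positions) against numbers (counts)
    plt.show()
```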

Task 2. Better bar charts

There are several ways in which we can improve the bar graph shown in the task above.

Write a new version of your function (call it makePlot2) that adds these improvements, one at a time:

Hint: Use list comprehensions to get ytick markers at half increments.

 makePlot2(year1950F)
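A sketch of makePlot2 that follows the hint. The improvements it shows (names as y-tick labels, an x-axis label, a title) are illustrative assumptions, not the lab's official list, and the label text is placeholder. A list comprehension builds ticks at half increments (0.5, 1.5, ...), and the bars are edge-aligned so that each one is centered on its tick. half_step_ticks is a hypothetical helper name.

```python
def half_step_ticks(n):
    """Return n tick positions at half increments: 0.5, 1.5, ..., n - 0.5."""
    return [i + 0.5 for i in range(n)]


def makePlot2(data):
    """Labeled horizontal bar chart of (count, name) tuples (sketch)."""
    import matplotlib.pyplot as plt  # imported locally so half_step_ticks runs without matplotlib

    counts = [count for count, name in data]
    names = [name for count, name in data]
    ticks = half_step_ticks(len(data))

    plt.figure()
    # Each bar is 0.8 tall and its bottom edge sits 0.4 below a tick,
    # so the bar is centered on that half-step tick.
    plt.barh([t - 0.4 for t in ticks], counts, height=0.8, align='edge')
    plt.yticks(ticks, names)        # label each tick with its name
    plt.xlabel('Number of babies')  # placeholder axis label
    plt.title('Top 10 names')       # placeholder title
    plt.show()
```

With matplotlib's center alignment you could instead put the bars and ticks at whole numbers; the half-step layout above simply matches the hint.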

Task 3: Most popular name on top

Write another function called makePlot3 that plots the names with the most frequent name on top (as shown below).

makePlot3(year1950F)
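Since barh draws the first bar at the bottom, the most popular name should be the last item plotted. The sketch below sorts the tuples in ascending order, which works because Python compares tuples element by element (count first, name only as a tie-breaker). year1950F already happens to be in descending order, so reversing it would also work, but sorting does not depend on the input order. order_most_popular_last is a hypothetical helper name.

```python
def order_most_popular_last(data):
    """Sort (count, name) tuples by count, smallest first.

    Tuples compare element by element, so sorted() orders by count and
    only looks at the name to break ties. Because barh draws the first
    bar at the bottom, the largest count ends up on top.
    """
    return sorted(data)


def makePlot3(data):
    """Horizontal bar chart with the most popular name on top (sketch)."""
    import matplotlib.pyplot as plt  # imported locally so the sorting helper runs without matplotlib

    ordered = order_most_popular_last(data)
    counts = [count for count, name in ordered]
    names = [name for count, name in ordered]
    positions = list(range(len(ordered)))

    plt.figure()
    plt.barh(positions, counts, align='center')  # center each bar on its tick
    plt.yticks(positions, names)
    plt.xlabel('Number of babies')
    plt.show()
```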

Task 4: Names sorted alphabetically

Write a function called makePlot4 that plots the names in alphabetical order from top to bottom (as shown below).

makePlot4(year1950F)
makePlot4(year2000F)
makePlot4(year2000M)
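To read A to Z from the top, the list passed to barh has to run Z to A, again because the first bar is drawn at the bottom. A minimal sketch using sorted() with a key that picks out the name and reverse=True; order_alphabetical_top_down is a hypothetical helper name.

```python
def order_alphabetical_top_down(data):
    """Sort (count, name) tuples by name from Z to A.

    barh draws the first bar at the bottom, so reverse alphabetical order
    in the list reads A to Z from the top of the chart.
    """
    return sorted(data, key=lambda pair: pair[1], reverse=True)


def makePlot4(data):
    """Horizontal bar chart with names alphabetical from top to bottom (sketch)."""
    import matplotlib.pyplot as plt  # imported locally so the sorting helper runs without matplotlib

    ordered = order_alphabetical_top_down(data)
    counts = [count for count, name in ordered]
    names = [name for count, name in ordered]
    positions = list(range(len(ordered)))

    plt.figure()
    plt.barh(positions, counts, align='center')
    plt.yticks(positions, names)
    plt.xlabel('Number of babies')
    plt.show()
```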

Task 5: Plot last letter percentages of top 10 names

Task 5A: lastLetterNames()

For this task, you'll need to build a new dictionary. Write a function called lastLetterNames that takes a data variable (a list of tuples) and returns a dictionary. The keys are the last letters of the names, and each value is a list of the names that end with that letter. Here are some examples:

In []: lastLetterNames(year2000F)
Out[]: 
{'a': ['Samantha', 'Jessica'],
 'h': ['Hannah', 'Sarah', 'Elizabeth'],
 'n': ['Madison'],
 'r': ['Taylor'],
 's': ['Alexis'],
 'y': ['Emily', 'Ashley']}

In []: lastLetterNames(year1950M)
Out[]: 
{'d': ['David', 'Richard'],
 'l': ['Michael'],
 'm': ['William'],
 'n': ['John'],
 's': ['James', 'Thomas', 'Charles'],
 't': ['Robert'],
 'y': ['Gary']}

In []: lastLetterNames(year1980F)
Out[]: 
{'a': ['Amanda', 'Jessica', 'Melissa'],
 'e': ['Nicole', 'Michelle'],
 'h': ['Sarah', 'Elizabeth'],
 'r': ['Jennifer', 'Heather'],
 'y': ['Amy']}
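A minimal sketch of lastLetterNames that uses dict.setdefault to start an empty list the first time a letter appears. Names are appended in the order they occur in the data, which matches the year1950M example above. The full year1950M list is copied in so the sketch runs on its own. The printed key order may differ from the examples, which list keys alphabetically, but the dictionaries compare equal.

```python
# year1950M copied from the lab text so this sketch runs without topNamesData.py.
year1950M = [(86139, 'James'), (83534, 'Robert'), (79396, 'John'),
             (65141, 'Michael'), (60704, 'David'), (60699, 'William'),
             (50969, 'Richard'), (45608, 'Thomas'), (39107, 'Charles'),
             (33726, 'Gary')]


def lastLetterNames(data):
    """Map each last letter to the list of names (in data order) ending with it."""
    groups = {}
    for count, name in data:
        letter = name[-1].lower()  # lower() is a safety step; the data's names end in lowercase
        groups.setdefault(letter, []).append(name)
    return groups


print(lastLetterNames(year1950M))
```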

Task 5B: plotLastLetterPercents(data)

Now that you have the last-letter dictionaries, you can make a horizontal bar chart with the last letters on the y axis and the percentages on the x axis. Write a function called plotLastLetterPercents() that takes a list of tuples and displays a plot showing each last letter and its percentage within the top 10 names.

Note:

  1. We are plotting percentages rather than counts.
  2. If you would like to change the background color of the figure, you can pass the facecolor argument to plt.figure:
    plt.figure(facecolor='lightsteelblue')
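What the percentage is measured against is open to interpretation. This sketch assumes it is the share of the top 10 names that end in each letter, so each name counts once (in year1950M, the letter s gets 30%, from James, Thomas, and Charles). Weighting each name by its baby count would be another reasonable reading. lastLetterPercents is a hypothetical helper name; it counts last letters with collections.Counter rather than reusing lastLetterNames, so the block stands alone. Placing the letters in reverse alphabetical order (so that a ends up on top) is also an assumption.

```python
from collections import Counter


def lastLetterPercents(data):
    """Return {last letter: percent of the names in data ending with it}.

    Assumption: each name counts once, so with 10 names every name
    contributes 10 percent. Counts of babies are ignored here.
    """
    letter_counts = Counter(name[-1].lower() for count, name in data)
    total = len(data)
    return {letter: 100.0 * n / total for letter, n in letter_counts.items()}


def plotLastLetterPercents(data):
    """Horizontal bar chart: last letters on the y axis, percents on the x axis."""
    import matplotlib.pyplot as plt  # imported locally so lastLetterPercents runs without matplotlib

    percents = lastLetterPercents(data)
    # barh draws bottom-up, so reverse alphabetical order puts 'a' on top.
    letters = sorted(percents, reverse=True)
    values = [percents[letter] for letter in letters]
    positions = list(range(len(letters)))

    plt.figure(facecolor='lightsteelblue')  # optional background color from the note above
    plt.barh(positions, values, align='center')
    plt.yticks(positions, letters)
    plt.xlabel('Percent of top 10 names')
    plt.show()
```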

Example 1:

plotLastLetterPercents(year2000F)

Example 2:

plotLastLetterPercents(year1950M)

Example 3:

plotLastLetterPercents(year1980F)
