Problem Set 6 - Due Thu, Mar 16 at 23:59 EST

Reading material and Resources

The external resources are not required reading, but they may provide helpful context and details.

  1. Think Python, Ch. 11: Dictionaries
  2. Slides and notebooks from Lec 10, Lec 11
  3. Lab on Dictionaries and Data Visualization
  4. Python documentation on dictionary operations
  5. Lab notes on "How to make a pie chart"
  6. Lab plotting examples
  7. matplotlib gallery
  8. matplotlib plotting commands
  9. matplotlib examples

About this Problem Set

This problem set will give you practice with dictionaries, especially creating dictionaries and accumulating with dictionaries; preparing data for visualization using list comprehension; and data visualization.

  1. In Task 1, you will write a program to find anagrams using dictionaries.
  2. In Task 2 (a partner task), you will perform data analysis and plotting on a dataset about passengers on the Titanic. Use this shared Google Doc to find a pair programming partner. Remember that you can work with the same partner a maximum of TWICE this semester. We will deduct 2 points per day for any students who have not recorded their names on the pair programming Google Doc by Sunday, Mar 12 at 23:59.
  3. The CS111 Problem Set Guide gives an overview of psets, including a detailed description of individual and partner tasks.
  4. In Fall 2016, students spent an average of 2.96 hours on Task 1 (min 1 hour, max 9 hours), and 4.4 hours on Task 2 (min 3 hours, max 12 hours).

All code for this assignment is available in the ps06 folder in the cs111/download directory within your cs server account. This assignment also uses the Otter Inspector program to help you do a final check for Task 1, Task 2, and your Honor Code form before you submit.


Task 1: Unjumble

This is an individual problem which you must complete on your own, though you may ask for help from the CS111 staff.

Word Jumble is a popular game that appears in many newspapers and online. The game involves "unjumbling" English words whose letters have been reordered. For instance, the jumbled word ytikt can be unjumbled to kitty. Here is one version of the game online.


You will create a Python program in unjumble.py that is able to successfully unjumble jumbled words. Here is how to produce candidate words from a jumbled word:


We have provided three lists of English words of different sizes.

Fill in the functions as described in the subtasks below. All your code should be in your unjumble.py file in the ps06 folder from the download directory.

Subtask a

The unjumbleKey function takes a single argument, a string, and returns a string that is the unjumble key of its input -- i.e., a string consisting of the lowercase versions of all characters of the string in alphabetical order. For example:

In [1]: unjumbleKey('argle')
Out[1]: 'aeglr'
In [2]: unjumbleKey('regal')
Out[2]: 'aeglr'
In [3]: unjumbleKey('Star')
Out[3]: 'arst'
In [4]: unjumbleKey('HISTRIONICS')
Out[4]: 'chiiinorsst'

Notes

In [1]: sorted('abracadabra')
Out[1]: ['a','a','a','a','a','b','b','c','d','r','r']
In [2]: ':'.join(['bunny','cat','dog'])
Out[2]: 'bunny:cat:dog'
In [3]: ' '.join(['bunny','cat','dog'])
Out[3]: 'bunny cat dog'
In [4]: ''.join(['bunny','cat','dog'])
Out[4]: 'bunnycatdog'
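The two operations shown in these notes compose naturally: sorted turns a string into a list of its characters in order, and ''.join glues such a list back into a single string. The sketch below only illustrates that combination on a toy string; the variable names are invented here and are not part of the provided starter code.

```python
# Toy illustration (names are hypothetical, not taken from unjumble.py).
word = 'Banana'

# Uppercase letters sort before lowercase ones, so lowercase first
# if case should not affect the ordering.
letters = sorted(word.lower())   # a list of single-character strings
rejoined = ''.join(letters)      # one string again, no separator
```

Here letters is ['a', 'a', 'a', 'b', 'n', 'n'] and rejoined is 'aaabnn'.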

Subtask b

The makeUnjumbleDictionary function takes a single argument, a list of words, and returns a dictionary that associates the unjumble key of every word in the word-list with the set of all words with that unjumble key. All words in the same set will be anagrams -- i.e., words that all have exactly the same letters (including repeated ones), but in different orders.

unjumble.py imports three word-lists: tinyWordList (33 words), mediumWordList (45,425), and largeWordList (438,712 words). Test makeUnjumbleDictionary on these lists.

In [1]: tinyDict = makeUnjumbleDictionary(tinyWordList)
In [2]: tinyDict
Out[2]:
{'acerst': {'caster', 'caters', 'crates', 'reacts', 'recast', 'traces'},
'aegilnrt': {'alerting', 'altering', 'integral', 'relating', 'triangle'},
'aeglr': {'glare', 'lager', 'large', 'regal'},
'aeinrrst': {'restrain', 'retrains', 'strainer', 'terrains', 'trainers'},
'arst': {'arts', 'rats', 'star', 'tars', 'tsar'},
'chiiinorsst': {'histrionics', 'trichinosis'},
'opst': {'opts', 'post', 'pots', 'spot', 'stop', 'tops'}}
In [3]: mediumDict = makeUnjumbleDictionary(mediumWordList)
In [4]: print "len of mediumDict:", len(mediumDict)
len of mediumDict: 42273
In [5]: largeDict = makeUnjumbleDictionary(largeWordList)
In [6]: print "len of largeDict:", len(largeDict)
len of largeDict: 386268

Notes

In [1]: example = set()
In [2]: example
Out[2]: set()

Do not initialize a set by using empty curly braces, because Python uses them for initializing an empty dict. However, if you need to create a set with one element, you can write:

In [3]: one = {5}
In [4]: one
Out[4]: {5}
In [5]: type(one)
Out[5]: set
In [6]: example.add('a')
In [7]: example
Out[7]: {'a'}
In [8]: example.add('b')
In [9]: example
Out[9]: {'a', 'b'}
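The set operations above combine well with the usual dictionary accumulation pattern: check whether a key is present, start an empty set if it is not, then add to the set. The following is a minimal sketch on a deliberately different toy problem (grouping words by first letter); groupByFirstLetter is a hypothetical helper, not a function from the starter code.

```python
# Toy illustration: group words by their first letter.
# groupByFirstLetter is a made-up name used only for this example.
def groupByFirstLetter(words):
    groups = {}
    for word in words:
        first = word[0]
        if first not in groups:
            groups[first] = set()   # start an empty set for a new key
        groups[first].add(word)     # adding a duplicate has no effect
    return groups

animals = groupByFirstLetter(['cat', 'cow', 'dog', 'cat', 'duck'])
```

After this runs, animals has two keys, 'c' and 'd', and the repeated 'cat' appears only once in its set.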

Subtask c

The unjumble function takes an unjumble dictionary (of the type created by makeUnjumbleDictionary) and a string (that doesn't need to be a meaningful word), and returns a set of all words in the dictionary to which the input string unjumbles. If there are no such words, it returns the empty set. For example,

In [1]: tinyDict = makeUnjumbleDictionary(tinyWordList)
In [2]: unjumble(tinyDict, 'esrat')
Out[2]: set()
In [3]: print unjumble(tinyDict, 'esrat')
set([])
In [4]: mediumDict = makeUnjumbleDictionary(mediumWordList)
In [5]: unjumble(mediumDict, 'esrat')
Out[5]: {'aster', 'rates', 'stare', 'tears'}
In [6]: print unjumble(mediumDict, 'esrat')
set(['stare', 'rates', 'tears', 'aster'])
In [7]: largeDict = makeUnjumbleDictionary(largeWordList)
In [8]: unjumble(largeDict, 'esrat')
Out[8]: {'arest', 'arets', 'aster', 'astre', 'earst', 'rates', 'reast', 'resat', 
'serta', 'stare', 'stear', 'strae', 'tares', 'tarse', 'taser', 'tears', 'teras'}
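Returning an empty set when nothing matches is a lookup with a fallback value. One way to express that in Python is the dictionary get method, which takes the key and a value to return when the key is absent. The sketch below uses an invented toy dictionary rather than an unjumble dictionary.

```python
# Toy illustration of a lookup with a fallback value.
# The dictionary and its contents are invented for this example.
groups = {'c': {'cat', 'cow'}, 'd': {'dog', 'duck'}}

found = groups.get('c', set())     # key present: the stored set
missing = groups.get('z', set())   # key absent: the fallback, an empty set
```

Note that get does not add the missing key to the dictionary; groups still has only two keys afterwards.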

For fun, use your unjumble function as an assistant in playing the online jumble game.

Notes

Subtask d

Which words have the most anagrams? Define a function named mostAnagrams that takes an unjumble dictionary (of the type created by makeUnjumbleDictionary) and returns the largest set of anagrams in that dictionary.

In [1]: tinyDict = makeUnjumbleDictionary(tinyWordList)
In [2]: mostAnagrams(tinyDict)
Out[2]: {'opts', 'post', 'pots', 'spot', 'stop', 'tops'}
In [3]: mediumDict = makeUnjumbleDictionary(mediumWordList)
In [4]: mostAnagrams(mediumDict)
Out[4]: {'pares', 'parse', 'pears', 'rapes', 'reaps', 'spare', 'spear'}
In [5]: mostAnagrams(largeDict)
Out[5]: {'arest',
 'arets',
 'aster',
 'astre',
 'earst',
 'rates',
 'reast',
 'resat',
 'serta',
 'stare',
 'stear',
 'strae',
 'tares',
 'tarse',
 'taser',
 'tears',
 'teras'}
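Finding the largest value in a dictionary of sets can be done with an explicit loop that remembers the biggest set seen so far, or with the built-in max and a key function. The sketch below shows the second option on an invented toy dictionary; it is one possible approach, not necessarily the one used in the course solution.

```python
# Toy illustration: pick the largest set among a dictionary's values.
# The dictionary is invented for this example.
groups = {'c': {'cat', 'cow', 'crab'}, 'd': {'dog', 'duck'}, 'e': {'eel'}}

# key=len tells max to compare the sets by their sizes.
largest = max(groups.values(), key=len)
```

If several sets tie for the largest size, max returns whichever of them it encounters first, so the result may depend on the dictionary's iteration order.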

Notes

set(['pots', 'spot', 'stop', 'post', 'tops', 'opts'])
set(['pares', 'spear', 'pears', 'parse', 'spare', 'rapes', 'reaps'])
set(['earst', 'reast', 'arest', 'taser', 'tares', 'resat', 'aster', 'stear', 'tarse', 'teras', 'tears', 'arets', 'serta', 'rates', 'astre', 'strae', 'stare'])

Task 2: Titanic Analysis

This task is a partner problem in which you are required to work with a partner as part of a two-person team.

In this task, you will analyze data about passengers from the Titanic, a large ship that tragically sank in the waters of the North Atlantic Ocean in 1912.

Passenger data can be found in the titanicdata.txt file provided in the ps06 folder. Study the structure of the data in titanicdata.txt. Then, study the code in the readtitanic.py file, which reads the contents of titanicdata.txt and transforms it into a list of dictionaries, where every list element is a passenger represented by a dictionary. Because the information for the passengers is incomplete, not all passenger dictionaries have the same keys. Try the examples below in your interactive console after running readtitanic.py. Notice how passenger 2168 has six keys in the dictionary, but passenger 2203 has only four keys.

In [25]: len(passengerList)
Out[25]: 2208
In [26]: passengerList[2168]
Out[26]:
{'age': '31.0',
 'class': '1st Class',
 'group': 'Servant',
 'job': 'Personal Maid',
 'name': 'Miss Helen Alice Wilson',
 'status': 'survivor'}
In [27]: passengerList[2203]
Out[27]:
{'age': '22.0',
 'class': '3rd Class',
 'name': 'Mr Mapriededer Zakarian',
 'status': 'victim'}

There is even a passenger who has only three keys in the dictionary. UNGRADED CHALLENGE: Can you find this passenger using a single-line list comprehension? UNGRADED EXTRA CHALLENGE: Can you find both this passenger and his index in the list with one line of code?
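As a reminder of the tools involved, len applied to a dictionary gives its number of keys, a list comprehension can filter on that number, and enumerate pairs each element with its index. The sketch below applies these to an invented list of pet dictionaries, not to passengerList.

```python
# Toy illustration: filter a list of dictionaries by how many keys each has.
# The data is invented for this example.
pets = [
    {'name': 'Rex', 'kind': 'dog', 'age': 3},
    {'name': 'Tom', 'kind': 'cat'},
    {'name': 'Ada', 'kind': 'owl', 'age': 7},
]

# Dictionaries with fewer than three keys.
sparse = [pet for pet in pets if len(pet) < 3]

# The same filter, keeping each match together with its index in the list.
sparseWithIndex = [(i, pet) for i, pet in enumerate(pets) if len(pet) < 3]
```

Here sparse contains only Tom's dictionary, and sparseWithIndex pairs it with index 1.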

Open up titanic.py in your ps06 folder. This file imports the passengerList variable from readtitanic.py. You will define the functions described in the subtasks below in titanic.py.

Subtask a

Write a function that takes a list of passenger dictionaries (such as passengerList) and creates a dictionary that keeps track of the number of survivors and victims in every cabin class, as well as the calculated survival rate by cabin class. Below is the contract of the function you'll need to write. (The header is already in titanic.py).

def createSurvivalDictionary(passengers):
    """Given a list of passenger dictionaries, return a dictionary
    whose keys are all the cabin classes that appear in passengers.
    The value associated with each cabin class name should
    itself be another dictionary that has three key/value pairs:
      (1) The key "survivors" maps to the number of survivors in
          the cabin class;
      (2) The key "victims" maps to the number of victims in the
          cabin class;
      (3) The key "survivalRate" maps to the survival rate in the
          cabin class (a floating point number rounded to 3 decimal digits)
    """

Testing:

In [30]: createSurvivalDictionary(passengerList)
Out[30]:
{'1st Class': {'survivalRate': 0.62, 'survivors': 201, 'victims': 123},
'2nd Class': {'survivalRate': 0.418, 'survivors': 119, 'victims': 166},
'3rd Class': {'survivalRate': 0.254, 'survivors': 180, 'victims': 528},
'A la Carte': {'survivalRate': 0.043, 'survivors': 3, 'victims': 66},
'Deck': {'survivalRate': 0.652, 'survivors': 43, 'victims': 23},
'Engine': {'survivalRate': 0.222, 'survivors': 72, 'victims': 253},
'Victualling': {'survivalRate': 0.218, 'survivors': 94, 'victims': 337}}

Notes

In [50]: fruitList = ['apple', 'orange', 'apple', 'orange', 'lemon']
In [51]: set(fruitList)
Out[51]: {'apple', 'lemon', 'orange'}
In [52]: print set(fruitList)
set(['orange', 'lemon', 'apple'])
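Using set to find the distinct values in a list is a convenient way to set up the outer keys of a nested counting dictionary before tallying. The sketch below shows that pattern, plus a rounded rate, on invented game records; the keys 'team', 'result', 'wins', 'losses', and 'winRate' are hypothetical and chosen only for this example.

```python
# Toy illustration: nested counting dictionary with a rounded rate.
# The records and key names are invented for this example.
games = [
    {'team': 'red', 'result': 'win'},
    {'team': 'red', 'result': 'loss'},
    {'team': 'blue', 'result': 'win'},
    {'team': 'red', 'result': 'win'},
]

# One inner dictionary per distinct team, with counters starting at zero.
tally = {}
for team in set([game['team'] for game in games]):
    tally[team] = {'wins': 0, 'losses': 0}

# Accumulate the counts.
for game in games:
    if game['result'] == 'win':
        tally[game['team']]['wins'] += 1
    else:
        tally[game['team']]['losses'] += 1

# Compute each rate; float() avoids integer division in Python 2.
for team in tally:
    counts = tally[team]
    total = counts['wins'] + counts['losses']
    counts['winRate'] = round(float(counts['wins']) / total, 3)
```

With this data, tally['red'] ends up as 2 wins, 1 loss, and a winRate of 0.667.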

Subtask b: Plotting

Bar Chart of Survival Rates by Class

Define the function barChartOfSurvivalRates which will display a horizontal bar chart showing the percentage of survivors by cabin class, in decreasing order by survival rate.

We have provided a statement, plt.savefig('survival.png'), at the end of the function body that will save the plot in a file named "survival.png" in your ps06 folder. This is useful because the Otter Inspector displays it in the test results.

def barChartOfSurvivalRates(passengers):
    """Given a list of passenger dictionaries, displays a horizontal
    bar chart of the sorted percentage of survival rates for each
    cabin class. A percentage value is a floating point number
    between 0 and 1.0.
    """
    # flesh this out here
    plt.savefig('survival.png')  # saves plot in file
    plt.clf()  # clear plot so future plots will start fresh
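Plotting functions generally expect parallel lists (one of labels, one of numbers) rather than a dictionary, so a common preparation step is to sort the dictionary's items by value and then split them with list comprehensions. The sketch below shows only that data preparation, on an invented dictionary; it does not call matplotlib, and the names are hypothetical.

```python
# Toy illustration: turn a dictionary into two parallel, sorted lists.
# The dictionary is invented for this example.
scores = {'ann': 0.62, 'bob': 0.254, 'cy': 0.418}

# Sort the (key, value) pairs by value, largest first.
pairs = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)

# Split the sorted pairs into labels and values, in matching order.
labels = [name for name, value in pairs]
values = [value for name, value in pairs]
```

Here labels is ['ann', 'cy', 'bob'] and values is [0.62, 0.418, 0.254]; lists like these can then be handed to a plotting call.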

Our solution for barChartOfSurvivalRates(passengerList) generates the bar chart below. You are free to use different styling to display the information, as long as you fulfill the major requirements. The color of the bars is "indian red" (the name of an iron oxide pigment, click here to read more).

Notes:

Pie Chart of Victims' Cabin Classes

Implement the function pieChartOfVictimsCabinClasses which will create a pie chart showing the relative number of Titanic victims from each cabin class.

We have provided a statement, plt.savefig('cabinclasses.png'), at the end of the function body that will save the plot in a file named "cabinclasses.png" in your ps06 folder.

def pieChartOfVictimsCabinClasses(passengers):
    """Given a list of passenger dictionaries, displays a pie chart
    showing the relative number of victims from each cabin class.
    Pie slices should be labeled with the cabin names, and ordered
    clockwise from smallest pie slice to largest pie slice
    starting at 12 o'clock.
    """
    # flesh this out here
    plt.savefig('cabinclasses.png')  # saves plot in file
    plt.clf()  # clear plot so future plots will start fresh

The following chart is the one generated by our solution when pieChartOfVictimsCabinClasses(passengerList) is called. See the notes for details on how it was created.

You can test your plots in two ways:

Notes:

colormap = plt.cm.Set2  # Set2 is the name of the colormap
colors = colormap(np.linspace(0., 1., NRofSLICES))
# NRofSLICES needs to be replaced with an expression.

Task 3: Honor Code Form and Final Checks

As in the previous problem sets, your honor code submission for this pset will involve entering values for the variables in the honorcode.py file. This is a Python file, so your values must be valid Python code (strings or numbers).

If you wrote any function invocations or print statements in your Python files to test your code, please remove them, comment them out before you submit, or wrap them in an if __name__ == '__main__' block. Points will be deducted for isolated function invocations or superfluous print statements.
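For reference, a guarded test block looks like the sketch below. The function double is a made-up stand-in for your own code; the point is only that statements under the guard run when the file is executed directly and are skipped when the file is imported by another program.

```python
# Toy illustration of guarding test code; double is a hypothetical function.
def double(n):
    """Return twice n."""
    return 2 * n

if __name__ == '__main__':
    # Runs only when this file is executed directly, not when it is imported.
    print(double(21))
```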

Your titanic.py program and functions should not pop open any plots when run. If you used plt.show() during testing to see the plots, please delete that line or comment it out.

Remember to run otterInspect.py one final time before you submit to check that your unjumble.py and titanic.py programs work as intended, and that your honor code form is complete. While running your code through otterInspect is not required, it is highly recommended since it may catch errors that you haven't noticed.

If you have issues running otterInspect.py, first check for the following common issues.

If you are unable to resolve this issue after going through this checklist, seek help from any of the course staff.


How to turn in this Problem Set