Problem Set 7 - Due Tue, Nov. 06 at 23:59 EST

Reading material and Resources

Reading

  1. Slides and notebooks from Lec 13 Dictionaries, and Lec 14 Accumulation Pattern with Lists and Dictionaries, and Lec 16 Sorting, lambda, map, and filter,
  2. Lab 08 on Dictionaries and Lab 09 on Data Analysis and Data Visualization
  3. Think Python, Ch. 11: Dictionaries
  4. Python documentation on dictionary operations
  5. Python documentation on set operations
  6. Lab notes on "How to make a pie chart"
  7. Lab plotting examples
  8. matplotlib gallery
  9. matplotlib plotting commands
  10. matplotlib examples

About this Problem Set

This problem set will give you practice with dictionaries, which are an important data structure for the rest of the course. It will also give you practice with

testing (in the context of finding differences between large dictionaries), data analysis, and data visualization.

  1. In Task 1 (Individual Task), you will write a program to find anagrams using dictionaries.
  2. In Task 2 (Partner-recommended task), you will perform data analysis and plotting on a dataset about passengers on the Titanic. Use this shared Google Doc to find a pair programming partner and record your partner. Remember that you can talk with other people about high-level problem-solving strategies, but sharing code in an Honor Code violation.

Other notes:


Task 1: Unjumble

This is an individual problem which you must complete on your own, though you may ask for help from the CS111 staff.

Word Jumble is a popular game that appears in many newspapers and online. The game involves "unjumbling" English words whose letters have been reordered. For instance, the jumbled word ytikt can be unjumbled to kitty. Here is one version of the game online.


You will create a Python program in unjumble.py that is able to successfully unjumble jumbled words. Here is how to produce candidate words from a jumbled word:


We have provided five lists of English words of different sizes.

Fill in the functions and iteration table as described in the subtasks below. All your code should be in your unjumble.py file in the ps07 folder from the download directory.

Subtask 1a: unjumbleKey

In this part, you will define the unjumbleKey function.

The unjumbleKey function takes a single argument, a string, and returns a string that is the unjumble key of its input -- i.e., a string consisting of the lowercase versions of all characters of the string in alphabetical order. For example:

In [1]: unjumbleKey('argle')
Out[1]: 'aeglr'

In [2]: unjumbleKey('regal')
Out[2]: 'aeglr'

In [3]: unjumbleKey('Star')
Out[3]: 'arst'

In [4]: unjumbleKey('HISTRIONICS')
Out[4]: 'chiiinorsst'

Notes

Subtask 1b: makeUnjumbleDictionary Iteration Table

In this part, you will create an iteration table for the makeUnjumbleDictionary function. This will prepare you to define the makeUnjumbleDictionary function in Subtask 1c.

The makeUnjumbleDictionary function takes a single argument, a list of words, and returns a dictionary that associates the unjumble key of every word in the word list with the set of all words with that unjumble key. All words in the same set will be anagrams -- i.e., words that all have exactly the same letters (including repeated ones), but in different orders.

Here we show how makeUnjumbleDicionary should behave on two of the word lists imported by unjumble.py: minisculeWordList (6 words) and tinyWordList (33 words):

In [1]: minisculeWordList
Out[1]: ['arts', 'post', 'rats', 'regal', 'spot', 'star']

In [2]: makeUnjumbleDictionary(minisculeWordList)
Out[2]: 
{'aeglr': {'regal'},
 'arst': {'arts', 'rats', 'star'},
 'opst': {'post', 'spot'}}

In [3]: tinyWordList
# Note: Canopy prints each of the 33 words on separate lines, but here we compress
# them into five lines to save space.
Out[3]: 
['alerting', 'altering', 'arts', 'caster', 'caters', 'crates', 'glare', 
 'histrionics', 'integral', 'lager', 'large', 'opts', 'post', 'pots',
 'rats', 'reacts', 'recast', 'regal', 'relating', 'restrain', 'retrains',
 'spot', 'star', 'strainer', 'stop', 'tars', 'terrains', 'traces', 'trainers',
 'triangle', 'trichinosis', 'tops', 'tsar']

In [4]: makeUnjumbleDictionary(tinyWordList)
Out[4]:
{'acerst': {'caster', 'caters', 'crates', 'reacts', 'recast', 'traces'},
 'aegilnrt': {'alerting', 'altering', 'integral', 'relating', 'triangle'},
 'aeglr': {'glare', 'lager', 'large', 'regal'},
 'aeinrrst': {'restrain', 'retains', 'strainer', 'terrains', 'trainers'},
 'arst': {'arts', 'rats', 'star', 'tars', 'tsar'},
 'chiiinorsst': {'histrionics', 'trichinosis'},
 'opst': {'opts', 'post', 'pots', 'spot', 'stop', 'tops'}}

Notes on Sets

The dictionary values in this problem are sets, not lists. Like lists, sets are mutable collections. Unlike lists, they are unordered and cannot contain duplicate elements. Below are some notes on creating and using sets. See this Python documentation for more details on sets manipulation.

Your Subtask: Create an Iteration Table

Before you start to define the makeUnjumbleDictionary function, we ask you to create an iteration table to understand how the loop in this function should process the word list minsculeWordList, which contains six words: ['arts', 'post', 'rats', 'regal', 'spot', 'star']

Assume that the loop has a state variable named unjumbleDictSoFar that will be used to "grow" the resulting dictionary as each word is processed. Initially, unjumbleDictSoFar should be the empty dictionary. The purpose of this subtask is for you to think carefully about how this dictionary should change when each word is processed.

Here is a template for this iteration table that includes the first two rows:

word unjumbleDictSoFar
{}
'arts' {'arst': {'arts'}}
'post'
'rats'
'regal'
'spot'
'star'

Your subtask is to fill out the dictionaries in the remaining five rows. Keep in mind that each dictionary key should be a string that's an unjumble key for a word, and each value in the dicionary should be a set of words with they key as their unjumble key.

As in PS06 Task 2a, you will submit your iteration table as a list of tuples, so that it can be tested by Codder. In this case, you should flesh out the following skeleton of the list of tuples named minisculeDictTable in unjumble.py.

minisculeDictTable = [
    ('word', 'unjumbleDictSoFar'),
    (None, {}),
    ('arts', {'arst': {'arts'}}),
    ('post', None), # Replace None by the correct dictionary 
    ('rats', None), # Replace None by the correct dictionary 
    ('regal', None), # Replace None by the correct dictionary 
    ('spot', None), # Replace None by the correct dictionary 
    ('star', None) # Replace None by the correct dictionary 
    ]

Codder will test the that the structure of the tuple list in your iteration table is valid, but the contents of the tuple will not be checked by Codder until the final grading pass after submission.

Subtask 1c: makeUnjumbleDictionary Function

In this subtask, you will define the makeUnumbleDictionary function described in Subtask 1b.

unjumble.py imports five word lists: minisculeWordList (6 words), tinyWordList (33 words), smallWordList (1,159 words), mediumWordList (45,423 words), and largeWordList (445,528 words). Test makeUnjumbleDictionary on these lists:

In [1]: minisculeDict = makeUnjumbleDictionary(minisculeWordList)

In [2]: minisculeDict
Out[2]: 
{'aeglr': {'regal'},
 'arst': {'arts', 'rats', 'star'},
 'opst': {'post', 'spot'}}

In [3]: tinyDict = makeUnjumbleDictionary(tinyWordList)

In [4]: tinyDict
Out[4]:
{'acerst': {'caster', 'caters', 'crates', 'reacts', 'recast', 'traces'},
 'aegilnrt': {'alerting', 'altering', 'integral', 'relating', 'triangle'},
 'aeglr': {'glare', 'lager', 'large', 'regal'},
 'aeinrrst': {'restrain', 'retains', 'strainer', 'terrains', 'trainers'},
 'arst': {'arts', 'rats', 'star', 'tars', 'tsar'},
 'chiiinorsst': {'histrionics', 'trichinosis'},
 'opst': {'opts', 'post', 'pots', 'spot', 'stop', 'tops'}}

In [5]: smallDict = makeUnjumbleDictionary(smallWordList)

# smallDict and the other test dictionaries have too many elements to display
# them all, so let's just print its length.
In [6]: print 'len of smallDict:', len(smallDict)
len of smallDict: 483

In [7]: mediumDict = makeUnjumbleDictionary(mediumWordList)

In [8]: print 'len of mediumDict:', len(mediumDict)
len of mediumDict: 42271

In [9]: largeDict = makeUnjumbleDictionary(largeWordList)

In [10]: print 'len of largeDct:', len(largeDict)
len of largeDict: 388509

Notes

Subtask 1d: unjumble

The unjumble function takes an unjumble dictionary (of the type created by makeUnjumbleDictionary) and a string (that doesn't need to be a meaningful word), and returns a set of all words in the dictionary to which the input string unjumbles. If there are no such words, it returns the empty set. For example,

In [1]: tinyDict = makeUnjumbleDictionary(tinyWordList)

In [2]: unjumble(tinyDict, 'esrat')
Out[2]: set()

In [3]: print unjumble(tinyDict, 'esrat')
set([])

In [4]: mediumDict = makeUnjumbleDictionary(mediumWordList)

In [5]: unjumble(mediumDict, 'esrat')
Out[5]: {'aster', 'rates', 'stare', 'tears'}

In [6]: print unjumble(mediumDict, 'esrat')
set(['stare', 'rates', 'tears', 'aster'])

In [7]: largeDict = makeUnjumbleDictionary(largeWordList)

In [8]: unjumble(largeDict, 'esrat')
Out[8]: {'arest', 'arets', 'aster', 'astre', 'earst', 'rates', 'reast', 'resat', 
'serta', 'stare', 'stear', 'strae', 'tares', 'tarse', 'taser', 'tears', 'teras'}

For fun, use your unjumble function as an assistant in playing the online jumble game

Notes

Subtask 1e: mostAnagrams

Which words have the most anagrams? Define a function named mostAnagrams that takes an unjumble dictionary (of the type created by makeUnjumbleDictionary) and returns the largest set of anagrams in that dictionary.

In [1]: tinyDict = makeUnjumbleDictionary(tinyWordList)

In [2]: mostAnagrams(tinyDict)
Out[2]: {'opts', 'post', 'pots', 'spot', 'stop', 'tops'}
# This is one of two correct results for mostAnagrams(tinyDict). 
# The other is {'caster', 'caters', 'crates', 'reacts', 'recast', 'traces'}

In [3]: mediumDict = makeUnjumbleDictionary(mediumWordList)

In [4]: mostAnagrams(mediumDict)
Out[4]: {'pares', 'parse', 'pears', 'rapes', 'reaps', 'spare', 'spear'}

In [5]: mostAnagrams(largeDict)
Out[5]: {'arest',
 'arets',
 'aster',
 'astre',
 'earst',
 'rates',
 'reast',
 'resat',
 'serta',
 'stare',
 'stear',
 'strae',
 'tares',
 'tarse',
 'taser',
 'tears',
 'teras'}

Notes

Subtask 1f: compareUnjumbleDicts

Motivation

One downside to using Codder for testing code is that it doesn't give you practice with writing your own testing code, which is an important real-world programming skill. For example, if you go on to take CS230 Data Structures, in that course no testing code will be provided; you will be expected to write all of your testing code from scratch by yourself!

This problem is designed to give you some practice with writing and using your own version of the kind of testing code used in Codder. In particular, you will address the following problem: how do you know whether or not the dictionaries created by makeUnjumbleDictionary in Subtask 1c are correct?

To simplify things, we provide you with with the correct unjumbler dictionaries minisculeDictCorrect, tinyDictCorrect, smallDictCorrect, mediumDictCorrect, and largeDictCorrect. Then your task becomes comparing the correct dictionary with your dictionary that is returned by your makeUnjumbleDictionary.

It turns out it is trivial in Python to compare two dictionaries dict1 and dict2 for equality: just use ==. So dict1 == dict2 is True if and only if:

  1. the dictionaries have exactly the same keys; and

  2. the values at each of the keys is the same (as determined by ==). In other words, it is the case that for all keys k in the dictionary, dict1[k] == dict2[k].

But when two unjumble dictionaries differ, you don't just want to know that they differ, you want to know how they differ, because this provides valuable information on possible bugs in makeUnjumbleDictionary. After all, you would be upset if Codder just reported that two huge dictionaries were different, but didn't give any indication of ways in which they differed.

Your Task

In this task, you will flesh out a function compareUnjumbleDicts that takes two arguments:

  1. the expected (correct) unjumble dictionary

  2. an actual unjumble dictionary returned by makeUnjumbleDictionary

This function should print out information highlighting any differences between the dictionaries, as detailed below.

When it is done, this function should also return True if the dictionaries are the same and False if they differ.

Here is the information that compareUnjumbleDicts should print:

1. Number of Keys:

If the number of keys in the two dictionaries differs, this message should be printed (where of course the correct number of keys should be used):

The expected dict has 483 keys but the actual dict has 509 keys.

But if the dictionaries have the same number of keys, this message should be printed:

Both the expected and actual dicts have 483 keys.

2. Differences in Keys:

If the two sets of keys are exactly the same, then this message should be printed:

Both the expected and actual dicts have exactly the same keys.

But if there are any expected keys that are not found in the actual dict, this message should be printed, which should list up to the first five such keys (in sorted order):

Some keys in the expected dict but not the actual dict:
["'aehor", "'aehos", "'aelory", "'aillnosuv", "'beinor"]

Additionally if there are any actual keys that are not found in the expected dict, this message should be printed, which should list up to the first five such keys (in sorted order):

Some keys in the actual dict but not the expected one:
["'boeinr", "'coellnno", "'connoor", "'doell", "'doellnno"]

3. Differences in Values:

If there are any keys in both dictionaries at which the two unjumble dictionaries have different values, then this message should be printed for the first such key in sorted order:

Both dicts have key "abel", but the sets at this key differ:
* the expected set has sorted values
  ['Abel', 'Bela', 'Elba', 'abel', 'able', 'albe', 'bael', 'bale', 'beal', 'bela', 'blae', 'blea']
* the actual set has sorted values
  ['abel', 'able', 'albe', 'bael', 'bale', 'beal', 'bela', 'blae', 'blea']

But if the dictionaries have exactly the same values at all keys that they share, then this message should be printed

Both dicts have exactly the same sets for all keys that are in both dicts.

More Set Operations

Implementing the above behavior is simplified by using set operations on the keys and values in the unjumble dictionary. Here we illustrate these opertions in the context of some sample sets:

In [1]: s1 = {'cat', 'ferret', 'ant', 'dog'}

In [2]: s2 = {'bunny', 'dog', 'cat', 'emu'}

Even though sets are inherently unordered, Canopy will show their elements in sorted order when they are the result of an expression in the interaction pane:

In [3]: s1
Out[3]: {'cat', 'ferret', 'ant', 'dog'}

However, when print is used on a set in Canopy, the order of the elements is unpredictable:

In [4]: print(s1)
set(['ant', 'ferret', 'dog', 'cat'])

The union of sets s1 and s2, written s1.union(s2) or s1 | s2, returns a new set that has all the elements that are in either set:

In [5]: s1.union(s2)
Out[5]: {'ant', 'bunny', 'cat', 'dog', 'emu', 'ferret'}

In [6]: s1 | s2
Out[6]: {'ant', 'bunny', 'cat', 'dog', 'emu', 'ferret'}

The intersection of sets s1 and s2, written s1.intersection(s2) or s1 & s2, returns a new set that has all the elements that are in both sets:

In [7]: s1.intersection(s2)
Out[7]: {'cat', 'dog'}

In [8]: s1 & s2
Out[8]: {'cat', 'dog'}

The difference of sets s1 and s2, written s1.difference(s2) or s1 - s2, returns a all the elements that are in the first set but not the second:

In [9]: s1.difference(s2)
Out[9]: {'ant', 'ferret'}

In [10]: s1 - s2
Out[10]: {'ant', 'ferret'}

In [11]: s2.difference(s1)
Out[11]: {'bunny', 'emu'}

In [12]: s2 - s1
Out[12]: {'bunny', 'emu'}

Sets are tested for equality using ==. The order in which elemets are written in a set is irrelevant for determining equality

In [13]: s1 == s2
Out[13]: False

In [14]: {'cat', 'bunny', 'dog', 'ant'} == {'dog', 'ant', 'cat', 'bunny'} 
Out[14]: True

A set can be created from a list by use the set function, and a list can be create from a set via the list function:

In [15]: set(['cat', 'emu', 'ant'])
Out[15]: {'ant', 'cat', 'emu'}

In [16]: set(['emu', 'cat', 'emu', 'ant', 'emu', 'cat'])
Out[16]: {'ant', 'cat', 'emu'}

In [17]: list({'cat', 'bunny', 'dog', 'ant'})
Out[17]: ['ant', 'cat', 'dog', 'bunny']

Note that set removes duplicates from a list, and the order of elements returned by list from a set is unpredicatable.

Testing compareUnjumbleDicts

unjumble.py imports six versions of a small dictionary named smallDict1 through smallDict6. Some of these are the same as the correct dictionary smallDictCorrect, while others differ from the correct dictionary in their keys and/or values.

You can test your compareUnjumbleDicts on these test dictionaries. For example:


In [18]: from unjumbleDicts import*  # Need this line to import unjumble dictionaries. 

In [19]: compareUnjumbleDicts(smallDictCorrect, smallDictCorrect)

Both the expected and actual dicts have 483 keys.

Both the expected and actual dicts have exactly the same keys.

Both dicts have exactly the same sets for all keys that are in both dicts.

Out[19]: True

In [20] compareUnjumbleDicts(smallDictCorrect, smallDict1)

The expected dict has 483 keys but the actual dict has 509 keys.

Some keys in the expected dict but not the actual dict:
["'aehor", "'aehos", "'aelory", "'aillnosuv", "'beinor"]

Some keys in the actual dict but not the expected one:
["'boeinr", "'coellnno", "'connoor", "'doell", "'doellnno"]

Both dicts have key "abel", but the sets at this key differ:
* the expected set has sorted values
  ['Abel', 'Bela', 'Elba', 'abel', 'able', 'albe', 'bael', 'bale', 'beal', 'bela', 'blae', 'blea']
* the actual set has sorted values
  ['abel', 'able', 'albe', 'bael', 'bale', 'beal', 'bela', 'blae', 'blea']

Out[20]: False

In unjumble.py, the if __name__ == '__main__': block contains the following code that you can uncomment and run to test all six test dictionaries.

    def testCompareUnjumbleDicts(testCases):
        for (actualName, actualDict, expectedName, expectedDict) in testCases:
            print('-'*60)
            print('compareUnjumbleDicts({}, {})'.format(expectedName, actualName))
            result = compareUnjumbleDicts(expectedDict, actualDict)
            print('compareUnjumbleDicts returns {}'.format(result))

    smallDictTests = [
        ('smallDict1', smallDict1, 'smallDictCorrect', smallDictCorrect),
        ('smallDict2', smallDict2, 'smallDictCorrect', smallDictCorrect),
        ('smallDict3', smallDict3, 'smallDictCorrect', smallDictCorrect),
        ('smallDict4', smallDict4, 'smallDictCorrect', smallDictCorrect),
        ('smallDict5', smallDict5, 'smallDictCorrect', smallDictCorrect),
        ('smallDict6', smallDict6, 'smallDictCorrect', smallDictCorrect),
        ]

    testCompareUnjumbleDicts(smallDictTests)

Here are the outputs expected from these tests:

------------------------------------------------------------
compareUnjumbleDicts(smallDictCorrect, smallDict1)

The expected dict has 483 keys but the actual dict has 509 keys.

Some keys in the expected dict but not the actual dict:
["'aehor", "'aehos", "'aelory", "'aillnosuv", "'beinor"]

Some keys in the actual dict but not the expected one:
["'boeinr", "'coellnno", "'connoor", "'doell", "'doellnno"]

Both dicts have key "abel", but the sets at this key differ:
* the expected set has sorted values
  ['Abel', 'Bela', 'Elba', 'abel', 'able', 'albe', 'bael', 'bale', 'beal', 'bela', 'blae', 'blea']
* the actual set has sorted values
  ['abel', 'able', 'albe', 'bael', 'bale', 'beal', 'bela', 'blae', 'blea']

compareUnjumbleDicts returns False
------------------------------------------------------------
compareUnjumbleDicts(smallDictCorrect, smallDict2)

Both the expected and actual dicts have 483 keys.

Both the expected and actual dicts have exactly the same keys.

Both dicts have key "'aehor", but the sets at this key differ:
* the expected set has sorted values
  ["O'Hare"]
* the actual set has sorted values
  ['OHare']

compareUnjumbleDicts returns False
------------------------------------------------------------
compareUnjumbleDicts(smallDictCorrect, smallDict3)

Both the expected and actual dicts have 483 keys.

Both the expected and actual dicts have exactly the same keys.

Both dicts have exactly the same sets for all keys that are in both dicts.

compareUnjumbleDicts returns True
------------------------------------------------------------
compareUnjumbleDicts(smallDictCorrect, smallDict4)

The expected dict has 483 keys but the actual dict has 482 keys.

Some keys in the expected dict but not the actual dict:
["'aehor", "'aehos", "'aelory", "'aillnosuv", "'beinor"]

Some keys in the actual dict but not the expected one:
['aaaabdgghitv', 'adlmou', 'aehor', 'aehos', 'aelory']

Both dicts have exactly the same sets for all keys that are in both dicts.

compareUnjumbleDicts returns False
------------------------------------------------------------
compareUnjumbleDicts(smallDictCorrect, smallDict5)

Both the expected and actual dicts have 483 keys.

Both the expected and actual dicts have exactly the same keys.

Both dicts have exactly the same sets for all keys that are in both dicts.

compareUnjumbleDicts returns True
------------------------------------------------------------
compareUnjumbleDicts(smallDictCorrect, smallDict6)

Both the expected and actual dicts have 483 keys.

Both the expected and actual dicts have exactly the same keys.

Both dicts have key "'aehor", but the sets at this key differ:
* the expected set has sorted values
  ["O'Hare"]
* the actual set has sorted values
  ["o'hare"]

compareUnjumbleDicts returns False

Using compareUnjumbleDicts to test your makeUnjumbleDictionary

The whole point of defining compareUnjumbleDicts is to test if the dictionaries returned by your makeUnjumbleDictionary function are correct.

Let's assume that you first uncomment the following lines from the if __name__ == '__main__': block in unjumble.py:

    tinyDict = makeUnjumbleDictionary(tinyWordList)
    smallDict = makeUnjumbleDictionary(smallWordList)
    mediumDict = makeUnjumbleDictionary(mediumWordList)
    largeDict = makeUnjumbleDictionary(largeWordList)

    print 'loading unjumbleDictionaries for testing ...'
    from unjumbleDicts import *

Then you can uncomment the following lines to compare the expected dictionaries against your actual ones:

    testCompareUnjumbleDicts(smallDictTests)

    myDictTests = [
        ('minisculeDict', minisculeDict, 
         'minisculeDictCorrect', minisculeDictCorrect),
        ('tinyDict', tinyDict, 'tinyDictCorrect', tinyDictCorrect),
        ('smallDict', smallDict, 'smallDictCorrect', smallDictCorrect),
        ('mediumDict', mediumDict, 'mediumDictCorrect', mediumDictCorrect),
        ('largeDict', largeDict, 'largeDictCorrect', largeDictCorrect),
        ]

Task 2: Titanic Analysis

This task is a Partner-recommended problem in which it is recommended that you work with a partner as part of a two-person team. Use this shared Google Doc to find a pair programming partner and record your partner.

Background

In this task, you will analyze data about passengers from the Titanic, a large ship that tragically sank in the waters of the North Atlantic Ocean in 1912.

Passenger data can be found in the titanicdata.txt file provided in the ps07 folder. Study the structure of the data in titanicdata.txt. Then, study the code in the readtitanic.py file, which reads the contents of titanicdata.txt and transforms it into a list of dictionaries, where every list element is a passenger represented by a dictionary. Because the information for the passengers is incomplete, not all passenger dictionaries have the same keys. Try the examples below in your interactive console after running readtitanic.py. Notice how passenger 2168 has six keys in the dictionary, but passenger 2203 has only four keys.

In [25]: len(passengerList)
Out[25]: 2208

In[26]: passengerList[2168]
Out[26]:
{'age': '31.0',
 'class': '1st Class',
 'group': 'Servant',
 'job': 'Personal Maid',
 'name': 'Miss Helen Alice Wilson',
 'status': 'survivor'}

In[27]: passsengerList[2203]
Out[27]:
{'age': '22.0',
 'class': '3rd Class',
 'name': 'Mr Mapriededer Zakarian',
 'status': 'victim'}

There is even a passenger who has only three keys in the dictionary. UNGRADED CHALLENGE: Can you find this passenger using a single line list comprehension statement? UNGRADED EXTRA CHALLENGE: Can you find both this passenger and his index in the list with one line of code?

Open up titanic.py in your ps07 folder. This file imports the passengerList variable from readtitanic.py. You will define the functions described in the subtasks below in titanic.py.

Subtask 2a createSurvivalDictionary

In this task, you will define a createSurvivalDictionary function that takes a list of passenger dictionaries (such as passengerList) and creates a dictionary that keeps track of the number of survivors and victims in every cabin class, as well as the calculated survival rate by cabin class. Below is the contract of the function you'll need to write. (The header is already in titanic.py).

def createSurvivalDictionary(passengers):
   """Given a list of passenger dictionaries, return a dictionary
   whose keys are all the cabin classes that appear in passengers.
   The value associated with each cabin class name should
   itself be another dictionary that has three key/value pairs:
     (1) The key "survivors" maps to the number of survivors in
         the cabin class;
     (2) The key "victims" maps to the number of victims in the
         cabin class;
     (3) The key "survivalRate" maps to the survival rate in the
         cabin class (a floating point number rounded to 3 decimal digits)
    """

Testing:

In[30]: createSurvivalDictionary(passengerList)
Out[30]:
{'1st Class': {'survivalRate': 0.62, 'survivors': 201, 'victims': 123},
'2nd Class': {'survivalRate': 0.418, 'survivors': 119, 'victims': 166},
'3rd Class': {'survivalRate': 0.254, 'survivors': 180, 'victims': 528},
'A la Carte': {'survivalRate': 0.043, 'survivors': 3, 'victims': 66},
'Deck': {'survivalRate': 0.652, 'survivors': 43, 'victims': 23},
'Engine': {'survivalRate': 0.222, 'survivors': 72, 'victims': 253},
'Victualling': {'survivalRate': 0.218, 'survivors': 94, 'victims': 337}}

Algorithm Notes

Here we sketch two slightly different algorithms for solving this subtask. We recommend that you implement whichever one you find more intuitive. (And if you prefer yet another approach that works, that's fine, too.)

Algorithm 1

  1. Find all the cabin classes (using the function set). Hint, accumulate all the cabin classes, then apply set.

    • NOTE: if you feel comfortable with list comprehension, it's a great place to use it here.

    • DO NOT hard-code the keys by looking up what the cabin classes are. Imagine that we give you another data set like this, for example about the sinking of SS Hong Mo, where the cabin names are different.

  2. Make a survival dictionary that associates each cabin class with a mini-dictionary for cabins , {'survivors': 0, 'victims': 0}.

    • DO NOT create a single mini-dictionary miniDct = {'survivors': 0, 'victims': 0}, and then use the variable miniDict to map it to every key. This will give incorrect results due to the aliasing issues we've been warning you about, because all of your keys will be pointing to the same dictionary value.

      Instead, within a for loop assign the value {'survivors': 0, 'victims': 0} directly to each cabin class (from the set you generated in step 1.)

  3. Iterate through the passenger list, updating the mini-dictionary for the cabin class of each passenger.

  4. Once all the passengers have been processed, the correct survivalRate value can be assigned to each mini-dictionary based on the number of survivors and victims (adding a new key:value item to the existing mini-dicts).

Algorithm 2: Algorithm 1 iterates through all passengers in the passenger list twice (once to find the names of the cabin classes and create mini-dictionaries for each class; and a second time to populate the mini-dictionaries). Algorithm 2 iterates through all passengers only once, creating a mini-dictionary for each cabin class the first time it encounters that cabin class.

  1. Create an initially empty survival dictionary that will eventually associate each cabin class with a mini-dictionary for cabins.

  2. Iterate through the passenger list, performing the following substeps for each passenger:

    a. Determine the cabin class of the passenger.

    b. If this cabin class does not already exists in the survival dictionary, associate it in that dictionary with a new mini-dictionary {'survivors': 0, 'victims': 0}

    c. Update mini-dict for cabin class with survival status of passenger.

  3. Once all the passengers have been processed, the correct survivalRate value can be assigned to each mini-dictionary based on the number of survivors and victims (adding a new key:value item to the existing mini-dicts).

Other Notes

Subtask 2b: Bar Chart of Survival Rates by Class

In this subtask, you will flesh out a function barChartOfSurvivalRates that generates a horizontal bar chart showing the percentage of survivors by cabin class, in decreasing order by survival rate. For example, barChartOfSurvivalRates(passengerList) should create this bar chart:

To generate this bar chart, you will flesh out this function skeleton barChartOfSurvivalRates in titanic.py:

def barChartOfSurvivalRates(passengers):
    """Given a list of passenger dictionaries, displays a horizontal bar chart
    of the sorted percentage of survival rates for each cabin class.
    A percentage value is a floating point number between 0.0 and 1.0.
    """

    #----------------- DO NOT CHANGE THE LINES IN THIS SECTION ----------------
    # 
    # Prepare the data for the plot
    survivalDict = createSurvivalDictionary(passengers)

    # Configure the matplotlib figure
    plt.figure(figsize=(6,4), facecolor='white')

    #---------------------- PUT YOUR CODE IN THIS SECTION ---------------------
    #
    # flesh this out, writing all your plotting code here.       

    #----------------- DO NOT CHANGE THE LINES IN THIS SECTION ----------------
    plt.tight_layout()
    plt.savefig('survivalBarChart.png') # saves plot in file                    

    #---------------- COMMENT OUT plt.show() BEFORE SUBMISSION ----------------
    plt.show()

Notes:

Subtask 2c: Pie Chart of Victims' Cabin Classes

In this subtask, you will flesh out a function pieChartOfVictimsCabinClasses that generates a pie chart showing the relative number of Titanic victims from each cabin class. For example, pieChartOfVictimsCabinClasses(passengerList) should should create this bar chart:

To generate this bar chart, you will flesh out the function skeleton pieChartOfVictimsCabinClasses in titanic.py:

def pieChartOfVictimsCabinClasses(passengers):
    """Given a list of passenger dictionaries, displays a pie chart
    showing the relative number of victims from each cabin class.
    Pie slices should be labeled with the cabin names, and ordered
    clockwise from smallest pie slice to largest pie slice
    starting at 12 o'clock.
    """

    #----------------- DO NOT CHANGE THE LINES IN THIS SECTION ----------------
    #
    # Prepare the data for the plot

    survivalDict = createSurvivalDictionary(passengers)

    # Configure the matplotlib figure and pie slice colors
    plt.figure(figsize=(6,6), facecolor='white')
    colormap = plt.cm.Set2
    colorsList = colormap(np.linspace(0., 1., len(survivalDict)))

    #---------------------- PUT YOUR CODE IN THIS SECTION ---------------------
    #
    # flesh this out, writing all your plotting code here.

    #------------------------- DO NOT CHANGE THIS LINE ------------------------
    plt.savefig('cabinClassesPieChart.png') # saves plot in file

    #---------------- COMMENT OUT plt.show() BEFORE SUBMISSION ----------------
    plt.show()

Notes:


Task 3: Honor Code Form

As in the previous psets, your honor code submission for this pset will involve defining entering values for the variables in the honorcode.py file.


How to turn in this Problem Set

Softcopy (electronic) submission

Hardcopy (paper) submission

There is no hardcopy submission. Problem set submission is entirely electronic.