Lecture 10 - Collections: Dictionaries

Sequences vs. Collections

We have seen two kinds of sequences: lists and strings. A third kind, which we have not seen yet, is the tuple:

Tuples

In [1]:
person = ('Harry', 'Potter')
date = ('February', 29, 2016)
In [2]:
type(person)
Out[2]:
tuple

Tuples are similar to lists and strings in terms of most operations: subscripting, membership with in, slicing, and use of len:

In [3]:
person[1]
Out[3]:
'Potter'
In [4]:
2016 in date
Out[4]:
True
In [5]:
len(date)
Out[5]:
3
In [6]:
date[:2]
Out[6]:
('February', 29)

Like strings, but unlike lists, tuples are immutable: once a tuple is created, its items cannot be reassigned.

In [7]:
date[1] = 28
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-7-c6e74cd94257> in <module>()
----> 1 date[1] = 28

TypeError: 'tuple' object does not support item assignment
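
Because a tuple cannot be changed in place, one common workaround is to build a new tuple from pieces of the old one. The sketch below reuses the date tuple from above; the name new_date is introduced here only for illustration.

```python
date = ('February', 29, 2016)

# Slicing and + both produce new tuples; the original is left untouched.
new_date = date[:1] + (28,) + date[2:]

print(new_date)   # ('February', 28, 2016)
print(date)       # ('February', 29, 2016)
```

Note the trailing comma in (28,): without it, Python reads (28) as the plain integer 28, not a one-element tuple.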

Why do we care about tuples?

Tuples are often used in Python to assign several variables in a single statement:

In [8]:
a, b = 0, 1
print a, b
0 1
In [9]:
0, 1
Out[9]:
(0, 1)
In [10]:
a, b
Out[10]:
(0, 1)

It turns out that doing multiple variable assignments in one step is useful (we'll see it later today with the method .items()).
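
Two other places where multiple assignment is handy are swapping variables and looping over a list of pairs. This is a small sketch, not part of the original notebook; the names pairs and full_names are made up for the example.

```python
a, b = 0, 1

# Swap two variables without a temporary: the right-hand side is
# packed into a tuple first, then unpacked into the names on the left.
a, b = b, a

# Unpacking also works in a for loop over a list of two-element tuples.
pairs = [('Harry', 'Potter'), ('Hermione', 'Granger')]
full_names = []
for first, last in pairs:
    full_names.append(first + ' ' + last)
```

After this runs, a is 1, b is 0, and full_names is ['Harry Potter', 'Hermione Granger'].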

Dictionaries

Dictionaries are collections that map keys to values:

In [11]:
daysOfMonth = {'Jan': 31, 'Feb': 28, 'Mar': 31, 'Apr': 30,
               'May': 31, 'Jun': 30, 'Jul': 31, 'Aug': 31,
               'Sep': 30, 'Oct': 30, 'Nov': 30} # one month is missing
daysOfMonth
Out[11]:
{'Apr': 30,
 'Aug': 31,
 'Feb': 28,
 'Jan': 31,
 'Jul': 31,
 'Jun': 30,
 'Mar': 31,
 'May': 31,
 'Nov': 30,
 'Oct': 30,
 'Sep': 30}

Important Note: In the version of Python used here (Python 2), dictionaries are unordered. Notice that the order in which we entered the key-value pairs differs from the order in which they were displayed.

Use subscripting to look up values associated with a key.

In [12]:
daysOfMonth['Jan']  
Out[12]:
31

Trying to look for a key that doesn't exist...

In [13]:
daysOfMonth['October']
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-13-7118df37cbb8> in <module>()
----> 1 daysOfMonth['October']

KeyError: 'October'

Note: Indexing with 0, 1, ... will not work either, for the same reason 'October' didn't: subscripting looks up keys, not positions.
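
If a key might be absent, there are two simple ways to avoid a crash. The sketch below uses a shortened daysOfMonth; the try/except form is an assumption about what readers may already know and is shown only as one option.

```python
daysOfMonth = {'Jan': 31, 'Feb': 28, 'Mar': 31}

# Option 1: test membership first; `in` checks the keys of a dict.
if 'October' in daysOfMonth:
    result = daysOfMonth['October']
else:
    result = None

# Option 2: attempt the lookup and catch the KeyError.
try:
    days = daysOfMonth[0]      # 0 is not a key, so this raises KeyError
except KeyError:
    days = None
```

Both result and days end up as None here, because neither 'October' nor 0 is a key.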

Exercise 1: Scrabble score of a letter

In [14]:
scrabbleDct = {'a': 1, 'b': 3, 'c': 3, 'd': 2, 'e': 1, 'f': 4, 'g': 2, 'h': 4, 'i': 1, 'j': 8, 
              'k': 5, 'l': 1, 'm': 3, 'n': 1, 'o': 1, 'p': 3, 'q': 10, 'r': 1, 's': 1, 't': 1, 
              'u': 1, 'v': 4, 'w': 4, 'x': 8, 'y': 4, 'z': 10}

Here is the old version of the scrabblePoints() function that uses if, elif, else statements.

In [15]:
def scrabblePoints(letter):
    if letter in 'aeilnorstu':
        return 1
    elif letter in 'dg':
        return 2
    elif letter in 'bcmp':
        return 3
    elif letter in 'fhvwy':
        return 4
    elif letter in 'k':
        return 5
    elif letter in 'jx':
        return 8
    elif letter in 'qz':
        return 10
    return 0


scrabblePoints('j')
Out[15]:
8

Your task: Write the scrabblePoints2() function that uses scrabbleDct to look up the value for a letter.

In [16]:
# Write the code here

def scrabblePoints2(letter):
    if letter in scrabbleDct:
        return scrabbleDct[letter]
    else:
        return 0
    
scrabblePoints2('j')
Out[16]:
8

Different ways to create dictionaries

The built-in function dict()

In [17]:
d = dict() # creates an empty dict
d
Out[17]:
{}

Besides writing out the key: value pairs as we did in daysOfMonth, we can create a dictionary from a list of tuples,
where every tuple has length 2 and its first element (which becomes the key) is immutable.

In [18]:
d = dict([('a', 1)]) # a list of tuples
d
Out[18]:
{'a': 1}

A tuple that is not part of a list will not work:

In [19]:
dict(('a', 1))
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-19-912ba7953c73> in <module>()
----> 1 dict(('a', 1))

ValueError: dictionary update sequence element #0 has length 1; 2 is required

By providing a list of two-element tuples, we can create dictionaries easily:

In [20]:
points = dict([('a', 1), ('b', 3), ('c', 3), ('d', 2), ('e', 1)])
points
Out[20]:
{'a': 1, 'b': 3, 'c': 3, 'd': 2, 'e': 1}

Exploration:

What happens when we use tuples that are not exactly length 2, or that have a mutable object as their first element?

In [21]:
short_tup = ('Hermione Granger',)
dict([short_tup])
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-21-abcf26f9c816> in <module>()
      1 short_tup = ('Hermione Granger',)
----> 2 dict([short_tup])

ValueError: dictionary update sequence element #0 has length 1; 2 is required
In [22]:
long_tup = ('Hermione Granger', 1, 2017, 'Gryffindor')
dict([long_tup])
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-22-c9aff79a00bf> in <module>()
      1 long_tup = ('Hermione Granger', 1, 2017, 'Gryffindor')
----> 2 dict([long_tup])

ValueError: dictionary update sequence element #0 has length 4; 2 is required
In [23]:
list_tup = (['Hermione', 'Granger'], 2017)
dict([list_tup])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-23-d8da320bfe37> in <module>()
      1 list_tup = (['Hermione', 'Granger'], 2017)
----> 2 dict([list_tup])

TypeError: unhashable type: 'list'
In [24]:
good_tup = ('Hermione Granger', 2017)
dict([good_tup])
Out[24]:
{'Hermione Granger': 2017}

Why care about lists of tuples?

In what situation might we want to create a dictionary from a list of tuples?
It turns out that we can easily generate a list of tuples with code. One way, shown below, uses the built-in function zip().

In [25]:
from string import lowercase  # lowercase is the string, "abcdefghijklmnopqrstuvwxyz"
letters = list(lowercase)
scores = [1, 3, 3, 2, 1, 4, 2, 4, 1, 8, 5, 1, 3, 1, 1, 3, 10, 1, 1, 1, 1, 4, 4, 8, 4, 10]
print dict(zip(letters, scores))
{'a': 1, 'c': 3, 'b': 3, 'e': 1, 'd': 2, 'g': 2, 'f': 4, 'i': 1, 'h': 4, 'k': 5, 'j': 8, 'm': 3, 'l': 1, 'o': 1, 'n': 1, 'q': 10, 'p': 3, 's': 1, 'r': 1, 'u': 1, 't': 1, 'w': 4, 'v': 4, 'y': 4, 'x': 8, 'z': 10}

Let's unpack what happened above:

In [26]:
print "lowercase ->", lowercase
print "letters ->", letters
print "zip output ->", zip(letters, scores)
lowercase -> abcdefghijklmnopqrstuvwxyz
letters -> ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
zip output -> [('a', 1), ('b', 3), ('c', 3), ('d', 2), ('e', 1), ('f', 4), ('g', 2), ('h', 4), ('i', 1), ('j', 8), ('k', 5), ('l', 1), ('m', 3), ('n', 1), ('o', 1), ('p', 3), ('q', 10), ('r', 1), ('s', 1), ('t', 1), ('u', 1), ('v', 4), ('w', 4), ('x', 8), ('y', 4), ('z', 10)]

Here are some easy examples of zip():

In [27]:
zip(range(5), 'abcde')
Out[27]:
[(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd'), (4, 'e')]
In [28]:
zip(range(5,10), range(20,15,-1))
Out[28]:
[(5, 20), (6, 19), (7, 18), (8, 17), (9, 16)]
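
One detail worth knowing: when the inputs have different lengths, zip() stops at the shortest one and silently drops the rest. A small sketch follows; the list() call is there so the result is a list on both Python 2 and Python 3 (where zip() returns an iterator).

```python
letters = ['a', 'b', 'c', 'd']
scores = [1, 3, 3]               # one score short

# zip() pairs items up until the shorter input runs out.
pairs = list(zip(letters, scores))   # [('a', 1), ('b', 3), ('c', 3)]

# The resulting dict therefore has no entry for 'd'.
partial = dict(pairs)
```

It is a good idea to check that both lists have the same length before building a dictionary this way.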

Mutability

Dictionaries are mutable:

  1. one can change an existing value
  2. one can add new key-value pairs
In [29]:
daysOfMonth['Feb'] = 29   # change value associated with a key
daysOfMonth
Out[29]:
{'Apr': 30,
 'Aug': 31,
 'Feb': 29,
 'Jan': 31,
 'Jul': 31,
 'Jun': 30,
 'Mar': 31,
 'May': 31,
 'Nov': 30,
 'Oct': 30,
 'Sep': 30}
In [30]:
daysOfMonth['Dec'] = 31   # add new key and value 
daysOfMonth
Out[30]:
{'Apr': 30,
 'Aug': 31,
 'Dec': 31,
 'Feb': 29,
 'Jan': 31,
 'Jul': 31,
 'Jun': 30,
 'Mar': 31,
 'May': 31,
 'Nov': 30,
 'Oct': 30,
 'Sep': 30}

However, keys of dictionaries must be immutable.

In [31]:
daysOfMonth[['Feb', 2015]] = 28    # try to use a key that has month and year in a list
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-31-e48079de2e9d> in <module>()
----> 1 daysOfMonth[['Feb', 2015]] = 28    # try to use a key that has month and year in a list

TypeError: unhashable type: 'list'
In [32]:
# but this works, because a tuple is "immutable"
daysOfMonth[('Feb', 2015)] = 28
daysOfMonth
Out[32]:
{'Apr': 30,
 'Aug': 31,
 'Dec': 31,
 'Feb': 29,
 'Jan': 31,
 'Jul': 31,
 'Jun': 30,
 'Mar': 31,
 'May': 31,
 'Nov': 30,
 'Oct': 30,
 'Sep': 30,
 ('Feb', 2015): 28}

What does "unhashable" mean?

hash

When Python stores the keys of a dictionary in memory, it uses their hashes. A hash is an integer returned by the hash() function. Mutable built-in objects such as lists cannot be hashed, so they cannot be used as keys.

In [33]:
hash("Wellesley")
Out[33]:
1371402960993349759
In [34]:
hash(['Feb', 2015])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-34-26abf4af730b> in <module>()
----> 1 hash(['Feb', 2015])

TypeError: unhashable type: 'list'

You don't have to worry about why the keys are hashed, or how the hash() function works!
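
Two properties of hash() are still useful to see in action. The sketch below is illustrative only; the variable names same and tuple_with_list_ok are made up for the example.

```python
# Equal immutable objects produce equal hashes within one Python session,
# which is what lets a dictionary find a key again later.
same = hash(('Feb', 2015)) == hash(('Feb', 2015))

# A tuple is hashable only if everything inside it is hashable too.
try:
    hash((['Feb'], 2015))          # the tuple contains a list
    tuple_with_list_ok = True
except TypeError:
    tuple_with_list_ok = False
```

Here same is True and tuple_with_list_ok is False, so a tuple that contains a list cannot be used as a dictionary key either.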

Dictionary Methods

A dictionary object has several methods that either query the state of the object, or modify its state.
We show below several examples that are summarized in slide 10-19.

In [35]:
daysOfMonth = {'Jan': 31, 'Feb': 28, 'Mar': 31, 'Apr': 30,
               'May': 31, 'Jun': 30, 'Jul': 31, 'Aug': 31,
               'Sep': 30, 'Oct': 30, 'Nov': 30} # one month is missing

daysOfMonth # the entire dict
Out[35]:
{'Apr': 30,
 'Aug': 31,
 'Feb': 28,
 'Jan': 31,
 'Jul': 31,
 'Jun': 30,
 'Mar': 31,
 'May': 31,
 'Nov': 30,
 'Oct': 30,
 'Sep': 30}
In [36]:
daysOfMonth.keys() # a list of keys only
Out[36]:
['Feb', 'Aug', 'Jan', 'Oct', 'Mar', 'Sep', 'May', 'Jun', 'Jul', 'Apr', 'Nov']
In [37]:
daysOfMonth.values() # a list of values only
Out[37]:
[28, 31, 31, 30, 31, 30, 31, 30, 31, 30, 30]
In [38]:
daysOfMonth.items() # a list of tuples for key and value
Out[38]:
[('Feb', 28),
 ('Aug', 31),
 ('Jan', 31),
 ('Oct', 30),
 ('Mar', 31),
 ('Sep', 30),
 ('May', 31),
 ('Jun', 30),
 ('Jul', 31),
 ('Apr', 30),
 ('Nov', 30)]

Mutating methods

In [39]:
daysOfMonth.pop('Sep')
Out[39]:
30
In [40]:
daysOfMonth
Out[40]:
{'Apr': 30,
 'Aug': 31,
 'Feb': 28,
 'Jan': 31,
 'Jul': 31,
 'Jun': 30,
 'Mar': 31,
 'May': 31,
 'Nov': 30,
 'Oct': 30}
In [41]:
rest = {'Sep': 30, 'Dec': 31}
daysOfMonth.update(rest)
In [42]:
daysOfMonth
Out[42]:
{'Apr': 30,
 'Aug': 31,
 'Dec': 31,
 'Feb': 28,
 'Jan': 31,
 'Jul': 31,
 'Jun': 30,
 'Mar': 31,
 'May': 31,
 'Nov': 30,
 'Oct': 30,
 'Sep': 30}
In [43]:
daysOfMonth.clear()
daysOfMonth
Out[43]:
{}
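
Two behaviors of these methods are easy to miss: pop() raises a KeyError for a missing key unless it is given a default, and update() overwrites the values of keys that already exist. A minimal sketch with a shortened dictionary:

```python
daysOfMonth = {'Jan': 31, 'Feb': 28}

# pop() on a missing key raises KeyError unless a default is supplied.
removed = daysOfMonth.pop('Sep', None)     # 'Sep' is absent, so we get None

# update() adds new keys and overwrites the values of existing ones.
daysOfMonth.update({'Feb': 29, 'Mar': 31})
```

Afterwards removed is None and daysOfMonth is {'Jan': 31, 'Feb': 29, 'Mar': 31}.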

Iterating over a dictionary

There are many ways to iterate over a dictionary:

  1. over the keys (either directly or over .keys())
  2. over the values (with .values())
  3. over the items (with .items())
In [44]:
phones = {5558671234: 'Justin Bieber', 9996541212: 'Shakira', 7811234567: 'Kim Bottomly'}
In [45]:
# iterate directly (by default Python goes over the keys, because they are unique)
for key in phones:
    print key, phones[key]
5558671234 Justin Bieber
9996541212 Shakira
7811234567 Kim Bottomly
In [46]:
for key in phones.keys():
    print key, phones[key]
5558671234 Justin Bieber
9996541212 Shakira
7811234567 Kim Bottomly
In [47]:
for val in phones.values():
    print val
Justin Bieber
Shakira
Kim Bottomly

Question: Can we go from values to keys, as we did from keys to values? What can we say about keys and values in a dictionary?

In [48]:
daysOfMonth = {'Jan': 31, 'Feb': 28, 'Mar': 31, 'Apr': 30,
               'May': 31, 'Jun': 30, 'Jul': 31, 'Aug': 31,
               'Sep': 30, 'Oct': 31, 'Nov': 30, 'Dec': 31}

Answer: Going from values to keys is more complicated, and there is no lookup as straightforward as the subscripting we used for keys.
Keys are unique, so each key maps to exactly one value. However, as you can see in the
dictionary daysOfMonth, a value such as 30 can correspond to many keys, so the program cannot know which one the user wants.
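
What we can do is collect all the keys that share a given value. The helper below is a sketch; keysWithValue is a hypothetical name, not a built-in, and the result is sorted only to make the output predictable.

```python
def keysWithValue(dct, target):
    """Return a sorted list of the keys whose value equals target."""
    matches = []
    for key in dct:
        if dct[key] == target:
            matches.append(key)
    return sorted(matches)

daysOfMonth = {'Jan': 31, 'Feb': 28, 'Mar': 31, 'Apr': 30,
               'May': 31, 'Jun': 30, 'Jul': 31, 'Aug': 31,
               'Sep': 30, 'Oct': 31, 'Nov': 30, 'Dec': 31}

thirty = keysWithValue(daysOfMonth, 30)   # ['Apr', 'Jun', 'Nov', 'Sep']
```

Unlike subscripting, this has to examine every key, and it returns a list because there may be zero, one, or many matches.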

In [49]:
for item in daysOfMonth.items():
    print item
('Feb', 28)
('Aug', 31)
('Jan', 31)
('Dec', 31)
('Oct', 31)
('Mar', 31)
('Sep', 30)
('May', 31)
('Jun', 30)
('Jul', 31)
('Apr', 30)
('Nov', 30)

Multiple variable assignments

In [50]:
for key, val in daysOfMonth.items():
    print key, "has", val, "days"
Feb has 28 days
Aug has 31 days
Jan has 31 days
Dec has 31 days
Oct has 31 days
Mar has 31 days
Sep has 30 days
May has 31 days
Jun has 30 days
Jul has 31 days
Apr has 30 days
Nov has 30 days

Exercise 2: Word Frequencies

In [51]:
def frequencies(wordList):
    """Given a list of words, returns a dictionary of word frequencies"""
    # create an empty dict
    # iterate through the words of the list
    # set the value or update the value for each word
    freqs = {}
    for word in wordList:
        if word in freqs:
            freqs[word] += 1
        else:
            freqs[word] = 1
    return freqs
    
frequencies('I know that I said that I did'.split())
Out[51]:
{'I': 3, 'did': 1, 'know': 1, 'said': 1, 'that': 2}

The method .get()

The method .get() is used to avoid the step of checking for a key before updating.
This is possible because this method will return a "default" value when the key is not in the dictionary.
In all other cases, it will return the value associated with the given key.

In [52]:
daysOfMonth.get('Oct', 'unknown') 
Out[52]:
31
In [53]:
daysOfMonth.get('OCT', 'unknown') 
Out[53]:
'unknown'
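
When the second argument is left out, .get() falls back to None, and in either case it never adds the missing key to the dictionary. A short sketch with a two-entry dictionary:

```python
daysOfMonth = {'Oct': 31, 'Nov': 30}

missing = daysOfMonth.get('OCT')    # no default given, so this is None
size_after = len(daysOfMonth)       # still 2: get() does not insert 'OCT'
```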

Exercise 3: Rewrite frequencies to use .get()

In [56]:
def frequencies2(wordList):
    """Given a list of words, returns a dictionary of word frequencies"""
    # create an empty dict
    # iterate through the words of the list
    # set the value or update the value for each word
    freqs = {}
    for word in wordList:
        freqs[word] = freqs.get(word, 0) + 1
        
    return freqs
    
frequencies2('I know that I said that I did'.split())
Out[56]:
{'I': 3, 'did': 1, 'know': 1, 'said': 1, 'that': 2}
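
For comparison, the standard library already provides a counting dictionary, collections.Counter. This is a different tool from the one the exercise asks for, shown here only as a sketch of what is available once the hand-written version is understood.

```python
from collections import Counter

words = 'I know that I said that I did'.split()

# Counter is a dict subclass that maps each item to its count.
freqs = Counter(words)

count_I = freqs['I']            # 3
count_never = freqs['never']    # 0: missing keys count as zero, no KeyError
```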

Exercise 4: Write a function to find the key with the largest value

Write the function getKeyWithMaxValue() that behaves as shown below:

In [2]: getKeyWithMaxValue({'A': 0.25, 'E': 0.36, 
                            'I': 0.16, 'O': 0.18, 'U': 0.05})

Out[2]: 'E'

Hint: Remember the built-in function max().

In [58]:
def getKeyWithMaxValue(dct):
    """Given a dict whose values are numbers, return the key that
       corresponds to the highest value.
    """
    # enter your code
    maxVal = max(dct.values()) # find max value among all values
    for key in dct:
        if dct[key] == maxVal:
            return key  # early return
            
getKeyWithMaxValue({'A': 0.25, 'E': 0.36, 'I': 0.16, 'O': 0.18, 'U': 0.05})
Out[58]:
'E'

Challenge yourself

If you solved Ex. 4 with a for loop, now try an approach without a for loop.

Hint: Remember the built-in function zip() we introduced earlier.

In [59]:
def getKeyWithMaxValue2(dct):
    """Given a dict whose values are numbers, return the key that
       corresponds to the highest value.
    """
    # enter your code
    pairs = zip(dct.values(), dct.keys())
    maxPair = max(pairs)
    return maxPair[1] # this will have the key, since the first item is the value
    
    
getKeyWithMaxValue2({'A': 0.25, 'E': 0.36, 'I': 0.16, 'O': 0.18, 'U': 0.05})
Out[59]:
'E'
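
A third approach, not covered in the exercise above, relies on the optional key argument of max(). The function name getKeyWithMaxValue3 and the variable vowelFreqs are made up for this sketch.

```python
def getKeyWithMaxValue3(dct):
    """Return a key whose value is the largest in dct."""
    # Iterating over a dict yields its keys; key=dct.get tells max()
    # to compare those keys by their associated values.
    return max(dct, key=dct.get)

vowelFreqs = {'A': 0.25, 'E': 0.36, 'I': 0.16, 'O': 0.18, 'U': 0.05}
best = getKeyWithMaxValue3(vowelFreqs)   # 'E'
```

When several keys share the largest value, the three versions may disagree on which key to return: the zip() version breaks ties by comparing the keys themselves, while the other two return whichever tied key is encountered first.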

Exercise 5: Reverse Lookup

Write a function that takes a dict that has many similar values and creates a new dict where the keys are the unique values and
the values are lists of the keys.

Example:

In [30]: disneyOlympicResults = {'mickey': 'silver', 'minnie': 'gold', 
                        'donald': 'bronze', 'daisy': 'bronze',
                        'goofy': 'bronze', 'ariel': 'silver', 
                        'nemo': 'gold', 'mulan': 'silver', 'elsa': 'gold'}

In [31]: reverseDictionary(disneyOlympicResults)
Out[31]: 
{'bronze': ['donald', 'goofy', 'daisy'],
 'gold': ['elsa', 'nemo', 'minnie'],
 'silver': ['mickey', 'mulan', 'ariel']}
In [60]:
def reverseDictionary(dct):
    """Given a dict that has many similar values, returns a new dict where 
       the keys are the unique values and the values are lists of the keys.
    """
    reverseD = {}
    for key, value in dct.items():
        if value not in reverseD:
            reverseD[value] = [key]
        else:
            reverseD[value].append(key)
    return reverseD
In [61]:
disneyOlympicResults = {'mickey': 'silver', 'minnie': 'gold', 
                        'donald': 'bronze', 'daisy': 'bronze',
                        'goofy': 'bronze', 'ariel': 'silver', 
                        'nemo': 'gold', 'mulan': 'silver', 'elsa': 'gold'}

reverseDictionary(disneyOlympicResults)
Out[61]:
{'bronze': ['donald', 'goofy', 'daisy'],
 'gold': ['elsa', 'nemo', 'minnie'],
 'silver': ['mickey', 'mulan', 'ariel']}
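
The if/else inside the loop can be folded into one line with the dictionary method .setdefault(). This is an alternative sketch, not the lecture's solution; reverseDictionary2 and medals are hypothetical names.

```python
def reverseDictionary2(dct):
    """Same result as reverseDictionary, written with setdefault()."""
    reverseD = {}
    for key, value in dct.items():
        # setdefault returns the list stored under value, inserting
        # an empty list first if value is not yet a key.
        reverseD.setdefault(value, []).append(key)
    return reverseD

medals = {'mickey': 'silver', 'minnie': 'gold',
          'donald': 'bronze', 'daisy': 'bronze'}
result = reverseDictionary2(medals)
```

The order of names inside each list depends on the order in which the dictionary is traversed, so it may differ between Python versions.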