CS 111 Lecture: Real-world Data

Table of Contents

  1. Sorting a list of numbers
  2. Sorting other sequences
  3. Sorting a list of sequences
    3.1 Sorting a list of strings
    3.2 Sorting a list of tuples
  4. Sorting with the key parameter
  5. Exercises: Your turn to sort
  6. Mutating list methods for sorting
  7. Sorting dictionaries and lists of dictionaries
  8. Exercises: Sorting Census Data

In the previous lecture on File Formats, we worked with the file us-states-more.csv, which contained information about each state, such as:

State,StatePop,Abbrev.,Capital,CapitalPop
Alabama,4921532,AL,Montgomery,198525
Alaska,731158,AK,Juneau,32113
Arizona,7421401,AZ,Phoenix,1680992
Arkansas,3030522,AR,Little Rock,197312
California,39368078,CA,Sacramento,513624
Colorado,5807719,CO,Denver,727211

These are real-world data retrieved from the Census website. In the United States, each state's representation in Congress (House of Representatives) is decided based on changes in the states' populations every ten years. In this lecture, you will learn to write Python code to answer questions with Census data:

  • Which are the most populated US states? Rank the data in that order.
  • Which are the least populated US states? Rank the data in that order.
  • Which US state capitals are the most populated? Rank the data in that order.
  • Which US state capitals are the least populated? Rank the data in that order.
  • What percentage of each US state’s population lives in the state capital? Rank the data by that percentage from the largest to the smallest.

In order to answer these questions, we first need to learn about sorting sequences.

1. Sorting a list of numbers

Let's start with the simplest case, sorting a list of unordered numbers, positive and negative.

In [1]:
numbers = [35, -2, 17, -9, 0, 12, 19] 

We will use Python's built-in function sorted to sort the list. This function always returns a new list.

In [2]:
sorted(numbers)
Out[2]:
[-9, -2, 0, 12, 17, 19, 35]

And we can verify that numbers hasn't changed:

In [3]:
numbers
Out[3]:
[35, -2, 17, -9, 0, 12, 19]

Note: This suggests that if we want to use the result of sorted, we must define a variable to save its returned value, for example:

sortedNumbers = sorted(numbers)

By default, the list is sorted in ascending order (from the smallest value to the largest), but we can easily reverse the order by using a named parameter of the function sorted, as shown below:

In [4]:
sorted(numbers, reverse=True)
Out[4]:
[35, 19, 17, 12, 0, -2, -9]

The use of the named parameter, reverse, in this function call is new syntax that you haven't seen before. We will get back to it later in this lecture.

2. Sorting other sequences

Strings and tuples can also be sorted in the same way. The result is always going to be a new list.

In [5]:
phrase = 'Red Code 1'
In [6]:
sorted(phrase)
Out[6]:
[' ', ' ', '1', 'C', 'R', 'd', 'd', 'e', 'e', 'o']

Question: Why do we see a space as the first element in the sorted list?
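Answer: Because characters are compared by their numeric codes, and the space character has a smaller code (32) than any digit or letter.

We can use the Python built-in function ord to find the code of any character (for the characters used here, this is the same as the ASCII code):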

In [7]:
ord(' ')
Out[7]:
32

We can write a for loop to print the code for every character.
We use an f-string to format the output.

In [8]:
for item in sorted(phrase):
    print(f"'{item}' has ASCII code {ord(item)}")
' ' has ASCII code 32
' ' has ASCII code 32
'1' has ASCII code 49
'C' has ASCII code 67
'R' has ASCII code 82
'd' has ASCII code 100
'd' has ASCII code 100
'e' has ASCII code 101
'e' has ASCII code 101
'o' has ASCII code 111
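
As a quick check, here is a minimal sketch showing that comparing two characters gives the same answer as comparing their codes. It uses only the built-in functions ord and chr; the variable names chars and codes are ours, chosen for illustration.

```python
# Characters compare the same way their numeric codes do.
chars = [' ', '1', 'C', 'R', 'd', 'e', 'o']
codes = [ord(c) for c in chars]

print(codes)                    # [32, 49, 67, 82, 100, 101, 111]
print(codes == sorted(codes))   # True: the codes are already in ascending order
print(' ' < '1' < 'C' < 'd')    # True: space, then digit, then uppercase, then lowercase
print(chr(32) == ' ')           # True: chr is the inverse of ord
```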

Just as in the case of the list numbers in the above example, the string value of phrase hasn't changed:

In [9]:
phrase
Out[9]:
'Red Code 1'

This is to be expected, because strings are immutable.

Tuples can be sorted too

In [10]:
digits = (9, 7, 5, 3, 1) # this is a tuple 
In [11]:
type(digits) # check the type
Out[11]:
tuple
In [12]:
sorted(digits)
Out[12]:
[1, 3, 5, 7, 9]

Notice that the result of the sorting is a list, not a tuple. This is because the function sorted always returns a list.

In [13]:
digits
Out[13]:
(9, 7, 5, 3, 1)

The original tuple value hasn't changed.
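
If a tuple (or a string) is needed as the result, one possible approach is to convert the list that sorted returns. The sketch below assumes only the built-in tuple function and the string method join; the variable name sortedDigits is ours.

```python
digits = (9, 7, 5, 3, 1)

# sorted returns a list, so wrap the result in tuple() to get a tuple back
sortedDigits = tuple(sorted(digits))
print(sortedDigits)         # (1, 3, 5, 7, 9)
print(type(sortedDigits))   # <class 'tuple'>

# the same idea for strings: join the sorted characters into a new string
sortedLetters = ''.join(sorted('Code'))
print(sortedLetters)        # Cdeo
```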

3. Sorting a list of sequences

We can sort lists of sequences, such as lists of strings, lists of tuples, and lists of lists.
Sorting a list of tuples and sorting a list of lists work similarly, because the same principles apply.

3.1 Sorting a list of strings

In [14]:
# a long string that we will split into a list of words

phrase = "99 red balloons *floating* in the Summer sky" 
words = phrase.split()
words
Out[14]:
['99', 'red', 'balloons', '*floating*', 'in', 'the', 'Summer', 'sky']
In [15]:
sorted(words)
Out[15]:
['*floating*', '99', 'Summer', 'balloons', 'in', 'red', 'sky', 'the']

Question: Can you explain the results of sorting here? What rules are in place?
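Answer: The words are compared character by character, starting with the first character, using each character's ASCII code. As a result, words that start with a punctuation symbol such as * come first, then words that start with a digit, then words that start with an uppercase letter, and finally words that start with a lowercase letter, in alphabetical order.

In the ASCII table, characters are ordered roughly as follows:

  • Space and most punctuation symbols (! # * , .)
  • Digits
  • Uppercase letters
  • Lowercase letters

A few punctuation symbols fall between these groups: for example, : and ; come after the digits, and ^ comes after the uppercase letters.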
In [16]:
sorted(words, reverse=True)
Out[16]:
['the', 'sky', 'red', 'in', 'balloons', 'Summer', '99', '*floating*']

Remember, the original list is unchanged:

In [17]:
words
Out[17]:
['99', 'red', 'balloons', '*floating*', 'in', 'the', 'Summer', 'sky']
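
To see the ordering of the sorted words in terms of character codes, the following sketch prints the code of the first character of each word. The variable name firstCodes is ours, used only for illustration.

```python
words = ['99', 'red', 'balloons', '*floating*', 'in', 'the', 'Summer', 'sky']

firstCodes = []
for word in sorted(words):
    first = word[0]
    firstCodes.append(ord(first))
    print(f"{word:12s} starts with '{first}' (code {ord(first)})")

# the codes of the first characters appear in ascending order
print(firstCodes)   # [42, 57, 83, 98, 105, 114, 115, 116]
```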

3.2 Sorting a list of tuples

Tuples are compared element by element, starting with the one at index 0. This is known as lexicographic order, which is a generalization of dictionary order on strings in which each tuple element generalizes a character in a string.

In [18]:
triples = [(8, 'a', '$'), (7, 'c', '@'),
           (7, 'b', '+'), (8, 'a', '!')] 

sorted(triples)
Out[18]:
[(7, 'b', '+'), (7, 'c', '@'), (8, 'a', '!'), (8, 'a', '$')]

Q: What happens in the case of ties for the first elements of tuples?
A: We keep comparing elements with the same indices until we find two that are not the same. (See example for the two tuples that start with 8.)

In [19]:
ord('!') < ord('$')
Out[19]:
True
In [20]:
print(ord('!'), ord('$'))
33 36

That is, '!' is less than '$' because the former has a smaller ASCII code than the latter.
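
The element-by-element rule can also be checked directly with the < operator on tuples. This is a small sketch using the tuples from the example above; the variable names are ours.

```python
# Tuples are compared position by position, left to right.
decidedAtIndex0 = (7, 'c', '@') < (8, 'a', '!')   # 7 < 8 settles it
decidedAtIndex1 = (7, 'b', '+') < (7, 'c', '@')   # tie at index 0, then 'b' < 'c'
decidedAtIndex2 = (8, 'a', '!') < (8, 'a', '$')   # ties at 0 and 1, then '!' < '$'
print(decidedAtIndex0, decidedAtIndex1, decidedAtIndex2)   # True True True

# A shorter tuple that is a prefix of a longer one counts as smaller.
prefixIsSmaller = (8, 'a') < (8, 'a', '!')
print(prefixIsSmaller)   # True
```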

4. Sorting with the key parameter

Often we want to sort by an element that is not first in a sequence. For example, given the list of tuples people (below), we want to sort by the age of a person.

people = [('Mary Beth Johnson', 18), 
          ('Ed Smith', 17), 
          ('Janet Doe', 25), 
          ('Bob Miller', 31)]

Simply using sorted as we have done so far will not work, because it would compare the names first. But the function sorted has been designed with this scenario in mind. Let us read its docstring.

In [21]:
help(sorted)
Help on built-in function sorted in module builtins:

sorted(iterable, /, *, key=None, reverse=False)
    Return a new list containing all items from the iterable in ascending order.
    
    A custom key function can be supplied to customize the sort order, and the
    reverse flag can be set to request the result in descending order.

Notice the phrase: A custom key function can be supplied to customize the sort order. This means that we can specify a function that for each element determines how it should be compared to other elements of the iterable. Let us see an example.

In [22]:
people = [('Mary Beth Johnson', 18), 
          ('Ed Smith', 17), 
          ('Janet Doe', 25), 
          ('Bob Miller', 31)]

We'll create the function age that given a person tuple (name, age) will return the age value.

In [23]:
def age(personTuple):
    """Helper function to use in sorted"""
    return personTuple[1]
In [24]:
age(('Janet Doe', 25))
Out[24]:
25

Now that we have this function, we will use it as the value for key in sorted.

In [25]:
sorted(people, key=age)
Out[25]:
[('Ed Smith', 17),
 ('Mary Beth Johnson', 18),
 ('Janet Doe', 25),
 ('Bob Miller', 31)]

The list was sorted by the age values! Let's see one more example. We will create a helper function lastName that returns a person's last name.

In [26]:
def lastName(personTuple):
    """Helper function to use in sorted"""
    return personTuple[0].split()[-1]        # first access the whole name (has index=0 in the tuple)
                                             # then split it (will create a list), 
                                             # then return its last element (index=-1)
In [27]:
lastName(('Bob Miller', 31))
Out[27]:
'Miller'
In [28]:
sorted(people, key=lastName)
Out[28]:
[('Janet Doe', 25),
 ('Mary Beth Johnson', 18),
 ('Bob Miller', 31),
 ('Ed Smith', 17)]

Important: The value assigned to the parameter key is a function. Functions in Python are values, as the examples below show:

In [29]:
age
Out[29]:
<function __main__.age(personTuple)>
In [30]:
lastName
Out[30]:
<function __main__.lastName(personTuple)>

We can create a variable, assign it a function value, and then call that variable as if it was a function (because indeed it's an alias for a function).

In [31]:
boo = age
boo(('Janet Doe', 25))
Out[31]:
25
In [32]:
foo = lastName
foo(('Ed Smith', 17))
Out[32]:
'Smith'

The variables boo and foo are aliases for the functions age and lastName, which we can easily verify:

In [33]:
boo
Out[33]:
<function __main__.age(personTuple)>
In [34]:
foo
Out[34]:
<function __main__.lastName(personTuple)>
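
Built-in functions are values too, so they can be passed as key in the same way as our own helper functions. The sketch below is an illustration that is not part of the original examples: it sorts the words from Section 3.1 with the built-in len and with the string method str.lower.

```python
words = ['99', 'red', 'balloons', '*floating*', 'in', 'the', 'Summer', 'sky']

# key=len: shortest words first; words of equal length keep their original order
byLength = sorted(words, key=len)
print(byLength)
# ['99', 'in', 'red', 'the', 'sky', 'Summer', 'balloons', '*floating*']

# key=str.lower: compare lowercase versions, so 'Summer' is no longer ahead of all lowercase words
ignoringCase = sorted(words, key=str.lower)
print(ignoringCase)
# ['*floating*', '99', 'balloons', 'in', 'red', 'sky', 'Summer', 'the']
```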

Breaking ties with key functions

Assume we have a new list of person tuples, where there are lots of ambiguities in terms of what comes first. Concretely:

people2 = [('Ed Jones', 18), 
           ('Ana Doe', 25), 
           ('Ed Doe', 18),
           ('Bob Doe', 25), 
           ('Ana Jones', 18)]

Notice that we have several individuals with the same age, or the same first name, or the same last name. How should we sort elements in this situation?

We can create a function that uses a tuple to break the ties.

def ageLastFirst(person):
    return (age(person), lastName(person), firstName(person))

Your Turn

Write a function firstName, that mimics lastName, but returns the first name of a person.

In [35]:
# Your code here

def firstName(personTuple):
    """Helper function to use in sorted"""
    return personTuple[0].split()[0]

If you wrote firstName, now we can write ageLastFirst:

In [36]:
def ageLastFirst(person):
    """Helper function to use in sorted"""
    return (age(person), lastName(person), firstName(person))
In [37]:
people2 = [('Ed Jones', 18), 
           ('Ana Doe', 25), 
           ('Ed Doe', 18),
           ('Bob Doe', 25), 
           ('Ana Jones', 18)]

sorted(people2, key=ageLastFirst)
Out[37]:
[('Ed Doe', 18),
 ('Ana Jones', 18),
 ('Ed Jones', 18),
 ('Ana Doe', 25),
 ('Bob Doe', 25)]

Notice that in the result, the tuples are sorted first by age, then by last name (when the same age), and then by first name (when same age and last name).
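
The same tuple technique can mix directions. As a sketch, suppose we want the oldest people first but names still in alphabetical order; since the ages are numbers, one option is to negate the age inside the key. The helper oldestFirst below is hypothetical and is not part of the exercises.

```python
people2 = [('Ed Jones', 18),
           ('Ana Doe', 25),
           ('Ed Doe', 18),
           ('Bob Doe', 25),
           ('Ana Jones', 18)]

def oldestFirst(person):
    """Hypothetical key: age descending, then last name, then first name."""
    names = person[0].split()
    # negating the age makes larger ages sort earlier
    return (-person[1], names[-1], names[0])

result = sorted(people2, key=oldestFirst)
print(result)
# [('Ana Doe', 25), ('Bob Doe', 25), ('Ed Doe', 18), ('Ana Jones', 18), ('Ed Jones', 18)]
```

Using reverse=True instead would reverse the order of the names as well, which is why the sketch negates only the age.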

How does sorting with key work?

When sorted is called with a key parameter, the first thing it does is invoke the function that key refers to, once for each element of the sequence. If we call the value returned by the key function keyvalue, then one way to picture what sorted does is this: create a tuple (keyvalue, value), sort the sequence based on this tuple, and then discard the tuple and return only the sorted values. (This is a conceptual model. The built-in sorted compares only the key values, so elements with equal key values keep their original relative order.)

This process is also known as Decorate, Sort, Undecorate and we can try it too:

In [38]:
# Step 1 (Decorate): create a list of tuples (keyvalue, value)
decorated = [(age(person), person) for person in people]
decorated
Out[38]:
[(18, ('Mary Beth Johnson', 18)),
 (17, ('Ed Smith', 17)),
 (25, ('Janet Doe', 25)),
 (31, ('Bob Miller', 31))]
In [39]:
# Step 2 (Sort): invoke the function sorted without the key function
decoratedSorted = sorted(decorated)
decoratedSorted
Out[39]:
[(17, ('Ed Smith', 17)),
 (18, ('Mary Beth Johnson', 18)),
 (25, ('Janet Doe', 25)),
 (31, ('Bob Miller', 31))]
In [40]:
# Step 3 (Undecorate): extract now the value from each (keyvalue,value) pair to create the end result
undecoratedResult = [item[1] for item in decoratedSorted]
undecoratedResult
Out[40]:
[('Ed Smith', 17),
 ('Mary Beth Johnson', 18),
 ('Janet Doe', 25),
 ('Bob Miller', 31)]

As you might remember, when we include key in sorted the result is the same:

In [41]:
sorted(people, key=age)
Out[41]:
[('Ed Smith', 17),
 ('Mary Beth Johnson', 18),
 ('Janet Doe', 25),
 ('Bob Miller', 31)]

In short, the parameter key can be understood through the rules for sorting a list of tuples that we saw earlier in the notebook.
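
One detail is worth checking by hand: what happens when two elements have the same key value. The sketch below compares sorted with key against the manual Decorate, Sort, Undecorate steps on a list with repeated ages. Python's sort is stable, so with key the tied elements keep their original order, while the manual version breaks ties by comparing the person tuples themselves. The variable names withKey and withDSU are ours.

```python
people2 = [('Ed Jones', 18),
           ('Ana Doe', 25),
           ('Ed Doe', 18),
           ('Bob Doe', 25),
           ('Ana Jones', 18)]

def age(personTuple):
    """Helper function to use in sorted"""
    return personTuple[1]

# sorted with key: people with the same age stay in their original order
withKey = sorted(people2, key=age)
print(withKey)
# [('Ed Jones', 18), ('Ed Doe', 18), ('Ana Jones', 18), ('Ana Doe', 25), ('Bob Doe', 25)]

# manual Decorate, Sort, Undecorate: ties on age are broken by the person tuple
decorated = [(age(person), person) for person in people2]
withDSU = [item[1] for item in sorted(decorated)]
print(withDSU)
# [('Ana Jones', 18), ('Ed Doe', 18), ('Ed Jones', 18), ('Ana Doe', 25), ('Bob Doe', 25)]
```

When all key values are distinct, as in the people list, the two approaches give the same result.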

5. Exercises: Your turn to sort

Here is a list of the Space Missions where the second element in the tuple represents the number of days the mission lasted.

missions = [('Apollo 11', 8), 
            ('Mercury-Redstone 3', 1),
            ('Apollo 13', 5), 
            ('Gemini 3', 1), 
            ('Little Joe 6', 1)]

Some quick terminology:

  • Program name refers to 'Apollo' or 'Mercury-Redstone'
  • Mission number refers to the 11 in 'Apollo 11' or the 3 in 'Gemini 3'
  • Days refers to the number of days the mission lasted. In truth, many of the missions with one day listed lasted only a matter of minutes but have been rounded to 1 for ease.
In [42]:
missions = [('Apollo 11', 8), 
            ('Mercury-Redstone 3', 1),
            ('Apollo 13', 5), 
            ('Gemini 3', 1), 
            ('Little Joe 6', 1)]

Task 1: Sort by number of days

Using the function sorted with key, sort the list of missions based on the number of days in space. This should produce:

[('Mercury-Redstone 3', 1),
 ('Gemini 3', 1),
 ('Little Joe 6', 1),
 ('Apollo 13', 5),
 ('Apollo 11', 8)]

You will first have to create a helper function to be used as the value for key.

In [43]:
# Your code here

def days(missionTuple):
    return missionTuple[1]

sorted(missions, key=days)
Out[43]:
[('Mercury-Redstone 3', 1),
 ('Gemini 3', 1),
 ('Little Joe 6', 1),
 ('Apollo 13', 5),
 ('Apollo 11', 8)]

Task 2: Sort by mission number

This should produce the list:

[('Mercury-Redstone 3', 1),
 ('Gemini 3', 1),
 ('Little Joe 6', 1),
 ('Apollo 11', 8),
 ('Apollo 13', 5)]

As above, create a helper function to be assigned to key.

In [44]:
# Your code here

def missionNumber(missionTuple):
    return int(missionTuple[0].split()[-1])

sorted(missions, key=missionNumber)
Out[44]:
[('Mercury-Redstone 3', 1),
 ('Gemini 3', 1),
 ('Little Joe 6', 1),
 ('Apollo 11', 8),
 ('Apollo 13', 5)]

Task 3: Sort by program name

This should produce the list:

[('Apollo 11', 8),
 ('Apollo 13', 5),
 ('Gemini 3', 1),
 ('Little Joe 6', 1),
 ('Mercury-Redstone 3', 1)]

Again, start by creating a helper function to be the value for key.

In [45]:
# Your code here

def programName(missionTuple):
    return ' '.join(missionTuple[0].split()[:-1])  # keep the space in multi-word names such as 'Little Joe'

sorted(missions, key=programName)
Out[45]:
[('Apollo 11', 8),
 ('Apollo 13', 5),
 ('Gemini 3', 1),
 ('Little Joe 6', 1),
 ('Mercury-Redstone 3', 1)]

Task 4: Sort by days, program name, and mission number

Here, you'll combine together all the functions you wrote above, to produce this list:

[('Gemini 3', 1),
 ('Little Joe 6', 1),
 ('Mercury-Redstone 3', 1),
 ('Apollo 13', 5),
 ('Apollo 11', 8)]
In [46]:
# Your code here

def daysProgramNumber(missionTuple):
    return (days(missionTuple), programName(missionTuple), missionNumber(missionTuple))

sorted(missions, key=daysProgramNumber)
Out[46]:
[('Gemini 3', 1),
 ('Little Joe 6', 1),
 ('Mercury-Redstone 3', 1),
 ('Apollo 13', 5),
 ('Apollo 11', 8)]

6. Mutating list methods for sorting

Lists have two methods that are useful for sorting: sort and reverse. Both methods mutate the original list.

Method sort

In [47]:
numbers = [35, -2, 17, -9, 0, 12, 19]
numbers.sort() 

Notice that no output was displayed: sort mutates the original list in place and returns None.

In [48]:
numbers
Out[48]:
[-9, -2, 0, 12, 17, 19, 35]

Method reverse

In [49]:
numbers2 = [35, -2, 17, -9, 0, 12, 19]
numbers2.reverse()

This method also returns None, because it too mutates the list in place.

In [50]:
numbers2
Out[50]:
[19, 12, 0, -9, 17, -2, 35]

In combination, sort and reverse can sort a list in reverse order:

In [51]:
numbers2.sort()
numbers2.reverse()
numbers2
Out[51]:
[35, 19, 17, 12, 0, -2, -9]

We can use the parameters key and reverse with the method sort as well:

In [52]:
people.sort(key=age, reverse=True)
people
Out[52]:
[('Bob Miller', 31),
 ('Janet Doe', 25),
 ('Mary Beth Johnson', 18),
 ('Ed Smith', 17)]
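
A common pitfall with the mutating methods is assigning their result to a variable. The sketch below contrasts the method sort with the function sorted; the variable names result, original, and newList are ours.

```python
numbers = [35, -2, 17, -9, 0, 12, 19]

result = numbers.sort()      # sorts in place and returns None
print(result)                # None
print(numbers)               # [-9, -2, 0, 12, 17, 19, 35]

original = [35, -2, 17, -9, 0, 12, 19]
newList = sorted(original)   # returns a new list and leaves original untouched
print(newList)               # [-9, -2, 0, 12, 17, 19, 35]
print(original)              # [35, -2, 17, -9, 0, 12, 19]
```

Use sorted when the original order is still needed, and sort when it is fine to change the list itself.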

7. Sorting dictionaries and lists of dictionaries

Does it make sense to sort a dictionary? What happens when we try to do so?

In [53]:
fruitColors = {"banana": "yellow", "kiwi": "green", 
               "grapes": "purple", "apple": "red", 
               "lemon": "yellow", "pomegranate": "red"}

sorted(fruitColors)
Out[53]:
['apple', 'banana', 'grapes', 'kiwi', 'lemon', 'pomegranate']

We didn't get an error, but all that was sorted were the keys of the dictionary.

Try to predict what we will see in the following lines:

In [54]:
sorted(fruitColors.values())
Out[54]:
['green', 'purple', 'red', 'red', 'yellow', 'yellow']
In [55]:
sorted(fruitColors.items())
Out[55]:
[('apple', 'red'),
 ('banana', 'yellow'),
 ('grapes', 'purple'),
 ('kiwi', 'green'),
 ('lemon', 'yellow'),
 ('pomegranate', 'red')]
In [56]:
sorted(fruitColors.keys())
Out[56]:
['apple', 'banana', 'grapes', 'kiwi', 'lemon', 'pomegranate']
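
Since items() produces (key, value) tuples, the key parameter from Section 4 also lets us order the pairs by value. This is a minimal sketch; the helper byColorThenFruit is hypothetical and not part of the original notebook.

```python
fruitColors = {"banana": "yellow", "kiwi": "green",
               "grapes": "purple", "apple": "red",
               "lemon": "yellow", "pomegranate": "red"}

def byColorThenFruit(item):
    """Hypothetical key: item is a (fruit, color) pair; sort by color, then fruit."""
    return (item[1], item[0])

pairsByColor = sorted(fruitColors.items(), key=byColorThenFruit)
print(pairsByColor)
# [('kiwi', 'green'), ('grapes', 'purple'), ('apple', 'red'),
#  ('pomegranate', 'red'), ('banana', 'yellow'), ('lemon', 'yellow')]
```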

While it doesn't make sense to sort a dictionary, it certainly makes sense to sort a list of dictionaries, but we will need to do some work. Let's try it out:

In [57]:
peopleDctList = [{'name':'Mary Beth Johnson', 'age': 18},
                 {'name':'Ed Smith', 'age': 17},
                 {'name':'Janet Doe', 'age': 25},
                 {'name':'Bob Miller', 'age': 31}]
In [58]:
sorted(peopleDctList)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-58-4383f94b94f7> in <module>
----> 1 sorted(peopleDctList)

TypeError: '<' not supported between instances of 'dict' and 'dict'

This doesn't work out of the box, because Python doesn't have rules for comparing dictionaries the way it has rules for comparing strings and tuples (as we saw at the beginning). That means we have to provide the key parameter to tell Python how to sort the items of this list of dictionaries.

In [59]:
def byAge(personDct):
    """Helper function to be used in sorted."""
    return personDct['age']
In [60]:
sorted(peopleDctList, key=byAge)
Out[60]:
[{'name': 'Ed Smith', 'age': 17},
 {'name': 'Mary Beth Johnson', 'age': 18},
 {'name': 'Janet Doe', 'age': 25},
 {'name': 'Bob Miller', 'age': 31}]

Your turn: Sort by last name

Create a function byLastName that, given a person dictionary, returns the person's last name, so that it can be used as the key to sort the list of dictionaries by last name. When used in sorted, it should produce the list:

[{'name': 'Janet Doe', 'age': 25},
 {'name': 'Mary Beth Johnson', 'age': 18},
 {'name': 'Bob Miller', 'age': 31},
 {'name': 'Ed Smith', 'age': 17}]
In [61]:
# Your code here

def byLastName(personDct):
    """Helper function to be used in sorted."""
    return personDct['name'].split()[-1]
In [62]:
sorted(peopleDctList, key=byLastName)
Out[62]:
[{'name': 'Janet Doe', 'age': 25},
 {'name': 'Mary Beth Johnson', 'age': 18},
 {'name': 'Bob Miller', 'age': 31},
 {'name': 'Ed Smith', 'age': 17}]
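
The tuple technique for breaking ties carries over to dictionaries. The sketch below uses a hypothetical list, peopleDctList2, built from the same names as people2 in Section 4, and a hypothetical helper byAgeLastFirst.

```python
peopleDctList2 = [{'name': 'Ed Jones', 'age': 18},
                  {'name': 'Ana Doe', 'age': 25},
                  {'name': 'Ed Doe', 'age': 18},
                  {'name': 'Bob Doe', 'age': 25},
                  {'name': 'Ana Jones', 'age': 18}]

def byAgeLastFirst(personDct):
    """Hypothetical key: age, then last name, then first name."""
    names = personDct['name'].split()
    return (personDct['age'], names[-1], names[0])

sortedNames = [dct['name'] for dct in sorted(peopleDctList2, key=byAgeLastFirst)]
print(sortedNames)
# ['Ed Doe', 'Ana Jones', 'Ed Jones', 'Ana Doe', 'Bob Doe']
```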

Conclusion: You learned to sort a list of dictionaries. Now that you know that, you can answer our initial questions about the Census data.

8. Exercises: Sorting Census Data

We want to answer all these questions through sorting:

  1. Which are the most populated US states? Rank the data in that order.
  2. Which are the least populated US states? Rank the data in that order.
  3. Which US state capitals are the most populated? Rank the data in that order.
  4. Which US state capitals are the least populated? Rank the data in that order.
  5. What percentage of each US state’s population lives in the state capital? Rank the data by that percentage from the largest to the smallest.

For the sake of space, you should only print the first 6 items of each sorted list. The final results for each question are shown in the slides.

Step 0: Get the data from the file

We are providing you with the code for reading a CSV file as a list of dictionaries, so that you can focus on sorting.

In [63]:
import csv
In [64]:
# Read content of CSV file as a list of dictionaries

with open("us-states-more.csv", 'r') as inputF:
    dctReader = csv.DictReader(inputF)
    stateDctList = list(dctReader)
    
len(stateDctList)
Out[64]:
50

Look up an item of this list:

In [65]:
stateDctList[0]
Out[65]:
OrderedDict([('State', 'Alabama'),
             ('StatePop', '4921532'),
             ('Abbrev.', 'AL'),
             ('Capital', 'Montgomery'),
             ('CapitalPop', '198525')])

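Python displays this item as an OrderedDict, which looks like a list of tuples, but it is in fact a dictionary. (In Python 3.8 and later, csv.DictReader returns regular dictionaries, so your display may differ.) Either way, you can access its keys and values in the usual way: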

In [66]:
for key in stateDctList[0]:
    print(f"{key}: {stateDctList[0][key]}")
State: Alabama
StatePop: 4921532
Abbrev.: AL
Capital: Montgomery
CapitalPop: 198525
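
Note that csv.DictReader reads every field as a string, including the populations. The sketch below shows why this matters for sorting. So that it can run without the file, it reads three of the rows shown at the top of this notebook from an in-memory string (using io.StringIO); the helpers popAsString and popAsInt are hypothetical names.

```python
import csv
import io

sampleCsv = """State,StatePop,Abbrev.,Capital,CapitalPop
Alabama,4921532,AL,Montgomery,198525
Alaska,731158,AK,Juneau,32113
California,39368078,CA,Sacramento,513624
"""

rows = list(csv.DictReader(io.StringIO(sampleCsv)))
print(type(rows[0]['StatePop']))   # <class 'str'>

def popAsString(stateDct):
    """Hypothetical key that forgets to convert: compares digit characters."""
    return stateDct['StatePop']

def popAsInt(stateDct):
    """Hypothetical key that converts the population to a number first."""
    return int(stateDct['StatePop'])

wrongOrder = [row['State'] for row in sorted(rows, key=popAsString)]
rightOrder = [row['State'] for row in sorted(rows, key=popAsInt)]
print(wrongOrder)   # ['California', 'Alabama', 'Alaska'], because '3' < '4' < '7'
print(rightOrder)   # ['Alaska', 'Alabama', 'California']
```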

Question 1: Find the top most populated US States

The steps for answering this question can be found also in Slide 20 of our lecture notes, but here they are for simplicity of access:

  • Create a helper function byStatePop, which, given a dictionary with state data (one row from our file), returns the appropriate value. Remember that all values in the dictionary are strings, because they come from the CSV file.
  • Apply the sorted function to the list of dictionaries of state data (stateDctList), using the key parameter that has as value byStatePop.
  • Look at the results. In which order are they sorted?
  • Include the function parameter reverse to change the order of sorting.
  • Use f-string formatting to print out top six results as shown below.

You can find the results of the printing in Slide 20.

In [67]:
# Write the function byStatePop
# Your code here

def byStatePop(stateDct):
    """Helper function to be used in sorted."""
    return int(stateDct['StatePop'])
In [68]:
# Sort the list in descending order, save it in a variable
# Your code here

sortedStates1 = sorted(stateDctList, key=byStatePop, reverse=True)
In [69]:
# Print out the top 6 entries (see Slide 20)
# Do not worry if you cannot get the formatting of the numbers, check the solution later
# Your code here

print("Top six most populated US states:\n")
for stateDct in sortedStates1[:6]:
    print(f"{stateDct['Abbrev.']} -> {int(stateDct['StatePop']):,}")
Top six most populated US states:

CA -> 39,368,078
TX -> 29,360,759
FL -> 21,733,312
NY -> 19,336,776
PA -> 12,783,254
IL -> 12,587,530

Question 2: Find the top least populated US States

This is very similar to question 1. You do not need to write a helper function, because you can use byStatePop. You only have to sort data in the ascending order (the default way that sorted works) and then print the top six values.

In [70]:
# First sort (save in a variable), and then print out the top 6 least populated states in the US
# Your code here

sortedStates2 = sorted(stateDctList, key=byStatePop)

print("Top six least populated US states:\n")
for stateDct in sortedStates2[:6]:
    print(f"{stateDct['Abbrev.']} -> {int(stateDct['StatePop']):,}")
Top six least populated US states:

WY -> 582,328
VT -> 623,347
AK -> 731,158
ND -> 765,309
SD -> 892,717
DE -> 986,809

Question 3: Find the top most populated US State capitals

This will be very similar to the two questions above, but first you'll have to write the helper function byCapitalPop. Once you do that, check out Slide 21 to see what kind of output to print. Do not worry about the formatting; just make sure you have the content.

In [71]:
# Write the function byCapitalPop
# Your code here

def byCapitalPop(stateDct):
    """Helper function to be used by sorted."""
    return int(stateDct['CapitalPop'])
In [72]:
# Sort the list in descending order, save it in a variable
# Your code here

sortedCapitals1 = sorted(stateDctList, key=byCapitalPop, reverse=True)
In [73]:
# Print the top 6 capitals together with some other data (See slide 21 for what information to include)
# Your code here

print("Top six most populated US state capitals:\n")

for stateDct in sortedCapitals1[:6]:
    combined = f"{stateDct['Capital']} ({stateDct['Abbrev.']})"
    print(f"{combined:18s} -> {int(stateDct['CapitalPop']):10,d}")
    
## What does the formatting mean?
# :18s means use a minimum width of 18 characters for this string, padding with spaces as necessary
# :10,d means use a minimum width of 10 characters for this integer, with , as the thousands separator
Top six most populated US state capitals:

Phoenix (AZ)       ->  1,680,992
Austin (TX)        ->    978,908
Columbus (OH)      ->    898,553
Indianapolis (IN)  ->    876,384
Denver (CO)        ->    727,211
Boston (MA)        ->    692,600
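
To see the effect of the two format specifications in isolation, here is a small sketch that wraps each formatted value in square brackets so the padding becomes visible. The values come from the output above; the variable names are ours.

```python
name = "Phoenix (AZ)"
population = 1680992

paddedName = f"[{name:18s}]"          # strings are left-aligned by default
paddedPop = f"[{population:10,d}]"    # numbers are right-aligned by default
plainPop = f"[{population:,}]"        # no width: only the thousands separator
smallPop = f"[{7855:10,d}]"

print(paddedName)   # [Phoenix (AZ)      ]
print(paddedPop)    # [ 1,680,992]
print(plainPop)     # [1,680,992]
print(smallPop)     # [     7,855]
```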

Question 4: Find the top least populated US State capitals

Very similar to Question 3, but the capitals will now be in ascending order and you'll only print the top 6 least populated.

In [74]:
# First sort (save in a variable), and then print out the top 6 least populated US state capitals
# Your code here

sortedCapitals2 = sorted(stateDctList, key=byCapitalPop)

print("Top six least populated US state capitals:\n")

for stateDct in sortedCapitals2[:6]:
    combined = f"{stateDct['Capital']} ({stateDct['Abbrev.']})"
    print(f"{combined:18s} -> {int(stateDct['CapitalPop']):10,d}")
Top six least populated US state capitals:

Montpelier (VT)    ->      7,855
Pierre (SD)        ->     13,646
Augusta (ME)       ->     18,681
Frankfort (KY)     ->     27,679
Juneau (AK)        ->     32,113
Helena (MT)        ->     32,315

Question 5: Find the top states with the highest percentage of population living in their capitals

The general algorithm for answering this question remains the same.

  • create a helper function byPercentage, that finds the percentage of people in a state living in its capital
  • sort with the help of this function (in descending order)
  • print out some useful information about this question (See Slide 22)
In [75]:
# Write the function byPercentage
# Your code here

def byPercentage(stateDct):
    """Helper function to be used by sorted."""
    return round(int(stateDct['CapitalPop'])/int(stateDct['StatePop'])*100, 2)
In [76]:
# Sort the list of state dicts by using the key function you wrote. Save the result in a variable.
# Your code here

sortedPercentages = sorted(stateDctList, key=byPercentage, reverse=True)
In [77]:
# Print the top six results (see what to print in Slide 22)
# Your code here

print("Top six US states with the largest population percentage living in the capital:\n")

for stateDct in sortedPercentages[:6]:
    combined = f"{stateDct['State']:15s} {byPercentage(stateDct):2.2f}%"
    print(f"{combined} of population lives in the capital, {stateDct['Capital']}.")
Top six US states with the largest population percentage living in the capital:

Hawaii          24.52% of population lives in the capital, Honolulu.
Arizona         22.65% of population lives in the capital, Phoenix.
Rhode Island    17.02% of population lives in the capital, Providence.
Oklahoma        16.46% of population lives in the capital, Oklahoma City.
Nebraska        14.92% of population lives in the capital, Lincoln.
Indiana         12.97% of population lives in the capital, Indianapolis.
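
As a worked check of the arithmetic, the sketch below recomputes the percentage for two states using the numbers from the rows shown at the top of this notebook. It is a variation on the solution above, not a replacement: the hypothetical helper percentInCapital returns the unrounded value, and rounding happens only when printing, which avoids creating artificial ties in the sort key.

```python
def percentInCapital(stateDct):
    """Hypothetical key: capital population as a percentage of state population."""
    return int(stateDct['CapitalPop']) / int(stateDct['StatePop']) * 100

arizona = {'State': 'Arizona', 'StatePop': '7421401',
           'Capital': 'Phoenix', 'CapitalPop': '1680992'}
alabama = {'State': 'Alabama', 'StatePop': '4921532',
           'Capital': 'Montgomery', 'CapitalPop': '198525'}

ranked = sorted([alabama, arizona], key=percentInCapital, reverse=True)
for stateDct in ranked:
    print(f"{stateDct['State']:15s} {percentInCapital(stateDct):.2f}%")
# Arizona         22.65%
# Alabama         4.03%
```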

This is the end of this notebook!