CS 111 Lecture: List Comprehension and List Sorting

Table of Contents

1. List Comprehensions simplify mapping and filtering
1.1 Exercise 1: State comprehensions
1.2 Exercise 2: makeSquarePairs
2. List Comprehension with both mapping and filtering
2.1 Exercise 3: More state comprehensions
3. Sorting a list of numbers
4. Sorting other sequences
5. Sorting a list of sequences
5.1 Sorting a list of strings
5.2 Sorting a list of tuples
6. Sorting with the key keyword parameter
6.1 Exercise 3: Sorting people by the length of names
6.2 Breaking ties with key functions
6.3 Exercise 4: Another tiebreaker
6.4 How does sorting with key work?
7. The split and join methods
8. Mutating list methods for sorting

1. List Comprehensions simplify mapping and filtering

We can simplify the mapping/filtering patterns with a syntactic device called list comprehension.

# The old way of doing mapping

nums = [17, 42, 6, 23, 38]
result = []
for x in nums:
    result.append(x*2)
result # list value at the end of mapping process

[34, 84, 12, 46, 76]

# The new way of doing mapping with a list comprehension: 

[x*2 for x in nums]

[34, 84, 12, 46, 76]

# The old way of doing filtering

nums = [17, 42, 6, 23, 38]
result = []
for n in nums:
    if n%2 == 0:
        result.append(n)
result # list value at the end of filtering process

[42, 6, 38]

# The new way of doing filtering

[n for n in nums if n%2 == 0]

[42, 6, 38]

1.1 Exercise 1: State comprehensions

Consider the following list of US states:

states = ["Alabama", "California", "Illinois", "Massachusetts", 
          "Michigan", "Ohio", "Oklahoma", "Washington"]

Ex 1a: Use list comprehension to write a single line that creates a list of the lengths of the strings in states:

# Your code here
[len(state) for state in states]

[7, 10, 8, 13, 8, 4, 8, 10]

Ex 1b: Use list comprehension to write a single line that creates a list of the abbreviations of the states in states
(i.e., ['AL', 'CA', 'IL', 'MA', 'MI', 'OH', 'OK', 'WA'])

# Your code here
[state[0:2].upper() for state in states]

['AL', 'CA', 'IL', 'MA', 'MI', 'OH', 'OK', 'WA']

Ex 1c: Use list comprehension to write a single line that creates a list of all states in states that end in 'a':

# Your code here
[state for state in states if state[-1]=='a']

['Alabama', 'California', 'Oklahoma']

1.2 Exercise 2: `makeSquarePairs`

Define the function makeSquarePairs that, given a single integer num, returns a list of tuples, each pairing a number from 1 to num (inclusive) with its square. An example is shown below:

makeSquarePairs(5)
[(1, 1), (2, 4), (3, 9), (4, 16), (5, 25)]

Use list comprehension!

def makeSquarePairs(num):
    # flesh out the body of this function
    # you only need ONE line of code
    # Your code here
    return [(n, n*n) for n in range(1, num+1)]

makeSquarePairs(5)

[(1, 1), (2, 4), (3, 9), (4, 16), (5, 25)]

2. List Comprehension with both mapping and filtering

It is possible to do both mapping and filtering in a single list comprehension. The example below keeps only the even numbers in range(10) and creates a new list of their squares.

# Mapping and filtering together

[(x**2) for x in range(10) if x % 2 == 0]

[0, 4, 16, 36, 64]

Note that the expression for mapping still comes before the for keyword and the filtering with the if keyword still comes after the sequence expression. Below is the equivalent code without list comprehension.

newList = []
for x in range(10):
    if x % 2 == 0:
        newList.append(x**2)
newList

[0, 4, 16, 36, 64]

2.1 Exercise 3: More state comprehensions

Here's a second list of states:

states2 = ['Arkansas', 'Idaho', 'North Carolina', 'New Mexico', 
           'Oregon', 'Rhode Island', 'South Dakota', 'Utah']

Ex 3a: Use list comprehension to write a single line that creates a list of the abbreviations of only the one-word states in states2
(i.e., ['AR', 'ID', 'OR', 'UT'])

# Your code here
[state[0:2].upper() for state in states2 if ' ' not in state]

['AR', 'ID', 'OR', 'UT']

Ex 3b: Use list comprehension to write a single line that creates a list of the abbreviations of only the two-word states in states2
(i.e., ['NC', 'NM', 'RI', 'SD'])

Reminder: the .split() method of strings can split a string into multiple parts wherever spaces occur.

# Your code here
# Solution #1
[state.split()[0][0] + state.split()[1][0] for state in states2 if ' ' in state]

# Solution #2 uses nested list comprehensions to avoid calling .split() twice on a state.
[strings[0][0] + strings[1][0] 
    for strings in [state.split() for state in states2] 
    if len(strings) == 2]

['NC', 'NM', 'RI', 'SD']

3. Sorting a list of numbers

Let's start with the simplest case, sorting a list of unordered numbers, positive and negative.

numbers = [35, -2, 17, -9, 0, 12, 19]

We will use Python's built-in function sorted to sort the list. This function always returns a new list.

sorted(numbers)

[-9, -2, 0, 12, 17, 19, 35]

And we can verify that numbers hasn't changed:

numbers

[35, -2, 17, -9, 0, 12, 19]

Note: This means that if we want to use the result of sorted later, we must save its returned value in a variable, for example:

sortedNumbers = sorted(numbers)

By default, the list is sorted in ascending order (from the smallest value to the largest), but we can easily reverse the order using the reverse keyword parameter of the function sorted, as shown below:

sorted(numbers, reverse=True)

[35, 19, 17, 12, 0, -2, -9]

You have seen keyword parameters to functions before in the context of print. Remember the sep and end keyword parameters for print?
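As a quick refresher, here is a small sketch of keyword parameters in action. The sample values are made up for illustration; the point is only that sep, end, and reverse are all passed by name.

```python
# Keyword parameters are passed by name, after the positional arguments.
print('a', 'b', 'c', sep='-')        # prints a-b-c
print('no newline', end='!')         # prints no newline! and stays on the same line
print()                              # finish the line

# sorted works the same way: reverse is passed by name.
descending = sorted([3, 1, 2], reverse=True)
print(descending)                    # prints [3, 2, 1]
```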

4. Sorting other sequences

Strings and tuples can also be sorted in the same way. The result is always going to be a new list.

Characters in a string are ordered by their character codes, which for lowercase letters matches alphabetical order:

sorted('facetiously')

['a', 'c', 'e', 'f', 'i', 'l', 'o', 's', 't', 'u', 'y']

phrase = 'Red Code 1'
sorted(phrase)

[' ', ' ', '1', 'C', 'R', 'd', 'd', 'e', 'e', 'o']

Question: Why do we see a space as the first element in the sorted list of characters for phrase?
Answer: Because characters are ordered by their ASCII codes, and the space character has a smaller ASCII code than any digit or letter.

We can use the Python built-in function ord to find the ASCII code of every character:

ord(' ')

32

We can write a for loop to print the code for every character.

for item in sorted(phrase):
    print(f"'{item}' has ASCII code {ord(item)}")

' ' has ASCII code 32
' ' has ASCII code 32
'1' has ASCII code 49
'C' has ASCII code 67
'R' has ASCII code 82
'd' has ASCII code 100
'd' has ASCII code 100
'e' has ASCII code 101
'e' has ASCII code 101
'o' has ASCII code 111

The above example uses a so-called f-string of the form f"...", where "..." is a template string in which parts delimited by curly braces contain Python expressions whose values are automatically converted to strings and then are inserted into the template via concatenation.
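A small sketch of how the parts in curly braces are evaluated may help. The variables letter, code, and message are names introduced here only for illustration.

```python
letter = 'C'          # sample character chosen for illustration
code = ord(letter)    # 67

# Each {...} holds a Python expression; its value is converted to a string.
message = f"{letter} has code {code}; the next code is {code + 1}"
print(message)        # C has code 67; the next code is 68
```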

For example, f"'{item}' has ASCII code {ord(item)}" is just a more convenient way to write the harder-to-read concatenation expression

"'" + item + "' has ASCII code " + str(ord(item))

Just as in the case of the list numbers in the above example, the string value of phrase hasn't changed:

phrase

'Red Code 1'

This is to be expected, because strings are immutable.

Tuples can be sorted too

digits = (9, 7, 5, 3, 1) # this is a tuple

type(digits) # check the type

tuple

sorted(digits)

[1, 3, 5, 7, 9]

Notice that the result of the sorting is a list, not a tuple. This is because the function sorted always returns a list.

digits

(9, 7, 5, 3, 1)

The original tuple value hasn't changed.
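If a tuple is needed rather than a list, one option is to pass the sorted list to the built-in tuple function. This is a minimal sketch; sortedDigits is a name introduced here for illustration.

```python
digits = (9, 7, 5, 3, 1)

sortedDigits = tuple(sorted(digits))   # sorted returns a list; tuple converts it
print(sortedDigits)                    # (1, 3, 5, 7, 9)
print(digits)                          # (9, 7, 5, 3, 1), still unchanged
```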

5. Sorting a list of sequences

We can sort a list of sequences, such as a list of strings, a list of tuples, or a list of lists.
Sorting a list of tuples and sorting a list of lists work similarly; the same principles apply.

5.1 Sorting a list of strings

# a long string that we will split into a list of words

phrase = "99 red balloons *floating* in the Summer sky" 
words = phrase.split()
words

['99', 'red', 'balloons', '*floating*', 'in', 'the', 'Summer', 'sky']

sorted(words)

['*floating*', '99', 'Summer', 'balloons', 'in', 'red', 'sky', 'the']

Question: Can you explain the results of sorting here? What rules are in place?
Answer: Words that start with special characters come first, then words that start with digits, words starting with uppercase letters, and finally, words with lowercase letters in alphabetical order. This ordering follows the ASCII codes of the characters: strings are compared character by character, so the first character decides unless two words start the same way.

*String characters are ordered by these rules:*

Special symbols, including whitespace characters (tab, newline, space, etc.)
Some punctuation symbols (! + , . etc.)
Digits (0 1 2 etc.)
A few more punctuation symbols (: < ? etc.)
Uppercase letters (A B C etc.)
A few more punctuation symbols (^ _ etc.)
Lowercase letters (a b c etc.)
Final punctuation symbols (| ~ etc.)
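The rules above can be checked directly with the < operator on strings, which uses the same comparison that sorted relies on. In this sketch, extraWords is a hypothetical sample list, separate from words.

```python
# Each comparison is decided by the first position where the strings differ.
print('Summer' < 'balloons')   # True: ord('S') is 83, ord('b') is 98
print('red' < 'ring')          # True: 'r' ties, then 'e' (101) < 'i' (105)
print('sky' < 'skyline')       # True: a prefix comes before the longer string
print('99' < '*floating*')     # False: ord('9') is 57, ord('*') is 42

extraWords = ['skyline', 'sky', 'Sky', '9 lives']   # hypothetical sample list
print(sorted(extraWords))      # ['9 lives', 'Sky', 'sky', 'skyline']
```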

sorted(words, reverse=True)

['the', 'sky', 'red', 'in', 'balloons', 'Summer', '99', '*floating*']

Remember, the original list is unchanged:

words

['99', 'red', 'balloons', '*floating*', 'in', 'the', 'Summer', 'sky']

5.2 Sorting a list of tuples

Tuples are compared element by element, starting with the one at index 0. This is known as lexicographic order, which is a generalization of dictionary order on strings in which each tuple element generalizes a character in a string.

triples = [(8, 'a', '$'), (7, 'c', '@'),
           (7, 'b', '+'), (8, 'a', '!')] 

sorted(triples)

[(7, 'b', '+'), (7, 'c', '@'), (8, 'a', '!'), (8, 'a', '$')]

Q: What happens in the case of ties for the first elements of tuples?
A: We keep comparing elements with the same indices until we find two that are not the same. (See example for the two tuples that start with 8.)

ord('!') < ord('$')

True

print(ord('!'), ord('$'))

33 36

That is, '!' is less than '$' because the former has a smaller ASCII code than the latter.
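The element-by-element rule can also be observed by comparing tuples directly with <. The sketch below reuses the tuples from triples and notes which index decides each comparison.

```python
# Tuples can be compared directly with <; sorted relies on the same comparison.
print((7, 'c', '@') < (8, 'a', '!'))   # True: decided at index 0, since 7 < 8
print((7, 'b', '+') < (7, 'c', '@'))   # True: index 0 ties, then 'b' < 'c'
print((8, 'a', '!') < (8, 'a', '$'))   # True: indices 0 and 1 tie, then '!' < '$'
```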

6. Sorting with the `key` keyword parameter

Often we want to sort by an element that is not first in a sequence. For example, given the list of tuples people (below), we might want to sort by the age of each person.

people = [('Mary Beth Johnson', 18), 
          ('Ed Smith', 17), 
          ('Janet Doe', 25), 
          ('Bob Miller', 31)]

Simply using sorted as we have done so far will not work, but the function sorted was designed with this scenario in mind. Let us read its docstring.

help(sorted)

Help on built-in function sorted in module builtins:

sorted(iterable, /, *, key=None, reverse=False)
    Return a new list containing all items from the iterable in ascending order.
    
    A custom key function can be supplied to customize the sort order, and the
    reverse flag can be set to request the result in descending order.

Notice the phrase: A custom key function can be supplied to customize the sort order. This means that we can specify a function that for each element determines how it should be compared to other elements of the iterable. Let us see an example.

people = [('Mary Beth Johnson', 18), 
          ('Ed Smith', 17), 
          ('Janet Doe', 25), 
          ('Bob Miller', 31)]

We'll create the function age that given a person tuple (name, age) will return the age value.

def age(personTuple):
    """Helper function to use in sorted"""
    return personTuple[1]

age(('Janet Doe', 25))

25

Now that we have this function, we will use it as the value for the key keyword parameter in sorted.

sorted(people, key=age)

[('Ed Smith', 17),
 ('Mary Beth Johnson', 18),
 ('Janet Doe', 25),
 ('Bob Miller', 31)]

The list was sorted by the age values! Let's see one more example. We will create a helper function lastName that returns a person's last name.

def lastName(personTuple):
    """Helper function to use in sorted"""
    return personTuple[0].split()[-1]        # first access the whole name (has index=0 in the tuple)
                                             # then split it (will create a list), 
                                             # then return its last element (index=-1)

lastName(('Bob Miller', 31))

'Miller'

sorted(people, key=lastName)

[('Janet Doe', 25),
 ('Mary Beth Johnson', 18),
 ('Bob Miller', 31),
 ('Ed Smith', 17)]

Important: The keyword parameter key is being assigned a function as its value. Functions in Python are values, as the examples below show:

age

<function __main__.age(personTuple)>

lastName

<function __main__.lastName(personTuple)>

We can create a variable, assign it a function value, and then call that variable as if it was a function (because indeed it's an alias for a function).

boo = age
boo(('Janet Doe', 25))

25

foo = lastName
foo(('Ed Smith', 17))

'Smith'

The variables boo and foo are aliases for the functions age and lastName, which we can easily verify:

boo

<function __main__.age(personTuple)>

foo

<function __main__.lastName(personTuple)>

sorted(people, key=boo) # boo is an alias for age

[('Ed Smith', 17),
 ('Mary Beth Johnson', 18),
 ('Janet Doe', 25),
 ('Bob Miller', 31)]

sorted(people, key=foo) # foo is an alias for lastName

[('Janet Doe', 25),
 ('Mary Beth Johnson', 18),
 ('Bob Miller', 31),
 ('Ed Smith', 17)]
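Because functions are values, they can also be stored in a list and passed to sorted one at a time. The following is a sketch only; keyFunctions and firstPerson are names introduced here for illustration, and people, age, and lastName are repeated so the block runs on its own.

```python
people = [('Mary Beth Johnson', 18), ('Ed Smith', 17),
          ('Janet Doe', 25), ('Bob Miller', 31)]

def age(personTuple):
    return personTuple[1]

def lastName(personTuple):
    return personTuple[0].split()[-1]

# A list whose elements are functions (no parentheses, so nothing is called here).
keyFunctions = [age, lastName]

for keyFunction in keyFunctions:
    # The person who comes first when sorting with this key function
    firstPerson = sorted(people, key=keyFunction)[0]
    print(keyFunction.__name__, '->', firstPerson)

# age -> ('Ed Smith', 17)
# lastName -> ('Janet Doe', 25)
```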

6.1 Exercise 3: Sorting `people` by the length of names

Suppose we want to sort people in ascending order by the lengths of their names. That is, the sorted result should be:

[('Ed Smith', 17),
 ('Janet Doe', 25),
 ('Bob Miller', 31),
 ('Mary Beth Johnson', 18)]

Define a helper function nameLength that can be used as the value of the key parameter for sorted to sort by name length:

# define the nameLength function below
# Your code here
def nameLength(personTuple):
    return len(personTuple[0])

sorted(people, key=nameLength)

[('Ed Smith', 17),
 ('Janet Doe', 25),
 ('Bob Miller', 31),
 ('Mary Beth Johnson', 18)]

6.2 Breaking ties with `key` functions

Assume we have a new list of person tuples, where there are lots of ambiguities in terms of what comes first. Concretely:

people2 = [('Ed Jones', 18), 
           ('Bob Doe', 25), 
           ('Ed Doe', 18),
           ('Ana Doe', 25), 
           ('Ana Jones', 18)]

Notice that we have several individuals with the same age, or the same first name, or the same last name. How should we sort elements in this situation?

We can create a function that uses a tuple to break the ties.

def ageLastFirst(person):
    return (age(person), lastName(person), firstName(person))

Your Turn

Define a function firstName that mimics lastName but returns the first name of a person.

# Your code here

def firstName(personTuple):
    """Helper function to use in sorted"""
    return personTuple[0].split()[0]

If you defined firstName, now we can write ageLastFirst:

def ageLastFirst(person):
    """Helper function to use in sorted"""
    return (age(person), lastName(person), firstName(person))

people2 = [('Ed Jones', 18), 
           ('Bob Doe', 25),
           ('Ed Doe', 18),
           ('Ana Doe', 25),
           ('Ana Jones', 18)]

sorted(people2, key=ageLastFirst)

[('Ed Doe', 18),
 ('Ana Jones', 18),
 ('Ed Jones', 18),
 ('Ana Doe', 25),
 ('Bob Doe', 25)]

Notice that in the result, the tuples are sorted first by age, then by last name (when the same age), and then by first name (when same age and last name).
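To see why this works, it can help to look at the tuples that ageLastFirst produces. The sketch below repeats the definitions so that it runs on its own; keyTuples is a name introduced here for illustration.

```python
people2 = [('Ed Jones', 18), ('Bob Doe', 25), ('Ed Doe', 18),
           ('Ana Doe', 25), ('Ana Jones', 18)]

def age(personTuple):
    return personTuple[1]

def lastName(personTuple):
    return personTuple[0].split()[-1]

def firstName(personTuple):
    return personTuple[0].split()[0]

def ageLastFirst(person):
    return (age(person), lastName(person), firstName(person))

# The key tuple computed for each person, in the original order of people2.
for person in people2:
    print(person, '->', ageLastFirst(person))

# Sorting the key tuples alone shows the order that sorted will follow.
keyTuples = [ageLastFirst(person) for person in people2]
print(sorted(keyTuples))
# [(18, 'Doe', 'Ed'), (18, 'Jones', 'Ana'), (18, 'Jones', 'Ed'),
#  (25, 'Doe', 'Ana'), (25, 'Doe', 'Bob')]
```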

6.3 Exercise 4: Another tiebreaker

Suppose we want to sort people2 first by the length of their names, and then sort names with the same length alphabetically. I.e., we want the result to be:

[('Ed Doe', 18),
 ('Ana Doe', 25),
 ('Bob Doe', 25), 
 ('Ed Jones', 18),
 ('Ana Jones', 18)]

Define a helper function nameLengthThenAlphabetic that can be used as the key parameter for sorted to perform sorting first by name length, and then alphabetically by name to break ties.

# Your code here
def nameLengthThenAlphabetic(person):
    return (nameLength(person), person[0]) # person[0] will sort names of same length alphabetically

sorted(people2, key=nameLengthThenAlphabetic)

[('Ed Doe', 18),
 ('Ana Doe', 25),
 ('Bob Doe', 25),
 ('Ed Jones', 18),
 ('Ana Jones', 18)]

6.4 How does sorting with `key` work?

When sorted is called with a key parameter, the first thing it does is invoke the function referred to by key on each element of the sequence. If we call the value returned by the key function keyvalue, then conceptually sorted creates a tuple (keyvalue, value) for each element, sorts the sequence based on these tuples, and then discards the tuples and returns only the sorted values. (This is a mental model: the built-in implementation compares only the key values, and elements with equal keys keep their original order.)

This process is also known as Decorate, Sort, Undecorate and we can try it too:

# Step 1 (Decorate): create a list of tuples (keyvalue, value)
decorated = [(age(person), person) for person in people]
decorated

[(18, ('Mary Beth Johnson', 18)),
 (17, ('Ed Smith', 17)),
 (25, ('Janet Doe', 25)),
 (31, ('Bob Miller', 31))]

# Step 2 (Sort): invoke the function sorted without the key function
decoratedSorted = sorted(decorated)
decoratedSorted

[(17, ('Ed Smith', 17)),
 (18, ('Mary Beth Johnson', 18)),
 (25, ('Janet Doe', 25)),
 (31, ('Bob Miller', 31))]

# Step 3 (Undecorate): extract the value from each (keyvalue, value) pair to create the end result
undecoratedResult = [item[1] for item in decoratedSorted]
undecoratedResult

[('Ed Smith', 17),
 ('Mary Beth Johnson', 18),
 ('Janet Doe', 25),
 ('Bob Miller', 31)]

As you might remember, when we include key in sorted the result is the same:

sorted(people, key=age)

[('Ed Smith', 17),
 ('Mary Beth Johnson', 18),
 ('Janet Doe', 25),
 ('Bob Miller', 31)]

Basically, the key parameter works because of the rules for sorting a list of tuples that we saw earlier in the notebook.
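One detail worth knowing: the decorate approach and the key parameter can differ when two elements have the same key value. The sketch below uses a made-up list, sameAge, to suggest that sorted with key leaves tied elements in their original order (Python's sort is stable), whereas sorting decorated tuples goes on to compare the values themselves.

```python
def age(personTuple):
    return personTuple[1]

# Hypothetical list in which two people share the same age.
sameAge = [('Zoe Young', 20), ('Al Brown', 20), ('Kim Lee', 19)]

# sorted with key compares only the ages; ties keep their original order.
withKey = sorted(sameAge, key=age)
print(withKey)     # [('Kim Lee', 19), ('Zoe Young', 20), ('Al Brown', 20)]

# Decorate, Sort, Undecorate compares whole (keyvalue, value) tuples,
# so the tie is broken by comparing the person tuples themselves.
decorated = [(age(person), person) for person in sameAge]
viaTuples = [item[1] for item in sorted(decorated)]
print(viaTuples)   # [('Kim Lee', 19), ('Al Brown', 20), ('Zoe Young', 20)]
```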

7. The `split` and `join` methods

There are two string methods that are quite useful when dealing with lists of strings, including when sorting them: split and join. When applied to a string, split breaks that string into pieces and returns a list of strings. Without any argument, the string is split wherever whitespace is found (i.e., spaces, newlines, tabs, etc.). If an argument is given, the string is split wherever that specific character or sequence of characters is found. For example:

'This is a sentence'.split()

['This', 'is', 'a', 'sentence']

'one-two-three'.split('-')

['one', 'two', 'three']

'To be or not to be'.split('o')

['T', ' be ', 'r n', 't t', ' be']

'I saw the cat befriend the mouse that ate the cheese'.split(' the ')

['I saw', 'cat befriend', 'mouse that ate', 'cheese']
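The difference between calling split with no argument and calling it with ' ' shows up when a string contains several whitespace characters in a row. A small sketch, with a made-up string named spaced:

```python
spaced = 'one  two\tthree\nfour'   # two spaces, a tab, and a newline

# No argument: any run of whitespace counts as a single split point.
print(spaced.split())      # ['one', 'two', 'three', 'four']

# With ' ': every single space is a split point, and tabs/newlines are not.
print(spaced.split(' '))   # ['one', '', 'two\tthree\nfour']
```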

The join method is the opposite of split, and you also give it information in the opposite order: the separator comes before .join, and the things to join together (a list of strings) are provided as an argument. It returns the string that results from concatenating all the strings in the list, separated by the separator. For example:

', '.join(['aardvark', 'bunny', 'cat', 'dingo'])

'aardvark, bunny, cat, dingo'

';'.join(['aardvark', 'bunny', 'cat', 'dingo'])

'aardvark;bunny;cat;dingo'

' '.join(['aardvark', 'bunny', 'cat', 'dingo'])

'aardvark bunny cat dingo'

''.join(['aardvark', 'bunny', 'cat', 'dingo'])

'aardvarkbunnycatdingo'

Split and join can be used together to separate off one part of a string and put the rest back together. For example, if you want to chop off the last word of a multi-word string, you can split it, slice the result, and then join that slice back together, like this:

text = 'every word is important'
' '.join(text.split()[:-1])

'every word is'

If we break that down into steps using variables, it would be:

text = 'every word is important'
words = text.split()
allButLast = words[:-1]
' '.join(allButLast)

'every word is'
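Since this notebook is about sorting, here is a sketch that combines split, sorted, and join to put the words of a sentence in sorted order. The names sentence, sortedSentence, and numberStrings are made up for illustration.

```python
sentence = 'the quick brown fox'          # sample sentence

# split into words, sort the list of words, then join them back with spaces
sortedSentence = ' '.join(sorted(sentence.split()))
print(sortedSentence)                     # brown fox quick the

# join needs strings, so numbers are converted with str first
numberStrings = [str(n) for n in sorted([35, -2, 17])]
print(', '.join(numberStrings))           # -2, 17, 35
```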

8. Mutating list methods for sorting

Lists have two methods for sorting: sort and reverse. These methods mutate the original list and return None rather than returning the mutated list.

Method `sort`

numbers = [35, -2, 17, -9, 0, 12, 19]
numbers.sort()

Notice that no value was returned, because sort mutates the original list.

numbers

[-9, -2, 0, 12, 17, 19, 35]
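A common slip is to assign the result of sort to a variable. The sketch below, using a hypothetical list named scores, shows what that variable ends up holding and contrasts it with sorted.

```python
scores = [35, -2, 17]              # hypothetical sample list

result = scores.sort()             # sort mutates scores and returns None
print(result)                      # None
print(scores)                      # [-2, 17, 35]

newScores = sorted([35, -2, 17])   # sorted returns a new list instead
print(newScores)                   # [-2, 17, 35]
```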

Method `reverse`

numbers2 = [35, -2, 17, -9, 0, 12, 19]
numbers2.reverse()

This method also does not return a value, because it too mutates the list.

numbers2

[19, 12, 0, -9, 17, -2, 35]

In combination, sort and reverse can sort a list in reverse order:

numbers2.sort()
numbers2.reverse()
numbers2

[35, 19, 17, 12, 0, -2, -9]

Like the `sorted` function, the `sort` method can also use the keyword parameters `key` and `reverse`

people.sort(key=age, reverse=True)
people

[('Bob Miller', 31),
 ('Janet Doe', 25),
 ('Mary Beth Johnson', 18),
 ('Ed Smith', 17)]

This is the end of this notebook!