CS 111 Lecture: List Comprehension and List Sorting¶

Table of Contents

List Comprehension
List Comprehension with mapping and filtering
Sorting a list of numbers
Sorting other sequences
Sorting a list of sequences
5.1 Sorting a list of strings
5.2 Sorting a list of tuples
Sorting with the key parameter
The split and join methods
Mutating list methods for sorting

1. List Comprehension¶

We can simplify the mapping/filtering patterns with a syntactic device called list comprehension.

# The old way of doing mapping

nums = [17, 42, 6]
result = []
for x in nums:
    result.append(x*2)
print(result)

[34, 84, 12]

# The new way of doing mapping

result = [x*2 for x in nums]
print(result)

[34, 84, 12]

# The old way of doing filtering

nums = [17, 42, 6]
result = []
for n in nums:
    if n%2 == 0:
        result.append(n)
print(result)

[42, 6]

# The new way of doing filtering

result = [n for n in nums if n%2 == 0]
print(result)

[42, 6]

Challenge Exercise¶

Write the function makeSquarePairs that given a single integer num returns a list of tuples containing all numbers from 1 to the num inclusive and their square values. An example is shown below:

makeSquarePairs(5)
[(1, 1), (2, 4), (3, 9), (4, 16), (5, 25)]

Use list comprehension!

def makeSquarePairs(num):
    # flesh out the body of this function
    # you only need ONE line of code
    # Your code here
    return [(n, n*n) for n in range(1, num+1)]

makeSquarePairs(5)

[(1, 1), (2, 4), (3, 9), (4, 16), (5, 25)]

2. List Comprehension with Mapping and Filtering¶

It is possible to do both mapping and filtering in a single list comprehension. Examine the example below which filters a list by even numbers and creates a new list of their squares.

# Mapping and filtering together

[(x**2) for x in range(10) if x % 2 == 0]

[0, 4, 16, 36, 64]

Note that the expression for mapping still comes before the "for" and the filtering with "if" still comes after the sequence. Below is the equivalent code without list comprehension.

newList = []
for x in range(10):
    if x % 2 == 0:
        newList.append(x**2)
newList

[0, 4, 16, 36, 64]

Challenge Exercise:

Write a list of comprehension that generates state abbreviations only for two-word states.

Reminder: the .split method of strings can split a string into multiple parts wherever spaces occur.

states = ['Alaska', 'Maine', 'North Carolina', 'New Mexico', 'Oregon', 'Rhode Island', 'Tennessee']
# expected result: ['NC', 'NM', 'RI']

# Your code here
# Solution #1
[state.split()[0][0] + state.split()[1][0] for state in states if ' ' in state]

# Solution #2 uses nested list comprehensions to avoid calling .split() twice on a state.
[strings[0][0] + strings[1][0] 
 for strings in [state.split() for state in states] 
     if len(strings) == 2]

['NC', 'NM', 'RI']

3. Sorting a list of numbers¶

Let's start with the simplest case, sorting a list of unordered numbers, positive and negative.

numbers = [35, -2, 17, -9, 0, 12, 19]

We will use Python's built-in function, sorted to sort the list. This function always returns a new list.

sorted(numbers)

[-9, -2, 0, 12, 17, 19, 35]

And we can verify that numbers hasn't changed:

numbers

[35, -2, 17, -9, 0, 12, 19]

Note: This suggests that if we want to use the result of sorted, we must define a variable to save its returned value, for example:

sortedNumbers = sorted(numbers)

By default the list is sorted in the ascending order (from the smalled value to the largest), but we can easily reverse the order, using a named parameter of the function sorted, as shown below:

sorted(numbers, reverse=True)

[35, 19, 17, 12, 0, -2, -9]

The use of the named parameter, reverse, in this function call is new syntax that you haven't seen before. We will get back to it later in this lecture.

4. Sorting other sequences¶

Strings and tuples can also be sorted in the same way. The result is always going to be a new list.

phrase = 'Red Code 1'

sorted(phrase)

[' ', ' ', '1', 'C', 'R', 'd', 'd', 'e', 'e', 'o']

Question: Why do we see a space as the first element in the sorted list?
Answer: Because of the ASCII representation of characters.

We can use the Python built-in function ord to find the ASCII code of every character:

ord(' ')

32

We can write a for loop to print the code for every character.

for item in sorted(phrase):
    print(f"'{item}' has ASCII code {ord(item)}")

' ' has ASCII code 32
' ' has ASCII code 32
'1' has ASCII code 49
'C' has ASCII code 67
'R' has ASCII code 82
'd' has ASCII code 100
'd' has ASCII code 100
'e' has ASCII code 101
'e' has ASCII code 101
'o' has ASCII code 111

The above example uses a so-called f-string of the form f"...", where "..." is a template string in which parts delimited by curly braces contain Python expressions whose values are automatically converted to strings and then are inserted into the template via concatenation.

For example, f"'{item}' has ASCII code {ord(item)}" is just a more convenient way to write the harder-to-read concatenation expression

"'" + item + "' has ASCII code " + str(ord(item))

Just as in the case of the list numbers in the above example, the string value of phrase hasn't changed:

phrase

'Red Code 1'

This is to be expected, because strings are immutable.

Tuples can be sorted too¶

digits = (9, 7, 5, 3, 1) # this is a tuple

type(digits) # check the type

tuple

sorted(digits)

[1, 3, 5, 7, 9]

Notice that the result of the sorting is a list, not a tuple. This is because the function sorted always returns a list.

digits

(9, 7, 5, 3, 1)

The original tuple value hasn't changed.

5. Sorting a list of sequences¶

We can sort list of sequences such as list of strings, list of tuples, and list of lists.
Sorting the list of tuples and the list of lists is going to be similar. The same principles will apply.

5. 1 Sorting a list of strings¶

# a long string that we will split into a list of words

phrase = "99 red balloons *floating* in the Summer sky" 
words = phrase.split()
words

['99', 'red', 'balloons', '*floating*', 'in', 'the', 'Summer', 'sky']

sorted(words)

['*floating*', '99', 'Summer', 'balloons', 'in', 'red', 'sky', 'the']

Question: Can you explain the results of sorting here? What rules are in place?
Answer: Words that start with special characters come first, then words that start with digits, words starting with uppercase letters, and finally, words with lowercase letters in alphabetical order. This ordering corresponds to the ASCII table numerical representations of each word's first character.

*String characters are ordered by these rules:*

Most punctuation symbols (. , ' etc.)
Digits
A few more punctuation symbols (: < ? etc.)
Uppercase letters
A few more punctuation symbols (^ _ etc.)
Lowercase letters
Final punctuation symbols (| ~ etc.)

sorted(words, reverse=True)

['the', 'sky', 'red', 'in', 'balloons', 'Summer', '99', '*floating*']

Remember, the original list is unchanged:

words

['99', 'red', 'balloons', '*floating*', 'in', 'the', 'Summer', 'sky']

5.2 Sorting a list of tuples¶

Tuples are compared element by element, starting with the one at index 0. This is known as lexicographic order, which is a generalization of dictionary order on strings in which each tuple element generalizes a character in a string.

triples = [(8, 'a', '$'), (7, 'c', '@'),
           (7, 'b', '+'), (8, 'a', '!')] 

sorted(triples)

[(7, 'b', '+'), (7, 'c', '@'), (8, 'a', '!'), (8, 'a', '$')]

Q: What happens in the case of ties for the first elements of tuples?
A: We keep comparing elements with the same indices until we find two that are not the same. (See example for the two tuples that start with 8.)

ord('!') < ord('$')

True

print(ord('!'), ord('$'))

33 36

That is, the reason '!' is less than '$' is that the first has a smaller ASCII code than the latter.

6. Sorting with the `key` parameter¶

Often there are cases in which we want to sort by an element that is not first in a sequence, for example, given the list of tuples people (below), we want to sort by the age of a person.

people = [('Mary Beth Johnson', 18), 
          ('Ed Smith', 17), 
          ('Janet Doe', 25), 
          ('Bob Miller', 31)]

Simply using sorted as we have done so far will not work. But the function sorted has been designed to deal with this scenario in mind. Let us read its doc string.

help(sorted)

Help on built-in function sorted in module builtins:

sorted(iterable, /, *, key=None, reverse=False)
    Return a new list containing all items from the iterable in ascending order.
    
    A custom key function can be supplied to customize the sort order, and the
    reverse flag can be set to request the result in descending order.

Notice the phrase: A custom key function can be supplied to customize the sort order. This means that we can specify a function that for each element determines how it should be compared to other elements of the iterable. Let us see an example.

people = [('Mary Beth Johnson', 18), 
          ('Ed Smith', 17), 
          ('Janet Doe', 25), 
          ('Bob Miller', 31)]

We'll create the function age that given a person tuple (name, age) will return the age value.

def age(personTuple):
    """Helper function to use in sorted"""
    return personTuple[1]

age(('Janet Doe', 25))

25

Now that we have this function, we will use it as the value for key in sorted.

sorted(people, key=age)

[('Ed Smith', 17),
 ('Mary Beth Johnson', 18),
 ('Janet Doe', 25),
 ('Bob Miller', 31)]

The list was sorted by the age values! Let's see one more example. We will create a helper function lastName that returns a person's last name.

def lastName(personTuple):
    """Helper function to use in sorted"""
    return personTuple[0].split()[-1]        # first access the whole name (has index=0 in the tuple)
                                             # then split it (will create a list), 
                                             # then return its last element (index=-1)

lastName(('Bob Miller', 31))

'Miller'

sorted(people, key=lastName)

[('Janet Doe', 25),
 ('Mary Beth Johnson', 18),
 ('Bob Miller', 31),
 ('Ed Smith', 17)]

Important: The parameter key is being assigned as its value a function value. Functions in Python are values, see the examples below:

age

<function __main__.age(personTuple)>

lastName

<function __main__.lastName(personTuple)>

We can create a variable, assign it a function value, and then call that variable as if it was a function (because indeed it's an alias for a function).

boo = age
boo(('Janet Doe', 25))

25

foo = lastName
foo(('Ed Smith', 17))

'Smith'

The variables boo and foo are aliases for the functions age and lastName, which we can easily verify:

boo

<function __main__.age(personTuple)>

foo

<function __main__.lastName(personTuple)>

sorted(people, key=boo) # boo is an alias for age

[('Ed Smith', 17),
 ('Mary Beth Johnson', 18),
 ('Janet Doe', 25),
 ('Bob Miller', 31)]

sorted(people, key=foo) # foo is an alias for lastName

[('Janet Doe', 25),
 ('Mary Beth Johnson', 18),
 ('Bob Miller', 31),
 ('Ed Smith', 17)]

Breaking ties with `key` functions¶

Assume we have a new list of person tuples, where there are lots of ambiguities in terms of what comes first. Concretely:

people2 = [('Ed Jones', 18), 
           ('Ana Doe', 25), 
           ('Ed Doe', 18),
           ('Bob Doe', 25), 
           ('Ana Jones', 18)]

Notice that we have several individuals with the same age, or the same first name, or the same last name. How should we sort elements in this situation?

We can create a function that uses a tuple to break the ties.

def ageLastFirst(person):
    return (age(person), lastName(person), firstName(person))

Your Turn

Write a function firstName, that mimics lastName, but returns the first name of a person.

# Your code here

def firstName(personTuple):
    """Helper function to use in sorted"""
    return personTuple[0].split()[0]

If you wrote firstName, now we can write ageLastFirst:

def ageLastFirst(person):
    """Helper function to use in sorted"""
    return (age(person), lastName(person), firstName(person))

people2 = [('Ed Jones', 18), 
           ('Ana Doe', 25), 
           ('Ed Doe', 18),
           ('Bob Doe', 25), 
           ('Ana Jones', 18)]

sorted(people2, key=ageLastFirst)

[('Ed Doe', 18),
 ('Ana Jones', 18),
 ('Ed Jones', 18),
 ('Ana Doe', 25),
 ('Bob Doe', 25)]

Notice that in the result, the tuples are sorted first by age, then by last name (when the same age), and then by first name (when same age and last name).

How does sorting with `key` work?¶

When sorted is called with a key parameter, the first thing it does is to invoke the function that is referred to by key for each element of the sequence. If we think of the value returned by the key function as keyvalue, then what sorted does is to create a tuple (keyvalue,value), sort the sequence based on this tuple, and then get rid of the tuple and return the sorted values only.

This process is also known as Decorate, Sort, Undecorate and we can try it too:

# Step 1 (Decorate): create a list of tuples (keyvalue, value)
decorated = [(age(person), person) for person in people]
decorated

[(18, ('Mary Beth Johnson', 18)),
 (17, ('Ed Smith', 17)),
 (25, ('Janet Doe', 25)),
 (31, ('Bob Miller', 31))]

# Step 2 (Sort): invoke the function sorted without the key function
decoratedSorted = sorted(decorated)
decoratedSorted

[(17, ('Ed Smith', 17)),
 (18, ('Mary Beth Johnson', 18)),
 (25, ('Janet Doe', 25)),
 (31, ('Bob Miller', 31))]

# Step 3 (Undecorate): extract now the value from each (keyvalue,value) pair to create the end result
undecoratedResult = [item[1] for item in decoratedSorted]
undecoratedResult

[('Ed Smith', 17),
 ('Mary Beth Johnson', 18),
 ('Janet Doe', 25),
 ('Bob Miller', 31)]

As you might remember, when we include key in sorted the result is the same:

sorted(people, key=age)

[('Ed Smith', 17),
 ('Mary Beth Johnson', 18),
 ('Janet Doe', 25),
 ('Bob Miller', 31)]

Basically, the parameter key works, becuase of the rules for sorting a list of tuples, that we saw earlier on the notebook.

7. The `split` and `join` methods¶

There are two string methods that are quite useful when dealing with lists of strings, including when sorting them. These are split and join. split when applied to a string will split that string up into pieces, returning a list of strings. Without any argument, the string will be split wherever whitespace is found (i.e., spaces, newlines, tabs, etc.). If an argument is given, wherever that specific character or sequence of characters is found will become a split point. For example:

'This is a sentence'.split()

['This', 'is', 'a', 'sentence']

'one-two-three'.split('-')

['one', 'two', 'three']

The join method is the opposite of split, and you also give it infromation in the opposite order: the separator comes before .join and the things to join together (a list of strings) is provided as an argument. It returns the string that results from concatenating all the strings in the list separated by the separator. For example:

', '.join(['aardvark', 'bunny', 'cat', 'dingo'])

'aardvark, bunny, cat, dingo'

';'.join(['aardvark', 'bunny', 'cat', 'dingo'])

'aardvark;bunny;cat;dingo'

' '.join(['aardvark', 'bunny', 'cat', 'dingo'])

'aardvark bunny cat dingo'

''.join(['aardvark', 'bunny', 'cat', 'dingo'])

'aardvarkbunnycatdingo'

Split and join can be used together to separate off one part of a string and put the rest back together. For example, if you want to chop off the last word of a multi-word string, you can split it, slice the result, and then join that slice back together, like this:

text = 'every word is important'
' '.join(text.split()[:-1])

'every word is'

If we break that down into steps using variables, it would be:

text = 'every word is important'
words = text.split()
allButLast = words[:-1]
' '.join(allButLast)

'every word is'

8. Mutating list methods for sorting¶

Lists have two methods for sorting. These methods mutate the original list. They are sort and reverse.

Method `sort`¶

numbers = [35, -2, 17, -9, 0, 12, 19]
numbers.sort()

Notice that no value was returned, because sort mutates the original list.

numbers

[-9, -2, 0, 12, 17, 19, 35]

Method `reverse`¶

numbers2 = [35, -2, 17, -9, 0, 12, 19]
numbers2.reverse()

This method also does not return a value, because it too mutates the list.

numbers2

[19, 12, 0, -9, 17, -2, 35]

In combination, sort and reverse can sort a list in reverse order:

numbers2.sort()
numbers2.reverse()
numbers2

[35, 19, 17, 12, 0, -2, -9]

We can use the parameters `key` and `reverse` as well¶

people.sort(key=age, reverse=True)
people

[('Bob Miller', 31),
 ('Janet Doe', 25),
 ('Mary Beth Johnson', 18),
 ('Ed Smith', 17)]

This is the end of this notebook!