1. Opening a file

A file is an object that is created by the built-in function open.

In [1]:
myFile = open('thesis.txt', 'r') # 'r' means open the file for reading
print(myFile)
<_io.TextIOWrapper name='thesis.txt' mode='r' encoding='US-ASCII'>

The object returned from open() is a type of file object called io.TextIOWrapper. In short, it is a type of file object that interprets the file as a stream of text.

In [2]:
type(myFile)
Out[2]:
_io.TextIOWrapper

The mode 'r' is optional, because reading is the default mode of open:

In [3]:
myFile2 = open('thesis.txt')
myFile2
Out[3]:
<_io.TextIOWrapper name='thesis.txt' mode='r' encoding='US-ASCII'>

Aside: Note that the encoding shown above (US-ASCII) is the default on the machine that produced this output. When no encoding argument is given, Python chooses a default text encoding that depends on the platform and its settings. Recall the ASCII table: http://www.asciitable.com/. There are other encodings for text, in particular Unicode encodings such as UTF-8, which cover most of the world's writing systems. To learn more about how the computer interprets text encoding, take CS240!
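The encoding can also be stated explicitly instead of being left to the default. The following is a minimal sketch, assuming a scratch file in a temporary directory (the file name cafe.txt is made up for illustration and is not one of the lecture files):

```python
import os
import tempfile

# Hypothetical scratch file, created only for this illustration.
path = os.path.join(tempfile.mkdtemp(), 'cafe.txt')

# Write text containing a non-ASCII character, naming the encoding explicitly.
with open(path, 'w', encoding='utf-8') as f:
    f.write('café\n')

# Read it back with the same encoding.
with open(path, 'r', encoding='utf-8') as f:
    text = f.read()

print(text)  # the accented character survives the round trip
```

Naming the encoding on both the write and the read means the result does not depend on which machine runs the code.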

2. Reading from a file: first cut

The simplest way to read a file is to call .read() on a file object, which returns the contents of the file as a string of characters.

In [4]:
text = myFile.read() # read (as a string) the contents of the file myFile opened above
text
Out[4]:
'Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis vel fringilla eros. Curabitur vel mollis odio. In vulputate ex et vulputate dignissim. Pellentesque vel leo et ex malesuada facilisis ac sit amet elit. Duis metus arcu, tincidunt vel suscipit quis, venenatis at ex. Duis id lacus sit amet felis tristique hendrerit vel nec nulla. Morbi odio elit, consectetur a rhoncus a, vestibulum at tortor. Suspendisse leo felis, molestie ac justo bibendum, molestie sagittis tortor. Etiam venenatis, leo eget imperdiet pharetra, nulla risus finibus arcu, et hendrerit sem enim eget tortor.\n\nVestibulum commodo euismod enim, eget lacinia eros vehicula id. Phasellus tempor odio in commodo sagittis. Aliquam non condimentum orci. Curabitur gravida blandit nulla id suscipit. Aenean lacus nisl, convallis at felis et, interdum pellentesque eros. Nunc tincidunt, purus at rutrum elementum, ipsum nulla pretium nibh, id placerat odio libero quis libero. In pulvinar tortor cursus orci scelerisque aliquet. Etiam neque purus, posuere ac interdum a, euismod eu tortor. Vestibulum in arcu odio.\n'

Reading the contents of an open file object mutates it: the object keeps track of a position in the file, and reading moves that position to the end. As a result, subsequent read operations on the same open file return the empty string:

In [5]:
myFile.read()
Out[5]:
''

If we want to read the file again, we can open a new Python file object for that file. But before we do that, we should close the file; see the next section.
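Reopening is not the only option: file objects also have a seek method that moves the read position. A minimal sketch, assuming a small scratch file (demo.txt is a made-up name) so that the example does not depend on thesis.txt:

```python
import os
import tempfile

# Hypothetical scratch file, created only for this illustration.
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')
with open(path, 'w') as f:
    f.write('first line\nsecond line\n')

with open(path, 'r') as f:
    first_read = f.read()    # reads everything; the position is now at the end
    second_read = f.read()   # nothing is left, so this is the empty string
    f.seek(0)                # move the position back to the start of the file
    third_read = f.read()    # the full contents again

print(repr(second_read))
print(third_read == first_read)
```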

3. Closing Files

It's important to close a file when you are done with it, to make sure that its contents get saved (if you have written to it) and to avoid taking up operating system resources (if you are just reading from it).

There are two ways to open/close files: one is explicit, the other is implicit. We prefer the implicit approach in CS111.

3.1 Open and Close explicitly

In [6]:
f = open('thesis.txt', 'r')  # open the file
text = f.read()              # read the contents of the file as a string
f.close()                    # close the file
text # return the file contents
Out[6]:
'Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis vel fringilla eros. Curabitur vel mollis odio. In vulputate ex et vulputate dignissim. Pellentesque vel leo et ex malesuada facilisis ac sit amet elit. Duis metus arcu, tincidunt vel suscipit quis, venenatis at ex. Duis id lacus sit amet felis tristique hendrerit vel nec nulla. Morbi odio elit, consectetur a rhoncus a, vestibulum at tortor. Suspendisse leo felis, molestie ac justo bibendum, molestie sagittis tortor. Etiam venenatis, leo eget imperdiet pharetra, nulla risus finibus arcu, et hendrerit sem enim eget tortor.\n\nVestibulum commodo euismod enim, eget lacinia eros vehicula id. Phasellus tempor odio in commodo sagittis. Aliquam non condimentum orci. Curabitur gravida blandit nulla id suscipit. Aenean lacus nisl, convallis at felis et, interdum pellentesque eros. Nunc tincidunt, purus at rutrum elementum, ipsum nulla pretium nibh, id placerat odio libero quis libero. In pulvinar tortor cursus orci scelerisque aliquet. Etiam neque purus, posuere ac interdum a, euismod eu tortor. Vestibulum in arcu odio.\n'

If you try to perform operations on a closed file, you'll get an error.

In [7]:
f.read()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-7-bacd0e0f09a3> in <module>()
----> 1 f.read()

ValueError: I/O operation on closed file.

3.2 Automatic closing with the syntax: with ... as ...

The with statement automatically closes the file opened by open when the with block is exited.

In [8]:
with open('thesis.txt', 'r') as f:
    contents = f.read()
# f is implicitly closed here
contents # return the contents
Out[8]:
'Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis vel fringilla eros. Curabitur vel mollis odio. In vulputate ex et vulputate dignissim. Pellentesque vel leo et ex malesuada facilisis ac sit amet elit. Duis metus arcu, tincidunt vel suscipit quis, venenatis at ex. Duis id lacus sit amet felis tristique hendrerit vel nec nulla. Morbi odio elit, consectetur a rhoncus a, vestibulum at tortor. Suspendisse leo felis, molestie ac justo bibendum, molestie sagittis tortor. Etiam venenatis, leo eget imperdiet pharetra, nulla risus finibus arcu, et hendrerit sem enim eget tortor.\n\nVestibulum commodo euismod enim, eget lacinia eros vehicula id. Phasellus tempor odio in commodo sagittis. Aliquam non condimentum orci. Curabitur gravida blandit nulla id suscipit. Aenean lacus nisl, convallis at felis et, interdum pellentesque eros. Nunc tincidunt, purus at rutrum elementum, ipsum nulla pretium nibh, id placerat odio libero quis libero. In pulvinar tortor cursus orci scelerisque aliquet. Etiam neque purus, posuere ac interdum a, euismod eu tortor. Vestibulum in arcu odio.\n'

The file has been closed by with, even though we didn't close it explicitly:

In [9]:
f.read()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-9-bacd0e0f09a3> in <module>()
----> 1 f.read()

ValueError: I/O operation on closed file.
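Rather than provoking an error, we can also ask a file object whether it is closed through its closed attribute. The sketch below, which assumes a scratch file named demo.txt (made up for illustration), also checks that with closes the file when the block is left because of an error:

```python
import os
import tempfile

# Hypothetical scratch file, created only for this illustration.
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')
with open(path, 'w') as f:
    f.write('some text\n')

with open(path, 'r') as f:
    inside = f.closed      # the file is still open inside the block
after = f.closed           # leaving the block closed it

# The file is closed even if the block exits because of an error.
try:
    with open(path, 'r') as g:
        int('not a number')   # raises ValueError inside the with block
except ValueError:
    pass
closed_after_error = g.closed

print(inside, after, closed_after_error)
```

This guarantee is one reason to prefer with over a separate call to close, which would be skipped if an error occurred before it.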

4. Three ways to read: read, readlines, and readline

These methods read data from the file, but their outputs are different.

read

As shown above, this method returns a single string with the entire contents of the file. For small files, this makes it easy to get at the words with a single call to split. This method is not recommended for big files, because the whole file is held in memory at once.

In [10]:
with open('hogwarts.txt', 'r') as inputFile:
    allText = inputFile.read()
    
allText
Out[10]:
'Gryffindor\nHufflepuff\nRavenclaw\nSlytherin\n'
In [11]:
allText.split()
Out[11]:
['Gryffindor', 'Hufflepuff', 'Ravenclaw', 'Slytherin']

readlines

This method returns a list of all lines in the file, where each line is a string ending in the newline character (except possibly the last line). If a list of lines is desired, this is easier than splitting the result of read. Note: if the file is big, this list is big too, and all of it must be stored in memory.

In [12]:
with open('hogwarts.txt', 'r') as inputFile:
    allLines = inputFile.readlines()
    
allLines
Out[12]:
['Gryffindor\n', 'Hufflepuff\n', 'Ravenclaw\n', 'Slytherin\n']

readline

This returns the next line in the file as a string, and it keeps the newline character. Conceptually, it also moves a cursor in the file object to the beginning of the next line, and the next call to readline will read the line starting at that cursor.

In [13]:
with open('hogwarts.txt', 'r') as inputFile:
    line1 = inputFile.readline()
    line2 = inputFile.readline()

(line1, line2)
Out[13]:
('Gryffindor\n', 'Hufflepuff\n')
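When the cursor reaches the end of the file, readline returns the empty string, which can be used to stop a loop. A blank line inside the file is different: it comes back as '\n'. A minimal sketch, assuming a two-line scratch file (houses.txt is a made-up name):

```python
import os
import tempfile

# Hypothetical scratch file, created only for this illustration.
path = os.path.join(tempfile.mkdtemp(), 'houses.txt')
with open(path, 'w') as f:
    f.write('Gryffindor\nHufflepuff\n')

lines = []
with open(path, 'r') as inputFile:
    while True:
        line = inputFile.readline()
        if line == '':       # the empty string signals the end of the file
            break
        lines.append(line)

print(lines)
```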

Preferred reading method: list comprehension

Most of the time, we will not be using any of the three methods introduced above.
The file object is an iterator that, when used in a for loop, will iterate over the lines of the file without using .readline() or .readlines() explicitly.

In [14]:
def linesFromFile(filename):
    '''Returns a list of all the lines from a file with the given filename. 
    In each line, the terminating newline has been removed.
    '''    
    with open(filename, 'r') as inputFile:
        return [line.strip() for line in inputFile] # .strip() removes leading
                                                    # and trailing whitespace. 


print(linesFromFile('hogwarts.txt'))
['Gryffindor', 'Hufflepuff', 'Ravenclaw', 'Slytherin']
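Note that .strip() removes more than the terminating newline: leading spaces and tabs go too. If indentation in the file matters, .rstrip('\n') is a possible alternative. A small sketch with a made-up sample line:

```python
# A made-up line with leading and trailing spaces before the newline.
raw = '    indented entry  \n'

stripped = raw.strip()            # removes whitespace at both ends
newline_only = raw.rstrip('\n')   # removes only the trailing newline

print(repr(stripped))
print(repr(newline_only))
```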

5. OS Commands

Canopy and Python notebooks allow us to use some simple operating system (OS) commands to query the computer's filesystem.

In order to work, these commands must be used in a cell with no other Python code.

pwd

The pwd (print working directory) command shows which directory (folder) on the computer we're currently working in. Other commands will operate on this directory.

In [15]:
pwd
Out[15]:
'/Users/andrewdavis/Desktop/Lecture17_Files/lec17_files_solns'

ls

The ls command lists the files in the current working directory.

In [16]:
ls
hogwarts.txt*            students.csv*            titanic.txt*
lec17_files_solns.ipynb* thesis.txt*

The 'ls -l' command lists the files with extra information, including their size (in bytes) and a timestamp of when they were last modified.

In [17]:
ls -l
total 472
-rwxrwx---@ 1 andrewdavis  staff      42 Apr 10  2018 hogwarts.txt*
-rwxrwx---@ 1 andrewdavis  staff   67329 Apr 13 19:04 lec17_files_solns.ipynb*
-rwxrwx---@ 1 andrewdavis  staff     217 Nov  5 22:55 students.csv*
-rwxrwx---@ 1 andrewdavis  staff    1086 Apr 10  2018 thesis.txt*
-rwxrwx---@ 1 andrewdavis  staff  159080 Apr 10  2018 titanic.txt*

more

The more command displays the contents of a file. In Canopy, the contents are displayed in the interactive pane. In a Jupyter notebook, they appear in a pop-up window at the bottom of the notebook page; the pop-up can be closed by clicking on the X in its upper right corner.

In [18]:
more hogwarts.txt

Note that the file name does not appear in quotes.

6. Writing Files

To open a file for writing, we use open with the mode 'w'.

The following code will create a new file named memories.txt in the current working directory and write in it several lines of text.

In [19]:
with open('memories.txt', 'w') as memfileW:
    memfileW.write('get coffee\n') # need newlines
    memfileW.write('do CS111 homework\n')
    memfileW.write('vote in midterm elections!\n')

We can use ls -l to see that a new file memories.txt has been created:

In [20]:
ls -l
total 480
-rwxrwx---@ 1 andrewdavis  staff      42 Apr 10  2018 hogwarts.txt*
-rwxrwx---@ 1 andrewdavis  staff   67329 Apr 13 19:04 lec17_files_solns.ipynb*
-rw-r--r--  1 andrewdavis  staff      56 Apr 13 19:05 memories.txt
-rwxrwx---@ 1 andrewdavis  staff     217 Nov  5 22:55 students.csv*
-rwxrwx---@ 1 andrewdavis  staff    1086 Apr 10  2018 thesis.txt*
-rwxrwx---@ 1 andrewdavis  staff  159080 Apr 10  2018 titanic.txt*

In your notebook, you should see the new file memories.txt that was just created and has the timestamp to prove it.

Use the OS command more to view the contents of the file:

In [21]:
more memories.txt

Alternatively, go to Finder (on a Mac) or Windows Explorer (PC) to view the contents of the file.

7. Formatting strings for File Writing

Let's write formatted strings to our files. Recall the format method that fills the holes represented by curly brackets:

In [22]:
# the curly braces are "holes" that will be filled with values
formatTemplate = 'I need to buy {} pounds of {}.\n' 

Think of the {} in the string above as holes that will be filled in with values. The rest of the template is copied into the final string unchanged. Python automatically converts non-string values to strings when filling the holes in the template.
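As a quick illustration of how the holes are filled, here is a small sketch. The second template and its values are made up; it shows that holes can be numbered to control which argument goes where:

```python
formatTemplate = 'I need to buy {} pounds of {}.\n'

# Holes are filled left to right; the float 3.5 is converted to a string.
line = formatTemplate.format(3.5, 'rice')

# Numbered holes refer to arguments by position, so the order can differ.
numbered = '{1} costs {0} dollars'.format(2, 'tea')

print(line)
print(numbered)
```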

In [23]:
groceries = { 'celery': 34, 'rice': 27, 
             'cholocate chips': 3.5, 'sorbet': 73}

with open('memories.txt', 'w') as memfile:
    for g, n in groceries.items():
        memfile.write(formatTemplate.format(n, g))
In [24]:
more memories.txt

Note that opening an existing file in 'w' mode erases its previous contents, which are replaced by whatever is written next.

In [25]:
ls -l
total 480
-rwxrwx---@ 1 andrewdavis  staff      42 Apr 10  2018 hogwarts.txt*
-rwxrwx---@ 1 andrewdavis  staff   67329 Apr 13 19:04 lec17_files_solns.ipynb*
-rw-r--r--  1 andrewdavis  staff     148 Apr 13 19:05 memories.txt
-rwxrwx---@ 1 andrewdavis  staff     217 Nov  5 22:55 students.csv*
-rwxrwx---@ 1 andrewdavis  staff    1086 Apr 10  2018 thesis.txt*
-rwxrwx---@ 1 andrewdavis  staff  159080 Apr 10  2018 titanic.txt*

8. Exercise 1: Print formatted strings from hogwarts.txt

Take the contents of hogwarts.txt and print out the following:

This is the house Gryffindor from line 1
This is the house Hufflepuff from line 2
This is the house Ravenclaw from line 3
This is the house Slytherin from line 4

Your implementation should safely open and close hogwarts.txt. You should avoid using .read() or .readlines(). Remember that the lines will end with \n.

In [26]:
# Your code here
with open('hogwarts.txt', 'r') as hogwartsFile:
    lineNumber = 0
    for line in hogwartsFile:
        lineNumber += 1
        # strip() removes the trailing newline before formatting
        print("This is the house {} from line {}".format(line.strip(), lineNumber))
This is the house Gryffindor from line 1
This is the house Hufflepuff from line 2
This is the house Ravenclaw from line 3
This is the house Slytherin from line 4

9. Appending to files

How do we add lines to the end of an existing file?

We can't open the file in write mode (with a 'w'), because that erases all previous contents and starts with an empty file.

Instead, we open the file in append mode (with an 'a'). Any subsequent writes are made after the existing contents.

In [27]:
with open('memories.txt', 'a') as memfileA:
    memfileA.write('win Nobel prize\n')
    memfileA.write('eat big sundae\n')

Open the file memories.txt again, using the OS command more:

In [28]:
more memories.txt
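The difference between the 'w' and 'a' modes can be checked side by side. A minimal sketch, assuming a scratch file (notes.txt is a made-up name) so that memories.txt is left untouched:

```python
import os
import tempfile

# Hypothetical scratch file, created only for this illustration.
path = os.path.join(tempfile.mkdtemp(), 'notes.txt')

with open(path, 'w') as f:
    f.write('get coffee\n')

# Opening in 'w' mode again starts from an empty file.
with open(path, 'w') as f:
    f.write('do homework\n')
with open(path, 'r') as f:
    after_write = f.read()

# Opening in 'a' mode keeps the existing contents and adds to the end.
with open(path, 'a') as f:
    f.write('win Nobel prize\n')
with open(path, 'r') as f:
    after_append = f.read()

print(repr(after_write))
print(repr(after_append))
```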

10. Exceptions

Suppose we misspelled a file name...

In [29]:
memories = linesFromFile("memory.txt")
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-29-16a62560ace8> in <module>()
----> 1 memories = linesFromFile("memory.txt")

<ipython-input-14-c811a5558af2> in linesFromFile(filename)
      3     In each line, the terminating newline has been removed.
      4     '''    
----> 5     with open(filename, 'r') as inputFile:
      6         return [line.strip() for line in inputFile] # .strip() removes leading
      7                                                     # and trailing whitespace.

FileNotFoundError: [Errno 2] No such file or directory: 'memory.txt'

An error like this will terminate the execution of a program, which we'd like to avoid.

Can we somehow handle the error programmatically, within our program?

Yes! There are two approaches.

10.1 Avoid the error with if ... else

One way of avoiding the error is to use if ... else in conjunction with the os.path.exists function from the Python os library. This function indicates whether a given path names an existing file or directory; a bare file name such as 'memory.txt' is looked up in the current working directory.

In [30]:
import os

def getLines(filename):
    """If filename names an existing file, return its lines. 
    Otherwise return the empty list."""
   
    if os.path.exists(filename):
        return linesFromFile(filename)
    else:
        return []
In [31]:
getLines('memories.txt')
Out[31]:
['I need to buy 34 pounds of celery.',
 'I need to buy 73 pounds of sorbet.',
 'I need to buy 3.5 pounds of cholocate chips.',
 'I need to buy 27 pounds of rice.',
 'win Nobel prize',
 'eat big sundae']
In [32]:
getLines('memory.txt') # function succeeds with empty list rather than terminating with error.
Out[32]:
[]

10.2 Exception Handling with try ... except

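The statement try: statements1 except exceptionType: statements2 first executes statements1. If statements1 executes without error, statements2 is skipped. But if executing statements1 raises an error of type exceptionType, the rest of statements1 is abandoned and statements2 is executed. Errors of other types are not handled by this except clause.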

Dealing with IOError

In [33]:
def getLines(filename):
    """If filename names an existing file, return its lines. 
    Otherwise return the empty list."""
    
    try: 
        return linesFromFile(filename)
    except IOError:
        return []
In [34]:
getLines('memories.txt')
Out[34]:
['I need to buy 34 pounds of celery.',
 'I need to buy 73 pounds of sorbet.',
 'I need to buy 3.5 pounds of cholocate chips.',
 'I need to buy 27 pounds of rice.',
 'win Nobel prize',
 'eat big sundae']
In [35]:
getLines('memory.txt')
Out[35]:
[]
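In Python 3, IOError is another name for OSError, and FileNotFoundError is one of its subclasses. That is why except IOError catches the missing-file error above. The sketch below catches the more specific FileNotFoundError instead; the function name get_lines_or_empty and the temporary path are made up for illustration:

```python
import os
import tempfile

print(IOError is OSError)                        # the two names refer to the same class
print(issubclass(FileNotFoundError, OSError))    # a missing file is a kind of OSError

def get_lines_or_empty(filename):
    """Return the stripped lines of the file, or [] if it does not exist."""
    try:
        with open(filename, 'r') as inputFile:
            return [line.strip() for line in inputFile]
    except FileNotFoundError:
        return []

# A path that is known not to exist, inside a fresh temporary directory.
missing = os.path.join(tempfile.mkdtemp(), 'memory.txt')
print(get_lines_or_empty(missing))
```

Catching the narrower exception means that other I/O problems, such as a permissions error, are not silently turned into an empty list.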

Exception handling example 1: Dealing with ZeroDivisionError

Have you tried to divide by 0?

In [36]:
10/0
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
<ipython-input-36-242277fd9e32> in <module>()
----> 1 10/0

ZeroDivisionError: division by zero

If we know the error name, we can use it in the except clause:

In [37]:
def print_100_divided_by(n):
    try:
        print(100/n)
    except ZeroDivisionError:
        print('Do not divide by zero.')
In [38]:
print_100_divided_by(4)
25.0
In [39]:
print_100_divided_by(0)
Do not divide by zero.

Exception handling example 2: Dealing with value errors

In [40]:
while True:
    try:
        # input() waits for the user; if the cell hangs, the kernel may need restarting
        i = int(input('Please enter an integer: '))
        #i = 5 # Use instead of input() to terminate immediately
        print('Good, you entered', i)
        break # Python keyword to exit a loop
    except ValueError:
        print('Not a valid integer. Try again...')
Please enter an integer: a
Not a valid integer. Try again...
Please enter an integer: b
Not a valid integer. Try again...
Please enter an integer: 23
Good, you entered 23
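The same pattern can be tried without typing anything by looping over a list of candidate strings instead of calling input. This is only a sketch; the function name first_valid_int is made up, and it uses the optional else clause of try, which runs only when no error occurred:

```python
def first_valid_int(candidates):
    """Return the first candidate that converts to an int, or None if none do."""
    for text in candidates:
        try:
            value = int(text)
        except ValueError:
            print('Not a valid integer:', repr(text))
        else:
            # Reached only if int(text) succeeded.
            return value
    return None

result = first_valid_int(['a', 'b', '23'])
print('Good, you entered', result)
```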

11. Reading Tuples from CSV Files

When reading a CSV file, we want to store data in a list of tuples for easier processing.

In [41]:
def tuplesFromFile(filename):
    '''Read each line from opened file, 
    strip white space, 
    split at commas, 
    convert to a tuple and 
    return a list of tuples.
    ''' 
    with open(filename, 'r') as inputFile:
        return [tuple(line.strip().split(',')) for line in inputFile] 
    
tuplesFromFile('students.csv')
Out[41]:
[('Harry Potter', '2021', 'Gryffindor', '1'),
 ('Hermione Granger', '2021', 'Gryffindor', '2'),
 ('Luna Lovegood', '2022', 'Ravenclaw', '3'),
 ('Fred Weasly', '2019', 'Gryffindor', '1'),
 ('Cho Chang', '2020', 'Ravenclaw', '2'),
 ('Cedric Diggory', '2019', 'Hufflepuff', '3'),
 ('Draco Malfoy', '2021', 'Slytherin', '1')]

NOTE: When reading from a file, everything is a string. Keep this in mind, because if you want
to perform arithmetic operations with those values, you need to convert them to integer or float before anything else.
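A short sketch of that conversion, using two tuples copied from the output above so that the example does not need students.csv (the variable name converted is made up):

```python
students = [('Harry Potter', '2021', 'Gryffindor', '1'),
            ('Luna Lovegood', '2022', 'Ravenclaw', '3')]

# Convert the year and section fields from strings to integers.
converted = [(name, int(year), dorm, int(section))
             for (name, year, dorm, section) in students]

# With strings, + concatenates; with integers, + adds.
as_strings = students[0][1] + students[0][3]
as_numbers = converted[0][1] + converted[0][3]

print(repr(as_strings))
print(as_numbers)
```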

11.1 Dissecting the code

The list comprehension reads one line at a time, processes that line, and keeps the result in the list. To see the raw data, we can use readlines; that is reasonable here because the file is small.

In [42]:
with open('students.csv', 'r') as inputFile:
    content = inputFile.readlines()
    
content
Out[42]:
['Harry Potter,2021,Gryffindor,1\n',
 'Hermione Granger,2021,Gryffindor,2\n',
 'Luna Lovegood,2022,Ravenclaw,3\n',
 'Fred Weasly,2019,Gryffindor,1\n',
 'Cho Chang,2020,Ravenclaw,2\n',
 'Cedric Diggory,2019,Hufflepuff,3\n',
 'Draco Malfoy,2021,Slytherin,1\n']

With the string method strip we can get a new string with the whitespace removed from the beginning and end (the newline or other space characters), e.g.:

In [43]:
content[0].strip()
Out[43]:
'Harry Potter,2021,Gryffindor,1'

Next step is to split at commas. This will return a list:

In [44]:
content[0].strip().split(',')
Out[44]:
['Harry Potter', '2021', 'Gryffindor', '1']

Using the function tuple will create a new tuple from the list.

In [45]:
tuple(content[0].strip().split(','))
Out[45]:
('Harry Potter', '2021', 'Gryffindor', '1')
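Splitting at commas works for students.csv, but it would break on a field that itself contains a comma inside quotes. Python's standard csv module handles that case. The sketch below is an aside, not the approach used in this lecture; the sample data is made up and is read from an in-memory io.StringIO rather than a real file:

```python
import csv
import io

# Made-up CSV text; the first name field contains a comma inside quotes.
data = io.StringIO('"Potter, Harry",2021,Gryffindor,1\n'
                   'Luna Lovegood,2022,Ravenclaw,3\n')

# Naive approach: the quoted field is split in two.
naive = [tuple(line.strip().split(',')) for line in data]

# csv.reader understands the quotes and keeps the field whole.
data.seek(0)
rows = [tuple(row) for row in csv.reader(data)]

print(naive[0])
print(rows[0])
```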

11.2 Making dictionaries out of tuples

Small tuples are useful in small contexts, but when the data has many subparts (and may gain more later) or is used in many different parts of a program, names become more useful than indices. Dictionaries can be a good fit.

In [46]:
# read list of students from file
students = tuplesFromFile('students.csv')

# Create a new list mapping each student tuple to a student dictionary.
studentDictList = [ {'name': st[0], 'year': st[1], 
                     'dorm': st[2], 'section': st[3]} for st in students ]

print(len(studentDictList), "students")
7 students
In [47]:
studentDictList[2]
Out[47]:
{'dorm': 'Ravenclaw', 'name': 'Luna Lovegood', 'section': '3', 'year': '2022'}
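An alternative way to build the same dictionaries is to pair a tuple of field names with each student tuple using zip. A sketch under the assumption that the field order matches the file's columns (the name fields is made up, and two tuples are written out inline so the example does not need students.csv):

```python
fields = ('name', 'year', 'dorm', 'section')

students = [('Harry Potter', '2021', 'Gryffindor', '1'),
            ('Luna Lovegood', '2022', 'Ravenclaw', '3')]

# zip pairs each field name with the value in the same position.
studentDictList = [dict(zip(fields, st)) for st in students]

print(studentDictList[1])
```

With this form, adding a column only requires extending fields rather than editing index numbers.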

12. Exercise 2: Mapping and filtering

Here are examples of mapping and filtering with studentDictList:

In [48]:
# Reminder: mapping with list comprehension
[ sd['dorm'] for sd in studentDictList ]
Out[48]:
['Gryffindor',
 'Gryffindor',
 'Ravenclaw',
 'Gryffindor',
 'Ravenclaw',
 'Hufflepuff',
 'Slytherin']
In [49]:
# Reminder: filtering with list comprehension
[ sd['name'] for sd in studentDictList if sd['year'] == '2021' ]
Out[49]:
['Harry Potter', 'Hermione Granger', 'Draco Malfoy']

Below, write the code indicated by the comments

In [50]:
# show the list of first names ['Harry', 'Hermione', ...] for all students
[ sd['name'].split()[0] for sd in studentDictList ]
Out[50]:
['Harry', 'Hermione', 'Luna', 'Fred', 'Cho', 'Cedric', 'Draco']
In [51]:
# show the list of last names and years: ["Potter '21", "Granger '21", ...]
# you can use the method format that you learned above
[ "{} '{}".format(sd['name'].split()[-1], sd['year'][2:]) for sd in studentDictList ]
Out[51]:
["Potter '21",
 "Granger '21",
 "Lovegood '22",
 "Weasly '19",
 "Chang '20",
 "Diggory '19",
 "Malfoy '21"]
In [52]:
# show the list of student names who live in 'Gryffindor': ['Harry Potter', 'Hermione Granger', ...]
[ sd['name'] for sd in studentDictList if sd['dorm'] == 'Gryffindor' ]
Out[52]:
['Harry Potter', 'Hermione Granger', 'Fred Weasly']
In [53]:
# show the list of student names whose first name is 4 letters or fewer: ['Luna Lovegood', 'Fred Weasly', 'Cho Chang']
[ sd['name'] for sd in studentDictList if len(sd['name'].split()[0]) <= 4 ]
Out[53]:
['Luna Lovegood', 'Fred Weasly', 'Cho Chang']

BONUS: Display the list of student dictionaries sorted first by the section number, then by name.

In [54]:
# use sorted and lambda
sorted(studentDictList, key=lambda dct: (dct['section'], dct['name']))
Out[54]:
[{'dorm': 'Slytherin', 'name': 'Draco Malfoy', 'section': '1', 'year': '2021'},
 {'dorm': 'Gryffindor', 'name': 'Fred Weasly', 'section': '1', 'year': '2019'},
 {'dorm': 'Gryffindor',
  'name': 'Harry Potter',
  'section': '1',
  'year': '2021'},
 {'dorm': 'Ravenclaw', 'name': 'Cho Chang', 'section': '2', 'year': '2020'},
 {'dorm': 'Gryffindor',
  'name': 'Hermione Granger',
  'section': '2',
  'year': '2021'},
 {'dorm': 'Hufflepuff',
  'name': 'Cedric Diggory',
  'section': '3',
  'year': '2019'},
 {'dorm': 'Ravenclaw',
  'name': 'Luna Lovegood',
  'section': '3',
  'year': '2022'}]
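One caveat: the section values here are strings, so they are compared as text. That gives the expected order for single digits, but it would not for a section such as '10'. A sketch with made-up records showing the difference when the key converts the section to an integer:

```python
# Made-up records; section '10' is included to expose the difference.
records = [{'name': 'A', 'section': '10'},
           {'name': 'B', 'section': '2'},
           {'name': 'C', 'section': '1'}]

# Comparing as strings: '10' sorts before '2'.
by_string = sorted(records, key=lambda dct: (dct['section'], dct['name']))

# Comparing as integers gives numeric order.
by_number = sorted(records, key=lambda dct: (int(dct['section']), dct['name']))

print([dct['section'] for dct in by_string])
print([dct['section'] for dct in by_number])
```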

13. Exercise 3: Organize by year

Often, available data will not be organized in any useful way. We might want to "slice and dice" them according to our interests. That usually means creating new data structures via dictionaries and lists.

Flesh out the function organizeByYear below to create and return a new dictionary where the keys are the class years (e.g., '2019', etc.) and each value is a list of all students in that year.

{'2019': [{'dorm': 'Gryffindor',
   'name': 'Fred Weasly',
   'section': '1',
   'year': '2019'},
  {'dorm': 'Hufflepuff',
   'name': 'Cedric Diggory',
   'section': '3',
   'year': '2019'}],
 '2020': [{'dorm': 'Ravenclaw',
   'name': 'Cho Chang',
   'section': '2',
   'year': '2020'}],
 '2021': [...], # not shown here 
 '2022': [...]} # not shown here

Solution 1: Check before appending

In [55]:
def organizeByYear(stdDctLst):
    """Return a new dictionary where keys are class years
    and values a list of all students in that year.
    """
    byYearDct = {}
    
    # put your code here
    
    for std in stdDctLst:
        year = std['year']
        # check if there is a key in the dict, if not, add it
        if year not in byYearDct:
            byYearDct[year] = []

        # now that we know the year is in the dict, append to it
        byYearDct[year].append(std)
        
    return byYearDct
In [56]:
organizeByYear(studentDictList)
Out[56]:
{'2019': [{'dorm': 'Gryffindor',
   'name': 'Fred Weasly',
   'section': '1',
   'year': '2019'},
  {'dorm': 'Hufflepuff',
   'name': 'Cedric Diggory',
   'section': '3',
   'year': '2019'}],
 '2020': [{'dorm': 'Ravenclaw',
   'name': 'Cho Chang',
   'section': '2',
   'year': '2020'}],
 '2021': [{'dorm': 'Gryffindor',
   'name': 'Harry Potter',
   'section': '1',
   'year': '2021'},
  {'dorm': 'Gryffindor',
   'name': 'Hermione Granger',
   'section': '2',
   'year': '2021'},
  {'dorm': 'Slytherin',
   'name': 'Draco Malfoy',
   'section': '1',
   'year': '2021'}],
 '2022': [{'dorm': 'Ravenclaw',
   'name': 'Luna Lovegood',
   'section': '3',
   'year': '2022'}]}

Solution 2: use get

In [57]:
def organizeByYear2(stdDctLst):
    """Return a new dictionary where keys are class years
    and values a list of all students in that year.
    """
    byYearDct = {}
    
    # put your code here
    
    for std in stdDctLst:
        year = std['year']
        byYearDct[year] = byYearDct.get(year, [])
        byYearDct[year].append(std)
            
    return byYearDct
In [58]:
organizeByYear2(studentDictList)
Out[58]:
{'2019': [{'dorm': 'Gryffindor',
   'name': 'Fred Weasly',
   'section': '1',
   'year': '2019'},
  {'dorm': 'Hufflepuff',
   'name': 'Cedric Diggory',
   'section': '3',
   'year': '2019'}],
 '2020': [{'dorm': 'Ravenclaw',
   'name': 'Cho Chang',
   'section': '2',
   'year': '2020'}],
 '2021': [{'dorm': 'Gryffindor',
   'name': 'Harry Potter',
   'section': '1',
   'year': '2021'},
  {'dorm': 'Gryffindor',
   'name': 'Hermione Granger',
   'section': '2',
   'year': '2021'},
  {'dorm': 'Slytherin',
   'name': 'Draco Malfoy',
   'section': '1',
   'year': '2021'}],
 '2022': [{'dorm': 'Ravenclaw',
   'name': 'Luna Lovegood',
   'section': '3',
   'year': '2022'}]}
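A third possibility, not part of the original exercise, is collections.defaultdict from the standard library, which creates the empty list automatically the first time a key is used. A sketch, with organizeByYear3 as a made-up name and a short inline student list:

```python
from collections import defaultdict

def organizeByYear3(stdDctLst):
    """Return a new dictionary where keys are class years
    and each value is a list of all students in that year.
    """
    byYearDct = defaultdict(list)   # missing keys start out as []
    for std in stdDctLst:
        byYearDct[std['year']].append(std)
    return dict(byYearDct)          # convert back to a plain dictionary

sample = [{'name': 'Fred Weasly', 'year': '2019'},
          {'name': 'Cho Chang', 'year': '2020'},
          {'name': 'Cedric Diggory', 'year': '2019'}]

byYear = organizeByYear3(sample)
print(sorted(byYear.keys()))
print([std['name'] for std in byYear['2019']])
```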

14. Working with JSON

JSON (JavaScript Object Notation) is a standard format for encoding data as a string. It can represent (possibly nested) lists and dictionaries whose “leaves” are numbers, strings, booleans, or null.

In [59]:
hogwartsStudents = [{'dorm': 'Gryffindor', 'name': 'Hermione Granger', 
                     'psets': [100, 100, 100, 100], 'section': 2, 'year': 2020},
                    {'dorm': 'Gryffindor', 'name': 'Harry Potter',
                     'psets': [87, 92, 95, 62], 'section': 1, 'year': 2017},
                    {'dorm': 'Ravenclaw','name': 'Luna Lovegood',
                     'psets': [75, 82, 86, 93], 'section': 3, 'year': 2018}]
In [60]:
hogwartsStudents
Out[60]:
[{'dorm': 'Gryffindor',
  'name': 'Hermione Granger',
  'psets': [100, 100, 100, 100],
  'section': 2,
  'year': 2020},
 {'dorm': 'Gryffindor',
  'name': 'Harry Potter',
  'psets': [87, 92, 95, 62],
  'section': 1,
  'year': 2017},
 {'dorm': 'Ravenclaw',
  'name': 'Luna Lovegood',
  'psets': [75, 82, 86, 93],
  'section': 3,
  'year': 2018}]

14.1 JSON module has four functions: dumps, loads, dump, load

Let's import the json module and encode our list of dictionaries as a string.

In [61]:
import json
hogwartsJSON = json.dumps(hogwartsStudents)
hogwartsJSON
Out[61]:
'[{"section": 2, "name": "Hermione Granger", "year": 2020, "dorm": "Gryffindor", "psets": [100, 100, 100, 100]}, {"section": 1, "name": "Harry Potter", "year": 2017, "dorm": "Gryffindor", "psets": [87, 92, 95, 62]}, {"section": 3, "name": "Luna Lovegood", "year": 2018, "dorm": "Ravenclaw", "psets": [75, 82, 86, 93]}]'

Converting this string back to a list of dictionaries is really simple with the json module:

In [62]:
hogwartsStudents2 = json.loads(hogwartsJSON)
hogwartsStudents2
Out[62]:
[{'dorm': 'Gryffindor',
  'name': 'Hermione Granger',
  'psets': [100, 100, 100, 100],
  'section': 2,
  'year': 2020},
 {'dorm': 'Gryffindor',
  'name': 'Harry Potter',
  'psets': [87, 92, 95, 62],
  'section': 1,
  'year': 2017},
 {'dorm': 'Ravenclaw',
  'name': 'Luna Lovegood',
  'psets': [75, 82, 86, 93],
  'section': 3,
  'year': 2018}]
In [63]:
hogwartsStudents == hogwartsStudents2
Out[63]:
True
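The round trip gives back an equal value here because the data uses only lists, dictionaries with string keys, numbers, and strings. Other Python values do not always survive unchanged. A small sketch with made-up data: tuples come back as lists, and integer dictionary keys come back as strings.

```python
import json

# Made-up data containing a tuple and an integer key.
original = {'scores': (87, 92), 1: 'section one'}

restored = json.loads(json.dumps(original))

print(restored)
print(restored == original)
```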

14.2 Write/Read JSON files

Since JSON is just a string, it’s trivial to write it to and read it from files. We could use the file methods write and read together with the JSON strings handled by dumps and loads. However, it is easier to use the dedicated JSON functions dump and load, which write to or read from a file directly.

In [64]:
# WRITING to a file

with open('hogwartsStudents.json', 'w') as fw:
    json.dump(hogwartsStudents, fw)
In [65]:
ls -l
total 488
-rwxrwx---@ 1 andrewdavis  staff      42 Apr 10  2018 hogwarts.txt*
-rw-r--r--  1 andrewdavis  staff     317 Apr 13 19:06 hogwartsStudents.json
-rwxrwx---@ 1 andrewdavis  staff   67329 Apr 13 19:04 lec17_files_solns.ipynb*
-rw-r--r--  1 andrewdavis  staff     179 Apr 13 19:05 memories.txt
-rwxrwx---@ 1 andrewdavis  staff     217 Nov  5 22:55 students.csv*
-rwxrwx---@ 1 andrewdavis  staff    1086 Apr 10  2018 thesis.txt*
-rwxrwx---@ 1 andrewdavis  staff  159080 Apr 10  2018 titanic.txt*
In [66]:
# READING from a file

with open('hogwartsStudents.json', 'r') as fr:
    hogwartsStudents4 = json.load(fr)
In [67]:
hogwartsStudents4 == hogwartsStudents
Out[67]:
True

14.3 Supplemental information: Formatting JSON

We can look into the file we just created and see that JSON is stored as a single big line of text.

In [68]:
more hogwartsStudents.json

We can format JSON files to make reading easier:

In [69]:
with open('hogwartsStudentsFormatted.json', 'w') as fw:
    json.dump(hogwartsStudents, fw, sort_keys=True, indent=2)
In [70]:
ls -l
total 496
-rwxrwx---@ 1 andrewdavis  staff      42 Apr 10  2018 hogwarts.txt*
-rw-r--r--  1 andrewdavis  staff     317 Apr 13 19:06 hogwartsStudents.json
-rw-r--r--  1 andrewdavis  staff     487 Apr 13 19:06 hogwartsStudentsFormatted.json
-rwxrwx---@ 1 andrewdavis  staff   67329 Apr 13 19:04 lec17_files_solns.ipynb*
-rw-r--r--  1 andrewdavis  staff     179 Apr 13 19:05 memories.txt
-rwxrwx---@ 1 andrewdavis  staff     217 Nov  5 22:55 students.csv*
-rwxrwx---@ 1 andrewdavis  staff    1086 Apr 10  2018 thesis.txt*
-rwxrwx---@ 1 andrewdavis  staff  159080 Apr 10  2018 titanic.txt*
In [71]:
more hogwartsStudentsFormatted.json
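The same sort_keys and indent options work with dumps, which makes it easy to compare the two layouts without opening a file. A sketch with a single made-up record:

```python
import json

record = {'name': 'Luna Lovegood', 'psets': [75, 82]}

compact = json.dumps(record)                            # everything on one line
pretty = json.dumps(record, sort_keys=True, indent=2)   # one item per line

print(compact)
print(pretty)
```

Both strings decode to the same data; the formatting affects only how the text looks to a human reader.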