1. Opening a file

A file is an object that is created by the built-in function open.

In [1]:
myFile = open('thesis.txt', 'r') # 'r' means open the file for reading
print(myFile)
<_io.TextIOWrapper name='thesis.txt' mode='r' encoding='UTF-8'>

The object returned from open() is a type of file object called io.TextIOWrapper. In short, it is a type of file object that interprets the file as a stream of text.

In [2]:
type(myFile)
Out[2]:
_io.TextIOWrapper

The mode 'r' is optional, because reading is the default mode of open:

In [3]:
myFile2 = open('thesis.txt')
myFile2
Out[3]:
<_io.TextIOWrapper name='thesis.txt' mode='r' encoding='UTF-8'>

Aside: The encoding 'UTF-8' that you see here is a standard way of storing Unicode text, and Unicode is a universal standard for representing characters from all languages. It extends ASCII, which we mentioned a while back (you might still remember the ASCII table: http://www.asciitable.com/). ASCII uses 7 bits per character, so it has room for only 128 characters. Unicode has room for over a million characters, and UTF-8 stores each one in 1 to 4 bytes, with ASCII characters taking a single byte. Here is where to find the Unicode table: https://unicode-table.com/en/#basic-latin.
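
To make the aside concrete, here is a small sketch that encodes a few characters as UTF-8 and counts the bytes each one needs. UTF-8 is a variable-length encoding, so the counts differ from character to character. The four sample characters are arbitrary choices for illustration and do not come from the lecture files.

```python
# Four sample characters, written as escapes so the source stays plain ASCII:
# 'A', e with acute accent, the euro sign, and a smiley emoji.
samples = ['A', '\u00e9', '\u20ac', '\U0001F600']

# Encode each character as UTF-8 and count the resulting bytes.
byteCounts = [len(ch.encode('utf-8')) for ch in samples]
print(byteCounts)

# An ASCII character is stored as the same single byte in UTF-8.
sameAsAscii = 'A'.encode('utf-8') == 'A'.encode('ascii')
print(sameAsAscii)
```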

2. Reading from a file: first cut

The simplest way to read a file is to call .read() on a file object, which returns the contents of the file as a string of characters.

In [4]:
text = myFile.read() # read (as a string) the contents of the file myFile opened above
text
Out[4]:
'Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis vel fringilla eros. Curabitur vel mollis odio. In vulputate ex et vulputate dignissim. Pellentesque vel leo et ex malesuada facilisis ac sit amet elit. Duis metus arcu, tincidunt vel suscipit quis, venenatis at ex. Duis id lacus sit amet felis tristique hendrerit vel nec nulla. Morbi odio elit, consectetur a rhoncus a, vestibulum at tortor. Suspendisse leo felis, molestie ac justo bibendum, molestie sagittis tortor. Etiam venenatis, leo eget imperdiet pharetra, nulla risus finibus arcu, et hendrerit sem enim eget tortor.\n\nVestibulum commodo euismod enim, eget lacinia eros vehicula id. Phasellus tempor odio in commodo sagittis. Aliquam non condimentum orci. Curabitur gravida blandit nulla id suscipit. Aenean lacus nisl, convallis at felis et, interdum pellentesque eros. Nunc tincidunt, purus at rutrum elementum, ipsum nulla pretium nibh, id placerat odio libero quis libero. In pulvinar tortor cursus orci scelerisque aliquet. Etiam neque purus, posuere ac interdum a, euismod eu tortor. Vestibulum in arcu odio.\n'

Reading from an open file object changes its state: each read advances the file's current position. After .read() has consumed the whole file, the position is at the end, so subsequent read operations on the same open file return the empty string:

In [5]:
myFile.read()
Out[5]:
''

If we want to read the file again, the simplest option is to open a new Python file object for that file. But before we do that, we should close the file; see the next section.
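
As an aside, a file object can also be rewound with its .seek() method instead of being reopened. The sketch below is only an illustration: it builds its own throwaway file (the name demo.txt and its contents are made up) so that it does not depend on thesis.txt.

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained.
demoPath = os.path.join(tempfile.mkdtemp(), 'demo.txt')
with open(demoPath, 'w') as demoFile:
    demoFile.write('first line\nsecond line\n')

with open(demoPath, 'r') as demoFile:
    firstRead = demoFile.read()   # reads everything; position is now at the end
    secondRead = demoFile.read()  # nothing left to read, so this is ''
    demoFile.seek(0)              # move the position back to the start
    thirdRead = demoFile.read()   # the whole file again

print(repr(secondRead))
print(firstRead == thirdRead)
```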

3. Closing Files

It's important to close a file when you are done with it, to make sure that its contents get saved (if you have written to it) and to avoid taking up operating system resources (if you are just reading from it).

There are two ways to open/close files: one is explicit, the other is implicit. We prefer the implicit approach in CS111.

3.1 Open and Close explicitly

In [6]:
f = open('thesis.txt', 'r')  # open the file
text = f.read()              # read the contents of the file as a string
f.close()                    # close the file
text # return the file contents
Out[6]:
'Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis vel fringilla eros. Curabitur vel mollis odio. In vulputate ex et vulputate dignissim. Pellentesque vel leo et ex malesuada facilisis ac sit amet elit. Duis metus arcu, tincidunt vel suscipit quis, venenatis at ex. Duis id lacus sit amet felis tristique hendrerit vel nec nulla. Morbi odio elit, consectetur a rhoncus a, vestibulum at tortor. Suspendisse leo felis, molestie ac justo bibendum, molestie sagittis tortor. Etiam venenatis, leo eget imperdiet pharetra, nulla risus finibus arcu, et hendrerit sem enim eget tortor.\n\nVestibulum commodo euismod enim, eget lacinia eros vehicula id. Phasellus tempor odio in commodo sagittis. Aliquam non condimentum orci. Curabitur gravida blandit nulla id suscipit. Aenean lacus nisl, convallis at felis et, interdum pellentesque eros. Nunc tincidunt, purus at rutrum elementum, ipsum nulla pretium nibh, id placerat odio libero quis libero. In pulvinar tortor cursus orci scelerisque aliquet. Etiam neque purus, posuere ac interdum a, euismod eu tortor. Vestibulum in arcu odio.\n'

If you try to perform operations on a closed file, you'll get an error.

In [7]:
f.read()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-7-571e9fb02258> in <module>
----> 1 f.read()

ValueError: I/O operation on closed file.

3.2 Automatic closing with the syntax: with ... as ...

The with statement automatically closes the file opened by open when the with block exits, even if an error occurs inside the block.

In [8]:
with open('thesis.txt', 'r') as f:
    contents = f.read()
# f is implicitly closed here
contents # return the contents
Out[8]:
'Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis vel fringilla eros. Curabitur vel mollis odio. In vulputate ex et vulputate dignissim. Pellentesque vel leo et ex malesuada facilisis ac sit amet elit. Duis metus arcu, tincidunt vel suscipit quis, venenatis at ex. Duis id lacus sit amet felis tristique hendrerit vel nec nulla. Morbi odio elit, consectetur a rhoncus a, vestibulum at tortor. Suspendisse leo felis, molestie ac justo bibendum, molestie sagittis tortor. Etiam venenatis, leo eget imperdiet pharetra, nulla risus finibus arcu, et hendrerit sem enim eget tortor.\n\nVestibulum commodo euismod enim, eget lacinia eros vehicula id. Phasellus tempor odio in commodo sagittis. Aliquam non condimentum orci. Curabitur gravida blandit nulla id suscipit. Aenean lacus nisl, convallis at felis et, interdum pellentesque eros. Nunc tincidunt, purus at rutrum elementum, ipsum nulla pretium nibh, id placerat odio libero quis libero. In pulvinar tortor cursus orci scelerisque aliquet. Etiam neque purus, posuere ac interdum a, euismod eu tortor. Vestibulum in arcu odio.\n'

The file has been closed by with, even though we didn't close it explicitly:

In [9]:
f.read()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-9-571e9fb02258> in <module>
----> 1 f.read()

ValueError: I/O operation on closed file.
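
Instead of triggering an error, we can ask a file object whether it is closed through its .closed attribute. The sketch below also shows a rough explicit equivalent of with, written with try ... finally. It is an illustration on a made-up throwaway file, not the code CS111 asks you to write.

```python
import os
import tempfile

# Throwaway file so the example does not depend on thesis.txt.
demoPath = os.path.join(tempfile.mkdtemp(), 'demo.txt')
with open(demoPath, 'w') as demoFile:
    demoFile.write('hello\n')

# Explicit version: finally guarantees that close() runs even if read() fails.
f1 = open(demoPath, 'r')
try:
    text1 = f1.read()
finally:
    f1.close()

# Implicit version: with closes the file when the block exits.
with open(demoPath, 'r') as f2:
    text2 = f2.read()
    openInsideBlock = not f2.closed  # still open while we are inside the block

print(f1.closed, f2.closed)
```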

4. Three ways to read: read, readlines, and readline

These methods read data from the file, but their outputs are different.

read

As shown above, this method returns a single string with the entire contents of the file. For small files, it makes it easy to access the words with only one split command. This method is not recommended for big files, because the entire contents must fit in memory at once.

In [10]:
with open('cities.txt', 'r') as inputFile:
    allText = inputFile.read()
    
allText
Out[10]:
'Wilmington\nPhiladelphia\nBoston\nCharlotte\n'
In [11]:
allText.split()
Out[11]:
['Wilmington', 'Philadelphia', 'Boston', 'Charlotte']
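
For big files, one option is to pass a size to read, which then returns at most that many characters per call. The helper below (countCharsInChunks is a name made up for this sketch) counts the characters in a file without ever holding the whole file in memory. It runs on a throwaway copy of the cities data.

```python
import os
import tempfile

def countCharsInChunks(filename, chunkSize=1024):
    """Count the characters in a file, reading at most chunkSize at a time."""
    total = 0
    with open(filename, 'r') as inputFile:
        while True:
            chunk = inputFile.read(chunkSize)  # at most chunkSize characters
            if chunk == '':                    # empty string means end of file
                break
            total += len(chunk)
    return total

# Self-contained demo on a throwaway copy of the cities data.
demoPath = os.path.join(tempfile.mkdtemp(), 'citiesDemo.txt')
with open(demoPath, 'w') as demoFile:
    demoFile.write('Wilmington\nPhiladelphia\nBoston\nCharlotte\n')

print(countCharsInChunks(demoPath, chunkSize=8))
```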

readlines

This method returns a list of all lines in the file, where each line is a string ending in the newline character (except possibly the last line). If a list of lines is desired, this is easier than splitting the result of read. Note: This creates a list of the lines in the file. If the file is big, this is a big list that needs to be stored in memory.

In [12]:
with open('cities.txt', 'r') as inputFile:
    allLines = inputFile.readlines()
    
allLines
Out[12]:
['Wilmington\n', 'Philadelphia\n', 'Boston\n', 'Charlotte\n']

readline

This returns the next line in the file as a string, and it keeps the newline character. Conceptually, it also moves a cursor in the file object to the beginning of the next line, and the next call to readline will read the line starting at that cursor.

In [13]:
with open('cities.txt', 'r') as inputFile:
    line1 = inputFile.readline()
    line2 = inputFile.readline()

(line1, line2)
Out[13]:
('Wilmington\n', 'Philadelphia\n')
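
Because readline returns the empty string once the cursor reaches the end of the file, it can drive a loop over all the lines. A blank line in the file is still returned as '\n', so it does not stop the loop early. The function name readAllWithReadline and the demo file are made up for this sketch.

```python
import os
import tempfile

def readAllWithReadline(filename):
    """Collect every line of a file by calling readline until it returns ''."""
    lines = []
    with open(filename, 'r') as inputFile:
        line = inputFile.readline()
        while line != '':          # '' signals the end of the file
            lines.append(line)
            line = inputFile.readline()
    return lines

# Demo file with a blank line in the middle and no newline at the very end.
demoPath = os.path.join(tempfile.mkdtemp(), 'demo.txt')
with open(demoPath, 'w') as demoFile:
    demoFile.write('Wilmington\n\nBoston')

demoLines = readAllWithReadline(demoPath)
print(demoLines)
```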

Preferred reading method: for loop

Most of the time, we will not be using any of the three methods introduced above. The file object is an iterator that, when used in a for loop, will iterate over the lines of the file without using .readline() or .readlines() explicitly.

In [14]:
def linesFromFile(filename):
    '''Returns a list of all the lines from a file with the given filename. 
    In each line, the terminating newline has been removed.
    '''    
    with open(filename, 'r') as inputFile:
        lines = []
        for line in inputFile: # notice we're not using a method here
            lines.append(line.strip())
        return lines # file still closes even with return in `with` block

print(linesFromFile('cities.txt'))
['Wilmington', 'Philadelphia', 'Boston', 'Charlotte']
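
One detail about the loop above: .strip() removes all leading and trailing whitespace, not only the newline. If spaces at the edges of a line matter, .rstrip('\n') removes just the trailing newline. A quick comparison on two made-up lines:

```python
# Two sample lines (made up) with extra spaces around the text.
sampleLines = ['  indented city\n', 'Boston  \n']

# strip() removes spaces, tabs, and newlines from both ends.
fullyStripped = [line.strip() for line in sampleLines]

# rstrip('\n') removes only the newline at the end.
newlineRemoved = [line.rstrip('\n') for line in sampleLines]

print(fullyStripped)
print(newlineRemoved)
```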

5. OS Commands

Python Jupyter notebooks allow us to use some simple operating system (OS) commands to query the computer's filesystem.

In order to work, these commands must be used in a cell with no other Python code.

pwd

The pwd (print working directory) command shows which directory (folder) in the computer we're currently connected to. Other commands will operate on this directory.

In [15]:
pwd
Out[15]:
'/Users/emustafa/Documents/CS111-Fall21/cs111-site/content/lectures/lec_files/files/lec_files_solns'

ls

The ls command lists the files in the current working directory.

In [16]:
ls
births2020.txt         lec_files_solns.ipynb  studentCities.txt
cities.txt             memories.txt           thesis.txt
lec_files_solns.html   nums.txt               yob2020.csv

The 'ls -l' command lists the files with extra information, including their size (in bytes) and a timestamp of when they were last modified.

In [17]:
ls -l
total 2200
-rw-r--r--  1 emustafa  staff     116 Oct 30 17:20 births2020.txt
-rw-r--r--@ 1 emustafa  staff      41 Aug 13 05:41 cities.txt
-rw-r--r--  1 emustafa  staff  665845 Sep  5 12:42 lec_files_solns.html
-rw-r--r--  1 emustafa  staff   29859 Oct 30 17:25 lec_files_solns.ipynb
-rw-r--r--  1 emustafa  staff     148 Oct 30 17:12 memories.txt
-rw-r--r--@ 1 emustafa  staff      45 Aug 13 05:41 nums.txt
-rw-r--r--@ 1 emustafa  staff     171 Aug 13 05:41 studentCities.txt
-rw-r--r--@ 1 emustafa  staff    1086 Aug 13 05:41 thesis.txt
-rw-rw-r--@ 1 emustafa  staff  399727 Mar 11  2021 yob2020.csv

more

The more command displays the contents of a file. In a Jupyter notebook, they appear in a pop-up window at the bottom of the notebook page; the pop-up can be closed by clicking on the X in its upper right corner.

In [18]:
more cities.txt

Note that the file name does not appear in quotes.
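
The same information is available from ordinary Python code through the os module, which is useful when we want it inside a program rather than in a notebook cell. In this sketch, os.getcwd() plays the role of pwd, and the helper fileSizes (a name made up here) gives a simplified version of ls -l. The demo directory and its file are throwaway examples.

```python
import os
import tempfile

def fileSizes(directory):
    """Return a sorted list of (name, sizeInBytes) for the files in directory."""
    result = []
    for name in sorted(os.listdir(directory)):
        fullPath = os.path.join(directory, name)
        if os.path.isfile(fullPath):  # skip subdirectories
            result.append((name, os.path.getsize(fullPath)))
    return result

print(os.getcwd())  # the current working directory, like pwd

# Demo on a throwaway directory holding one 3-character file.
demoDir = tempfile.mkdtemp()
with open(os.path.join(demoDir, 'tiny.txt'), 'w') as tinyFile:
    tinyFile.write('abc')

print(fileSizes(demoDir))
```

Calling fileSizes('.') would list the files of the current working directory instead.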

6. Writing Files

To open a file for writing, we use open with the mode 'w'.

The following code creates a file named memories.txt in the current working directory (or overwrites it if it already exists) and writes several lines of text to it.

In [19]:
with open('memories.txt', 'w') as memfileW:
    memfileW.write('get coffee\n') # need newlines
    memfileW.write('do CS111 homework\n')
    memfileW.write('vote!\n')

We can use ls -l to see that a new file memories.txt has been created:

In [20]:
ls -l
total 2200
-rw-r--r--  1 emustafa  staff     116 Oct 30 17:20 births2020.txt
-rw-r--r--@ 1 emustafa  staff      41 Aug 13 05:41 cities.txt
-rw-r--r--  1 emustafa  staff  665845 Sep  5 12:42 lec_files_solns.html
-rw-r--r--  1 emustafa  staff   29859 Oct 30 17:25 lec_files_solns.ipynb
-rw-r--r--  1 emustafa  staff      35 Oct 30 17:27 memories.txt
-rw-r--r--@ 1 emustafa  staff      45 Aug 13 05:41 nums.txt
-rw-r--r--@ 1 emustafa  staff     171 Aug 13 05:41 studentCities.txt
-rw-r--r--@ 1 emustafa  staff    1086 Aug 13 05:41 thesis.txt
-rw-rw-r--@ 1 emustafa  staff  399727 Mar 11  2021 yob2020.csv

In your notebook, you should see the new file memories.txt that was just created and has the timestamp to prove it.

Use the OS command more to view the contents of the file:

In [21]:
more memories.txt

Alternatively, go to Finder (on a Mac) or Windows Explorer (PC) to view the contents of the file.
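
Two small details about writing are sketched below on a throwaway file: .write() returns the number of characters it wrote, and print can write to a file through its file= parameter, in which case it adds the newline for us. The file name memoriesDemo.txt is made up so that the real memories.txt is left alone.

```python
import os
import tempfile

demoPath = os.path.join(tempfile.mkdtemp(), 'memoriesDemo.txt')

with open(demoPath, 'w') as memFile:
    charsWritten = memFile.write('get coffee\n')  # write returns a count
    print('do CS111 homework', file=memFile)      # print appends '\n' itself

# Read the file back to check what ended up in it.
with open(demoPath, 'r') as memFile:
    contents = memFile.read()

print(charsWritten)
print(repr(contents))
```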

7. Using f-strings when reading/writing files

Let's write formatted strings to our files. Recall that recently we learned about f-strings as a mechanism for generating complex strings without concatenation. Below is an example; notice the letter f before the start of the string.

In [22]:
g, n = ('celery', 34) # tuple assignment
print(f"I need to buy {n} pounds of {g}.\n") # notice the curly braces for the variables
I need to buy 34 pounds of celery.

The great thing about f-strings is that Python will automatically convert non-string values to strings that will fill the holes (the curly braces) in the f-string.

In [23]:
groceries = [('celery', 34), 
             ('rice', 27), 
             ('cholocate chips', 3.5), 
             ('sorbet', 73)]

with open('memories.txt', 'w') as memfile:
    for g, n in groceries:
        memfile.write(f"I need to buy {n} pounds of {g}.\n")
In [24]:
more memories.txt

Note that opening an existing file in 'w' mode erases the previous contents and replaces them with the new contents.

In [25]:
ls -l
total 2200
-rw-r--r--  1 emustafa  staff     116 Oct 30 17:20 births2020.txt
-rw-r--r--@ 1 emustafa  staff      41 Aug 13 05:41 cities.txt
-rw-r--r--  1 emustafa  staff  665845 Sep  5 12:42 lec_files_solns.html
-rw-r--r--  1 emustafa  staff   29859 Oct 30 17:25 lec_files_solns.ipynb
-rw-r--r--  1 emustafa  staff     148 Oct 30 17:27 memories.txt
-rw-r--r--@ 1 emustafa  staff      45 Aug 13 05:41 nums.txt
-rw-r--r--@ 1 emustafa  staff     171 Aug 13 05:41 studentCities.txt
-rw-r--r--@ 1 emustafa  staff    1086 Aug 13 05:41 thesis.txt
-rw-rw-r--@ 1 emustafa  staff  399727 Mar 11  2021 yob2020.csv
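
When the lines written to a file should line up in columns, an f-string hole can carry a format specifier after a colon. This goes a bit beyond what we have covered, so treat it as an optional sketch; the item list is made up. Here <8 pads the name on the right to 8 characters, and >6.1f right-aligns the number in 6 characters with one digit after the decimal point.

```python
# Made-up grocery items: (name, pounds).
items = [('celery', 34), ('rice', 27.5)]

# Build one aligned line per item.
alignedLines = [f"{name:<8}{pounds:>6.1f}" for name, pounds in items]

for line in alignedLines:
    print(line)
```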

8. Exercise 1: Print formatted strings from cities.txt

Take the contents of cities.txt and print out the following:

Line 1: Wilmington
Line 2: Philadelphia
Line 3: Boston
Line 4: Charlotte

Your implementation should safely open and close cities.txt. You should avoid using .read() or .readlines(). Remember that the lines will end with \n.

In [26]:
# Your code here
with open('cities.txt', 'r') as citiesFile:
    lineNumber = 0
    for cityLine in citiesFile:
        lineNumber += 1
        print(f"Line {lineNumber}: {cityLine.strip()}")
Line 1: Wilmington
Line 2: Philadelphia
Line 3: Boston
Line 4: Charlotte
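
An alternative to keeping the counter by hand is the built-in enumerate function, which pairs each line with its number. This is only one possible variation on the solution above. The function name numberedLines is made up, and the demo runs on a throwaway copy of the cities data.

```python
import os
import tempfile

def numberedLines(filename):
    """Return the lines of a file as strings of the form 'Line N: text'."""
    result = []
    with open(filename, 'r') as inputFile:
        # start=1 makes the numbering begin at 1 instead of 0
        for lineNumber, line in enumerate(inputFile, start=1):
            result.append(f"Line {lineNumber}: {line.strip()}")
    return result

# Throwaway copy of the cities data.
demoPath = os.path.join(tempfile.mkdtemp(), 'citiesDemo.txt')
with open(demoPath, 'w') as demoFile:
    demoFile.write('Wilmington\nPhiladelphia\nBoston\nCharlotte\n')

numbered = numberedLines(demoPath)
for entry in numbered:
    print(entry)
```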

9. Appending to files

How do we add lines to the end of an existing file?

We can't open the file in write mode (with a 'w'), because that erases all previous contents and starts with an empty file.

Instead, we open the file in append mode (with an 'a'). Any subsequent writes are made after the existing contents.

In [27]:
with open('memories.txt', 'a') as memfileA:
    memfileA.write('win Nobel prize\n')
    memfileA.write('eat big sundae\n')

Open the file memories.txt again, using the OS command more:

In [28]:
more memories.txt
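
The difference between the two modes can also be checked from Python, without more. The sketch below writes to a throwaway file (modesDemo.txt is a made-up name) in 'w' mode, then 'a' mode, then 'w' mode again, and reads the file back after the last two steps.

```python
import os
import tempfile

demoPath = os.path.join(tempfile.mkdtemp(), 'modesDemo.txt')

def readBack(filename):
    """Return the full contents of a file as one string."""
    with open(filename, 'r') as inputFile:
        return inputFile.read()

with open(demoPath, 'w') as demoFile:   # 'w' creates the file
    demoFile.write('one\n')

with open(demoPath, 'a') as demoFile:   # 'a' keeps 'one' and adds after it
    demoFile.write('two\n')
afterAppend = readBack(demoPath)

with open(demoPath, 'w') as demoFile:   # 'w' again erases everything first
    demoFile.write('three\n')
afterRewrite = readBack(demoPath)

print(repr(afterAppend))
print(repr(afterRewrite))
```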

10. Exceptions

Suppose we misspelled a file name...

In [29]:
memories = linesFromFile("memory.txt")
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-29-6b83ddd729d4> in <module>
----> 1 memories = linesFromFile("memory.txt")

<ipython-input-14-109cea94c623> in linesFromFile(filename)
      3     In each line, the terminating newline has been removed.
      4     '''    
----> 5     with open(filename, 'r') as inputFile:
      6         lines = []
      7         for line in inputFile: # notice we're not using a method here

FileNotFoundError: [Errno 2] No such file or directory: 'memory.txt'

An error like this will terminate the execution of a program, which we'd like to avoid.

Can we somehow handle the error programmatically, within our program?

Yes! There are two approaches.

10.1 Avoid the error with if ... else

One way of avoiding the error is by using if ... else in conjunction with the os.path.exists function from the Python os module. This function indicates whether a given path names an existing file or directory; a bare file name such as 'memories.txt' is looked up in the current working directory.

In [30]:
import os

def getLines(filename):
    """If filename names an existing file, return its lines. 
    Otherwise return the empty list."""
   
    if os.path.exists(filename):
        return linesFromFile(filename)
    else:
        return []
In [31]:
getLines('memories.txt')
Out[31]:
['I need to buy 34 pounds of celery.',
 'I need to buy 27 pounds of rice.',
 'I need to buy 3.5 pounds of cholocate chips.',
 'I need to buy 73 pounds of sorbet.',
 'win Nobel prize',
 'eat big sundae']
In [32]:
getLines('memory.txt') # function succeeds with empty list rather than terminating with error.
Out[32]:
[]
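
One caveat: os.path.exists also returns True for a directory, which cannot be read like a text file. When only regular files should count, os.path.isfile is the stricter check. The sketch below compares the two on a throwaway directory and file (the names are made up).

```python
import os
import tempfile

# A throwaway directory containing one file.
demoDir = tempfile.mkdtemp()
demoFilePath = os.path.join(demoDir, 'notes.txt')
with open(demoFilePath, 'w') as demoFile:
    demoFile.write('hi\n')

missingPath = os.path.join(demoDir, 'memory.txt')  # never created

dirExists = os.path.exists(demoDir)          # a directory does exist...
dirIsFile = os.path.isfile(demoDir)          # ...but it is not a file
fileIsFile = os.path.isfile(demoFilePath)
missingExists = os.path.exists(missingPath)

print(dirExists, dirIsFile, fileIsFile, missingExists)
```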

10.2 Exception Handling with try ... except

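A try ... except statement has the form try: statements1 except exceptionType: statements2. Python first executes statements1. If statements1 executes without error, statements2 are not executed. But if executing statements1 raises an error of type exceptionType, execution jumps to statements2. An error of any other type is not handled by this except clause.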

Dealing with IOError

In [33]:
def getLines(filename):
    """If filename names an existing file, return its lines. 
    Otherwise return the empty list."""
    
    try: 
        return linesFromFile(filename)
    except IOError:
        return []
In [34]:
getLines('memories.txt')
Out[34]:
['I need to buy 34 pounds of celery.',
 'I need to buy 27 pounds of rice.',
 'I need to buy 3.5 pounds of cholocate chips.',
 'I need to buy 73 pounds of sorbet.',
 'win Nobel prize',
 'eat big sundae']
In [35]:
getLines('memory.txt')
Out[35]:
[]
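
In Python 3, IOError is another name for OSError, and the FileNotFoundError we saw in the traceback is a more specific kind of OSError. That is why except IOError catches it. If we only want to handle the missing-file case, we can name FileNotFoundError directly and bind the error to a variable with as. The function getLinesOrReport is a made-up variation for this sketch.

```python
import os
import tempfile

def getLinesOrReport(filename):
    """Return the stripped lines of a file, or [] with a message if it is missing."""
    try:
        with open(filename, 'r') as inputFile:
            return [line.strip() for line in inputFile]
    except FileNotFoundError as err:
        # err.strerror holds the operating system's description of the problem
        print(f"Could not open {filename}: {err.strerror}")
        return []

# A path inside a fresh temporary directory, so the file cannot exist.
missingPath = os.path.join(tempfile.mkdtemp(), 'memory.txt')
missingResult = getLinesOrReport(missingPath)

print(missingResult)
print(IOError is OSError, issubclass(FileNotFoundError, OSError))
```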

Exception handling example 1: Dealing with ZeroDivisionError

Have you tried to divide by 0?

In [36]:
10/0
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
<ipython-input-36-e574edb36883> in <module>
----> 1 10/0

ZeroDivisionError: division by zero

If we know the error name, we can use it in the except clause:

In [37]:
def print_100_divided_by(n):
    try:
        print(100/n)
    except ZeroDivisionError:
        print('Do not divide by zero.')
In [38]:
print_100_divided_by(4)
25.0
In [39]:
print_100_divided_by(0)
Do not divide by zero.

Exception handling example 2: Dealing with value errors

In [41]:
while True:
    try:
        # Note: an unanswered input() prompt may require restarting the kernel later...
        i = int(input('Please enter an integer: '))
        #i = 5 # Can use instead of input() if you want to terminate immediately
        print('Good, you entered', i)
        break # Python keyword to exit a loop
    except ValueError:
        print('Not a valid integer. Try again...')
Please enter an integer: aa
Not a valid integer. Try again...
Please enter an integer: 10a
Not a valid integer. Try again...
Please enter an integer: 10
Good, you entered 10
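
The same idea can be packaged as a function that needs no keyboard input, which makes it easier to try out. The sketch below uses a made-up helper, parseInteger, and the optional else clause of try, which runs only when no error was raised.

```python
def parseInteger(text):
    """Return text converted to an int, or None if it is not a valid integer."""
    try:
        value = int(text)
    except ValueError:
        return None   # the conversion failed
    else:
        return value  # runs only if int(text) raised no error

# The three inputs from the session above, plus a float written as a string.
parsed = [parseInteger(s) for s in ['aa', '10a', '10', '3.5']]
print(parsed)
```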

Exercise 2: City Count

The file studentCities.txt contains a fictional list of US cities from where students hail. Write a function called cityCount that takes a city as a string and returns the number of students who come from that city. You should successfully open and close the file studentCities.txt. Take a look at the contents of studentCities.txt.

In [42]:
more studentCities.txt

Define your function below:

In [43]:
# Your code here
def cityCount(targetCity):
    with open("studentCities.txt", "r") as file:
        count = 0
        for city in file:
            city = city.strip()
            if city == targetCity:
                count += 1
        return count
In [44]:
cityCount("Philadelphia") # should return 4
Out[44]:
4
In [45]:
cityCount("Boston") # should return 3
Out[45]:
3
In [46]:
cityCount("Chicago") # should return 0
Out[46]:
0

Exercise 3: Number List Average

Write a function that takes the numbers in nums.txt and returns the average of those numbers.

In [47]:
more nums.txt
In [48]:
# Your code here
def averageNums():
    with open('nums.txt', 'r') as file:
        summation = 0
        numLines = 0
        for numRow in file:
            num = float(numRow.strip())
            summation += num
            numLines += 1
        return summation/numLines
In [49]:
averageNums() # should return 6.9316666...
Out[49]:
6.9316666666666675

Bonus: Sum Integers

Write a function that sums only the integers in nums.txt. One possible solution is to use try/except and attempt to convert every row with the int function. The int function will raise a ValueError if the string represents a float (for example, '3.5').

In [50]:
# Your code here
def intSum():
    with open('nums.txt', 'r') as file:
        summation = 0
        for numRow in file:
            try: 
                num = int(numRow.strip())
                summation += num
            except ValueError:
                pass
        return summation
In [51]:
intSum() # should be 19
Out[51]:
19

Challenge Exercise: reading and writing data

A CSV file is a file that stores data in rows of comma-separated values, representing a 2D table of information. This is a format that is read/exported by Excel and Google Sheets. We will work more with CSVs in the coming weeks. For now, take a look at the file "yob2020.csv". This file contains real-world data provided by the Social Security Administration (SSA) about the babies born in 2020. In order to provide a Social Security Number to each newborn, the law requires that US states issue birth certificates that include, among other things, the biological sex assigned at birth. Every year the SSA publishes a file with all baby first names that were used at least five times, accompanied by the biological sex label (F - female, M - male). The list contains rows in the format:

name,sex,total

Open the file below to see its content.

In [1]:
more yob2020.csv

Your task is to write a function that will:

  1. read in the data from the file,
  2. calculate how many female babies and male babies were born in 2020
  3. write the results into a new file

If done correctly your new file should read as follows:

The total number of female babies born in 2020 is 1598836.
The total number of male babies born in 2020 is 1706423.

You may find it useful to split each line on commas, using .split(','), rather than on whitespace.

Your function should use the following helper function:

In [53]:
def printBabies(sex, total):
    """Helper function that returns one formatted line of the report."""
    return f"The total number of {sex} babies born in 2020 is {total}.\n"
In [54]:
def reportBirthStats():
    # Your code here
    # Read the file, iterate through it and keep track of baby name totals.
    with open("yob2020.csv", "r") as inputF:
        femaleTotal, maleTotal = 0, 0 # accumulator variables
        for line in inputF:
            name, sex, totalName = line.strip().split(',') # tuple assignment
            if sex == 'F':
                femaleTotal += int(totalName)
            elif sex == 'M':
                maleTotal += int(totalName)
               
    # write file with statistics
    with open("births2020.txt", "w") as outputF:
        outputF.write(printBabies("female", femaleTotal))
        outputF.write(printBabies("male", maleTotal))
In [55]:
reportBirthStats()
In [56]:
more births2020.txt
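
As a preview of working with CSVs, Python's standard csv module can do the splitting for us: csv.reader turns each row into a list of strings. The sketch below runs on three made-up rows in the name,sex,total format (they are not taken from yob2020.csv) and reads them from a string through io.StringIO, so no file is needed.

```python
import csv
import io

# Three hypothetical rows in the name,sex,total format.
sampleData = "Ada,F,10\nGrace,F,7\nAlan,M,12\n"

totals = {'F': 0, 'M': 0}  # accumulator for each sex label

# io.StringIO lets csv.reader treat the string like an open file.
for name, sex, count in csv.reader(io.StringIO(sampleData)):
    totals[sex] += int(count)

print(totals)
```

With a real file, the open file object from a with statement would take the place of io.StringIO(sampleData).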

That's all folks!