Instructions for fileops

(produced at 05:35 a.m. UTC on 2021-11-10)

This task is part of ps07 which is due at 23:59 EST on 2021-11-09.

You have the option to work with a partner on this task if you wish. Working with a partner requires more work to coordinate schedules, but if you work together and make sure that you both understand the code you write, you will make progress faster and learn more.

You can submit this task using this link.

Put all of your work for this task into the file fileops.py
(which is provided among the starter files)

Note: This is the completed version of the fileops problem.

On 2021/11/07, some new notes were added (search for 2021/11/07) and some rubric items were removed or made more lenient.

This task involves defining several functions that leverage Python's ability to read and write files as well as list the contents of a directory (i.e., folder). For two of the functions, you will define an associated testing function that uses optimism to test that your functions have the correct behavior.

The problem set folder contains several files and directories whose purpose is to be used in test cases for this problem. Here are the directories and files that are provided for testing:

|-- books
|   |-- jane_austen
|   |   |-- Pride_and_Prejudice.txt
|   |   |-- Mansfield_Park.txt
|   |-- charlotte_perkins_gilman
|   |   |-- The_Yellow_Wallpaper.txt
|   |   |-- Women_and_Economics.txt
|   |-- oscar_wilde
|   |   |-- The_Importance_of_Being_Earnest.txt
|   |   |-- The_Picture_of_Dorian_Gray.txt
|-- testdir
|   |-- remember
|   |   |-- persistent.py
|   |   |-- memories.txt
|   |   |-- ephemeral.py
|   |-- hyp.py
|   |-- pubs.json
|   |-- pset
|   |   |-- scene
|   |   |   |-- cs1graphics.py
|   |   |-- shrub
|   |   |   |-- images
|   |   |   |   |-- shrub2.png
|   |   |   |   |-- shrub1.png
|   |   |   |-- shrub.py
|   |-- .numbers
|   |-- tracks.csv
|-- titanic.txt
|-- tolkien-poem.txt
|-- yob2020.csv

A few notes:

  • The books directory is used as an example of a directory in the File Directories lecture. The .txt files nested within the books directory contain the full text of several novels.
  • The testdir directory is also used as an example of a directory in the File Directories lecture.
  • The titanic.txt file has info about all passengers on the Titanic's last voyage.
  • The tolkien-poem.txt file contains J. R. R. Tolkien's "Rings" poem from Lord of the Rings.
  • The yob2020.csv file contains first names of babies born in the US during 2020, as published by the Social Security Administration. This file is used as an example in the File Input/Output lecture.

File Functions To Define

There are five functions to define in this task, whose specs are below.

1. fileStats(filePath)

Assume that filePath is a string that could denote a file path, like 'data.txt' or 'projects/2021/nov/names.csv'. If filePath is the path of an actual file, fileStats returns a triple (3-element tuple) of:

  1. the number of lines in the file;
  2. the number of "words" in the file (where a line can be split into words using .split())
  3. the total number of characters in the file (including newline characters).

If anything goes wrong when opening or reading filePath, fileStats returns (not prints) the string 'file error'.
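
The three counts can be illustrated on a small in-memory sample before touching real files. The sketch below is not the assignment solution: the sample lines and variable names are made up for illustration, and it only shows how len and .split() behave on lines that keep their trailing newline.

```python
# Hypothetical sample: three lines, each keeping its trailing newline,
# as they would when iterating over an open text file.
sample_lines = ['Three Rings for the Elven-kings\n', '\n', 'One Ring\n']

num_lines = 0
num_words = 0
num_chars = 0
for line in sample_lines:
    num_lines += 1
    num_words += len(line.split())  # .split() with no argument splits on whitespace
    num_chars += len(line)          # len counts the newline character too

stats = (num_lines, num_words, num_chars)
print(stats)  # prints (3, 7, 42)
```

Note that the empty middle line contributes one line and one character but no words, because '\n'.split() is an empty list.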

These examples of fileStats show its expected behavior.

Notes:

(added 2021/11/07) File encoding issues

Some of the test files are encoded with an encoding known as UTF-8. Depending on your Python settings, the open function might or might not be able to read these files. (You may especially have problems on Windows computers.) If you are having trouble reading any of the sample files in the ps07 directory, try passing the extra keyword argument encoding='utf-8' to open as shown below:

with open(..., 'r', encoding='utf-8') as ... 

If you continue to have trouble, email Lyn or Peter.
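
As a small illustration of the encoding argument, the sketch below writes a line containing curly quotes to a temporary file and reads it back, passing encoding='utf-8' both times. The file name sample.txt and the sample text are assumptions made for this example; nothing here touches the ps07 files.

```python
import os
import tempfile

# Hypothetical sample text containing non-ASCII characters (curly quotes).
text = '\u201cAnd never allow yourself to be blinded by prejudice?\u201d\n'

# Work inside a temporary directory so no real files are touched.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'sample.txt')
    with open(path, 'w', encoding='utf-8') as out_file:
        out_file.write(text)
    with open(path, 'r', encoding='utf-8') as in_file:
        read_back = in_file.read()
    size_in_bytes = os.path.getsize(path)

print(read_back == text)          # the text survives the round trip
print(len(text), size_in_bytes)   # character count vs. size on disk
```

Each curly quote occupies three bytes in UTF-8, so the size on disk is larger than the number of characters in the string.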

try/except

Recall from the Files lecture that exceptional cases can be handled by try/except. Examples from that lecture showed checking for particular errors, such as IOError and ZeroDivisionError in the following examples:

try:
    memories = linesFromFile('memory.txt')
except IOError:
    memories = []  # Use empty list in this case

a = 0
try:
    x = 8/a
    print(x)
except ZeroDivisionError:
    print('Do not divide by zero.')

But it's OK to not list specific errors if you want to catch them all:

try:
    memories = linesFromFile('memory.txt')
except:
    memories = []  # Handle any error that can occur in linesFromFile

try:
    x = 8/a
    print(x)
except:
    print('Do not divide by zero.')
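
To see which errors file operations raise in practice, the sketch below tries to open a path that does not exist and a path that is a directory. The helper name try_open is hypothetical and exists only for this illustration. In Python 3, IOError is another name for OSError, so either name catches both cases.

```python
import os
import tempfile

def try_open(path):
    """Return 'ok', or the name of the error raised when opening path for reading."""
    try:
        with open(path, 'r') as f:
            f.read()
        return 'ok'
    except OSError as err:  # IOError is an alias of OSError in Python 3
        return type(err).__name__

with tempfile.TemporaryDirectory() as tmp_dir:
    missing = os.path.join(tmp_dir, 'no_such_file.txt')
    result_missing = try_open(missing)   # the file was never created
    result_dir = try_open(tmp_dir)       # a directory, not a regular file

print(result_missing)
print(result_dir)
```

The error reported for the directory depends on the operating system (typically IsADirectoryError on macOS and Linux, PermissionError on Windows), which is one reason catching the general OSError, or using a bare except, can be more convenient than listing each case.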

2. testFileStats()

In lab and in the Testing and Debugging lecture, you have learned how to use functions from the optimism module to express input/output test cases. In particular, if optimism is imported via

import optimism as opt

then opt.testCase is used to declare an expression to test and opt.expectResult is used to declare the expected result of the most recently declared test case.

The testFileStats function is a zero-argument function that should express at least the six test cases for fileStats given in the testing examples for fileStats via six pairs of calls to opt.testCase and opt.expectResult. You are encouraged to add extra test case pairs in addition to these required six.

For example, here is how testing of the call to fileStats on 'titanic.txt' should be expressed in the body of testFileStats.

opt.testCase(fileStats('titanic.txt'))
opt.expectResult( (2208, 18889, 159080) )

This example of testFileStats shows its expected behavior.

3. writeLinesContaining(searchTerm, inFilePath, outFilePath)

Assume that searchTerm is a string and inFilePath and outFilePath are strings that could denote file paths, like 'data.txt' or 'projects/2021/nov/names.csv'.

This function creates a new output file named by outFilePath. For each line in the file named by inFilePath in which searchTerm is found, it writes to the output file the line number (starting at 1), followed by a colon, followed by the line (including the trailing newline character).

The searchTerm should be matched against each line in a case-insensitive way. For example, the search term 'Computer' should be "found" in the lines 'My computer is there.', 'They are a Computer Science major', and 'I *love* COMPUTERS!!!'.

In addition to creating the output file, writeLinesContaining should return a pair (2-tuple) of (1) the number of lines in the file in which the search term is found and (2) the total number of lines in the file.
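
The matching and numbering rules can be tried out on a list of strings before any files are involved. This is a minimal sketch, assuming that lowercasing both the search term and the line is an acceptable way to match case-insensitively; the list of lines reuses the examples above plus one made-up non-matching line.

```python
search_term = 'Computer'
# Sample lines from the examples above, plus one line that should not match.
lines = [
    'My computer is there.\n',
    'Nothing to see here.\n',
    'They are a Computer Science major\n',
    'I *love* COMPUTERS!!!\n',
]

matches = []
for line_number, line in enumerate(lines, start=1):  # line numbers start at 1
    if search_term.lower() in line.lower():
        # line number, a colon, then the line with its trailing newline
        matches.append(str(line_number) + ':' + line)

result = (len(matches), len(lines))
print(matches)
print(result)  # prints (3, 4)
```

The second line is skipped, so the numbers in the output (1, 3, 4) refer to positions in the input rather than counting matches.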

There are two error cases that should be handled:

  1. If there is an error in opening the file named by inFilePath, no output file is created and the message 'input file error' is returned (not printed).

  2. If there is an error in opening the file named by outFilePath, no output file is created and the message 'output file error' is returned (not printed).
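
One way the second error can arise is when outFilePath names a file inside a directory that does not exist: opening a file in 'w' mode creates the file but not any missing directories. The sketch below demonstrates this in a temporary directory; the names outfiles and result.txt are placeholders chosen for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # 'outfiles' is never created, so this path points into a missing directory.
    bad_path = os.path.join(tmp_dir, 'outfiles', 'result.txt')
    try:
        with open(bad_path, 'w') as out_file:
            out_file.write('never written\n')
        outcome = 'written'
    except OSError:
        outcome = 'output file error'
    file_was_created = os.path.exists(bad_path)

print(outcome)           # prints output file error
print(file_was_created)  # prints False
```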

These examples of writeLinesContaining show the values it returns. But you also need to examine the output files in each case to confirm that they contain the correct lines.

Notes:

(added 2021/11/07) File encoding issues

As in the fileStats problem, on some systems (especially Windows), you may encounter issues when reading and writing files. If you are having trouble reading any of the sample files in the ps07 directory, try the extra keyword argument encoding='utf-8' to open as shown below:

with open(..., 'r', encoding='utf-8') as ... 
or
with open(..., 'w', encoding='utf-8') as ... 

Also, if you are encountering these problems, you should change all instances of open in the provided testWriteLinesContaining function to include the extra keyword argument encoding='utf-8'.

If you continue to have trouble, email Lyn or Peter.

printFile

You can examine the contents of the output file by opening it in an editor. One way to do this is to double-click on the filename in a file browser (Finder on a Mac, File Explorer in Windows), which will open up a default text editor on the file. Another way is to use Thonny: select File>Open, set the Filter to all files (*), navigate to the file you want to read, and click on it.

To facilitate displaying the contents of files (especially the ones written by writeLinesContaining), we have also provided you with a printFile function that takes one required argument (a file path) followed by an optional argument (the number of lines n to be printed). It begins by displaying a header indicating how many of the total lines it will print from the file. Then it prints the first n lines of the file (or all of the lines of the file, if there are fewer than n lines). If the second argument is not provided, all lines of the file will be printed. If the given file cannot be opened, it prints 'file error'.

For example, printFile('tolkien-poem.txt', 4) prints:

printing the first 4 of the 8 lines in tolkien-poem.txt
------------------------------------------------------------
Three Rings for the Elven-kings under the sky,
Seven for the Dwarf-lords in their halls of stone,
Nine for Mortal Men doomed to die,
One for the Dark Lord on his dark throne

and printFile('tolkien-poem.txt') prints:

printing all of the 8 lines in tolkien-poem.txt
------------------------------------------------------------
Three Rings for the Elven-kings under the sky,
Seven for the Dwarf-lords in their halls of stone,
Nine for Mortal Men doomed to die,
One for the Dark Lord on his dark throne
In the Land of Mordor where the Shadows lie.
One Ring to rule them all, One Ring to find them,
One Ring to bring them all and in the darkness bind them
In the Land of Mordor where the Shadows lie.

And after writeLinesContaining('ring', 'tolkien-poem.txt', 'tolkien-poem-ring.txt') is executed, printFile('tolkien-poem-ring.txt') prints:

printing all of the 3 lines in tolkien-poem-ring.txt
------------------------------------------------------------
1:Three Rings for the Elven-kings under the sky,
6:One Ring to rule them all, One Ring to find them,
7:One Ring to bring them all and in the darkness bind them

testWriteLinesContaining

Since it is tedious to check the output files of writeLinesContaining for correctness, we have provided a zero-argument function testWriteLinesContaining to help you. Calling testWriteLinesContaining will run your definition of writeLinesContaining on all of the test cases in these examples of writeLinesContaining and report whether or not it behaves correctly.

4. fileSizes(dirPath)

Assume that dirPath is a string that could denote a path to a directory, like 'books' or 'projects/2021/november'. If dirPath is a path to an actual directory, return a list of pairs (2-tuples), where each pair has information about only regular files in the directory. (I.e., any subdirectories in the directory are not examined). Each pair has:

  1. the name of a regular file in the directory;
  2. the size in bytes of this regular file, as determined by os.path.getsize

If dirPath is the name of an existing file but it's not a directory, return the string 'file exists, but it is not a directory'.

If dirPath is not the name of any file, return the string 'directory not found'.
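
The os functions that distinguish these cases can be explored on a throwaway directory. The sketch below is only an illustration of os.listdir, os.path.isfile, os.path.isdir, os.path.exists, and os.path.getsize; the layout (notes.txt and subdir) is made up, and this is not presented as the structure your fileSizes must have.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Build a tiny hypothetical layout: one regular file and one subdirectory.
    file_path = os.path.join(tmp_dir, 'notes.txt')
    with open(file_path, 'wb') as f:
        f.write(b'hello\n')  # 6 bytes; binary mode keeps the size the same on every OS
    os.mkdir(os.path.join(tmp_dir, 'subdir'))

    kinds = []
    for name in sorted(os.listdir(tmp_dir)):
        full_path = os.path.join(tmp_dir, name)  # listdir returns bare names, not full paths
        if os.path.isfile(full_path):
            kinds.append((name, 'file', os.path.getsize(full_path)))
        elif os.path.isdir(full_path):
            kinds.append((name, 'directory', None))

    missing_exists = os.path.exists(os.path.join(tmp_dir, 'nope'))
    file_is_dir = os.path.isdir(file_path)

print(kinds)           # prints [('notes.txt', 'file', 6), ('subdir', 'directory', None)]
print(missing_exists)  # prints False
print(file_is_dir)     # prints False
```

Because os.listdir returns names relative to the directory, each name has to be joined with the directory path before it is passed to os.path.isfile or os.path.getsize.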

These examples of fileSizes show its expected behavior.

(added 2021/11/07) Notes:

  • If your system displays results in a different order than the examples, that's OK!

  • If your system displays extra hidden files (like '.DS_Store') not shown in the examples, that's OK!

  • If you have a Windows system, and os.path.getsize gives different file sizes than those shown here, that's OK! It turns out Windows systems use two characters rather than one character to represent a newline, so the file sizes on Windows will appear larger.
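
The effect of the two-character newline on file size can be seen without writing any files. This small sketch, using a made-up two-line string, compares the number of bytes needed for the same text with '\n' line endings and with '\r\n' line endings.

```python
unix_style = 'first line\nsecond line\n'
windows_style = unix_style.replace('\n', '\r\n')  # two characters per line ending

unix_bytes = len(unix_style.encode('utf-8'))
windows_bytes = len(windows_style.encode('utf-8'))

print(unix_bytes, windows_bytes)  # prints 23 25
```

The difference is exactly one byte per line, so a file with thousands of lines can differ in size by thousands of bytes between systems.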

5. testFileSizes()

Like testFileStats, testFileSizes is a zero-argument function for running optimism tests, except in this case the tests are for the fileSizes function. It should have at least 10 pairs of opt.testCase and opt.expectResult that express all of the example test cases for fileSizes. You are encouraged to add extra test case pairs in addition to these required 10.

This example of testFileSizes shows its expected behavior.

(added 2021/11/07) Note: If your system (1) displays results in a different order than the examples; (2) displays extra hidden files (like '.DS_Store') not shown in the examples; or (3) is a Windows system that displays larger file sizes than those in the examples, that's OK! Just make sure that your expectResult calls in optimism match exactly the results your system generates, so that calling testFileSizes() passes all the example test cases on your system.

Examples

fileStats Examples

These examples show how fileStats works.

In []:
fileStats('titanic.txt')
Out[]:
(2208, 18889, 159080)
In []:
fileStats('yob2020.csv')
Out[]:
(31271, 31271, 368456)
In []:
fileStats('books/jane_austen/Pride_and_Prejudice.txt')
Out[]:
(14580, 124743, 775716)
In []:
fileStats('tyrannic.txt')
Out[]:
'file error'
In []:
fileStats('books')
Out[]:
'file error'
In []:
fileStats('books/jane_austen')
Out[]:
'file error'

testFileStats Examples

These examples show how testFileStats works. There should be a check mark for each of the 6 (or more) test cases that pass. The number after <soln>/fileops.py: is the line number in the file fileops.py for the particular call to opt.testCase that is being tested. Your line numbers will almost certainly be different from these.

In []:
testFileStats()
Logs
✓ <soln>/fileops.py:160
✓ <soln>/fileops.py:165
✓ <soln>/fileops.py:169
✓ <soln>/fileops.py:174
✓ <soln>/fileops.py:178
✓ <soln>/fileops.py:181

writeLinesContaining Examples

These examples show the values returned by writeLinesContaining, but do not show the content of the output files.

In []:
writeLinesContaining('ring', 'tolkien-poem.txt', 'tolkien-poem-ring.txt')
Out[]:
(3, 8)
In []:
writeLinesContaining('Cook', 'titanic.txt', 'titanic-cook.txt')
Out[]:
(41, 2208)
In []:
writeLinesContaining('bella', 'yob2020.csv', 'yob2020-bella.txt')
Out[]:
(77, 31271)
In []:
writeLinesContaining( 'prejudice', 'books/jane_austen/Pride_and_Prejudice.txt', 'prejudice.txt' )
Out[]:
(13, 14580)
In []:
writeLinesContaining('ring', 'ring-poem.txt', 'ring.txt')
Out[]:
'input file error'
In []:
writeLinesContaining( 'ring', 'tolkien-poem.txt', 'outfiles/tolkien-poem-ring.txt' )
Out[]:
'output file error'

As noted in the description of writeLinesContaining, you can view the contents of the output files in a text editor, or you can use the provided printFile function to display the contents of these files. For example:

In []:
printFile('tolkien-poem-ring.txt')
Prints
printing all of the 3 lines in tolkien-poem-ring.txt
------------------------------------------------------------
1:Three Rings for the Elven-kings under the sky,
6:One Ring to rule them all, One Ring to find them,
7:One Ring to bring them all and in the darkness bind them
In []:
printFile('prejudice.txt')
Prints
printing all of the 13 lines in prejudice.txt
------------------------------------------------------------
1:The Project Gutenberg eBook of Pride and Prejudice, by Jane Austen
11:Title: Pride and Prejudice
24:*** START OF THE PROJECT GUTENBERG EBOOK PRIDE AND PREJUDICE ***
34: Pride and Prejudice
3635: “And never allow yourself to be blinded by prejudice?”
7532: strong prejudice against everything he might say, she began his
7665: partial, prejudiced, absurd.
8287: most natural consequence of the prejudices I had been
8301: prejudice against Mr. Darcy is so violent, that it would be the
9037: highly amused by the kind of family prejudice to which he
12763: circumstance which must prejudice her against him.
13468: all her former prejudices had been removed.
14229:*** END OF THE PROJECT GUTENBERG EBOOK PRIDE AND PREJUDICE ***

A simple way to test your writeLinesContaining function is to call the zero-argument testWriteLinesContaining function that has been provided for you. This will test your function on all the above test cases, and will report correct vs incorrect behavior. For example:

In []:
testWriteLinesContaining()
Prints
------------------------------------------------------------
Testing writeLinesContaining('ring', 'tolkien-poem.txt', 'tolkien-poem-ring.txt')
CORRECT output file
CORRECT returned result: (3, 8)
------------------------------------------------------------
Testing writeLinesContaining('Cook', 'titanic.txt', 'titanic-cook.txt')
CORRECT output file
CORRECT returned result: (41, 2208)
------------------------------------------------------------
Testing writeLinesContaining('bella', 'yob2020.csv', 'yob2020-bella.txt')
CORRECT output file
CORRECT returned result: (77, 31271)
------------------------------------------------------------
Testing writeLinesContaining('prejudice', 'books/jane_austen/Pride_and_Prejudice.txt', 'prejudice.txt')
CORRECT output file
CORRECT returned result: (13, 14580)
------------------------------------------------------------
Testing writeLinesContaining('ring', 'ring-poem.txt', 'ring.txt')
===WARNING: the output file does not exist!
CORRECT returned result: input file error
------------------------------------------------------------
Testing writeLinesContaining('ring', 'tolkien-poem.txt', 'outfiles/tolkien-poem-ring.txt')
===WARNING: the output file does not exist!
CORRECT returned result: output file error

fileSizes Examples

These examples show how fileSizes works. On some systems, the order of the tuples might differ, but their sizes should always be the same.

(added 2021/11/07): On some systems, extra so-called "hidden" files might be shown, such as the '.DS_Store' file automatically generated for many directories on a Mac. This is OK, and you don't need to worry about removing such extra hidden files from your output.

In []:
fileSizes('books')
Out[]:
[]
In []:
fileSizes('books/jane_austen')
Out[]:
[('Mansfield_Park.txt', 928708), ('Pride_and_Prejudice.txt', 799645)]
In []:
fileSizes('testdir')
Out[]:
[('.numbers', 21), ('hyp.py', 94), ('pubs.json', 107062), ('tracks.csv', 18778)]
In []:
fileSizes('testdir/pset')
Out[]:
[]
In []:
fileSizes('testdir/pset/scene')
Out[]:
[('cs1graphics.py', 212018)]
In []:
fileSizes('testdir/pset/shrub')
Out[]:
[('shrub.py', 1842)]
In []:
fileSizes('testdir/pset/shrub/images')
Out[]:
[('shrub1.png', 27248), ('shrub2.png', 13697)]
In []:
fileSizes('titanic.txt')
Out[]:
'file exists, but it is not a directory'
In []:
fileSizes('books/jan_austen')
Out[]:
'directory not found'
In []:
fileSizes('testdir/pset/bush')
Out[]:
'directory not found'

testFileSizes Examples

These examples show how testFileSizes works. There should be a check mark for each of the 10 (or more) test cases that pass. The number after <soln>/fileops.py: is the line number in the file fileops.py for the particular call to opt.testCase that is being tested. Your line numbers will almost certainly be different from these.

In []:
testFileSizes()
Logs
✓ <soln>/fileops.py:214
✓ <soln>/fileops.py:220
✓ <soln>/fileops.py:223
✓ <soln>/fileops.py:226
✓ <soln>/fileops.py:229
✓ <soln>/fileops.py:237
✓ <soln>/fileops.py:240
✓ <soln>/fileops.py:243
✓ <soln>/fileops.py:246
✓ <soln>/fileops.py:249

Rubric

 
Procedure Requirements
What code you use to solve the problem.

Core goals
Complete all core goals for core credit. Get partial credit for completing at least half, and more partial credit for completing at least 90%.

Define fileStats
Use def to define fileStats

Use a loop
Within the definition of fileStats, use any kind of loop in at least one place.

Use an open file expression
Within the definition of fileStats, use open(_) or open(_,_) in at least one place.

Use a .split() expression
Within the definition of fileStats, use _.split() or _.split(_) in at least one place.

Use a return statement
Within the definition of fileStats, use return _ in at least one place.

Define testFileStats
Use def to define testFileStats

Use an opt.testCase call
Within the definition of testFileStats, use opt.testCase(_) in at least one place.

Use an opt.expectResult call
Within the definition of testFileStats, use opt.expectResult(_) in at least one place.

Define writeLinesContaining
Use def to define writeLinesContaining

Use a loop
Within the definition of writeLinesContaining, use any kind of loop in at least one place.

Use open file expressions
Within the definition of writeLinesContaining, use open(_) or open(_,_) in exactly 2 places.

Use a .write call
Within the definition of writeLinesContaining, use _.write() or _.write(_) in at least one place.

Use a return statement
Within the definition of writeLinesContaining, use return _ in at least one place.

Define fileSizes
Use def to define fileSizes

Use a loop
Within the definition of fileSizes, use any kind of loop in at least one place.

Use a return statement
Within the definition of fileSizes, use return _ in at least one place.

Use an os.listdir call
Within the definition of fileSizes, use os.listdir(_) in exactly one place.

Use an os.path.getsize call
Within the definition of fileSizes, use os.path.getsize(_) in at least one place.

Define testFileSizes
Use def to define testFileSizes

Use opt.testCase calls
Within the definition of testFileSizes, use opt.testCase(_) in at least 10 places.

Use opt.expectResult calls
Within the definition of testFileSizes, use opt.expectResult(_) in at least 10 places.
 
Extra goals
Complete all extra goals in addition to the core goals for a perfect score.

Define fileStats
Use def to define fileStats

Use a loop
Within the definition of fileStats, use any kind of loop in exactly one place.

Use a try-except statement
Within the definition of fileStats, use try: ___ except: ___ or try: ___ except _: ___ in exactly one place.

Use an open file expression
Within the definition of fileStats, use open(_) or open(_,_) in exactly one place.

Use a with/as statement
Within the definition of fileStats, use with _ as _: ___ in exactly one place.

Define writeLinesContaining
Use def to define writeLinesContaining

Use a loop
Within the definition of writeLinesContaining, use any kind of loop in exactly one place.

Use try-except statements
Within the definition of writeLinesContaining, use try: ___ except: ___ or try: ___ except _: ___ in exactly 2 places.

Use with/as statements
Within the definition of writeLinesContaining, use with _ as _: ___ in exactly 2 places.

Define fileSizes
Use def to define fileSizes

Use a loop
Within the definition of fileSizes, use any kind of loop in exactly one place.

Use a try-except statement
Within the definition of fileSizes, use try: ___ except: ___ or try: ___ except _: ___ in exactly one place.
 
Product Requirements
Your code's result values.

Core goals
Complete all core goals for core credit. Get partial credit for completing at least half, and more partial credit for completing at least 90%.

fileStats returns the correct triples
The triples returned when your fileStats function is run must match the solution result.

testFileStats returns the correct result
The result returned when your testFileStats function is run must match the solution result.

printFile returns the correct result
The result returned when your printFile function is run must match the solution result.

writeLinesContaining returns the correct pairs
The pairs returned when your writeLinesContaining function is run must match the solution result.

fileSizes returns the correct pairs
The pairs returned when your fileSizes function is run must match the solution result.

testFileSizes returns the correct result
The result returned when your testFileSizes function is run must match the solution result.

Extra goals
Complete all extra goals in addition to the core goals for a perfect score.

Correct error messages for fileStats
The error message returned when your fileStats function encounters an error must match the solution result.

Correct error messages for writeLinesContaining
The error message returned when your writeLinesContaining function encounters an error must match the solution result.

Correct error messages for fileSizes
The error message returned when your fileSizes function encounters an error must match the solution result.