@extends('template')

@section('title')
Lab 11: Practice with File Reading and Writing
@stop

@section('content')

# Lab 11: Practice with File Reading and Writing

Your `lab11` folder contains a bunch of files (`*.py` and `*.txt`) for you to use today.

Useful reference:

+ [File Operations lecture notes](http://cs111.wellesley.edu/content/lectures/lec_files/files/lec_files.pdf)

## Part 0: What's in the lab11 folder?

Let's take a minute and look at the setup of the `lab11` folder. There are two subfolders called `inputFiles` (this one contains files that you will use as inputs to your functions) and `outputFiles` (this is where you will create new files today).

Set your working directory in Canopy to be your `lab11` folder. When you want to specify a file as input, you'll need to preface it with the folder name `inputFiles`, e.g. `inputFiles/passwords.txt`. We call this the **file path** (it includes the folder name as well as the file name). The same holds true when creating a new file: those files will all be created in the `outputFiles` subfolder, e.g. `outputFiles/myNewFile.txt`.

## Part 1, Task 1: Count lines in a file

Open the `lab11Part1.py` file in your `lab11` folder. Note that a `linesFromFile` function is provided for you. `linesFromFile()` returns a list of all the lines from the given file name. Note: `strip` returns a copy of the string with leading and trailing whitespace removed.

```py
def linesFromFile(filename):
    '''Returns a list of all the lines from a file with the given filename.
    In each line, the terminating newline and leading/ending whitespace
    have been removed.'''
    with open(filename, 'r') as inputFile: # open the file
        # read each line in the file
        strippedLines = [line.strip() for line in inputFile]
    return strippedLines
```

Your task is to write a function called `countLines(filename)` that **prints** the number of lines in the given file. Hint: `countLines` should invoke `linesFromFile`.

Below are some sample executions of `countLines`.
Note that the files are in your `lab11/inputFiles` subfolder.

```py
In []: countLines('inputFiles/drawOwls.py')
The file inputFiles/drawOwls.py has 8 lines.

In []: countLines('inputFiles/cs1graphics.py')
The file inputFiles/cs1graphics.py has 4991 lines.

In []: countLines('inputFiles/bunnyFlag.py')
The file inputFiles/bunnyFlag.py has 108 lines.
```

## Part 1, Task 1a: `analyzeCode(filename)`

Write a function called `analyzeCode(filename)` that takes a Python file name and prints statistics such as:

+ how many lines are comments (a line counts as a comment if its first non-whitespace character is `#`)
+ how many lines contain more than 80 characters
+ the total number of lines in the file

How do you determine if a line is a comment? We used the criterion that the first non-whitespace character in the line is a `#`. You might find it helpful to write a helper function, say `isCommentLine`, that returns `True` if a given line of the file is a comment line and `False` otherwise.

Below is some sample output that is printed to the screen:

```py
analyzeCode('inputFiles/drawOwls.py')
Filename: inputFiles/drawOwls.py :
comment lines = 3
lines > 80 characters = 0
Total number of lines = 8
---------

analyzeCode('inputFiles/cs1graphics.py')
Filename: inputFiles/cs1graphics.py :
comment lines = 213
lines > 80 characters = 675
Total number of lines = 4991
---------

analyzeCode('inputFiles/bunnyFlag.py')
Filename: inputFiles/bunnyFlag.py :
comment lines = 13
lines > 80 characters = 0
Total number of lines = 108
---------
```

(Note: your count of long lines for `cs1graphics.py` might be either 165 or 685 instead of 675. There are subtle but important reasons for these two close-to-correct answers.)

## Part 1, Task 1b: `analyzeCodeWrite`

This task is a variation of Task 1a above. Instead of printing the analysis of the Python program to the console, your `analyzeCodeWrite` will **create a new file** and write the analysis to the new file. The user specifies the name of the new file.
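The file-writing step on its own can be sketched as below. This is only a minimal illustration, not the solution to the task: `writeReport` is a hypothetical helper name that does not appear in `lab11Part1.py`, the report text is made up, and a temporary folder stands in for `outputFiles` so the sketch runs from any working directory.

```python
import os
import tempfile

def writeReport(reportText, outFilename):
    '''Hypothetical helper (not part of the lab starter code):
    writes the given string to a file with the given name,
    replacing any existing file of that name.'''
    with open(outFilename, 'w') as outputFile:  # 'w' creates or overwrites
        outputFile.write(reportText)

# Build a multi-line string; each '\n' ends one line of the file.
report = 'comment lines = 3\n' + 'Total number of lines = 8\n'

# Assumption: a temporary folder is used here instead of outputFiles.
demoPath = os.path.join(tempfile.mkdtemp(), 'demo_report.txt')
writeReport(report, demoPath)

# Read the file back to confirm what was written.
with open(demoPath, 'r') as checkFile:
    writtenBack = checkFile.read()
```

Opening a file in `'w'` mode creates it if it does not exist and overwrites it if it does, so it is worth double-checking the file name before running code like this.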
Tip: make sure your working directory is set correctly, because that is where your new file will be created.

Note: Because `analyzeCode` prints its output instead of returning it, we can't simply call that function from within `analyzeCodeWrite`. We will have to either copy/paste that code and modify it, or move it to a helper function that *returns* a multi-line string.

(Scroll right to see full invocation)

```py
In[]: analyzeCodeWrite('inputFiles/cs1graphics.py','outputFiles/info_about_cs1graphics.txt')
```

Then look inside your `outputFiles` folder (which is inside your `lab11` folder), and you should see a newly created file called `info_about_cs1graphics.txt`. Open this file in Canopy and check that the information was written as expected.

## Part 1, Task 2: `readData(filename,separator)`

Write a function called `readData` that takes a file name and a separator. The function reads the file's contents and returns a list of lists, one per line of the file, where each inner list holds the pieces of that line split at the separator. You can either:

1) make a new version of `linesFromFile` that splits each string using the given separator, or
2) invoke `linesFromFile` and split each item in that list.

```py
# Contents of file inputFiles/enrollments.txt
cs111-124
cs230-85
cs240-93
cs251-18
```

```py
In []: readData('inputFiles/enrollments.txt', '-')
Out[]: [['cs111', '124'], ['cs230', '85'], ['cs240', '93'], ['cs251', '18']]
```

```py
# contents of inputFiles/passwords.txt
Harry Potter::hpotter::foreheadBOLT
Ron Weasley::rweasley::myheart=hermione
Cho Chang::cchang::ravenXclaw
Fred Weasley::fweasley::NOTgeorge!
George Weasley::gweasley::BetterThanFred8
Hermione Granger::hgranger::you*GOT*this!
```

```py
In []: readData('inputFiles/passwords.txt', '::')
Out[]:
[['Harry Potter', 'hpotter', 'foreheadBOLT'],
 ['Ron Weasley', 'rweasley', 'myheart=hermione'],
 ['Cho Chang', 'cchang', 'ravenXclaw'],
 ['Fred Weasley', 'fweasley', 'NOTgeorge!'],
 ['George Weasley', 'gweasley', 'BetterThanFred8'],
 ['Hermione Granger', 'hgranger', 'you*GOT*this!']]
```

## Part 1, Task 3A: `spellcheck(origfile)`

Write a function called `spellcheck` that will spell check the given file and print the set of misspelled words.

How to spell check? In your `lab11Part1.py` file, there's a provided function called `loadDictionary` and a pre-defined variable `largeWordSet` (~180,000 words). This variable is a **set** of dictionary words that you can use for spell checking. For the simple spell checker that you will write today, if a word is not in the `largeWordSet`, then it is considered misspelled.

Talk over a strategy with your partner for how to look at each word in the given file and check if it is in the `largeWordSet`. Discuss how to accumulate a set of misspelled words.

```py
# Spell check
In []: spellcheck('inputFiles/silly.txt')
Out[]:
{'Ken-el-ration',
 'coca-cola',
 'frigidaire.',
 "hershey's",
 'lipton',
 'sixty-five',
 'star.',
 'tea.',
 "texaco's",
 'underwear.',
 "wife's",
 'wrigley'}
```

Note that some of these words are not actually misspelled (e.g., `star` and `tea`) but they have punctuation attached to them. You can clean up your words by removing the punctuation, if you want to (not required).

```py
# Spell check with punctuation removal
In []: spellcheck('inputFiles/silly.txt')
Out[]:
{'Ken-el-ration',
 'coca-cola',
 'frigidaire',
 'hersheys',
 'lipton',
 'sixty-five',
 'texacos',
 'wrigley'}
```

Open `cookies.txt` in Canopy. Read it through and predict which words are likely to **not** be in the `largeWordSet` (in other words, what will not count as correctly spelled).
Then run your `spellcheck` on `cookies.txt` (remember that it is in the `inputFiles` folder) and see if your predictions are right.

Try spell-checking these files: `siriAndHerGirls.txt` (how Siri enforces gender bias), and `applePie.txt` (another recipe).

## Part 1, Task 3B: `spellcheckWrite(origfile, resultFile)`

Write a function called `spellcheckWrite` that will spell check the given file and, instead of merely printing the set of misspelled words, will write them to a new file named `resultFile`.

For example,

```py
spellcheckWrite('inputFiles/cookies.txt', 'outputFiles/misspelledCookies.txt')
```

If you open the new file `misspelledCookies.txt`, it should have these words in it, one per line (or perhaps more misspellings if you did not correct for punctuation).

```py
arent
12
Guittard
Doubletree
COM
doubletree
snickerdoodles
thats
14
Guiradelli
Single-Use
theyre
Ive
BOMB-DIGGITTY
gluten-free
BOM
3
Reeses
Theyre
--
```

## Part 1, Task 4: `mergeHeaders(listOfFilenames, headerlines, outFilename)`

Write a function called `mergeHeaders` that will take a **list of filenames** and grab the `headerlines` from the top of each file and write them to a new file called `outFilename`.

Here are the contents of some silly text files:

```py
# Contents of inputFiles/green.txt:
This is the GREEN file
Here are some green things:
grass
money
lizards
grasshoppers
```

You can view similar files named `red.txt`, `blue.txt` and `yellow.txt` in the `inputFiles` folder in Canopy.
```py
In []: myfiles = ['inputFiles/red.txt', 'inputFiles/green.txt', 'inputFiles/blue.txt', 'inputFiles/yellow.txt']

# list of files, number of header lines, outputfile
In []: mergeHeaders(myfiles, 3, 'outputFiles/rainbow.txt')
```

Produces:

```py
# Contents of newly written file rainbow.txt
# with 3 headerlines taken from each file
This is the RED file
Here are some red things:
strawberries
This is the GREEN file
Here are some green things:
grass
This is the BLUE file
Here are some blue things:
blueberries
This is the YELLOW file
Here are some yellow things:
the sun
```

**Hint**: you might want to first start with a function that simply takes the number of headerlines from a single file, and once that works, build up to processing a list of file names.

## Part 1, Task 5: `distort(filename)`

Write a malicious function that takes in a filename, somehow distorts the contents of that file (you and your partner get to decide exactly how), and re-writes the file back with the same name. Check that your function works by viewing the file before you invoke your `distort` function and afterwards.

@include('/labs/lab11/_toc')

@stop