@extends('template')

@section('title')
Lab 10, File I/O with JSON
@stop

@section('content')

# Lab 10, File I/O with JSON

Every time the `getNameDictionary` function from Part 1 is invoked, it opens the appropriate SSA data file and processes the content into a dictionary:

```py
def getNameDictionary(year, sex):

    # Get the data
    filename = 'names/yob' + str(year) + '.txt'
    lines = linesFromFile(filename)

    nameDictionary = {} # A new, empty dictionary
    rank = 1 # Start with #1 ranked name

    for line in lines: # For every name in the file

        # Break the CSV line (e.g. Isabella, F, 17490) into a list where
        # 0 = name, 1 = sex, 2 = count of babies that had that name
        splitLine = line.strip().split(',')

        if splitLine[1] == sex:
            nameDictionary[splitLine[0]] = rank
            rank += 1

    return nameDictionary
```

In this part of today's lab, we want to expand on this by writing the results of `getNameDictionary` to a file, creating a “cache” of results for a given year/sex.

>> *“In computing, a **cache** is a hardware or software component that **stores data so future requests for that data can be served faster**; the data stored in a cache might be the result of an earlier computation [...].”* [-ref](https://en.wikipedia.org/wiki/Cache_(computing))

Once we do this, we'll have a more efficient way of fetching the same data repeatedly: rather than invoking `getNameDictionary` every time, we can simply read the data from a cache file (if it exists).

If a cache file doesn't exist, then we will have to invoke `getNameDictionary` to fetch/process the original data, *but* we can then create a cache file with the results so that the data is ready the next time we need it.

This exercise will give us practice with the following concepts:

1. Reading/writing files
2. Working with JSON data in Python

## Task 1. *loadNameDictionary*
At the end of `lab10/names.py`, you'll define a new function called `loadNameDictionary` that takes the parameters `year` and `sex`.
This function should return a dictionary mapping baby names to their rank for the given year. In short, this function needs to return the same thing that `getNameDictionary` does, e.g.

```py
results = loadNameDictionary(2013, 'F')
print results
```

```xml
{'Amyrikal': 16699,
'Andreah': 11057,
'Bailea': 14854,
'Camyra': 14931,
'Jojo': 9858,
'Kaegan': 13820,
'Laryiah': 13961,
'Lyberti': 14017,
'Poetry': 8913,
'Rory': 898
[...etc...]
}
```

**However**, in contrast to `getNameDictionary`, here's how the `loadNameDictionary` function should work:

First, check if there's a cached version of the requested data in the form of a .json file. Assume the following:

+ The .json cache files are stored in a subdirectory of your lab10 folder called `cache`.
+ The .json cache files are named with the year followed by the sex, e.g. `cache/2015f.json`
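
As a small illustration of the naming convention above, the cache path can be built from `year` and `sex` with string concatenation. The helper name `cacheFilename` is hypothetical (it is not part of the lab's starter code), and lowercasing the sex is an assumption based on the `cache/2015f.json` example:

```py
def cacheFilename(year, sex):
    # Hypothetical helper, not part of the lab's starter code.
    # Assumes the sex letter is lowercased in the file name,
    # as in the example cache/2015f.json.
    return 'cache/' + str(year) + sex.lower() + '.json'

print(cacheFilename(2015, 'F'))  # prints cache/2015f.json
```
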

If the .json cache file __does not exist__, the function should...

+ Invoke `getNameDictionary` to build the name dictionary and
+ Write the resulting dictionary as JSON data to a cache file (e.g. `cache/2015f.json`). (See hints below!)

If the .json cache file __does exist__, the function should...

+ Load the contents of the .json cache file as a dictionary. (See hints below!)
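
The two cases above follow a common "check the cache, otherwise compute and save" pattern. Here is a minimal, generic sketch of that pattern, not the lab solution. The names `loadOrCompute` and `computeFunction` are hypothetical, and creating the cache folder when it is missing is an added assumption, since `open` does not create folders for you:

```py
import json
import os

def loadOrCompute(cachePath, computeFunction):
    # Hypothetical helper illustrating the cache pattern in general.
    if os.path.exists(cachePath):
        # Cache hit: read the stored JSON back into a dictionary
        with open(cachePath, 'r') as cacheFile:
            return json.load(cacheFile)

    # Cache miss: compute the result, then save it for next time
    result = computeFunction()
    cacheDir = os.path.dirname(cachePath)
    if cacheDir and not os.path.exists(cacheDir):
        os.makedirs(cacheDir)  # Assumption: create the folder if it is missing
    with open(cachePath, 'w') as cacheFile:
        json.dump(result, cacheFile)
    return result
```

In `loadNameDictionary`, the compute step would be the call to `getNameDictionary`.
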


Finally, regardless of *how* the data is loaded (be it from the cache or by invoking `getNameDictionary`), the resulting dictionary should be returned.

__Read the following hints before starting work on your function.__

### Hint 1 - Does the cache file exist?
Here's an example of how you can check to see if a file exists:

```py
import os
os.path.exists('cache/2014f.json') # Returns True if the file exists, False otherwise
```

ref: [Lecture 15, slide 7: File System Operations: path.exists](http://cs111.wellesley.edu/content/lectures/lecture15/files/15_file-system-ops-solns.pdf)

### Hint 2 - Reading and writing the cache file
As discussed in [Lecture 15, Slide 20](http://cs111.wellesley.edu/content/lectures/lecture16/files/16_files_4up.pdf), JSON (JavaScript Object Notation) is a standard format we can use to encode lists and dictionaries as strings. These JSON-formatted strings can then be stored in text files, effectively giving us a “snapshot” of a Python dictionary or list.

To work with JSON in Python, you need to import the `json` module at the top of your file:

```py
import json
```

Here's an example showing how you can **write** a Python dictionary to a .json file:

```py
stateCapitalDict = {
    'MA': 'Boston',
    'NY': 'Albany',
    'NH': 'Concord'
}

with open('stateCapitals.json', 'w') as outputFile:
    json.dump(stateCapitalDict, outputFile)
```
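
To see what kind of text `json.dump` stores, here is a small sketch using the related function `json.dumps`, which returns the JSON text as a string instead of writing it to a file. It reuses the dictionary from the example above; the variable name `jsonText` is just for illustration:

```py
import json

stateCapitalDict = {'MA': 'Boston', 'NY': 'Albany', 'NH': 'Concord'}

# json.dumps returns the JSON-formatted string for the dictionary
jsonText = json.dumps(stateCapitalDict)
print(jsonText)
```

The printed text looks a lot like a Python dictionary, but JSON always uses double quotes around strings.
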

Here's an example showing how you can **read** a .json file and load the contents as a Python dictionary:

```py
with open('stateCapitals.json', 'r') as inputFile:
    stateCapitalDict = json.load(inputFile)
```

## Task 2. Confirm *loadNameDictionary* is working as expected

A simple test of `loadNameDictionary` would be to invoke it and confirm you're getting a dictionary with the appropriate data back, e.g.:

```py
results = loadNameDictionary(2013, 'F')
print results
```

This confirms *what* this function is producing, but we also want to confirm *how* it's producing it. To do this, edit your `loadNameDictionary` function so that...

__If the data is being loaded from the original data__ (using `getNameDictionary`), it should print the message:

>> *Loading dictionary from original data...*

__If the data is being loaded from cache__, it should print the message:

>> *Loading dictionary from cache...*

Test your function out using some year/sex combination you have not used before:

```py
# Change year/sex to a combination you have not used before
results = loadNameDictionary(1994, 'F')
```

On the first run, it should print: *Loading dictionary from original data...*

Then, run the same code again, and this time it should print: *Loading dictionary from cache...*

(Furthermore, you should see the resulting cache .json files appearing in `lab10/cache` for each test you run.)

@include('/labs/lab10/_toc')

@stop