@extends('template')
@section('title')
Lab 10, File I/O with JSON
@stop
@section('content')
# Lab 10, File I/O with JSON
Every time the `getNameDictionary` function from Part 1 is invoked, it opens the appropriate SSA data file and processes the content into a dictionary:
```py
def getNameDictionary(year, sex):
    # Get the data
    filename = 'names/yob' + str(year) + '.txt'
    lines = linesFromFile(filename)
    nameDictionary = {} # A new, empty dictionary
    rank = 1 # Start with #1 ranked name
    for line in lines: # For every name in the file
        # Break the CSV line (e.g. Isabella, F, 17490) into a list where
        # 0 = name, 1 = sex, 2 = count of babies that had that name
        splitLine = line.strip().split(',')
        if splitLine[1] == sex:
            nameDictionary[splitLine[0]] = rank
            rank += 1
    return nameDictionary
```
In this part of today's lab, we want to expand on this by writing the results of `getNameDictionary` to a file, creating a “cache” of results for a given year/sex.
>> *“In computing, a **cache** is a hardware or software component that **stores data so future requests for that data can be served faster**; the data stored in a cache might be the result of an earlier computation [...].”* [-ref](https://en.wikipedia.org/wiki/Cache_(computing))
Once we do this, we'll have a more efficient way of fetching the same data repeatedly: rather than invoking `getNameDictionary` every time, we can simply read the data from a cache file (if it exists).
If a cache file doesn't exist, then we will have to invoke `getNameDictionary` to fetch/process the original data, *but* we can then create a cache file with the results so that the next time we need that data we have it.
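To make the idea concrete, here is a minimal sketch of the same check-the-cache-first pattern applied to a different, made-up problem. The names `slowSquares` and `cachedSquares` are hypothetical and are not part of the lab code; a temporary directory stands in for a cache folder.

```python
import json
import os
import tempfile

def slowSquares(n):
    # Hypothetical stand-in for an expensive computation
    return {str(i): i * i for i in range(1, n + 1)}

def cachedSquares(n, cacheDir):
    # Returns the data plus a label saying where it came from
    cacheFile = os.path.join(cacheDir, 'squares' + str(n) + '.json')
    if os.path.exists(cacheFile):
        # Cache hit: read the stored result instead of recomputing
        with open(cacheFile, 'r') as inputFile:
            return json.load(inputFile), 'cache'
    # Cache miss: compute, then save the result for next time
    result = slowSquares(n)
    with open(cacheFile, 'w') as outputFile:
        json.dump(result, outputFile)
    return result, 'original'

cacheDir = tempfile.mkdtemp()  # a fresh, empty directory
first, firstSource = cachedSquares(3, cacheDir)
second, secondSource = cachedSquares(3, cacheDir)
print(firstSource)   # original
print(secondSource)  # cache
```

The first call has to do the work; the second call finds the file written by the first and skips the computation.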
This exercise will give us practice with the following concepts:
1. Reading/writing files
2. Working with JSON data in Python
## Task 1. *loadNameDictionary*
At the end of `lab10/names.py`, you'll define a new function called `loadNameDictionary` that takes the parameters `year` and `sex`.
This function should return a dictionary mapping baby names to their rank for the given year.
In short, this function needs to return the same thing that `getNameDictionary` does, e.g.
```py
results = loadNameDictionary(2013, 'F')
print results
```
```py
{'Amyrikal': 16699,
'Andreah': 11057,
'Bailea': 14854,
'Camyra': 14931,
'Jojo': 9858,
'Kaegan': 13820,
'Laryiah': 13961,
'Lyberti': 14017,
'Poetry': 8913,
'Rory': 898
[...etc...]
}
```
**However**, in contrast to `getNameDictionary`, here's how the `loadNameDictionary` function should work:
First, check if there's a cached version of the requested data in the form of a .json file. Assume the following:
+ The .json cache files are stored in a subdirectory in your lab10 folder called `cache`.
+ The .json cache files are named with the year followed by the sex in lowercase, e.g. `cache/2015f.json`
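One possible way to build a cache file path that follows this naming convention is sketched below. The helper name `cacheFilename` is hypothetical, and lowercasing the sex is an assumption based on the `cache/2015f.json` example.

```python
def cacheFilename(year, sex):
    # Hypothetical helper: year, then the sex in lowercase, inside cache/
    return 'cache/' + str(year) + sex.lower() + '.json'

print(cacheFilename(2015, 'F'))  # cache/2015f.json
```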
If the .json cache file __does not exist__, the function should...
+ Invoke `getNameDictionary` to build the name dictionary and
+ Write the resulting dictionary as JSON data to a cache file (e.g. `cache/2015f.json`). (See hints below!)
If the .json cache file __does exist__, the function should...
+ Load the contents of the .json cache file as a dictionary (See hints below!)
Finally, regardless of *how* the data is loaded (be it from the cache or invoking `getNameDictionary`), the resulting dictionary should be returned.
__Read the following hints before starting work on your function.__
### Hint 1 - Does the cache file exist?
Here's an example of how you can check to see if a file exists:
```py
import os # Needed for os.path.exists
os.path.exists('cache/2014f.json') # Returns boolean as to whether file exists
```
ref: [Lecture 15, slide 7: File System Operations: path.exists](http://cs111.wellesley.edu/content/lectures/lecture15/files/15_file-system-ops-solns.pdf)
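The same check works for directories. This matters because opening a file for writing does not create missing folders, so the `cache` subdirectory has to exist before you write into it. The sketch below uses a temporary directory as a stand-in for your lab10 folder; creating the folder from code with `os.makedirs` is just one option (you could also create it by hand).

```python
import os
import tempfile

baseDir = tempfile.mkdtemp()  # stand-in for your lab10 folder
cacheDir = os.path.join(baseDir, 'cache')

print(os.path.exists(cacheDir))  # False, nothing has created it yet
if not os.path.exists(cacheDir):
    os.makedirs(cacheDir)        # create the cache folder
print(os.path.exists(cacheDir))  # True

# The folder exists now, but no cache file has been written into it
print(os.path.exists(os.path.join(cacheDir, '2014f.json')))  # False
```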
### Hint 2 - Reading and writing the cache file
As discussed in [Lecture 15, Slide 20](http://cs111.wellesley.edu/content/lectures/lecture16/files/16_files_4up.pdf),
JSON (JavaScript Object Notation) is a standard format we can use to encode lists and dictionaries as strings. These JSON-formatted strings can then be stored in text files, effectively giving us a “snapshot” of a Python dictionary or list.
To work with JSON in Python, you need to import the `json` module at the top of your file:
```py
import json
```
Here's an example showing how you can **write** a Python dictionary to a .json file:
```py
stateCapitalDict = {
    'MA': 'Boston',
    'NY': 'Albany',
    'NH': 'Concord'
}
# Open the file for writing ('w'); it is created if it doesn't exist
with open('stateCapitals.json', 'w') as outputFile:
    json.dump(stateCapitalDict, outputFile)
```
Here's an example showing how you can **read** a .json file and load the contents as a Python dictionary.
```py
with open('stateCapitals.json', 'r') as inputFile:
    stateCapitalDict = json.load(inputFile)
```
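As a quick sanity check, here is a sketch of a full write-then-read round trip using a small name-to-rank dictionary like the ones in this lab. The file name `ranks.json` and the temporary directory are assumptions made only for this illustration.

```python
import json
import os
import tempfile

ranks = {'Rory': 898, 'Poetry': 8913}
path = os.path.join(tempfile.mkdtemp(), 'ranks.json')

# Write the dictionary out as JSON
with open(path, 'w') as outputFile:
    json.dump(ranks, outputFile)

# Read it back in as a Python dictionary
with open(path, 'r') as inputFile:
    loaded = json.load(inputFile)

print(loaded == ranks)     # True
print(loaded['Rory'] + 1)  # 899, ranks come back as integers
```

In Python 2, the keys of the loaded dictionary come back as unicode strings, so printing the whole dictionary shows them with a `u` prefix (e.g. `u'Rory'`). They still compare equal to the original keys.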
## Task 2. Confirm *loadNameDictionary* is working as expected
A simple test of `loadNameDictionary` would be to invoke it and confirm you're getting a dictionary with the appropriate data back, e.g.:
```py
results = loadNameDictionary(2013, 'F')
print results
```
This confirms *what* this function is producing, but we also want to confirm *how* it's producing it.
To do this, edit your `loadNameDictionary` function so that...
__If the data is being loaded from the original data__ (using `getNameDictionary`), it should print the message:
>> *Loading dictionary from original data...*
__If the data is being loaded from cache__, it should print the message:
>> *Loading dictionary from cache...*
Test your function out using some year/sex combination you have not used before:
```py
# Change year/sex to a combination you have not used before
results = loadNameDictionary(1994, 'F')
```
On the first run, it should print: *Loading dictionary from original data...*
Then, run the same code again, and this time it should print: *Loading dictionary from cache...*
(Furthermore, you should see the resulting cache .json files appearing in `lab10/cache` for each test you run.)
@include('/labs/lab10/_toc')
@stop