Problem Set 9 - Due Tue, Apr 18 at 23:59 EST

Reading

Lecture slides and notebook: Lec 17 Web APIs - Part 1
Lecture slides and notebook: Lec 18 Web APIs - Part 2
HTML/CSS Lab notes (PS08 Task 1)
Lab 11 Web APIs notes

About this Problem Set

This problem set is intended to give you practice working with Web APIs and generating HTML pages from them, as well as some practice with debugging.

In Task 1 (Individual Task) you will develop counterexamples that show why buggy solutions to an Exam 1 problem do not work.
In Task 2 (Partner Task) you will write several functions that will allow you to complete the Google Books API application explained below. Use this shared Google Doc to find a pair programming partner. Remember that you can work with the same partner a maximum of TWICE this semester.

Carefully study the lecture and lab materials, which show how to do everything you need to do for the Web API task.

The CS111 Problem Set Guide gives an overview of psets, including a detailed description of individual and partner tasks.

In Fall 2016, for Task 2, students spent an average of 4.83 hours (min = 1 hour, max = 10 hours). Task 1 was not part of the assignment in Fall 2016, so we don't have data for it.

All code for this assignment is available in the ps09 folder in the cs111/download directory within your cs server account.

Task 1: Bug Hunting

Background: The `correct` function

On Problem 1 of Exam 1 this semester, you were asked to write a function that takes two arguments (a pivot string and a list of strings wordlist) and returns a list of all the strings of wordlist that precede pivot in dictionary order. The words in the returned list should have the same relative order that they did in the original list.

Here is a function named correct that correctly solves this problem using the filtering pattern studied in class.

def correct(pivot, wordlist):
    '''Correct solution to Exam 1, Problem 1: a function that takes a pivot 
    string and a list of strings wordlist and returns a list that includes 
    only those elements of wordlist that precede pivot in dictionary order.
    '''
    result = []
    for word in wordlist:
        if word < pivot: # If word precedes pivot in dictionary order
            result.append(word)
    return result

Note that word < pivot is a simple way in Python to test if the string word comes before pivot in dictionary order.
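As a quick illustration of how < behaves on strings, here is a small sketch (not part of the provided ps09 code; the variable name comparisons is just for illustration). Every entry in the list evaluates to True:

```python
# Each comparison below evaluates to True.
comparisons = [
    'ant' < 'cat',        # 'a' comes before 'c'
    'cat' < 'cow',        # 'c' and 'c' tie, so 'a' vs. 'o' decides
    'do' < 'dog',         # a proper prefix precedes the longer string
    not ('cat' < 'cat'),  # a string does not precede itself
    'Zebra' < 'ant',      # uppercase letters sort before lowercase ones
]
```

The last comparison shows that Python's ordering is based on character codes, so it matches dictionary order only when the strings use the same letter case.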

Here's a sample use of the correct function:

In [1]: correct('cat', ['bunny', 'elephant', 'dog', 'bat', 'ant', 'dog', 'bat'])
Out[1]: ['bunny', 'bat', 'ant', 'bat']

Note that elements in the output list have the same relative order they did in the input list. Also note that if a string preceding the pivot (such as 'bat') occurs multiple times in the input list, it should occur multiple times in the output list.

Background: counterexamples

In this problem you will study ten incorrect programs for this problem and determine why they don't work. Incorrect programs are said to be buggy because they contain bugs, i.e., reasons why they don't work. So your goal in this problem is to track down the bugs in each of the ten programs.

One way to show that a program is buggy is to provide a counterexample, which is a particular set of inputs for which the program does not behave correctly. For a counterexample, the buggy program might return a wrong answer, or it might raise an exception when the correct program does not.

First buggy function example: `buggy0a`

Here is a function buggy0a that is an incorrect version of the correct function:

def buggy0a(pivot, wordlist):
    result = []
    for word in wordlist:
        if word > pivot: 
            result.append(word)
    return result

buggy0a illustrates a common bug for this function on the exam: it performs the wrong comparison between word and pivot, and so returns all the words in wordlist that come after the pivot, not before the pivot.

In the case of buggy0a, counterexamples are very easy to find. Any pivot paired with a wordlist that contains at least one string different from the pivot is a counterexample. For instance, ('c', ['b', 'd', 'a', 'c', 'e']) is a counterexample, because correct and buggy0a return different results for these inputs:

In [2]: correct('c', ['b', 'd', 'a', 'c', 'e'])
Out[2]: ['b', 'a']

In [3]: buggy0a('c', ['b', 'd', 'a', 'c', 'e'])
Out[3]: ['d', 'e'] # Different answer from the correct one

However, ('c', []) and ('c', ['c']) are not counterexamples, because both correct and buggy0a return the same results for these:

In [4]: correct('c', [])
Out[4]: []

In [5]: buggy0a('c', [])
Out[5]: [] # The same answer as correct

In [6]: correct('c', ['c'])
Out[6]: []

In [7]: buggy0a('c', ['c'])
Out[7]: [] # The same answer as correct

For the exam problem, we will say that a counterexample is minimal if it uses the shortest wordlist that a counterexample can have. For buggy0a, the shortest wordlist is one that contains a single string that is not the pivot. Here are two examples of minimal counterexamples for buggy0a:

In [8]: correct('c', ['a']) # Can use any string before 'c' in dictionary order. 
Out[8]: ['a']

In [9]: buggy0a('c', ['a'])
Out[9]: [] # A different answer from correct

In [10]: correct('c', ['d']) # Can use any string after 'c' in dictionary order.
Out[10]: []

In [11]: buggy0a('c', ['d'])
Out[11]: ['d'] # A different answer from correct

Second buggy function example: `buggy0b`

As a more complex buggy function, consider buggy0b:

def buggy0b(pivot, wordlist):
    result = []
    for word in wordlist:
        minlen = min(len(word), len(pivot))
        i = 0
        while i < minlen and word[i] <= pivot[i]:
            i += 1
        if i == minlen and (word[:minlen] != pivot[:minlen] or (len(word) < len(pivot))):
            result.append(word)
    return result

This function is rather complex, so to think about it, let's start by assuming that word and pivot have the same length and ignoring the condition (word[:minlen] != pivot[:minlen] or (len(word) < len(pivot))) at the end.

What does the while loop do? It counts (in the variable i) the number of consecutive times that an index in word has a letter that is less than or equal to the letter at the corresponding index in pivot. The counting stops when either i reaches minlen or word[i] is strictly greater than pivot[i]. Only if every letter in the first minlen letters of word is less than or equal to the corresponding letter in pivot will word be added to the result list. For example, if pivot is 'do', then the word 'an' satisfies this condition, because 'a' <= 'd' and 'n' <= 'o'. But the word 'at' does not satisfy this condition, since 't' is not less than or equal to 'o'.
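To experiment with just this counting step, the while loop can be pulled out into a small helper. This is only a sketch: the name countLeqLetters is hypothetical and does not appear in ps09/debugging.py.

```python
def countLeqLetters(word, pivot):
    '''Hypothetical helper: returns the value of i at the end of the
    while loop in buggy0b for the given word and pivot.'''
    minlen = min(len(word), len(pivot))
    i = 0
    while i < minlen and word[i] <= pivot[i]:
        i += 1
    return i

countAn = countLeqLetters('an', 'do')  # 2: 'a' <= 'd' and 'n' <= 'o'
countAt = countLeqLetters('at', 'do')  # 1: stops at index 1 because 't' > 'o'
```

Since minlen is 2 for both words, only 'an' reaches i == minlen, which matches the discussion above.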

This is problematic, since 'at' comes before 'do' in alphabetical order, but buggy0b will not include it in the result list:

In [12]: correct('do', ['an', 'at', 'do', 'ex'])
Out[12]: ['an', 'at']

In [13]: buggy0b('do', ['an', 'at', 'do', 'ex'])
Out[13]: ['an']

A minimal counterexample for buggy0b involves a list with one string that is included by correct but not by buggy0b:

In [14]: correct('do', ['at'])
Out[14]: ['at']

In [15]: buggy0b('do', ['at'])
Out[15]: []

Now let's consider some of the complexities of buggy0b that we ignored at first.

What is the purpose of minlen? The while loop includes the test word[i] <= pivot[i]. If one of word or pivot is longer than the other (e.g., pivot is 'dog' and word is 'an'), it's important to ensure that an index that is legal only in the longer string is never used in the shorter one. For example, 'dog'[2] makes sense, but 'an'[2] would raise an IndexError (index out of range). Because the limit of i is controlled by the shorter of the two strings, this exception is prevented.
What is the purpose of the condition (word[:minlen] != pivot[:minlen] or (len(word) < len(pivot)))? In some solutions it is necessary to specially handle cases where word is a prefix of pivot or vice versa. A string p is said to be a prefix of string s if there is some (possibly empty) string q such that p + q == s. For example, if pivot is 'dog', then the words 'd', 'do' and 'dog' are prefixes of the pivot, and pivot is a prefix of the words 'dog', 'dogs', and 'doggy'.
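The prefix definition above can be checked directly in Python. The following is a minimal sketch; the helper name isPrefix and the sample word list are assumptions made for illustration only.

```python
def isPrefix(p, s):
    '''Hypothetical helper: True if p is a prefix of s, i.e. if there is
    some (possibly empty) string q such that p + q == s.'''
    return s[:len(p)] == p  # same result as s.startswith(p)

pivot = 'dog'
words = ['d', 'do', 'dog', 'dogs', 'doggy', 'cat']

# Words that are prefixes of the pivot: ['d', 'do', 'dog']
prefixesOfPivot = [w for w in words if isPrefix(w, pivot)]

# Words that the pivot is a prefix of: ['dog', 'dogs', 'doggy']
extensionsOfPivot = [w for w in words if isPrefix(pivot, w)]
```

Note that 'dog' appears in both lists, because every string is a prefix of itself (take q to be the empty string).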

Which words in the word list ['d', 'do', 'dog', 'dogs', 'doggy'] should be in the correct result for the pivot 'dog'? Only 'd' and 'do' come before 'dog' in dictionary order, so only these two words should be in the result. But all five words will satisfy the condition i == minlen in buggy0b because only the characters of word[:minlen] and pivot[:minlen] are compared, and these are guaranteed to be equal when word and pivot are in a prefix relationship.

What can be done to keep only the two smaller prefixes? The condition word[:minlen] != pivot[:minlen] is true when word is not a prefix of pivot (or vice versa), so words not in a prefix relationship with pivot (that also satisfy i == minlen) can be included in the result list. But if word and pivot are in a prefix relationship, only words that are shorter than pivot should be included, which is the rationale behind the condition len(word) < len(pivot). This means that buggy0b behaves correctly on words in a prefix relationship with pivot:
```
In [16]: correct('dog', ['d', 'do', 'dog', 'dogs', 'doggy'])
Out[16]: ['d', 'do']

In [17]: buggy0b('dog', ['d', 'do', 'dog', 'dogs', 'doggy'])
Out[17]: ['d', 'do']
```
So prefixes are not helpful in designing counterexamples for buggy0b. But they are helpful in some of the other buggy functions below.

Finally, when debugging a complex function like buggy0b, it is often very helpful to add print statements to a function to better understand how it works. For example, here is a version of buggy0b augmented with numerous print statements:

def buggy0bWithPrints(pivot, wordlist):
    result = []
    for word in wordlist:
        minlen = min(len(word), len(pivot))
        print '-'*50
        print 'word ->', word, '; pivot ->', pivot, '; minlen ->', minlen
        i = 0
        if i < minlen: # while loop will only execute when i < minlen 
            print 'While loop: i ->', i, '; word[i] ->', word[i], '; pivot[i] ->', pivot[i],\
                 '; word[i] <= pivot[i] ->', word[i] <= pivot[i]
        while i < minlen and word[i] <= pivot[i]:
            i += 1
            if i < minlen: # while loop will only execute when i < minlen 
                print 'While loop: i ->', i, '; word[i] ->', word[i], '; pivot[i] ->', pivot[i],\
                      '; word[i] <= pivot[i] ->', word[i] <= pivot[i]
        print 'After while loop: i ->', i, '; i == minlen ->', i == minlen, ';'
        print '                  word[:minlen] != pivot[:minlen] ->', word[:minlen] != pivot[:minlen], ';'
        print '                  (len(word) < len(pivot)) ->', (len(word) < len(pivot))
        if i == minlen and (word[:minlen] != pivot[:minlen] or (len(word) < len(pivot))):
            print 'result.append ->', word
            result.append(word)
    return result

Running buggy0bWithPrints on an example provides a lot of insight into how it works:

In [18]: buggy0bWithPrints('do', ['a', 'an', 'ash', 'd', 'do', 'dog', 'egg'])
--------------------------------------------------
word -> a ; pivot -> do ; minlen -> 1
While loop: i -> 0 ; word[i] -> a ; pivot[i] -> d ; word[i] <= pivot[i] -> True
After while loop: i -> 1 ; i == minlen -> True ;
                  word[:minlen] != pivot[:minlen] -> True ;
                  (len(word) < len(pivot)) -> True
result.append -> a
--------------------------------------------------
word -> an ; pivot -> do ; minlen -> 2
While loop: i -> 0 ; word[i] -> a ; pivot[i] -> d ; word[i] <= pivot[i] -> True
While loop: i -> 1 ; word[i] -> n ; pivot[i] -> o ; word[i] <= pivot[i] -> True
After while loop: i -> 2 ; i == minlen -> True ;
                  word[:minlen] != pivot[:minlen] -> True ;
                  (len(word) < len(pivot)) -> False
result.append -> an
--------------------------------------------------
word -> ash ; pivot -> do ; minlen -> 2
While loop: i -> 0 ; word[i] -> a ; pivot[i] -> d ; word[i] <= pivot[i] -> True
While loop: i -> 1 ; word[i] -> s ; pivot[i] -> o ; word[i] <= pivot[i] -> False
After while loop: i -> 1 ; i == minlen -> False ;
                  word[:minlen] != pivot[:minlen] -> True ;
                  (len(word) < len(pivot)) -> False
--------------------------------------------------
word -> d ; pivot -> do ; minlen -> 1
While loop: i -> 0 ; word[i] -> d ; pivot[i] -> d ; word[i] <= pivot[i] -> True
After while loop: i -> 1 ; i == minlen -> True ;
                  word[:minlen] != pivot[:minlen] -> False ;
                  (len(word) < len(pivot)) -> True
result.append -> d
--------------------------------------------------
word -> do ; pivot -> do ; minlen -> 2
While loop: i -> 0 ; word[i] -> d ; pivot[i] -> d ; word[i] <= pivot[i] -> True
While loop: i -> 1 ; word[i] -> o ; pivot[i] -> o ; word[i] <= pivot[i] -> True
After while loop: i -> 2 ; i == minlen -> True ;
                  word[:minlen] != pivot[:minlen] -> False ;
                  (len(word) < len(pivot)) -> False
--------------------------------------------------
word -> dog ; pivot -> do ; minlen -> 2
While loop: i -> 0 ; word[i] -> d ; pivot[i] -> d ; word[i] <= pivot[i] -> True
While loop: i -> 1 ; word[i] -> o ; pivot[i] -> o ; word[i] <= pivot[i] -> True
After while loop: i -> 2 ; i == minlen -> True ;
                  word[:minlen] != pivot[:minlen] -> False ;
                  (len(word) < len(pivot)) -> False
--------------------------------------------------
word -> egg ; pivot -> do ; minlen -> 2
While loop: i -> 0 ; word[i] -> e ; pivot[i] -> d ; word[i] <= pivot[i] -> False
After while loop: i -> 0 ; i == minlen -> False ;
                  word[:minlen] != pivot[:minlen] -> True ;
                  (len(word) < len(pivot)) -> False
Out[18]: ['a', 'an', 'd']

Your Task

The file ps09/debugging.py contains 10 buggy implementations of the correct function, which are shown below. For each function buggyi, you should do two things:

Define the variable counterExamplei to be a pair (2-tuple) of (1) a pivot string and (2) a wordlist. The pair counterExamplei should be a minimal counterexample that indicates why buggyi is incorrect.
```
# Minimal counterexample for buggy0a
counterExample0a = ('c', ['a'])

# Minimal counterexample for buggy0b
counterExample0b = ('do', ['at']) 
```

Define the variable explanationi to be a string that briefly explains why the function is buggy. For example,

# An explanation of why buggy0a is buggy
explanation0a = """
Returns all words *after* pivot rather than *before* pivot.
"""

# An explanation of why buggy0b is buggy
explanation0b = """
Requires *each* index shared by word and pivot to contain a letter in word
that is less than or equal to the letter at the corresponding index of pivot.
But words that should correctly precede pivot in dictionary order and are not
prefixes of pivot only need to have *one* index at which the letter in word
is less than the letter in pivot (following a sequence of indices at which the
letters are equal). So the result list excludes some words it should include. 
"""

Explanation strings should be delimited by three double-quotes, which allows them to contain multiple lines.

Notes:

How do you find counterexamples for a buggy function? Start by understanding in detail how the function works on some simple inputs. To do this:
- use the code execution models we've used in class (especially iteration tables)
- add print statements to the buggy functions to understand better exactly how they work. (See buggy0bWithPrints above for an example of this.)
In this particular problem you need to focus on how a word from the word list is processed relative to the pivot. To keep things simple, start with a small pivot (a two-character string, say) and then consider small words (one to three characters long) that come before and after the pivot in dictionary order.

Write down on paper details of the comparison between the pivot and the sample words to understand what each function is doing, and use this knowledge to find pairs of pivots and words where the function misbehaves.
There are no Otter Inspect test cases for this problem. We want you to invoke the correct and buggy functions on counterexamples within Canopy to get a more visceral feel for the debugging process.
In addition to the correct and buggy functions returning different results, a counterexample can also cause one function (but not the other) to raise an exception, or cause the correct and buggy functions to raise different exceptions.
You can't directly call the correct or buggyi functions on counterExamplei, because they expect two separate arguments. But a tuple is a single argument, so this leads to a type error:
```
In [19]: correct(counterExample0a)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-9511ff6b0e29> in <module>()
----> 1 correct(counterExample0a)
```
However, Python has a special tuple-unpacking notation that can be used to avoid the type error in this context. Putting a * before a variable that denotes a tuple in the argument position of a function tells Python to unpack the components of the tuple, and treat them as separate arguments to the function.
```
In [20]: correct(*counterExample0a)
Out[20]: ['a']

In [21]: buggy0a(*counterExample0a)
Out[21]: []
```
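Building on the tuple-unpacking notation, here is one possible way to check a candidate counterexample, including the case where one function raises an exception and the other does not. It is a sketch under stated assumptions: the helper names outcome and isCounterexample are hypothetical and are not defined in ps09/debugging.py, and correct and buggy0a are repeated only so the block runs on its own.

```python
def correct(pivot, wordlist):
    result = []
    for word in wordlist:
        if word < pivot:
            result.append(word)
    return result

def buggy0a(pivot, wordlist):
    result = []
    for word in wordlist:
        if word > pivot:
            result.append(word)
    return result

def outcome(func, example):
    '''Hypothetical helper: describes what func does on the (pivot, wordlist)
    pair example, either ('returned', value) or ('raised', exception name).'''
    try:
        return ('returned', func(*example))  # * unpacks the pair
    except Exception as e:
        return ('raised', type(e).__name__)

def isCounterexample(buggy, example):
    '''Hypothetical helper: True if correct and buggy behave differently.'''
    return outcome(correct, example) != outcome(buggy, example)

check1 = isCounterexample(buggy0a, ('c', ['a']))  # True
check2 = isCounterexample(buggy0a, ('c', ['c']))  # False
```

This only tells you whether a pair is a counterexample. It does not tell you whether the counterexample is minimal, and it does not replace understanding why the function misbehaves.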

The 10 Buggy Functions

The ten buggy functions can be found in the file ps09/debugging.py, which also contains definitions for the correct, buggy0a, and buggy0b functions.

Below each buggyi function is a counterExamplei variable that you should define as a pair (2-tuple) of (1) a pivot string and (2) a minimum-length wordlist that serves as a counterexample relative to the pivot.

Below each buggyi function is also an explanationi variable that you should define as a string that explains why buggyi is buggy.

Most of these buggy programs are based on incorrect programs that students wrote for Problem 1 on Exam 1. They have been cleaned up in various ways, but they illustrate reasoning issues that students actually exhibit in practice. Most commonly, students did not understand that strings could use < in Python to compare two strings in dictionary order, and came up with complex solutions involving loops that tested the strings letter-by-letter. It is possible to define a correct loop involving letter-by-letter comparisons, but it is rather challenging to do so, as several of the following functions indicate.

# ------------------------------------------------------------------------------
def buggy1(pivot, wordlist):
    result = []
    for word in wordlist:
        if word <= pivot: 
            result.append(word)
    return result

# ------------------------------------------------------------------------------
def buggy2(pivot, wordlist):
    result = []
    for word in wordlist:
        if word[0] < pivot[0]:
            result.append(word)
    return result

# ------------------------------------------------------------------------------
def buggy3(pivot, wordlist):
    result = []
    for word in wordlist:
        if word < pivot:
            result.append(word)
        return result

# ------------------------------------------------------------------------------                                                         
def buggy4(pivot, wordlist):
    result = []
    for word in wordlist:
        if word < pivot and word not in result:
            result.append(word)
    return result

# ------------------------------------------------------------------------------
def buggy5(pivot, wordlist):
    result = []
    for word in wordlist:
        minlen = min(len(word), len(pivot))
        for i in range(minlen):
            if word[i] < pivot[i]:
                result.append(word)            
            elif word[i] > pivot[i]:
                break # exit inner for loop if word comes after pivot
        if len(word) < len(pivot) and word == pivot[:minlen]:
            result.append(word) # Handle prefixes of pivot specially
    return result

# ------------------------------------------------------------------------------
def buggy6(pivot, wordlist):
    result = []
    for word in wordlist:
        minlen = min(len(word), len(pivot))
        allLess = True
        for i in range(minlen):
            if word[i] >= pivot[i]:            
                allLess = False
        if allLess:
            result.append(word)
    return result

# ------------------------------------------------------------------------------
def buggy7(pivot, wordlist):
    result = []
    for word in wordlist:
        minlen = min(len(word), len(pivot))
        i = 0
        while i < minlen and word[i] == pivot[i]:
            i += 1
        # After the while loop, either (1) i is equal to minlen or 
        # (2) i is the first index at which word and pivot differ
        if (i == minlen or word[i] < pivot[i]) and pivot != word: 
            result.append(word)
    return result

# ------------------------------------------------------------------------------
def buggy8(pivot, wordlist):
    result = []
    for word in wordlist:
        minlen = min(len(word), len(pivot))
        i = 0
        while i < minlen and word[i] == pivot[i]:
            i += 1
        # After the while loop, either (1) i is equal to minlen or 
        # (2) i is the first index at which word and pivot differ
        if i < minlen and word[i] < pivot[i]:
            result.append(word)
    return result

# ------------------------------------------------------------------------------
def buggy9(pivot, wordlist):
    result = []
    for word in wordlist:
        minlen = min(len(word), len(pivot))
        for i in range(minlen):
            if word[i] < pivot[i]:
                result.append(word)            
                break # Exit inner for loop
        if len(word) < len(pivot) and word == pivot[:minlen]:
            result.append(word) # Handle prefixes of pivot specially
    return result

# ------------------------------------------------------------------------------
def buggy10(pivot, wordlist):
    result = []
    for word in wordlist:
        for i in range(len(word)):
            if word[i] < pivot[i]:
                result.append(word)
                break # Exit the inner for loop
            elif word[i] > pivot[i]:
                break # Exit the inner for loop
            # Otherwise word[i] == pivot[i], and we continue loop
    return result

Task 2: Google Book Search

In this problem you will use the Google Books web API to create a program that finds and displays books by searching for a string that appears in the title. For example, here is the URL for querying the Google Books web API for books whose title contains the string universe:

https://www.googleapis.com/books/v1/volumes?q=intitle:universe&langRestrict=en&maxResults=40

Based on the results returned from this query, your program should display a web page for all the books that have book cover images. Click on the following links for examples of how your web pages might look for a few different search terms:

Your task is to write a program that makes use of the Google Books API and does the following:

Prompts the user (using raw_input) for any search term such as the ones above;
sends a request to the Google Books API to get results for the query;
processes the JSON results to extract the desired information;
and generates an HTML page as in the examples above, displaying formatted information for each book whose title contains the search term, as determined by the Google Books API.

As you can imagine, this is a complex task, because there are so many different pieces that need to be coordinated. We have broken down the task into subtasks to guide your problem solving. Additionally, the ps09 folder contains a file named googlebooksSearch.py with some helper functions defined for you. Flesh out the remaining functions following their contracts. We recommend that you complete each subtask before moving to the next one. Read all the subtasks to get a general idea of what you have to do (and manage your time), and then focus on each subtask one by one. This is also a good way to pace yourselves. You'll need 2-3 sessions of work to complete the entire task, so plan accordingly.

Subtask A: Starting the HTML Template

In the last.fm example in lecture and the lab tasks, we always provided you with the HTML template (the file that contains HTML code mixed with Python code). In this Pset, we expect you to write this HTML template yourself. You should write the template in the provided file named gbooksTemplate.html, which is initially empty.

Start by drawing on paper the structure of the web page, using boxes for the different fields of text. An example of such a drawing is slide 18-11 in the Web API II lecture. Open the universe page and draw the different elements. Notice that the top of the page has three pieces of information which are different from page to page:

  <img src="/content/psets/ps09/files/header.png" width=400>

Then, each book representation has the same structure: there are seven fields that need to be filled with the data from the API call results. Draw boxes and give them names based on their meaning: book title, year published, etc.

Open the source code of the HTML pages in the sample_output folder (either use View Source in the browser or open them with an editor). Start copying HTML code from one of these pages into gbooksTemplate.html. Because you'll generate the whole page automatically with code, you only need some of the HTML tags: just enough to recreate a single book entry.

Your newly created HTML file (based on sample outputs) will not contain any Python code yet, but you should insert placeholders for the different slots where Python variables will be added later. For example, after you modify the HTML for the first book entry using a sample picture onebook.png, you might get something that looks like this:

  <img src="/content/psets/ps09/files/onebook.png" width=600>

That means you now have the HTML part in place. Once you create the dictionary containing the data for the page, you can come back to this file and incrementally modify it to insert data from your Python dictionary with the jinja2 syntax. In the example above, you will later want to replace the hardcoded example image path by inserting a URL value with jinja2.

Subtask B: Sending the API Request

The Google Books API takes the following parameters:

q: The query. To search for a word in the title, the value of the query should be intitle:searchterm (replace searchterm as needed; the search term can have spaces.)
langRestrict: Restrict results to a certain language. To restrict to English, use 'en' as the value of this parameter.
maxResults: The maximum number of results that should be returned, as an integer. 40 is the maximum that can be returned by the API. You should hardwire this parameter to 40 in all of your searches.

Study the structure of an API call on slide 18-21 in the Web API II lecture. Then, break down the elements of the Google Books API URL:

https://www.googleapis.com/books/v1/volumes?q=intitle:universe&langRestrict=en&maxResults=40

into components such as parameter names (see listed above) and corresponding values. Be especially careful with the first parameter, q. Once you've done this, you can proceed with the coding tasks:

Flesh out the definition for requestBooks
Write the statements you need in main to be able to call requestBooks (see an example of a main function in slide 18-36.)
Follow instruction 1 in the main docstring to call the helper function writeJSONforExploration that will create a JSON file with the results of the API request.
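The breakdown of the URL into a base address plus parameter names and values can be double-checked with Python's standard URL-parsing functions. This is just an exploratory sketch (no request is sent), and the variable names baseURL and params are assumptions for illustration; the lecture slides may build the request differently.

```python
try:
    from urllib.parse import urlparse, parse_qs  # Python 3
except ImportError:
    from urlparse import urlparse, parse_qs      # Python 2

url = ('https://www.googleapis.com/books/v1/volumes'
       '?q=intitle:universe&langRestrict=en&maxResults=40')

parts = urlparse(url)

# Everything before the '?'
baseURL = parts.scheme + '://' + parts.netloc + parts.path

# Dictionary mapping each parameter name to a list of its values
params = parse_qs(parts.query)
```

Here baseURL is 'https://www.googleapis.com/books/v1/volumes', and params maps 'q' to ['intitle:universe'], 'langRestrict' to ['en'], and 'maxResults' to ['40']. Notice that the value of q is the whole string intitle:universe, not just the search term.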

OPTIONAL: To view the content of this file nicely, you can install the JSON Formatter Chrome plugin. Then, you need to check the box "Allow access to file URLs" in the Extension page for this extension. Finally, drag the JSON file into your browser. You can click on the triangles to control how much content you see.

Subtask C: Extracting the information for the books

In this subtask, you'll inspect the JSON file to find the key:value pairs that you need to access to get the information, and you'll flesh out the body of extractBookInfo and sortByPublishedDate, which together create a dictionary that contains most of the information needed for filling the HTML template.

Formal Requirements for the page

You have already "figured out" these requirements in Subtask A, but here they are for completeness.

The web page you'll generate should have the header Searching Google Books followed by information that includes: (1) the search term, (2) the total number of entries in the Google Books database that match the search term (this is part of the result returned from the Google Books API, and can be much bigger than 40), and (3) the number of books displayed on the page (i.e., those returned from the Google Books API request that have cover images), which can be up to 40.

Below this header information, the page should show a list of at most 40 English books. It should only show books for which there are cover images, so the page in practice will often contain fewer than 40 books.

The displayed list should be sorted by publication year. For each book in the list, display the following, in the same HTML format as the samples.

The title
The name of the author(s). Use the text "Unknown" if no author information is in the response.
The image of the book cover. This is the URL corresponding to "thumbnail".
The publication year. Use "Missing" if the publication date is not available. Books with missing publication years should appear at the bottom of the list.
The description, up to a maximum of 700 characters. Use "No description available" if unavailable. If the description contains more than 700 characters, display only the first 700 characters followed by '...'.
The page count of the book. Use "Not available" if that information is unavailable.
Finally, a link to the Google Books preview page for the book.
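Two of the requirements above (default text for missing information, and the 700-character limit on descriptions) can be sketched as follows. The helper name shortenDescription, the toy dictionary toyBook, and its key names are assumptions for illustration; check the key names against your own JSON file.

```python
MAX_DESCRIPTION_LENGTH = 700

def shortenDescription(description):
    '''Hypothetical helper: returns description unchanged if it has at most
    700 characters; otherwise its first 700 characters followed by '...'.'''
    if len(description) > MAX_DESCRIPTION_LENGTH:
        return description[:MAX_DESCRIPTION_LENGTH] + '...'
    return description

# A toy dictionary (not real API output) that lacks a description.
toyBook = {'title': 'A Toy Book'}

# dict.get returns the second argument when the key is absent.
description = toyBook.get('description', 'No description available')
shortDescription = shortenDescription(description)
```

A description of exactly 700 characters is left alone, while one with 701 characters becomes 703 characters long (700 plus the three dots).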

JSON Response

Open the JSON file you created in Subtask B and study its content. Specifically, given that the response can be converted into a Python dictionary, identify the combination of keys that will allow you to extract the desired info. Consult the Notebook for Lecture 18 to use the "tunnel down" method.
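As a reminder of what "tunneling down" looks like, here is a sketch on a small made-up JSON string. The structure and key names below are invented for illustration and are not those of the Google Books response; you need to discover the real ones in your own JSON file.

```python
import json

# Made-up JSON text, NOT the format of the Google Books response.
toyResponse = '''
{"count": 2,
 "pets": [{"info": {"name": "Rex", "tags": ["dog"]}},
          {"info": {"name": "Tom"}}]}
'''

data = json.loads(toyResponse)  # convert JSON text to a Python dictionary

topLevelKeys = sorted(data.keys())             # ['count', 'pets']
firstName = data['pets'][0]['info']['name']    # 'Rex'

# The second pet has no 'tags' key, so indexing with ['tags'] would raise
# a KeyError; get supplies a default value instead.
secondTags = data['pets'][1]['info'].get('tags', [])
```

The pattern is the same at every level: look at the keys of a dictionary (or the length of a list), pick the key or index you need, and repeat on the value you get back.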

Try to implement extractBookInfo incrementally. First create a dictionary that contains only the total number of books, then add to it the list of dictionaries (one dict per book) containing the book titles, and so on. Each time, print the output and check that it contains the information you expect.

Once you have stored the JSON file, you can avoid running the main function and repeating the API call over and over by reading the saved response from the file instead:

with open("books4universe.json", 'r') as fIn:
    jsonResponse = fIn.read()

extractBookInfo(jsonResponse)

You can write these lines in the if __name__ == '__main__': block, comment out the call to main, and work on the extraction process until it works. Then you can go back to main when testing different query words.

Sorting books

The books in the sample pages are ordered by publication year in reverse chronological order. Remember to extract just the year from the publication date and sort on that, so that the rest of the date does not affect the ordering. Because some books will have the year "Missing", you will not be able to perform a simple sorting of the books. To understand this, see this example:

In [1]: years = ["2016", "2010", "Missing", "1987", "Missing"]
In [2]: sorted(years)
Out[2]: ['1987', '2010', '2016', 'Missing', 'Missing']
In [3]: sorted(years, reverse=True)
Out[3]: ['Missing', 'Missing', '2016', '2010', '1987']

Notice how the "Missing" year shows up at the start of the list, not at the end, as we want. You'll need to deal with this. Also, keep in mind that you'll be sorting a list of dictionaries, and dictionaries cannot be sorted meaningfully by calling sorted on them directly. You'll use a strategy similar to what we have seen in the past; see the simple example below:

In [4]: dictList = [{'a': 3, 'y': 10}, {'a': 5, 'y': 20}, {'a': 1, 'y': 15}]
In [5]: pairs = [(dct['y'], dct) for dct in dictList]
In [6]: sorted(pairs)
Out[6]: [(10, {'a': 3, 'y': 10}), (15, {'a': 1, 'y': 15}), (20, {'a': 5, 'y': 20})]

You should take it from here and think of the next steps for fleshing out the function sortedByPublicationYear, which needs to be called toward the end of the body of extractBookInfo. Once your dictionary with the values to fill the HTML template is ready, you can move on to make the final changes to the HTML template and generate the page.
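
One possible way to combine the two ideas above (pairs for sorting, special handling for "Missing") is sketched here on a small made-up list. The key name 'year' and the variable names are hypothetical, and this is only a sketch under those assumptions, not the required implementation.

```python
# Made-up data; the key name 'year' is an assumption for illustration.
books = [
    {'title': 'A', 'year': '2010'},
    {'title': 'B', 'year': 'Missing'},
    {'title': 'C', 'year': '2016'},
    {'title': 'D', 'year': '1987'},
]

# Separate the books that have a year from those that do not.
dated = [book for book in books if book['year'] != 'Missing']
undated = [book for book in books if book['year'] == 'Missing']

# Pair each dated book with its year. The index in the middle keeps the
# tuples distinct, so two books with the same year never force Python to
# compare the dictionaries themselves.
pairs = [(book['year'], index, book) for index, book in enumerate(dated)]

# Four-digit year strings sort correctly as strings.
newestFirst = [book for (year, index, book) in sorted(pairs, reverse=True)]

# Books without a year go at the bottom.
result = newestFirst + undated
print([book['year'] for book in result])
```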

Subtask D: Updating the HTML template

Now that you have the dictionary of data, go back to the gBooksTemplate.html page and make the needed modifications by adding Python variables to the HTML code. You'll have a for loop that iterates over the list of book dictionaries. Write variables that refer to the keys of the book dictionaries to display the information in the desired slots. The helper function fillHTMLTemplate accepts two arguments: the filename of the template file and the dictionary with the data for the template. The search term needs to be added to this dictionary, because it's not part of the dictionary returned by the function extractBookInfo.

Again, work incrementally. Don't try to add all content at once. First make the book title appear, then the authors, the image, etc. You will notice mistakes you made in Subtask C, such as not storing the authors' names as a string (by default they are a list), and you'll need to go back and fix the code in extractBookInfo. By focusing on one slot of info at a time, you'll make sure to address each requirement correctly. Periodically look at the provided HTML page to make sure that your result is getting close to it.
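
For instance, the authors fix mentioned above might look like the following sketch. The helper name and the fallback argument are hypothetical; use whatever placeholder text the task requires for books without authors.

```python
def authorsAsString(authors, fallback):
    '''Turn a list of author names into one comma-separated string.
    Returns fallback when the list is missing (None) or empty.'''
    if not authors:
        return fallback
    return ', '.join(authors)

print(authorsAsString(['Ann Lee', 'Bo Chen'], 'n/a'))
print(authorsAsString(None, 'n/a'))
```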

The fillHTMLTemplate function returns the string of the HTML page, which is ready to be stored into a file, in Subtask F.

Subtask F: Write file and open page

In the main function, call the helper function that writes the HTML page to a file, as well as the one that opens the page in the browser. For query terms with more than one word, make sure the file name contains no spaces.

For the sample search terms universe, feminism, and operating system, your program should generate the files books4universe.html, books4feminism.html, and books4operatingsystem.html, which should be as close as possible to the .html files in the sample_output directory (but see the Notes below for exceptions).
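
A minimal sketch of building such file names, assuming a hypothetical helper called outputFilename (not part of the provided code):

```python
def outputFilename(searchTerm, extension):
    '''Build a name like books4operatingsystem.html from a search term
    by dropping all whitespace between the words.'''
    return 'books4' + ''.join(searchTerm.split()) + '.' + extension

print(outputFilename('operating system', 'html'))
print(outputFilename('universe', 'json'))
```

Calling split with no arguments also takes care of repeated or leading spaces in the search term.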

Notes

You only need to worry about generating the correct HTML for each page. The pages are styled by a file style.css that you have been given. You should not edit style.css!
The Google Books API is constantly being updated. Occasionally, the results it returns will differ slightly or considerably from the examples we have provided. The total number of books is especially susceptible to this change. Do not worry! As long as your page correctly displays the received information, other details don't matter. We'll be testing your code with new queries (not the ones provided here), and at the time of the test your page should match our page, which will be generated at the same time.
When you complete this task, the ps09 folder should contain these files related to this task: googlebooksSearch.py, gBooksTemplate.html, style.css, the three JSON files for the three search queries, and the three HTML files with the completed pages.

Task 3: Honor Code Form and Final Checks

As in the previous problem sets, your honor code submission for this pset will involve entering values for the variables in the honorcode.py file. This is a Python file, so your values must be valid Python code (strings or numbers).

If you wrote any function invocations or print statements in your Python files to test your code, please remove them, comment them out before you submit, or wrap them in an if __name__=='__main__' block. Points will be deducted for isolated function invocations or superfluous print statements.

This pset does not have any Otter Inspector tests.

How to turn in this Problem Set

Save your final debugging.py. Each team member should also save their solution for googlebooksSearch.py, gBooksTemplate.html, the three JSON files for the three search queries, and the three HTML files with the completed pages.
Save your filled-out honorcode.py file in ps09 folder as well.
Note: It is critical that the name of the folder you submit is ps09, and your submitted files are debugging.py, googlebooksSearch.py, gBooksTemplate.html, and honorcode.py. In other words, do not rename the folder that you downloaded, and do not delete or re-name any of the existing files in this folder or the files that should be created by your code in Task 2. Improperly named files or functions will incur penalties.
Drop your entire ps09 folder in your drop folder on the cs server using Cyberduck by 11:59pm on Tuesday, Apr 18, 2017.
Failure to submit your code before the deadline will result in zero credit for the code portion of PS09.