Problem Set 9 - Due Tue, Apr 18 at 23:59 EST
Reading
- Lecture slides and notebook: Lec 17 Web APIs - Part 1
- Lecture slides and notebook: Lec 18 Web APIs - Part 2
- HTML/CSS Lab notes (PS08 Task 1)
- Lab 11 Web APIs notes
About this Problem Set
This problem set is intended to give you practice working with Web APIs and generating HTML pages from them, as well as to give you some debugging practice.
- In Task 1 (Individual Task) you will develop counterexamples that show why buggy solutions to an Exam 1 problem do not work.
- In Task 2 (Partner Task) you will write several functions that will allow you to complete the Google Books API application explained below. Use this shared Google Doc to find a pair programming partner. Remember that you can work with the same partner a maximum of TWICE this semester.
Carefully study the lecture and lab materials, which show how to do everything you need to do for the Web API task.
The CS111 Problem Set Guide gives an overview of psets, including a detailed description of individual and partner tasks.
In Fall 2016, for Task 2, students spent an average of 4.83 hours (min = 1 hour, max = 10 hours). Task 1 was not part of the assignment in Fall 2016, so we don't have data for it.
All code for this assignment is available in the ps09 folder in the cs111/download directory within your cs server account.
Task 1: Bug Hunting
Background: The correct function
On Problem 1 of Exam 1 this semester, you were asked to write a function that takes two arguments (a pivot string and a list of strings wordlist) and returns a list of all the strings of wordlist that precede pivot in dictionary order. The words in the returned list should have the same relative order that they did in the original list.
Here is a function named correct that correctly solves this problem using the filtering pattern studied in class.
def correct(pivot, wordlist):
    '''Correct solution to Exam 1, Problem 1: a function that takes a pivot
    string and a list of strings wordlist and returns a list that includes
    only those elements of wordlist that precede pivot in dictionary order.
    '''
    result = []
    for word in wordlist:
        if word < pivot: # If word precedes pivot in dictionary order
            result.append(word)
    return result
Note that word < pivot is a simple way in Python to test if the string word comes before pivot in dictionary order.
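To make "dictionary order" concrete, the following sketch checks a few pairs of strings with <. It relies only on built-in Python behavior. Strings are compared character by character using character codes, and the first differing character decides the order. A proper prefix comes before any longer string that extends it. One caveat never matters for the all-lowercase exam words: every uppercase letter sorts before every lowercase letter. The name orderedPairs is just for illustration.

```python
# Each pair (a, b) below satisfies a < b under Python's string ordering.
orderedPairs = [
    ('at', 'do'),        # first letters differ: 'a' < 'd' decides
    ('az', 'b'),         # 'a' < 'b' decides; the trailing 'z' never matters
    ('do', 'dog'),       # a proper prefix comes before its extension
    ('doggy', 'dogs'),   # equal up to index 3, then 'g' < 's'
    ('Zebra', 'apple'),  # uppercase codes (65-90) are below lowercase (97-122)
]
for a, b in orderedPairs:
    assert a < b, (a, b)

# A string never precedes itself, so correct() leaves out copies of pivot.
assert not ('dog' < 'dog')
```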
Here's a sample use of the correct function:
In [1]: correct('cat', ['bunny', 'elephant', 'dog', 'bat', 'ant', 'dog', 'bat'])
Out[1]: ['bunny', 'bat', 'ant', 'bat']
Note that elements in the output list have the same relative order they did in the input list. Also note that if a string preceding the pivot (such as 'bat') occurs multiple times in the input list, it should occur multiple times in the output list.
Background: counterexamples
In this task you will study ten incorrect programs for this problem and determine why they don't work. Incorrect programs are said to be buggy because they contain bugs: flaws that make them behave incorrectly. So your goal in this task is to track down the bugs in each of the ten programs.
One way to show that a program is buggy is to provide a counterexample, which is a particular set of inputs for which the program does not behave correctly. For a counterexample, the buggy program might return a wrong answer, or it might raise an exception when the correct program does not.
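The definition of a counterexample can be turned into a small checking helper. This is only a sketch. The names outcome and isCounterexample are hypothetical and are not functions from ps09/debugging.py. Catching Exception broadly is a simplification so that a crash and a wrong answer can be compared in the same way.

```python
def correct(pivot, wordlist):
    result = []
    for word in wordlist:
        if word < pivot:
            result.append(word)
    return result

def buggy0a(pivot, wordlist):
    result = []
    for word in wordlist:
        if word > pivot:      # wrong comparison: keeps words *after* pivot
            result.append(word)
    return result

def outcome(f, pivot, wordlist):
    """Run f and describe what happened: ('value', result) if it returned
    normally, or ('exception', name of the exception class) if it raised."""
    try:
        return ('value', f(pivot, wordlist))
    except Exception as e:
        return ('exception', type(e).__name__)

def isCounterexample(buggy, pivot, wordlist):
    """True if buggy behaves differently from correct on these inputs:
    a different return value, an exception where correct has none,
    or a different exception."""
    return outcome(correct, pivot, wordlist) != outcome(buggy, pivot, wordlist)
```

For example, isCounterexample(buggy0a, 'c', ['a']) is True, while isCounterexample(buggy0a, 'c', []) is False.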
First buggy function example: buggy0a
Here is a function buggy0a that is an incorrect version of the correct function:
def buggy0a(pivot, wordlist):
    result = []
    for word in wordlist:
        if word > pivot:
            result.append(word)
    return result
buggy0a illustrates a common bug for this function on the exam: it performs the wrong comparison between word and pivot, and so returns all the words in wordlist that come after the pivot, not before the pivot.
In the case of buggy0a, counterexamples are very easy to find. Any pivot paired with a wordlist that contains at least one string different from the pivot is a counterexample. Every such string either precedes the pivot (so correct keeps it and buggy0a drops it) or follows it (so buggy0a keeps it and correct drops it). For instance, ('c', ['b', 'd', 'a', 'c', 'e']) is a counterexample, because correct and buggy0a return different results for these inputs:
In [2]: correct('c', ['b', 'd', 'a', 'c', 'e'])
Out[2]: ['b', 'a']
In [3]: buggy0a('c', ['b', 'd', 'a', 'c', 'e'])
Out[3]: ['d', 'e'] # Different answer from the correct one
However, ('c', []) and ('c', ['c']) are not counterexamples, because both correct and buggy0a return the same results for these:
In [4]: correct('c', [])
Out[4]: []
In [5]: buggy0a('c', [])
Out[5]: [] # The same answer as correct
In [6]: correct('c', ['c'])
Out[6]: []
In [7]: buggy0a('c', ['c'])
Out[7]: [] # The same answer as correct
For the exam problem, we will say that a counterexample is minimal if it uses the shortest wordlist that a counterexample can have. For buggy0a, the shortest wordlist is one that contains a single string that is not the pivot. Here are two examples of minimal counterexamples for buggy0a:
In [8]: correct('c', ['a']) # Can use any string before 'c' in dictionary order.
Out[8]: ['a']
In [9]: buggy0a('c', ['a'])
Out[9]: [] # A different answer from correct
In [10]: correct('c', ['d']) # Can use any string after 'c' in dictionary order.
Out[10]: []
In [11]: buggy0a('c', ['d'])
Out[11]: ['d'] # A different answer from correct
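Tracing by hand and experimenting in Canopy remain the point of this task. Still, a small brute-force search can confirm a guess about minimality. The sketch below is hypothetical: smallStrings and findMinimalCounterexample are not part of ps09. It tries wordlists of length 0, then 1, then 2, built only from nonempty strings over a tiny alphabet. So a result of None proves nothing about correctness.

```python
import itertools

def correct(pivot, wordlist):
    # Same behavior as the correct function above, written as a comprehension.
    return [word for word in wordlist if word < pivot]

def buggy0a(pivot, wordlist):
    return [word for word in wordlist if word > pivot]

def outcome(f, pivot, wordlist):
    """('value', result) on a normal return, ('exception', name) otherwise."""
    try:
        return ('value', f(pivot, wordlist))
    except Exception as e:
        return ('exception', type(e).__name__)

def smallStrings(alphabet, maxLen):
    """All nonempty strings over alphabet with length at most maxLen."""
    strings = []
    for n in range(1, maxLen + 1):
        for letters in itertools.product(alphabet, repeat=n):
            strings.append(''.join(letters))
    return strings

def findMinimalCounterexample(buggy, alphabet='abc', maxLen=2, maxListLen=2):
    """Try wordlists of length 0, 1, 2, ... and return the first
    (pivot, wordlist) pair on which buggy and correct disagree."""
    words = smallStrings(alphabet, maxLen)
    for listLen in range(maxListLen + 1):      # shortest wordlists first
        for pivot in words:
            for combo in itertools.product(words, repeat=listLen):
                wordlist = list(combo)
                if outcome(correct, pivot, wordlist) != outcome(buggy, pivot, wordlist):
                    return (pivot, wordlist)
    return None   # nothing found in this (limited) search space
```

For buggy0a this search returns ('a', ['b']), a one-word wordlist, which matches the claim above that a single string different from the pivot suffices.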
Second buggy function example: buggy0b
As a more complex buggy function, consider buggy0b:
def buggy0b(pivot, wordlist):
    result = []
    for word in wordlist:
        minlen = min(len(word), len(pivot))
        i = 0
        while i < minlen and word[i] <= pivot[i]:
            i += 1
        if i == minlen and (word[:minlen] != pivot[:minlen] or (len(word) < len(pivot))):
            result.append(word)
    return result
This function is rather complex, so to think about it, let's start by assuming that word and pivot have the same length and ignoring the condition (word[:minlen] != pivot[:minlen] or (len(word) < len(pivot))) at the end.
What does the while loop do? It counts (in the variable i) the number of consecutive times that an index in word has a letter that is less than or equal to the letter at the corresponding index in pivot. The counting stops when either i reaches minlen or word[i] is strictly greater than pivot[i].
Only if every letter in the first minlen letters of word is less than or equal to the corresponding letter in pivot will word be added to the result list. For example, if pivot is 'do', then the word 'an' satisfies this condition, because 'a' <= 'd' and 'n' <= 'o'. But the word 'at' does not satisfy this condition, since 't' is not less than or equal to 'o'.
This is problematic, since 'at' comes before 'do' in alphabetical order, but buggy0b will not include it in the result list:
In [12]: correct('do', ['an', 'at', 'do', 'ex'])
Out[12]: ['an', 'at']
In [13]: buggy0b('do', ['an', 'at', 'do', 'ex'])
Out[13]: ['an']
A minimal counterexample for buggy0b involves a list with one string that is included by correct but not by buggy0b:
In [14]: correct('do', ['at'])
Out[14]: ['at']
In [15]: buggy0b('do', ['at'])
Out[15]: []
Now let's consider some of the complexities of buggy0b that we ignored at first.
- What is the purpose of minlen? The while loop includes the test word[i] <= pivot[i]. If one of word or pivot is longer than the other (e.g., pivot is 'dog' and word is 'an'), it's important to ensure that an index that is only legal in the longer string is never used in the shorter one. For example, 'dog'[2] makes sense, but 'an'[2] would raise an index-out-of-bounds exception (an IndexError). Because the shorter of the two words controls the limit of i, an index-out-of-bounds exception cannot occur.
- What is the purpose of the condition (word[:minlen] != pivot[:minlen] or (len(word) < len(pivot)))? In some solutions it is necessary to specially handle cases where word is a prefix of pivot or vice versa. A string p is said to be a prefix of string s if there is some (possibly empty) string q such that p + q == s. For example, if pivot is 'dog', then the words 'd', 'do' and 'dog' are prefixes of the pivot, and pivot is a prefix of the words 'dog', 'dogs', and 'doggy'.
  Which words in the word list ['d', 'do', 'dog', 'dogs', 'doggy'] should be in the correct result for the pivot 'dog'? Only 'd' and 'do' come before 'dog' in dictionary order, so only these two words should be in the result. But all five words will satisfy the condition i == minlen in buggy0b. The reason is that only the characters of word[:minlen] and pivot[:minlen] are compared, and these are guaranteed to be equal when word and pivot are in a prefix relationship.
  What can be done to keep only the two shorter prefixes? The condition word[:minlen] != pivot[:minlen] is true exactly when neither of word and pivot is a prefix of the other. So words not in a prefix relationship with pivot (that also satisfy i == minlen) can be included in the result list. But if word and pivot are in a prefix relationship, only words that are shorter than pivot should be included, which is the rationale behind the condition len(word) < len(pivot). This means that buggy0b behaves correctly on words in a prefix relationship with pivot:
  In [16]: correct('dog', ['d', 'do', 'dog', 'dogs', 'doggy'])
  Out[16]: ['d', 'do']
  In [17]: buggy0b('dog', ['d', 'do', 'dog', 'dogs', 'doggy'])
  Out[17]: ['d', 'do']
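The prefix definition can be checked directly with a short sketch. It assumes a hypothetical helper isPrefix that transcribes the definition p + q == s. The sketch confirms that all five words relate to 'dog' by prefix, yet only the shorter prefixes precede it.

```python
def isPrefix(p, s):
    """True if some (possibly empty) string q satisfies p + q == s."""
    return s[:len(p)] == p          # equivalent to s.startswith(p)

pivot = 'dog'
words = ['d', 'do', 'dog', 'dogs', 'doggy']

# Every word here is in a prefix relationship with pivot (one way or the other) ...
related = [w for w in words if isPrefix(w, pivot) or isPrefix(pivot, w)]

# ... yet only the words that are *shorter* prefixes of pivot precede it.
shorterPrefixes = [w for w in words if isPrefix(w, pivot) and len(w) < len(pivot)]
before = [w for w in words if w < pivot]
```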
So prefixes are not helpful in designing counterexamples for buggy0b. But they are helpful in some of the other buggy functions below.
Finally, when debugging a complex function like buggy0b, it is often very helpful to add print statements to a function to better understand how it works. For example, here is a version of buggy0b augmented with numerous print statements:
def buggy0bWithPrints(pivot, wordlist):
    result = []
    for word in wordlist:
        minlen = min(len(word), len(pivot))
        print '-'*50
        print 'word ->', word, '; pivot ->', pivot, '; minlen ->', minlen
        i = 0
        if i < minlen: # while loop will only execute when i < minlen
            print 'While loop: i ->', i, '; word[i] ->', word[i], '; pivot[i] ->', pivot[i],\
                  '; word[i] <= pivot[i] ->', word[i] <= pivot[i]
        while i < minlen and word[i] <= pivot[i]:
            i += 1
            if i < minlen: # while loop will only execute when i < minlen
                print 'While loop: i ->', i, '; word[i] ->', word[i], '; pivot[i] ->', pivot[i],\
                      '; word[i] <= pivot[i] ->', word[i] <= pivot[i]
        print 'After while loop: i ->', i, '; i == minlen ->', i == minlen, ';'
        print ' word[:minlen] != pivot[:minlen] ->', word[:minlen] != pivot[:minlen], ';'
        print ' (len(word) < len(pivot)) ->', (len(word) < len(pivot))
        if i == minlen and (word[:minlen] != pivot[:minlen] or (len(word) < len(pivot))):
            print 'result.append ->', word
            result.append(word)
    return result
Running buggy0bWithPrints on an example provides a lot of insight into how it works:
In [18]: buggy0bWithPrints('do', ['a', 'an', 'ash', 'd', 'do', 'dog', 'egg'])
--------------------------------------------------
word -> a ; pivot -> do ; minlen -> 1
While loop: i -> 0 ; word[i] -> a ; pivot[i] -> d ; word[i] <= pivot[i] -> True
After while loop: i -> 1 ; i == minlen -> True ;
word[:minlen] != pivot[:minlen] -> True ;
(len(word) < len(pivot)) -> True
result.append -> a
--------------------------------------------------
word -> an ; pivot -> do ; minlen -> 2
While loop: i -> 0 ; word[i] -> a ; pivot[i] -> d ; word[i] <= pivot[i] -> True
While loop: i -> 1 ; word[i] -> n ; pivot[i] -> o ; word[i] <= pivot[i] -> True
After while loop: i -> 2 ; i == minlen -> True ;
word[:minlen] != pivot[:minlen] -> True ;
(len(word) < len(pivot)) -> False
result.append -> an
--------------------------------------------------
word -> ash ; pivot -> do ; minlen -> 2
While loop: i -> 0 ; word[i] -> a ; pivot[i] -> d ; word[i] <= pivot[i] -> True
While loop: i -> 1 ; word[i] -> s ; pivot[i] -> o ; word[i] <= pivot[i] -> False
After while loop: i -> 1 ; i == minlen -> False ;
word[:minlen] != pivot[:minlen] -> True ;
(len(word) < len(pivot)) -> False
--------------------------------------------------
word -> d ; pivot -> do ; minlen -> 1
While loop: i -> 0 ; word[i] -> d ; pivot[i] -> d ; word[i] <= pivot[i] -> True
After while loop: i -> 1 ; i == minlen -> True ;
word[:minlen] != pivot[:minlen] -> False ;
(len(word) < len(pivot)) -> True
result.append -> d
--------------------------------------------------
word -> do ; pivot -> do ; minlen -> 2
While loop: i -> 0 ; word[i] -> d ; pivot[i] -> d ; word[i] <= pivot[i] -> True
While loop: i -> 1 ; word[i] -> o ; pivot[i] -> o ; word[i] <= pivot[i] -> True
After while loop: i -> 2 ; i == minlen -> True ;
word[:minlen] != pivot[:minlen] -> False ;
(len(word) < len(pivot)) -> False
--------------------------------------------------
word -> dog ; pivot -> do ; minlen -> 2
While loop: i -> 0 ; word[i] -> d ; pivot[i] -> d ; word[i] <= pivot[i] -> True
While loop: i -> 1 ; word[i] -> o ; pivot[i] -> o ; word[i] <= pivot[i] -> True
After while loop: i -> 2 ; i == minlen -> True ;
word[:minlen] != pivot[:minlen] -> False ;
(len(word) < len(pivot)) -> False
--------------------------------------------------
word -> egg ; pivot -> do ; minlen -> 2
While loop: i -> 0 ; word[i] -> e ; pivot[i] -> d ; word[i] <= pivot[i] -> False
After while loop: i -> 0 ; i == minlen -> False ;
word[:minlen] != pivot[:minlen] -> True ;
(len(word) < len(pivot)) -> False
Out[18]: ['a', 'an', 'd']
Your Task
The file ps09/debugging.py contains 10 buggy implementations of the correct function, which are shown below. For each function buggyi, you should do two things:
- Define the variable counterExamplei to be a pair (2-tuple) of (1) a pivot string and (2) a wordlist. The pair counterExamplei should be a minimal counterexample that indicates why buggyi is incorrect. For example:
    # Minimal counterexample for buggy0a
    counterExample0a = ('c', ['a'])
    # Minimal counterexample for buggy0b
    counterExample0b = ('do', ['at'])
- Define the variable explanationi to be a string that briefly explains why the function is buggy. For example:
    # An explanation of why buggy0a is buggy
    explanation0a = """
    Returns all words *after* pivot rather than *before* pivot.
    """
    # An explanation of why buggy0b is buggy
    explanation0b = """
    Requires *each* index shared by word and pivot to contain a letter in word
    that is less than or equal to the letter at the corresponding index of pivot.
    But words that should correctly precede pivot in dictionary order and are
    not prefixes of pivot only need to have *one* index at which the letter in
    word is less than the letter in pivot (following a sequence of indices at
    which the letters are equal). So the result list excludes some words it
    should include.
    """
Explanation strings should be delimited by three double-quotes, which allows them to contain multiple lines.
Notes:
- How do you find counterexamples for a buggy function? Start by understanding in detail how the function works on some simple inputs. To do this:
    - use the code execution models we've used in class (especially iteration tables);
    - add print statements to the buggy functions to understand better exactly how they work. (See buggy0bWithPrints above for an example of this.)
  In this particular problem you need to focus on how a word from the word list is processed relative to the pivot. To keep things simple, start with a small pivot (a two-character string, say) and then consider small words (one to three characters long) that come before and after the pivot in dictionary order.
  Write down on paper details of the comparison between the pivot and the sample words to understand what each function is doing, and use this knowledge to find pairs of pivots and words where the function misbehaves.
- There are no Otter Inspect test cases for this problem. We want you to invoke the correct and buggy functions on counterexamples within Canopy to get a more visceral experience of the debugging process.
- In addition to the correct and buggy functions returning different results, a counterexample can also cause one function (but not the other) to raise an exception, or cause the correct and buggy functions to raise different exceptions.
- You can't directly call the correct or buggyi functions on counterExamplei, because they expect two separate arguments. But a tuple is a single argument, so this leads to a type error:
    In [19]: correct(counterExample0a)
    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    <ipython-input-3-9511ff6b0e29> in <module>()
    ----> 1 correct(counterExample0a)
  However, Python has a special tuple-unpacking notation that can be used to avoid the type error in this context. Putting a * before a variable that denotes a tuple in the argument position of a function tells Python to unpack the components of the tuple, and treat them as separate arguments to the function.
    In [20]: correct(*counterExample0a)
    Out[20]: ['a']
    In [21]: buggy0a(*counterExample0a)
    Out[21]: []
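The tuple-unpacking behavior described in this note can be reproduced in a self-contained sketch. The definitions of correct, buggy0a, and counterExample0a are copied from the examples above. The variable names correctResult, buggyResult, and raisedTypeError are only for illustration.

```python
def correct(pivot, wordlist):
    result = []
    for word in wordlist:
        if word < pivot:
            result.append(word)
    return result

def buggy0a(pivot, wordlist):
    result = []
    for word in wordlist:
        if word > pivot:
            result.append(word)
    return result

counterExample0a = ('c', ['a'])

# Passing the tuple itself gives correct only one argument -> TypeError.
try:
    correct(counterExample0a)
    raisedTypeError = False
except TypeError:
    raisedTypeError = True

# *counterExample0a spreads the 2-tuple into two positional arguments,
# so these calls are the same as correct('c', ['a']) and buggy0a('c', ['a']).
correctResult = correct(*counterExample0a)
buggyResult = buggy0a(*counterExample0a)

# Tuple assignment is an equivalent, more explicit way to unpack.
pivot, wordlist = counterExample0a
```

The explicit form pivot, wordlist = counterExample0a can be handy when you also want to print the two parts separately while debugging.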
The 10 Buggy Functions
The ten buggy functions can be found in the file ps09/debugging.py, which also contains definitions for the correct, buggy0a, and buggy0b functions.
Below each buggyi function is a counterExamplei variable that you should define as a pair (2-tuple) of (1) a pivot string and (2) a minimal-length wordlist that serves as a counterexample relative to the pivot.
Below each buggyi function is also an explanationi variable that you should define as a string that explains why buggyi is buggy.
Most of these buggy programs are based on incorrect programs that students wrote for Problem 1 on Exam 1. They have been cleaned up in various ways, but they illustrate reasoning issues that students actually exhibit in practice. Most commonly, students did not realize that < can be used in Python to compare two strings in dictionary order, and came up with complex solutions involving loops that tested the strings letter-by-letter. It is possible to define a correct loop involving letter-by-letter comparisons, but it is rather challenging to do so, as several of the following functions indicate.
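For comparison with the buggy loops, here is one way a correct letter-by-letter test could be written. This is a sketch, not the course's reference solution, and precedesByLetters and correctByLetters are hypothetical names. It relies on two rules. First, the first differing letter decides the order. Second, if no letter differs, the shorter string (a proper prefix) comes first.

```python
def precedesByLetters(word, pivot):
    """Letter-by-letter version of word < pivot (dictionary order)."""
    minlen = min(len(word), len(pivot))
    for i in range(minlen):           # only indices valid in both strings
        if word[i] != pivot[i]:
            # The first position where they differ decides the order;
            # later letters do not matter at all.
            return word[i] < pivot[i]
    # All shared positions match, so one string is a prefix of the other:
    # word precedes pivot only if it is strictly shorter.
    return len(word) < len(pivot)

def correctByLetters(pivot, wordlist):
    result = []
    for word in wordlist:
        if precedesByLetters(word, pivot):
            result.append(word)
    return result
```

Comparing such a function against < on every pair drawn from a list of short strings (including the empty string) is a cheap way to gain confidence in a hand-written loop.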
# ------------------------------------------------------------------------------
def buggy1(pivot, wordlist):
    result = []
    for word in wordlist:
        if word <= pivot:
            result.append(word)
    return result
# ------------------------------------------------------------------------------
def buggy2(pivot, wordlist):
    result = []
    for word in wordlist:
        if word[0] < pivot[0]:
            result.append(word)
    return result
# ------------------------------------------------------------------------------
def buggy3(pivot, wordlist):
    result = []
    for word in wordlist:
        if word < pivot:
            result.append(word)
        return result
# ------------------------------------------------------------------------------
def buggy4(pivot, wordlist):
    result = []
    for word in wordlist:
        if word < pivot and word not in result:
            result.append(word)
    return result
# ------------------------------------------------------------------------------
def buggy5(pivot, wordlist):
    result = []
    for word in wordlist:
        minlen = min(len(word), len(pivot))
        for i in range(minlen):
            if word[i] < pivot[i]:
                result.append(word)
            elif word[i] > pivot[i]:
                break # exit inner for loop if word comes after pivot
        if len(word) < len(pivot) and word == pivot[:minlen]:
            result.append(word) # Handle prefixes of pivot specially
    return result
# ------------------------------------------------------------------------------
def buggy6(pivot, wordlist):
    result = []
    for word in wordlist:
        minlen = min(len(word), len(pivot))
        allLess = True
        for i in range(minlen):
            if word[i] >= pivot[i]:
                allLess = False
        if allLess:
            result.append(word)
    return result
# ------------------------------------------------------------------------------
def buggy7(pivot, wordlist):
    result = []
    for word in wordlist:
        minlen = min(len(word), len(pivot))
        i = 0
        while i < minlen and word[i] == pivot[i]:
            i += 1
        # After the while loop, either (1) i is equal to minlen or
        # (2) i is the first index at which word and pivot differ
        if (i == minlen or word[i] < pivot[i]) and pivot != word:
            result.append(word)
    return result
# ------------------------------------------------------------------------------
def buggy8(pivot, wordlist):
    result = []
    for word in wordlist:
        minlen = min(len(word), len(pivot))
        i = 0
        while i < minlen and word[i] == pivot[i]:
            i += 1
        # After the while loop, either (1) i is equal to minlen or
        # (2) i is the first index at which word and pivot differ
        if i < minlen and word[i] < pivot[i]:
            result.append(word)
    return result
# ------------------------------------------------------------------------------
def buggy9(pivot, wordlist):
    result = []
    for word in wordlist:
        minlen = min(len(word), len(pivot))
        for i in range(minlen):
            if word[i] < pivot[i]:
                result.append(word)
                break # Exit inner for loop
        if len(word) < len(pivot) and word == pivot[:minlen]:
            result.append(word) # Handle prefixes of pivot specially
    return result
# ------------------------------------------------------------------------------
def buggy10(pivot, wordlist):
    result = []
    for word in wordlist:
        for i in range(len(word)):
            if word[i] < pivot[i]:
                result.append(word)
                break # Exit the inner for loop
            elif word[i] > pivot[i]:
                break # Exit the inner for loop
            # Otherwise word[i] == pivot[i], and we continue loop
    return result
Task 2: Google Book Search
In this problem you will use the Google Books web API to create a program that finds and displays books by searching for a string that appears in the title. For example, here is the URL for querying the Google Books web API for books whose title contains the string universe:
https://www.googleapis.com/books/v1/volumes?q=intitle:universe&langRestrict=en&maxResults=40
Based on the results returned from this query, your program should display a web page for all the books that have book cover images. Examples of how your web pages might look for a few different search terms are provided in the sample_output folder.
Your task is to write a program that makes use of the Google Books API and does the following:
- prompts the user (using raw_input) for any search term such as the ones above;
- sends a request to the Google Books API to get results for the query;
- processes the JSON results to extract the desired information;
- and generates an HTML page as in the examples above, displaying formatted information for each book whose title contains the search term, as determined by the Google Books API.
As you can imagine, this is a complex task, because there are so many different pieces that
need to be coordinated.
We have broken down the task into subtasks to guide your problem solving. Additionally, the ps09 folder contains a file named googlebooksSearch.py with some helper functions defined for you.
Flesh out the remaining functions following their contracts. We recommend that you complete each subtask before moving to the next one. Read all the subtasks to get a general idea of what you have to do (and manage your time), and then focus on each subtask one by one. This is also a good way to pace yourselves. You'll need 2-3 sessions of work to complete the entire task, so plan accordingly.
Subtask A: Starting the HTML Template
In the last.fm example in lecture and the lab tasks, we always provided you with the HTML template (the file that contains HTML code mixed with Python code). In this pset, we expect you to write this HTML template yourself. You should write the template in the provided file named gbooksTemplate.html, which is initially empty.
Start by drawing on paper the structure of the web page, using boxes for the different fields of text. An example of such a drawing is slide 18-11 in the Web API II lecture. Open the universe page and draw the different elements. Notice that the top of the page has three pieces of information that differ from page to page:
[Figure: header of a sample results page, showing the search term, the total number of matching books, and the number of books displayed (header.png)]
Then, each book entry has the same structure: there are seven fields that need to be filled with data from the API call results. Draw boxes and give them names based on their meaning: book title, year published, etc.
Open the source code of the HTML pages in the sample_output folder (either use View Source in the browser, or open them with an editor). Start copying HTML code from one of these pages and add it to gbooksTemplate.html. Because you'll generate the whole page automatically with code, you only need some of the HTML tags: enough to recreate one single book entry.
Your newly created HTML file (based on sample outputs) will not contain any Python code yet, but you should insert placeholders for the different slots where Python variables will be added later. For example, after you modify the HTML for the first book entry using a sample picture onebook.png, you might get something that looks like this:
[Figure: the first book entry with placeholder text in each slot (onebook.png)]
At that point you have the HTML part in place. Once you create the dictionary containing the data for the page, you can come back to this file and incrementally modify it to insert data from your Python dictionary with the jinja2 syntax. In the example above, you will later want to replace the hardcoded example image path by inserting a URL value with jinja.
Subtask B: Sending the API Request
The Google Books API takes the following parameters:
- q: The query. To search for a word in the title, the value of the query should be intitle:searchterm (replace searchterm as needed; the search term can have spaces.)
- langRestrict: Restrict results to a certain language. To restrict to English, use 'en' as the value of this parameter.
- maxResults: The maximum number of results that should be returned, as an integer. 40 is the maximum that can be returned by the API. You should hardwire this parameter to 40 in all of your searches.
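As one possible way to see how the three parameters map onto the query string, the sketch below builds the URL with urlencode from Python's standard library. In Python 2 this is urllib.urlencode; in Python 3 it is urllib.parse.urlencode. buildBooksURL is a hypothetical helper, not one of the ps09 functions. The sketch does not send a request, and your requestBooks may well take a different route (for example, handing a parameter dictionary to whatever request function the lecture used). Note that urlencode percent-encodes the ":" as %3A and a space as +. A server should decode both back to the original characters.

```python
try:
    from urllib.parse import urlencode   # Python 3
except ImportError:
    from urllib import urlencode         # Python 2

BASE_URL = 'https://www.googleapis.com/books/v1/volumes'

def buildBooksURL(searchTerm):
    """Hypothetical helper: assemble the query URL from the three parameters.
    It only builds the string; it does not send any request."""
    params = [
        ('q', 'intitle:' + searchTerm),   # search only within book titles
        ('langRestrict', 'en'),           # English-language books only
        ('maxResults', 40),               # largest value the API allows
    ]
    # A list of pairs (rather than a dict) keeps the parameters in this order.
    return BASE_URL + '?' + urlencode(params)
```

For example, buildBooksURL('operating system') produces a q value of intitle%3Aoperating+system.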
Study the structure of an API call on slide 18-21 in the Web API II lecture. Then, break down the elements of the Google Books API URL:
https://www.googleapis.com/books/v1/volumes?q=intitle:universe&langRestrict=en&maxResults=40
into components such as parameter names (listed above) and corresponding values. Be especially careful with the first parameter, q. Once you've done this, you can proceed with the coding tasks:
- Flesh out the definition for requestBooks.
- Write the statements you need in main to be able to call requestBooks (see an example of a main function in slide 18-36.)
- Follow instruction 1 in the main docstring to call the helper function writeJSONforExploration, which will create a JSON file with the results of the API request.
OPTIONAL: To view the content of this file nicely, you can install the JSON Formatter Chrome plugin.
Then, you need to check the box "Allow access to file URLs" in the Extension page for this extension.
Finally, drag the JSON file into your browser. You can click on the triangles to control how much content you see.
Subtask C: Extracting the information for the books
In this subtask, you'll inspect the JSON file to find the key:value pairs that you need to access to get the information, and you'll flesh out the bodies of extractBookInfo and sortByPublishedDate, which together create a dictionary that contains most of the information needed for filling the HTML template.
Formal Requirements for the page
You have already "figured out" these requirements in Subtask A, but here they are for completeness.
The web page you'll generate should have the header Searching Google Books followed by information that includes: (1) the search term (2) the total number of entries in the Google Books database that match the search term (this is part of the result returned from the Google Books API, and can be much bigger than 40), and (3) the number of books displayed on the page (i.e., those returned from the Google Books API request that have cover images), which can be up to 40.
Below this header information, the page should show a list of at most 40 English books. It should only show books for which there are cover images, so the page in practice will often contain fewer than 40 books.
The displayed list should be sorted by publication year. For each book in the list, display the following, in the same HTML format as the samples.
- The title
- The name of the author(s). Use the text "Unknown" if no author information is in the response.
- The image of the book cover. This is the URL corresponding to "thumbnail".
- The publication year. Use "Missing" if the publication date is not available. Books with missing publication years should appear at the bottom of the list.
- The description, up to a maximum of 700 characters. Use "No description available" if unavailable. If the description contains more than 700 characters, display only the first 700 characters followed by '...'.
- The page count of the book. Use "Not available" if that information is unavailable.
- Finally, a link to the Google Books preview page for the book.
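The display rules above can be sketched for a single book. The sketch rests on several assumptions that you should verify in your own JSON file. It assumes each book's data sits in a volumeInfo dictionary with the keys title, authors (a list), imageLinks/thumbnail, publishedDate, description, pageCount, and previewLink. It also assumes the year is the first four characters of publishedDate, and that authors are joined with ", " (the sample pages may use a different separator). summarizeVolume and the output keys are hypothetical; your extractBookInfo contract may differ.

```python
def summarizeVolume(volumeInfo):
    """Hypothetical helper: apply the display rules to one book's
    'volumeInfo' dictionary. Returns None when there is no cover image,
    since such books are not shown on the page."""
    imageLinks = volumeInfo.get('imageLinks', {})
    if 'thumbnail' not in imageLinks:
        return None

    authors = volumeInfo.get('authors')              # a list of names, if present
    publishedDate = volumeInfo.get('publishedDate')  # e.g. '2016' or '2016-05-12'
    description = volumeInfo.get('description', 'No description available')
    if len(description) > 700:
        description = description[:700] + '...'

    return {
        'title': volumeInfo.get('title', ''),
        'authors': ', '.join(authors) if authors else 'Unknown',
        'thumbnail': imageLinks['thumbnail'],
        'year': publishedDate[:4] if publishedDate else 'Missing',
        'description': description,
        'pageCount': volumeInfo.get('pageCount', 'Not available'),
        'previewLink': volumeInfo.get('previewLink', ''),
    }
```

Using dict.get with a default keeps each "if unavailable" rule on a single line, rather than needing a separate if statement per field.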
JSON Response
Open the JSON file you created in Subtask B and study its content. Specifically, given that the response can be converted into a Python dictionary, identify the combination of keys that will allow you to extract the desired info. Consult the Notebook for Lecture 18 to use the "tunnel down" method.
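The "tunnel down" idea can be practiced on a tiny, made-up response before tackling the real file. The key names used here (totalItems, items, volumeInfo, title, authors, imageLinks, thumbnail) are the ones we believe the Google Books volumes response uses. Confirm them against the JSON file you saved before relying on them. The sample data and variable names are illustrative only.

```python
import json

# A made-up, heavily trimmed response; real responses contain many more keys.
jsonResponse = '''
{"totalItems": 2,
 "items": [
   {"volumeInfo": {"title": "First Book",
                   "authors": ["A. Writer"],
                   "imageLinks": {"thumbnail": "http://example.com/first.jpg"}}},
   {"volumeInfo": {"title": "Second Book"}}
 ]}
'''

data = json.loads(jsonResponse)              # JSON text -> Python dict

totalItems = data['totalItems']              # one level down
firstInfo = data['items'][0]['volumeInfo']   # dict -> list -> dict
firstTitle = firstInfo['title']
firstThumbnail = firstInfo['imageLinks']['thumbnail']

# Tunnel down the same path for every book in the list.
titles = [item['volumeInfo']['title'] for item in data['items']]

# Keys can be missing (the second book has no 'imageLinks'), so test first.
hasCover = ['imageLinks' in item['volumeInfo'] for item in data['items']]
```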
Try to implement extractBookInfo incrementally. First create a dictionary that contains only the total number of books, then add to it the list of dictionaries (one dictionary per book) containing the book titles, and so on. Each time, print out the output and check that it contains the information you expect.
Once you have stored the JSON file, you can avoid running the main function and repeating the API call over and over again by using the following strategy:
with open("books4universe.json", 'r') as fIn:
    jsonResponse = fIn.read()
extractBookInfo(jsonResponse)
You can write these lines in the if __name__ == '__main__': block, comment out main, and work on the extraction process until it works. Then, you can go back to main when testing different query words.
Sorting books
The books in the given pages are ordered by publication year in reverse chronological order (newest first). Remember to extract the year from the publication date, so that the rest of the date does not affect the ordering. Because some books will have the date "Missing", you will not be able to perform a simple sort of the books. To understand why, see this example:
In [1]: years = ["2016", "2010", "Missing", "1987", "Missing"]
In [2]: sorted(years)
Out[2]: ['1987', '2010', '2016', 'Missing', 'Missing']
In [3]: sorted(years, reverse=True)
Out[3]: ['Missing', 'Missing', '2016', '2010', '1987']
Notice how the "Missing" years appear at the start of the list rather than at the end, where we want them.
You'll need to deal with this. Also, keep in mind that you'll be sorting a list of dictionaries, and the function sorted cannot order dictionaries by publication year on its own. You'll use a strategy similar to what we have seen in the past; see the simple example below:
In [4]: dictList = [{'a': 3, 'y': 10}, {'a': 5, 'y': 20}, {'a': 1, 'y': 15}]
In [5]: pairs = [(dct['y'], dct) for dct in dictList]
In [6]: sorted(pairs)
Out[6]: [(10, {'a': 3, 'y': 10}), (15, {'a': 1, 'y': 15}), (20, {'a': 5, 'y': 20})]
You should take it from here and think of the next steps for fleshing out the function sortedByPublicationYear, which needs to be called toward the end of the body of extractBookInfo.
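One way to get the "Missing" entries to the end is to sort the known years and the missing ones separately, then concatenate. The dictionary version of this sketch passes key= to sorted, which is a different technique from the (year, dictionary) pairs shown above. It tells sorted what to compare, so the dictionaries themselves are never compared with each other. The books list and its 'title' and 'year' keys are hypothetical. Sorting year strings works here only because every year has exactly four digits.

```python
years = ["2016", "2010", "Missing", "1987", "Missing"]

# Split the list into entries that have a year and entries that do not.
known = [y for y in years if y != "Missing"]
missing = [y for y in years if y == "Missing"]

# Newest first among the known years, then all "Missing" entries at the end.
orderedYears = sorted(known, reverse=True) + missing

# The same idea for (hypothetical) book dictionaries with a 'year' key.
books = [{'title': 'A', 'year': '2010'},
         {'title': 'B', 'year': 'Missing'},
         {'title': 'C', 'year': '2016'}]
withYear = [b for b in books if b['year'] != 'Missing']
withoutYear = [b for b in books if b['year'] == 'Missing']
orderedBooks = sorted(withYear, key=lambda b: b['year'], reverse=True) + withoutYear
```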
Once your dictionary with the values to fill the HTML template is ready, you can move on to make the final changes to the HTML template and generate the page.
Subtask D: Updating the HTML template
Now that you have the dictionary of data, you should go back to the gBooksTemplate.html page and make the needed modifications to the code by adding Python variables into the HTML code.
You'll have a for loop that iterates over the list of book dictionaries. Write variables that
refer to the keys of the book dictionaries to display the information in the desired slots.
The helper function fillHTMLTemplate accepts two arguments: the filename of the template file and the dictionary with the data for the template. The search term needs to be added to this dictionary, because it's not part of the dictionary returned by the function extractBookInfo.
Again, work incrementally. Don't try to add all content at once. First make the book title appear, then the authors, the image, etc. You will notice mistakes you made in Subtask C, such as not storing the authors' names as a string (the API returns them as a list), and you'll need to go back and fix the code in extractBookInfo. By focusing on one slot of information at a time, you'll make sure to address each requirement correctly. Periodically, look at the provided HTML page to make sure that your result is getting really close to it.
The fillHTMLTemplate function returns the string of the HTML page, which is ready to be stored into a file in Subtask F.
Subtask F: Write file and open page
In the main function, call the helper function that writes the HTML page into a file, as well as the one that opens the page in the browser. For query terms with more than one word, make sure the file names contain no spaces.
For the sample search terms universe, feminism, and operating system, your program should generate the files books4universe.html, books4feminism.html, and books4operatingsystem.html. These should be as close as possible to the .html files in the sample_output directory (but see Notes below for exceptions.)
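The naming pattern above (books4 + search term with spaces removed) can be captured in one line. outputFilename is a hypothetical helper. The same pattern also matches the books4universe.json name used in Subtask C.

```python
def outputFilename(searchTerm, extension):
    """Hypothetical helper: build names such as 'books4universe.html'.
    Spaces are removed so that multi-word terms give a file name without spaces."""
    return 'books4' + searchTerm.replace(' ', '') + '.' + extension
```

For example, outputFilename('operating system', 'html') gives 'books4operatingsystem.html'.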
Notes
- You only need to worry about generating the correct HTML for each page. The pages are styled by a file style.css that you have been given. You should not edit style.css!
- The Google Books API is constantly being updated. Occasionally, the results that it returns will differ either slightly or considerably from the examples we have provided. The total number of books is especially susceptible to this change. Do not worry! As long as your page correctly displays the information it receives, other details don't matter. We'll be testing your code with new queries (not the ones provided here). At test time, your page should match our page, which will be generated at the same time.
- When you complete this task, the ps09 folder should contain these files related to this task: googlebooksSearch.py, gBooksTemplate.html, style.css, the three JSON files for the three search queries, and the three HTML files with the completed pages.
Task 3: Honor Code Form and Final Checks
As in the previous problem sets, your honor code submission for this pset will involve entering values for the variables in the honorcode.py file.
This is a Python file, so your values must be valid Python code (strings or numbers).
If you wrote any function invocations or print statements in your Python files to test your code, please remove them, comment them out before you submit, or wrap them in an if __name__=='__main__' block.
Points will be deducted for isolated function invocations or superfluous print statements.
This pset does not have any Otter Inspector tests.
How to turn in this Problem Set
- Save your final debugging.py. Each team member should also save their solution for googlebooksSearch.py, gBooksTemplate.html, the three JSON files for the three search queries, and the three HTML files with the completed pages.
- Save your filled-out honorcode.py file in the ps09 folder as well.
- Note: It is critical that the name of the folder you submit is ps09, and your submitted files are debugging.py, googlebooksSearch.py, gBooksTemplate.html, and honorcode.py. In other words, do not rename the folder that you downloaded, and do not delete or rename any of the existing files in this folder or the files that should be created by your code in Task 2. Improperly named files or functions will incur penalties.
- Drop your entire ps09 folder in your drop folder on the cs server using Cyberduck by 11:59pm on Tuesday, Apr 18, 2017.
- Failure to submit your code before the deadline will result in zero credit for the code portion of PS09.