requests module
The requests module makes it very easy to send an HTTP request to a web server. The result that comes back from the server is stored as a special object, which organizes the response according to the structure specified by the HTTP protocol.
Note: Some Python installations do not include the requests module. If that is the case, you can install it with the following command.
!pip install requests
import requests
url = "http://cs111.wellesley.edu/content/info/simple.html"
httpResponse = requests.get(url) # Send the request
type(httpResponse) # What did we receive?
Exploring the HTTP response
httpResponse # the object
print(dir(httpResponse)) # show all properties or methods of the object
httpResponse.status_code # show status of the HTTP response. For example, 200 means OK and 404 means page not found.
httpResponse.url # show url from which the content was received
httpResponse.headers # show metainformation about the page
There are two properties of the object that are useful to retrieve the content of the response:
httpResponse.content
httpResponse.text
These look the same, but are they the same?
httpResponse.content == httpResponse.text
Let's check the types:
type(httpResponse.content)
type(httpResponse.text)
The first type, bytes, is a Python type that is new to us. It exists because the files we request are often not text files but some other type, such as PNG, JPEG, GIF, or PDF. We can see the difference with the example of a GIF file, which we will download from the URL below.
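Before turning to the GIF, the relationship between the two types can be sketched with a short string. This is a minimal illustration that does not touch the network; the sample word and the choice of UTF-8 are assumptions made for the example.

```python
# A str holds Unicode characters; a bytes object holds raw values from 0 to 255.
word = "café"
raw = word.encode("utf-8")           # str -> bytes

print(type(word), type(raw))
print(len(word), len(raw))           # the accented letter takes two bytes in UTF-8
print(raw == word)                   # a bytes object is never equal to a str
print(raw.decode("utf-8") == word)   # decoding with the same codec restores the text
```

This is also why comparing the content and text properties directly gives False even when they look alike on screen.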
url2 = "http://cs111.wellesley.edu/content/info/wait_32.gif"
httpResponse2 = requests.get(url2)
httpResponse2
httpResponse2.content
We can save these bytes to a file. Notice the mode used to open the file:
with open("ourgif.gif", 'wb') as outputf: # notice 'wb' - write bytes
    outputf.write(httpResponse2.content)
We can then display this file here in our notebook, because it has been saved on our own machine:
from IPython.display import Image
Image(filename='ourgif.gif')
Meanwhile, if we try to do the same with the text property, things look a bit different, because this property has decoded the bytes into Unicode characters instead of leaving them alone as a stream of bytes:
httpResponse2.text
with open("ourgif_text.gif", 'w') as outputf: # 'w' - write text
    outputf.write(httpResponse2.text)
from IPython.display import Image
Image(filename='ourgif_text.gif')
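Why the text version of the file is damaged can be sketched without downloading anything. The byte string below is made up for this illustration (a GIF signature followed by a few arbitrary values), and decoding as UTF-8 with replacement is an assumption; the encoding that requests guesses for a real response may differ.

```python
# The signature of a GIF file followed by a few arbitrary non-ASCII byte values.
original = b"GIF89a\x20\x00\xff\xfe\x80"

# Decoding as text substitutes a replacement character for invalid byte sequences ...
as_text = original.decode("utf-8", errors="replace")
# ... so encoding the text again does not give back the original bytes.
round_trip = as_text.encode("utf-8")

print(original)
print(round_trip)
print(round_trip == original)
```

Once bytes have been replaced during decoding, the information is gone, which is why binary data should be written from the content property in 'wb' mode.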
Summary: If we are retrieving mostly text-based files (such as HTML), it's better to use the property text, but if we don't know what our data are going to be, it's better to use the property content.
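One possible way to decide between the two properties is to look at the Content-Type header of the response. The helper below is a hypothetical sketch, not part of requests, and the list of media types it treats as text is an assumption.

```python
def looks_like_text(content_type):
    """Guess from a Content-Type header value whether the body is text.

    Hypothetical helper written for this illustration.
    """
    # Drop parameters such as "; charset=utf-8" and normalize the case.
    media_type = content_type.split(";")[0].strip().lower()
    return media_type.startswith("text/") or media_type in (
        "application/json",
        "application/xml",
    )

print(looks_like_text("text/html; charset=UTF-8"))
print(looks_like_text("image/gif"))
```

With a real response, the header value could be read with httpResponse.headers.get("Content-Type", "") and passed to this function.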
lyrics.ovh is a website that provides an API to access song lyrics saved in their database.
The URL for retrieving a song has this structure:
https://api.lyrics.ovh/v1/artist/title
where we need to replace artist with an artist name and title with a song title. Click on the link below to see an example.
https://api.lyrics.ovh/v1/coldplay/yellow
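Artist names and song titles often contain spaces or apostrophes, which are not valid inside a URL as they stand. The sketch below fills the pattern while percent-encoding both parts with the standard library; build_lyrics_url is a hypothetical name used only here. The requests library typically takes care of this encoding when it prepares a URL, so the step is shown for illustration rather than as a requirement.

```python
from urllib.parse import quote

def build_lyrics_url(artist, title):
    """Fill artist and title into the URL pattern, percent-encoding both."""
    # safe="" also encodes "/" so a slash in a name cannot add a path segment.
    return f"https://api.lyrics.ovh/v1/{quote(artist, safe='')}/{quote(title, safe='')}"

print(build_lyrics_url("coldplay", "yellow"))
print(build_lyrics_url("shakira", "hips don't lie"))
```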
Below we provide a function that packages the process of sending the request and getting the response, which makes it easy for us to provide the artist name and song title.
def get_lyrics(artist, title):
    """Create a URL to send a request to the API server.
    """
    url = f"https://api.lyrics.ovh/v1/{artist}/{title}"
    response = requests.get(url)
    if response.status_code == 200:
        return response.text
    else:
        print("There was an error: ", response.reason)
        return ''
Now let's try it out:
get_lyrics("alanis morissette", 'ironic')
get_lyrics("beyonce", "hold up")
get_lyrics("shakira", "hips don't lie")
get_lyrics("3 doors down", "here without you and") # I added an extra word to the title to show what happens
When a song is not found, we receive a 404 error (this is a common HTTP error to convey that the resource was not found on the server).
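The text that accompanies a status code (what the function above prints from response.reason) follows standard phrases. A small sketch using Python's standard library shows the phrases for a few common codes; it does not send any request.

```python
from http import HTTPStatus

# Look up the standard phrase for a few common status codes.
for code in (200, 404, 500):
    status = HTTPStatus(code)
    print(code, status.phrase)
```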
The .json method to convert a JSON string into Python data
Most often, APIs return data formatted as JSON. The response in our example above holds a dictionary with only one key, lyrics. Python's requests module provides a method to convert the JSON response into Python data, so that we can access the structure more easily.
Let's see an example:
url = "https://api.lyrics.ovh/v1/coldplay/yellow"
response = requests.get(url)
response.json()
We can see that this is a dictionary, so we can access the key:
yellowSong = response.json()
print(yellowSong['lyrics'])
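What the conversion does can be sketched with the standard json module, which performs roughly the same parsing on a string. The response body below is made up; it only mimics the one-key shape of the lyrics API output.

```python
import json

# A made-up response body with the same one-key shape as the lyrics API output.
body = '{"lyrics": "First line\\nSecond line"}'

data = json.loads(body)   # JSON text -> Python dictionary
print(type(data))
print(data["lyrics"])     # the \n escape in the JSON text becomes a real line break
```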
We will modify the function we wrote, so that we convert the result into JSON right away:
def get_lyrics_JSON(artist, title):
    """Create a URL to send a request to the API server.
    """
    url = f"https://api.lyrics.ovh/v1/{artist}/{title}"
    response = requests.get(url)
    if response.status_code == 200:
        return response.json() # CHANGE: this was changed from .text to .json()
    else:
        print("There was an error: ", response.reason)
        return ''
Let's test it again:
abba = get_lyrics_JSON("abba", "dancing queen")
if abba:
    print(abba['lyrics'])
Now it is your turn to put everything together. Write a function, show_lyrics, with the following characteristics:
It calls get_lyrics_JSON with the appropriate arguments.
[Screenshot of the first call of this function]
Advice: First write a version of the function that executes the two input calls only once. When that works, you can add the while loop.
# Your code here
def show_lyrics():
    """A function that retrieves lyrics in a loop.
    """
    while True:
        input1 = input("Enter an artist name or q (to quit):")
        if input1 == 'q':
            return
        else:
            artist = input1
            song = input(f"Enter a song from {artist}:")
            results = get_lyrics_JSON(artist, song)
            if results:
                print(results['lyrics'])
            else:
                print("Sorry, we couldn't find this song, try again!")
# Test the function
show_lyrics()
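Because show_lyrics waits for keyboard input and contacts the server, it is awkward to check automatically. One possible approach, sketched below, replays a scripted list of answers through the same loop logic and uses a stand-in for get_lyrics_JSON. The names run_lyrics_session and fake_lookup, and the sample artist and song, are hypothetical.

```python
def run_lyrics_session(answers, lookup):
    """Replay scripted answers through the same loop logic as show_lyrics.

    answers: list of strings standing in for what the user would type.
    lookup: function (artist, song) -> dict with a 'lyrics' key, or '' on failure.
    Returns the list of messages that would have been printed.
    """
    printed = []
    remaining = iter(answers)
    while True:
        artist = next(remaining)
        if artist == 'q':
            return printed
        song = next(remaining)
        results = lookup(artist, song)
        if results:
            printed.append(results['lyrics'])
        else:
            printed.append("Sorry, we couldn't find this song, try again!")

def fake_lookup(artist, song):
    """Stand-in for get_lyrics_JSON that knows a single made-up song."""
    if (artist, song) == ("some artist", "some song"):
        return {'lyrics': "la la la"}
    return ''

script = ["some artist", "some song", "some artist", "other song", "q"]
print(run_lyrics_session(script, fake_lookup))
```

The scripted list must end with 'q'; otherwise next() runs out of answers and raises StopIteration.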
The Internet Archive is the biggest free library of many artefacts, such as books, movies, tv segments, music, images, etc. In addition to providing access to these resources through its website, it also provides several APIs, such as the Open Library API.
To use the API, one needs to have some information about the entities to retrieve, for example, the ISBN of a book or the OpenLibrary ID of an author. Once that information is known, the API makes it simple to retrieve the desired data.
Here are some examples:
https://openlibrary.org/isbn/9780140328721.json
https://openlibrary.org/authors/OL34184A.json
Because there are similarities in the two URLs, below we show a function that can be used to retrieve information for both books with known ISBN and authors with known OpenLibrary ids. Notice that the function parameter is then used in generating the URL for the request.
def get_entity_JSON(entityPath):
    """Create a URL to send a request to the Open Library API server.
    """
    url = f"https://openlibrary.org{entityPath}.json"
    response = requests.get(url)
    if response.status_code == 200:
        return response.json()
    else:
        print("There was an error: ", response.reason)
        return ''
Get info about a book
book = get_entity_JSON('/isbn/9780140328721')
book['title'], book['publish_date']
Get info about an author
author = get_entity_JSON('/authors/OL34184A')
author['name'], author['birth_date']
How do we know what keys to use for books or authors?
We can look them up.
book.keys()
author.keys()
YOUR Turn: Find the author from the book
Above we used an ISBN value and an OpenLibrary ID for the author. While the ISBN is easy to find on the Web (simply search for the book name and the word ISBN), how do we find the OpenLibrary IDs? It turns out that this information is part of the dictionary of the book, in the field authors.
Given the ISBN number "0156628708" find the name of the book and its author (by using the information in the book fields).
Algorithm
1. Use get_entity_JSON to retrieve the info for the book that has ISBN 0156628708. Print its title.
2. Look at the field authors. Find how you can access the value you need to get the author's info (it's a bit nested in the result). Store it in a variable.
3. Call get_entity_JSON again, now for the author, then print the author's name.
No need to write a function; simply write statements that do the above things one by one.
# Your code here
# Step 1, get the book
book = get_entity_JSON("/isbn/0156628708")
print(book['title'])
print(book['authors'])
# Step 2, get the author key
authorID = book['authors'][0]['key']
# Step 3, get the author name
author = get_entity_JSON(authorID)
print(author['name'])
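The nested lookup book['authors'][0]['key'] fails when a record has no authors. A more defensive version can be sketched as below; first_author_key is a hypothetical helper, and the sample records are made up to mimic the nesting used above (they are not real API output).

```python
def first_author_key(book):
    """Return the path of the first author of a book record, or None."""
    authors = book.get('authors', [])   # missing key -> empty list
    if authors:
        return authors[0].get('key')    # missing 'key' -> None
    return None

# Made-up records that mimic the nesting of the Open Library response.
with_author = {'title': 'A made-up title', 'authors': [{'key': '/authors/OL34184A'}]}
without_author = {'title': 'Another made-up title'}

print(first_author_key(with_author))
print(first_author_key(without_author))
```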
Write a function, sorted_by_publish_date, which takes as a parameter a list of ISBN values, retrieves the information about the books from the Open Library API, and returns a list of dictionaries, each of them containing the title of the book and its publication date. Because occasionally a key might be missing, always use the method get to access keys from JSON data from APIs. Finally, the list of dictionaries must be sorted by the publication date. (You will need to write a simple helper function to help with sorting.)
For the list of ISBNs given below, you should expect the following result:
[{'title': "The Russian Debutante's Handbook", 'publish_date': 'Apr 29, 2003'},
{'title': 'All the light we cannot see', 'publish_date': '2014'},
{'title': 'The fifth season', 'publish_date': 'Sep 08, 2016'},
{'title': 'The idiot', 'publish_date': '2017'},
{'title': 'Asymmetry', 'publish_date': '2018'},
{'title': 'How to do Nothing', 'publish_date': '2019'}]
Notice that occasionally dates contain more than the year. That will affect how you write your helper function for sorted.
isbnList = ['9781501166761', # Asymmetry by Lisa Halliday
            '9781594205613', # The Idiot by Elif Batuman
            '9781573229883', # The Russian Debutante's Handbook by Gary Shteyngart
            '9781612197494', # How to Do Nothing by Jenny Odell
            '9781476746586', # All the Light We Cannot See by Anthony Doerr
            '9780356508191', # The Fifth Season by N.K. Jemisin
           ]
# Your code here
def byYear(bookDct):
    """Helper function for sorted: extract the year from the publication date."""
    return int(bookDct['publish_date'][-4:]) # the last four characters belong to the year

def sorted_by_publish_date(listISBN):
    """For a given list of ISBN values, get each title and publication date,
    sorted by publication year.
    """
    results = [] # to store the results
    for isbnVal in listISBN: # iterate over the parameter, not the global isbnList
        book = get_entity_JSON(f"/isbn/{isbnVal}")
        title = book.get('title', '')
        date = book.get('publish_date', '')
        results.append({'title': title,
                        'publish_date': date
                       })
    return sorted(results, key=byYear)
# test the function, it may require a few seconds to get all results
sorted_by_publish_date(isbnList)
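The helper above assumes that every date ends in a four-digit year and that the key is always present. A more tolerant key function can be sketched with a regular expression; year_of is a hypothetical name, the first three records reuse dates from the expected result shown earlier, and the last record is made up to show the missing-key case.

```python
import re

def year_of(book):
    """Return the first four-digit year found in 'publish_date', or 0 if there is none."""
    match = re.search(r"\d{4}", book.get('publish_date', ''))
    return int(match.group()) if match else 0

books = [
    {'title': 'Asymmetry', 'publish_date': '2018'},
    {'title': "The Russian Debutante's Handbook", 'publish_date': 'Apr 29, 2003'},
    {'title': 'The fifth season', 'publish_date': 'Sep 08, 2016'},
    {'title': 'No date on record'},  # made-up entry with a missing key
]

for book in sorted(books, key=year_of):
    print(year_of(book), book['title'])
```

Returning 0 for a missing date places such books first; returning a large number instead would place them last.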
Write the function, isBookAuthorAlive, which, given a list of book ISBNs, will print out whether the book author is alive or deceased. For example, given the list of ISBNs in the cell below, we should get the following printout:
Ursula K. Le Guin is deceased.
Neal Stephenson is alive.
Octavia E. Butler is deceased.
Connie Willis is alive.
Kurt Vonnegut is deceased.
Couldn't find author's ID for book '9781594205613'.
Occasionally, an author's information might not be available, so the code should take that into account.
Challenge
Students who want to challenge themselves can try to also generate this more detailed printout.
The author of 'The left hand of darkness', Ursula K. Le Guin, died at the age of 89.
The author of 'Seveneves', Neal Stephenson, is alive.
The author of 'Parable of the sower', Octavia E. Butler, died at the age of 59.
The author of 'Doomsday book', Connie Willis, is alive.
The author of 'Slaughterhouse-five, or, The children's crusade', Kurt Vonnegut, died at the age of 85.
Couldn't find author's ID for book '9781594205613'.
Advice
Don't write the function right away. First make sure you can get the field 'death_date' for each author and write a message based on its existence. Once you have figured out the code, you can package it as a function.
The reason for not writing the function right away is that you cannot access the local variables for inspection outside the body of the function.
isbnList = ['0802713025', # The left hand of darkness by Ursula K. Le Guin
            '1469246864', # Seveneves by Neal Stephenson
            '0941423999', # Parable of the sower by Octavia Butler
            '0553081314', # Doomsday book by Connie Willis
            '0385312083', # Slaughterhouse-five by Kurt Vonnegut
            '9781594205613' # The Idiot by Elif Batuman
           ]
# Solution nr. 1 (show simple message)
# Your code here
def isBookAuthorAlive(listISBN):
    """A function that prints out a message about the author of a book.
    """
    for isbnVal in listISBN:
        book = get_entity_JSON(f"/isbn/{isbnVal}")
        try:
            authorPath = book['authors'][0]['key']
        except (KeyError, IndexError, TypeError):
            # missing 'authors' key, empty list, or a failed request that returned ''
            print(f"Couldn't find author's ID for book '{isbnVal}'.")
            authorPath = ''
        if authorPath:
            author = get_entity_JSON(authorPath)
            if author.get('death_date',''):
                print(f"{author.get('name','')} is deceased.")
            else:
                print(f"{author.get('name','')} is alive.")
isBookAuthorAlive(isbnList)
# Solution nr. 2 (show detailed message)
# Your code here
def calculate_age(author):
    """Helper function to calculate the age of an author.
    """
    birth = author.get('birth_date', '')
    death = author.get('death_date', '')
    if birth and death:
        # compare only the years, taken from the last four characters of each date
        age = int(death[-4:]) - int(birth[-4:])
        return age
    return None

def isBookAuthorAlive(listISBN):
    """A function that prints out a detailed message about the author of a book.
    """
    for isbnVal in listISBN:
        book = get_entity_JSON(f"/isbn/{isbnVal}")
        title = book.get('title', '')
        # It's good to use try/except to avoid problems with missing keys
        try:
            authorPath = book['authors'][0]['key']
        except (KeyError, IndexError, TypeError):
            print(f"Couldn't find author's ID for book '{isbnVal}'.")
            authorPath = ''
        if authorPath:
            author = get_entity_JSON(authorPath)
            # Generate first half of the message to print out
            toPrint1 = f"The author of '{title}', {author.get('name','')},"
            # Generate the second part of the message to print out
            if author.get('death_date',''):
                age = calculate_age(author)
                toPrint2 = f" died at the age of {age}."
            else:
                toPrint2 = " is alive."
            print(toPrint1+toPrint2)
isBookAuthorAlive(isbnList)
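Subtracting only the years, as calculate_age does, can overstate an age by one when the death falls earlier in the calendar year than the birthday. A more precise calculation can be sketched with date objects. The function age_at is hypothetical, the dates are illustrative values, and turning the API's date strings into date objects is left out because their format varies.

```python
from datetime import date

def age_at(birth, death):
    """Return the number of full years between two dates."""
    years = death.year - birth.year
    # Subtract one year if the birthday had not yet been reached in the final year.
    if (death.month, death.day) < (birth.month, birth.day):
        years -= 1
    return years

born = date(1929, 10, 21)   # illustrative values
died = date(2018, 1, 22)

print(died.year - born.year)   # year-only difference
print(age_at(born, died))      # full years actually completed
```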
This is the end of this notebook!