Problem Set 8 - Due Tue, April 28, 2020 at 23:59

Reading material and Resources

Reading

  1. Slides and notebooks from:
  2. Lab 10 on Dictionaries
  3. Think Python, Ch. 11: Dictionaries
  4. Python documentation on dictionary operations

About this Problem Set

This problem set will give you practice with dictionaries, which are an important data structure, and also some practice with sorting and using .format to print. There is also a challenge problem if you want more dictionary practice.

The challenge task is entirely optional; there is only one required task for this problem set.

If you want to find a partner to work with (on any or all parts of the pset), you can use this Piazza post.

Other notes:

All code for this assignment is available in the ps08 folder in the cs111/download directory within your cs server account.


Task 0: Upload a photo you've taken

Upload a photo that you have taken since Wellesley moved to remote learning to this shared folder. Your photo can be anything (well, PG-13 anything). We're going to assemble them into a collective cs111 quilt. Instructors (and tutors) have uploaded theirs to the folder already.


Task 1: Yelp Data Analysis

The task 1 rubric shows how this task will be graded, and can be used as a checklist to make sure you are done with the task (but remember that all we require now is an honest attempt).

Background

The data we are working with in this pset is actual Yelp data from 11 different metropolitan areas across 4 countries. Yelp data is stored in a dictionary (the Yelp dictionaries provided for you have been slightly modified from their original format to be simpler).

You are provided with two different dictionaries, named smallBusinessDict and mediumBusinessDict, with 39 and 159 Yelp businesses, respectively. Those dictionaries are stored in files with the same names in your ps08 folder.

Open up yelp.py in your ps08 folder. This file imports the two dictionary variables that you will use for testing your code.




Data Analysis

General Overview

The smallBusinessDict file stores a dictionary with business ids (long strings of mixed numbers and letters) as keys and a dictionary for each business as values. Each business dictionary has the following keys: name, stars, review_count, address, city, state, categories. You can open the files and view them in Canopy.

Here is one example of a key:value pair (and a screenshot of the restaurant on Yelp):

'o1fTwfqN0sDFNpV1CkOPPg': {'address': '4032 Butler St',
                           'categories': ['Restaurants', 'Sandwiches', 'Coffee & Tea','Food'],
                           'city': 'Pittsburgh',
                           'name': 'Crazy Mocha Coffee',
                           'review_count': 16,
                           'stars': 3.5,
                           'state': 'PA'}

Take a minute to get familiar with the contents of the data in the original Yelp dictionary. What are the keys and values? Note that Canopy prints the key:value pairs in alphabetical order, but within the dictionary, there is no order.
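As a small illustration of the nested structure described above, here is a sketch that looks up fields of the single example entry. The variable names (yelpSample, bizId, and so on) are made up for this illustration and are not part of the starter code; the print call is written so it runs under both Python 2.7 and Python 3.

```python
# One entry copied from the example above; yelpSample is a hypothetical name.
yelpSample = {
    'o1fTwfqN0sDFNpV1CkOPPg': {
        'address': '4032 Butler St',
        'categories': ['Restaurants', 'Sandwiches', 'Coffee & Tea', 'Food'],
        'city': 'Pittsburgh',
        'name': 'Crazy Mocha Coffee',
        'review_count': 16,
        'stars': 3.5,
        'state': 'PA',
    }
}

bizId = 'o1fTwfqN0sDFNpV1CkOPPg'
business = yelpSample[bizId]                  # outer lookup: id -> business dictionary
city = business['city']                       # inner lookup: field name -> value
stars = yelpSample[bizId]['stars']            # both lookups chained in one expression
numCategories = len(business['categories'])   # the categories value is a list

print(sorted(business.keys()))                # the seven field names, sorted
```

The outer dictionary maps an id to a business dictionary, so reaching a single field always takes two lookups.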

High Level Overview: what will you do in this problem set?

Bottom line: you will use real-world Yelp data to display a sorted, category-based list of businesses, ranked by category size, star rating, and review count. Note that later tasks depend on earlier tasks working correctly. Specifically, Task 3 depends on the result from Task 2, Task 4 depends on the result from Task 3, and so on.

Getting started

You can print out all the names of the businesses in the provided dictionaries by filling in the blanks in the code snippet below (this code snippet uses smallBusinessDict; you could do something similar with mediumBusinessDict). This is just an exercise to get you in Dictionary mode; this code snippet is not part of your pset. If you get stuck here, go back and review the Dictionary Lectures and Lab and then come back here.

for biz in smallBusinessDict:
    print smallBusinessDict[______][______]
McDonald's
Panera Bread
Star Nursery
Chula Taberna Mexicana
Kool Pool Care & Repair
Alize Catering
Ciao Baby Catering
Bubbly Nails
Showmars Government Center
T & Y Nail Spa
Bampot House of Tea & Board Games
Soccer Zone
Olsen Firearms
Starbucks
Maxim Bakery & Restaurant
Detailing Gone Mobile
T & T Bakery and Cafe
Brick House Tavern + Tap
Toast Cafe
Charr An American Burger Bar
CubeSmart Self Storage
BDJ Realty
TCBY
William Jon Salon & Spa
Sunnyside Grill
Alfredo's Jewelry
DAVIDsTEA
Rock of Ages
TSA Checkpoint T-4 A - Phoenix Sky Harbor International Airport
Messina
Stephen Szabo Salon
Senior's Barber Shop
Brewster's Pub
Sports Authority
Sportster's
Pampered Hair Passionate about Hair
Western Motor Vehicle
Any Given Sundae
Carrabba's Italian Grill

Task 1: Build a locations:count dictionary from a given Yelp dictionary

Write a function called makeLocationDict that takes a Yelp business dictionary and returns a dictionary where each key is a city in the business dictionary, and the value is the count that represents the number of businesses that are located in that city.

Examples

In []: smallCityDict = makeLocationDict(smallBusinessDict)

In []: len(smallBusinessDict)
Out[]: 39

In []: len(smallCityDict)
Out[]: 23

Below, we'll print the contents of smallCityDict. Dictionaries are not ordered; however, when Canopy prints a dictionary, it prints the keys in alphabetical order.

In []: smallCityDict
Out[]: 
{'Cave Creek': 1,
 'Chandler': 1,
 'Charlotte': 1,
 'Cuyahoga Falls': 1,
 'Davidson': 1,
 'Elyria': 1,
 'Fort Mill': 2,
 'Frazer': 1,
 'Goodyear': 1,
 'Henderson': 2,
 'Las Vegas': 5,
 'Madison': 1,
 'Markham': 1,
 'McMurray': 1,
 'Munroe Falls': 1,
 'Peoria': 1,
 'Phoenix': 5,
 'Richmond Hill': 1,
 'Scottsdale': 1,
 'Stuttgart': 1,
 'Tempe': 1,
 'Toronto': 7,
 'Wexford': 1}

Testing hint: when you invoke makeLocationDict with mediumBusinessDict, there should be 30 businesses in Las Vegas, 16 in Toronto and 1 in Munroe Falls.
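The counting pattern behind a key:count dictionary can be sketched on a plain list of strings. This is only an illustration under assumptions: countOccurrences and sampleCities are hypothetical names, the city list is toy data, and the sketch does not read a Yelp dictionary the way makeLocationDict must.

```python
def countOccurrences(items):
    """Return a dictionary mapping each distinct item to how often it appears."""
    counts = {}
    for item in items:
        # get returns 0 the first time an item is seen, so no key check is needed
        counts[item] = counts.get(item, 0) + 1
    return counts

# Toy data, not taken from either Yelp dictionary
sampleCities = ['Toronto', 'Phoenix', 'Toronto', 'Las Vegas', 'Phoenix', 'Toronto']
cityCounts = countOccurrences(sampleCities)
print(sorted(cityCounts.items()))
# prints [('Las Vegas', 1), ('Phoenix', 2), ('Toronto', 3)]
```

An equivalent version tests `if item not in counts` and sets the count to 1 before incrementing on later sightings; either form is reasonable.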

Task 2: Make a list of business tuples from the dictionary

In this task, your goal is to write a function called makeTupleList that takes the dictionary of Yelp businesses (smallBusinessDict or mediumBusinessDict) and returns a list of tuples, one tuple per key:value pair in the dictionary.

Note that only certain parts of the business dictionary are extracted into the tuple. Here are two examples.

# Example 1: Starbucks key:value pair from smallBusinessDict
'lHYiCS-y8AFjUitv6MGpxg': {'address': '85 Hanna Avenue',
  'categories': ['Food', 'Coffee & Tea'],
  'city': 'Toronto',
  'name': 'Starbucks',
  'review_count': 21,
  'stars': 4.0,
  'state': 'ON'}
# Example 1: corresponding Starbucks tuple
#    name         city     state revs stars  categories
('Starbucks', 'Toronto', 'ON', 21, 4.0, ['Food', 'Coffee & Tea'])
# Example 2: Panera key:value pair from smallBusinessDict
 'Dj0S-Oe4ytRJzMGUPgYUkw': {'address': '38295 Chestnut Ridge Rd',
  'categories': ['Soup', 'Salad', 'Sandwiches', 'Restaurants'],
  'city': 'Elyria',
  'name': 'Panera Bread',
  'review_count': 4,
  'stars': 2.0,
  'state': 'OH'},

# Example 2: corresponding Panera tuple
#    name         city     state revs stars  categories
('Panera Bread', 'Elyria', 'OH', 4, 2.0, ['Soup', 'Salad', 'Sandwiches', 'Restaurants'])

Examples

# Sample invocation
In[ ]: smallTupleList = makeTupleList(smallBusinessDict)
In[ ]: smallTupleList
Out[]: [("McDonald's",
  'Phoenix',
  'AZ',
  10,
  1.0,
  ['Fast Food', 'Burgers', 'Restaurants']),
 ('Star Nursery',
  'Las Vegas',
  'NV',
  25,
  3.5,
  ['Nurseries & Gardening', 'Home & Garden', 'Shopping']),
 ('Chula Taberna Mexicana',
  'Toronto',
  'ON',
  39,
  3.5,
  ['Tiki Bars', 'Nightlife', 'Mexican', 'Restaurants', 'Bars']),
 ('Kool Pool Care & Repair',
  'Phoenix',
  'AZ',
  5,
  5.0,
  ['Home Services',
   'Contractors',
   'Pool & Hot Tub Service',
   'Pool Cleaners']),
   ...
   ]
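To make the dictionary-to-tuple step concrete, here is a sketch that converts the single Starbucks dictionary from Example 1 into a tuple. The helper name businessToTuple is an assumption made for this illustration; the sketch handles one business only and leaves the loop over the whole Yelp dictionary to you.

```python
# Starbucks value copied from Example 1 above
starbucks = {'address': '85 Hanna Avenue',
             'categories': ['Food', 'Coffee & Tea'],
             'city': 'Toronto',
             'name': 'Starbucks',
             'review_count': 21,
             'stars': 4.0,
             'state': 'ON'}

def businessToTuple(biz):
    """Pick selected fields out of one business dictionary, in a fixed order."""
    return (biz['name'], biz['city'], biz['state'],
            biz['review_count'], biz['stars'], biz['categories'])

starbucksTuple = businessToTuple(starbucks)
print(starbucksTuple)
```

Note that the address is dropped, and the order of the tuple slots is fixed by the code, not by the dictionary.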

Task 3: Group the businesses by category

It would be useful to organize the businesses by category. For example, maybe you feel like Brazilian food tonight, or you need a coffee fix now. In this task, you will write a function called groupBusinessesByCategories that will take the list of tuples generated in Task 2 above (smallTupleList) and return a new dictionary. In this new dictionary, the keys are the categories and each value is a list of business tuples. If a particular business has multiple categories, then the tuple for that business should appear in the list for each of those categories. For example, in the categories shown below, note that Any Given Sundae appears in the list of values for Coffee & Tea and also in the list of values for Ice Cream & Frozen Yogurt (as well as one other category that is not shown below).

Hint: do not hard-code the categories in advance. Instead, if you encounter a category that is not already in your dictionary, create a new key:value pair and proceed.

For example, here is a key:value pair with the key Coffee & Tea. The value is a list of tuples. It turns out that 5 businesses fall under the Coffee & Tea category. Each tuple contains the star rating, the number of reviews, the name, the city and the state, in that exact order.

'Coffee & Tea': [ (4.0, 55, 'Bampot House of Tea & Board Games', 'Toronto', 'ON'),
                  (4.0, 21, 'Starbucks', 'Toronto', 'ON'),
                  (3.5, 6, 'Toast Cafe', 'Fort Mill', 'SC'),
                  (4.0, 6, 'DAVIDsTEA', 'Toronto', 'ON'),
                  (5.0, 15, 'Any Given Sundae', 'Wexford', 'PA')]

Here is an example with the key Ice Cream & Frozen Yogurt. Note that Any Given Sundae also appears in this list of businesses (it appears in the list above for Coffee & Tea as well).

'Ice Cream & Frozen Yogurt': [(5.0, 15, 'Any Given Sundae', 'Wexford', 'PA'),
                              (3.5, 3, 'TCBY', 'Davidson', 'NC')]

Here's another example with the key Nightlife, whose list of tuples contains 4 businesses:

'Nightlife': [ (3.5, 116, 'Brick House Tavern + Tap', 'Cuyahoga Falls', 'OH'),
               (3.5, 39, 'Chula Taberna Mexicana', 'Toronto', 'ON'),
               (4.0, 4, "Brewster's Pub", 'Munroe Falls', 'OH'),
               (2.5, 7, "Sportster's", 'Toronto', 'ON')]

Examples

# Sample invocation
# Remember that Canopy prints alphabetically, but dictionaries are not sorted.
In[ ]: smallCatDict = groupBusinessesByCategories(smallTupleList)
In[ ]: smallCatDict
Out[]: 
{'American (New)': [(3.5, 116, 'Brick House Tavern + Tap', 'Cuyahoga Falls', 'OH')],
 'American (Traditional)': [(3.5, 7, 'Showmars Government Center', 'Charlotte', 'NC'),
  (3.5, 116, 'Brick House Tavern + Tap', 'Cuyahoga Falls', 'OH'),
  (3.5, 6, 'Toast Cafe', 'Fort Mill', 'SC')],
 'Arts & Entertainment': [(4.0, 213, 'Rock of Ages', 'Las Vegas', 'NV')],
 'Auto Detailing': [(5.0, 7, 'Detailing Gone Mobile', 'Henderson', 'NV')],
 'Automotive': [(5.0, 7, 'Detailing Gone Mobile', 'Henderson', 'NV')],
 'Bagels': [(4.0, 38, 'T & T Bakery and Cafe', 'Markham', 'ON')],
 'Bakeries': [(3.5, 34, 'Maxim Bakery & Restaurant', 'Richmond Hill', 'ON'),
  (4.0, 38, 'T & T Bakery and Cafe', 'Markham', 'ON')],
 'Barbers': [(5.0, 65, "Senior's Barber Shop", 'Goodyear', 'AZ')],
 'Bars': [(3.5, 116, 'Brick House Tavern + Tap', 'Cuyahoga Falls', 'OH'),
  (3.5, 39, 'Chula Taberna Mexicana', 'Toronto', 'ON'),
  (4.0, 4, "Brewster's Pub", 'Munroe Falls', 'OH'),
  (2.5, 7, "Sportster's", 'Toronto', 'ON')],
 'Beauty & Spas': [(3.0, 11, 'Stephen Szabo Salon', 'McMurray', 'PA'),
  (4.0, 17, 'Bubbly Nails', 'Fort Mill', 'SC'),
  (4.5, 14, 'William Jon Salon & Spa', 'Madison', 'WI'),
  (5.0, 3, 'Pampered Hair Passionate about Hair', 'Henderson', 'NV'),
  (3.0, 20, 'T & Y Nail Spa', 'Peoria', 'AZ'),
  (5.0, 65, "Senior's Barber Shop", 'Goodyear', 'AZ')],

  ...  # many key:value pairs omitted for conciseness

 'Tiki Bars': [(3.5, 39, 'Chula Taberna Mexicana', 'Toronto', 'ON')],
 'Watch Repair': [(4.5, 23, "Alfredo's Jewelry", 'Las Vegas', 'NV')]
  }
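The hint about not hard-coding categories amounts to a general grouping pattern, sketched here on simplified toy records. The names groupByLabels and sampleRecords are hypothetical, and each record holds just a business name plus a partial category list, not the full tuples your function must produce.

```python
def groupByLabels(records):
    """records is a list of (item, labels) pairs; return a dictionary
    mapping each label to the list of items carrying that label."""
    groups = {}
    for item, labels in records:
        for label in labels:
            if label not in groups:
                groups[label] = []        # first time this label is seen
            groups[label].append(item)    # item is filed under every one of its labels
    return groups

# Simplified toy records: (name, some of its categories)
sampleRecords = [
    ('Starbucks', ['Food', 'Coffee & Tea']),
    ('TCBY', ['Food', 'Ice Cream & Frozen Yogurt']),
    ('Any Given Sundae', ['Coffee & Tea', 'Ice Cream & Frozen Yogurt', 'Food']),
]
groups = groupByLabels(sampleRecords)
print(groups['Coffee & Tea'])
# prints ['Starbucks', 'Any Given Sundae']
```

Because the inner loop visits every label, an item with three labels ends up in three lists, which mirrors how Any Given Sundae shows up under several categories.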

Task 4: Sort the businesses by most popular category (and sort the business tuples by stars)

In Task 3 above, a new dictionary was created (smallCatDict) to store the businesses by category, but there is no order in the dictionary. In this task, you will write a function called catDictToSortedTuples that will take the dictionary from Task 3 and return a list of tuples. This list of tuples will be sorted so that the most popular category (i.e., the category with the most businesses) is at the top of the list, and those categories with just one business are at the bottom of the list. If two categories contain the same number of businesses, then their tuples should be sorted by category name.

Note: You'll need to write a helper function to sort the tuples into the desired order. There is a template for this helper function sortByCountCategoryName inside the starter yelp.py file. It's important that you do not change the name of this helper function.

If you look at one category, for example Restaurants, note that the 13 business tuples in that category are sorted by rating, so the highest star rating is listed first. Your function will return a list of category tuples, and within each category tuple, the list of business tuples must be sorted from highest star rating (Sunnyside Grill) to lowest star rating (McDonald's), as shown below.
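One common way to get "largest count first, then alphabetical by name" is a key function that returns a (negative count, name) pair. The sketch below tries this on three small categories; countDescThenName is a hypothetical name (it is not the sortByCountCategoryName template from yelp.py), and the use of reverse=True for the business tuples is an assumption that appears consistent with the sample output in this task.

```python
def countDescThenName(categoryTuple):
    """Sort key: bigger counts first (hence the minus sign), ties broken by name."""
    name, count, businesses = categoryTuple
    return (-count, name)

# Toy category tuples built from a few businesses shown in this handout
bakeries = [(3.5, 34, 'Maxim Bakery & Restaurant', 'Richmond Hill', 'ON'),
            (4.0, 38, 'T & T Bakery and Cafe', 'Markham', 'ON')]
french = [(3.5, 34, 'Maxim Bakery & Restaurant', 'Richmond Hill', 'ON'),
          (3.0, 12, 'Alize Catering', 'Toronto', 'ON')]
bagels = [(4.0, 38, 'T & T Bakery and Cafe', 'Markham', 'ON')]
sampleCategories = [('Bagels', 1, bagels),
                    ('French', 2, french),
                    ('Bakeries', 2, bakeries)]

sortedCategories = sorted(sampleCategories, key=countDescThenName)
print([cat[0] for cat in sortedCategories])
# prints ['Bakeries', 'French', 'Bagels']

# Tuples compare slot by slot, so a reversed sort puts the highest stars first
rankedBakeries = sorted(bakeries, reverse=True)
print(rankedBakeries[0][2])
# prints T & T Bakery and Cafe
```

Negating the count lets a single ascending sort handle both criteria, which avoids sorting twice.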

Examples

# Sample invocation
smallSortedTuples = catDictToSortedTuples(smallCatDict)

The format of each category tuple looks like this:

# (CATEGORY_NAME, NUM_BUSINESSES_IN_CATEGORY, LIST_OF_BUSINESS_TUPLES )
('Restaurants', 13, [(5.0, 3, 'Sunnyside Grill', 'Toronto', 'ON'),
                     (4.0, 55, 'Bampot House of Tea & Board Games', 'Toronto', 'ON'),
                     (4.0, 5, 'Messina', 'Stuttgart', 'BW'),
                     ...
                    ])

and each business tuple looks like this:

#   ( STARS, NUM_REVIEWS,         NAME,        CITY,        STATE )
        (5.0,       3,       'Sunnyside Grill',   'Toronto',  'ON')

The list of sorted tuples should look like this:

[('Restaurants',
  13,
  [(5.0, 3, 'Sunnyside Grill', 'Toronto', 'ON'),
   (4.0, 55, 'Bampot House of Tea & Board Games', 'Toronto', 'ON'),
   (4.0, 5, 'Messina', 'Stuttgart', 'BW'),
   (3.5, 116, 'Brick House Tavern + Tap', 'Cuyahoga Falls', 'OH'),
   (3.5, 39, 'Chula Taberna Mexicana', 'Toronto', 'ON'),
   (3.5, 34, 'Maxim Bakery & Restaurant', 'Richmond Hill', 'ON'),
   (3.5, 25, "Carrabba's Italian Grill", 'Frazer', 'PA'),
   (3.5, 7, 'Showmars Government Center', 'Charlotte', 'NC'),
   (3.5, 6, 'Toast Cafe', 'Fort Mill', 'SC'),
   (3.0, 232, 'Charr An American Burger Bar', 'Phoenix', 'AZ'),
   (3.0, 12, 'Alize Catering', 'Toronto', 'ON'),
   (2.0, 4, 'Panera Bread', 'Elyria', 'OH'),
   (1.0, 10, "McDonald's", 'Phoenix', 'AZ')]),
 ('Food',
  8,
  [(5.0, 15, 'Any Given Sundae', 'Wexford', 'PA'),
   (4.0, 55, 'Bampot House of Tea & Board Games', 'Toronto', 'ON'),
   (4.0, 38, 'T & T Bakery and Cafe', 'Markham', 'ON'),
   (4.0, 21, 'Starbucks', 'Toronto', 'ON'),
   (4.0, 6, 'DAVIDsTEA', 'Toronto', 'ON'),
   (3.5, 34, 'Maxim Bakery & Restaurant', 'Richmond Hill', 'ON'),
   (3.5, 6, 'Toast Cafe', 'Fort Mill', 'SC'),
   (3.5, 3, 'TCBY', 'Davidson', 'NC')]),
 ('Beauty & Spas',
  6,
  [(5.0, 65, "Senior's Barber Shop", 'Goodyear', 'AZ'),
   (5.0, 3, 'Pampered Hair Passionate about Hair', 'Henderson', 'NV'),
   (4.5, 14, 'William Jon Salon & Spa', 'Madison', 'WI'),
   (4.0, 17, 'Bubbly Nails', 'Fort Mill', 'SC'),
   (3.0, 20, 'T & Y Nail Spa', 'Peoria', 'AZ'),
   (3.0, 11, 'Stephen Szabo Salon', 'McMurray', 'PA')]),

   ...

 ('Tanning', 1, [(4.5, 14, 'William Jon Salon & Spa', 'Madison', 'WI')]),
 ('Tiki Bars', 1, [(3.5, 39, 'Chula Taberna Mexicana', 'Toronto', 'ON')]),
 ('Watch Repair', 1, [(4.5, 23, "Alfredo's Jewelry", 'Las Vegas', 'NV')])]

Task 5: Print the sorted tuples in an easy-to-read format

In this task, you will write a function called printBusinesses that will take the sorted tuples generated in Task 4 above and print them as shown below. You'll need to use .format to format your print statements. Note that the number of reviews appears in parentheses (). Your output should match ours exactly.

For reference, from Lecture 17, here is a video (4:32) on .format and here are the relevant slides.
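As a quick reminder of how .format fills in placeholders, here is a sketch that builds one rating line and one category line from values shown in this handout. The variable names are made up for the illustration, and the leading indentation of the expected output is deliberately left out, so compare your spacing against the sample output yourself.

```python
# One business tuple in (stars, reviews, name, city, state) order
business = (5.0, 3, 'Sunnyside Grill', 'Toronto', 'ON')
stars, reviews, name, city, state = business

# Each {} is replaced by the next argument to format, left to right
ratingLine = 'Rating: {} ({}) {} in {}, {}'.format(stars, reviews, name, city, state)
categoryLine = 'Category: {} contains {} businesses'.format('Restaurants', 13)

print(categoryLine)
print(ratingLine)
# prints:
# Category: Restaurants contains 13 businesses
# Rating: 5.0 (3) Sunnyside Grill in Toronto, ON
```

The parentheses around the review count are ordinary characters in the template string; only the curly braces are placeholders.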

Examples

# Sample invocation #1:
In []: printBusinesses(smallSortedTuples)
Businesses sorted by number and rating:

 Category: Restaurants contains 13 businesses 
     Rating: 5.0 (3) Sunnyside Grill in Toronto, ON
     Rating: 4.0 (55) Bampot House of Tea & Board Games in Toronto, ON
     Rating: 4.0 (5) Messina in Stuttgart, BW
     Rating: 3.5 (116) Brick House Tavern + Tap in Cuyahoga Falls, OH
     Rating: 3.5 (39) Chula Taberna Mexicana in Toronto, ON
     Rating: 3.5 (34) Maxim Bakery & Restaurant in Richmond Hill, ON
     Rating: 3.5 (25) Carrabba's Italian Grill in Frazer, PA
     Rating: 3.5 (7) Showmars Government Center in Charlotte, NC
     Rating: 3.5 (6) Toast Cafe in Fort Mill, SC
     Rating: 3.0 (232) Charr An American Burger Bar in Phoenix, AZ
     Rating: 3.0 (12) Alize Catering in Toronto, ON
     Rating: 2.0 (4) Panera Bread in Elyria, OH
     Rating: 1.0 (10) McDonald's in Phoenix, AZ

 Category: Food contains 8 businesses 
     Rating: 5.0 (15) Any Given Sundae in Wexford, PA
     Rating: 4.0 (55) Bampot House of Tea & Board Games in Toronto, ON
     Rating: 4.0 (38) T & T Bakery and Cafe in Markham, ON
     Rating: 4.0 (21) Starbucks in Toronto, ON
     Rating: 4.0 (6) DAVIDsTEA in Toronto, ON
     Rating: 3.5 (34) Maxim Bakery & Restaurant in Richmond Hill, ON
     Rating: 3.5 (6) Toast Cafe in Fort Mill, SC
     Rating: 3.5 (3) TCBY in Davidson, NC

 Category: Beauty & Spas contains 6 businesses 
     Rating: 5.0 (65) Senior's Barber Shop in Goodyear, AZ
     Rating: 5.0 (3) Pampered Hair Passionate about Hair in Henderson, NV
     Rating: 4.5 (14) William Jon Salon & Spa in Madison, WI
     Rating: 4.0 (17) Bubbly Nails in Fort Mill, SC
     Rating: 3.0 (20) T & Y Nail Spa in Peoria, AZ
     Rating: 3.0 (11) Stephen Szabo Salon in McMurray, PA

 Category: Coffee & Tea contains 5 businesses 
     Rating: 5.0 (15) Any Given Sundae in Wexford, PA
     Rating: 4.0 (55) Bampot House of Tea & Board Games in Toronto, ON
     Rating: 4.0 (21) Starbucks in Toronto, ON
     Rating: 4.0 (6) DAVIDsTEA in Toronto, ON
     Rating: 3.5 (6) Toast Cafe in Fort Mill, SC

 Category: Shopping contains 5 businesses 
     Rating: 5.0 (9) Olsen Firearms in Cave Creek, AZ
     Rating: 4.5 (23) Alfredo's Jewelry in Las Vegas, NV
     Rating: 3.5 (25) Star Nursery in Las Vegas, NV
     Rating: 3.0 (9) Sports Authority in Tempe, AZ
     Rating: 1.5 (9) Soccer Zone in Las Vegas, NV

 Category: Bars contains 4 businesses 
     Rating: 4.0 (4) Brewster's Pub in Munroe Falls, OH
     Rating: 3.5 (116) Brick House Tavern + Tap in Cuyahoga Falls, OH
     Rating: 3.5 (39) Chula Taberna Mexicana in Toronto, ON
     Rating: 2.5 (7) Sportster's in Toronto, ON

 Category: Nightlife contains 4 businesses 
     Rating: 4.0 (4) Brewster's Pub in Munroe Falls, OH
     Rating: 3.5 (116) Brick House Tavern + Tap in Cuyahoga Falls, OH
     Rating: 3.5 (39) Chula Taberna Mexicana in Toronto, ON
     Rating: 2.5 (7) Sportster's in Toronto, ON

 Category: American (Traditional) contains 3 businesses 
     Rating: 3.5 (116) Brick House Tavern + Tap in Cuyahoga Falls, OH
     Rating: 3.5 (7) Showmars Government Center in Charlotte, NC
     Rating: 3.5 (6) Toast Cafe in Fort Mill, SC

 Category: Burgers contains 3 businesses 
     Rating: 3.5 (116) Brick House Tavern + Tap in Cuyahoga Falls, OH
     Rating: 3.0 (232) Charr An American Burger Bar in Phoenix, AZ
     Rating: 1.0 (10) McDonald's in Phoenix, AZ

 Category: Hair Salons contains 3 businesses 
     Rating: 5.0 (3) Pampered Hair Passionate about Hair in Henderson, NV
     Rating: 4.5 (14) William Jon Salon & Spa in Madison, WI
     Rating: 3.0 (11) Stephen Szabo Salon in McMurray, PA

 Category: Italian contains 3 businesses 
     Rating: 4.0 (5) Messina in Stuttgart, BW
     Rating: 3.5 (25) Carrabba's Italian Grill in Frazer, PA
     Rating: 3.0 (12) Alize Catering in Toronto, ON

 Category: Bakeries contains 2 businesses 
     Rating: 4.0 (38) T & T Bakery and Cafe in Markham, ON
     Rating: 3.5 (34) Maxim Bakery & Restaurant in Richmond Hill, ON

 Category: Blow Dry/Out Services contains 2 businesses 
     Rating: 5.0 (3) Pampered Hair Passionate about Hair in Henderson, NV
     Rating: 3.0 (11) Stephen Szabo Salon in McMurray, PA

 Category: French contains 2 businesses 
     Rating: 3.5 (34) Maxim Bakery & Restaurant in Richmond Hill, ON
     Rating: 3.0 (12) Alize Catering in Toronto, ON

 Category: Hair Extensions contains 2 businesses 
     Rating: 5.0 (3) Pampered Hair Passionate about Hair in Henderson, NV
     Rating: 3.0 (11) Stephen Szabo Salon in McMurray, PA

 Category: Hair Stylists contains 2 businesses 
     Rating: 5.0 (3) Pampered Hair Passionate about Hair in Henderson, NV
     Rating: 3.0 (11) Stephen Szabo Salon in McMurray, PA

 Category: Home Services contains 2 businesses 
     Rating: 5.0 (5) Kool Pool Care & Repair in Phoenix, AZ
     Rating: 4.0 (5) BDJ Realty in Las Vegas, NV

 Category: Ice Cream & Frozen Yogurt contains 2 businesses 
     Rating: 5.0 (15) Any Given Sundae in Wexford, PA
     Rating: 3.5 (3) TCBY in Davidson, NC

 Category: Local Services contains 2 businesses 
     Rating: 5.0 (23) CubeSmart Self Storage in Chandler, AZ
     Rating: 4.5 (23) Alfredo's Jewelry in Las Vegas, NV

 Category: Nail Salons contains 2 businesses 
     Rating: 4.0 (17) Bubbly Nails in Fort Mill, SC
     Rating: 3.0 (20) T & Y Nail Spa in Peoria, AZ

 Category: Public Services & Government contains 2 businesses 
     Rating: 1.5 (46) TSA Checkpoint T-4 A - Phoenix Sky Harbor International Airport in Phoenix, AZ
     Rating: 1.5 (18) Western Motor Vehicle in Phoenix, AZ

 Category: Sandwiches contains 2 businesses 
     Rating: 3.5 (116) Brick House Tavern + Tap in Cuyahoga Falls, OH
     Rating: 2.0 (4) Panera Bread in Elyria, OH

 Category: Sporting Goods contains 2 businesses 
     Rating: 3.0 (9) Sports Authority in Tempe, AZ
     Rating: 1.5 (9) Soccer Zone in Las Vegas, NV

 Category: Tea Rooms contains 2 businesses 
     Rating: 4.0 (55) Bampot House of Tea & Board Games in Toronto, ON
     Rating: 4.0 (6) DAVIDsTEA in Toronto, ON

 Category: American (New) contains 1 businesses 
     Rating: 3.5 (116) Brick House Tavern + Tap in Cuyahoga Falls, OH

 Category: Arts & Entertainment contains 1 businesses 
     Rating: 4.0 (213) Rock of Ages in Las Vegas, NV

 Category: Auto Detailing contains 1 businesses 
     Rating: 5.0 (7) Detailing Gone Mobile in Henderson, NV

 Category: Automotive contains 1 businesses 
     Rating: 5.0 (7) Detailing Gone Mobile in Henderson, NV

 Category: Bagels contains 1 businesses 
     Rating: 4.0 (38) T & T Bakery and Cafe in Markham, ON

 Category: Barbers contains 1 businesses 
     Rating: 5.0 (65) Senior's Barber Shop in Goodyear, AZ

 Category: Breakfast & Brunch contains 1 businesses 
     Rating: 5.0 (3) Sunnyside Grill in Toronto, ON

 Category: Caterers contains 1 businesses 
     Rating: 2.0 (5) Ciao Baby Catering in Scottsdale, AZ

 Category: Contractors contains 1 businesses 
     Rating: 5.0 (5) Kool Pool Care & Repair in Phoenix, AZ

 Category: Day Spas contains 1 businesses 
     Rating: 4.5 (14) William Jon Salon & Spa in Madison, WI

 Category: Departments of Motor Vehicles contains 1 businesses 
     Rating: 1.5 (18) Western Motor Vehicle in Phoenix, AZ

 Category: Desserts contains 1 businesses 
     Rating: 3.5 (3) TCBY in Davidson, NC

 Category: Event Planning & Services contains 1 businesses 
     Rating: 2.0 (5) Ciao Baby Catering in Scottsdale, AZ

 Category: Fast Food contains 1 businesses 
     Rating: 1.0 (10) McDonald's in Phoenix, AZ

 Category: Guns & Ammo contains 1 businesses 
     Rating: 5.0 (9) Olsen Firearms in Cave Creek, AZ

 Category: Home & Garden contains 1 businesses 
     Rating: 3.5 (25) Star Nursery in Las Vegas, NV

 Category: Jewelry contains 1 businesses 
     Rating: 4.5 (23) Alfredo's Jewelry in Las Vegas, NV

 Category: Mediterranean contains 1 businesses 
     Rating: 4.0 (55) Bampot House of Tea & Board Games in Toronto, ON

 Category: Men's Hair Salons contains 1 businesses 
     Rating: 3.0 (11) Stephen Szabo Salon in McMurray, PA

 Category: Mexican contains 1 businesses 
     Rating: 3.5 (39) Chula Taberna Mexicana in Toronto, ON

 Category: Nurseries & Gardening contains 1 businesses 
     Rating: 3.5 (25) Star Nursery in Las Vegas, NV

 Category: Performing Arts contains 1 businesses 
     Rating: 4.0 (213) Rock of Ages in Las Vegas, NV

 Category: Pool & Hot Tub Service contains 1 businesses 
     Rating: 5.0 (5) Kool Pool Care & Repair in Phoenix, AZ

 Category: Pool Cleaners contains 1 businesses 
     Rating: 5.0 (5) Kool Pool Care & Repair in Phoenix, AZ

 Category: Property Management contains 1 businesses 
     Rating: 4.0 (5) BDJ Realty in Las Vegas, NV

 Category: Pubs contains 1 businesses 
     Rating: 4.0 (4) Brewster's Pub in Munroe Falls, OH

 Category: Real Estate contains 1 businesses 
     Rating: 4.0 (5) BDJ Realty in Las Vegas, NV

 Category: Real Estate Services contains 1 businesses 
     Rating: 4.0 (5) BDJ Realty in Las Vegas, NV

 Category: Salad contains 1 businesses 
     Rating: 2.0 (4) Panera Bread in Elyria, OH

 Category: Seafood contains 1 businesses 
     Rating: 3.5 (25) Carrabba's Italian Grill in Frazer, PA

 Category: Self Storage contains 1 businesses 
     Rating: 5.0 (23) CubeSmart Self Storage in Chandler, AZ

 Category: Soup contains 1 businesses 
     Rating: 2.0 (4) Panera Bread in Elyria, OH

 Category: Sports Bars contains 1 businesses 
     Rating: 2.5 (7) Sportster's in Toronto, ON

 Category: Spray Tanning contains 1 businesses 
     Rating: 4.5 (14) William Jon Salon & Spa in Madison, WI

 Category: Tanning contains 1 businesses 
     Rating: 4.5 (14) William Jon Salon & Spa in Madison, WI

 Category: Tiki Bars contains 1 businesses 
     Rating: 3.5 (39) Chula Taberna Mexicana in Toronto, ON

 Category: Watch Repair contains 1 businesses 
     Rating: 4.5 (23) Alfredo's Jewelry in Las Vegas, NV
# Sample invocation #2:
In []: printBusinesses(mediumSortedTuples)
Businesses sorted by number and rating: 

 Category: Restaurants contains 58 businesses 
     Rating: 5.0 (4) Caviness Studio in Phoenix, AZ
     Rating: 5.0 (3) Sunnyside Grill in Toronto, ON
     Rating: 4.5 (373) Duck Donuts in Charlotte, NC
     Rating: 4.5 (21) Finga Lickin' Caribbean Eatery in Charlotte, NC
     Rating: 4.5 (16) The Bluebird Cafe in Edinburgh, MLN
     Rating: 4.5 (11) Restaurant Lucca in Montreal, QC
     Rating: 4.5 (7) Maki My Way in Toronto, ON
     Rating: 4.5 (3) East Coast Coffee in Houston, PA
     Rating: 4.0 (140) Divine Cafe at the Springs Preserve in Las Vegas, NV
     Rating: 4.0 (55) Bampot House of Tea & Board Games in Toronto, ON
     Rating: 4.0 (37) D'Lish Cafe in Phoenix, AZ
     Rating: 4.0 (34) Villa Tap in Madison, WI
     Rating: 4.0 (13) Flight Deck Bar & Grill in Las Vegas, NV
     Rating: 4.0 (8) Pizza Pizza in Markham, ON
     Rating: 4.0 (5) River Moon Cafe in Pittsburgh, PA
     Rating: 4.0 (5) Messina in Stuttgart, BW
     Rating: 3.5 (263) Tandoori Times Indian Bistro in Scottsdale, AZ
     Rating: 3.5 (116) Brick House Tavern + Tap in Cuyahoga Falls, OH
     Rating: 3.5 (97) Barrio in Cleveland Heights, OH
     Rating: 3.5 (82) Firehouse Subs in Mesa, AZ
     Rating: 3.5 (78) Yuzu in Lakewood, OH
     Rating: 3.5 (76) China Buffet in Charlotte, NC
     Rating: 3.5 (39) Chula Taberna Mexicana in Toronto, ON
     Rating: 3.5 (34) Maxim Bakery & Restaurant in Richmond Hill, ON
     Rating: 3.5 (33) Don Taco in Montreal, QC
     Rating: 3.5 (25) Carrabba's Italian Grill in Frazer, PA
     Rating: 3.5 (16) Crazy Mocha Coffee in Pittsburgh, PA
     Rating: 3.5 (15) Restaurant Plenum in Stuttgart, BW
     Rating: 3.5 (11) Mint Garden in Newmarket, ON
     Rating: 3.5 (7) Showmars Government Center in Charlotte, NC
     Rating: 3.5 (6) Toast Cafe in Fort Mill, SC
     Rating: 3.0 (349) GameWorks in Las Vegas, NV
     Rating: 3.0 (232) Charr An American Burger Bar in Phoenix, AZ
     Rating: 3.0 (88) Fraticelli's Authentic Italian Grill in Richmond Hill, ON
     Rating: 3.0 (80) Kitchen M in Markham, ON
     Rating: 3.0 (34) McDonald's in Phoenix, AZ
     Rating: 3.0 (12) Alize Catering in Toronto, ON
     Rating: 3.0 (11) QDC Burger in Dorval, QC
     Rating: 3.0 (9) Police Station Pizza in Oakdale, PA
     Rating: 3.0 (8) Le Bistro Balmoral in Montreal, QC
     Rating: 3.0 (6) Flavor Cuisine in Toronto, ON
     Rating: 3.0 (5) Thai Express in Toronto, ON
     Rating: 3.0 (5) Simply Burgers in Gilbert, AZ
     Rating: 3.0 (4) La Isla Cuban Restaurant in Charlotte, NC
     Rating: 3.0 (4) Crepe Cafe in Champaign, IL
     Rating: 3.0 (3) Subway in Las Vegas, NV
     Rating: 2.5 (37) Red Lobster in Phoenix, AZ
     Rating: 2.5 (34) Amaya Express in Toronto, ON
     Rating: 2.5 (7) Nesenbach in Stuttgart, BW
     Rating: 2.5 (7) El Dorado's in Kent, OH
     Rating: 2.5 (6) Subway in Las Vegas, NV
     Rating: 2.5 (5) Bruegger's Bagels in Pittsburgh, PA
     Rating: 2.5 (4) Little Caesars Pizza in Mesa, AZ
     Rating: 2.0 (20) McDonald's in Las Vegas, NV
     Rating: 2.0 (6) Mirage Grill & Lounge in Toronto, ON
     Rating: 2.0 (4) Panera Bread in Elyria, OH
     Rating: 1.5 (3) Cafe Mastrioni in Las Vegas, NV
     Rating: 1.0 (10) McDonald's in Phoenix, AZ

 Category: Food contains 29 businesses 
     Rating: 5.0 (15) Any Given Sundae in Wexford, PA
     Rating: 4.5 (373) Duck Donuts in Charlotte, NC
     Rating: 4.5 (76) Donut Tyme in Las Vegas, NV
...

How to turn in this Problem Set

Challenge Task 1

Building on the sorting challenge from last week, we've got a more interesting distant reading problem to work on: what can we learn about gender roles in a novel based on analysis of the text?

One approach to this question is to look at what words authors use after gendered pronouns, like 'he' and 'she'. In particular, the words following these two pronouns are often verbs, like he said or she was. If we can find out which actions are more commonly taken by male or female characters, this gives us insight into how those characters are represented by an author.

There are three parts to this challenge:

  1. Write a findWordsAfter(sentences, target) function which takes a list of sentences (where each sentence is a list of strings) and returns a list of words which occur in those sentences immediately after a target word.
  2. Write a countWords(wordList) function which takes a list of words and builds a dictionary where the value for each word is an integer indicating how many copies of that word exist in the given word list. (Note: this is covered in the lecture notebook).
  3. Write a getNegativeCountThenWord(wordThenCount) function to help with sorting, and then a countWordsAfter(sentences, target) function which puts together the functions from parts 1 and 2 in order to produce a sorted list of (word, count) pairs occurring after the target word in the given list of sentences.
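The three parts above can be sketched roughly as follows on a few toy sentences. This is a minimal sketch under assumptions, not the behavior specified by the docstrings in distantReading.py: the function names here are intentionally different from the required ones, matching is exact and case-sensitive, and the sentences are made up.

```python
def wordsAfter(sentences, target):
    """Collect every word that immediately follows target in any sentence."""
    found = []
    for sentence in sentences:
        # stop one short of the end so sentence[i + 1] always exists
        for i in range(len(sentence) - 1):
            if sentence[i] == target:
                found.append(sentence[i + 1])
    return found

def negativeCountThenWord(wordThenCount):
    """Sort key: highest count first, alphabetical order among ties."""
    word, count = wordThenCount
    return (-count, word)

# Toy sentences, each a list of strings
sampleSentences = [['he', 'said', 'that', 'she', 'was', 'late'],
                   ['then', 'he', 'said', 'no'],
                   ['he', 'went', 'home']]

afterHe = wordsAfter(sampleSentences, 'he')
counts = {}
for word in afterHe:
    counts[word] = counts.get(word, 0) + 1

ranked = sorted(counts.items(), key=negativeCountThenWord)
print(ranked)
# prints [('said', 2), ('went', 1)]
```

A target that is the last word of a sentence has no following word, which is why the inner loop stops one position early.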

Note that we've given you a textStructure.py file for extracting lists of sentences from .txt files, and a buildStructureFiles.py file which will apply that code to each .txt file in the novels/ directory. Before testing your code, you will need to open buildStructureFiles.py and run it at least once to create the text structure files needed for testing.

If you want, you can add new text files to the novels/ directory and modify the testing code in distantReading.py to analyze other books (assuming you can find a .txt file version of them).

You can follow the docstrings in the starter distantReading.py file to implement the functions listed above, and run that file when you're done to test things. You may want to come up with smaller tests for the findWordsAfter and countWords functions to make sure that they work, but these are not provided for you.

If you get things working, we suggest that you see if you can find any interesting patterns in the results, and if you want to take things further, think of your own distant reading method and write some functions to apply it.

An example of what part of the correct output should look like:

"Dracula"
  6363 sentences of narration and 4060 sentences of dialogue
Top 25 words after 'he' and 'she':
 In dialogue                       | In narration                        
  after 'he'     | after 'she'     |  after 'he'       | after 'she'     
  ('is', 75)     | ('is', 28)      |  ('said', 178)    | ('was', 73)     
  ('has', 53)    | ('was', 17)     |  ('was', 125)     | ('is', 34)      
  ('was', 41)    | ('has', 7)      |  ('had', 106)     | ('said', 33)    
  ('can', 37)    | ('must', 7)     |  ('is', 61)       | ('had', 32)     
  ('had', 31)    | ('may', 6)      |  ('went', 56)     | ('did', 20)     
  ('will', 30)   | ('will', 6)     |  ('would', 50)    | ('has', 16)     
  ('have', 24)   | ('would', 5)    |  ('took', 46)     | ('looked', 16)  
  ('may', 18)    | ('died', 4)     |  ('has', 42)      | ('lay', 12)     
  ('must', 16)   | ('had', 4)      |  ('spoke', 38)    | ('spoke', 10)   
  ('would', 15)  | ('be', 3)       |  ('could', 37)    | ('could', 9)    
  ('be', 13)     | ('die', 3)      |  ('answered', 29) | ('stopped', 9)  
  ('come', 13)   | ('shall', 3)    |  ('did', 29)      | ('will', 9)     
  ('went', 12)   | ('asked', 2)    |  ('looked', 28)   | ('woke', 9)     
  ('cannot', 10) | ('can', 2)      |  ('came', 24)     | ('came', 8)     
  ('did', 10)    | ('cannot', 2)   |  ('saw', 21)      | ('must', 8)     
  ('wanted', 9)  | ('did', 2)      |  ('might', 20)    | ('went', 8)     
  ('could', 8)   | ("doesn't", 2)  |  ('held', 17)     | ('would', 8)    
  ('said', 8)    | ('have', 2)     |  ('replied', 17)  | ('put', 7)      
  ('began', 7)   | ('knew', 2)     |  ('stood', 17)    | ('should', 7)   
  ('go', 7)      | ('know', 2)     |  ('turned', 17)   | ('were', 7)     
  ('shall', 7)   | ('lay', 2)      |  ('began', 16)    | ('made', 6)     
  ('took', 7)    | ('live', 2)     |  ('must', 16)     | ('opened', 6)   
  ('came', 6)    | ('might', 2)    |  ('will', 15)     | ('seemed', 6)   
  ('not', 6)     | ('only', 2)     |  ('seemed', 14)   | ('seems', 6)    
  ("didn't", 5)  | ('seems', 2)    |  ('made', 13)     | ('asked', 5)  

If you look carefully, there are some interesting differences between how men and women are portrayed in "Dracula," especially in the column for narration.