Lab 7: Part 3. Practice sorting lists of tuples

In your lab07 folder, open the file called babySorting.py. At the top of this file, there is a predefined variable called office that is a list of tuples.

office = [('Michael','Scott',45),
          ('Dwight','Schrute',50),
          ('Angela','Martin',25),
          ('Jim','Halpert',27),
          ('Pam','Beesly',25),
          ('Phyllis','Lapin',60)]

The following are short exercises that each take a few lines of code: typically a definition for a key function plus a call to sorted. The goal is to get you more comfortable with sorting before moving on to more complex data.
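As a warm-up on different data, here is a minimal sketch of that pattern: a key function plus a call to sorted. The list pets and the helper getAge are hypothetical names chosen for illustration; they are not part of babySorting.py.

```python
# Hypothetical data: (name, species, age) tuples, not from babySorting.py
pets = [('Rex', 'dog', 7),
        ('Whiskers', 'cat', 3),
        ('Bubbles', 'fish', 1)]

def getAge(pet):
    """Key function: return the part of the tuple to sort by (the age)."""
    return pet[2]

# With no key, sorted compares whole tuples, so the first element decides first
byName = sorted(pets)

# With a key, sorted calls getAge on each tuple and orders by the results
byAge = sorted(pets, key=getAge)

print(byName)
print(byAge)
```

Note that sorted returns a new list and leaves pets itself unchanged.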

Task 1a: Basic sorting

Run the file. Then, under Task 1a in babySorting.py, write one line of code to print the office list sorted by first name.

The printed list should have this same tuple order:

[('Angela', 'Martin', 25), ('Dwight', 'Schrute', 50), ('Jim', 'Halpert', 27), 
 ('Michael', 'Scott', 45), ('Pam', 'Beesly', 25), ('Phyllis', 'Lapin', 60)]

Task 1b: Sort by last name

Under Task 1b, write code to sort the office by last name. Hint: you will need to write a helper function that will return the relevant part of the tuple.

The printed list should have this same tuple order:

[('Pam', 'Beesly', 25), ('Jim', 'Halpert', 27), ('Phyllis', 'Lapin', 60), 
 ('Angela', 'Martin', 25), ('Dwight', 'Schrute', 50), ('Michael', 'Scott', 45)]

Task 1c: Reverse sort by last name

Under Task 1c, write one line of code to sort the office by last name in reverse alphabetical order.

The printed list should have this same tuple order:

[('Michael', 'Scott', 45), ('Dwight', 'Schrute', 50), ('Angela', 'Martin', 25), 
 ('Phyllis', 'Lapin', 60), ('Jim', 'Halpert', 27), ('Pam', 'Beesly', 25)]
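
Reversing a sort can be sketched on a small hypothetical list. sorted accepts a reverse=True keyword argument alongside key; the cities list and getState helper below are made up for illustration.

```python
# Hypothetical data: (city, state) tuples
cities = [('Boston', 'MA'), ('Austin', 'TX'), ('Denver', 'CO')]

def getState(cityTuple):
    """Key function: return the state abbreviation."""
    return cityTuple[1]

# reverse=True flips the order, so the largest key comes first
print(sorted(cities, key=getState, reverse=True))
```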

Task 1d: Sort by more than one thing

Under Task 1d, sort the office first by age and then by last name (e.g., Pam and Angela are the same age, but Pam should come before Angela because "Beesly" comes before "Martin"). Hint: you will need a helper function that returns a tuple.

The printed list should have this same tuple order:

[('Pam', 'Beesly', 25), ('Angela', 'Martin', 25), ('Jim', 'Halpert', 27), 
 ('Michael', 'Scott', 45), ('Dwight', 'Schrute', 50), ('Phyllis', 'Lapin', 60)]
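
A key function that returns a tuple works because Python compares tuples element by element: the first elements are compared, and later elements are consulted only to break ties. Here is a minimal sketch, using a hypothetical scores list and byScoreThenName helper:

```python
# Tuples compare left to right; later elements only break ties
print((25, 'Beesly') < (25, 'Martin'))   # True: ages tie, 'Beesly' < 'Martin'
print((25, 'Martin') < (27, 'Halpert'))  # True: 25 < 27 decides immediately

# Hypothetical data: (player, score) tuples
scores = [('Zed', 10), ('Amy', 30), ('Bob', 10)]

def byScoreThenName(entry):
    """Key function: sort by score first, then by player name."""
    return (entry[1], entry[0])

print(sorted(scores, key=byScoreThenName))
```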

Sorting with more complex Yelp data

This part of the lab has a video walkthrough. If you get stuck or want extra context, watch the video. Its full description on YouTube breaks down all of the topics covered, with a link to each one.

Now you're ready to work with more complex (real-world) data. Inside your lab07 folder, there is a file called smallYelpSample.py that contains a variable with the same name. This variable is a list of 39 tuples, and each tuple represents a business. [Note: this is a subset of actual Yelp data, which we will revisit later this semester.]

Understand the structure of the Yelp data

Here are two sample tuples:

# (bizName, city, state, numRatings, stars, list_of_categories)
('Starbucks', 'Toronto', 'ON', 21, 4.0, ['Food', 'Coffee & Tea']),
('Panera Bread', 'Elyria', 'OH', 4, 2.0, ['Soup', 'Salad', 'Sandwiches', 'Restaurants'])

For example, the Starbucks in Toronto, Ontario, has 21 user ratings and an average rating of 4.0 stars. It has two categories: Food and Coffee & Tea. Likewise, the Panera Bread in Elyria, OH, has 4 user ratings, an average rating of 2.0 stars, and four categories.
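
To make the layout concrete, here is a small sketch that pulls fields out of the Starbucks tuple shown above, first by index and then by unpacking. The variable names are illustrative only.

```python
# One business tuple, copied from the sample above
biz = ('Starbucks', 'Toronto', 'ON', 21, 4.0, ['Food', 'Coffee & Tea'])

# Access individual fields by index
print(biz[0])        # business name
print(biz[3])        # number of ratings
print(len(biz[5]))   # how many categories the business has

# Or unpack all six fields at once
name, city, state, numRatings, stars, categories = biz
print(name, 'in', city + ',', state, 'has', numRatings, 'ratings')
```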

Open the provided starter called sorting.py. Add your usual comments at the top of the file, and note this line so you can access the Yelp data:

from smallYelpSample import smallYelpSample

Task 0: Write a function called sortDataByName to sort by business name

Here is a sample test:

sortedByName = sortDataByName(smallYelpSample)
print(sortedByName[:5]) # just printing first 5

produces (scroll right to see all data):

[("Alfredo's Jewelry", 'Las Vegas', 'NV', 23, 4.5, ['Shopping', 'Jewelry', 'Watch Repair', 'Local Services']),
('Alize Catering', 'Toronto', 'ON', 12, 3.0, ['Italian', 'French', 'Restaurants']), 
('Any Given Sundae', 'Wexford', 'PA', 15, 5.0, ['Coffee & Tea', 'Ice Cream & Frozen Yogurt', 'Food']), 
('BDJ Realty', 'Las Vegas', 'NV', 5, 4.0, ['Real Estate Services', 'Real Estate', 'Home Services', 'Property Management']), 
('Bampot House of Tea & Board Games', 'Toronto', 'ON', 55, 4.0, ['Coffee & Tea', 'Restaurants', 'Food', 'Mediterranean', 'Tea Rooms'])]

Task 1: What are the highest (and lowest) rated businesses?

Task 1a: Write howManyStars to get the rating for any business

howManyStars is a fruitful function that, given a single business tuple, will return the star rating from that tuple.

For example,

howManyStars(('Starbucks', 'Toronto', 'ON', 21, 4.0, ['Food', 'Coffee & Tea']))

returns 4.0

and

howManyStars(('Panera Bread', 'Elyria', 'OH', 4, 2.0, ['Soup', 'Salad', 'Sandwiches', 'Restaurants']))

returns 2.0

Task 1b: Write sortDataByStars

This function returns the list of all 39 businesses, sorted by star rating from highest to lowest. sortDataByStars should use howManyStars from Task 1a above. Here is a sample test:

sortedByStars = sortDataByStars(smallYelpSample)
print(sortedByStars[:5]) # Best 5 businesses

Five highest rated businesses (can scroll right to view all data):

# top 5
[('Pampered Hair Passionate about Hair', 'Henderson', 'NV', 3, 5.0, ['Hair Salons', 'Blow Dry/Out Services', 'Hair Stylists', 'Beauty & Spas', 'Hair Extensions']), 
('Kool Pool Care & Repair', 'Phoenix', 'AZ', 5, 5.0, ['Home Services', 'Contractors', 'Pool & Hot Tub Service', 'Pool Cleaners']), 
("Senior's Barber Shop", 'Goodyear', 'AZ', 65, 5.0, ['Barbers', 'Beauty & Spas']), 
('Any Given Sundae', 'Wexford', 'PA', 15, 5.0, ['Coffee & Tea', 'Ice Cream & Frozen Yogurt', 'Food']), 
('Olsen Firearms', 'Cave Creek', 'AZ', 9, 5.0, ['Shopping', 'Guns & Ammo'])]

Here is another sample test:

sortedByStars = sortDataByStars(smallYelpSample)
print(sortedByStars[-5:]) # Worst 5 businesses

Five lowest rated businesses (can scroll right to view all data):

[('Ciao Baby Catering', 'Scottsdale', 'AZ', 5, 2.0, ['Event Planning & Services', 'Caterers']), 
('TSA Checkpoint T-4 A - Phoenix Sky Harbor International Airport', 'Phoenix', 'AZ', 46, 1.5, ['Public Services & Government']), 
('Soccer Zone', 'Las Vegas', 'NV', 9, 1.5, ['Shopping', 'Sporting Goods']), 
('Western Motor Vehicle', 'Phoenix', 'AZ', 18, 1.5, ['Departments of Motor Vehicles', 'Public Services & Government']), 
("McDonald's", 'Phoenix', 'AZ', 10, 1.0, ['Fast Food', 'Burgers', 'Restaurants'])]
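
Several businesses share the same star rating, so it helps to know how sorted treats ties. Python's sort is stable: items with equal keys keep their original relative order, and this holds with reverse=True as well. Here is a minimal sketch on hypothetical data (the snacks list and getPrice helper are made up):

```python
# Hypothetical data: (snack, price) tuples; two snacks tie at 2.0
snacks = [('chips', 2.0), ('candy', 1.0), ('pretzels', 2.0)]

def getPrice(snack):
    """Key function: return the price."""
    return snack[1]

highToLow = sorted(snacks, key=getPrice, reverse=True)
# 'chips' stays ahead of 'pretzels' because it came first in the original list
print(highToLow)
```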

Task 2: Write sortByNumberCategories

sortByNumberCategories sorts the list of tuples by the number of categories each business has. Businesses with more categories should come first in the list. Note that the Brick House Tavern + Tap has 7 categories: ['American (New)', 'Nightlife', 'Bars', 'Sandwiches', 'American (Traditional)', 'Burgers', 'Restaurants'].

Here is a sample test:

print(sortByNumberCategories(smallYelpSample)[:5]) # the top five

produces (scroll right to see all data):

[('Brick House Tavern + Tap', 'Cuyahoga Falls', 'OH', 116, 3.5, ['American (New)', 'Nightlife', 'Bars', 'Sandwiches', 'American (Traditional)', 'Burgers', 'Restaurants']), 
('Stephen Szabo Salon', 'McMurray', 'PA', 11, 3.0, ['Hair Stylists', 'Hair Salons', "Men's Hair Salons", 'Blow Dry/Out Services', 'Hair Extensions', 'Beauty & Spas']),
('Chula Taberna Mexicana', 'Toronto', 'ON', 39, 3.5, ['Tiki Bars', 'Nightlife', 'Mexican', 'Restaurants', 'Bars']), 
('William Jon Salon & Spa', 'Madison', 'WI', 14, 4.5, ['Tanning', 'Day Spas', 'Spray Tanning', 'Beauty & Spas', 'Hair Salons']), 
('Pampered Hair Passionate about Hair', 'Henderson', 'NV', 3, 5.0, ['Hair Salons', 'Blow Dry/Out Services', 'Hair Stylists', 'Beauty & Spas', 'Hair Extensions'])]
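
A key function does not have to return an element of the tuple directly; it can return something computed from the tuple, such as the length of a list inside it. Here is a sketch with a hypothetical students list and numCourses helper:

```python
# Hypothetical data: (student, list_of_courses) tuples
students = [('Ana', ['CS', 'Math']),
            ('Ben', ['Art']),
            ('Cal', ['CS', 'Bio', 'Chem'])]

def numCourses(student):
    """Key function: return how many courses the student is taking."""
    return len(student[1])

# Most courses first
print(sorted(students, key=numCourses, reverse=True))
```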

Task 3: Write sortByStateCityName

sortByStateCityName sorts the list of tuples by state, then by city, and then by business name. Use a helper function. In the example below, the first 10 businesses are all in AZ, since state is compared first. Within AZ, the businesses are sorted alphabetically by city, and the 5 businesses in Phoenix, AZ are sorted by business name (Charr, Kool Pool, McDonald's, TSA Checkpoint, Western Motor Vehicle).

print(sortByStateCityName(smallYelpSample)[:10]) # the top 10

produces this list:

And this is the actual data (horizontally scrollable):

[('Olsen Firearms', 'Cave Creek', 'AZ', 9, 5.0, ['Shopping', 'Guns & Ammo']), 
('CubeSmart Self Storage', 'Chandler', 'AZ', 23, 5.0, ['Local Services', 'Self Storage']),
("Senior's Barber Shop", 'Goodyear', 'AZ', 65, 5.0, ['Barbers', 'Beauty & Spas']), 
('T & Y Nail Spa', 'Peoria', 'AZ', 20, 3.0, ['Beauty & Spas', 'Nail Salons']), 
('Charr An American Burger Bar', 'Phoenix', 'AZ', 232, 3.0, ['Burgers', 'Restaurants']),
('Kool Pool Care & Repair', 'Phoenix', 'AZ', 5, 5.0, ['Home Services', 'Contractors', 'Pool & Hot Tub Service', 'Pool Cleaners']), 
("McDonald's", 'Phoenix', 'AZ', 10, 1.0, ['Fast Food', 'Burgers', 'Restaurants']), 
('TSA Checkpoint T-4 A - Phoenix Sky Harbor International Airport', 'Phoenix', 'AZ', 46, 1.5, ['Public Services & Government']), 
('Western Motor Vehicle', 'Phoenix', 'AZ', 18, 1.5, ['Departments of Motor Vehicles', 'Public Services & Government']), 
('Ciao Baby Catering', 'Scottsdale', 'AZ', 5, 2.0, ['Event Planning & Services', 'Caterers'])]
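
One way to double-check a multi-level sort is to confirm that every adjacent pair of keys is in order. This is a sketch of that check on hypothetical data; isSortedBy, stateThenCity, and places are illustrative names, not part of the starter file.

```python
def isSortedBy(items, keyFunc):
    """Return True if each item's key is <= the key of the item after it."""
    for i in range(len(items) - 1):
        if keyFunc(items[i]) > keyFunc(items[i + 1]):
            return False
    return True

def stateThenCity(place):
    """Key function: (state, city) so state decides first, then city."""
    return (place[1], place[0])

# Hypothetical data: (city, state) tuples
places = [('Phoenix', 'AZ'), ('Chandler', 'AZ'), ('Elyria', 'OH')]

print(isSortedBy(places, stateThenCity))                             # False
print(isSortedBy(sorted(places, key=stateThenCity), stateThenCity))  # True
```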
