Instructions for payroll

(produced at 22:21 UTC on 2023-12-05)

This task is part of project11 which is due at 23:00 EST on 2023-12-12.

You have the option to work with a partner on this task if you wish. Working with a partner requires more work to coordinate schedules, but if you work together and make sure that you are both understanding the code you write, you will make progress faster and learn more.

You can download the starter code for this task using this link.

You can submit this task using this link.

Put all of your work for this task into the file payroll.py
(you will create this file from scratch)

Overview

Your company "Franchisia" has hired you to handle some formatting for payroll reports from its various branch locations in Wolleslu and the surrounding towns. You could simply do the work by hand in a spreadsheet app, but instead, you decide to write some Python functions to handle things automatically. Since the reports are submitted as CSV files, you decide to use the csv module to process the data.

This example shows how the data is formatted; we have supplied a number of directories full of such CSV files for testing purposes.

In a new file named payroll.py, write each of the functions described below.

NOTE: Throughout your code, when using open, you should add newline='' as an argument if you are going to use csv.DictReader or csv.DictWriter to work with the file contents (which should be every time you use open). If you forget to do this, in some cases your output files may appear double-spaced.

columnAverage

Your boss is always asking you to tell them the averages for specific branch locations, like the average total pay or overtime hours worked. Since it's hard to predict what they'll ask for, you decide to make a function that can handle any request for an average.

columnAverage takes a filename as its first parameter and a string naming a column as its second. It reads the specified file and must average the numeric values in the specified column by adding them up and then dividing by the number of numeric values it found. If the specified column doesn't exist in the specified file, or if it exists but doesn't contain any numeric values (i.e., values which can be converted to a floating-point number using the float function), columnAverage must return None.

You must use a csv.DictReader to read the data.

These examples show how columnAverage should work.

Hint: use try/except to handle the possibility that a value might not be a number. Keep your try block as small as possible to avoid skipping over too much code when an exception does occur.

mergeReports

Another common task is merging reports from two branch locations into one report file.

mergeReports takes three filenames as arguments specifying two files to merge and a file to write the results into. It may assume that both files being merged contain the same columns, in the same order, and it just needs to write one file containing all of the rows from both files being merged, except that the redundant header row of the second file is skipped. The rows of the second file should appear in their original order after the rows of the first file.

These examples show how mergeReports should work. You may assume that the first file includes at least one row of data.

updateReport

Sometimes, a branch will send over one payroll file, and then later send an update because their first report was wrong. These updated reports are a pain to deal with because they only have some of the workers: any worker not in the updated report is assumed to have correct data in the original report. You decide to write a function to make updating reports easy.

updateReport takes three filenames as arguments specifying a original file, an update file, and a result file to write into. It must write an updated report based on the original file by discarding rows for employees listed in the update file, and then adding all rows from the update file (in the same order they appear in that file).

All of the rows of the original file which aren't being updated will come first in the result file, followed by all of the rows in the update file. The values in the "Id" column in the two files determine which rows from the base file are being discarded: if the "Id" value from a row in the original file appears in the "Id" column of any row of the update file, that original row should not be written to the result file (because the corresponding row of the update file will include that employee and will be written to the result).

The columns may be in different orders in the different files, but by using csv.DictReader you won't have to worry about that. The output file should have the same columns in the same order as the first input file.

You must use a csv.DictReader to read the data, and you must use a csv.DictWriter to write the data.

These examples show how updateReport should work. You may assume that the original file includes at least one row of data.

salariesReport

At most locations, there's a manger who gets a salary instead of an hourly wage. One thing your boss usually asks for is a report on just these salaried employees, from all locations. You decide to write a function that will filter a report to just have the "Name", "Id", "Pay", and "Hours" columns, and to only include rows where the "Wage" value is 'salaried'.

salariesReport takes a report filename and an output filename as arguments specifying a report file to read and a result file to write to. It must filter the rows of the input report to just those where the "Wage" value is 'salaried', and include only the "Name", "Id", "Pay", and "Hours" columns in the result.

Salaried employees are any employees whose value for the "Wage" column is the string 'salaried'. Values from the following columns (in this order) should be written into the combined report:

Name, Id, Pay, Hours

Note that the "Wage" column is not written to the report, and that the order of columns differs from the default order in most of the provided files. You may not assume the order of these columns in the CSV file you process, although you may assume that each column exists in the input file.

You must use a csv.DictReader to read the data, which will also mean you don't have to worry about column ordering.

These examples show how salariesReport should work, including what the output file should look like. You may assume that the specified file is a valid report with a correct header containing all of the columns you need. If there are multiple salaried employees in one report file, they should appear in the same order as they appear in that file.

Notes

  • As usual you must document each function you write, and you must not waste fruit or waste boxes.
  • You should check the procedure goals for each of the functions, as they include specific limitations you must obey, especially if you want to satisfy the extra goals, which are not mentioned in the short descriptions above.
  • Most of the procedure goals in this task can be satisfied by using the required function or construct in a helper function that the original function calls. You may want to take advantage of this to centralize some common code for reading from and/or writing to CSV files, although this is not required.
  • You may assume that each file you are processing includes a correct header row, and in many cases (specified above) you may also assume it includes at least one row of data. You do not have to worry about dealing with missing files or directories, or mis-formatted files.

Ungraded Challenge

If you're up for a further challenge, there's one company office in each of the obvious_reports, subtle_reports, and reports directories which is doing something bad. Figure out what's going on, and which office is doing it in each directory.

Examples

Format of payroll files

These examples just show the format of a payload file. Note that for easier viewing, you can copy-paste the contents of a CSV file into a spreadsheet (like Google Sheets or Excel) and possibly choose the "split text to columns" option from the paste menu that pops up when you paste (at least in Google Sheets) if things are all in one column.

Note that the column order is the same in almost all of the CSV files, but you cannot assume that it will always be the same.

Although the precise meaning of each column is not important to this task, the columns are:

  • Name: Employee's first name.
  • Id: Employee's unique ID number.
  • Hours: Hours worked this month (four weeks). Capped at 4x40 = 160.
  • Overtime: Any hours worked beyond 160 this month.
  • Wage: Either a number for the employee's hourly wage, or the word "salaried" for salaried employees.
  • Pay: The actual amount paid. Should be hours times wage, plus overtime times 1.5x wage for hourly employees.
  • Benefits: Money spent on things like health insurance for that employee, which isn't direct compensation. Benefits vary by the individual, but those working fewer than 40 hours per week receive fewer benefits as they are not full-time employees, and those working fewer than 19 hours per week receive even fewer benefits, as they are not even half-time employees.
In []:
%more reports/Wolleslu.csv
File
reports/Wolleslu.csv
Name,Id,Hours,Overtime,Wage,Pay,Benefits Walter,379325,140,0,14.7,2057.66,617.3 Jolynn,379327,132,0,11.58,1528.22,458.46 Anastasia,379330,135,0,13.7,1849.1,554.73 John,379333,132,0,11.6,1530.63,459.19 Emberlynn,379334,156,0,15.06,2349.61,704.88 Mahdi,379337,97,0,11.33,1099.13,329.74 Lazarus,379339,141,0,14.32,2018.98,605.69 Yonatan,379341,101,0,11.56,1167.64,350.29 Colt,379344,148,0,14.27,2112.16,633.65 Levi,379345,140,0,11.71,1639.92,491.98 Ryder,379346,142,0,14.58,2070.51,621.15 Charlie,379347,148,0,14.96,2214.11,664.23 Genesis,379315,134,0,salaried,5278.66,2903.26
In []:
%more reports/Hamingfram.csv
File
reports/Hamingfram.csv
Name,Id,Hours,Overtime,Wage,Pay,Benefits Eleanor,379351,133,0,12.17,1618.41,485.52 Abigail,379352,146,0,11.41,1666.24,499.87 Mila,379361,137,0,14.09,1930.0,579.0 Ethan,379362,76,0,13.99,1063.35,53.17 Lily,379363,66,0,11.65,768.71,38.44 Sadie,379364,125,0,12.31,1538.81,461.64 Gabriel,379350,160,57,salaried,5691.47,3130.31
In []:
%more updates/Newt2.csv
File
updates/Newt2.csv
Name,Id,Hours,Overtime,Wage,Pay,Benefits Skye,379243,113,0,11.39,1287.49,383.62 Arlo,379251,137,0,13.91,1919.59,575.92 Jordan Jr.,379267,25,0,25.00,525.00,0.00 Jordan,379240,128,0,salaried,5066.93,2786.81
In []:
%more special/Mixedup.csv
File
special/Mixedup.csv
Benefits,Name,Pay,Id,Hours,Wage,Overtime 404.47,Luna,1348.22,379328,116,11.62,0 351.79,Phoenix,1172.62,379330,88,13.33,0 409.23,Khiara,1364.11,379331,125,10.91,0 322.16,Helene,1073.85,379333,82,13.1,0 355.97,Alexander,1186.57,379334,93,12.76,0 3126.18,Hadley,5683.96,379327,160,salaried,76

columnAverage examples

columnAverage reads a CSV file and averages all numeric values in the named column. Note that it should work even for files whose columns aren't in the same order as the payroll files we've given you to work with, so you'll need to detect which column to read from based on the first row of the file (or use a csv.DictReader). It will return None if either the file doesn't have a column with the specified name, or if there are no numeric values in that column.

The files referenced in this example and others are provided as part of the starter code.

In []:
columnAverage('reports/Wolleslu.csv', 'Hours')
Out[]:
134.30769230769232
In []:
columnAverage('reports/Wolleslu.csv', 'Pay')
Out[]:
2070.4869230769227
In []:
columnAverage('reports/Notuck.csv', 'Hours')
Out[]:
120.5
In []:
columnAverage('reports/Notuck.csv', 'Pay')
Out[]:
1955.8312500000002

mergeReports examples

mergeReports merges two reports into one file, writing just a single copy of the header row at the top. It is allowed to assume that the files being merged have the same columns in the same order.

In []:
mergeReports('reports/Newt.csv', 'reports/Notuck.csv', 'results/Newtuck.csv')
File
results/Newtuck.csv
Name,Id,Hours,Overtime,Wage,Pay,Benefits Skye,379243,109,0,11.39,1241.93,372.58 Jay,379245,76,0,13.85,1052.89,52.64 Naliyah,379247,186,0,10.34,1923.65,769.46 Ibrahim,379248,145,0,14.3,2073.74,622.12 Londyn,379249,207,0,12.53,2592.75,1037.1 Arlo,379251,145,0,13.91,2016.96,605.09 Jordan,379240,103,0,salaried,5066.93,2786.81 Jazmin,379255,53,0,15.42,817.24,40.86 Isabella,379256,188,0,13.17,2475.62,990.25 Karolyne,379257,60,0,14.25,855.14,42.76 Lucas,379260,146,0,11.2,1634.58,490.37 Palmer,379261,152,0,11.64,1769.49,530.85 Annabelle,379262,65,0,13.91,904.0,45.2 Allison,379263,140,0,14.53,2034.77,610.43 Adrian,379252,160,15,salaried,5155.81,2835.7
In []:
mergeReports( 'reports/Hamingfram.csv', 'reports/Hamneed.csv', 'results/Hamham.csv' )
File
results/Hamham.csv
Name,Id,Hours,Overtime,Wage,Pay,Benefits Eleanor,379351,133,0,12.17,1618.41,485.52 Abigail,379352,146,0,11.41,1666.24,499.87 Mila,379361,137,0,14.09,1930.0,579.0 Ethan,379362,76,0,13.99,1063.35,53.17 Lily,379363,66,0,11.65,768.71,38.44 Sadie,379364,125,0,12.31,1538.81,461.64 Gabriel,379350,160,57,salaried,5691.47,3130.31 Rane,379267,83,0,12.7,1054.21,316.26 Samuel,379268,169,0,13.29,2246.17,898.47 Charlotte,379271,101,0,11.04,1114.76,334.43 Kai,379272,182,0,14.12,2569.0,1027.6 Ahtziri,379276,140,0,10.49,1468.52,440.56 Allister,379266,160,74,salaried,5391.25,2965.19

updateReport examples

updateReport takes an original report and a second report containing some updates and/or additions. It merges these two reports into an updated report, where employee rows from the original report which have an entry with the same ID among the updates are omitted, and those updated rows, along with any rows for employees with new IDs (i.e., all of the rows of updates), are added to the end of the original report.

This function is allowed to assume that the files being combined have the same columns in the same order.

In []:
updateReport('reports/Newt.csv', 'updates/Newt2.csv', 'results/NewtUpdated.csv')
File
results/NewtUpdated.csv
Name,Id,Hours,Overtime,Wage,Pay,Benefits Jay,379245,76,0,13.85,1052.89,52.64 Naliyah,379247,186,0,10.34,1923.65,769.46 Ibrahim,379248,145,0,14.3,2073.74,622.12 Londyn,379249,207,0,12.53,2592.75,1037.1 Skye,379243,113,0,11.39,1287.49,383.62 Arlo,379251,137,0,13.91,1919.59,575.92 Jordan Jr.,379267,25,0,25.00,525.00,0.00 Jordan,379240,128,0,salaried,5066.93,2786.81
In []:
updateReport( 'reports/Thamwal.csv', 'updates/Thamwal2.csv', 'results/ThamwalUpdated.csv' )
File
results/ThamwalUpdated.csv
Name,Id,Hours,Overtime,Wage,Pay,Benefits Penelope,379278,138,0,13.14,1812.86,543.86 Dasia,379279,71,0,12.14,861.76,43.09 Leo,379280,61,0,11.12,678.02,33.9 Nicholas,379281,139,0,11.5,1598.71,479.61 Remi,379292,54,0,10.48,566.17,28.31 Lynnea,379295,147,0,12.67,1862.27,558.68 Xavier,379297,150,0,13.82,2072.94,621.88 Tucker,379300,148,0,12.3,1819.77,545.93 Zamir,379304,59,0,11.93,703.63,35.18 Isaiah,379305,139,0,12.75,1772.46,531.74 Mason,379277,160,38,salaried,5676.21,3121.92 Luis,379291,0,0,14.53,0,0 Luisa,379291,57,0,14.53,828.04,41.4

salariesReport examples

salariesReport takes an input report filename and an output filename and filters the input report, looking for rows where the value of the "Wage" column is 'salaried', indicating a salaried employee who does not have an hourly wage. It gathers all of these rows from every file it inspects, and writes a combined report, which has the following columns:

Name, Id, Pay, Hours

Note that "Wage", "Benefits", and "Overtime" are not included. This function may assume that the input file will include at least these columns, but it may not assume that they'll be in the same order from file to file. It must write them in the order shown above when it creates its output file.

In []:
salariesReport('reports/Wolleslu.csv', 'results/salaries_Wolleslu.csv')
File
results/salaries_Wolleslu.csv
Name,Id,Pay,Hours Genesis,379315,5278.66,134
In []:
salariesReport('reports/Whyland.csv', 'results/salaries_Whyland.csv')
File
results/salaries_Whyland.csv
Name,Id,Pay,Hours Ayden,379366,5668.94,109 Fenix,379482,6000.00,85

Rubric

Group goals:
 
unknown All functions are documented
Each function you define must include a non-empty documentation string as the very first thing in the function.
 
unknown Do not ignore the results of any fruitful function calls
According to the "Don't waste fruit" principle, every place you call a fruitful function (built-in or custom) you must store the result in a variable, or that function call must be part of a larger expression that uses its return value.
 
unknown Do not create any variables that you never make use of
According to the "Don't waste boxes" principle, every time you create a variable (using = or by defining a parameter for a function) you must also later use that variable as part of another expression. If you need to create a variable that you won't use, it must have the name _, but you should only do this if absolutely necessary.
 
unknown columnAverage must return the correct result
The result returned when your columnAverage function is run must match the solution result.
 
unknown columnAverage must return the correct result
The result returned when your columnAverage function is run must match the solution result.
 
unknown mergeReports must write the correct data into the appropriate file
The data written to the appropriate file when your mergeReports function is run must match what the solution writes.
 
unknown mergeReports must write the correct data into the appropriate file
The data written to the appropriate file when your mergeReports function is run must match what the solution writes.
 
unknown updateReport must write the correct data into the appropriate file
The data written to the appropriate file when your updateReport function is run must match what the solution writes.
 
unknown updateReport must write the correct data into the appropriate file
The data written to the appropriate file when your updateReport function is run must match what the solution writes.
 
unknown salariesReport must write the correct data into the appropriate file
The data written to the appropriate file when your salariesReport function is run must match what the solution writes.
 
unknown salariesReport must write the correct data into the appropriate file
The data written to the appropriate file when your salariesReport function is run must match what the solution writes.
 
unknown Define columnAverage with 2 parameters
Use def to define columnAverage with 2 parameters
 
unknown Use any kind of loop
Within the definition of columnAverage with 2 parameters, use any kind of loop in at most 2 places.
 
unknown Use a with statement
Within the definition of columnAverage with 2 parameters, use a with statement in at least one place.
 
unknown Define columnAverage with 2 parameters
Use def to define columnAverage with 2 parameters
 
unknown Call open
Within the definition of columnAverage with 2 parameters, call open in at least one place.
 
unknown Use a CSV DictReader
Within the definition of columnAverage with 2 parameters, use csv.DictReader.
 
unknown Use any kind of loop
Within the definition of columnAverage with 2 parameters, use any kind of loop in at least one place.
 
unknown Use a return statement
Within the definition of columnAverage with 2 parameters, use return _ in at least one place.
 
unknown Define mergeReports with 3 parameters
Use def to define mergeReports with 3 parameters
 
unknown Call open
Within the definition of mergeReports with 3 parameters, call open in at least one and at most 3 places.
 
unknown Use a with statement
Within the definition of mergeReports with 3 parameters, use a with statement in at least one and at most 3 places.
 
unknown Define mergeReports with 3 parameters
Use def to define mergeReports with 3 parameters
 
unknown Call open
Within the definition of mergeReports with 3 parameters, call open in at least one place.
 
unknown Define updateReport with 3 parameters
Use def to define updateReport with 3 parameters
 
unknown Call open
Within the definition of updateReport with 3 parameters, call open in at least one and at most 3 places.
 
unknown Use a with statement
Within the definition of updateReport with 3 parameters, use a with statement in at least one and at most 3 places.
 
unknown Define updateReport with 3 parameters
Use def to define updateReport with 3 parameters
 
unknown Call open
Within the definition of updateReport with 3 parameters, call open in at least one place.
 
unknown Use any kind of loop
Within the definition of updateReport with 3 parameters, use any kind of loop in at least one place.
 
unknown Use a CSV DictReader
Within the definition of updateReport with 3 parameters, use csv.DictReader.
 
unknown Use a CSV DictWriter
Within the definition of updateReport with 3 parameters, use csv.DictWriter.
 
unknown Define salariesReport with 2 parameters
Use def to define salariesReport with 2 parameters
 
unknown Call open
Within the definition of salariesReport with 2 parameters, call open in at least one and at most 2 places.
 
unknown Use a with statement
Within the definition of salariesReport with 2 parameters, use a with statement in at least one and at most 2 places.
 
unknown Define salariesReport with 2 parameters
Use def to define salariesReport with 2 parameters
 
unknown Call open
Within the definition of salariesReport with 2 parameters, call open in at least one place.
 
unknown Use a CSV DictReader
Within the definition of salariesReport with 2 parameters, use csv.DictReader.
 
unknown Use a CSV DictWriter
Within the definition of salariesReport with 2 parameters, use csv.DictWriter.