Instructions for recursiveGenetics

(produced at 10:49 a.m. UTC on 2024-04-15)

This task is part of project10, which is due at 23:00 EDT on 2024-04-23.

You have the option to work with a partner on this task if you wish. Working with a partner takes extra effort to coordinate schedules, but if you work together and make sure that you both understand the code you write, you will make progress faster and learn more.

You can download the starter code for this task using this link.

You can submit this task using this link.

Put all of your work for this task into the file recursiveGenetics.py (which is provided among the starter files).

This task will give you practice with fruitful recursion. For most of the functions, it asks you to do the same thing you did in the earlier genetics task, but this time using recursion rather than loops.
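
As a reminder of the pattern, here is a minimal sketch of fruitful recursion on a string. It uses a made-up function that is not part of this task: sumDigits (a hypothetical name) adds up the digits in a string such as '2024'. The loop version, sumDigitsLoop (also hypothetical), is included only for comparison.

```python
def sumDigitsLoop(digits):
    """Returns the sum of the digits in the given string, using a loop."""
    total = 0
    for digit in digits:
        total += int(digit)
    return total


def sumDigits(digits):
    """Returns the sum of the digits in the given string, using recursion."""
    if digits == '':
        # Base case: an empty string contains no digits.
        return 0
    else:
        # Recursive case: handle the first digit here, and let the
        # recursive call handle the rest of the string.
        return int(digits[0]) + sumDigits(digits[1:])


print(sumDigitsLoop('2024'))  # 8
print(sumDigits('2024'))  # 8
```

Both versions return the same result; the recursive one replaces the loop with a base case for the empty string and a single recursive call on the rest of the string.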

Recall that in the genetics task, we were dealing with sequences of letters representing DNA and RNA bases, using the letters 'A', 'U', 'G', and 'C', along with 'T' instead of 'U' for DNA. You will be using these strings again in this assignment, and again, you don't need to know any of the details of the biology to get things to work.

There are four functions you must write, each equivalent to a function from the earlier genetics task. Every function in this assignment must use recursion, and you are not allowed to use any loops in these functions. The descriptions from the previous task are repeated below; apart from the requirement to use recursion, the required behavior is identical to the previous task.

  1. countOtherBases - This function takes two arguments, (1) a sequence of bases and (2) a particular base to exclude, and returns the number of bases (i.e., letters) in the sequence that are different from the excluded base. These examples for countOtherBases show how it works.
  2. templateSequence - Given a sequence of RNA bases, returns the sequence of DNA bases that must have been used to create that RNA sequence (i.e., the template for that piece of RNA). You must use the provided templateBase function to compute the DNA template base that corresponds to a single RNA base within this function. These examples for templateSequence demonstrate how it works.
  3. onlyAU - Given an RNA sequence, returns a new sequence consisting of only the 'A' and 'U' bases from the original sequence with the 'C' and 'G' bases removed. These examples for onlyAU demonstrate how it must work.
  4. transcriptionErrors - Given two sequences, one of RNA and one of DNA, looks for transcription errors where the base in the RNA sequence doesn't match the base in the DNA sequence. Each RNA base is supposed to be transcribed from a particular DNA base (for example, the DNA base 'A' becomes the RNA base 'U'), but sometimes errors occur in this process. Given an RNA sequence and the DNA it came from, any place where the corresponding bases aren't paired (according to the templateBase function) is a transcription error. This function must return the number of transcription errors evident in the given RNA and DNA pair. These examples for transcriptionErrors demonstrate how it must work. For this function, you may NOT assume that the two strings provided will always be the same length. If one is longer, then in addition to counting errors in the portion where they overlap, each extra base in the longer string counts as an error, although we will only test this as an extra goal.

Once you have written these functions and have tested them to make sure they work correctly, you are done with this task. Refer to the examples below in understanding how these functions should work, and feel free to refer back to our solutions for the earlier genetics task which are available via the Potluck server.

Notes

  • The genetics.py file contains a correct definition of the templateBase function that you must use in your definitions of templateSequence and transcriptionErrors.
  • For all the functions, you may assume that all of the arguments are valid, and do not need to handle any error cases. This means that:
    • You may assume that all DNA/RNA sequence arguments are strings with zero or more correct bases.
    • You may assume that any base argument is a single-character string that is a valid base.
  • There are various requirements on the structure of each function which are expressed in the rubric, but are not described explicitly above. You should check the rubric before defining each function to see which other functions you may be required to use. In many cases there are also extra goals which require writing functions in a particular way, for example by making the recursive call in exactly one place.
  • As usual, you must document every function you write with a triple-quoted docstring after the function header.
  • Also as usual, you must not waste fruit and you must not waste boxes.
  • We have provided a testing file test_recursiveGenetics.py that you can use to test your code. You are encouraged to define extra tests beyond the few we've provided, since the provided tests are not sufficient to catch many kinds of errors you might encounter.
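
One possible way to organize extra tests is sketched below. It does not depend on how the provided testing file works, and the names failedCases, ONLY_AU_CASES, and unfinishedOnlyAU are hypothetical. The cases are copied from the onlyAU examples in these instructions, and the function being tested is a deliberately wrong placeholder, so that the sketch shows what a failure report looks like. Note that the loop here is in testing code, not in one of the four required functions.

```python
def failedCases(function, cases):
    """
    Calls the function on each set of arguments in cases (a list of
    (arguments, expected) pairs) and returns a list of
    (arguments, expected, actual) tuples for the cases that failed.
    """
    failures = []
    for arguments, expected in cases:
        actual = function(*arguments)
        if actual != expected:
            failures.append((arguments, expected, actual))
    return failures


# Cases copied from the onlyAU examples in these instructions.
ONLY_AU_CASES = [
    (('GUACGU',), 'UAU'),
    (('AAU',), 'AAU'),
    (('GCC',), ''),
]


def unfinishedOnlyAU(sequence):
    """Deliberately wrong placeholder: returns the sequence unchanged."""
    return sequence


for failure in failedCases(unfinishedOnlyAU, ONLY_AU_CASES):
    print('FAILED:', failure)
```

With this placeholder, two of the three cases are reported as failures (the 'AAU' case happens to pass). To test your own work you would pass your own function in place of unfinishedOnlyAU and add cases that the examples don't cover, such as the empty string.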

Examples

countOtherBases Examples

Examples of how countOtherBases works. You may assume that the second argument will always be a single letter.

In []:
countOtherBases('GAUUACA', 'A')
Out[]:
4
In []:
countOtherBases('GAUUACA', 'G')
Out[]:
6

templateSequence Examples

Examples of how templateSequence works. You are required to use the provided templateBase function to compute the template version of each RNA base.

In []:
templateSequence('GACU')
Out[]:
'CTGA'
In []:
templateSequence('AAA')
Out[]:
'TTT'
In []:
templateSequence('UUU')
Out[]:
'AAA'

onlyAU Examples

Examples of how onlyAU works.

In []:
onlyAU('GUACGU')
Out[]:
'UAU'
In []:
onlyAU('AAU')
Out[]:
'AAU'
In []:
onlyAU('GCC')
Out[]:
''

transcriptionErrors Examples

Examples of how transcriptionErrors works. Note that it looks at the first RNA base and the first DNA base together, determines if they are a correct match or not (according to the templateBase function), and then moves on and repeats this process for each pair of bases, counting the number of mismatched pairs. In the first example, for instance, because the templateBase for 'A' is 'T', the first 'A' of the RNA sequence and the first 'T' of the DNA sequence are matching (i.e., not an error). The second bases also match (they're 'A' and 'T' again), but in the third position, the 'A' in the RNA does not match the 'C' in the DNA (it should have been a third 'T', or the RNA base should have been a 'G'). Thus the number of errors is 1 for those two sequences. The second example shows two sequences that are perfectly matched, while the third example contains two errors: the initial 'C' matched with a 'C', and the second 'A' in the DNA sequence matched with an 'A' in the RNA sequence. The fourth example shows sequences of different lengths: within the four overlapping positions, only the 'U' paired with a 'T' is an error, and the two extra bases at the end of the DNA sequence each count as an error, for a total of 3.

In []:
transcriptionErrors('AAA', 'TTC')
Out[]:
1
In []:
transcriptionErrors('AAG', 'TTC')
Out[]:
0
In []:
transcriptionErrors('CAGUAGG', 'CTCAACC')
Out[]:
2
In []:
transcriptionErrors('GAUA', 'CTTTCG')
Out[]:
3
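
To make the pairing rule concrete, the sketch below checks individual pairs from the first and fourth examples by hand. It is not a solution (it uses no recursion), only an illustration of what counts as an error. standInTemplateBase and isError are hypothetical names, and the mapping in the stand-in is inferred from the examples in these instructions; your own code must call the provided templateBase function instead.

```python
# Hypothetical stand-in: this mapping (RNA base -> DNA template base)
# is inferred from the examples in these instructions. Use the
# provided templateBase function in your actual solution.
STAND_IN_PAIRS = {'A': 'T', 'U': 'A', 'G': 'C', 'C': 'G'}


def standInTemplateBase(rnaBase):
    """Returns the DNA base that the given RNA base is transcribed from."""
    return STAND_IN_PAIRS[rnaBase]


def isError(rnaBase, dnaBase):
    """Returns True if the RNA base was not transcribed from the DNA base."""
    return standInTemplateBase(rnaBase) != dnaBase


# First example: 'AAA' vs. 'TTC'. Only the third pair is an error.
print(isError('A', 'T'))  # False
print(isError('A', 'C'))  # True

# Fourth example: 'GAUA' vs. 'CTTTCG'. The overlapping pairs are
# G/C, A/T, U/T, and A/T; only U/T is an error. When added together,
# each True counts as 1 and each False counts as 0.
overlapErrors = (
    isError('G', 'C') + isError('A', 'T')
    + isError('U', 'T') + isError('A', 'T')
)
# The DNA string is 2 bases longer, and each extra base is an error.
extraBases = len('CTTTCG') - len('GAUA')
print(overlapErrors + extraBases)  # 3
```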

Rubric

Group goals:
 
Do not ignore the results of any fruitful function calls
According to the "Don't waste fruit" principle, every place you call a fruitful function (built-in or custom) you must store the result in a variable, or that function call must be part of a larger expression that uses its return value.

Do not create any variables that you never make use of
According to the "Don't waste boxes" principle, every time you create a variable (using = or by defining a parameter for a function) you must also later use that variable as part of another expression. If you need to create a variable that you won't use, it must have the name _, but you should only do this if absolutely necessary.

All functions are documented
Each function you define must include a non-empty documentation string as the very first thing in the function.
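
As an illustration of these three goals, here is a small sketch using two made-up functions, shout and shoutTwice (hypothetical names that are not part of this task). Each has a docstring as the first thing in its body, every fruitful call's result is used, and every variable that is created is used later.

```python
def shout(message):
    """Returns the message in upper case with '!' added at the end."""
    # Wasted fruit would be writing message.upper() on a line by
    # itself: the result would be computed and then thrown away.
    # Here the result of .upper() is part of a larger expression.
    return message.upper() + '!'


def shoutTwice(message):
    """Returns the shouted version of the message, repeated twice."""
    # The variable loud is used in the return statement below, so
    # this box is not wasted.
    loud = shout(message)
    return loud + ' ' + loud


print(shoutTwice('recursion'))  # RECURSION! RECURSION!
```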
 
countOtherBases must return the correct result
The result returned when your countOtherBases function is run must match the solution result.

countOtherBases must return the correct result
The result returned when your countOtherBases function is run must match the solution result.

templateSequence must return the correct result
The result returned when your templateSequence function is run must match the solution result.

templateSequence must return the correct result
The result returned when your templateSequence function is run must match the solution result.

onlyAU must return the correct result
The result returned when your onlyAU function is run must match the solution result.

onlyAU must return the correct result
The result returned when your onlyAU function is run must match the solution result.

transcriptionErrors must return the correct result
The result returned when your transcriptionErrors function is run must match the solution result.

transcriptionErrors must return the correct result
The result returned when your transcriptionErrors function is run must match the solution result.
 
Define countOtherBases with 2 parameters
Use def to define countOtherBases with 2 parameters

Call countOtherBases
Within the definition of countOtherBases with 2 parameters, call countOtherBases in exactly one place.

Define countOtherBases with 2 parameters
Use def to define countOtherBases with 2 parameters

Do not use any kind of loop
Within the definition of countOtherBases with 2 parameters, do not use any kind of loop.

Use a return statement
Within the definition of countOtherBases with 2 parameters, use return _ in at least one place.

Call countOtherBases
Within the definition of countOtherBases with 2 parameters, call countOtherBases in at least one place.

Don't use .count
Strings have a built-in .count method, but you may not use it within the definition of countOtherBases with 2 parameters.
 
Define templateSequence with 1 parameter
Use def to define templateSequence with 1 parameter

Call templateSequence
Within the definition of templateSequence with 1 parameter, call templateSequence in exactly one place.

Define templateSequence with 1 parameter
Use def to define templateSequence with 1 parameter

Do not use any kind of loop
Within the definition of templateSequence with 1 parameter, do not use any kind of loop.

Use a return statement
Within the definition of templateSequence with 1 parameter, use return _ in at least one place.

Call templateSequence
Within the definition of templateSequence with 1 parameter, call templateSequence in at least one place.

Use a return statement
Within the definition of templateSequence with 1 parameter, use return _ in at least one place.

Call templateBase
Within the definition of templateSequence with 1 parameter, call templateBase in at least one place.
 
Define onlyAU with 1 parameter
Use def to define onlyAU with 1 parameter

Call onlyAU
Within the definition of onlyAU with 1 parameter, call onlyAU in exactly one place.

Define onlyAU with 1 parameter
Use def to define onlyAU with 1 parameter

Do not use any kind of loop
Within the definition of onlyAU with 1 parameter, do not use any kind of loop.

Use a return statement
Within the definition of onlyAU with 1 parameter, use return _ in at least one place.

Call onlyAU
Within the definition of onlyAU with 1 parameter, call onlyAU in at least one place.
 
Define transcriptionErrors with 2 parameters
Use def to define transcriptionErrors with 2 parameters

Call transcriptionErrors
Within the definition of transcriptionErrors with 2 parameters, call transcriptionErrors in exactly one place.

Define transcriptionErrors with 2 parameters
Use def to define transcriptionErrors with 2 parameters

Do not use any kind of loop
Within the definition of transcriptionErrors with 2 parameters, do not use any kind of loop.

Use a return statement
Within the definition of transcriptionErrors with 2 parameters, use return _ in at least one place.

Call transcriptionErrors
Within the definition of transcriptionErrors with 2 parameters, call transcriptionErrors in at least one place.