Instructions for genetics

(produced at 17:39 UTC on 2024-09-30)

This task is part of project05 which is due at 23:00 EDT on 2024-10-08.

You have the option to work with a partner on this task if you wish. Working with a partner requires more work to coordinate schedules, but if you work together and make sure that you are both understanding the code you write, you will make progress faster and learn more.

You can download the starter code for this task using this link.

You can submit this task using this link.

Put all of your work for this task into the file genetics.py (which is provided among the starter files).

This task will give you practice using loops to deal with strings letter-by-letter. It is biology-themed, but no understanding of biology should be required to complete it.

RNA and DNA are molecules that make up our genetic code. Their structure can be represented as a sequence of bases, each written as one of the letters 'A', 'T', 'C', 'G', and 'U' ('T' appears only in DNA and 'U' only in RNA). Each base is one letter. In our bodies, DNA gets transcribed into RNA by replacing each 'A' with a 'U', each 'T' with an 'A', each 'C' with a 'G', and each 'G' with a 'C'. In this task we will be dealing with strings containing these letters, and writing functions that process these strings using loops that consider each letter individually. There are 5 different functions you must write:

  1. countOtherBases - This function takes two arguments: (1) a sequence of bases and (2) a particular base to exclude. It returns the number of bases (i.e., letters) in the sequence that are different from the excluded base. The countOtherBases examples below show how it works.
  2. templateSequence - Given a sequence of RNA bases, returns the sequence of DNA bases that must have been used to create that RNA sequence (i.e., the template for that piece of RNA). You must use the provided templateBase function to compute the DNA template base that corresponds to a single RNA base within this function. These examples for templateSequence demonstrate how it works.
  3. onlyAU - Given an RNA sequence, returns a new sequence consisting of only the 'A' and 'U' bases from the original sequence with the 'C' and 'G' bases removed. These examples for onlyAU demonstrate how it must work.
  4. transcriptionErrors - Given two sequences, an RNA sequence (the first argument) and the DNA sequence it came from (the second argument), counts transcription errors: places where the base in the RNA sequence doesn't match the base in the DNA sequence. Each RNA base is supposed to be transcribed from a particular DNA base (for example, the DNA base 'A' becomes the RNA base 'U'), but sometimes errors occur in this process. Any position where the corresponding bases aren't paired (according to the templateBase function) is a transcription error. This function must return the number of transcription errors evident in the given RNA and DNA pair. The transcriptionErrors examples below demonstrate how this should work. You may NOT assume that the two strings will always be the same length: if one is longer, then in addition to the errors in the portion where they overlap, each extra base in the longer string counts as one error. (Strings of unequal length are only tested as part of an extra goal.)
  5. hasCGBlock - Given a sequence of RNA, this function must return True if it contains a block of 5 consecutive bases that are each 'C' or 'G', and False otherwise. The block may be any combination of 'C' and 'G' bases, as long as there are 5 in a row with no other bases between them. If other bases are present, the sequence might contain 5 or more 'C' and 'G' bases in total without actually containing a CG block. The hasCGBlock examples below demonstrate how it must work.
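Reading the DNA-to-RNA pairing described above in reverse gives, for each RNA base, the DNA base it must have been transcribed from. The starter file already defines templateBase, and that provided definition is the one you must use. The following is only a hypothetical reconstruction, written from the pairing rules and the examples in this document, to show what it computes; the actual starter code may be written differently.

```python
def templateBase(rnaBase):
    """
    Hypothetical reconstruction (use the version in genetics.py instead):
    returns the DNA base that the given RNA base was transcribed from.
    """
    if rnaBase == 'A':
        return 'T'
    elif rnaBase == 'U':
        return 'A'
    elif rnaBase == 'C':
        return 'G'
    else:  # rnaBase must be 'G', since arguments are assumed to be valid
        return 'C'


print(templateBase('A'))  # T
print(templateBase('G'))  # C
```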

Once you have written these five functions and have tested them to make sure they work correctly, you are done with this task. Refer to the examples below in understanding how these functions should work.

Notes

  • The genetics.py file contains a correct definition of the templateBase function, which you must use in your definitions of templateSequence and transcriptionErrors.
  • For all the functions, you may assume that all of the arguments are valid, and do not need to handle any error cases. This means that:
    • You may assume that all DNA/RNA sequence arguments are strings with zero or more correct bases which are entirely in upper case.
    • You may assume that any base argument is a single-character string that is a valid base.
  • There are various requirements on the structure of each function which are expressed in the rubric, but are not described explicitly above. You should check the rubric before defining each function to see which other functions you may be required to use.
  • As usual, you must document every function you write with a triple-quoted docstring after the function header.
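  • Also as usual, you must not waste fruit and you must not waste boxes (that is, you must not ignore the result of a fruitful function call, and you must not create variables that you never use; see the rubric below).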
  • We have provided some tests using optimism for this task in the test_genetics.py file. This time around, we have not provided enough tests to really check most of your functions. You are encouraged to edit test_genetics.py and add more test cases (just follow the format of the cases that we've provided) if you want to be sure that your functions work correctly.
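As a small illustration of these conventions, here is a hypothetical function, shout, that has nothing to do with genetics. It has a docstring, keeps the result of a fruitful call instead of discarding it, and names an unused loop variable _.

```python
def shout(message):
    """
    Returns an upper-case copy of message with '!' added to the end.
    """
    # Wasting fruit would look like calling message.upper() on a line by
    # itself: the upper-case string would be computed and then thrown away.
    louder = message.upper()  # store the fruitful result in a variable...
    return louder + '!'       # ...and use that variable in a later expression


# The loop variable is never used, so (as the rubric allows) it is named _.
for _ in range(2):
    print(shout('hello'))  # prints HELLO! each time
```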

Examples

countOtherBases Examples

Examples of how countOtherBases works. You may assume that the second argument will always be a single letter.

In []:
countOtherBases('GAUUACA', 'A')
Out[]:
4
In []:
countOtherBases('GAUUACA', 'G')
Out[]:
6
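These examples follow a common counting pattern: start an accumulator at 0 and add 1 inside the loop whenever a letter passes a test. The sketch below shows that pattern on a different, hypothetical problem (counting lower-case vowels) rather than on countOtherBases itself; the name countVowels is made up for illustration, and like countOtherBases it does not use .count.

```python
def countVowels(word):
    """
    Returns how many letters in word are lower-case vowels.
    """
    count = 0  # accumulator: starts at zero
    for letter in word:
        if letter in 'aeiou':
            count += 1  # one more vowel seen
    return count


print(countVowels('banana'))  # 3
print(countVowels('rhythm'))  # 0
```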

templateSequence Examples

Examples of how templateSequence works. You are required to use the provided templateBase function to compute the template version of each RNA base.

In []:
templateSequence('GACU')
Out[]:
'CTGA'
In []:
templateSequence('AAA')
Out[]:
'TTT'
In []:
templateSequence('UUU')
Out[]:
'AAA'
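Building a new string one letter at a time works much like counting: start from the empty string and add to it inside the loop. The sketch below applies a helper function to each letter, the way templateSequence is required to call templateBase, but it uses hypothetical names (flipBit, flipAllBits) and flips the characters of a string of '0's and '1's instead of working with bases.

```python
def flipBit(bit):
    """
    Returns '1' for '0' and '0' for '1'.
    """
    if bit == '0':
        return '1'
    else:
        return '0'


def flipAllBits(bits):
    """
    Returns a new string in which every bit of bits has been flipped.
    """
    result = ''  # accumulator: starts as the empty string
    for bit in bits:
        result += flipBit(bit)  # the helper's result is used, not wasted
    return result


print(flipAllBits('0110'))  # 1001
print(flipAllBits(''))      # prints an empty line
```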

onlyAU Examples

Examples of how onlyAU works.

In []:
onlyAU('GUACGU')
Out[]:
'UAU'
In []:
onlyAU('AAU')
Out[]:
'AAU'
In []:
onlyAU('GCC')
Out[]:
''
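Keeping only some letters is the same string-building pattern, with an if statement deciding which letters get added. This hypothetical onlyDigits function illustrates it on ordinary text rather than RNA.

```python
def onlyDigits(text):
    """
    Returns a new string containing only the digit characters of text,
    in their original order.
    """
    result = ''
    for char in text:
        if char in '0123456789':  # keep digits, skip everything else
            result += char
    return result


print(onlyDigits('a1b2c3'))  # 123
print(onlyDigits('abc'))     # prints an empty line
```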

transcriptionErrors Examples

Examples of how transcriptionErrors works. It looks at the first RNA base and the first DNA base together, determines whether they are a correct match (according to the templateBase function), and then repeats this process for each subsequent pair of bases, counting the number of mismatched pairs. In the first example, because the templateBase for 'A' is 'T', the first 'A' of the RNA sequence and the first 'T' of the DNA sequence match (i.e., they are not an error). The second bases also match (they're 'A' and 'T' again), but in the third position, the 'A' in the RNA does not match the 'C' in the DNA (the DNA base should have been a third 'T', or the RNA base should have been a 'G'). Thus the number of errors is 1 for those two sequences. The second example shows two sequences that are perfectly matched, while the third example contains two errors: the initial RNA 'C' paired with a DNA 'C', and the second 'A' in the DNA sequence paired with an 'A' in the RNA sequence. In the fourth example, the overlapping portion contains one error (the RNA 'U' paired with a DNA 'T'), and the DNA sequence has two extra bases ('CG'), each of which counts as an error, for a total of 3.

In []:
transcriptionErrors('AAA', 'TTC')
Out[]:
1
In []:
transcriptionErrors('AAG', 'TTC')
Out[]:
0
In []:
transcriptionErrors('CAGUAGG', 'CTCAACC')
Out[]:
2
In []:
transcriptionErrors('GAUA', 'CTTTCG')
Out[]:
3
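Because transcriptionErrors pairs up letters by position, a loop over indices (rather than directly over the letters of one string) is a natural fit. The hypothetical sketch below counts positions where two strings hold the same letter, using len and min so that it never indexes past the end of the shorter string. Unlike transcriptionErrors, it compares letters with == rather than with templateBase, and it ignores any extra letters in the longer string.

```python
def countSamePositions(first, second):
    """
    Returns how many positions hold the same letter in both strings,
    looking only at the positions that exist in both.
    """
    overlap = min(len(first), len(second))  # length of the shorter string
    same = 0
    for i in range(overlap):
        if first[i] == second[i]:
            same += 1
    return same


print(countSamePositions('abcd', 'abxd'))   # 3
print(countSamePositions('abc', 'abcdef'))  # 3
```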

hasCGBlock Examples

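Examples of how hasCGBlock works. Note that in the second example, there are 5 'C's and 'G's in total, but they don't form a continuous block (the 'A' interrupts them). In the third example, the 'A' again interrupts the first two bases, but the five bases 'CCGCG' that follow it form a block.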

In []:
hasCGBlock('CGGCC')
Out[]:
True
In []:
hasCGBlock('CGACCG')
Out[]:
False
In []:
hasCGBlock('CGACCGCGU')
Out[]:
True
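Detecting a block of consecutive bases means tracking the length of the current run and resetting it whenever a letter breaks the run. The hypothetical longestRun function below shows that reset pattern for a single target letter and reports the longest run it finds. It is an illustration of the pattern, not an implementation of hasCGBlock, which must accept either of two bases and answer True or False.

```python
def longestRun(text, target):
    """
    Returns the length of the longest block of consecutive copies of
    target in text (0 if target never appears).
    """
    current = 0  # length of the run ending at the current character
    longest = 0  # longest run seen so far
    for char in text:
        if char == target:
            current += 1
            if current > longest:
                longest = current
        else:
            current = 0  # any other character breaks the run
    return longest


print(longestRun('aabaaab', 'a'))  # 3
print(longestRun('bbb', 'a'))      # 0
```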

Rubric

Group goals:

Do not ignore the results of any fruitful function calls
According to the "Don't waste fruit" principle, every place you call a fruitful function (built-in or custom) you must store the result in a variable, or that function call must be part of a larger expression that uses its return value.

Do not create any variables that you never make use of
According to the "Don't waste boxes" principle, every time you create a variable (using = or by defining a parameter for a function) you must also later use that variable as part of another expression. If you need to create a variable that you won't use, it must have the name _, but you should only do this if absolutely necessary.

All functions are documented
Each function you define must include a non-empty documentation string as the very first thing in the function.

countOtherBases must return the correct result
The result returned when your countOtherBases function is run must match the solution result.

countOtherBases must return the correct result
The result returned when your countOtherBases function is run must match the solution result.

templateSequence must return the correct result
The result returned when your templateSequence function is run must match the solution result.

templateSequence must return the correct result
The result returned when your templateSequence function is run must match the solution result.

onlyAU must return the correct result
The result returned when your onlyAU function is run must match the solution result.

onlyAU must return the correct result
The result returned when your onlyAU function is run must match the solution result.

transcriptionErrors must return the correct result
The result returned when your transcriptionErrors function is run must match the solution result.

transcriptionErrors must return the correct result
The result returned when your transcriptionErrors function is run must match the solution result.

hasCGBlock must return the correct result
The result returned when your hasCGBlock function is run must match the solution result.

hasCGBlock must return the correct result
The result returned when your hasCGBlock function is run must match the solution result.

Define countOtherBases with 2 parameters
Use def to define countOtherBases with 2 parameters

Use any kind of loop
Within the definition of countOtherBases with 2 parameters, use any kind of loop in exactly one place.

Define countOtherBases with 2 parameters
Use def to define countOtherBases with 2 parameters

Use any kind of loop
Within the definition of countOtherBases with 2 parameters, use any kind of loop in at least one place.

Use a conditional
Within the loop within the definition of countOtherBases with 2 parameters, use an if statement (possibly accompanied by an elif or else block) in at least one place.

Use a return statement
Within the definition of countOtherBases with 2 parameters, use return _ in at least one place.
 
Don't use .count
Within the definition of countOtherBases with 2 parameters, you may not use the built-in .count method that strings have.
 
Define templateSequence with 1 parameter
Use def to define templateSequence with 1 parameter

Use any kind of loop
Within the definition of templateSequence with 1 parameter, use any kind of loop in exactly one place.

Define templateSequence with 1 parameter
Use def to define templateSequence with 1 parameter

Use any kind of loop
Within the definition of templateSequence with 1 parameter, use any kind of loop in at least one place.

Call templateBase
Within the loop within the definition of templateSequence with 1 parameter, call templateBase in at least one place.

Use a return statement
Within the definition of templateSequence with 1 parameter, use return _ in at least one place.

Define onlyAU with 1 parameter
Use def to define onlyAU with 1 parameter

Use any kind of loop
Within the definition of onlyAU with 1 parameter, use any kind of loop in exactly one place.

Define onlyAU with 1 parameter
Use def to define onlyAU with 1 parameter

Use any kind of loop
Within the definition of onlyAU with 1 parameter, use any kind of loop in at least one place.

Use a conditional
Within the loop within the definition of onlyAU with 1 parameter, use an if statement (possibly accompanied by an elif or else block) in at least one place.

Use a return statement
Within the definition of onlyAU with 1 parameter, use return _ in at least one place.

Define transcriptionErrors with 2 parameters
Use def to define transcriptionErrors with 2 parameters

Use any kind of loop
Within the definition of transcriptionErrors with 2 parameters, use any kind of loop in exactly one place.

Define transcriptionErrors with 2 parameters
Use def to define transcriptionErrors with 2 parameters

Call len
Within the definition of transcriptionErrors with 2 parameters, call len in at least one place.

Use any kind of loop
Within the definition of transcriptionErrors with 2 parameters, use any kind of loop in at least one place.

Use a conditional
Within the loop within the definition of transcriptionErrors with 2 parameters, use an if statement (possibly accompanied by an elif or else block) in at least one place.

Use a return statement
Within the definition of transcriptionErrors with 2 parameters, use return _ in at least one place.

Define hasCGBlock with 1 parameter
Use def to define hasCGBlock with 1 parameter

Use any kind of loop
Within the definition of hasCGBlock with 1 parameter, use any kind of loop in exactly one place.

Use a conditional
Within the loop within the definition of hasCGBlock with 1 parameter, use an if statement (possibly accompanied by an elif or else block) in at least one place.

Use a return statement
Within the if/else block within the loop within the definition of hasCGBlock with 1 parameter, use return _ in at least one place.

Define hasCGBlock with 1 parameter
Use def to define hasCGBlock with 1 parameter

Use any kind of loop
Within the definition of hasCGBlock with 1 parameter, use any kind of loop in at least one place.

Use a conditional
Within the loop within the definition of hasCGBlock with 1 parameter, use an if statement (possibly accompanied by an elif or else block) in at least one place.

Use a return statement
Within the definition of hasCGBlock with 1 parameter, use return _ in at least one place.