Problem Set 7 - Due Fri Dec 11 at 23:59
The task 3 rubric shows how this task will be graded, and can be used as a checklist to make sure you are done with the task.
Put all of your work for this task into the file ``.
This task involves writing five fruitful recursive functions which will operate on strings representing DNA sequences. Genetic information can be represented as a string containing the letters 'A', 'T', 'G', and 'C', which represent the four natural DNA components, called "bases." A single strand of DNA is one such string, while normally two complementary strings would be paired together. For each 'A' in one string, there would be a matching 'T' in the other, and vice versa, while for each 'G', there would be a matching 'C' (and again vice versa). So for example, the strings:
represent two complementary DNA sequences that could be matched together. The functions you have to write for this task are as follows:
- Part A:
  countBases - Counts the number of a specific DNA base within a given sequence.
- Part B:
  symmetricStrand - Returns the complementary DNA sequence that matches a provided sequence.
- Part C:
  onlyTA - Returns a new sequence that contains only 'A' and 'T' bases, with all 'G' and 'C' bases removed.
- Part D:
  unmatchedCount - Given two sequences, returns the number of places at which their bases do not match up according to the A-T and G-C matching rules.
- Part E:
  cutOut - Given a DNA sequence and a target sequence, returns the result of removing each non-overlapping copy of the target sequence from the provided sequence.
For this task you are provided with a function
matchingBase which, given
one base, will return the matching base (i.e., given 'A' it returns 'T',
given 'T' it returns 'A', given 'G' it returns 'C', and given 'C' it returns 'G').
All parts of this task must be accomplished using fruitful recursive functions, and you are not allowed to use any loops in this task.
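The provided matchingBase function is not reproduced in this handout, so the sketch below is only a stand-in with the behavior described above, handy for trying ideas without the starter code. It also shows the general shape of a fruitful recursive function with no loops: a base case for the empty string, and a recursive case that handles the first base and recurses on the rest. The isDNA helper is hypothetical and is not one of the required functions.

```python
# Stand-in for the provided matchingBase; the version supplied with the
# assignment may be written differently (this one is an assumption).
def matchingBase(base):
    """Return the base that pairs with the given base."""
    if base == 'A':
        return 'T'
    elif base == 'T':
        return 'A'
    elif base == 'G':
        return 'C'
    else:
        # Assumes the input is always one of 'A', 'T', 'G', 'C'.
        return 'G'


# Hypothetical helper, used only to illustrate the recursive pattern.
def isDNA(sequence):
    """Return True if every character of sequence is a DNA base."""
    if sequence == "":
        # Base case: an empty sequence contains no invalid characters.
        return True
    # Recursive case: check the first character, then the rest.
    return sequence[0] in "ATGC" and isDNA(sequence[1:])
```

Every function in this task can follow the same outline: decide what to return for the empty string, then combine a result for the first base with the result of a recursive call on the remainder.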
countBases takes two arguments: the sequence of bases to inspect, and
the base to look for. It returns how many copies of that base exist
within the given sequence (an integer).
>>> countBases("GATTACA", "A")
3
>>> countBases("TATATA", "T")
3
>>> countBases("TATATA", "G")
0
>>> countBases("T", "T")
1
>>> countBases("", "T")
0
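One possible approach to countBases, offered as a sketch and not as the required solution: treat the empty sequence as the base case, and otherwise add 1 or 0 for the first base to the count for the rest of the sequence.

```python
def countBases(sequence, base):
    """Return how many times base appears in sequence (no loops)."""
    if sequence == "":
        # Base case: nothing left to count.
        return 0
    # 1 if the first base is the one we want, otherwise 0.
    first = 1 if sequence[0] == base else 0
    # Add the count for the remainder of the sequence.
    return first + countBases(sequence[1:], base)


print(countBases("GATTACA", "A"))  # 3
```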
symmetricStrand accepts just one argument: the sequence to create a
complementary strand for. It returns a string of the same length
representing the complementary strand, where each 'A' has been swapped
for a 'T', each 'T' for an 'A', each 'G' for a 'C', and each 'C' for a
'G'.
>>> symmetricStrand("GACT")
'CTGA'
>>> symmetricStrand("ATC")
'TAG'
>>> symmetricStrand("CATAGAG")
'GTATCTC'
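A minimal sketch of how symmetricStrand might be built on matchingBase: swap the first base, then concatenate the result for the rest of the sequence. The matchingBase defined here is only a stand-in so the example runs on its own; the provided function should be used instead.

```python
# Stand-in for the provided matchingBase (an assumption, for illustration).
def matchingBase(base):
    pairs = {'A': 'T', 'T': 'A', 'G': 'C', 'C': 'G'}
    return pairs[base]


def symmetricStrand(sequence):
    """Return the complementary strand for sequence (no loops)."""
    if sequence == "":
        # Base case: the complement of an empty strand is empty.
        return ""
    # Swap the first base, then handle the remainder recursively.
    return matchingBase(sequence[0]) + symmetricStrand(sequence[1:])


print(symmetricStrand("GACT"))  # CTGA
```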
onlyTA accepts one argument: the sequence to process. It returns a new
string where each 'G' and 'C' base has been removed, leaving only the 'A'
and 'T' bases.
>>> onlyTA("ATGCACTA")
'ATATA'
>>> onlyTA("AT")
'AT'
>>> onlyTA("GCG")
''
>>> onlyTA("GAGCTCG")
'AT'
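The filtering described for onlyTA could be sketched as follows: the recursive call processes everything after the first base, and the first base is kept only when it is an 'A' or a 'T'. This is one way to structure it, not the only one.

```python
def onlyTA(sequence):
    """Return sequence with all 'G' and 'C' bases removed (no loops)."""
    if sequence == "":
        # Base case: nothing to filter.
        return ""
    # Filter the remainder of the sequence first.
    rest = onlyTA(sequence[1:])
    if sequence[0] == 'A' or sequence[0] == 'T':
        # Keep the first base.
        return sequence[0] + rest
    # Drop the first base.
    return rest


print(onlyTA("ATGCACTA"))  # ATATA
```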
unmatchedCount takes two arguments representing two sequences of DNA
(which will always have the same length). Its job is to return the number
of positions where the two sequences are mismatched, meaning that the
base from one sequence is not the correct matching base for the base
from the other sequence.
>>> unmatchedCount("AAA", "TTT")
0
>>> unmatchedCount("AAA", "TTC")
1
>>> unmatchedCount("AAA", "GGG")
3
>>> unmatchedCount("GATTACA", "CTAATGT")
0
>>> unmatchedCount("GATTACA", "CGATTGT")
2
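Since both sequences always have the same length, unmatchedCount can walk them in step. The sketch below compares the first bases of each sequence and recurses on the remainders of both; the matchingBase shown is again a stand-in for the provided function.

```python
# Stand-in for the provided matchingBase (an assumption, for illustration).
def matchingBase(base):
    pairs = {'A': 'T', 'T': 'A', 'G': 'C', 'C': 'G'}
    return pairs[base]


def unmatchedCount(seq1, seq2):
    """Return the number of positions where seq1 and seq2 do not pair up.

    Assumes seq1 and seq2 have the same length, as the task guarantees.
    """
    if seq1 == "":
        # Base case: no positions left to compare.
        return 0
    # 0 if the first bases pair correctly, otherwise 1.
    mismatch = 0 if matchingBase(seq1[0]) == seq2[0] else 1
    # Add the mismatches found in the rest of both sequences.
    return mismatch + unmatchedCount(seq1[1:], seq2[1:])


print(unmatchedCount("GATTACA", "CGATTGT"))  # 2
```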
cutOut takes two arguments: a string representing a sequence of DNA
bases, and a (usually shorter) target sequence. It should find all places
in the first sequence where the target sequence exists and remove those,
returning what remains of the first sequence.
>>> cutOut("GATTACA", "ATTA")
'GCA'
>>> cutOut("TAGAGCGAT", "AG")
'TCGAT'
>>> cutOut("ATTGCCAG", "C")
'ATTGAG'
>>> cutOut("TATATATAT", "TAT")
'AAT'
Note: as implied by the last example, we remove copies of the target
sequence one-by-one, so overlapping copies are not always removed. For
that sequence, the removals are:
"TATATATAT" → "TAT | ATATAT", "ATATAT" → "A | TAT | AT".
The .startswith method of strings can be useful for solving this part.
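A sketch of cutOut using the .startswith method: if the sequence begins with the target, skip past the whole target and continue; otherwise keep the first base and continue from the next one. The guard for an empty target is an assumption added here to avoid infinite recursion, since the task does not say what should happen in that case.

```python
def cutOut(sequence, target):
    """Return sequence with each non-overlapping copy of target removed."""
    if sequence == "" or target == "":
        # Base case: nothing left to scan. An empty target is treated as
        # "remove nothing" (an assumption, not specified by the task).
        return sequence
    if sequence.startswith(target):
        # Skip the whole copy of the target and keep scanning after it.
        return cutOut(sequence[len(target):], target)
    # Keep the first base and scan the remainder.
    return sequence[0] + cutOut(sequence[1:], target)


print(cutOut("TATATATAT", "TAT"))  # AAT
```

Because a matched copy is skipped in full before scanning resumes, a copy that overlaps an already-removed one is never seen, which matches the one-by-one removal in the note above.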