# Problem Set 7 - Due Fri Dec 11 at 23:59


The task 3 rubric shows how this task will be graded, and can be used as a checklist to make sure you are done with the task.

Put all of your work for this task into the file ``.

This task involves writing five fruitful recursive functions that operate on strings representing DNA sequences. Genetic information can be represented as a string containing the letters 'A', 'T', 'G', and 'C', which represent the four natural DNA components, called "bases." A single strand of DNA is one such string, but normally two complementary strands are paired together. For each 'A' in one strand there is a matching 'T' in the other, and vice versa, while for each 'G' there is a matching 'C' (and again vice versa). So, for example, the strings:

```
GATTACA
CTAATGT
```

represent two complementary DNA sequences that could be matched together. The functions you have to write for this task are as follows:

• Part A: `countBases` - Counts the number of a specific DNA base within a given sequence.
• Part B: `symmetricStrand` - Returns the complementary DNA sequence that pairs with a provided sequence.
• Part C: `onlyTA` - Returns a new sequence that contains only 'A' and 'T' bases, with all 'G' and 'C' bases removed.
• Part D: `unmatchedCount` - Given two sequences, returns the number of places at which their bases do not match up according to the A-T and G-C matching rules.
• Part E: `cutOut` - Given a DNA sequence and a target sequence, returns the result of removing each non-overlapping copy of the target sequence from the provided sequence.

For this task you are provided with a function `matchingBase` which, given one base, returns the matching base (e.g., given 'A' it returns 'T').
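The provided `matchingBase` is not reproduced on this page. As a rough idea of its behavior, a stand-in might look like the sketch below. The dictionary name `BASE_PAIRS` and the choice to raise `KeyError` on unknown input are assumptions made for illustration, not the course's actual code.

```python
# Hypothetical stand-in for the provided matchingBase function.
# The real helper supplied with the problem set may differ.
BASE_PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def matchingBase(base):
    """Return the base that pairs with the given base."""
    # Assumption: anything other than A, T, G, or C raises KeyError.
    return BASE_PAIRS[base]


print(matchingBase("A"))  # T
print(matchingBase("C"))  # G
```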

All parts of this task must be accomplished using fruitful recursive functions, and you are not allowed to use any loops in this task.
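As a reminder of the general shape, a fruitful recursive function on a string usually has a base case for the empty string and a recursive case that handles the first character and then recurses on the rest. The sketch below shows that shape on a check that is not one of the five parts; the name `isValidSequence` is made up for this illustration.

```python
def isValidSequence(seq):
    """Return True if every character of seq is one of A, T, G, or C."""
    if seq == "":
        # Base case: an empty sequence contains no invalid characters.
        return True
    # Recursive case: check the first character, then recurse on the rest.
    return seq[0] in "ATGC" and isValidSequence(seq[1:])


print(isValidSequence("GATTACA"))  # True
print(isValidSequence("GATXACA"))  # False
```

Note that the function returns a value in every branch and uses no loops; the slice `seq[1:]` is what makes each recursive call work on a smaller problem.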

## Part A: `countBases`

`countBases` takes two arguments: the sequence of bases to inspect, and the base to look for. It returns how many copies of that base exist within the given sequence (an integer).

Examples:

```
>>> countBases("GATTACA", "A")
3
>>> countBases("TATATA", "T")
3
>>> countBases("TATATA", "G")
0
>>> countBases("T", "T")
1
>>> countBases("", "T")
0
```

## Part B: `symmetricStrand`

`symmetricStrand` accepts just one argument: the sequence to create a complementary strand for. It returns a string of the same length representing the complementary strand, where each 'A' has been swapped for a 'T', each 'T' for an 'A', each 'G' for a 'C', and each 'C' for a 'G'.

Examples:

```
>>> symmetricStrand("GACT")
'CTGA'
>>> symmetricStrand("ATC")
'TAG'
>>> symmetricStrand("CATAGAG")
'GTATCTC'
```

## Part C: `onlyTA`

`onlyTA` accepts one argument: the sequence to process. It returns a new string where each 'G' and 'C' base has been removed, leaving only the 'A' and 'T' bases.

Examples:

```
>>> onlyTA("ATGCACTA")
'ATATA'
>>> onlyTA("AT")
'AT'
>>> onlyTA("GCG")
''
>>> onlyTA("GAGCTCG")
'AT'
```

## Part D: `unmatchedCount`

`unmatchedCount` takes two arguments representing two sequences of DNA (which will always have the same length). Its job is to return the number of positions where the two sequences are mismatched, meaning that the base from one sequence is not the correct matching base for the base from the other sequence.

Examples:

```
>>> unmatchedCount("AAA", "TTT")
0
>>> unmatchedCount("AAA", "TTC")
1
>>> unmatchedCount("AAA", "GGG")
3
>>> unmatchedCount("GATTACA", "CTAATGT")
0
>>> unmatchedCount("GATTACA", "CGATTGT")
2
```
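One possible shape for this part is sketched below, assuming both sequences have the same length as stated above. The `matchingBase` defined here is only a stand-in for the provided helper, whose actual implementation is not shown on this page.

```python
def matchingBase(base):
    # Stand-in for the provided helper (assumed behavior).
    return {"A": "T", "T": "A", "G": "C", "C": "G"}[base]


def unmatchedCount(seq1, seq2):
    """Count positions where seq2's base is not the match for seq1's base."""
    if seq1 == "":
        # Base case: no positions left to compare.
        return 0
    # Compare the first pair of bases: 0 if they pair up, 1 if they do not.
    mismatch = 0 if matchingBase(seq1[0]) == seq2[0] else 1
    # Add the count for the remaining positions.
    return mismatch + unmatchedCount(seq1[1:], seq2[1:])


print(unmatchedCount("GATTACA", "CGATTGT"))  # 2
```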

## Part E: `cutOut`

`cutOut` takes two arguments: a string representing a sequence of DNA bases, and a (usually shorter) target sequence. It should find all places in the first sequence where the target sequence occurs and remove those copies, returning what remains of the first sequence.

Examples:

```
>>> cutOut("GATTACA", "ATTA")
'GCA'
>>> cutOut("TAGAGCGAT", "AG")
'TCGAT'
>>> cutOut("ATTGCCAG", "C")
'ATTGAG'
>>> cutOut("TATATATAT", "TAT")
'AAT'
```

Note: as implied by the last example, copies of the target sequence are removed one by one from left to right, so overlapping copies are not always removed. For that sequence, the removals are:
"TATATATAT" → "TAT | ATATAT", "ATATAT" → "A | TAT | AT", which leaves 'A' + 'AT' = 'AAT'.

Hint: The `.startswith` method of strings can be useful for solving this part.
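The removal rule in the note can be read as: if the sequence starts with the target, skip past that copy; otherwise keep the first base and continue with the rest. The sketch below is one way to express that reading with `.startswith` and slicing. Returning the sequence unchanged for an empty target is an assumption added here to avoid infinite recursion; the task description does not say what should happen in that case.

```python
def cutOut(seq, target):
    """Remove non-overlapping copies of target from seq, left to right."""
    if seq == "" or target == "":
        # Base case: nothing left to scan. The empty-target guard is an
        # assumption; the task description does not cover that case.
        return seq
    if seq.startswith(target):
        # Skip this copy of the target and continue after it.
        return cutOut(seq[len(target):], target)
    # Keep the first base and continue with the rest of the sequence.
    return seq[0] + cutOut(seq[1:], target)


print(cutOut("TATATATAT", "TAT"))  # AAT
```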

When you are done with this task, you should submit it via the Ocean server. Remember to follow the submission instructions when you are done with the whole problem set.