CS111 Lecture: Testing and Debugging

Table of Contents:

  1. Overview
  2. Motivation for Testing Functions
  3. Towards Automated Testing: Printing Test Case
  4. Digression: Creating complex strings with .format
  5. Digression: Iterating over Lists of Tuples
  6. Towards Automated Testing: Looping over Test Cases
  7. Automated Testing with Expected Values
  8. Using optimism for input/output testing
  9. Designing Black-box Test Cases
  10. Designing Glass-box Test Cases and Minimal Counterexamples
  11. Debugging Techniques
  12. ## 12. Debugging Exercise with isBeauteous

1. Overview

As you've learned so far in CS111, it can be challenging to create programs that work correctly in all cases.

Programs that don't behave correctly are said to be buggy because they contain bugs = reasons why they don't work. The process of finding and fixing these bugs is called debugging.

The first step in the debugging process is testing. You don't know whether a program might be incorrect until you have evidence that it's not working as expected. We begin today by discussing how to develop test cases that help us determine cases in which programs misbehave.

Here are the high-level steps in today's lecture:

  1. Testing test cases interactively in Thonny is cumbersome. We'll show how to automate input/output testing using test cases that consist of (1) a set of inputs for the function we want to test and (2) the corresponding outputs we expect that function to return for these inputs. We'll then compare the actual results returned by the function to the expected results, and pay attention to the cases where these differ. This approach can be used in any programming language.

  2. Developing our own programs to perform input/output testing has a high overhead. We show how the optimism library can simplify testing in Python. We've used optimism for testing in several previous psets, but haven't really asked you to understand the details. After this lecture, we'll expect that you will know how to use optimism to express input/output test cases going forward. This is an important skill you need to master as you mature as programmers.

  3. What do you do when testing reveals cases in which your program doesn't work? You'll then need to apply debugging techniques to determine why it misbehaves and how to fix it.

The goals of this lecture are to:

  1. Give you an appreciation for the importance of automated input/output testing;
  2. Teach you how to develop effective test cases;
  3. Show you how to use optimism to express those test cases; and
  4. Intoduce useful debugging techniques that will help you find and fix bugs in your programs.

2. Motivation for Testing Functions

The file functionsToTest.py contains several functions that will be used as examples in this lecture. Many are buggy, but some are correct.

Do not study these functions definition now. Instead, treat them as a black boxes = functions that you will call without understanding how they work. We will look at the details later in lecture.

One of the functions in functionsToTest.py is a a buggy version of the countChar function that was defined at the end of the notebook for Lec 08 While Loops and was revisited (in for loop form) at the beginning of the notebook for Lec 09 Sequences and Loops (where it was renamed countLetter).

def countChar(char, word):
    '''Return the number of time char occurs in word, ignoring case.'''
    # Details omitted

In the version of countChar we study in this lecture, it should ignore the case of the character when counting. So countChar('e', 'Eagle') and countChar('E', 'Eagle') should both return 2.

Without studying the function definition, let's try some test cases to see in which situations the function works and doesn't work. Make some hypotheses about possible bugs in the function.

3. Towards Automated Testing: Printing Test Cases

It's tedious to interactively test calls to countChar one at a time. Can we do better?

The following doesn't work, because the notebook returns only the value of the last expression in a cell.

Running a file in Thonny is similarly problematic. That never automatically shows the values of any expressions.

In these cases, we'll need to use print to display the results of test cases.

Above, we can see the results of all the tests, but it's challenging to match them up with the test cases.

We can improve things by printing more context for each test case:

Let's capture the pattern in each line with a printCountChar function:

4. Digression: Creating complex strings with Python 3's f-strings

The string concatenations in printCountChar are difficult to create and read.

You have encounter similar situations before in tasks like timeProfiler and timeProfilerDeluxe.

Is there a better way? Yes! The version of Python we use in Thonny (Python 3.7) supports a feature called f-strings that greatly simply specifying complex strings that have constant parts and parts that are the results of evaluating expressions.

An f-string is a string preceded by the character f that specifies a string template with "holes" (marked by {}) that can be filled by the results of arbitrary Python expressions. The results of the expressions in the holes are automatically converted to strings, so we don't need to explicitly use str to do that.

For example:

Take a moment to appreciate the power of f-strings. Without f-strings, we would need to change the line

print(f'{n1} + {n2} => {n1+n2}')

to

print(str(n1) + ' + ' + str(n2) + ' => ' + str(n1+n2))

The f-string version is so much easier to read and write!

Now let's use f-string to simplify the definition of printCountChar:

5. Digression: Iterating over Lists of Tuples

One way to iterate over lists of tuples (or lists of lists) is to access the tuples elements by index:

But if the length of all the tuples in the list is the same, we can instead name the n individual tuple elements by n comma-separated variable names between the for and in:

This is basically a convenient abbreviation for the following:

6. Towards Automated Testing: Looping over Test Cases

We can put test inputs as tuples into a list testInputs and then test each pair of inputs:

An advantage of putting test cases into a list is that we can auomatically adapt each test case to handle variants, such as considering char and word with uppercase and lowercase variants:

7. Automated Testing with Expected Values

In printouts like the ones above, we still need to carefully examine the results of each test case to determine when they're correct or incorrect. Can we do better?

Yes! By including the expected result value in the test tuple, we can highlight test cases that fail.

If we only care about highlighting the FAILED cases, we can (1) remove the printing of PASSED cases and (2) keep track of the number of FAILED cases in order display a summary at the end.

Also, in this particular case, we can leverage list comprehensions (see Lec 12) to give us more combinations of lower/upper case values in our test cases. This is not generally useful, but is helpful in this specific problem.

Based on the above test failures, can you guess the source of some of the bugs in the buggy version of countChar?

8. Using optimism for input/output testing

Developing testing functions like testCountChar and findCountCharFailures for all the different functions we might want to test takes a lot of work. We could use some functional abstraction techniques we've already seen (plus some other features we'll learn) to generalize the these functions, but handling all sorts of special cases is challenging. E.g., How do we handle functions that get input from the user? How do we handle functions that print output in addition to (or instead of) returning it.

An alternative approach is to use a library of powerful functions that work together to express a mini-language for testing. One such library is the optimism library developed by CS111's very own Peter Mawhorter. You have already encountered examples of optimism in some of your CS111 psets. We'll now explore some of its simple features that you can use for input/output testing.

To use optimism, first import it via

import optimism as opt

You can then use opt.testCase to specify a function call you want to test and opt.expectResult to specify the result that is expected from the most recent test from opt.testCase. For example, below is a function someOptimismCountCharTests that tests the same test cases as in countCharTestCases above.

def someOptimismCountCharTests():
  '''
    A simple illustration of using optimism to perform input/ouptut testing
    on the countChar function. It performs tests corresponding to these cases:

       countCharTestCases = [
          # Each tuple in list now has three elements:
          # (1) character; (2) word; and (3) expected result
          ('s','Mississippi', 4),
          ('i','Mississippi', 4),
          ('p','Mississippi', 2),
          ('m','Mississippi', 1),
          ('a','Mississippi', 0),
        ]
    '''
    opt.testCase(countChar('s', 'Mississippi'))
    opt.expectResult(4)
    opt.testCase(countChar('i', 'Mississippi'))
    opt.expectResult(4)
    opt.testCase(countChar('p', 'Mississippi'))
    opt.expectResult(2)
    opt.testCase(countChar('m', 'Mississippi'))
    opt.expectResult(1)
    opt.testCase(countChar('a', 'Mississippi'))
    opt.expectResult(0)

Although optimism has many benefits, one of its downsides is that it does not work in Jupyter notebooks. It only works properly when run within a .py file in an environment like Thonny. (Why? Because it uses the ability of a Python function to inspect details of the file in which it's run to work its magic. What magic? The fact that unlike other Python functions you have seen, opt.testCase does not evaluate its argument.) So to see the above function in action, you'll have to load the file testCountCharWithOptimism.py (found in the same folder as this notebook) into Thonny and run it there. When you do, you'll see output like this (except it will be colored in red):

>>> someOptimismCountCharTests()
 testCountCharWithOptimism.py:14
 testCountCharWithOptimism.py:16
  Result:
    3
  was different from the expected value:
    4
  Test expression was:
    countChar('i', 'Mississippi')
 testCountCharWithOptimism.py:18
 testCountCharWithOptimism.py:20
  Result:
    0
  was different from the expected value:
    1
  Test expression was:
    countChar('m', 'Mississippi')
 testCountCharWithOptimism.py:22

It's even possible to create testing functions like testCountChar that use optimism to test tuples of test cases:

def optimismTestCountChar(testTriples):
    for char, word, expected in testTriples:
        opt.testCase(countChar(char, word))
        opt.expectResult(expected)

This is significantly simpler that the definition for testCountChar above, yet still gives output with similar information:

>>> optimismFindCountCharFailures(testCases)
 testCountCharWithOptimism.py:43
 testCountCharWithOptimism.py:43
  Result:
    3
  was different from the expected value:
    4
  Test expression was:
    countChar(char, word)
  Values were:
    char = 'i'
    word = 'Mississippi'
 testCountCharWithOptimism.py:43
 testCountCharWithOptimism.py:43
  Result:
    0
  was different from the expected value:
    1
  Test expression was:
    countChar(char, word)
  Values were:
    char = 'm'
    word = 'Mississippi'
 testCountCharWithOptimism.py:43

optimism can handle many sophisticated testing features beyond the simple input/ouput testse shown here. To learn more, consult the full documentation for optimism.

Going forward with the rest of this lecture, you can either use functions similar to testCountChars and findCountCharFailures in this notebook, or you can use the optimism versions in the .py files in the same folder as this notebook.

9. Designing Black-box Test Cases

Testing a function like countChar without seeing its definition is called black-box testing, because the testing is purely based on its input/output behavior according to its contract without being able to see the code implementing the function. It's as if it's a mechanical contraption whose internal workings are hidden inside a black box and cannot be viewed.

9.1 Categories of Black-box Test Cases

When designing black-box tests, you must imagine ways in which the function might be implemented and how such implementations could go wrong. Some classes of test cases:

  1. Regular cases: These are "normal" cases that check basic advertised input/output functionality, like tests of counting different letters in "Mississippi" for countChar.

  2. Implied conditional cases: When the contract mention different categories of an input (e.g., positive or negative numbers, vowels vs. nonvowels), it implies that these categories will be checked by conditionals in the function body. Since those conditionals could be wrong, testing all combinations values from input categories is prudent.

  3. Edge cases: These are tests of extreme or special cases that the function might not handle properly. For example

    • For numeric inputs, extreme inputs can include 0, large numbers, negative numbers, and floats vs. ints.

    • Fencepost errors are off-by-one errors, which are common in programs. E.g n elements in a list are separated by n-1 commas, not n.

    • For inputs that are indices of sequences, test indices near the ends of the sequence, e.g., indices like 0, 1, -1 and len(seq), len(seq)-1, len(seq)+1.
      Since Python allows negative indices, you should also test -len(seq), -len(seq)-1, -len(seq)+1.

    • For functions involving elements of sequences, test elements in the first and last positions of the sequences, e.g. characters at the beginning and end of a string.

    • For inputs that are sequences, empty and small sequences are often not handled correctly, so you should always test empty and singleton strings/lists/tuples. When specific numbers are mentioned in the contract (e.g. isBeauteous tests for 3 consecutive vowels) it's important to test strings of length <= 3 as edge cases.

    • For inputs expected to be booleans, what happens if other Truthy/Falsey values are supplied? Is it OK that to treat other Truthy/Falsey values as True/False?

9.2 Example: Designing Black-box Tests for countChar

In the case of testing countChar, how confident are we that our tests with testing different characters in different capitalizations of "Mississippi" effectively tests countChar?

Rather than testing a long string like "Mississippi", it may be more effective to carefully test a combination of shorter strings and characters in those strings.

Some things to keep in mind:

Based on the above considerations, here is a set of black-box test cases for countChar:

9.3 Exercise: Black-box Testing of isBeauteous

Consider an isBeauteous function that has this contract:

Below, develop a list of black-box test cases for isBeauteous. (To use optimism, use the file testIsBeauteousUsingOptimism.py in the same folder as this notebook.)

Below is a function testIsBeateous that is does much more than the versions of testCountChar above. You are not expected to understand all the details, but here's what it does:

10. Designing Glass-box Test Cases and Minimal Counterexamples

Glass-box testing occurs when you are testing a function/program whose code you can inspect. Because you can see the implementation, you can focus on test cases that take advantage of implementation details in order to attempt to get the function to misbehave.

For example:

The main goal in glass-box testing is finding counterexamples = inputs that cause the function to misbehave. Particularly interesting counter examples are minimal counterexamples, which are the "shortest" counterexamples. E.g., in functions with string inputs, the shortest string that exhibits a bug is a minimal counterexample.

10.1 Example: Glass-Box Testing of isBeauteous1

Below is a buggy version of the isBeauteous function named isBeauteous1:

The problem with isBeauteous1 is that it just counts that the number of vowels in word is 3 without checking that they are consecutive. It will behave correctly on strings with fewer than 3 vowels or with 3 consecutive vowels, but will incorrectly return True for strings that have 3 vowels without having three consecutive vowels.

Below, give a minimal counterexample on which isBeauteous1 gives the incorrect answer:

10.2 Exercise: Minimal Counterexamples for Other Buggy isBeauteous Functions

Below are three other buggy versions of isBeauteous. Develop minimal counterexamples for each of them

11. Debugging Techniques

Test cases help us determine cases in which functions misbehave. But then how do we determine why they misbehave and how do fix them?

Here we study some debugging techniques for identifying and fixing bugs in programs. Most of these techniques involve adding print statements to a program.

You should also consult Peter Mawhorter's debugging poster, which is linked from the Reference menu of the CS111 web site.

11.1 Pay Attention to Error Messages

Sometimes bugs lead to errors when running a program. In many cases, studying the error messages will help you to identify the location of the bug. For example, can you use the error message to find and fix the bug in the following code?

In the case of Syntax Errors, Thonny often only notices these one or more lines after the actual syntax error, so you need to look for the error before the line being reported in the error message. For example, where is the bug in the following example?

11.2 Use print to show a function call with its arguments

It's generally helpful to know when a function is called and what arguments it has been called with.

Study what's printed by beats and play to help figure out why play is not correct.

11.3 Use print to show the return value of a function

In addition to showing the arguments to a function when a function is called, it's often a good idea to show both the arguments and the return value when it returns.

In order to do this, it is often necessary to introduce a variable (such as result) to first name the returned value so that it can be printed before it is returned (without recalculating it).

Here are example of code before/after adding the debugging prints:

When returns are performed in conditional branches, you should:

  1. Initialize result to None before the conditionals.
  2. Replace each return Expr by result = Expr
  3. End the function body with return result

Below are examples of some buggy code before/after adding the debugging prints. Use the printed output to help you find and fix the bugs:

11.4 Use print to show both calling and returning from a function

When a function is giving an error, it's a good idea to use print to show both when the function is called and when it returns.

Here's an example; use the printed information to find and fix the bug.

11.5 Using print to display iteration tables

We saw in Lec 08 While Loops and Lec 09 Sequences and Loop that a function adding two print statments to a loop (one right before the loop and one at the end of the loop body) can display an iteration table for the state variables of the loop.

Let's review that technique here in the context of debugging the definition of countChar given at the beginning of this notebook.

In countCharTable below, in addition to displaying the state variables i and counter, we also display word[i], since this is important for debugging.

For completeness, we might also want to print when the function is called and when it returns. But to avoid too much clutter, we will include only the iteration table prints in these examples.

We know from testing that countChar('m', 'mississippi') returns 0 when 1 is expected. Why is that? Let's see:

Ah, because i starts at 1 rather than 0, the letter starting the word is never counted. Let's fix that:

Now countCharTableFix1('m', 'mississippi') works as expected:

Below, countCharTableFix1('i', 'mississippi') still returns 3 rather than the expected 4. Why?

Oh, the loop never processes the last letter i at word[10] because the second argument to range is len(word)-1 rather than len(word). Let's fix this second bug:

With this second fix, countCharTableFix2('i', 'mississippi') now works as expected:

There's still another bug. countCharTableFix2('I', 'MISSISSIPPI') still returns 0 rather than the expected 4. Why is that?

The reason isn't obvious from the iteration table, but it does give a hint. Why is counter not being incremented when word[i] is the letter I? It's because we're comparing word[i] with char.lower() when we should be using word[i].lower() instead. Let's fix that third bug:

Now countCharTableFix3('I', 'MISSISSIPPI') works as expected.

If we fix the all three bugs in countChar, do we resolve all the test case failures?

Below, note that we comment out the debugging prints so they do not interfere with the testing. But we do not delete the debugging prints, since we may want to uncomment them for debugging purposes in the future!

Great! We now pass all the test cases. Does that mean our function is completely correct?

Not necessarily! Maybe there are some cases in which the function still doesn't work, but they're not in our list of test cases. So it may be too early to declare victory, but we've increased our confidence in the correctness of the countChar function definition.

12. Debugging Exercise with isBeauteous

Use the debugging techniques above, particularly printing iteration tables, to identify (but not necessarily fix) the bugs in the following buggy versions of isBeauteous. Test the print-augmented versions on potential counterexamples to identify bugs.