Instructions for predictText

(produced at 14:45 UTC on 2024-04-01)

This task is part of project09 which is due at 23:00 EDT on 2024-04-09.

You have the option to work with a partner on this task if you wish. Working with a partner requires more work to coordinate schedules, but if you work together and make sure that you are both understanding the code you write, you will make progress faster and learn more.

You can download the starter code for this task using this link.

You can submit this task using this link.

Put all of your work for this task into the file predictText.py
(which is provided among the starter files)

For this task, you will write a predictive text algorithm similar to the algorithm that smartphones use to suggest words as you type (the algorithm at work in the "let autocorrect fill in the rest" meme format, for example in the replies to this tweet ).

Background

How does your phone know what words to suggest when you're typing? Although the reality is slightly more complicated, the basic answer is: it knows which words you've typed before, and keeps track of which words tend to follow each other. Then it looks at the last word you typed, and makes a prediction based on that.

We're going to implement a simplified form of that using dictionaries to store the associations between each word in the text and a list of words that we've observed following that original word. For example, if we observe the words:

[ "she", "said", "that", "she", "was", "tired" ]

We will build this dictionary:

{
    "she": [ "said", "was" ],
    "said": [ "that" ]
    "that": [ "she" ]
    "was": [ "tired" ]
}

That dictionary represents the fact that after the word "she", we've seen the words "said" and "was" appear, after the word "said", the only word we've observed is "that," etc. Note that "tired" does not appear as a key in the dictionary, because we haven't seen any words following it.

To use this dictionary to make predictions, if we start with the word "she", we will pick either "said" or "was" as the next word. If we picked "was", then we would follow that with "tired" as the only choice, after which we are out of options. In that case, our algorithm will have to do something else, like use a default word.

If we use this dictionary to make a lot of predictions, we might get sentences like the following (since it's a random process, there's no guarantee of exactly what would be picked):

  • She said that she was tired.
  • She said that she said that she said that she was tired.
  • She was tired.
  • She said that she was tired she said that she was tired.

Note that the third instance of "she" in the last example above (bolded) was a default word because the dictionary has no predictions after "tired."

Part A: buildPredictions

To start with, we'll build a function which takes in a list of words (strings) and returns a prediction dictionary as described above.

For each word in the given text except the last one, a key should appear in the resulting dictionary, and every word observed after a copy of the key word should appear in a list following that initial word (note that these lists should include duplicates if a pair of words appears together more than once).

Your function must be fruitful and return the dictionary that it builds. Also note that you must use the lowercase version of all words for both keys and values in the dictionary. buildPredictions must use a loop and ideally should use only a single loop.

If given an empty list or a list with just one word in it, buildPredictions should return an empty dictionary.

These examples demonstrate how buildPredictions works.

Part B: nextWord

Now we need to use the predictions dictionary to predict words: Write a nextWord function which takes three parameters: a current word, a predictions dictionary, and a default word.

nextWord should use the chooseRandomly function (in exactly 1 place) to pick and return a random word from the list of words stored in the predictions dictionary as observations of words that followed the current word. But if there are no observations for that current word in the dictionary, it must not call chooseRandomly and should instead return the specified default word.

Here are some examples of results that you should get from nextWord.

Note that nexWord must use .lower to ensure that it looks up the lowercase version of the given word, since the current word may be capitalized but all entries in the predictions dictionary are lower-case. Also, nextWord does not need to and may not use a loop.

Important: you must call chooseRandomly exactly once per word predicted (it matters how many calls are executed, not just how many appear in your code). To test your code fairly, we will replace chooseRandomly with a non-random function during grading. If you use a random process other than chooseRandomly, or if you call chooseRandomly more than once per word predicted, or if you call it when you need to return the default word (even if you ignore the results), you will not get full credit!

Part C: predictText

Now that we can predict individual words, we'd like predict multiple words of text. Write predictText which will take three parameters:

  • startWord the first word of the text that we'll predict.
  • howMany how many words to add (in addition to the start word).
  • predictions a predictions dictionary created by buildPredictions.

It will return a list of strings that starts with the given start word and then contains howMany additional words, each predicted based on the previous word using the nextWord function. Note that predictText must use a loop and it must call nextWord within that loop. Ideally, it should not call nextWord in more than one place. Also, if it calls nextWord more than once per word predicted, it will not be able to generate the correct word sequences given our testing strategy.

The default word used with the nextWord function for words not in the predictions dictionary should be the starting word.

You can use the provided buildString function to turn a list of words resulting from predictText into a simple string for easier reading (the testing code does this).

Here are some examples of results that you should get from predictText.

Testing

We've provided a test_predictText.py file which has some optimism tests for buildPredictions, nextWord, and predictText. It even tests by using a set seed to fix the sequence of random numbers generated by the random module. If you run that file, it will also print out several random word sequences at the end that you can inspect to get a feel for what kinds of text can be generated.

Notes

Challenge: resample

This part is entirely optional, and is not graded or worth any points.

As an extra challenge, you can try writing a function called resample which takes an input filename and an output filename, along with a number of words. It should read the input file, build a predictions dictionary based on the words in that file, and then use that dictionary to generate the requested number of words (starting with the same first word as the file does). The generated words should be written, separated by spaces, to the output file.

We have included a few novels from Project Gutenberg in the novels directory of the starter code in case you'd like to try out generating text from a larger set of input words. If you try this, you'll quickly find that the results are not very coherent: it takes a bit more than just predicting based on a single previous word to actually get good text predictions.

Examples

buildPredictions examples

These examples demonstrate how buildPredictions works. Notice how in the second example, the word 'blue' appears twice in the list of words following 'the', because 'the blue' appears twice in the input.

In []:
buildPredictions(['ONE', 'two', 'one', 'three'])
Out[]:
{'one': ['two', 'three'], 'two': ['one']}
In []:
buildPredictions(['The', 'blue', 'whale', 'wore', 'the', 'blue', 'hat.'])
Out[]:
{ 'the': ['blue', 'blue'], 'blue': ['whale', 'hat.'], 'whale': ['wore'], 'wore': ['the'], }

nextWord examples

These examples demonstrate how nextWord works.

In []:
import random random.seed(111) # Fixes a specific sequence of random decisions predictions = buildPredictions([ "ONE", "two", "one", "three" ]) print(nextWord("one", predictions, 'DEFAULT')) # two possible outcomes print(nextWord("two", predictions, 'DEFAULT')) # only one possibility print(nextWord("three", predictions, 'DEFAULT')) # only one possibility
Prints
two one DEFAULT
In []:
import random random.seed(111) # Fixes a specific sequence of random decisions blueWhale = [ "The", "blue", "whale", "wore", "the", "blue", "hat." ] blue = buildPredictions(blueWhale) print(nextWord("the", blue, "the")) print(nextWord("blue", blue, "the")) print(nextWord("whale", blue, "the")) print(nextWord("hat.", blue, "the"))
Prints
blue hat. wore the

predictText examples

These examples show how predictText should work. Note that one of the arguments to predictText is the result of buildPredictions. Also notice how calling the function multiple times with the same argument gives different results, because the choices are random.

In []:
import random random.seed(111) # Fixes a specific sequence of random decisions predictions = buildPredictions([ "ONE", "two", "one", "three" ]) predictText("one", 6, predictions)
Out[]:
['one', 'two', 'one', 'three', 'one', 'two', 'one']
In []:
import random random.seed(111) # Fixes a specific sequence of random decisions predictions = buildPredictions([ "ONE", "two", "one", "three" ]) predictText("one", 10, predictions)
Out[]:
[ 'one', 'two', 'one', 'three', 'one', 'two', 'one', 'three', 'one', 'two', 'one', ]
In []:
import random random.seed(111) # Fixes a specific sequence of random decisions blueWhale = [ "The", "blue", "whale", "wore", "the", "blue", "hat." ] blue = buildPredictions(blueWhale) predictText("the", 10, blue)
Out[]:
[ 'the', 'blue', 'hat.', 'the', 'blue', 'whale', 'wore', 'the', 'blue', 'whale', 'wore', ]
In []:
import random random.seed(111) # Fixes a specific sequence of random decisions blueWhale = [ "The", "blue", "whale", "wore", "the", "blue", "hat." ] blue = buildPredictions(blueWhale) print(predictText("the", 10, blue)) print(predictText("the", 10, blue)) print(predictText("the", 10, blue))
Prints
['the', 'blue', 'hat.', 'the', 'blue', 'whale', 'wore', 'the', 'blue', 'whale', 'wore'] ['the', 'blue', 'whale', 'wore', 'the', 'blue', 'hat.', 'the', 'blue', 'whale', 'wore'] ['the', 'blue', 'whale', 'wore', 'the', 'blue', 'hat.', 'the', 'blue', 'hat.', 'the']

Rubric

Group goals:
 
unknown Do not ignore the results of any fruitful function calls
According to the "Don't waste fruit" principle, every place you call a fruitful function (built-in or custom) you must store the result in a variable, or that function call must be part of a larger expression that uses its return value.
 
unknown Do not create any variables that you never make use of
According to the "Don't waste boxes" principle, every time you create a variable (using = or by defining a parameter for a function) you must also later use that variable as part of another expression. If you need to create a variable that you won't use, it must have the name _, but you should only do this if absolutely necessary.
 
unknown All functions are documented
Each function you define must include a non-empty documentation string as the very first thing in the function.
 
unknown buildPredictions must return the correct result
The result returned when your buildPredictions function is run must match the solution result.
 
unknown buildPredictions must return the correct result
The result returned when your buildPredictions function is run must match the solution result.
 
unknown nextWord must return the correct result
The result returned when your nextWord function is run must match the solution result.
 
unknown nextWord must return the correct result
The result returned when your nextWord function is run must match the solution result.
 
unknown predictText must return the correct result
The result returned when your predictText function is run must match the solution result.
 
unknown predictText must return the correct result
The result returned when your predictText function is run must match the solution result.
 
unknown Define buildPredictions
Use def to define buildPredictions
 
unknown Call lower
Within the definition of buildPredictions, call lower in at least one place.
 
unknown Use any kind of loop
Within the definition of buildPredictions, use any kind of loop in at least one place.
 
unknown Define buildPredictions
Use def to define buildPredictions
 
unknown Use any kind of loop
Within the definition of buildPredictions, use any kind of loop in exactly one place.
 
unknown Define nextWord
Use def to define nextWord
 
unknown Call lower
Within the definition of nextWord, call lower in at least one place.
 
unknown Do not use any kind of loop
Within the definition of nextWord, do not use any kind of loop.
 
unknown Call chooseRandomly
Within the definition of nextWord, call chooseRandomly in exactly one place.
 
unknown Define predictText
Use def to define predictText
 
unknown Use any kind of loop
Within the definition of predictText, use any kind of loop in at least one place.
 
unknown Call nextWord
Within the loop within the definition of predictText, call nextWord in at least one place.
 
unknown Define predictText
Use def to define predictText
 
unknown Call nextWord
Within the definition of predictText, call nextWord in exactly one place.