predictText Instructions

This task is part of project09 which is due at 23:00 EDT on 2025-04-08.

You have the option to work with a partner on this task if you wish. Working with a partner requires more work to coordinate schedules, but if you work together and make sure that you are both understanding the code you write, you will make progress faster and learn more.

You can download the starter code for this task using this link.

You can submit this task using this link.

Put all of your work for this task into the file predictText.py
(which is provided among the starter files)

Click here to toggle highlighting for goal links

For this task, you will write a predictive text algorithm similar to the algorithm that smartphones use to suggest words as you type (the algorithm at work in the "let autocorrect fill in the rest" meme format, for example in the replies to this tweet ).

Background

How does your phone know what words to suggest when you're typing? Although the reality is slightly more complicated, the basic answer is: it knows which words you've typed before, and keeps track of which words tend to follow each other. Then it looks at the last word you typed, and makes a prediction based on that.

We're going to implement a simplified form of that using dictionaries to store the associations between each word in the text and a list of words that we've observed following that original word. For example, if we observe the words:

[ "she", "said", "that", "she", "was", "tired" ]

We will build this dictionary:

{
    "she": [ "said", "was" ],
    "said": [ "that" ]
    "that": [ "she" ]
    "was": [ "tired" ]
}

That dictionary represents the fact that after the word "she", we've seen the words "said" and "was" appear, after the word "said", the only word we've observed is "that," etc. Note that "tired" does not appear as a key in the dictionary, because we haven't seen any words following it.

To use this dictionary to make predictions, if we start with the word "she", we will pick either "said" or "was" as the next word. If we picked "was", then we would follow that with "tired" as the only choice, after which we are out of options. In that case, our algorithm will have to do something else, like use a default word.

If we use this dictionary to make a lot of predictions, we might get sentences like the following (since it's a random process, there's no guarantee of exactly what would be picked):

She said that she was tired.
She said that she said that she said that she was tired.
She was tired.
She said that she was tired she said that she was tired.

Note that the third instance of "she" in the last example above (bolded) was a default word because the dictionary has no predictions after "tired."

Part A: `buildPredictions`

To start with, we'll build a function which takes in a list of words (strings) and returns a prediction dictionary as described above.

For each word in the given text except the last one, a key should appear in the resulting dictionary, and every word observed after a copy of the key word should appear in a list following that initial word (note that these lists should include duplicates if a pair of words appears together more than once).

Your function must be fruitful and return the dictionary that it builds. Also note that you must use the lowercase version of all words for both keys and values in the dictionary. buildPredictions must use a loop and ideally should use only a single loop.

If given an empty list or a list with just one word in it, buildPredictions should return an empty dictionary.

These examples demonstrate how buildPredictions works.

Part B: `nextWord`

Now we need to use the predictions dictionary to predict words: Write a nextWord function which takes three parameters: a current word, a predictions dictionary, and a default word.

nextWord should use the chooseRandomly function (in exactly 1 place) to pick and return a random word from the list of words stored in the predictions dictionary as observations of words that followed the current word. But if there are no observations for that current word in the dictionary, it must not call chooseRandomly and should instead return the specified default word.

Here are some examples of results that you should get from nextWord.

Note that nexWord must use .lower to ensure that it looks up the lowercase version of the given word, since the current word may be capitalized but all entries in the predictions dictionary are lower-case. Also, nextWord does not need to and may not use a loop.

Important: you must call chooseRandomly exactly once per word predicted (it matters how many calls are executed, not just how many appear in your code). To test your code fairly, we will replace chooseRandomly with a non-random function during grading. If you use a random process other than chooseRandomly, or if you call chooseRandomly more than once per word predicted, or if you call it when you need to return the default word (even if you ignore the results), you will not get full credit!

Part C: `predictText`

Now that we can predict individual words, we'd like predict multiple words of text. Write predictText which will take three parameters:

startWord the first word of the text that we'll predict.
howMany how many words to add (in addition to the start word).
predictions a predictions dictionary created by buildPredictions.

It will return a list of strings that starts with the given start word and then contains howMany additional words, each predicted based on the previous word using the nextWord function. Note that predictText must use a loop and it must call nextWord within that loop. Ideally, it should not call nextWord in more than one place. Also, if it calls nextWord more than once per word predicted, it will not be able to generate the correct word sequences given our testing strategy.

The default word used with the nextWord function for words not in the predictions dictionary should be the starting word.

You can use the provided buildString function to turn a list of words resulting from predictText into a simple string for easier reading (the testing code does this).

Here are some examples of results that you should get from predictText.

Testing

We've provided a test_predictText.py file which has some optimism tests for buildPredictions, nextWord, and predictText. It even tests by using a set seed to fix the sequence of random numbers generated by the random module. If you run that file, it will also print out several random word sequences at the end that you can inspect to get a feel for what kinds of text can be generated.

Notes

As usual, each function you write must be documented.
You shouldn't create any variables you don't make use of, nor should you create any values which you don't either store in a variable or make use of. Note that the loop variable of a for loop is ignored when checking for wasted boxes, because sometimes you want to use a for loop but don't actually need the loop variable for anything.

Challenge: `resample`

This part is entirely optional, and is not graded or worth any points.

As an extra challenge, you can try writing a function called resample which takes an input filename and an output filename, along with a number of words. It should read the input file, build a predictions dictionary based on the words in that file, and then use that dictionary to generate the requested number of words (starting with the same first word as the file does). The generated words should be written, separated by spaces, to the output file.

We have included a few novels from Project Gutenberg in the novels directory of the starter code in case you'd like to try out generating text from a larger set of input words. If you try this, you'll quickly find that the results are not very coherent: it takes a bit more than just predicting based on a single previous word to actually get good text predictions.

Examples

buildPredictions examples

These examples demonstrate how buildPredictions works. Notice how in the second example, the word 'blue' appears twice in the list of words following 'the', because 'the blue' appears twice in the input.

In []:buildPredictions(['ONE', 'two', 'one', 'three'])
Out[]:{'one': ['two', 'three'], 'two': ['one']}

In []:buildPredictions(['The', 'blue', 'whale', 'wore', 'the', 'blue', 'hat.'])
Out[]:{
 'the': ['blue', 'blue'],
 'blue': ['whale', 'hat.'],
 'whale': ['wore'],
 'wore': ['the'],
}

nextWord examples

These examples demonstrate how nextWord works.

In []:import random
random.seed(111)  # Fixes a specific sequence of random decisions
predictions = buildPredictions([ "ONE", "two", "one", "three" ])
print(nextWord("one", predictions, 'DEFAULT'))  # two possible outcomes
print(nextWord("two", predictions, 'DEFAULT'))  # only one possibility
print(nextWord("three", predictions, 'DEFAULT'))  # only one possibility

Printstwo
one
DEFAULT

In []:import random
random.seed(111)  # Fixes a specific sequence of random decisions
blueWhale = [ "The", "blue", "whale", "wore", "the", "blue", "hat." ]
blue = buildPredictions(blueWhale)
print(nextWord("the", blue, "the"))
print(nextWord("blue", blue, "the"))
print(nextWord("whale", blue, "the"))
print(nextWord("hat.", blue, "the"))

Printsblue
hat.
wore
the

predictText examples

These examples show how predictText should work. Note that one of the arguments to predictText is the result of buildPredictions. Also notice how calling the function multiple times with the same argument gives different results, because the choices are random.

In []:import random
random.seed(111)  # Fixes a specific sequence of random decisions
predictions = buildPredictions([ "ONE", "two", "one", "three" ])
predictText("one", 6, predictions)

Out[]:['one', 'two', 'one', 'three', 'one', 'two', 'one']

In []:import random
random.seed(111)  # Fixes a specific sequence of random decisions
predictions = buildPredictions([ "ONE", "two", "one", "three" ])
predictText("one", 10, predictions)

Out[]:[
  'one',
  'two',
  'one',
  'three',
  'one',
  'two',
  'one',
  'three',
  'one',
  'two',
  'one',
]

In []:import random
random.seed(111)  # Fixes a specific sequence of random decisions
blueWhale = [ "The", "blue", "whale", "wore", "the", "blue", "hat." ]
blue = buildPredictions(blueWhale)
predictText("the", 10, blue)

Out[]:[
  'the',
  'blue',
  'hat.',
  'the',
  'blue',
  'whale',
  'wore',
  'the',
  'blue',
  'whale',
  'wore',
]

In []:import random
random.seed(111)  # Fixes a specific sequence of random decisions
blueWhale = [ "The", "blue", "whale", "wore", "the", "blue", "hat." ]
blue = buildPredictions(blueWhale)
print(predictText("the", 10, blue))
print(predictText("the", 10, blue))
print(predictText("the", 10, blue))

Prints['the', 'blue', 'hat.', 'the', 'blue', 'whale', 'wore', 'the', 'blue', 'whale', 'wore']
['the', 'blue', 'whale', 'wore', 'the', 'blue', 'hat.', 'the', 'blue', 'whale', 'wore']
['the', 'blue', 'whale', 'wore', 'the', 'blue', 'hat.', 'the', 'blue', 'hat.', 'the']

Rubric

Group goals:

All functions are documented

Each function you define must include a non-empty documentation string as the very first thing in the function.

buildPredictions must return the correct result

The result returned when your buildPredictions function is run must match the solution result.

nextWord must return the correct result

The result returned when your nextWord function is run must match the solution result.

predictText must return the correct result

The result returned when your predictText function is run must match the solution result.

Define buildPredictions

Use def to define buildPredictions

Call lower

Within the definition of buildPredictions, call lower in at least one place.

Use any kind of loop

Within the definition of buildPredictions, use any kind of loop in at least one place.

Define nextWord

Use def to define nextWord

Call lower

Within the definition of nextWord, call lower in at least one place.

Do not use any kind of loop

Within the definition of nextWord, do not use any kind of loop.

Call chooseRandomly

Within the definition of nextWord, call chooseRandomly in exactly one place.

Define predictText

Use def to define predictText

Use any kind of loop

Within the definition of predictText, use any kind of loop in at least one place.

Call nextWord

Within the loop within the definition of predictText, call nextWord in at least one place.

Do not ignore the results of any fruitful function calls

According to the "Don't waste fruit" principle, every place you call a fruitful function (built-in or custom) you must store the result in a variable, or that function call must be part of a larger expression that uses its return value.

Do not create any variables that you never make use of

According to the "Don't waste boxes" principle, every time you create a variable (using = or by defining a parameter for a function) you must also later use that variable as part of another expression. If you need to create a variable that you won't use, it must have the name _, but you should only do this if absolutely necessary.

buildPredictions must return the correct result

The result returned when your buildPredictions function is run must match the solution result.

nextWord must return the correct result

The result returned when your nextWord function is run must match the solution result.

predictText must return the correct result

The result returned when your predictText function is run must match the solution result.

Define buildPredictions

Use def to define buildPredictions

Use any kind of loop

Within the definition of buildPredictions, use any kind of loop in exactly one place.

Define predictText

Use def to define predictText

Call nextWord

Within the definition of predictText, call nextWord in exactly one place.

Instructions for predictText

Background

Part A: buildPredictions

Part B: nextWord

Part C: predictText

Testing

Notes

Challenge: resample

Examples

Rubric

Part A: `buildPredictions`

Part B: `nextWord`

Part C: `predictText`

Challenge: `resample`