Instructions for predictText

(produced at 09:32 a.m. UTC on 2021-11-14)

This task is part of ps08 which is due at 23:59 EST on 2021-11-16.

You have the option to work with a partner on this task if you wish. Working with a partner requires more work to coordinate schedules, but if you work together and make sure that you are both understanding the code you write, you will make progress faster and learn more.

You can submit this task using this link.

Put all of your work for this task into the file predictText.py
(which is provided among the starter files)

Note: There is a video walkthrough of these task instructions available at: https://youtu.be/_VmY3y7J25k. It does not show the latest instructions, but the changes to the problem are very minor, so it is still relevant. If you watch the video, make sure to double-check the current rubric and not rely on the video for the details of what is required!

For this task, you will write a predictive text algorithm similar to the algorithm that smartphones use to suggest words as you type (the algorithm at work in the "let autocorrect fill in the rest" meme format, for example in the replies to this tweet ).

Background

How does your phone know what words to suggest when you're typing? Although the reality is slightly more complicated, the basic answer is: it knows which words you've typed before, and keeps track of which words tend to follow each other. Then it looks at the last word you typed, and makes a prediction based on that.

We're going to implement a simplified form of that using dictionaries to store the associations between each word in the text and a list of words that we've observed following that original word. For example, if we observe the words:

[ "she", "said", "that", "she", "was", "tired" ]

We will build this dictionary:

{
    "she": [ "said", "was" ],
    "said": [ "that" ]
    "that": [ "she" ]
    "was": [ "tired" ]
}

That dictionary represents the fact that after the word "she", we've seen the words "said" and "was" appear, after the word "said", the only word we've observed is "that," etc. Note that "tired" does not appear as a key in the dictionary, because we haven't seen any words following it.

To use this dictionary to make predictions, if we start with the word "she", we will pick either "said" or "was" as the next word. If we picked "was", then we would follow that with "tired" as the only choice, after which we are out of options. In that case, our algorithm will have to do something else, like pick a word from among the keys of the dictionary.

If we use this dictionary to make a lot of predictions, we might get sentences like the following (since it's a random process, there's no guarantee of exactly what would be picked):

  • She said that she was tired.
  • She said that she said that she said that she was tired.
  • She was tired.
  • She said that she was tired that she said that she was tired.

Note that the second instance of "that" in the last example above (bolded) was picked from the dictionary's keys because there are no predictions after "tired."

Part A: buildPredictions

To start with, we'll build a function which takes in a list of words (strings) and returns a prediction dictionary as described above.

For each word in the given text except the last one, a key should appear in the resulting dictionary, and every word observed after a copy of the key word should appear in a list following that initial word (note that these lists should include duplicates if a pair of words appears together more than once).

Your function must be fruitful and return the dictionary that it builds. Also note that you must use the lowercase version of all words for both keys and values in the dictionary.

If given an empty list or a list with just one word in it, buildPredictions should return an empty dictionary.

These examples demonstrate how buildPredictions works.

Part B: predictText

Now that we can build prediction dictionaries, we'd like to use them to predict some text. Write predictText which will take four parameters:

  • startWord the first word of the text that we'll predict.
  • howMany how many words to add (in addition to the start word).
  • predictions a predictions dictionary created by buildPredictions, which will have at least one key (and at least one word in the value for that key).
  • policy a policy string which determines which prediction to pick when there are multiple options (either multiple prediction possibilities, or multiple keys to pick from when no predictions are available). This string should be one of:
    • 'first' to choose the first available option.
    • 'last' to choose the last available option.
    • 'random' to choose a random option (must use the chooseRandomly function that we've supplied).

It will return a list of strings that starts with the given start word and then contains howMany additional words, each predicted based on the previous word.

There are two types of predictions:

  1. If the previous word is in our predictions dictionary, we will use the specified policy to pick a word from that word's associated value (which is a list of words).
  2. If the previous word does not appear in our predictions dictionary, we will use the specified policy to pick a key from our dictionary. Use list(predictions.keys()) to convert the keys into a list so that you can index it or pass it to the chooseRandomly function.

The entire process simply involves making these predictions repeatedly, adding one word to the result each time, until the required number of words have been added. Note that the startWord will always be provided in lower case.

Important: in 'random' mode, you must call chooseRandomly exactly once per word predicted (it matters how many calls are executed, not how many calls appear in your code, i.e., this is a process requirement, not a procedure requirement). To test your code fairly, we will replace chooseRandomly with a non-random function during grading. If you use a random process other than chooseRandomly, or if you call chooseRandomly more than once per word predicted, you will not get full credit! As an extra goal, you can attempt to include only a single line of code that calls chooseRandomly, but the core one-call-per-iteration requriement can be satisfied even if there are multiple lines of code that would call chooseRandomly as long as they're in different branches of a conditional.

You can use the provided buildString function to turn a list of words resulting from predictText into a simple string for easier reading.

Here are some examples of results that you should get from predictText.

And here are a few examples of what the output might look like with the 'random' policy.

Testing

We've provided some optimism tests for buildPredictions and predictText, but they are incomplete. This isn't part of the rubric, but to ensure your functions are working properly, you should fill in the parts of the test function where comments indicate there is something "TODO." You should also test your predictText function's behavior when the 'random' policy is used by running it an inspecting that the results always pair words together that were paired in the original word list, except when a word with no following words (i.e., the last word of the original list) is encountered. Because it's random, you can't really use optimism to test the 'random' policy.

Notes

Challenge: buildDoublePredictions and predictTextAdvanced

This part is entirely optional, and is not graded or worth any points.

You may have noticed that the predictions above are often repetitive and turn into nonsense. To get better predictions (more like what real predictive text does) we could look at the last two words of the text, instead of just the last word, and make a prediction based on that. Of course, to do that, our predictions dictionary would have to use pairs of words as its keys. As an optional, ungraded challenge, try to write a buildDoublePredictions function to create a two-word prediction dictionary, and then a predictTextAdvanced function which can use those dictionaries to predict text more accurately. Note: you may have to fall back to one-word predictions in some cases. If you pursue this challenge, the precise function definitions should be:

def buildDoublePredictions(words):
    ...


def predictTextAdvanced(
    startWord,
    howMany,
    singlePredictions,
    doublePredictions,
    policy
):
    ...

Examples

buildPredictions examples

These examples demonstrate how buildPredictions works. Notice how in the second example, the word 'blue' appears twice in the list of words following 'the', because 'the blue' appears twice in the input.

In []:
buildPredictions(['ONE', 'two', 'one', 'three'])
Out[]:
{'one': ['two', 'three'], 'two': ['one']}
In []:
buildPredictions(['The', 'blue', 'whale', 'wore', 'the', 'blue', 'hat.'])
Out[]:
{ 'the': ['blue', 'blue'], 'blue': ['whale', 'hat.'], 'whale': ['wore'], 'wore': ['the'], }

predictText examples

These examples show how predictText should work. Note that the input to predictText comes in part from the result of buildPredictions.

In []:
predictions = buildPredictions([ "ONE", "two", "one", "three" ]) predictText("one", 6, predictions, 'first')
Out[]:
['one', 'two', 'one', 'two', 'one', 'two', 'one']
In []:
predictions = buildPredictions([ "ONE", "two", "one", "three" ]) predictText("one", 10, predictions, 'last')
Out[]:
[ 'one', 'three', 'two', 'one', 'three', 'two', 'one', 'three', 'two', 'one', 'three', ]
In []:
blue = buildPredictions(blueWhale) predictText("the", 10, blue, 'first')
Out[]:
[ 'the', 'blue', 'whale', 'wore', 'the', 'blue', 'whale', 'wore', 'the', 'blue', 'whale', ]
In []:
blue = buildPredictions(blueWhale) print(buildString(predictText("the", 10, blue, 'first'))) print(buildString(predictText("the", 10, blue, 'last')))
Prints
The blue whale wore the blue whale wore the blue whale The blue hat. Wore the blue hat. Wore the blue hat.

predictText random policy

predictText random policy examples. Notice how calling the function multiple times with the same argument gives different results, because the choices are random.

In []:
blue = buildPredictions(blueWhale) print(buildString(predictText("the", 15, blue, 'random'))) print(buildString(predictText("the", 15, blue, 'random'))) print(buildString(predictText("the", 15, blue, 'random')))
Prints
The blue hat. Whale wore the blue whale wore the blue hat. Wore the blue whale The blue whale wore the blue whale wore the blue hat. The blue hat. Whale wore The blue hat. Blue hat. Whale wore the blue hat. Whale wore the blue hat. The

Rubric

 
unknown Style Requirements
How your code is written.
 
unknown Core goals
Complete all core goals for core credit. Get partial credit for completing at least half, and more partial credit for completing at least 90%.
 
unknown All functions are documented
Each function you define must include a non-empty documentation string as the very first thing in the function.
 
unknown Procedure Requirements
What code you use to solve the problem.
 
unknown Core goals
Complete all core goals for core credit. Get partial credit for completing at least half, and more partial credit for completing at least 90%.
 
unknown Define buildPredictions
Use def to define buildPredictions
 
unknown Call lower
Within the definition of buildPredictions, call lower in at least once place.
 
unknown Use a loop
Within the definition of buildPredictions, use any kind of loop in at least once place.
 
unknown Define predictText
Use def to define predictText
 
unknown Use list(predictions.keys()).
Within the definition of predictText, you must use the exact expression list(predictions.keys()) to access the collection of all possible words as a list, in order to handle the case for a words which do not appear in the predictions dictionary as a key.
 
unknown Use a loop
Within the definition of predictText, use any kind of loop in at least once place.
 
unknown Use a conditional
Within the loop within the definition of predictText, use an if statement (possibly accompanied by an elif or else block) in at least once place.
 
unknown Call chooseRandomly
Within the loop within the definition of predictText, call chooseRandomly in at least once place.
 
unknown Extra goals
Complete all extra goals in addition to the core goals for a perfect score.
 
unknown Do not ignore the results of any fruitful function calls
According to the "Don't waste fruit" principle, every place you call a fruitful function (built-in or custom) you must store the result in a variable, or that function call must be part of a larger expression that uses its return value.
 
unknown Do not create any variables that you never make use of
According to the "Don't waste boxes" principle, every time you create a variable (using = or by defining a parameter for a function) you must also later use that variable as part of another expression. If you need to create a variable that you won't use, it must have the name _, but you should only do this if absolutely necessary.
 
unknown Define buildPredictions
Use def to define buildPredictions
 
unknown Use a loop
Within the definition of buildPredictions, use any kind of loop in exactly one place.
 
unknown Define predictText
Use def to define predictText
 
unknown Use a loop
Within the definition of predictText, use any kind of loop in at least once place.
 
unknown Call chooseRandomly
Within the loop within the definition of predictText, call chooseRandomly in at most one place.
 
unknown Product Requirements
Your code's result values.
 
unknown Core goals
Complete all core goals for core credit. Get partial credit for completing at least half, and more partial credit for completing at least 90%.
 
unknown buildPredictions must return the correct result
The result returned when your buildPredictions function is run must match the solution result.
 
unknown predictText must return the correct result
The result returned when your predictText function is run must match the solution result.
 
unknown Extra goals
Complete all extra goals in addition to the core goals for a perfect score.
 
unknown buildPredictions must return the correct result
The result returned when your buildPredictions function is run must match the solution result.
 
unknown predictText must return the correct result
The result returned when your predictText function is run must match the solution result.