This task is part of project09 which is due at 23:00 EDT on 2024-04-09.
You have the option to work with a partner on this task if you wish. Working with a partner requires more work to coordinate schedules, but if you work together and make sure that you are both understanding the code you write, you will make progress faster and learn more.
You can download the starter code for this task using this link.
You can submit this task using this link.
Put all of your work for this task into the file
predictText.py
(which is provided among the starter files)
For this task, you will write a predictive text algorithm similar to the algorithm that smartphones use to suggest words as you type (the algorithm at work in the "let autocorrect fill in the rest" meme format, for example in the replies to this tweet ).
How does your phone know what words to suggest when you're typing? Although the reality is slightly more complicated, the basic answer is: it knows which words you've typed before, and keeps track of which words tend to follow each other. Then it looks at the last word you typed, and makes a prediction based on that.
We're going to implement a simplified form of that using dictionaries to store the associations between each word in the text and a list of words that we've observed following that original word. For example, if we observe the words:
[ "she", "said", "that", "she", "was", "tired" ]
We will build this dictionary:
{
"she": [ "said", "was" ],
"said": [ "that" ]
"that": [ "she" ]
"was": [ "tired" ]
}
That dictionary represents the fact that after the word "she", we've seen the words "said" and "was" appear, after the word "said", the only word we've observed is "that," etc. Note that "tired" does not appear as a key in the dictionary, because we haven't seen any words following it.
To use this dictionary to make predictions, if we start with the word "she", we will pick either "said" or "was" as the next word. If we picked "was", then we would follow that with "tired" as the only choice, after which we are out of options. In that case, our algorithm will have to do something else, like use a default word.
If we use this dictionary to make a lot of predictions, we might get sentences like the following (since it's a random process, there's no guarantee of exactly what would be picked):
Note that the third instance of "she" in the last example above (bolded) was a default word because the dictionary has no predictions after "tired."
buildPredictions
To start with, we'll build a function which takes in a list of words (strings) and returns a prediction dictionary as described above.
For each word in the given text except the last one, a key should appear in the resulting dictionary, and every word observed after a copy of the key word should appear in a list following that initial word (note that these lists should include duplicates if a pair of words appears together more than once).
Your function must be fruitful and return the dictionary that it
builds. Also note that you must
use the lowercase
version of all words
for both keys and values in the dictionary. buildPredictions
must use
a loop and ideally should
use only a single loop.
If given an empty list or a list with just one word in it,
buildPredictions
should return an empty
dictionary.
These examples demonstrate how buildPredictions
works.
nextWord
Now we need to use the predictions dictionary to predict words: Write a
nextWord
function which takes three
parameters: a current word, a predictions dictionary, and a default word.
nextWord
should use the chooseRandomly
function (in exactly 1
place) to pick and
return a random word from the list of words stored in the predictions
dictionary as observations of words that followed the current
word. But if there are no observations for
that current word in the dictionary, it must not call
chooseRandomly
and should instead return the specified default
word.
Here are some examples of results that you should get from
nextWord
.
Note that nexWord
must use
.lower
to ensure that it
looks up the lowercase version of the given word, since the current word
may be capitalized but all entries in the predictions dictionary are
lower-case. Also, nextWord
does not need to and may not use a
loop.
Important: you must call
chooseRandomly
exactly once per word predicted (it matters how many calls are
executed, not just how many appear in your code). To test your code
fairly, we will replace chooseRandomly
with a non-random function
during grading. If you use a random process other than chooseRandomly
,
or if you call chooseRandomly
more than once per word predicted, or if
you call it when you need to return the default word (even if you ignore
the results), you will not get full credit!
predictText
Now that we can predict individual words, we'd like predict multiple
words of text. Write predictText
which will take three parameters:
startWord
the first word of the text that we'll predict.howMany
how many words to add (in addition to the start word).predictions
a predictions dictionary created by buildPredictions
.It will return a list of strings that
starts with the given start word and then contains howMany
additional
words, each predicted based on the previous word using the nextWord
function. Note that
predictText
must use a
loop and it must call
nextWord
within that
loop. Ideally, it
should not call nextWord
in more than one
place. Also, if it
calls nextWord
more than once per word predicted, it will not be able
to generate the correct word sequences given our testing strategy.
The default word used with the nextWord
function for words not in the
predictions dictionary should be the starting word.
You can use the provided buildString
function to turn a list of words
resulting from predictText
into a simple string for easier reading (the
testing code does this).
Here are some examples of results that you should get from
predictText
.
We've provided a test_predictText.py
file which has some optimism
tests for buildPredictions
, nextWord
, and predictText
. It even
tests by using a set seed to fix the sequence of random numbers generated
by the random
module. If you run that file, it will also print out
several random word sequences at the end that you can inspect to get a
feel for what kinds of text can be generated.
for
loop is ignored when checking for wasted boxes, because
sometimes you want to use a for
loop but don't actually need the
loop variable for anything.resample
This part is entirely optional, and is not graded or worth any points.
As an extra challenge, you can try writing a function called resample
which takes an input filename and an output filename, along with a number
of words. It should read the input file, build a predictions dictionary
based on the words in that file, and then use that dictionary to generate
the requested number of words (starting with the same first word as the
file does). The generated words should be written, separated by spaces,
to the output file.
We have included a few novels from Project
Gutenberg in the novels
directory of the
starter code in case you'd like to try out generating text from a larger
set of input words. If you try this, you'll quickly find that the results
are not very coherent: it takes a bit more than just predicting based on
a single previous word to actually get good text predictions.
buildPredictions
examples
These examples demonstrate how buildPredictions
works. Notice how in
the second example, the word 'blue' appears twice in the list of words
following 'the', because 'the blue' appears twice in the input.
In []:Out[]:buildPredictions(['ONE', 'two', 'one', 'three'])
In []:{'one': ['two', 'three'], 'two': ['one']}
Out[]:buildPredictions(['The', 'blue', 'whale', 'wore', 'the', 'blue', 'hat.'])
{ 'the': ['blue', 'blue'], 'blue': ['whale', 'hat.'], 'whale': ['wore'], 'wore': ['the'], }
nextWord
examples
These examples demonstrate how nextWord
works.
In []:Printsimport random random.seed(111) # Fixes a specific sequence of random decisions predictions = buildPredictions([ "ONE", "two", "one", "three" ]) print(nextWord("one", predictions, 'DEFAULT')) # two possible outcomes print(nextWord("two", predictions, 'DEFAULT')) # only one possibility print(nextWord("three", predictions, 'DEFAULT')) # only one possibility
two one DEFAULTIn []:Printsimport random random.seed(111) # Fixes a specific sequence of random decisions blueWhale = [ "The", "blue", "whale", "wore", "the", "blue", "hat." ] blue = buildPredictions(blueWhale) print(nextWord("the", blue, "the")) print(nextWord("blue", blue, "the")) print(nextWord("whale", blue, "the")) print(nextWord("hat.", blue, "the"))
blue hat. wore the
predictText
examples
These examples show how predictText
should work. Note that one of the
arguments to predictText
is the result of buildPredictions
. Also
notice how calling the function multiple times with the same argument
gives different results, because the choices are random.
In []:Out[]:import random random.seed(111) # Fixes a specific sequence of random decisions predictions = buildPredictions([ "ONE", "two", "one", "three" ]) predictText("one", 6, predictions)
In []:['one', 'two', 'one', 'three', 'one', 'two', 'one']
Out[]:import random random.seed(111) # Fixes a specific sequence of random decisions predictions = buildPredictions([ "ONE", "two", "one", "three" ]) predictText("one", 10, predictions)
In []:[ 'one', 'two', 'one', 'three', 'one', 'two', 'one', 'three', 'one', 'two', 'one', ]
Out[]:import random random.seed(111) # Fixes a specific sequence of random decisions blueWhale = [ "The", "blue", "whale", "wore", "the", "blue", "hat." ] blue = buildPredictions(blueWhale) predictText("the", 10, blue)
In []:[ 'the', 'blue', 'hat.', 'the', 'blue', 'whale', 'wore', 'the', 'blue', 'whale', 'wore', ]
Printsimport random random.seed(111) # Fixes a specific sequence of random decisions blueWhale = [ "The", "blue", "whale", "wore", "the", "blue", "hat." ] blue = buildPredictions(blueWhale) print(predictText("the", 10, blue)) print(predictText("the", 10, blue)) print(predictText("the", 10, blue))
['the', 'blue', 'hat.', 'the', 'blue', 'whale', 'wore', 'the', 'blue', 'whale', 'wore'] ['the', 'blue', 'whale', 'wore', 'the', 'blue', 'hat.', 'the', 'blue', 'whale', 'wore'] ['the', 'blue', 'whale', 'wore', 'the', 'blue', 'hat.', 'the', 'blue', 'hat.', 'the']
=
or by defining a parameter for a function) you must also later use that variable as part of another expression. If you need to create a variable that you won't use, it must have the name _
, but you should only do this if absolutely necessary.buildPredictions
must return the correct result
buildPredictions
function is run must match the solution result.buildPredictions
must return the correct result
buildPredictions
function is run must match the solution result.nextWord
must return the correct result
nextWord
function is run must match the solution result.nextWord
must return the correct result
nextWord
function is run must match the solution result.predictText
must return the correct result
predictText
function is run must match the solution result.predictText
must return the correct result
predictText
function is run must match the solution result.buildPredictions
def
to define buildPredictions
lower
buildPredictions
, call lower
in at least one place.buildPredictions
, use any kind of loop in at least one place.buildPredictions
def
to define buildPredictions
buildPredictions
, use any kind of loop in exactly one place.nextWord
def
to define nextWord
lower
nextWord
, call lower
in at least one place.nextWord
, do not use any kind of loop.chooseRandomly
nextWord
, call chooseRandomly
in exactly one place.predictText
def
to define predictText
predictText
, use any kind of loop in at least one place.nextWord
predictText
, call nextWord
in at least one place.predictText
def
to define predictText
nextWord
predictText
, call nextWord
in exactly one place.