
Word Frequency Detection

Here is a method that detects which words of a sentence have been used frequently in a list of other sentences:

	public static ObjectList usedBefore (int minSize, int minOccur,
	                                     String sentence, ObjectList sentences) {
		// Assumes sentences is a list of sentence strings.
		// Returns a list of the lowercased words in sentence that are at least
		// minSize characters long and that appear in at least minOccur of the
		// sentences in sentences.
		ObjectList words = mapLowerCase(filterSize(minSize, parseWords(sentence)));
		return wordsUsedBefore(minOccur, words, sentences);
	}

In addition to parseWords(), the usedBefore() method depends on the following methods, which you will define:

public static ObjectList mapLowerCase (ObjectList L)
Assumes L is a list of strings. Returns a list of strings whose elements are the lowercase versions of the strings in L. Use the String instance method toLowerCase() to find the lowercase version of a string.
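A minimal sketch of mapLowerCase(), assuming java.util.List<String> as a stand-in for the course's ObjectList type (whose API is not shown in this handout). It builds a new list instead of modifying L in place. The sample list in main() is modeled on the L1 examples below and is for illustration only; List.of needs Java 9 or later.

```java
import java.util.ArrayList;
import java.util.List;

public class MapLowerCaseSketch {
    // Sketch only: java.util.List<String> stands in for the course's ObjectList.
    // Returns a new list; the input list L is left unchanged.
    public static List<String> mapLowerCase(List<String> L) {
        List<String> result = new ArrayList<>();
        for (String s : L) {
            // Same position in the output, lowercase version of the element.
            result.add(s.toLowerCase());
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample list modeled on L1.
        List<String> sample = List.of("You", "are", "not", "LISTENING", "to", "me", "!");
        System.out.println(mapLowerCase(sample)); // prints [you, are, not, listening, to, me, !]
    }
}
```

Non-letter strings such as "!" pass through unchanged, because toLowerCase() only affects characters that have a lowercase form.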

public static ObjectList filterSize (int size, ObjectList L)
Assumes L is a list of strings. Returns a list, in the same order, of the strings in L whose lengths are at least size characters. Use the String instance method length() to measure the length of a string.
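A minimal sketch of filterSize(), again assuming java.util.List<String> in place of ObjectList. A single pass over L keeps the qualifying strings in their original order, and the comparison is `>=` because the specification says "at least size characters". The sample list is hypothetical, modeled on the L1 examples below.

```java
import java.util.ArrayList;
import java.util.List;

public class FilterSizeSketch {
    // Sketch only: java.util.List<String> stands in for the course's ObjectList.
    // Keeps, in their original order, the strings with at least `size` characters.
    public static List<String> filterSize(int size, List<String> L) {
        List<String> result = new ArrayList<>();
        for (String s : L) {
            if (s.length() >= size) {
                result.add(s);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample list modeled on L1.
        List<String> sample = List.of("You", "are", "not", "LISTENING", "to", "me", "!");
        System.out.println(filterSize(3, sample)); // prints [You, are, not, LISTENING]
        System.out.println(filterSize(4, sample)); // prints [LISTENING]
    }
}
```

Note that filterSize() does not change case; in usedBefore() the lowercasing happens afterward, in mapLowerCase().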

public static ObjectList wordsUsedBefore (int minOccur, ObjectList words, ObjectList sentences)
Assumes words is a list of word strings and sentences is a list of sentence strings. Returns a list, in the same order, of the strings in words that appear in at least minOccur of the sentences.

To define wordsUsedBefore(), it is helpful to first write the following auxiliary method:

public static int occurrences (String word, ObjectList sentences)
Assumes sentences is a list of sentence strings. Returns the number of sentences in which word appears.
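A minimal sketch of occurrences() and of wordsUsedBefore() built on top of it, assuming java.util.List<String> in place of ObjectList. The handout does not say exactly what "appears" means, so this sketch assumes a whole-word, case-sensitive match: each sentence is split on runs of non-word characters (the regex `\W+`), and a sentence counts once if any token equals word. The test data in main() is the testSentences/testWords data from the Examples.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class WordsUsedBeforeSketch {
    // Assumption: a word "appears" in a sentence when it equals one of the
    // sentence's tokens, where tokens are separated by runs of non-word
    // characters (\W+). Matching is case-sensitive.
    public static int occurrences(String word, List<String> sentences) {
        int count = 0;
        for (String sentence : sentences) {
            List<String> tokens = Arrays.asList(sentence.split("\\W+"));
            if (tokens.contains(word)) {
                count++; // each sentence is counted at most once
            }
        }
        return count;
    }

    // Keeps, in their original order, the words found in at least minOccur sentences.
    public static List<String> wordsUsedBefore(int minOccur, List<String> words,
                                               List<String> sentences) {
        List<String> result = new ArrayList<>();
        for (String w : words) {
            if (occurrences(w, sentences) >= minOccur) {
                result.add(w);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> testSentences = List.of(
            "Last night I had many dreams.",
            "I pursue my dreams.",
            "In my dreams, I win the lottery.",
            "I like the night.",
            "You are helpful.");
        List<String> testWords = List.of("Last", "night", "I", "had", "many", "dreams");
        System.out.println(occurrences("dreams", testSentences));         // prints 3
        System.out.println(wordsUsedBefore(2, testWords, testSentences)); // prints [night, I, dreams]
    }
}
```

Matching whole tokens keeps "I" from being counted inside "In". Because usedBefore() lowercases its candidate words but not the sentences, a case-insensitive comparison (for example, equalsIgnoreCase) may suit it better. Both choices give the same results on the test cases below.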

Examples

Here are test cases for mapLowerCase() and filterSize(), using the test lists L0 through L3 from the previous part:

mapLowerCase(L0) = [ ]
mapLowerCase(L1) = [you, are, not, listening, to, me, !]
mapLowerCase(L2) = [i, am, what, i, am, ;, you, are, what, you, are]
mapLowerCase(L3) = [i, am, saying, that, my, goal, is, to, get, what, is, mine, .]
 
filterSize(3, L0) = [ ]
filterSize(4, L0) = [ ]
filterSize(5, L0) = [ ]
filterSize(3, L1) = [You, are, not, LISTENING]
filterSize(4, L1) = [LISTENING]
filterSize(5, L1) = [LISTENING]
filterSize(3, L2) = [what, you, are, what, you, are]
filterSize(4, L2) = [what, what]
filterSize(5, L2) = [ ]
filterSize(3, L3) = [saying, that, goal, get, what, mine]
filterSize(4, L3) = [saying, that, goal, what, mine]
filterSize(5, L3) = [saying]

Suppose we define testSentences as a list of the following sentences:

[Last night I had many dreams.,
 I pursue my dreams.,
 In my dreams, I win the lottery.,
 I like the night.,
 You are helpful.
 ]

Further suppose that we define testWords as a list of the following words:

[Last, night, I, had, many, dreams]

Then here are some test cases involving occurrences() and wordsUsedBefore():

occurrences("I", testSentences) = 4
occurrences("dreams", testSentences) = 3
occurrences("night", testSentences) = 2
occurrences("lottery", testSentences) = 1
occurrences("cat", testSentences) = 0
 
wordsUsedBefore(1, testWords, testSentences) = [Last, night, I, had, many, dreams]
wordsUsedBefore(2, testWords, testSentences) = [night, I, dreams]
wordsUsedBefore(3, testWords, testSentences) = [I, dreams]
wordsUsedBefore(4, testWords, testSentences) = [I]
wordsUsedBefore(5, testWords, testSentences) = [ ]