Graphic by Keith Ohlfs
CS111, Wellesley College, Fall 2001

Problem Set 9

Due on Wednesday, November 28, at 11pm


How to turn in this Problem Set

Submit the folder ps9_programs to your drop folder on the cs111 server. Before submitting your work, make sure that all the files have been saved and that the projects compile and run as they should. After submitting your work, double-check that all the files have been uploaded correctly. If you need directions on how to submit your work or how to check whether the submission was successful, please click here. Please keep a copy of your work, either on a zip disk or in your private directory (or, to play it safe, both).


Part I: Selection Sort using Vectors

Here is an implementation of an algorithm called selection sort.
 public static int minIndexBetween (int [ ] a, int lo, int hi) {
  // Return the index of the minimum integer in a[lo..hi], or -1 if the segment is empty
      int minVal = Integer.MAX_VALUE;
      int minInd = -1;
      for (int i = lo; i <= hi; i++) {
           if (a[i] < minVal) {
                minVal = a[i];
                minInd = i;
           }
      }
      return minInd;
 }

 public static void swap (int [ ] a, int i, int j) {
  // Swap the contents of a[i] and a[j]
      int temp = a[i];
      a[i] = a[j];
      a[j] = temp;
 }

 public static void selectionSort (int [ ] a) {
      for (int i = 0; i < a.length-1; i++) {
           swap(a, i, minIndexBetween(a, i, a.length-1));
      }
 }
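
To see how the array changes from pass to pass, here is a small tracing sketch. It follows the same algorithm as selectionSort above, but the class name SelectionSortTrace and the trace method are illustrative additions, and the minimum search is inlined and seeded with position i rather than with Integer.MAX_VALUE.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SelectionSortTrace {
    // Sorts a in place with selection sort and records the array after each pass.
    static List<String> trace(int[] a) {
        List<String> states = new ArrayList<>();
        for (int i = 0; i < a.length - 1; i++) {
            // Find the index of the smallest value in a[i..a.length-1].
            int minInd = i;
            for (int j = i + 1; j < a.length; j++) {
                if (a[j] < a[minInd]) {
                    minInd = j;
                }
            }
            // Move that value into position i.
            int temp = a[i];
            a[i] = a[minInd];
            a[minInd] = temp;
            states.add(Arrays.toString(a));
        }
        return states;
    }

    public static void main(String[] args) {
        // Prints:
        // [1, 2, 8, 5]
        // [1, 2, 8, 5]
        // [1, 2, 5, 8]
        for (String state : trace(new int[] {5, 2, 8, 1})) {
            System.out.println(state);
        }
    }
}
```

After pass i, positions 0..i hold the i+1 smallest values in sorted order, which is why the loop can stop one position before the end of the array.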
 

This version sorts an array of integers. Your task is to write a version that sorts a Vector of Strings. Fill in the skeletons of indexOfMinStringBetween(), swap(), and selectionSort() in the file StringVectorOps.java, which you will find in the vectors subfolder of the ps9_programs folder.

As always, you should write a draft of your methods on paper, then add them gradually to a working program, testing each method in isolation before continuing.

 public static int indexOfMinStringBetween (Vector vec, int lo, int hi) {
  // vec is assumed to be a Vector of Strings which contains valid Strings in positions lo..hi at least
  // Returns the index of the minimum String in vec[lo..hi], or -1 if the segment is empty
  // Uses the String instance method compareTo()
      String minVal = "zzzzz";
      int minInd = -1;
 
 
 
 
 

 }
 

 public static void swap (Vector vec, int i, int j) {
  // vec is any Vector with valid Objects (not necessarily Strings) in positions i and j
  // Swaps the contents of vec[i] and vec[j]
 
 
 
 
 

 }
 

 public static void selectionSort (Vector vec) {
  // vec is assumed to be a Vector of Strings
  // sorts vec into alphabetical order through selection sort, using the Vector version of swap
 
 
 
 
 

 }
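
The skeletons above rely on two library features: String comparison with compareTo() and element access on a Vector. The sketch below shows only those building blocks, not a solution. The class name VectorStringBasics and its helper methods are illustrative assumptions, and the Vector is declared as Vector<Object> so that the cast mirrors what the untyped Vector in the skeletons requires.

```java
import java.util.Vector;

class VectorStringBasics {
    // True when s comes strictly before t in compareTo order.
    static boolean comesBefore(String s, String t) {
        return s.compareTo(t) < 0;
    }

    // Reads position i of a Vector holding Strings; elementAt returns Object, so a cast is needed.
    static String stringAt(Vector<Object> vec, int i) {
        return (String) vec.elementAt(i);
    }

    public static void main(String[] args) {
        Vector<Object> vec = new Vector<>();
        vec.addElement("pear");
        vec.addElement("apple");
        vec.setElementAt("plum", 0);                        // overwrite position 0
        System.out.println(stringAt(vec, 0));               // plum
        System.out.println(comesBefore("apple", "plum"));   // true
        System.out.println(comesBefore("Zebra", "apple"));  // true
    }
}
```

Note that compareTo() is case-sensitive: uppercase letters come before lowercase ones, so "Zebra" precedes "apple".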


Notes:


Part II: The Day-to-Day Database Crisis!

Now that you have mastered the basic elements of Java programming, you have decided to do some work as a programming consultant to earn a little extra holiday spending money. You answer an ad from the Day-To-Day temporary employment agency concerning some troubles they are having with their employee database. The human resources department of Day-To-Day uses a simple database program (described below) that maintains the name and email address of each active temporary employee. Unfortunately, the creators of this database program only tested the program on small test databases and did not worry about how long certain operations would take on large databases. In particular, it turns out that the database program is rather inefficient at adding each new person, so that large databases can take a very long time to create. Your job is to rewrite a portion of the database code in a more efficient form so that large databases will be created more quickly. Because the company does not want to have to retrain its human resource staff to work with a new database program, they have asked you to modify the existing program so that it has the same behavior from the user's perspective, except that certain operations should be faster.

The Features of the Database Program

The database program used by Day-To-Day can be found in the Database programs folder in the ps9_programs download folder. Run the applet DatabaseWorld.html within this folder. You will see that the program consists of a window with several buttons at the left as shown below:

If you press the Load DB File button, you will be prompted to select a database file using the usual Macintosh file dialog mechanism. All files ending in the extension .db are database files.

Loading Database Files:

There are three database files: test4.db, test6.db, and test10.db. They are located in the Java Classes subfolder of the Database folder. When you click on the button "Load DB File", a window will open asking you to select a file to load. You need to select the Database folder, and inside it the subfolder Java Classes, then select a database file. If you select test6.db, the database window will be updated to show the six personnel entries in this file, as shown below:

Each personnel entry has four fields: a last name, a first name, a middle name or initial (which may be empty), and an email name. An entry may be selected by clicking on it. At most one entry can be selected at any one time.

The other buttons manipulate the entries as described below.

You should play around with the applet to get a feel for these operations before attempting the rest of this problem.

The Database Class Interface

The database program is rather large. Fortunately, due to the wonders of data abstraction, it turns out that you only have to understand the Database class defined in the source code file Database.java. Instances of the Database class maintain an ordered collection of Person objects, each of which represents one entry in the database. The Database class is in some sense the heart of the database program --- all of the other classes in the program provide support for interacting with Database objects or the Person objects that they hold.

The Database class has the following interface:

Constructor:

public Database()
   Create an empty database (one that contains no persons).


Instance Methods:

public int size()
   Return the number of persons in the database.
    
public void clear()
   Make this database empty. 
    
public void add(Person p)
   Add person p to the end of this database. 
    
public void remove(String s)
   Remove the person with description string s from this database.
   If no person in the database has description string s, do nothing.
   (The description string of a Person object is the entry string that
   appears in the database window. This string is produced by the
   String toString() method of the Person class. The
   boolean hasString (String s) method of the Person
   class is used to test if a person has a given description string.)
    
public void sort (Comparator comp)
   Sort the entries of the database according to the comp object. A
   Comparator object specifies the ordering of two Person objects via a
   boolean lessThan (Person p1, Person p2) method. 
    
public void print()
   Display the entries of the database in the stdout window. 
        
public void print(PrintStream ps)
   Write the entries of the database to the print stream ps. 
    This method is used to save databases to a file. 
        
public String [ ] entryList()
   Return an array of description strings for the entries of the database.
    The description strings have the same order as the database entries.
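
To make the Comparator idea concrete, here is a minimal sketch of what such an object might look like. Only the lessThan signature comes from the interface above. The Person stand-in (with just a last name), the ByLastName class, and the wrapper class ComparatorSketch are hypothetical; the real Person and Comparator definitions are in the course code.

```java
class ComparatorSketch {
    // Hypothetical stand-in for the course's Person class (only a last name).
    static class Person {
        final String lastName;

        Person(String lastName) {
            this.lastName = lastName;
        }
    }

    // Shape described in the interface above; the real declaration is in the course code.
    interface Comparator {
        boolean lessThan(Person p1, Person p2);
    }

    // Hypothetical comparator: orders persons by last name.
    static class ByLastName implements Comparator {
        public boolean lessThan(Person p1, Person p2) {
            return p1.lastName.compareTo(p2.lastName) < 0;
        }
    }

    public static void main(String[] args) {
        Comparator comp = new ByLastName();
        System.out.println(comp.lessThan(new Person("Adams"), new Person("Baker"))); // true
        System.out.println(comp.lessThan(new Person("Baker"), new Person("Adams"))); // false
    }
}
```

Because sort takes the ordering as an argument, the same sorting code can order the database by different fields simply by passing a different Comparator object.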

The Database Class Implementation

There are zillions of ways to implement the Database class interface presented above. The implementation in the file Database.java represents a database with n entries as an array (named people) of n Person objects. You should study the code in Database.java to see how each of the above constructors and methods is implemented in terms of this representation. In particular, pay attention to the implementation of the add and remove methods.

The add method adds a person p to the end of a database as follows:

The remove method removes the person with description string s as follows:

These operations do a lot of copying work every time an entry is added to or removed from the database. It turns out that loading a database with n entries from a file calls the add method n times. Because of all the copying involved, it can take a long time to load in a large database. We will see in CS230 that the loading process takes time quadratic in the size of the database --- i.e., it is proportional to the square of the size of the database. This is bad; we would like it to take time linear in the size of the database --- i.e., it should take time proportional to the size of the database.
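
To see why repeated copying leads to quadratic time, here is a small counting sketch. It assumes that each add allocates a new array one element longer and copies every existing entry into it. That assumption, the class name GrowByOneCost, and the use of Strings in place of Person objects are illustrative and are not taken from Database.java.

```java
class GrowByOneCost {
    // Adds n entries, growing the array by one element on every add,
    // and returns the total number of element copies performed.
    static long copiesForAdds(int n) {
        String[] people = new String[0];
        long copies = 0;
        for (int k = 0; k < n; k++) {
            String[] bigger = new String[people.length + 1];
            for (int i = 0; i < people.length; i++) {
                bigger[i] = people[i];
                copies++;
            }
            bigger[people.length] = "entry" + k;  // new entry goes at the end
            people = bigger;
        }
        return copies;
    }

    public static void main(String[] args) {
        System.out.println(copiesForAdds(10));    // 45
        System.out.println(copiesForAdds(100));   // 4950
        System.out.println(copiesForAdds(1000));  // 499500
    }
}
```

Under this assumption the k-th add copies k-1 entries, so n adds perform 0 + 1 + ... + (n-1) = n(n-1)/2 copies: multiplying the database size by 10 multiplies the copying work by roughly 100.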

The purpose of part A of this problem is to explore an alternative representation of databases that makes add and remove more efficient. Part B explores the use of Vectors as the basis for yet another implementation, one that is rather convenient from the programmer's point of view.
 
 

Part A: An Alternative Database Representation Using Arrays

In this problem, you will implement an alternative database representation that improves the efficiency of the add and remove methods. One way to make the add method more efficient is to start with a database array of a small fixed size (for example, 4 elements) instead of zero elements. You can then add entries without increasing the array size until you fill that array. When you reach the limit for that array, you create a new array that is double the size of the old array and copy the old array into the new one (much like the current version of the program does). Thus, when you add the 5th person, you would create a new array of 8 elements, copy the 4 existing entries into it, and store the 5th person in the next free position. You can then add people to your database without copying until you reach the ninth person to be added, at which time you double the size of the array again.

The strategy of doubling the size of the array, rather than incrementing the size of the array every time it is full, significantly reduces the number of copy operations that need to be performed in a sequence of add invocations. In fact, the doubling strategy changes the quadratic time load process into a linear time load process.
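
The effect of doubling can be checked with a small counting sketch. It starts with an array of length 4 and doubles only when the array is full, as described above. The class name DoublingCost, the local variable count, and the use of Strings in place of Person objects are illustrative assumptions; this is a cost model, not the DatabaseArrays class itself.

```java
class DoublingCost {
    // Adds n entries to an array that starts at length 4 and doubles when full,
    // and returns the total number of element copies performed.
    static long copiesForAdds(int n) {
        String[] people = new String[4];
        int count = 0;      // number of valid entries in people
        long copies = 0;
        for (int k = 0; k < n; k++) {
            if (count == people.length) {
                // Array is full: allocate one twice as long and copy the entries over.
                String[] bigger = new String[2 * people.length];
                for (int i = 0; i < count; i++) {
                    bigger[i] = people[i];
                    copies++;
                }
                people = bigger;
            }
            people[count] = "entry" + k;
            count++;
        }
        return copies;
    }

    public static void main(String[] args) {
        System.out.println(copiesForAdds(10));    // 12
        System.out.println(copiesForAdds(100));   // 124
        System.out.println(copiesForAdds(1000));  // 1020
    }
}
```

The copies form the sum 4 + 8 + 16 + ..., which stays below 2n for n adds, so the total copying work grows linearly with the size of the database.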

The remove method can be implemented as follows:

Note that a remove invocation never changes the size of the array in this strategy.

Note that the number of entries in this strategy is no longer equal to the length of the people array. To keep track of the database size, a database object must have an additional instance variable (call it count) that counts the number of entries in the database. The following invariant should be preserved by all the database operations:

To begin this problem, you should follow these steps:
  1. Download the Database folder.
  2. Open the project Database.mcp.
  3. Open Database.java by double clicking on the icon in the project window. This file contains the current version of the database with the inefficient (but correct) implementation of all the methods. Use this implementation as the example of how the database behaves on various data.
  4. Write your code for the alternative array implementation in the file DatabaseArrays.java, which contains the definition of the class DatabaseArrays.
  5. Test your code by moving the file DatabaseArraysWorld.html to the topmost position in Link Order in the project folder.
  6. You can run the original implementation by moving the file DatabaseWorld.html to the topmost position in Link Order. This may be useful for testing your code.

Contrary to the earlier instructions on this web page, you don't need to rename any files or methods.

For this problem, you will need to accomplish the following tasks:

  1. Implement the DatabaseArrays() constructor of the class DatabaseArrays. The initial people array should have length 4.
  2. Implement the size() method. You can test this method using the Print button, which prints out the size of the database along with the entries.
  3. Implement the clear() method. This should not change the size of the people array.
  4. Implement the add(Person p) method as outlined above. This should double the size of the people array only when the array is full of valid entries. You may wish to insert a call to System.out.println that indicates every time the array size is doubled. You can test this method via the Add button. You can also test add(Person p) and clear() together using the Load DB File button. This clears the database and adds the entries from the file one by one. You should try loading all the provided .db files.
  5. Implement the remove() method as sketched above. This method should not change the size of the people array. You can test this method via the Remove button.
  6. It is not necessary to modify any methods other than the ones mentioned above. As part of your problem set, you should include a brief explanation of why it is not necessary to modify any of the other methods.
Note: On the small databases provided, you will not be able to observe a noticeable speedup with the new implementation. If time permits, we will try to provide a larger file that makes the speedup clear.
 

Part B: An Alternative Database Representation Using Vectors

In this problem you will develop a different implementation of the Database class using a Vector instead of an array to store the entries of the Database. The Vector version has certain advantages to the implementer (you) when compared to the array version, in the sense that the various methods of the Database class can be implemented more easily in the Vector version. For example, you don't need to explicitly increase the size of the storage structure when adding new entries, as the insertion methods of the Vector class do this for you. Also, it isn't necessary to have a separate instance variable that keeps track of the current size of the storage structure. For this problem:
  1. Write your code in the file DatabaseVectors.java, which contains the definition of the class DatabaseVectors: the database implementation using vectors.
  2. To run your code, move the file DatabaseVectorsWorld.html to the topmost position in Link Order.
  3. You can run the original implementation by moving DatabaseWorld.html to the topmost position in Link Order.

For this problem, you will need to accomplish the following tasks:

  1. Implement the DatabaseVectors() constructor.
  2. Implement the size() method. You can test this method using the Print button, which prints out the size of the database along with the entries.
  3. Implement the clear() method. This method is allowed to change the size of the people Vector. There is a convenient Vector method that you can use to implement clear() but that does not appear in the Vector contract discussed in lecture. To find this additional method, consult the Vector class documentation as described below. However, feel free to provide a different implementation of clear() if you prefer.
  4. Implement the add(Person p) method. You can test this method via the Add button. You can also test add(Person p) and clear() together using the Load DB File button. This clears the database and adds the entries from the file one by one. You should try loading all the provided .db files.
  5. Implement the remove() method as sketched above. This method is allowed to change the size of the people Vector. You can test this method via the Remove button.
For some or all of the above tasks, you may find it convenient to refer to the Java API documentation for the Vector class. Follow the documentation link on the CS111 homepage, scroll down to "Java resources", go to "Java library documentation", then to the java.util package documentation, and follow the link for the Vector class.
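
As a quick illustration of the claim that a Vector grows on its own, here is a small sketch using standard java.util.Vector methods (addElement, size, capacity, elementAt, removeElementAt). The class name VectorGrowthDemo and the helper method are illustrative, and Strings stand in for Person objects.

```java
import java.util.Vector;

class VectorGrowthDemo {
    // Adds n strings to a Vector created with the given initial capacity
    // and returns {size, capacity} afterwards.
    static int[] sizeAndCapacityAfter(int initialCapacity, int n) {
        Vector<String> vec = new Vector<>(initialCapacity);
        for (int k = 0; k < n; k++) {
            vec.addElement("entry" + k);
        }
        return new int[] { vec.size(), vec.capacity() };
    }

    public static void main(String[] args) {
        int[] result = sizeAndCapacityAfter(4, 5);
        System.out.println(result[0]);  // 5: number of elements stored
        System.out.println(result[1]);  // 8: storage grew without any explicit resizing code

        Vector<String> vec = new Vector<>();
        vec.addElement("first");
        vec.addElement("second");
        vec.removeElementAt(0);                 // later elements shift down by one
        System.out.println(vec.size());         // 1
        System.out.println(vec.elementAt(0));   // second
    }
}
```

The Vector tracks its own size, which is why the Vector version of the database does not need a separate count variable.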