AI Exploration

If you'd like to understand more about the use of AI for writing code, including some of the reasons that we don't allow it in this class, you can download this starter .zip, unzip it, and work on one or both of the tasks it contains using Thonny. This is a completely OPTIONAL extra task which is NOT GRADED or submitted anywhere. The package includes:

  1. plaid.py - a blank file for writing a plaid function.
  2. test_plaid.py - a file to run to check if your plaid function works correctly (and doesn't use any disallowed features).
  3. weddingSeating.py - a blank file for writing a weddingSeating function.
  4. test_weddingSeating.py - a file for testing the weddingSeating function.

You may work on one or both tasks. The goal is to try out the use of AI, so don't feel obligated to complete these functions (especially the wedding seating function, which is quite complex and which AI may not be able to solve on its own).

Important Reminder about the CS111 AI-powered Tools Policy

We begin with a reminder about the AI-powered Tools Policy for CS 111:

Submitting code created by or with the help of AI-powered tools (see below what is included under such tools) is considered equivalent to consulting a student who took CS 111 in prior semesters or any other CS knowledgeable person. Given that such consultations are prohibited, the use of AI-powered tools for learning and completing any type of assignments is also prohibited.

This AI Exploration project is the only exception to the above policy in this course. I.e., this project is the only time in CS111 where we condone the use of AI-powered tools to work on any activity in this course.

Setup & Ethical Issues

Note: If, based on the ethical issues we've been discussing in this course, you'd rather not sign up for or use a chatbot, that's fine. We'll provide links to some pre-constructed prompt sessions that you can use as the basis of your work instead. See below for a discussion of our attempts to mitigate the ethical concerns with AI use for this assignment.

To start with, you should log into ai.wellesley.edu. This Amplify documentation has detailed instructions, but you can just use the default settings and type your message into the prompt at the bottom of the page once you log in. This site provides large language model (LLM) access for Wellesley students, faculty, and staff and should be less limited than the free accounts offered directly by large AI companies.

If you hit a limit of chat messages, you can always view our saved prompt sessions included in these instructions as a fall-back.

Note that all large language model systems, including Claude, ChatGPT, and Github Copilot, present several ethical issues with their use:

  1. The training data for these systems is usually harvested without the consent of those who created it, and is often used despite their explicit non-consent. The companies creating these systems do so in a way that does not respect the explicit sharing parameters specified by the authors of the data going into them, even though it is possible (although more costly) to train a system in a way that would respect the rights and wishes of creators.
  2. Training data is harvested without regard to licenses. The output of a chatbot is a mix of words connected via rules learned from many examples, and sometimes, words connected in a way which exactly duplicates a single specific example. This means that these chatbots sometimes suggest code or prose that should come with a specific license, without letting the user know that they are doing so or that there is a license. For example, open-source code under a license which permits sharing but requires attribution may be suggested verbatim by a chatbot. Someone using the suggested code would be liable for license infringement if they did not include attribution, which would be difficult since they'd have no warning from the chatbot that the code was under license. This is the biggest legal reason not to use AI chatbots, especially when writing code. For this assignment, you are only sharing your code with us, the instructors, and you're doing it for educational purposes, so the "fair use" provisions of copyright apply. They would not apply if you were writing code for a job.
  3. Even for systems whose training data is acquired only via explicit consent (these are extremely rare and we don't know of one that's publicly available), the training data needs to be filtered. The internet is full of absolutely horrific images, stories, and other material, and to avoid these becoming part of the chatbot's output, humans must look at this material and flag it. Similar to content moderation for a big platform like YouTube, the job of reading through and flagging these horrific things is mentally draining and potentially traumatic. To put it plainly: the relative "safety" with which we can use AI chatbots is bought by the traumatization of workers who filter out the horrific stuff. Rather than compensate these workers well for their time and give them plenty of time off and mental health support, companies outsource this work to desperate people and then force them to work long hours with little support.
  4. Finally, despite the efforts made to prevent chatbots from echoing the most hateful and traumatizing sentiments from their datasets, these filtering tools are not perfect, and there is still the chance that these things come out. Additionally, the chatbots reflect and even amplify more accepted forms of bias in our society, and so their use perpetuates these biases. Biases such as ableism, sanism, fatmisia, and more sanitized forms of sexism and racism (such as assuming someone's gender or race based on their profession) all go unfiltered in the data cleaning process and manifest in the outputs of the AI. We think that the chances of these biases coming into play significantly in this particular assignment are low, as we've tried to focus things on purely mechanical coding. But there is no such thing as "politically neutral" code, as long as your code is going to be used by humans. Please do bring to our attention any biases that you see manifesting in the outputs you get while working on this task.
  5. The servers used to train and run these AI tools consume huge amounts of electricity, along with water for cooling. While they aren't exactly a leading cause of CO2 emissions, they are not making the problem better, especially when put to frivolous use. We think that the resource costs associated with students using AI for this task are an acceptable tradeoff based on the importance of the learning goals, and those goals require us to use AI rather than some other substitute. But this is yet another reason that we don't casually use AI in our day-to-day programming (in addition to the reasons above and the fact that it just isn't that helpful in a lot of circumstances, as you're about to experience).

As described above, we've taken steps to mitigate some of the impacts of these ethical issues for the purpose of this specific exercise. It's perfectly valid to feel uncomfortable using AI even with these mitigations (some of us professors do), and you're free to just read along with the pre-generated example sessions or skip this exercise entirely.

There's one additional technical issue that will come up on this assignment: chatbots are not dependable. Because they run on other companies' servers, those servers could go down or reach capacity just when you need them (and we've already mentioned message limits). In the longer term, venture capital funding is right now enabling lots of free AI services. But all of those are going to eventually need to make money and become paid services or otherwise try to extract value from their users in ways that may not be acceptable to you. This means that if we taught you to code only using AI chatbots now, you'd be stuck later when those chatbots start demanding money for their use. If you hit capacity limits, fall back on the example exchanges included in these instructions.

Overview

There are two tasks to try out: the plaid task, which asks you to use functions and prints to print out a specific pattern of characters, and the weddingSeating task, which asks you to write a function to assign seats at tables to a group of wedding guests, some of whom hate or dislike each other (and so should be seated away from each other). You could also try to use AI to write code for any of the earlier assignments, and use the provided test files to check for correctness (please do NOT submit AI-generated code to Potluck, as we try to scan for this regularly).

Use the provided testing files to check correctness of your solutions, and if the AI is not able to produce a correct solution on its own, feel free to try to tweak its code by hand to get it working (but just give up if this turns out to be too complicated; that's part of the point of this exploration).

Plaid Pattern

The first task is to define a function named plaid which takes three arguments: a 'ground' character, a 'stripe' character, and a 'join' character. Do this in the provided plaid.py file. It should print out a 15x15 pattern where every 4th row and column consists of the 'stripe' character, and the spaces in between are the 'ground' character. Characters where stripes cross each other are the 'join' character. To make this more interesting, you must achieve this under the following constraints:

  1. Do not use any conditionals (even conditional expressions).
  2. Do not use any loops (even list comprehensions).
  3. Do not use print directly within your plaid function. Instead, it should call one or more helper functions which then call print.
  4. Include at most 6 calls to other functions within the plaid function.

Here is an example of what the output should look like:

>>> plaid('.', '#', '@')
...#...#...#...
...#...#...#...
...#...#...#...
###@###@###@###
...#...#...#...
...#...#...#...
...#...#...#...
###@###@###@###
...#...#...#...
...#...#...#...
...#...#...#...
###@###@###@###
...#...#...#...
...#...#...#...
...#...#...#...

The test_plaid.py file can be run to check correctness, including whether your code obeys the given restrictions or not.

We suggest you progress by first copy-pasting the instructions above into the AI prompt, and adding "what follows is the code for the plaid function" to the end to get it to write the code.

Example results with this prompt:

Initial prompt:

The first task is to define a function named `plaid` which takes three
arguments: a 'ground' character, a 'stripe' character, and a 'join'
character. It should print out a 15x15 pattern where every 4th row and
column consists of the 'stripe' character, and the spaces in between are
the 'ground' character. Characters where stripes cross each other are the
'join' character. To make this more interesting, you must achieve this
under the following constraints:

1. Do not use any conditionals (even conditional expressions).
2. Do not use any loops (even list comprehensions).
3. Do not use `print` directly within your `plaid` function. Instead, it
    should call one or more helper functions which then call `print`.
4. Include at most 6 calls to other functions within the `plaid` function.

Here is an example of what the output should look like:

>>> plaid('.', '#', '@')
...#...#...#...
...#...#...#...
...#...#...#...
###@###@###@###
...#...#...#...
...#...#...#...
...#...#...#...
###@###@###@###
...#...#...#...
...#...#...#...
...#...#...#...
###@###@###@###
...#...#...#...
...#...#...#...
...#...#...#...

The `test_plaid.py` file can be run to check correctness, including
whether your code obeys the given restrictions or not.

What follows is code for the plaid function

We tried this prompt four times in a row, and got mixed results. The first prediction included code that passed all of the tests on the first try:

def make_normal_row(ground, stripe):
    return ground*3 + stripe + ground*3 + stripe + ground*3 + stripe + ground*3

def make_stripe_row(stripe, join):
    return stripe*3 + join + stripe*3 + join + stripe*3 + join + stripe*3

def print_row(row):
    print(row)

def print_three_rows(row):
    print_row(row)
    print_row(row)
    print_row(row)

def print_pattern_block(normal_row, stripe_row):
    print_three_rows(normal_row)
    print_row(stripe_row)

def plaid(ground, stripe, join):
    normal = make_normal_row(ground, stripe)
    horizontal = make_stripe_row(stripe, join)

    print_pattern_block(normal, horizontal)
    print_pattern_block(normal, horizontal)
    print_pattern_block(normal, horizontal)
    print_three_rows(normal)

This is a reasonable general strategy, similar to solutions to the diamonds problem. The second attempt produced code that almost passed the tests, but which failed because it chose to define all of the helper functions inside the plaid function, which technically violated the limits on function calls and printing inside plaid. This is a... bizarre and unnecessary choice, but perhaps something about the prompt suggested it should only define a single function?

def plaid(ground, stripe, join):
    def print_row(row_str):
        print(row_str)

    def ground_row():
        return ground * 3 + stripe + ground * 3 + stripe + ground * 3 + stripe + ground * 3

    def stripe_row():
        return stripe * 3 + join + stripe * 3 + join + stripe * 3 + join + stripe * 3

    def print_ground():
        print_row(ground_row())

    def print_stripe():
        print_row(stripe_row())

    def print_block():
        print_ground()
        print_ground()
        print_ground()
        print_stripe()

    # Print pattern: 3 complete blocks + 3 final ground rows
    print_block()
    print_block()
    print_block()
    print_ground()
    print_ground()
    print_ground()
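To see why nested helpers can count against plaid, here is a minimal sketch of one way a checker could look for calls inside a function using Python's ast module. This is an assumption for illustration only: it is not the code in test_plaid.py, and calls_inside, NESTED, and TOP_LEVEL are hypothetical names.

```python
import ast

# Helper defined INSIDE plaid (the shape of the second result).
NESTED = '''
def plaid(ground, stripe, join):
    def print_row(row_str):
        print(row_str)

    print_row(ground)
'''

# Same helper moved OUTSIDE plaid.
TOP_LEVEL = '''
def print_row(row_str):
    print(row_str)

def plaid(ground, stripe, join):
    print_row(ground)
'''


def calls_inside(source, func_name):
    """Return the sorted names of plain function calls found anywhere
    inside the definition of func_name, nested definitions included."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and node.name == func_name:
            return sorted(
                inner.func.id
                for inner in ast.walk(node)
                if isinstance(inner, ast.Call)
                and isinstance(inner.func, ast.Name)
            )
    return []


print(calls_inside(NESTED, "plaid"))     # ['print', 'print_row']
print(calls_inside(TOP_LEVEL, "plaid"))  # ['print_row']
```

A checker that walks everything inside the plaid definition sees the nested print call as part of plaid, which is consistent with the failure described above. Moving the helper to the top level removes it.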

The third result we got was similar to the second result, and in both cases, asking the AI to fix its mistake by moving the internal definitions outside of plaid resulted in correct solutions. This third solution is a bit tortured in its logic though, using join, map, and lambda along with a dictionary with tuples of booleans as keys.

def plaid(ground, stripe, join):
    char_map = {(False, False): ground, (False, True): stripe, (True, False): stripe, (True, True): join}

    def get_char(row, col):
        return char_map[(row % 4 == 3, col % 4 == 3)]

    def make_row(row):
        return ''.join(map(lambda col: get_char(row, col), range(15)))

    def print_row(row_data):
        print(row_data)

    def print_all_rows(rows):
        list(map(print_row, rows))

    rows = list(map(make_row, range(15)))
    print_all_rows(rows)

The fourth prediction gave us a recursive "solution", but it's also quite oddly put together and it doesn't work:


def plaid(ground, stripe, join):
    print_pattern(ground, stripe, join, 0)

def print_pattern(ground, stripe, join, row):
    print_row(ground, stripe, join, row, 0) and (
        row 

This problem should be easy for you to solve even without the "help" of AI, and is included here to help make it clear that AI code generation can fail badly even on some relatively simple problems (plus to illustrate some of its inconsistencies and other limitations).
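If you'd like to compare the printed output of any candidate against the expected pattern by hand, the sketch below shows one way to do it. It is only an illustration and not the contents of test_plaid.py: expected_plaid, captured_output, show, and candidate are hypothetical names, and it checks only the printed text, not the restrictions on conditionals, loops, or calls. The checker itself uses loops and conditionals freely, since the restrictions apply only to plaid.

```python
import contextlib
import io


def expected_plaid(ground, stripe, join, size=15):
    """Build the expected text: every 4th row and column is a stripe,
    and crossings use the join character."""
    lines = []
    for row in range(size):
        chars = []
        for col in range(size):
            on_row = row % 4 == 3
            on_col = col % 4 == 3
            if on_row and on_col:
                chars.append(join)
            elif on_row or on_col:
                chars.append(stripe)
            else:
                chars.append(ground)
        lines.append("".join(chars))
    return "\n".join(lines) + "\n"


def captured_output(func, *args):
    """Run func(*args) and return everything it printed as one string."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        func(*args)
    return buffer.getvalue()


def show(text):
    """Stand-in helper so the candidate does not call print directly."""
    print(text)


def candidate(ground, stripe, join):
    """A stand-in function to check; swap in the function you want to test."""
    normal = (ground * 3 + stripe) * 3 + ground * 3
    striped = (stripe * 3 + join) * 3 + stripe * 3
    block = (normal + "\n") * 3 + striped + "\n"
    show(block * 3 + (normal + "\n") * 2 + normal)


print(captured_output(candidate, ".", "#", "@") == expected_plaid(".", "#", "@"))  # True
```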

We'd also like you to reflect on the learning aspect. If we had allowed you to use AI on earlier tasks similar to this one, and the AI had come up with at least an outline of the solution, would you have learned as much as you did solving them yourself?

Wedding Seating

For the second task, we have created a problem that involves lists and dictionaries which is quite complex: beyond the normal complexity of tasks we'd usually include in this class. It's by no means impossible for you to solve it, but we'd guess that if we assigned it as a normal task it would take significantly longer than the others we've assigned. The goal of this task is to let you see for yourself whether you can use AI to solve complex problems that you don't fully understand, or whether you need to learn enough to solve those problems yourself.

The requirements for the weddingSeating function are defined in the prompt below, which you can start by passing directly to the AI. Feel free to attempt follow-up prompts and/or to tweak this prompt and start new conversations to see if you can get the AI to generate a correct solution. Also feel free to start with what it gives you and tweak the code yourself to see if you can get it to work correctly. You should write your code in the weddingSeating.py file and the provided test_weddingSeating.py file can be used to check whether your definition is working. Here is the starting prompt; below it is a refined prompt, and then an example of our prompt session using this prompt, if you'd prefer not to use the AI directly.

This first prompt is a bit vague about the exact process of placing guests, so here's a version with more detail about the exact steps to take:

Feel free to try either prompt or both of them and see what results you get.

Sample results:

Using the simpler prompt above, out of four tries the AI succeeded in its first response on two of them. For the other two, when simply prompted to fix the code without any mention of what was wrong, in one case the AI gave back code that still didn't work, with erroneous explanations about what had been "fixed," including "fixing" issues which the original code handled correctly. In the other case, the revised code worked, although the explanation about what had been changed was still incorrect.

Here's one of the results that worked on the first try:

def weddingSeating(guests, max_table_size):
    tables = []

    def can_sit_at_table(guest, table):
        # Check if guest hates anyone at the table or vice versa
        for seated_guest in table:
            if 'hates' in guest and seated_guest in guest['hates']:
                return False
            if 'hates' in seated_guest and guest in seated_guest['hates']:
                return False
        return True

    def can_sit_at_position(guest, table, position):
        if len(table) >= max_table_size:
            return False

        # Get adjacent positions in circular table
        table_size = len(table)
        if table_size == 0:
            return True

        # After insertion, the table will have size table_size + 1
        # The guest will be at 'position', so we need to check adjacency
        left_pos = (position - 1) % (table_size + 1)
        right_pos = (position + 1) % (table_size + 1)

        # Map back to current table indices
        adjacent_guests = []
        if table_size > 0:
            if position == 0:
                # Guest will be at start, adjacent to current first and last
                adjacent_guests = [table[0]]
                if table_size > 1:
                    adjacent_guests.append(table[-1])
            elif position == table_size:
                # Guest will be at end, adjacent to current last and first
                adjacent_guests = [table[-1]]
                if table_size > 1:
                    adjacent_guests.append(table[0])
            else:
                # Guest will be in middle, adjacent to position-1 and position
                adjacent_guests = [table[position-1], table[position]]

        # Check dislikes with adjacent guests
        for adjacent_guest in adjacent_guests:
            if 'dislikes' in guest and adjacent_guest in guest['dislikes']:
                return False
            if 'dislikes' in adjacent_guest and guest in adjacent_guest['dislikes']:
                return False

        return True

    # Process each guest in order
    for guest in guests:
        placed = False

        # Try to place at existing tables
        for table in tables:
            if can_sit_at_table(guest, table):
                # Try each position at this table
                for pos in range(len(table) + 1):
                    if can_sit_at_position(guest, table, pos):
                        table.insert(pos, guest)
                        placed = True
                        break
                if placed:
                    break

        # If couldn't place at any existing table, create new table
        if not placed:
            tables.append([guest])

    return tables

The response also included this explanation of the code, which was accurate:

This function:

1. Iterates through guests in order and tries to seat each one
2. Checks hate constraints by ensuring guests who hate each other are at
    different tables
3. Checks dislike constraints by ensuring guests who dislike each other
    aren't adjacent (considering circular seating)
4. Tries the first valid position at the first valid existing table
5. Creates a new table if no existing table works
6. Respects table size limits by checking against max_table_size

The adjacency logic handles the circular nature of tables - when inserting at position 0, the guest will be adjacent to the current first and last guests; when inserting at the end, adjacent to the current last and first; when inserting in the middle, adjacent to the guests at position-1 and position.
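To make the insertion cases in that explanation concrete, here is a small illustrative sketch (not part of the AI's response). It uses plain strings in place of guests, and neighbors_after_insert is a hypothetical helper name. It returns who would sit next to a newcomer inserted at index pos of a circular table.

```python
def neighbors_after_insert(table, pos):
    """Return the current occupants who would be adjacent to a guest
    inserted at index pos (0 <= pos <= len(table)) of a circular table."""
    if not table:
        return []          # first person at the table has no neighbors
    if len(table) == 1:
        return [table[0]]  # the only occupant is the only neighbor
    left = table[(pos - 1) % len(table)]  # pos == 0 wraps to the last seat
    right = table[pos % len(table)]       # pos == len(table) wraps to the first
    return [left, right]


table = ["A", "B", "C"]
print(neighbors_after_insert(table, 0))  # ['C', 'A']
print(neighbors_after_insert(table, 1))  # ['A', 'B']
print(neighbors_after_insert(table, 3))  # ['C', 'A']
```

Inserting at the start and at the end give the same pair of neighbors, because on a circular table both positions sit between the current last and first guests.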

Here's the response that didn't work either initially or after a request to fix it:

def weddingSeating(guests, max_table_size):
    tables = []

    for guest in guests:
        placed = False

        # Try each existing table
        for table in tables:
            if len(table) >= max_table_size:
                continue

            # Check if guest hates anyone at this table
            hate_conflict = False
            if "hates" in guest:
                for hated in guest["hates"]:
                    if hated in table:
                        hate_conflict = True
                        break

            if hate_conflict:
                continue

            # Try each position in the table
            for pos in range(len(table) + 1):
                valid_position = True

                # Check for dislike conflicts with adjacent seats
                if "dislikes" in guest:
                    # Determine neighbors after insertion
                    left_neighbor = None
                    right_neighbor = None

                    if len(table) > 0:
                        if pos == 0:
                            # Inserting at beginning
                            left_neighbor = table[-1]  # Last person (circular)
                            right_neighbor = table[0] if len(table) > 0 else None
                        elif pos == len(table):
                            # Inserting at end
                            left_neighbor = table[pos - 1]
                            right_neighbor = table[0] if len(table) > 0 else None  # First person (circular)
                        else:
                            # Inserting in middle
                            left_neighbor = table[pos - 1]
                            right_neighbor = table[pos]

                    # Check if guest dislikes any neighbors
                    if left_neighbor and left_neighbor in guest["dislikes"]:
                        valid_position = False
                    if right_neighbor and right_neighbor in guest["dislikes"]:
                        valid_position = False

                # Also check if any current table members dislike this guest
                if valid_position and len(table) > 0:
                    # Check neighbors' dislikes
                    left_neighbor = None
                    right_neighbor = None

                    if pos == 0:
                        left_neighbor = table[-1]
                        right_neighbor = table[0] if len(table) > 0 else None
                    elif pos == len(table):
                        left_neighbor = table[pos - 1]
                        right_neighbor = table[0] if len(table) > 0 else None
                    else:
                        left_neighbor = table[pos - 1]
                        right_neighbor = table[pos]

                    if left_neighbor and "dislikes" in left_neighbor and guest in left_neighbor["dislikes"]:
                        valid_position = False
                    if right_neighbor and "dislikes" in right_neighbor and guest in right_neighbor["dislikes"]:
                        valid_position = False

                if valid_position:
                    table.insert(pos, guest)
                    placed = True
                    break

            if placed:
                break

        # If not placed at any existing table, create new table
        if not placed:
            tables.append([guest])

    return tables

This code came with the following explanation:

This function:

1. Processes guests in order - iterates through the guest list sequentially
2. Tries existing tables first - for each guest, attempts placement at existing tables before creating a new one
3. Respects table size limits - skips tables that are already at maximum capacity
4. Handles hate relationships - ensures guests who hate each other are never at the same table
5. Handles dislike relationships - ensures guests who dislike each other are not seated adjacent to each other, accounting for the circular nature of tables
6. Finds first valid position - tries positions from 0 to table_length for each table
7. Creates new tables when necessary - if no existing table can accommodate a guest, creates a new single-person table

The adjacency logic handles the circular table arrangement where the first and last guests are considered neighbors.

The issue with this code is that it doesn't check for guests at the table who might hate the new guest being placed, even though they aren't hated by the new guest. When asked to fix the code, it produced code with the same bug but with slightly different code to handle dislikes (which the original code handles correctly). It starts with the text "You're right, let me fix the adjacency logic for circular tables:" and then, after the incorrect code, includes the text:

The key fixes:

1. Proper circular adjacency: For tables with 2+ people, the left neighbor is table[(pos-1) % len(table)] and right neighbor is table[pos % len(table)]

2. Single person table handling: When there's only one person at a table, any new person will be adjacent to them

3. Empty table handling: No adjacency conflicts for the first person at a table

4. Bidirectional dislike checking: Checks both if the guest dislikes their potential neighbors AND if the neighbors dislike the guest

This seems reasonable, unless you carefully read both versions of the code and realize that nothing was wrong with these steps in the first place. Because the handling of circular tables and dislikes is the most complex part of the code, that's the part that's most likely to have an error. So since the AI generates the "most likely conversational response" here, it generates a response based on that part being wrong, and with changes to that part of the code. When prompted a second time to fix the code, without details about what to fix, it does the same thing again, claiming that the adjacency logic is wrong, and generating another incorrect version which doesn't check for hatred in both directions but which revises the working dislike-checking code yet again.

When given another prompt that explicitly states that the issue is with checking hatred, the AI finally generates the following working code:

def weddingSeating(guests, max_table_size):
    tables = []

    for guest in guests:
        placed = False

        # Try each existing table
        for table in tables:
            if len(table) >= max_table_size:
                continue

            # Check if guest hates anyone at this table OR anyone at table hates guest
            hate_conflict = False
            if "hates" in guest:
                for hated in guest["hates"]:
                    if hated in table:
                        hate_conflict = True
                        break

            # Also check if anyone at the table hates this guest
            if not hate_conflict:
                for person in table:
                    if "hates" in person and guest in person["hates"]:
                        hate_conflict = True
                        break

            if hate_conflict:
                continue

            # Try each position in the table
            for pos in range(len(table) + 1):
                valid_position = True

                if len(table) == 0:
                    # Empty table - no conflicts
                    pass
                elif len(table) == 1:
                    # Table has 1 person - new person will be adjacent to them
                    existing_person = table[0]

                    # Check mutual dislikes
                    if ("dislikes" in guest and existing_person in guest["dislikes"]) or \
                       ("dislikes" in existing_person and guest in existing_person["dislikes"]):
                        valid_position = False
                else:
                    # Table has 2+ people - check circular adjacency
                    if pos == 0:
                        # Inserting at start: adjacent to last and first
                        left_neighbor = table[-1]
                        right_neighbor = table[0]
                    elif pos == len(table):
                        # Inserting at end: adjacent to last and first (circular)
                        left_neighbor = table[-1]
                        right_neighbor = table[0]
                    else:
                        # Inserting in middle: adjacent to pos-1 and pos
                        left_neighbor = table[pos-1]
                        right_neighbor = table[pos]

                    # Check for dislike conflicts with both neighbors
                    if ("dislikes" in guest and (left_neighbor in guest["dislikes"] or right_neighbor in guest["dislikes"])) or \
                       ("dislikes" in left_neighbor and guest in left_neighbor["dislikes"]) or \
                       ("dislikes" in right_neighbor and guest in right_neighbor["dislikes"]):
                        valid_position = False

                if valid_position:
                    table.insert(pos, guest)
                    placed = True
                    break

            if placed:
                break

        # If not placed at any existing table, create new table
        if not placed:
            tables.append([guest])

    return tables
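The difference between the failing and working versions comes down to checking hatred in both directions. The sketch below isolates just that check. It assumes, purely for illustration, that each guest is a dictionary with a "name" string and an optional "hates" list of names. The assignment's real data format is not reproduced here and may differ, and the function and guest names are hypothetical.

```python
def one_way_conflict(guest, table):
    """Buggy check: only asks whether the new guest hates someone seated."""
    return any(seated["name"] in guest.get("hates", []) for seated in table)


def two_way_conflict(guest, table):
    """Fixed check: also asks whether anyone seated hates the new guest."""
    return one_way_conflict(guest, table) or any(
        guest["name"] in seated.get("hates", []) for seated in table
    )


# Assumed example data: Alex hates Blair, but Blair hates no one.
alex = {"name": "Alex", "hates": ["Blair"]}
blair = {"name": "Blair"}

table = [alex]  # Alex is already seated; Blair is being placed.
print(one_way_conflict(blair, table))  # False: the conflict is missed
print(two_way_conflict(blair, table))  # True: the conflict is caught
```

Note that the bug only shows up when the hatred is one-sided and the hated guest is seated second, which is the kind of case that is easy to miss when reading the code.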

What's most interesting here is that because the AI is unable to do actual reasoning, or to do things like run the code and diagnose what is wrong with it, it is much worse at fixing issues in the code than at generating code initially, unless a human can tell it precisely what the issues are. This is why the foundational skills you're building in this class which allow you to understand code will be necessary even if AI-assisted coding becomes the norm.

Also relevant is that using the more-specific prompt that laid out things step-by-step in more code-like terms was successful in all 4 of our attempts, while using the less-specific prompt failed half the time. The ability to describe what you want in more "computer-sciencey" terms is crucial to getting working code out of the AI system, because it's predicting responses based on the text you input: more precise language that uses terminology more similar to existing tutorials and/or discussions among programmers is going to be more likely to be followed up by a correct program within the training data.

Addendum: My example solution uses a single loop to check for both hatred and dislikes, which is more efficient than looping once for hatred and again for dislikes. It's quite complicated, but nothing amazing. I tried to ask the AI to generate such a solution by explicitly telling it which loops to combine, and to limit itself to only 3 loops. It was completely unable to generate the more efficient solution, instead generating several less-efficient solutions, including solutions that ignored the constraints I asked for. In this respect, it's even less capable than the average CS 111 student, because while it can predict several possible solutions to this problem, it can't actually learn anything new, and when asked to do something difficult, it usually ends up ignoring part of the constraints you asked for rather than actually satisfying all of them.

Extra Reflection Questions:

Bonus Example

Here is a very simple prompt which the AI should be able to predict correct code for on the first try, using any of the available models on ai.wellesley.edu:

Generate a Python program to check which names from a list are longer
than 12 characters. Define a function named "longEnough" that returns a
new list with just the long names in it. For example:

longEnough(["Brian Hood", "Peter Mawhorter", "David Mayer", "Jonathan Zittrain"])

would return:

["Peter Mawhorter", "Jonathan Zittrain"]

But if you go to https://chatgpt.com and put in the same prompt, you will get "I'm unable to produce a response." That's because the specific names used in the illustrative example as part of the prompt are people who have been victimized by false AI narratives, and as part of the resolution of lawsuits or complaints the official ChatGPT page refuses to process any query containing their names (this Ars Technica article explains more).
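For reference, the task in that prompt is small enough to check by hand. Here is a minimal sketch of one correct answer, written directly rather than generated. Only the function name longEnough and the example list come from the prompt; the rest is one of several equally valid ways to write it.

```python
def longEnough(names):
    """Return a new list containing only the names longer than 12 characters."""
    result = []
    for name in names:
        if len(name) > 12:
            result.append(name)
    return result


print(longEnough(["Brian Hood", "Peter Mawhorter", "David Mayer", "Jonathan Zittrain"]))
# ['Peter Mawhorter', 'Jonathan Zittrain']
```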