Glossary

This page defines special CS jargon that you'll encounter throughout the class. It's organized by concept topics, so that you can see what will be covered each week, and alphabetized within each topic. Where relevant it includes links to the quick reference page which has short examples and descriptions of the various Python functions and keywords you'll use when programming. It also includes some links to the official Python documentation which includes its own official glossary, although these pages assume a high level of programming knowledge. This page defines terminology we use to talk about programs, rather than the keywords we actually write in our programs.

Use the "find" feature of your browser to quickly jump to the term you're looking for. If you're looking for a concept but aren't sure what it's called, feel free to check the index, browse by topic, or reach out to an instructor for a pointer. Please also reach out if you run into a term whose meaning you're unsure of and you don't see it on this page.

Note: The text of this page is copyright 2025 Peter Mawhorter & other CS 111 team members, licensed under the Creative Commons Attribution NonCommercial ShareAlike 4.0 International license. This means that you're free to share this work, create your own versions, and distribute them, as long as you provide credit, link to the license, indicate if changes were made, distribute your modifications under the same license, and do not use the material for commercial purposes.

Topics

Index

Extra Terms

General Terminology

Abstraction

The use of more general or less-specific terms to describe something which also has a more-specific or concrete description.

Also, the creation of a technical system which allows users to specify things at a more abstract level and have a more detailed process be deduced from the abstract specification.

For example, "add 3 and 4 to get 7" is a concrete description; "add two numbers together" is a more abstract description of the same thing, where there could be several different concrete descriptions that match the abstract description. A computer system that allows the user to type in "add 3 and 4" and which will produce the result 7 might be an abstraction relative to the actual pattern of electrical signals that happens in the physical computer to achieve that result. In the same way, many different physical computers with different designs might be able to produce the same result given different technical systems for translating "add 3 and 4" into electrical patterns.

The benefit of an abstraction is that the user does not need to know all the technical details of the underlying system in order to use it. We can type "x = 3 + 4" in Python to assign the value 7 to the variable x, without needing to understand the details of how 3, 4, and 7 will be represented using electrical signals and wires in the computer, and what exact process will happen in order to produce the 7 from the 3 and the 4. Someone who did understand those things built an abstraction for us to use, and we can use it without needing to know the details.

Since an abstraction is not a full description of what's actually going on, it may "leak," meaning that some detail of the underlying process becomes relevant to the user of the abstraction in an unexpected way. In this case, the abstraction breaks down, and the user must understand the underlying actual process in order to figure out how to fix things.

Abstractions can occur in multiple layers. For example, the C programming language is an abstraction from the physical hardware of your computer. The Python programming language is written in the C language, so the abstractions it provides are built on top of the abstractions provided by C.

Algorithm

A series of instructions that can be followed to answer a question or solve a problem.

An algorithm does not have to involve computers at all, and even processes like "how to add two multi-digit numbers together by hand" are algorithms (see the Wikipedia entry for Algorithm on the origin of the term). In computer science, algorithm is used to denote the abstract process used to achieve a desired result, and the same algorithm may be implemented by several different programs.

Whereas a program specifies the exact steps a computer should take to produce a certain result, an algorithm might specify the same steps at a higher level of abstraction. To give a concrete example, a step in an algorithm might be "add two numbers together to get their sum" whereas a computer program might accomplish this by "assigning the first number to variable x, then assigning the second number to variable y, then adding x and y and storing the result in variable z."
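As a minimal sketch, the program-level steps quoted above could be written in Python like this (the specific numbers 3 and 4 are just example values):

```python
x = 3        # assign the first number to variable x
y = 4        # assign the second number to variable y
z = x + y    # add x and y and store the result in variable z
print(z)     # 7
```

The algorithm step "add two numbers together to get their sum" says nothing about the names x, y, and z; those are details of this particular program.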

App

Application

A program or collection of programs that allows users to interact with a computer to achieve particular outcomes.

For example, "TextEdit" is an application for the MacOS operating system that consists of a single program that allows users to edit text files. Meanwhile, "YouTube" is an application for viewing and publishing videos that consists of several programs: a server program that accepts requests from client programs and sends or receives data to make them work, a client program within the "youtube.com" web page that allows for viewing or uploading videos via a web browser (which is a separate enabling application), and various client apps for different kinds of tablets or cell phones that each enable similar features to the web page (though not all exactly the same). Users think of "YouTube" as a single thing they interact with, but there are actually multiple programs running on different devices that work together to make those interactions possible.

The term app, especially in the short form, has also come to mean a specific client program installed on a computer, cell phone, or tablet, possibly from an "app store." You might be "required to use the YouTube app" to access a certain feature that's not available on the youtube.com web page. Further confusing the term is that a web app refers to a web page that allows for interaction, as opposed to a regular web page that just displays information.

An application often offers a range of different functionality related to some central purpose, with the goal of helping users (who are also customers for paid apps) accomplish some particular task and deal with related tasks. For example, a taxi app might be designed to facilitate ride requests and payment, and it might also handle related tasks like giving ratings to drivers and/or riders and reporting abuse. Such an application would likely include at least three programs: one program on a central server that stores the necessary data to coordinate between drivers and customers, another running on the driver's phone or other device that lets them know where to pick up the next customer and where to drive to, and a third running on the customer's phone allowing them to input their request and destination and pay for the trip. In some contexts these would be referred to as separate apps, and in others as a single app, so the term is a bit ambiguous.

An application typically has a recognizable icon and can be run by either clicking/tapping its icon or searching for it by name in an operating system's launcher menu. When run, it will open up one or more windows that the user can interact with to use the application. Some applications might instead run automatically when the computer starts, and/or run in the background without direct interaction, like an antivirus. Programs that run in the background and only interact with other programs instead of interacting with the user directly are called "services," "daemons," or "background processes." For a Python program to have an icon and be launched directly from the operating system, it must be bundled. Alternatively, you can use a dedicated Python application (or an integrated development environment) to load and run a Python program.

In this class, we'll focus on creating simple individual programs that accomplish a very limited single purpose, much simpler than most applications you may be familiar with. Creating an entire application with a complex user interface that handles multiple related sub-tasks requires practice with user interface design and software development methods for larger, more complex programs.

Bug

Defect

Debug

A defect is an issue with a program's code that can cause an error or which produces other undesirable behavior, like making the program run very slowly or consume a lot of memory.

A defect is often referred to as a bug, and the process of fixing defects is called debugging.

In the simplest case, a defect will cause a crash, and the associated traceback will point the programmer to where the defect is. However, sometimes a defect causes a crash indirectly, and the defect is not near the line where a program crashes. Even worse, a defect might cause an incorrect result without crashing the program, so that the programmer has to investigate things in detail to figure out how to fix it.
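As a small illustration of a defect that produces an incorrect result without crashing, consider the sketch below. The function average is a hypothetical example written for this page, with the defect left in on purpose:

```python
def average(numbers):
    '''
    Intended to return the average of a list of numbers.
    DEFECT: divides by a fixed 2 instead of by the number of items.
    '''
    return sum(numbers) / 2


# With exactly two numbers, the defect is invisible...
twoResult = average([4, 6])
print(twoResult)    # 5.0, which is correct

# ...but with three numbers the result is silently wrong (no crash).
threeResult = average([4, 6, 8])
print(threeResult)  # 9.0, although the true average is 6.0
```

Because no exception occurs, there is no traceback pointing at the faulty line; the programmer only finds the defect by checking results against expected values.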

Debugging techniques like probing, tracing, unpacking, toggling, and bisecting can be used to track down difficult-to-find bugs.

Code

A general term for the words, letters, and symbols that we write to construct a program. We can enter code piece-by-piece in a temporary way using a shell, or save a bunch of code together in a file and run it all at once. Sometimes we contrast code with comments which are parts of a program file that get ignored when running the file but are useful for humans reading the file to understand how it works.

Computer Science

Originally, a branch of mathematics studying the theoretical capabilities and limitations of programs (not a science).

Nowadays, a very broad discipline spanning mathematics, science, and engineering that deals with all aspects of computer software, from design to implementation to theory (Computer Engineering and Electrical Engineering deal with hardware).

A Computer Science major typically covers software engineering and the theory of algorithms, and offers electives covering things like human-computer interaction, as well as specific applications to domains like networking, machine learning, operating systems, and/or cryptography.

Computer scientists use diverse methodologies, and the field is so broad that many computer scientists are totally unfamiliar with core methodologies of other computer scientists. For example, someone interested in user interface design might conduct scientific experiments using tools from psychology involving humans using a program and measuring their effectiveness and/or reactions. Meanwhile, a theoretical computer scientist might invent new algorithms and write mathematical proofs of their effectiveness or efficiency without doing any scientific experimentation. Similarly, a machine learning expert might figure out how to apply an existing algorithm to new data to enable some key new capability, and validate that by developing a new benchmark and showing that their setup works better than other possible solutions. These three examples use tools from science, mathematics, and engineering that are very different, and typically every computer scientist is going to specialize in the specific techniques and methodologies that interest them.

Careers in computer science are also diverse, spanning research, software engineering, technical design (like user interface or website design), technical operations (for example, IT operations or system administration), and data analysis. Institutions from small businesses to huge multinational corporations employ people in these roles, as do nonprofits, and there are also self-employed consultants. People who major in computer science may end up in other careers that don't require a computer science background but which benefit from it, like software production or digital game design.

Crash

When a program stops running due to an error.

In Python, this is almost always caused by an exception.

Typically, a crash is unintended behavior, and is caused by a defect that the programmer should fix.

Documentation

The comments and docstrings that are part of a program which describe how it works, along with any separate files that describe how it works or how to use it, like a 'README' file.

Documentation can include entire websites (for example, the Python language documentation site), or it may only exist as comments and/or docstrings within the code itself.

Generally, documentation that explains how to use a program is for users, while documentation that explains how a program works is for programmers who might work on the code later, including notes-to-self by the author intended to remind themself of something.

Good documentation is critical to efficient work when programming, and writing the documentation before the rest of the program is good practice as it forces you to write a high-level description of what your code is trying to do before you write the code itself. This strategy can often save time since you'll be aware of certain edge cases or other important things to watch out for from the start.

Error

A problem with or problematic unplanned behavior of a program, including crashes and incorrect results.

Can also specifically refer to an exception.

Not all defects result in errors, since a defect may also cause a program to run slowly or otherwise exhibit undesirable behavior that's not technically incorrect.

When a program crashes, it's usually obvious to the programmer and it produces an error message that can be used to help understand why the crash happened and possibly debug the issue. On the other hand, sometimes a program crashes without an error message, or an error happens without stopping the program. In these cases, it can be more difficult to figure out what happened and fix the issue.

IDE

Integrated Development Environment

A program that allows the user to write code, save it in a file, and also run the code.

Often specific to one programming language; for example Thonny is an IDE for Python. Usually a program could be written using a general-purpose text editor, but an IDE includes features like syntax highlighting that make writing code easier, and its ability to run the program directly (rather than using another command to run the program once saved) makes development smoother.

Program

A sequence of instructions for a computer to carry out to achieve some purpose. Typically defined by a file containing code that contains a sequence of statements that the computer can run to accomplish the program's purpose.

The term program is ambiguous. It is sometimes used to refer to the program's code (for example, in a statement like "the program is 100 lines long"). On the other hand, it can also refer to a process being carried out by a computer as it follows instructions specified by code (for example, in a statement like "the program crashed").

Programmer

A person who created (or helped create) a program by writing code.

Often contrasted with the users who later use the program.

Programming Language

A set of rules for interpreting written statements as instructions for a computer to carry out.

Typically we write the statements using a specialized editor like Thonny and save them in a file.

User

A term for the person who is running a program.

When we build and test our own programs, we are both the programmer/designer and the user during testing. Ultimately, we want to design programs so that anyone can use them, and we have to consider how other people might interact with our programs differently than we do.

Window

A region of the screen where you can enter input and see output from a program, usually rectangular.

In modern operating systems for desktop and laptop computers, most programs you run will create one or more windows that you can use to interact with them, and these windows can be dragged around, resized, and even minimized to hide them until they're needed later.

Thonny, the integrated development environment we'll use for this class, has one main window split into several parts for writing code and seeing printed output from your programs, and it uses secondary windows for displaying drawings or for showing extra information when debugging.

Python Basics

Argument

A concrete value which is provided to a function for the function to work with while resolving a particular function call frame. We write argument(s) between parentheses, separated from each other by commas, directly after the name of the function to use. So for example, if we wrote max(3, 4) then the arguments are 3 and 4.

This can also sometimes refer to the expression used to compute an argument value, so for example in print(x + 3) we might say that x + 3 is the argument. But Python simplifies that expression to a value before actually running the function, so it's more correct to say (assuming x is 5, for example) that the argument in this situation is 8.

Note: Functions which don't have parameters don't need arguments, but still need the () after their name when we want to call them.
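The short sketch below shows each of these cases. The function greeting is a hypothetical example defined only for this illustration:

```python
largest = max(3, 4)    # the arguments are 3 and 4
print(largest)         # 4

x = 5
print(x + 3)           # x + 3 is simplified to 8 before print is called


def greeting():
    '''
    Returns a fixed greeting. It has no parameters, so it needs no
    arguments.
    '''
    return 'hello'


print(greeting())      # the () is still required to call the function
```

Writing greeting without the parentheses would refer to the function itself rather than calling it.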

When Python calls a function to get its result value, it must first simplify all expressions that are part of the code for the function call. After that, each argument position has a concrete value to be used, and Python sets up a function call frame by assigning each parameter of the function to the corresponding argument value. So for example, if we have the following code:

  def hypotenuse(a, b):
      '''
      Returns the hypotenuse of a right triangle with side lengths 'a'
      and 'b' touching the right angle.
      '''
      return (a**2 + b**2)**0.5

  x = 2
  print(hypotenuse(3, x * 2))

Python first defines the function hypotenuse.
Next it assigns 2 as the value of the variable x.
Next it is asked to simplify the expression print(hypotenuse(3, x * 2)). It does this using the following steps:
1. First, simplify x using variable substitution, so we get: print(hypotenuse(3, 2 * 2))
2. Next, do the multiplication, getting: print(hypotenuse(3, 4))
3. Now, we've got argument values ready for the hypotenuse function call (we're still working on the argument for print). So we can jump over to that definition to get a result before continuing here.
   a. To prepare our function call frame, we look at our arguments (3 and 4) and match them with our parameters (a and b). We assign a = 3 and b = 4.
   b. In the hypotenuse function there's just one line of code: a return statement with an expression to simplify. So we have to simplify (a**2 + b**2)**0.5. For brevity we'll skip the 6 steps of that process, but the result is the floating-point number 5.0.
   c. Now that we've simplified the return expression for hypotenuse, we jump back to the place where it was called and replace the function call with its result.
4. We get print(5.0), so we're ready to call print now that we know its argument value.
5. print gets called, resulting in 5.0 appearing in the output. The return value of print is None, and that's the final result of the simplification process for print(hypotenuse(3, x * 2)). We didn't store that result in a variable or do anything else with it, so it gets ignored, which makes sense, because it's None.

Assign/Assignment

A type of statement which uses = to create a variable or update the value of an existing variable. Also refers to other situations where the value of a variable is first provided or changed, such as when calling a function or when beginning an iteration in a loop, although these are not "assignment statements," just "assignments."

To simplify an assignment statement, Python first simplifies the expression on the right-hand side of the = to a concrete value. Then it updates the variable named on the left-hand side of = to hold that value.

Unlike in algebra, the left-hand side of an assignment statement may only consist of a variable name, so something like y + 3 = x is NOT permitted. Also unlike in algebra, multiple assignments to the same variable are allowed, even with different values. Each assignment statement happens when Python reaches that line of code, updating the value of the variable. While evaluating the right-hand side of an assignment statement, Python uses current variable values, as the assignment statement has not yet taken effect. So for example, if we run:

  x = 5
  print(x)
  x = x + 1
  print(x)

...we first create a variable x with value 5, then print the number 5, then update x to hold 6, and then print 6.

Assignment happens automatically during function calls and for loops. When a function is called, after simplifying each argument expression to get argument values, Python creates a function call frame and, before running the code of the function, assigns each parameter to one of the argument values, matching them up by order.

(Python has a few more advanced mechanisms for argument matching, but we don't cover those here.)
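Here is a minimal sketch of both kinds of automatic assignment. The function double and the variable names are hypothetical, chosen just for this example:

```python
def double(n):
    '''
    Returns twice the number 'n'.
    '''
    # By the time this line runs, Python has already assigned the
    # parameter n to the argument value from the call.
    return n * 2


result = double(3 + 1)   # the argument simplifies to 4, so n is assigned 4
print(result)            # 8

seen = []
for letter in 'abc':
    # At the start of each iteration, Python assigns the next
    # character of the string to the loop variable 'letter'.
    seen.append(letter)
print(seen)              # ['a', 'b', 'c']
```

Neither n nor letter ever appears on the left of an = sign, yet both receive values through assignment.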

Bracket

Square Bracket

Curly Brace

Parenthesis

These words are sometimes used interchangeably in other contexts, but in Python each refers to a specific pair of symbols:

[] are open- and close- brackets, sometimes explicitly called square brackets.
{} are open- and close- curly braces, sometimes just called braces.
() are open- and close- parentheses.

In Python, each of these is used in several different ways depending on the context. For example, brackets are used both to create lists and to do indexing, as well as to look up items in a dictionary. Parentheses can be used for grouping parts of an expression, but they're also used for calling functions and for creating tuples. Curly braces are used to create dictionaries and sets but are also used in format strings.
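The uses listed above can be sketched in a few lines. All variable names and values here are made up for illustration, and the format string is written as an f-string, which is one of several ways Python supports format strings:

```python
numbers = [10, 20, 30]          # brackets create a list
first = numbers[0]              # brackets index into a sequence
ages = {'Ada': 36}              # curly braces create a dictionary
adaAge = ages['Ada']            # brackets look up an item in a dictionary
unique = {1, 2, 2, 3}           # curly braces create a set
grouped = (2 + 3) * 4           # parentheses group part of an expression
pair = (1, 2)                   # parentheses create a tuple
count = len(numbers)            # parentheses call a function
message = f'{count} numbers'    # curly braces mark a slot in a format string
print(message)                  # 3 numbers
```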

Call

Invoke

In programming, we use the word "call" specifically to mean a step where the code has to simplify a function with known arguments, and so it needs to go run the code of that function before coming back with a result. "Invoke" is a synonym of "call."

So if you write len('he' + 'llo') the first step is to join the 'he' and 'llo' strings together to form the string 'hello', and the second step is to call the len function, measuring the length of 'hello' to get the integer 5. That first step is just a built-in operation, while the second step is a function call.
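To make the two steps visible, this sketch splits them across separate lines (the variable names are hypothetical) and then shows the combined form:

```python
joined = 'he' + 'llo'       # step 1: a built-in operation joins the strings
length = len(joined)        # step 2: a function call measures the result
print(length)               # 5

combined = len('he' + 'llo')  # both steps in a single expression
print(combined)               # 5
```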

Camel Case

Snake Case

Camel case describes a word with internal capitalization that's used to suggest multiple words put together, as in "camelCase."

Snake case describes multiple words joined together with underscores, as in "snake_case."

When naming variables we often want to provide a multi-word description of their purpose or typical contents, like "next number" or "total bananas" or such. However, variable names can only contain letters, underscores, and digits (and they may not start with a digit). One approach to multi-word names involves capitalizing letters in the middle of the name to visually break it into multiple words as in "totalBananas." Another approach is to use underscores instead of spaces to separate words, as in "total_bananas." The latter is called snake case because it was first used as a widespread convention for the Python programming language, in contrast to camel case (evoking the humps of a camel) which had become the convention in older languages.

In this class, we want you to use camel case, because the underscores in snake case make it unwieldy when read by a screen reader. However, when using built-in Python functions, you'll find many of them are named using snake case, or in some cases, you'll find functions with names in all lower case (like "isupper") where it's more difficult to distinguish the word boundaries.

Comment

Comment Out

Comment In

Part of a program file that is not evaluated as a statement but instead contains notes for humans reading the program to help them understand it. In Python, a hash sign marks the beginning of a comment, and anything after that on the same line will be ignored when running the program, as it won't be considered part of the code.

At the start of a file or function, a block of text surrounded by triple quotes (''' or """) is also a kind of comment, but these special comments are used in help text and are called "docstrings".

To comment out a line of code, just add a hash sign (#) at the start, turning it into a comment. A commented-out line of code can be re-enabled by removing the hash sign, which we call commenting in. Your editor will have a shortcut key for toggling all selected lines between comments & code. When we comment out code as part of the debugging process, we call this toggling.
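A short sketch of a regular comment and a commented-out line follows; the variable price is a made-up example:

```python
price = 4  # a comment can follow code on the same line

# This whole line is a comment and is ignored when the program runs.

# The next line of code is commented out, so it does not run:
# price = price * 100

print(price)  # 4
```

Removing the hash sign from the commented-out line (commenting it in) would make the program print 400 instead.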

Concatenate

Join two pieces of text together.

If we write "ab" + "cd" Python will simplify those two strings into one string "abcd" when it evaluates that expression.

Concatenation also applies to other sequence types like lists and tuples, which can be joined together into longer lists or tuples.
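For instance, the following sketch concatenates strings, lists, and tuples with the same + operator (the variable names are hypothetical):

```python
word = 'ab' + 'cd'
print(word)            # abcd

longList = [1, 2] + [3]
print(longList)        # [1, 2, 3]

longTuple = (1, 2) + (3,)
print(longTuple)       # (1, 2, 3)
```

Note that a one-item tuple needs a trailing comma, as in (3,), so that Python doesn't treat the parentheses as plain grouping.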

Docstring

A special kind of comment marked by triple quotes when they occur at the start of a file or a custom function.

See the second docstring entry below for more details.

Dot

A period: .

In Python, periods are used to indicate that one function or variable "belongs to" another value. For example, methods belong to a particular type, and can be used with values of that type by writing a dot after the value and then writing the method name, like this:

  word = "loud"
  print(word.upper())

We'd read the second line as "print word dot upper."

Evaluate

The process of simplifying an expression to get a result value. Normally, Python runs a program by running each statement from top to bottom, and if a statement consists of just an expression, it will simplify that expression before moving on in the program.

Most other statements involve evaluating part of the statement as well. For example, assignment statements need to evaluate the expression on the right-hand side of the equals sign so they can update the variable named on the left-hand side with the resulting value.

Evaluation is a process that starts with code, and then simplifies it by:

Replacing variable names with the current values of those variables (e.g., x => 3).
Replacing operations with their results (e.g., 3 + 4 => 7).
Replacing function calls with their results, after creating a function call frame and running the function's code to get a result.

The end result of evaluation is always a single Python value, like an integer, floating-point number, or string.
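The three kinds of replacement can be traced by hand for a small expression. In this sketch, the comments show one step per line; the expression itself is a made-up example:

```python
x = 3

# Evaluating len('ab') + x * 2 proceeds roughly like this:
#   len('ab') + x * 2
#   2 + x * 2          (function call replaced with its result)
#   2 + 3 * 2          (variable replaced with its current value)
#   2 + 6              (operation replaced with its result)
#   8                  (operation replaced with its result)
result = len('ab') + x * 2
print(result)  # 8
```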

Exception

A situation where Python cannot continue to execute code because the instructions it is trying to follow don't make sense, are in conflict with each other, or ask it to do something impossible.

Also refers to the message Python generates to indicate an exception has occurred. Just like values, exceptions in Python have their own types. A few relevant types of exception are:

SyntaxError - Indicates that your code could not be run at all, because of an instruction that didn't make sense. For example, the code x = "hi is missing a closing " mark, so its meaning is unclear and we can't run it.
TypeError - Indicates that an operation was attempted with a value whose type was incompatible with that operation. For example if we try to evaluate "hi" + 3 we get a TypeError, because the + operator cannot combine a string and an integer: those types are incompatible for that operator.
ValueError - Indicates that an operation was attempted where the types make sense but a specific value used is not compatible with the operation. For example, int('123') works and results in the number 123, but int('abc') fails: the type (string) is permissible, but the specific value 'abc' is not.

There are other more specific types of exception for some situations, like ZeroDivisionError if you try to divide something by zero.

When an exception occurs, Python prints an error message and stops (this is also called crashing). The error message will include a traceback which shows which line(s) of code were active when the error occurred.
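The sketch below triggers several of the exception types named above and reports the type of each. It relies on Python's try/except statement, which this entry does not describe, so that the example can continue past each exception instead of crashing. The helper exceptionName and the small functions it runs are hypothetical names invented for this example:

```python
def exceptionName(action):
    '''
    Runs the zero-parameter function 'action' and returns the name of
    the exception type it raises, or None if it finishes normally.
    '''
    try:
        action()
    except Exception as error:
        return type(error).__name__
    return None


def addStringAndInt():
    return 'hi' + 3       # incompatible types for +


def convertLetters():
    return int('abc')     # right type (string), unusable value


def convertDigits():
    return int('123')     # works fine, no exception


def divideByZero():
    return 1 / 0


print(exceptionName(addStringAndInt))  # TypeError
print(exceptionName(convertLetters))   # ValueError
print(exceptionName(convertDigits))    # None
print(exceptionName(divideByZero))     # ZeroDivisionError
```

A SyntaxError is left out because code containing one cannot be run at all, so it could not appear in a working example file.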

Expression

A piece of code that can be simplified to a value. Often part of a larger statement (for example, the right-hand side of an equals sign in an assignment statement is an expression).

An expression alone on a line of code counts as a simple statement where all Python does is simplify the expression (usually it will have some side effects). Expressions as part of other statements each have a specific meaning, depending on the statement they're used with and where they occur.

Smaller parts of an expression are also expressions of their own. So for example, x + 3 is an expression, but (x + 3) * y is a larger expression that includes x + 3 as a sub-expression. Even just individual values when typed in code are basic expressions that evaluate to their values, so 'hello' is an expression whose value is the string 'hello'.

Unlike in mathematics, expressions do not have a fixed result value. The same expression might simplify to two different values when run in different contexts (this is particularly common for expressions that are part of function definitions). If we simplify the expression x + 3 in a context where the value of x is 4, we get 7 as the result. On the other hand, if we simplify the same expression again in a context where the value of x is 0, we get 3. So the same expression can simplify to different values if it's evaluated multiple times in the same program.
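A minimal sketch of this, using a hypothetical function addThree whose body contains the expression x + 3:

```python
def addThree(x):
    '''
    Returns the result of simplifying the expression x + 3.
    '''
    return x + 3


first = addThree(4)    # here x is 4, so x + 3 simplifies to 7
second = addThree(0)   # here x is 0, so the same expression gives 3
print(first, second)   # 7 3
```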

Official (highly technical) documentation on expressions.

File

A collection of symbols/data stored together in the computer. We can open different kinds of files with different applications. Notable for this class, files with names ending in .py conventionally contain Python code, and we can open these with Thonny to show us the code and let us edit it and/or run it as a program.

See the file input/output entry for more advanced info on different types of files that becomes relevant when covering file input and output.

Float/Floating Point Number

A number that is stored in a format which allows for a decimal/fractional part, in contrast to an integer which can only be a whole number. This is one of Python's basic data types. So 5.1 and 3.0 are floating point numbers (also called "floats") while 5 and 3 would be integers. Notice that 3.0 and 3 are different representations of the same number. In terms of data type one is a float and one is an integer, but they both represent the same mathematical concept.

A few operations in Python can only deal with floats or integers, while most can deal with either. Notably, the result of / (division) is always a float, while the result of // (integer division) is always a whole number: it is an integer when both operands are integers, and a float with no fractional part (like 3.0) if either operand is a float. Also, when repeating a string using *, the repeat count must be an integer.
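As a quick sketch of these rules with integer operands (the variable names are hypothetical):

```python
quotient = 6 / 3          # / gives a float even when the division is exact
print(quotient)           # 2.0

wholeQuotient = 7 // 2    # // discards the fractional part
print(wholeQuotient)      # 3

repeated = 'ab' * 3       # the repeat count is an integer
print(repeated)           # ababab

# 'ab' * 3.0 would raise a TypeError, because the repeat count is a float.
```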

Floating point numbers are stored using limited precision, which means certain values get rounded off and cannot be stored exactly. For example, the fraction 1/3 should be represented using an infinitely repeating sequence of 3's in the decimal part. However, Python truncates this to 16 3's, so it's not an exact representation, and over many operations, slight errors can creep into the math because of this. For example, if you run x = 1/3 to store that inexact number in a variable, and then evaluate x + x + x + x + x + x, you'll get the number 1.9999999999999998 instead of 2.0 like you should. The precise operations used do matter though: x * 6 will give 2.0 as expected. These floating point errors mean that in situations where exact precision is necessary like banking or certain scientific calculations, other specialized data types need to be used.
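The example above can be run directly. The second half of this sketch uses Fraction from Python's standard fractions module as one example of a specialized exact type; that choice is an assumption made for illustration, since the text above does not name a specific type:

```python
from fractions import Fraction

x = 1 / 3
repeatedSum = x + x + x + x + x + x
product = x * 6
print(repeatedSum)             # 1.9999999999999998
print(product)                 # 2.0
print(repeatedSum == product)  # False

# A Fraction stores 1/3 exactly, so repeated addition stays exact.
third = Fraction(1, 3)
exactSum = third + third + third + third + third + third
print(exactSum == 2)           # True
```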

Function

A unit of code that can be repeated by using its name followed by parentheses. Many functions require one or more arguments. All functions have a return value, so that when you use them, you get a result back (that result might change depending on the arguments used).

For example the built-in len function computes the length of a string. If you use it by running len('hello') then the string 'hello' gets sent in as the argument, and the return value will be the number 5 since that's how many letters are in 'hello'.

Python has a large number of built-in functions, and you can also define your own. Besides operators which accomplish a few basic operations on values, functions are the main way to get stuff done in Python.

Besides result values, some functions have side effects. For example, the built-in print function causes text to appear in the program's output. In some cases, a function's result value may be ignored, and we use it just to get its side effect.
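
A minimal sketch of the difference between using a function for its result and using it for its side effect (the variable names are just for illustration):

```python
# We use len for its return value and store that value in a variable.
length = len('hello')
print(length)       # 5

# We use print for its side effect (text appears in the output).
# It still has a return value, but we normally ignore it.
ignored = print('hi')
print(ignored)      # None
```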

Integer

A whole number (including negative numbers). This is one of Python's basic data types. Unlike a floating-point number, an integer cannot have a decimal part (not even a .0), so 4 is an integer but 4.5 and 4.0 are floating point numbers.

Python stores integers of arbitrarily large size, but using extremely large integers (e.g., integers with 20+ digits) may be slower than using smaller integers.

Keyword

A reserved word in Python syntax that has a special meaning, such as 'def', 'if', or 'for'.

These words cannot be used as the name of a variable even though they meet all the normal requirements for variable names. Each has a special meaning but needs to be used in exactly the allowed way for it to work.

This official documentation page lists all Python keywords.

Memory

Storage capacity in a computer. It is used to store the current values of variables, and it is also used to store the code the computer is running.

Memory may include temporary memory which gets reset every time the computer shuts off (which is used for most things a program does) as well as permanent memory that will remain even when we restart the computer (used to store files). We call the temporary memory "RAM" which stands for "Random Access Memory" and the permanent memory a "hard disk" or "hard drive."

Memory is measured in bits (each bit can hold a single 1 or 0) or in bytes (each byte is 8 bits). In Python, the exact amount of memory taken up by different kinds of values is very complex, and usually depends on what the value is, not just on its type.

We draw memory diagrams to indicate abstractly what values are currently stored in memory by a program.

Memory Diagram

A diagram that shows how Python stores values in the computer's memory. We use boxes with text labels to show the contents of each variable, and inside the box we either write a 'small' value like an integer or float, or an arrow pointing to a 'big' value like a string. The Introduction to Python lecture covers basic memory diagrams, and more advanced memory diagrams are covered when we cover things like lists.

None

NoneType

None is a special value in Python used to mean "there isn't anything here" in places where a value is required by the structure of the language. For example, all functions must return a value, but if a function doesn't really have any value to return, it can return None to indicate that. None still counts as a real value and can be stored in a variable just like other values. It has its own special type called NoneType.
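
For example, here is a short sketch using a hypothetical function named announce that has nothing useful to return:

```python
def announce(message):
    # This function is used only for its side effect, so it returns None.
    print(message)
    return None

result = announce('Done!')   # None can be stored in a variable
print(result)                # None
print(result is None)        # True
print(type(result))          # <class 'NoneType'>
```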

Operator

A symbol that instructs Python to perform a specific simple operation on one or more values, like + or %. The operators section of the quick reference contains a list of key operators in Python.

Most typically, an operator somehow combines the values on its left and right sides into one value during simplification. For example, the + operator adds two numbers together (or concatenates two strings). During simplification, both values and the operator are combined and then that entire part of the expression is replaced by the result. For example, 3 + 4 * 5 first simplifies the 4 * 5 to 20, so it becomes 3 + 20 and then simplifies that to just 23.

As in normal mathematics, there's an order of operations which determines which operators apply in what order, and you can use parentheses to specify what should happen first. The order of mathematical operators matches that of normal math, with tied operators getting evaluated from left to right. Python also has a number of programming-specific operators; this operator precedence table specifies the total precedence order of all operators that Python supports.
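
A small sketch of how the order of operations and parentheses change a result (variable names are made up for this example):

```python
# Multiplication happens before addition: 4 * 5 is simplified first.
noParens = 3 + 4 * 5
print(noParens)     # 23

# Parentheses force the addition to happen first.
withParens = (3 + 4) * 5
print(withParens)   # 35

# Tied operators like - and - are evaluated from left to right.
leftToRight = 20 - 5 - 3
print(leftToRight)  # 12
```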

Official (highly technical) documentation on expressions, including operators and "primaries."

Output

The text displayed when we run a program. The print function causes the argument(s) you give it to be displayed so that someone running the program can see them. "Output" refers to all of the text that gets displayed this way over the course of the program. Unlike values stored in variables, output is not something that we can perform further operations on or refer back to (at least, unless we capture it somehow). Since output is not a value, it does not have a type: when we output the number 12 we get the same thing as when we output the string '12'.
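
To see that output has no type, the sketch below captures what print displays. Capturing with io.StringIO and contextlib.redirect_stdout is just one way to do this, chosen for illustration; it is more advanced than what you'd normally need.

```python
import io
from contextlib import redirect_stdout

captured = io.StringIO()
with redirect_stdout(captured):
    print(12)    # an integer
    print('12')  # a string

# Once displayed, both look identical: the output is just text.
outputLines = captured.getvalue().splitlines()
print(outputLines)                       # ['12', '12']
print(outputLines[0] == outputLines[1])  # True
```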

Print

Cause text to appear as part of a program's output. Also the name of the built-in print function which accomplishes this for us. Instructions might say "write some code that prints 5" in which case we could accomplish that by writing print(5).

Quotes / Quotation Marks

Quotation marks (either single quotes ' or double quotes ") are used in Python code to indicate that part of the code should be treated as a literal block of text (a.k.a., a string). Anything in between quotation marks is not interpreted as program instructions, but instead is used as-is as part of a string value. You can use either a single or double quote to start a string, and then you must use the same mark to end it (and it must end on the same line of code).

Python has an alternative for multi-line text: triple quotes. By writing three single or three double quotes in a row, like ''', you can start a string which only ends when you write three more of the same kind of quote in a row, even spanning multiple lines of code. When a triple-quoted string spans multiple lines, the text includes those newline characters and will print out as a multi-line string as well.
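
A brief sketch of the different kinds of quotes (the text inside the strings is arbitrary):

```python
# Single and double quotes both create strings; each can contain the other kind.
singleQuoted = 'She said "hello" to me.'
doubleQuoted = "It's a nice day."

# A triple-quoted string can span several lines of code.
haiku = '''first line
second line
third line'''

print(haiku)              # prints three separate lines
print(haiku.count('\n'))  # 2 (one newline between each pair of lines)
```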

Run / Execute

Tell Python to follow some code instructions step-by-step. We can run either an entire file as a single program containing multiple lines of code, in which case Python starts at the top and works its way to the bottom, or we can use the shell to ask Python to run just a single line of code at a time.

As Python follows the provided instructions, it identifies the type of statement used on each line, and then follows special rules for that statement to determine what it does. For example, when it encounters an assignment statement, it simplifies the right-hand side of the equals sign and then updates the variable named on the left-hand side.

Once Python reaches the last line of code (or if it encounters an error or runs the special exit function) it will stop. If something prevents any of those things from happening, it may continue until you force it to stop.

Normally, when running code in a file, we start fresh each time, forgetting any variables defined by a previous run of the code. In contrast, when working in Jupyter notebooks, running a cell will not start from scratch, but will retain any variables already defined by other cells that we ran before. This means that in some cases, certain cells must be run before others in order to assign the variables those other cells depend on. Restarting the kernel in a Jupyter notebook will erase any existing variables and reset things.

Shell

An interactive window where you can enter Python code one line at a time and immediately see the results of each line. Typically the prompt for code will be >>>, and if you just type an expression instead of a more complex statement, Python will display the result of that expression automatically (in contrast to running things in the shell, this automatic display of results does not happen when running an entire program at once).

Side Effect

Something that happens as a result of calling a function other than fetching a result value and simplifying the function call. This can be anything from displaying text in the output window to drawing a line or playing a sound.

In some cases, a function can have a side effect which modifies the value of a variable, although this does not normally happen since arguments are sent to a function as values, not variables. When a mutable value is used however, a function may change it and that would be a type of side effect.
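
The sketch below contrasts the two situations using two hypothetical functions, addOne and appendOne. It uses a list as the mutable value.

```python
def addOne(n):
    # Only the local variable n changes; the caller's variable is untouched.
    n = n + 1

def appendOne(items):
    # A list is mutable, so this changes the same list the caller has.
    items.append(1)

count = 5
addOne(count)
print(count)    # 5

numbers = [0]
appendOne(numbers)
print(numbers)  # [0, 1]
```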

Simplify

Another term for evaluate.

The process Python follows when evaluating an expression is similar to the process that is used when simplifying an expression in algebra. Note that Python only does simplification not solving, so it won't do things like subtract a value from both sides of an equation to figure out what a variable will be. The programmer has to do all the solving stuff and leave Python with just simplification steps. So the code y + 2 = x + 6 is invalid and will result in an error, but if we have already defined x = 10 and we write the solved version of the equation as y = x + 4, Python is able to simplify in two steps x + 4 => 10 + 4 (variable substitution) and then 10 + 4 => 14 (addition).

Once an expression is fully simplified, Python will usually do something with the result (in the example above, we'd store the result 14 in the variable y, throwing away any old value that y might have had previously).

The result of simplification is always a value. In algebra, simplification might leave you with variables that you don't know the values of yet. However, in Python if the simplification process encounters a variable with an unknown value, Python will stop and print out an error message.
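
Here is a small sketch of both cases. The name mystery is a made-up variable that is never assigned, and the try/except is only there so the example can keep running and show the error message.

```python
x = 10
y = x + 4      # x + 4 => 10 + 4 => 14
print(y)       # 14

errorText = ''
try:
    z = mystery + 4   # mystery has no value, so simplification fails
except NameError as error:
    errorText = str(error)
print(errorText)      # name 'mystery' is not defined
```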

Statement

A piece of code which specifies a single instruction for the computer to follow. There may be multiple steps to the instruction (like a big expression that will take lots of steps to evaluate) but each statement has a specific meaning. For example, an assignment statement sets (or updates) the value of a variable, or a definition statement begins the definition of a new custom function.

A program is made up of multiple statements in order, and Python executes each statement one by one from top to bottom. A statement can be just an expression, in which case Python just simplifies that expression and discards the result, but other kinds of statements each have some special effect.

String

A piece of text stored as a single value. Strings are one of Python's basic data types and there are several operators that can be used with them, like + for concatenation.

Syntax

The rules for how symbols need to be arranged in order for Python to successfully interpret them as code. For example, one rule is that for an assignment statement the left-hand side of the equals sign may only contain a variable name. If you write x = 3 you are obeying this rule, but if you write 3 = x you are violating it and you'll get a SyntaxError. In general, the syntax rules of Python are very strict, so you have to get used to following them in order to get the computer to follow the instructions you intend to give it.
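
As a rough sketch, the built-in compile function can check whether some text follows the syntax rules without running it. The helper name followsSyntaxRules is made up for this example.

```python
def followsSyntaxRules(code):
    # compile() raises a SyntaxError if the text is not valid Python code.
    try:
        compile(code, '<example>', 'exec')
        return True
    except SyntaxError:
        return False

print(followsSyntaxRules('x = 3'))  # True
print(followsSyntaxRules('3 = x'))  # False
```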

Traceback

A listing that names specific line numbers in specific code files indicating which line(s) of code are active (usually displayed when an error occurs). Multiple lines of code can be active at once because when a function gets called, the line of code calling the function remains paused while the code in the function is running.

The top of the traceback starts with the first line of code to activate, and as you read down the traceback, you're seeing more and more recent lines, ending with the final line of code which is what Python is actively evaluating right now (or when an error occurred). In addition to file names and line numbers, the traceback includes the actual code of the line that's active, to help you see what's going on quickly. An example traceback might look like this:

Traceback (most recent call last):
  File "...\traceback.py", line 17, in <module>
    print(outer(4, 'hello'))
  File "...\traceback.py", line 7, in outer
    r2 = inner(b)
  File "...\traceback.py", line 14, in inner
    return x * (x - 1)
TypeError: unsupported operand type(s) for -: 'str' and 'int'

The '...' here would normally include the full path of folders containing the file where the error occurred.

Click here for tips on debugging custom functions using tracebacks.

You can often infer a lot from a careful study of a traceback.

From this traceback we can see that on line 17, outer was called with 4 and 'hello' as arguments, and that line remained active while we ran the code in the outer function. outer itself on line 7 of the same file called inner, during which, on line 14 of the file, we ran into an error. On that line, we are trying to subtract 1 from x, but we get a TypeError because one of them is a string and the other is an integer.

In this case, the only subtraction on the active line was in x - 1 and since we can see that 1 is an integer, we can infer x must have been a string. We can't see where x came from in the traceback, but it's suspicious that on line 17 we are using both an integer and a string as arguments. We'd have to look at the code to know for sure, but it's possible that 'hello' on line 17 should have been an integer instead of text. In debugging based on this traceback, you should first look at line 14 to see where x comes from, looking back upward in the code to find where x was most recently assigned a value. If x is defined as a parameter of inner, then we should look at line 7 of the file, where b is used as the argument to inner: where does b get its value? Again, if b is a parameter of outer, we should then look at line 17, where outer is being called with two arguments. If b is the second parameter in the definition of outer, then we can conclude that 'hello' on line 17 should not have been a string.

Type

A category of data that Python can store. Every value has a type, and the type changes how it works when operators are applied. For example, '3' is a string whereas 3 is an integer, so '3' + '3' is '33' (concatenation of strings) while 3 + 3 is 6 (addition of integers). Functions like str, int and float can be used to convert between types, although there are limitations as to what values can be converted (e.g., you can't convert 'hello' to an integer, but you could convert '123').

Python will give you a TypeError if you try an operation on incompatible types, like '15' + 15.
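
A short sketch of how types affect operators and conversions (the variable names are just for illustration; the try/except blocks only exist so the example can continue after each error):

```python
print('3' + '3')         # 33 (concatenation of strings)
print(3 + 3)             # 6 (addition of integers)

# Conversion functions create a value of a different type.
print(int('123') + 1)    # 124
print(str(15) + '15')    # 1515
print(float('2.5') * 2)  # 5.0

# Mixing incompatible types causes a TypeError.
mixedTypesFailed = False
try:
    '15' + 15
except TypeError:
    mixedTypesFailed = True

# Some values cannot be converted at all.
conversionFailed = False
try:
    int('hello')
except ValueError:
    conversionFailed = True

print(mixedTypesFailed, conversionFailed)  # True True
```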

Underscore

The 'underline' character _.

Underscores can be used as part of variable or function names in Python, and some built-in function names use them. In this class we prefer using camelCase to using underscores because underscores make for very long names when read by a screen reader.

In some circumstances, Python uses two underscores at the beginning and end of a name that has special meaning. This design makes it unlikely that someone would accidentally give their variable the same name as a special built-in value.

Value

A concrete thing stored in a variable or which is computed as the result of evaluating an expression. For example, 3 or 'hello'. Values are the results of expressions, but an expression like 3 + 3 is not itself a value (and thus cannot be stored in a variable). Every value has a type which influences how operators apply to it.

Variable

A name used to store and refer back to a specific value. A variable may be reassigned a different value, so its value may change as a program runs. An assignment statement is used to define a new variable or give an existing variable a new value. Python fully simplifies the right-hand side of the statement before storing the result in the variable.

For example, if we have the code:

whale = 3
giraffe = 2 + 8
whale = whale + 1

we have defined 2 variables named whale and giraffe. The value of whale starts as 3, but then it changes to become 4 once the third line runs (the old value is used when simplifying the expression for a reassignment). Meanwhile, giraffe has the value 10.

Note that variables in Python behave differently than variables in algebra. In algebra, a variable has a single value and we might have multiple equations that help us narrow down that value, but it's impossible to say something like x = 3 and also x = 4. In Python, it's fine to say x = 3 on one line of code and x = 4 on another line of code: the variable takes on one value, and later changes its value to a different value. This means that in Python, the question "What is the value of variable x?" is incoherent: you need to instead ask "What is the value of variable x when we're on line y of the code?"

Variables also behave differently than cells in a spreadsheet. In a spreadsheet, a cell can have both a formula and a value. If it has a formula the value of the cell is computed using that formula, and when any of the other cells used in that formula change values, the formula is re-evaluated. So if cell A2 has formula "=A1 + 1" and cell A1 changes value from 2 to 3, cell A2 in a spreadsheet would change value from 3 to 4.

However, Python variables do NOT work like that. If we have the code:

car = 10
bike = car * 2
car = 12

on line 1, we assign a new variable car the value 10. On line 2, we simplify car * 2 to the value 20 and store that in a new variable bike. But bike doesn't remember the formula that was used, only the value. So when we change car to 12 on the next line of code, bike does NOT change to 24, but instead remains 20. In Python if we want to re-compute a formula, we have to include another line of code which re-runs that formula. So in the following code, the final value of bike will be 24:

car = 10
bike = car * 2
car = 12
bike = car * 2

Importing and Modules

Import

Loading code from another file for use in your program. The import statement specifies a name to import, and Python looks for a file with that name plus a .py ending. So if you say import fox Python will look for a file called fox.py. When you import a file, Python runs that file's code, and then collects any variables it defines (including functions) into a module. Functions from that module can be accessed using the . operator, so if the fox.py file defines a function called play, we can say fox.play to refer to that function after we do import fox.

Python will look for the file to import in the current directory as well as in other directories where built-in modules and modules installed via the package manager are located.
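
Since fox.py above is a hypothetical file, the sketch below uses the built-in math module instead so that it runs without creating any extra files. The . operator works the same way for any imported module.

```python
import math

# After importing, use the . operator to reach things the module defines.
root = math.sqrt(16)
print(root)           # 4.0
print(math.floor(2.7))  # 2
```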

Module

A program defined in a file which can be imported by another file. Variables and functions defined by this program can be used by another program which imports them ("module" also refers to this collection of variables and functions).

A module will often not actually do anything when run by itself, but instead just define many functions that other programs can use.

Dealing with Text

Backslash

The \ symbol (whereas / is a "forward slash"; yes, the distinction is arbitrary).

It has a special meaning when it occurs inside quotation marks: it changes the meaning of the next character to an alternate meaning. For example in \n the 'n' becomes a newline character. Other special characters include:

\t is the "tab" character, which is also what you get if you hit the 'tab' key.
\r is the "carriage return" character, which moves back to the beginning of the current line of text before continuing to print.
\a is the "bell" character, which sometimes makes a sound when it gets printed (but lots of computers don't do that any more, sadly).
\" and \' are normal quotation marks, but they DON'T end a string as they usually would. So the code 'don\'t' is a string that has a quotation mark inside it, even though without the backslash 'don't' would be an error since Python thinks that the quote between the 'n' and the 't' is the end of a string 'don' and then doesn't know what to do with the extra 't' and quote.

There are many more special characters available, but we won't cover them all here.
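
A few of these special characters in action, as a small sketch (the strings are arbitrary examples; the double backslash is one of the additional special characters not listed above):

```python
tabbed = 'a\tb'          # a, then a tab, then b
twoLines = 'one\ntwo'    # a newline between the two words
apostrophe = 'don\'t'    # the \' does not end the string
oneBackslash = 'C:\\temp'  # \\ is a single backslash character

print(twoLines)          # prints on two lines
print(apostrophe)        # don't
print(oneBackslash)      # C:\temp
print(len(tabbed))       # 3 (the tab counts as one character)
```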

Another use of backslash is that if it's NOT inside quotes and it's at the end of a line of code, then it prevents the end of the line from happening normally, and so the next line of code counts as if it were part of the line above it. For example, you can write:

x = \
  5

and that will work the same as writing x = 5 on one line. For this application, the backslash MUST be the very last character in the line.

Character

A single letter or other symbol within a string, including special things like a space or a line break.

In Python, unlike some other programming languages, there is no separate type to represent a single character. Instead, we can use a string with just one character in it.

Format String

A string that includes an 'f' before the opening quotation mark, allowing the use of curly braces inside the string to interpolate expressions to form a final string value.

For example, if we write:

x = 5
st = f'x = {x} and x + 2 = {x + 2}'

The final value of the st variable will be computed by simplifying the expressions inside the curly braces within the string, converting the results to strings, and then replacing the curly brace regions with those results. Since the first set of curly braces contains the expression x which simplifies to 5, and the second set has the expression x + 2 which simplifies to 7, the value of st will be the text "x = 5 and x + 2 = 7".

Format strings are a convenient way to assemble messages from multiple variables or complex expressions, where otherwise you'd have to manually convert the results of those expressions to strings and use concatenation to combine everything. Without using a format string, to achieve the same result you'd have to do:

x = 5
st = "x = " + str(x) + " and x + 2 = " + str(x + 2)

Format strings are often used as the final step in formatter programs.

Hash Sign

The '#' symbol.

In Python, everything following a hash sign until the end of that line is considered a comment. This effect is ignored if it occurs inside of quotation marks of course, so you can still use hash signs in strings.
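
For example (a minimal sketch with made-up variable names):

```python
# This whole line is a comment, so Python ignores it.
total = 3 + 4  # everything after the hash sign on this line is ignored too

tag = '#python'  # the hash sign inside the quotes is part of the string
print(total)     # 7
print(tag)       # #python
```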

Line Break

Newline Character

When dealing with strings, we sometimes want to include multiple lines of text. A line break occurs between each line. To represent this, there's a special character called the newline character that represents moving down to the beginning of a new line. This can be written in code as '\n', where the backslash means "next character has a special meaning" and the special meaning of 'n' is "new line."

For example, if we write "red\nblue\nyellow" in our code, the corresponding value has the words "red," "blue," and "yellow" on three separate lines, and will appear that way if we use it with the print function (in other contexts it may be displayed with quotes around it and \n inside just like it was entered).

Python also allows for triple-quoted strings (see quotes) that can span multiple lines and therefore include line breaks naturally.
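
The sketch below shows the difference between printing a string that contains newline characters and displaying it the way the shell would. It uses the built-in repr function to imitate the shell's display.

```python
colors = "red\nblue\nyellow"

# print follows the newline characters, so this takes up three lines.
print(colors)

# repr shows the string as it would be written in code.
shellDisplay = repr(colors)
print(shellDisplay)  # 'red\nblue\nyellow'
```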

String Interpolation

The process of putting values together along with fixed scaffolding into a single string.

For example, if we do:

x = 5
print("The value is: " + str(x))

We are interpolating the value of x into the string that we print.

The term interpolation also refers to blending between two values using a number to control the result.

Defining Functions

Argument

See above.

Block

One or more lines of code which are all indented at the same level as each other, or are further indented than the base level. Function definitions are based on indented blocks, among other things. For example in this code:

def addThree(n):
    n = n + 3
    return n

the lines n = n + 3 and return n are both part of the same indented block.

One block may contain smaller blocks when there are multiple levels of indentation. For example, in this code there are three blocks:

n = 43
if n > 50:
    if n % 2 == 0:
        print('even')
    else:
        print('odd')

The if n % 2 == 0 line is the start of a block which encompasses 4 lines: that line and the 3 lines after it (including both print lines).
Each of the print lines is its own block (both of which are also part of the first block).

Body

An indented block of code which follows a special kind of statement such that the statement above it controls that code. For example a definition statement uses the indented block of code below it as its body. For a function definition, we run the body code when the function is called.

Call

See above.

Dead Code

Code which will never execute because of where it is placed in the program structure.

For example, since a return statement stops execution of the current function call frame, any code at the same indentation level after a return statement is dead code: based on its placement, it should run next after the return happens, but because return will stop the function it's in, the code will never be reached.

Dead code is almost always a mistake and should usually be removed (or the program should be restructured so that it's reachable).
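
A minimal sketch of dead code, using a hypothetical function named double:

```python
def double(n):
    return n * 2
    print('finished doubling')  # dead code: the return above always stops the function first

result = double(4)
print(result)  # 8 (the message 'finished doubling' never appears)
```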

Definition/Define (functions)

A special statement that defines a custom function using the keyword def. The definition must include a function name followed by parentheses, and any parameters the function will require must be given names inside those parentheses. The line must end with a colon, and then at least the next line of code must be indented farther to the right. That line, and any subsequent lines indented at the same level or further, will become the body of the function.

After a custom function is defined, it can be called just like any other function: write its name followed by parentheses, and supply any necessary arguments within the parentheses. When a function is called, we run the body code, and then simplify the function call by replacing it with the return value of the function. In some cases, we will use that result, while in others it may be ignored and we just want to call the function for its side effects.

For example, in this code:

def plural(word):
    return word + 's'

print(plural('whale'))

the name of the function being defined is plural and it has one required parameter called word. When we run it, it just evaluates a single return statement that determines the result value by concatenating the argument value with the letter 's.' After the definition, we call it as part of an expression that uses the built-in print function. Python will simplify that last line by doing the following:

Starting from print(plural('whale')) it first figures out that 'whale' is just a piece of text. Having simplified the argument expression, it's ready to actually call plural (whose result is needed for the print call, so plural must happen first).
To figure out what plural('whale') will simplify to, Python runs the body of the plural function until it gets to a return statement, and then returns the value it gets by simplifying the expression after the return keyword. In this case, since 'whale' was the argument, within the function the variable word will have the value 'whale', so when we simplify word + 's' we get 'whale' + 's' and then that simplifies to 'whales'. That becomes the "return value" of the function.
Now that we're done with the function, we can replace the function call code plural('whale') with the result that we got. So print(plural('whale')) becomes print('whales').
Finally, we are ready to call print, which displays the word whales as output, and simplifies to the value None.

Here is the highly technical official documentation for function definitions.

Docstring

See above.

Docstrings will automatically be displayed when the help function is used on the module or function they belong to.

Every custom function you define should include a docstring, and writing the docstring before you write the function's code is a good way to force yourself to be more deliberate in your planning, which will usually save you time in the end.

A docstring for a custom function may contain doctests.

Doctest

An example of how a custom function should work that's written as part of a docstring. It's written using three > symbols to emulate the Python default prompt, plus some code followed by the expected results of that code, exactly as if that code had been run in the Python shell.

So for example, if you are using the Python shell to do some math you might see the following:

>>> 3 + 4
7
>>> 5 * 2
10

If you copied those four lines and pasted them into a docstring, then if you wrote the following at the end of your file:

import doctest
doctest.testmod()

Python would run the code after the >>> lines and confirm that the results matched the two result lines, printing out an error message if they did not. Here's a complete example of a custom function definition that includes a doctest as part of its docstring:

def hypotenuse(a, b):
    """
    Computes the hypotenuse of a triangle with near sides a and b.
    Returns a floating-point number.

    For example:

    >>> hypotenuse(3, 4)
    5.0
    >>> hypotenuse(6, 8)
    10.0
    >>> hypotenuse(2, 3)
    3.605551275463989
    """
    return (a**2 + b**2)**0.5

# These lines run all doctests for functions in this file
import doctest
doctest.testmod()

Doctests are useful because you can confirm that your code will work in a variety of situations. Writing doctests before you write your function can be a good way to plan ahead for the different kinds of inputs/arguments you will need to handle.

Note that your output must match what Python would display EXACTLY, although you can enable a special "ELLIPSIS" setting to get around this for certain things. Refer to the official documentation on the doctest module for more details.

Frame (Function Call Frame)

A separate unit of memory used when evaluating the result of a function. Each time a function is called Python creates a new function call frame, and the frame is destroyed when the function ends and returns.

Local variables used by a function, including its parameters, are stored in this function call frame, which means that they are separate from variables used by other functions, even if they share the same name.

Even when calling the same function multiple times, each call gets its own separate function call frame. This feature makes recursion much more powerful, since multiple function call frames for the same function can stack up and may include different values.
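
The sketch below uses three hypothetical functions to illustrate separate frames. In it, square and cube each have their own n and result variables, and each call to factorial gets its own n.

```python
def square(n):
    result = n * n
    return result

def cube(n):
    # While square runs, its frame holds its own n and result,
    # separate from the n and result in this frame.
    result = n * square(n)
    return result

def factorial(n):
    # Each recursive call creates a new frame with a different value of n.
    if n <= 1:
        return 1
    return n * factorial(n - 1)

print(cube(3))       # 27
print(factorial(4))  # 24
```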

Indentation

Indentation is space and/or tab characters at the beginning of a line of code.

In Python, indentation determines which block of code each line belongs to. For code in the same block, indentation must be consistent (or must be more than the base indentation level for sub-blocks).

Theoretically you can pick any number of spaces and/or tabs for indentation, but practically using four spaces for each block level is a widespread convention that you should almost always follow.

Local Variable

Non-Local Variable

Any variable that's assigned a value within a function, including all parameters. These variables are stored in the function's function call frame and have separate values from other variables (even other variables with the same name). These variables are forgotten when their function call frame ends.

Note that when a function refers to the value of a variable that it does not assign (and which is not a parameter), we call this a "non-local variable." By default, functions are not allowed to modify the values of non-local variables: if they include an assignment statement for a non-local variable, then that creates a new local variable with the same name instead. It's an error to first refer to a non-local variable and then, later in the same function, assign a local variable with the same name. For example, this code will generate an exception:

x = 5

def f():
    print(x)  # error on this line (local x not yet defined)
    x = 3  # this line implies x MUST be local
    print(x)

f()

In contrast, the code below does not generate an exception, but instead defines x as a local variable in function f and separately uses x as a non-local variable in function g:

x = 5

def f():
    x = 3  # local variable; doesn't affect non-local variable x
    print(x)  # prints 3

def g():
    print(x)  # prints 5 (non-local variable)

f()
g()

These should usually be avoided, but the keywords global and nonlocal can be used to declare that you will be using a non-local or global variable so that you can modify its value instead of creating a new local variable when you do an assignment statement. global refers to variables defined outside of any function definition, while nonlocal can be used when multiple functions are defined inside each other to refer to the closest local variable that's outside the current function definition.

We will not use these keywords in this class, nor will we ever need to define functions inside each other.

Here are the official reference pages for the global and nonlocal statements.

Parameter

A special kind of local variable whose name is included in the definition of a function and which gets its value from an argument passed to the function when the function is called. For example, consider the following code:

def play(squirrel, robin):
    print("Squirrel start", squirrel)
    print("Robin start", robin)
    squirrel = squirrel + 1
    robin = robin * squirrel
    print("Squirrel end", squirrel)
    print("Robin end", robin)
    print("Result is", squirrel + robin)
    print("---")

play(3, 4)
play(5, 2)

In this code, we define one function named play, and then we call it twice. Each separate call to a function will set its own values for the parameters of that function. On the def line, we wrote squirrel, robin inside the parentheses, so that defines play as a two-parameter function. Any time it is called, two arguments must be supplied; otherwise play's two parameters would not be able to get their values. So calling play as play(1) or play(1, 2, 3) would generate an exception since play needs exactly two arguments.

The values of squirrel and robin are assigned based on the values of the two arguments for each particular call made to play, using the order of arguments and the order of parameters to match them up. So in the first function call above, squirrel is 3 and robin is 4, while in the second function call squirrel is 5 and robin is 2. Parameters are very powerful because they allow us to run the same code with different starting values just by providing different arguments to a function.

Note that like any other local variable, an assignment statement can be used within a function to update or replace the value of a parameter.

For the code above, the output will be:

Squirrel start 3
Robin start 4
Squirrel end 4
Robin end 16
Result is 20
---
Squirrel start 5
Robin start 2
Squirrel end 6
Robin end 12
Result is 18
---

Parameters are a bit of a slippery concept. You can try copying the code above into your editor and running it with various changes:

What happens if you change the values used as arguments?
What happens if you change the name of a parameter?
What happens if you change the number of parameters used?

Pass

Send an argument into a function so that one of that function's parameters gets a certain value. This happens when we call a function.

For example, when we write:

def f(x):
    return x + 1

print(f(3))
print(f(10))

On the 4th line, we are passing the value 3 as the argument for the parameter x, and on the 5th line, we are passing the value 10 instead. We could say something like "When 3 is passed to f, the result is 4."

Note that there is also a pass statement which is a statement that does nothing. It's useful when Python requires a statement but you don't want the code to actually do anything.

Return

Generate a result value from a function.

The result gets sent back to where the function was called so that evaluation can proceed there, since during evaluation we need to replace function calls with their results. We use the return statement to set what the result will be: the expression after the word return gets evaluated and the resulting value becomes the result of the function.

If there is no expression after return, then the value None is used as the result value, and this also happens if we reach the end of a function's body without executing a return statement.

If we do execute a return statement, we immediately return to the code that called the function, destroying the function call frame whose result we now know. This means that any code in a function's body that happens after a return statement is dead code that will never be executed. This also means that using return will stop progress through a chained conditional statement or a loop.
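
The two behaviors described above (stopping a loop early, and getting None when no return statement is executed) can be seen in a short sketch. The function name first_negative is hypothetical and used only for illustration:

```python
def first_negative(numbers):
    """Returns the first negative number in numbers, if there is one."""
    for n in numbers:
        if n < 0:
            return n  # leaves the loop and the function immediately
    # Reaching the end of the body without executing a return
    # statement means the result is None.

found = first_negative([4, -2, -7])
missing = first_negative([1, 2, 3])
print(found)    # prints -2
print(missing)  # prints None
```

Note that -7 is never examined in the first call, because the return statement ends the function as soon as -2 is found.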

Official documentation for return statements.

Booleans and Conditionals

Boolean

A type of value which is either True or False, with no other possibilities.

Boolean values are used to represent the outcome of tests or decisions, so for example a comparison such as 3 < 4 simplifies to True. Similarly, after an assignment like bigger = x > y, the variable bigger will hold either True or False as its value, depending on what the values of x and y were when evaluating that assignment.

Branch

A block of code associated with a particular conditional.

Particularly when multiple if/elif/else statements are chained together, each of the associated blocks is sometimes referred to as a "branch" or "case." For example, in this code:

if n > 5:
    print(2 * n)
else:
    print(n - 1)

...we might refer to the first print as being the "n > 5 branch" or the "greater than five case" or the like.

Case

Can be used as a synonym for branch.

Case can also refer to a particular set of arguments or other specific program state in which a program produces a certain output or crashes due to a certain error. For example, you might hear things like "In the case where x = 2..."

Chained

A situation where two or more if/elif/else branches of a conditional statement are placed next to each other such that control flow for the later branch depends on the earlier branch. For example, an elif placed after an if can only trigger if the condition of the if is falsey. Chaining is limited by certain rules, for example, an if never chains with the block above it.

See conditional for full details.

Condition

An expression which is used as part of a conditional statement to determine what will happen.

Conditions are also found in other kinds of statements, like while loops.

Typically, a condition should simplify to a boolean value, but if it doesn't, the decision that depends on the condition will be made based on whether the value is truthy or falsey.

Conditional

A statement which uses one of the keywords if, elif, or else to control the flow of execution within a program.

Normally, lines of code are executed in order from top to bottom, perhaps jumping around to figure out the results of function calls. When if, elif, and/or else come into the picture, certain blocks of code may be skipped depending on the values of condition expressions. For example, with if, when the simplified value of the condition is truthy, the block of code after the if statement gets run normally, but if the value of the condition is not truthy, that block of code gets skipped.

if, elif, and else statements at the same level of indentation form connected groups: if statements chain with elif or else statements directly below them, elif statements chain with if or elif statements above them and with elif or else statements below them, and else statements chain with if or elif statements above them. The connected statements form a chained conditional. Note that if statements do NOT chain with statements above them, nor do else statements chain with statements below them, and any statement other than if, elif or else at the same indentation level will break a chain. For example, the following code contains 5 conditional statements grouped into 3 chained conditionals:

if A:
    print('A')
if B:
    print('B')
elif C:
    print('C')
if D:
    print('D')
else:
    print('E')

The first if is alone, because the if below it cannot connect with anything above it. That second if does connect with the elif below it, but that elif cannot connect to an if below itself. That third if connects to just the else below it.

The rules of control flow for a chained conditional are:

Check the first condition on the initial if (every group begins with an if). If it's truthy, run that first block of code and skip all the rest.
If the condition of the first statement was not truthy, repeat this process starting from the second statement in the chain, again skipping all the rest if that statement's condition is truthy.
If you get to an else without executing any block of code yet, run the else block (and then you're done with that chain, since else can only appear last in a chain).
If you get to some other code that's not an elif or else at the same indentation level as the conditionals in the chain, then the chained conditional is over and you resume running that code normally. This includes the case where there's an if on the line below the final block of a chained conditional, since if will NOT connect to conditionals above it.

As an example, if we look at the code above and assume that A and C are both True while B and D are False, it would check A, then print 'A' as a result of that check, then it would check B, and since B is False, it would skip printing 'B' and check C, whose value would cause it to print 'C'. Then it would check D, and since that's not True, it would run the else block (which only depends on the check of D, not any of the other checks) and it would print 'E'.
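
This walkthrough can be checked by running the example with concrete values. In this sketch, A, B, C, and D are ordinary variables set to the assumed values, and the string named printed is our own addition that records which letters were printed:

```python
A = True
B = False
C = True
D = False

printed = ''  # records each letter as it is printed

if A:
    print('A')
    printed = printed + 'A'
if B:
    print('B')
    printed = printed + 'B'
elif C:
    print('C')
    printed = printed + 'C'
if D:
    print('D')
    printed = printed + 'D'
else:
    print('E')
    printed = printed + 'E'

print(printed)  # prints ACE
```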

Control Flow

The order in which lines of code are executed when running a program. A "control flow statement" is a statement which alters the normal flow of control.

Normally, after executing one statement, Python moves down to the next statement in the code ("normal control flow"). However, special statements or expressions like function calls or conditionals can cause something different to happen.

Edge Case

A specific set of circumstances (case in the second sense) which is either unlikely to happen, or which despite being likely, causes weird behavior or represents an odd configuration of inputs.

For example, pages of a book are not usually given negative numbers, so a negative number as input to a function which expects a book page as the argument might be an edge case. It often takes extra effort to anticipate and correctly handle edge cases, so errors happen more frequently in edge cases.

Equivalent

A situation where two values are similar enough to count as the same value when using the == operator, either because they are actually the same, or because they are different values but their structures/contents are effectively the same.

Contrast == with the is operator, which ignores equivalence.

When using the == operator, the result is True if the two values being compared are equivalent, and False if not (the result is always a boolean). Some examples:

3 == 3 - Values are the same, so are equivalent. Evaluates to True.
3 == 3.0 - Values are not identical, but are equivalent. One is an integer and another is a floating point number but they represent the same underlying number, so they count as equivalent. This evaluates to True.
3 == 3.1 - Values are neither identical nor equivalent. This evaluates to False.
1 == True - Values are different types, but are considered equivalent. This evaluates to True. 0 == False also evaluates to True.
2 == True - Values are different types, and are NOT considered equivalent, even though both are truthy. This evaluates to False.
[1, 2] == [1, 2] - Values are two different objects, but their contents are equivalent, so they are considered equivalent. This evaluates to True.
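
To make the contrast between == and is concrete, here is a small sketch. The variable names a, b, and c are arbitrary choices for illustration:

```python
a = [1, 2]
b = [1, 2]   # a second list with the same contents
c = a        # another name for the first list

same_contents = (a == b)
same_object = (a is b)
alias_check = (a is c)

print(same_contents)  # prints True (equivalent contents)
print(same_object)    # prints False (two different objects)
print(alias_check)    # prints True (one object with two names)
print(3 == 3.0)       # prints True
print(2 == True)      # prints False
```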

Mutually Exclusive

A situation where different parts of the code are set up such that only one of a group can be run.

Usually a conditional is used to achieve this. When an if statement is followed by one or more elif statements, the blocks of code are mutually exclusive, because if and elif statements force Python to skip the subsequent elif and/or else blocks once a condition above them is truthy. Note that using several if statements in a row does NOT make those blocks mutually exclusive, since the conditions of subsequent ifs will be evaluated whether or not earlier conditions were truthy, so it's possible to have multiple blocks of code run if multiple conditions are truthy.
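
The difference between a chained conditional and several separate if statements can be sketched with two small functions. The names chained and separate, and the letters they collect, are hypothetical and chosen just for this illustration:

```python
def chained(x):
    """Uses if/elif, so at most one block can run."""
    letters = ''
    if x > 10:
        letters = letters + 'A'
    elif x > 5:
        letters = letters + 'B'
    return letters

def separate(x):
    """Uses two separate ifs, so both blocks can run."""
    letters = ''
    if x > 10:
        letters = letters + 'A'
    if x > 5:
        letters = letters + 'B'
    return letters

print(chained(20))   # prints A
print(separate(20))  # prints AB
```

With the argument 20 both conditions are true, but only the separate version runs both blocks.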

The use of return statements (which stop execution of further statements in the same function) can also be one way to achieve mutual exclusivity, as can careful structuring of multiple conditions (even in cases where code isn't sequential). For example, in the following code, the print('A') and print('B') are mutually exclusive, because it's impossible for the conditions of their respective ifs to both be satisfied.


x = 5
if x > 10:
    print('A')

print('middle')

if x < 10:
    print('B')

Predicate

A function which always returns a boolean.

A predicate can have any number of parameters, but usually has at least one. Calling a predicate gives a true/false answer about its argument(s), so it can be thought of as a procedure that provides the answer to some question about its argument(s). For example, we might define:

def isEven(num):
    return num % 2 == 0

This function is named "isEven" because it answers the question "is this number even?" and if the number is even, the result will be True while if the number is not even, the result will be False. Predicates are often used in conditions.

Truthy/Falsey

A property of all values which determines what happens when they're the result of a condition.

Values which count as "truthy" cause the code that included the condition to do one thing (e.g., execute a block of code below an if) and values which count as "falsey" cause something else to happen (e.g., skip that block). Normally, we should use boolean values in these situations, where True is truthy and False is falsey. However, every value counts as either "truthy" or "falsey:"

For integers and floating point numbers the number 0 (or 0.0) counts as falsey, and all other numbers (including negative numbers) count as truthy.
For strings, the empty string '' counts as falsey and all other strings count as truthy (the empty string has no characters between the quotation marks, not even a space, so its length is 0).
The special value None counts as falsey.
Most other special values (like functions) always count as truthy.

Some truthy values count as equivalent to True, and some falsey values count as equivalent to False, but these equivalences don't hold for all truthy/falsey values (for example, 2 is truthy, but 2 == True simplifies to False so it's not equivalent to True).
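
One way to check whether a value is truthy or falsey is to use it directly as a condition. The helper function describe below is a hypothetical name used only for this sketch:

```python
def describe(value):
    """Returns 'truthy' or 'falsey' based on how value acts in an if."""
    if value:
        return 'truthy'
    else:
        return 'falsey'

print(describe(0))        # prints falsey
print(describe(-3))       # prints truthy
print(describe(''))       # prints falsey
print(describe(' '))      # prints truthy (a space is a character)
print(describe('False'))  # prints truthy (a non-empty string)
print(describe(None))     # prints falsey
print(describe(print))    # prints truthy (a function)
```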

Sequences and Loops

Collection

A group of individual values.

Unlike a sequence a collection does not necessarily have an order (or that order is incidental). Values in a collection are often called elements.

Construct (noun)

The general term for different kinds of statements. Refers not just to the line of code where the statement appears, but to all lines of code controlled by that statement too, and can be used to refer to a combination of statements in a common pattern.

For example, both conditionals and loops are code constructs. We might also talk about an if/elif/else construct composed of multiple statements.

Element

One value that's part of a larger collection.

Since sequences are a kind of collection, this also applies to values in a sequence, where the elements may be identified by their indices.

For Loop

A loop using the for keyword which iterates its loop body once for each element in a specific collection.

Often used with strings, lists, ranges, and other ordered sequences, as it will go through items in order one at a time. For example:

for x in [1, 2, 3]:
    print(x)

In this for loop, x is the loop variable and [1, 2, 3] is the loop sequence. Python will run the print line three times with 1, 2, and then 3 as the values of x during each run respectively.

For loops get their name from the fact that they iterate "for each" element in their loop sequence.

Index

An integer indicating a position within a sequence.

In Python, these numbers start at 0 for the 1st position, 1 for the second position, and so on. Python in most situations also allows for a negative number to refer to one of the last items in a sequence, so -1 refers to the last item, -2 to the second-to-last, and so on.

Quite often when dealing with sequences, we will use an index variable along with an indexing operation to read and/or update values in the sequence.

Indexing

The act of pulling a specific item out of a sequence using an index.

Python uses square brackets for this, written directly after the value (or expression) that we want to pull an item out of. For example:

index = 0
nums = [1, 2, 3]
one = nums[index]

In this code, the square brackets on the second line are NOT an indexing operation, but instead define a new list. Then the square brackets on the third line index that list, pulling out the first item.
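
Negative indices work with the same square-bracket syntax. A brief sketch, with variable names chosen only for illustration:

```python
letters = ['a', 'b', 'c', 'd']
first = letters[0]            # index 0 is the first item
last = letters[-1]            # index -1 is the last item
second_to_last = letters[-2]  # index -2 is the second-to-last item

print(first, last, second_to_last)  # prints a d c
```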

Iteration

The process of running some code repeatedly. Also refers to a single repetition within that process.

For example, "solve this problem using iteration" is the first sense, while "what is x on the first iteration of this loop?" is the second sense.

An iteration table can be used to show the values of key variables on different iterations of a loop, with one row of the table per iteration of the loop.

The verb iterate refers to this process.

Iteration Table

A written table which shows the values of key variables on different iterations of a loop, with one row of the table per iteration of the loop.
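
Iteration tables are normally written by hand, but as a rough sketch, a loop can print its own table. The column choices here (iteration number, x, and total) are just one possible layout:

```python
total = 0
iteration = 0
print('iteration | x | total')
for x in [1, 2, 3]:
    iteration = iteration + 1
    total = total + x
    # One row per iteration, showing values at the end of that iteration
    print(iteration, '|', x, '|', total)

# Output:
# iteration | x | total
# 1 | 1 | 1
# 2 | 2 | 3
# 3 | 3 | 6
```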

List

A mutable data type that holds multiple values in a fixed order.

Lists are a kind of sequence. Each entry in the list holds a single value, just like a variable can, and just like a variable, it can hold any type of value. Often, however, we use a list where each entry has the same type. Lists are defined using square brackets, with commas to separate the values. You can use expressions in the brackets and Python will simplify those expressions and store the resulting values in the list.

Some examples:

[1, 2, 3] is a simple list where each of the 3 elements is an integer.
['ab', 'c', 'hi'] is a list of strings.
[1, True, -4.5, 'st'] is a list that has multiple different types of values.
[[1, 2], [3, 4]] is a list with two values, each of which is itself a list (of two values). We call this a nested list. The inner lists are sometimes called "sublists."

Values in lists can be accessed via indexing which uses square brackets that appear immediately after another expression (often a variable reference). For example:

nums = [1, 2, 3]  # creates a list
nums[0] = 7       # uses indexing to modify an element in the list
print(nums[1])    # uses indexing to access an element in the list

In this code, nums[0] = ... changes the value of the first element of the list, while nums[1] refers to the value of the second element (without changing it since there's no equals sign).

In contrast to strings (which are also sequences), lists support several methods that can change their contents, including append, insert, and pop.
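
A short sketch of the three methods named above, showing how each one changes the list in place:

```python
nums = [1, 2, 3]
nums.append(4)        # adds to the end: [1, 2, 3, 4]
nums.insert(0, 10)    # inserts 10 at index 0: [10, 1, 2, 3, 4]
removed = nums.pop()  # removes and returns the last element

print(nums)     # prints [10, 1, 2, 3]
print(removed)  # prints 4
```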

Since each list entry is like a variable, in a memory diagram each list entry gets its own arrow (or holds a simple value like an integer or Boolean).

Loop

A code construct that allows you to repeat a block of code.

The two types of loops in Python are for loops and while loops. Both types have a loop body which is a block of code that gets repeated.

Loop Condition

In a while loop the repetition is controlled by a loop condition which appears after the word while and before the colon in the while statement at the top of the loop. Before the first and each subsequent iteration, the condition is simplified, and if it is truthy then the iteration proceeds, but if it's not, the loop stops.

Because the loop continues to iterate until the beginning of a new iteration when the condition is false, it can also be called a continuation condition.

Note that the condition is only checked at the beginning of each iteration. If the condition becomes false during the execution of the body of the loop, the loop will still continue to the end of that iteration (and if the condition becomes true again before the end of that iteration, it will continue to the next iteration as well).
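
The following sketch is a deliberately contrived loop in which the condition x < 3 becomes false in the middle of every iteration and then becomes true again before the next check. The list named log is our own addition to record what happens:

```python
x = 0
log = []
while x < 3:
    x = x + 5      # x < 3 is now false...
    log.append(x)  # ...but the rest of the body still runs
    x = x - 4      # x may satisfy x < 3 again before the next check

print(log)  # prints [5, 6, 7]
print(x)    # prints 3
```

The loop only stops when the condition is false at the moment it is checked, which here happens once x reaches 3 at the end of the third iteration.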

Loop Sequence

Loop Variable

In a for loop, the loop is controlled by a loop sequence and the elements of that sequence are stored, one by one, in a loop variable.

Method

A function that's associated with a particular data type and which is used by writing a dot after a value of that type followed by the method name and then parentheses where arguments may be supplied.

For example, in this code:

x = 'HEllo'.upper()
y = x.lower()

there are two methods being used: the .upper and .lower methods for the string type. These methods are associated with the string type because after simplifying, the value which appears before the dot in each case is a string. Neither accepts any arguments, but the parentheses are still required in order to call them. The whole expression before the dot, plus the method and its arguments, will simplify to the result of the method (established via return as usual for a function). The upper method returns an upper-case version of the string it's applied to, while lower returns a lower-case version, so in this code, x will have the value 'HELLO' while y will have the value 'hello'.

Memory Diagram

See above.

When diagramming a list, m

Mutable

Immutable

Every Python value is either mutable or immutable. A mutable value is one that can be changed; an immutable value cannot be changed. This concept is very slippery, however, because a variable can always be assigned a new value. So variables are always able to change, even when the value inside them may not be.

The distinguishing feature of a mutable value is thus that it's possible to change the value without needing to create a new variable or change the value of a variable. For example, a list is a mutable value, and we can append to it or pop from it which changes it, without changing any variables, as in this code:

orig = [1, 2, 3]
nums = orig      # an alias of the original
nums.append(4)   # value of 'nums' has changed, without writing "nums ="
                 # value of 'orig' has also changed since both variables
                 # store the same value

In contrast, a tuple is an immutable value, so there is no way to change it: we cannot append or pop, and to "add to" a tuple, we need to construct an entirely new, longer tuple based on the original one. We could do:

orig = (1, 2)
nums = orig           # an alias of the original
nums = nums + (3, 4)  # 'nums' gets a new value based on old value, but
                      # the value of 'orig' is not changed

This means that when dealing with a mutable type, it's possible to use any alias to change the value, but for an immutable type, even if an alias is created, there's no way to use that alias to change the value stored in another variable, and this limitation is one of the primary advantages of immutable types because it prevents certain types of subtle errors.

In Python, tuples and strings are immutable, along with integers and floating-point numbers. Other data types like lists and dictionaries are mutable.

Most immutable types are hashable whereas most mutable types are not.

One final wrinkle: for immutable types that can contain other values, like tuples, it's possible to store a mutable value inside them, such that the entire value cannot be changed at the top level, but parts within it can be. In this case, the entire value will no longer be hashable.
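
This wrinkle can be sketched as follows. The example uses try and except (which may not have been covered yet) only so that the code keeps running after each error; the variable names are hypothetical:

```python
pair = ([1, 2], 'label')  # an immutable tuple holding a mutable list
pair[0].append(3)         # allowed: changes the list inside the tuple
print(pair)               # prints ([1, 2, 3], 'label')

try:
    pair[0] = [9]         # not allowed: items in a tuple cannot be replaced
    replaced = True
except TypeError:
    replaced = False

try:
    hash(pair)            # fails because the list inside is not hashable
    hashable = True
except TypeError:
    hashable = False

print(replaced, hashable)  # prints False False
```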

Nested

Two things are nested if one is inside the other.

This can apply to data as well as code. For example, one conditional could be nested inside another, or one list could be nested inside another list, or a conditional could be nested inside of a loop.

The term is usually used when two of the same thing appear, one inside the other, but that's not necessary for it to apply.
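
As a small sketch that combines several kinds of nesting (the variable names are chosen only for illustration), the code below uses a nested list, a loop nested inside a loop, and a conditional nested inside the inner loop:

```python
grid = [[1, -2], [-3, 4]]  # a list nested inside another list
positives = 0
for row in grid:           # outer loop
    for value in row:      # loop nested inside the outer loop
        if value > 0:      # conditional nested inside the inner loop
            positives = positives + 1

print(positives)  # prints 2
```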

An example of a nested conditional:


if x 

Range

A sequence of integers constructed using the built-in range function.

Sequence

Any kind of value that can be viewed as a series of individual parts in some order. Strings, lists, and tuples are all sequences.

Indexing requires a sequence to operate on, but works the same way on any kind of sequence. Different kinds of sequences are composed of different individual elements: tuples and lists can have any kind of value stored in each spot, while the parts of a string are themselves single-letter strings.

Every sequence is a collection but not every collection is a sequence (unordered collections are not sequences).
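
To illustrate that indexing works the same way on different kinds of sequences, here is a brief sketch using a string, a list, a tuple, and a range (the variable names are arbitrary):

```python
text = 'abc'
items = ['a', 'b', 'c']
triple = ('a', 'b', 'c')
numbers = range(3)  # the integers 0, 1, and 2

print(text[1], items[1], triple[1], numbers[1])  # prints b b b 1
print(len(text), len(items), len(triple), len(numbers))  # prints 3 3 3 3
```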

String

See above.

In Python, a string is a kind of sequence and so it can be indexed. In a somewhat weird twist, the elements of a string are also strings: they're single-letter strings. For example:

var = 'word'
print(var[2])  # prints 'r' (counts from 0)
oneLetter = var[2]
print(oneLetter)  # also prints 'r'
print(oneLetter[0])  # also prints 'r'
print(oneLetter[0][0][0][0][0])  # also prints 'r'

Since after you extract a character from a string, it still counts as a string, you can technically extract the "first letter" as many times as you want and you'll still have the same string.

Python strings use an encoding called Unicode, so each element can represent any character that's part of the Unicode standard, including symbols from many different languages as well as things like mathematical symbols and emojis. However, Python strings don't contain formatting information, so you can't specify a font or make them bold or italic.

Tuple

An immutable data type that holds multiple values in a fixed order.

Works a lot like a list, except that you cannot add, remove, or change the values in it. A tuple is defined using parentheses with commas to separate the values, or in some cases, commas alone can be used. Some examples:

(1, 2, 3) is a tuple with three integers in it.
1, 2, 3 also defines the same tuple.
('hi', 'bye') is a tuple with two strings in it.
([1, 2, 3], [4, 5, 6]) is a tuple containing two lists. Because lists are mutable, this tuple won't be hashable.
() is a zero-element tuple.
(1,) is a single-element tuple. The comma is necessary, because otherwise the parentheses are treated as redundant grouping parentheses.
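
These cases can be checked with the built-in len and type functions. A short sketch, with variable names chosen only for illustration:

```python
empty = ()
single = (1,)
not_a_tuple = (1)  # just the integer 1 inside grouping parentheses
bare = 1, 2, 3     # commas alone also create a tuple

print(len(empty))         # prints 0
print(type(single))       # prints <class 'tuple'>
print(type(not_a_tuple))  # prints <class 'int'>
print(bare == (1, 2, 3))  # prints True
```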

While Loop

A loop using the while keyword which iterates its loop body repeatedly as long as its loop condition continues to be truthy.

The condition is only evaluated once before each iteration of the loop, not continuously as the loop body executes, so even if it becomes false during an iteration, that iteration will continue to the end of the loop body. For example:


x = 2
while x

Memory Model

Alias

Two variables are called aliases of one another when they both refer to the same object.

In this situation, they will always be equivalent, since they are actually identical. Using the id function on either variable will yield the same result.

Contrast with clones.

Clone

Copy

Two objects are clones of each other if they are equivalent (their contents are indistinguishable) but they are two separate objects. If we inspect their values by printing them, the situation may seem the same as with aliases. However, because they are really two different objects, they'll have different IDs, and if they are mutable and we modify one, they will no longer be equivalent.

ID

Identity

Each object has an identity which by default is just the memory location at which it is stored. The id function returns the identity of any object.

Because two expressions that refer to the same object will have the same ID, whereas two expressions that refer to different objects will have different IDs, the id function can be used to determine whether two equivalent expressions are clones or aliases.
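
A minimal sketch of using id to tell an alias from a clone follows. The variable names are hypothetical, and the actual ID numbers will differ from run to run, so the code only compares them:

```python
nums = [1, 2, 3]
alias = nums       # refers to the same object as nums
clone = [1, 2, 3]  # equivalent, but a separate object

same_as_alias = (id(nums) == id(alias))
same_as_clone = (id(nums) == id(clone))
print(same_as_alias)  # prints True
print(same_as_clone)  # prints False

alias.append(4)       # modifies the one object shared by nums and alias
print(nums)           # prints [1, 2, 3, 4]
print(clone)          # prints [1, 2, 3]
print(nums == clone)  # prints False
```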

Instance

One specific value, relative to a type.

Each object is an instance.

For example, if we write x = 1 then the value 1 which is stored in the variable x is an instance of the integer type. Note that the object/value is an instance, not the variable.

If we write:

x = 1
y = x

Then the two variables x and y are each holding the same instance. For mutable types like lists, the notion of an instance is more important, since you can have two equivalent instances which are not the same object. If we write:

nums = [1, 2, 3]
nums2 = [1, 2, 3]
nums3 = nums

then in this case we have two list instances, each of which contains the numbers 1, 2, and 3 in order. These two instances are equivalent to each other, so they count as clones. We also have three variables. Variable nums3 is an alias of nums (and vice versa).

Object

A value which is stored in memory along with extra bookkeeping information. In Python, all values are objects. Other programming languages make a distinction between objects and primitive types, but Python does not have this distinction. A class defines a new type which can then be used to construct new, custom objects.

If we say that two things are "the same object" then we mean that they are stored in the same place in memory and thus have the same ID. For mutable objects, since they are aliases of each other, any change made via one reference will also be visible via the other reference.

In contrast, if we say that two things are "the same value" then we mean that they are equivalent which does not distinguish whether they are clones or aliases of each other. For example:

x = [1, 2]
y = [1, 2]
z = x

In this example we have two list objects, each with two slots containing integers. Because of how Python handles integers, there are actually only two integer objects, which are shared between the two lists. Since integers are not mutable, this doesn't matter much. We can say that x and y have "the same value" but they are not "the same object." On the other hand, x and z are "the same object" and are aliases of each other. x and z of course also have "the same value."

Refer

Reference

A variable refers to the object that it contains.

When we say that a variable "holds" or "contains" a particular object, those are other ways of saying it refers to that object.

Variable reference has a different, unrelated meaning: each variable reference is one place in your code that the name of a variable is written. Because there can be both local and non-local variables with the same name, the same written letters in code can refer to different variables depending on their context.

Variable

See above.

Variables actually refer to objects: specific places in memory where values are stored. This distinction is important in situations where cloning occurs: two different variables can refer to two different equivalent objects, so that they have "the same value" but refer to different objects.

Dictionaries and Sets

Collection

See above.

The keys and values of a dictionary are collections. The items in a set are a collection.

Hash

Hashable

A hash is an integer that is computed based on some more complicated value such that you get the same integer every time for the same value, but different values are likely to yield different integers. Python's built-in hash function can give you a hash from any hashable value.

A value is hashable if a hash can be created from it. Values that are mutable are usually not hashable, because if we got a hash and then changed the value, the hash would be invalid. To avoid that situation, Python disallows the use of the hash function on mutable values, making them unhashable. Even if a value has an immutable type, it may still not be hashable if it contains another value that is mutable. So a tuple containing integers or strings would be hashable, since those types are immutable (and thus hashable themselves) but a tuple that contained a list would not be hashable since lists are mutable and not hashable. Technically, it's possible to create a custom type that's both mutable and hashable, but none of Python's built-in types are like that.

Hash values are used to speed up comparisons between complex values. If two values have different hashes, we know they cannot be the same as each other. If they have the same hash, then we need to double-check that they're really equivalent, since it's possible for two different values to have the same hash, but since this is unlikely, we can save a lot of time by using hashes for comparisons. This idea is part of the reason that dictionary lookup can be so fast.
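As a rough illustration of these rules, the sketch below tries hash on a few values. The helper isHashable is a hypothetical function written for this example, not something built into Python.

```python
def isHashable(value):
    """Hypothetical helper: reports whether hash() accepts the value."""
    try:
        hash(value)
        return True
    except TypeError:  # raised by hash() for unhashable values
        return False

print(hash('hello') == hash('hello'))  # True: same value gives the same hash
print(isHashable((1, 'a')))    # True: tuple of immutable values
print(isHashable([1, 2]))      # False: lists are mutable
print(isHashable((1, [2])))    # False: the tuple contains a list
```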

Key

A hashable value used in a dictionary to keep track of an associated value so that we can later look up the associated value using the key.

For example, if we wanted to store a record of how many points were scored by different players on a sports team, we could use players' names as keys and integers as the associated values.

Anything used as a key in a dictionary must be hashable.

Lookup

The process of using one piece of information (called a key) to find an associated piece of information (called that key's value). Can also refer to searching for a value that matches some criteria within a collection of values.

Dictionary

A mutable data type that can associate keys with values, and quickly look up a value based on a key.

When keeping track of associations, a dictionary is more efficient than a list, because to look up something in a list requires iterating through it.

The keys used in a dictionary must be hashable, so tuples are often used when complex keys are needed. A dictionary can only associate a single key with one value at a time. However, it's possible to store any kind of value under a key, so to store "multiple values" we can store a single list and then put more than one value inside that list.

Dictionaries are declared using curly braces, with colons separating keys from values and commas separating key-value pairs from each other.

Some examples:

scores = {'abc': 3, 'xyz': 4, 'tlmn': 12}  # string keys, integer values
mixed = {1: 'hi', 's': 4.5, (4, 5): ['one', 'two'] }
  # This dictionary has lots of different kinds of keys and values

abcScore = scores['abc']  # look up value for an existing key; value is 3
scores['newkey'] = 5      # Brackets on left of = set value for a new key
scores['abc'] = 8         # Same syntax to update existing key's value.

See this quickref section on dictionary methods for things you can do with a dictionary besides using square brackets to add, change, or look up values.
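Two ideas from this entry, tuples as complex keys and a list as a way to store "multiple values" under one key, might look like the following sketch. The board and player data are made up for illustration.

```python
# Tuple keys: (row, column) positions on a hypothetical game board
board = {(0, 0): 'X', (1, 1): 'O'}
board[(0, 2)] = 'X'

# One list per key lets a single key hold several numbers
pointsByPlayer = {'Ana': [3, 7]}
pointsByPlayer['Ana'].append(2)   # add to the list already stored under 'Ana'
pointsByPlayer['Bea'] = [5]       # new key with its own list

centerPiece = board[(1, 1)]
anaTotal = sum(pointsByPlayer['Ana'])
print(centerPiece, anaTotal)      # O 12
```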

Difference

Symmetric Difference

How much two things differ, or how far it is between them. For example, the difference between 5 and 3 is 2.

When used with sets, there are two kinds: normal set difference and symmetric difference. Set difference refers to all elements that are in one set but not in another. This notion isn't symmetric, since the set difference between A and B doesn't include elements in B that aren't in A, and vice versa. Symmetric difference of sets is all elements that are in either set but not in both, and as its name indicates, it's symmetric.

The - operator is used to compute differences for both sets and other types; the ^ operator is used for symmetric set differences.

For example, if we have sets A = {1, 2, 3} and B = {2, 3, 4} then the difference A - B is the set {1}, the difference B - A is the set {4}, and the symmetric difference A ^ B (equivalent to B ^ A) is the set {1, 4}.

Intersection

The overlap between two sets. This is a set that has all of the elements that appear in common between two sets, but which does not include any elements that are only in one of the two. The & operator is used to get the intersection of two sets.

For example, if we have sets A = {1, 2, 3} and B = {2, 3, 4} then the intersection A & B is the set {2, 3}.

Contrast with union and difference.

Set

A collection of hashable values.

Unlike a list, the values in a set are not ordered, so a set is NOT a sequence, and you cannot index it to retrieve an element. Sets are useful in cases where order doesn't matter and the in operator is used, as they can return an answer faster than a list. Sets cannot have repeated elements: adding the same element to a set more than once has no effect.

A set is constructed using curly braces, without the use of colons to separate keys and values (which would be a dictionary instead). For example:

nums = { 1, 2, 3 }
nums.add(4)         # add new element
nums.add(3)         # already in set; no effect
print(4 in nums)    # True
print(5 in nums)    # False
empty = set()       # Can't use '{}' for an empty set

Sets can be combined using a number of methods/operators like |, & and -.
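As a quick sketch, here are the main set operators next to the equivalent method names, using the same sets A and B that appear in the other set entries on this page:

```python
A = {1, 2, 3}
B = {2, 3, 4}

union = A | B          # same as A.union(B)
intersection = A & B   # same as A.intersection(B)
difference = A - B     # same as A.difference(B)
symmetric = A ^ B      # same as A.symmetric_difference(B)

print(union, intersection, difference, symmetric)
print(A)               # the operators build new sets; A itself is unchanged
```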

Union

The combination of two sets. This is a set that has all of the elements that appear in either of two sets. The | operator is used to get the union of two sets.

For example, if we have sets A = {1, 2, 3} and B = {2, 3, 4} then the union A | B is the set {1, 2, 3, 4}.

Contrast with intersection and difference.

Value

See above.

We also use the term "value" to refer to an item stored in a dictionary associated with a particular key. Since those items are themselves values in the original sense, this isn't that confusing, although technically, the keys of a dictionary are themselves also values in the original sense, which does make the overlap tricky to think about.

Files and Directories

Absolute Path

A path that starts from the root of the file system.

For example, on Windows, "C:\Users" is an absolute path because it starts with "C:" which identifies the drive to look in. On MacOS, "/home" would be an absolute path, because it starts with / which is the root of the file system.

Ancestor

A node in a tree which is somewhere above another node, called its descendant.

Child

A node in a tree which is a direct descendant of another node, called its parent.

In a tree, each node may have zero or more children, but each child node has exactly one parent.

Since a file system is a kind of tree, this term (or more explicitly child directory) may be used to refer to a subdirectory.

Close

Ends a file object's access to a file, forcing any pending data for that file to be written.

If a file has not been closed yet, pending data may not yet be in the file, and if the program crashes, this may lead to file corruption, so it's important to eventually close every file you open. Using a with statement can ensure that this will happen automatically.

Context Manager

An object that manages the context of a with statement.

Python will automatically call certain methods of this object when exiting the with block. For example, when a file object is used as a context manager for a with block, it will automatically be closed when control flow exits that block.
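One way to see this behavior is to check a file object's closed attribute inside and after a with block. This is only a sketch: it uses the built-in tempfile module to make a scratch file, and the file name is arbitrary.

```python
import os
import tempfile

# Scratch file so the example does not depend on any existing files
path = os.path.join(tempfile.mkdtemp(), 'notes.txt')

with open(path, 'w') as fileWriter:
    fileWriter.write('hello')
    closedInside = fileWriter.closed   # still open inside the block

closedAfter = fileWriter.closed        # closed automatically on exit

try:
    with open(path, 'r') as fileReader:
        raise ValueError('simulated problem')  # leave the block via an exception
except ValueError:
    pass
closedAfterError = fileReader.closed   # closed even though an error occurred

print(closedInside, closedAfter, closedAfterError)  # False True True
```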

Current Directory

Current Working Directory

The directory that Python is working in right now.

By default, any files you create will be created in this directory, and if you attempt to import any modules or open any existing files, they must be in this directory in order to be loaded (although built-in modules and modules installed via the package manager can be imported from elsewhere).

The os.getcwd function returns the path to this directory. At the start of a path string, a single period refers to this directory, so for example the path "./a/b.txt" refers to a file named "b.txt" that's in a subdirectory named "a" which is in the current directory.
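A short sketch of how these pieces fit together, assuming the hypothetical relative path "./a/b.txt" from the paragraph above (the file doesn't need to exist for this to run, and the printed paths depend on where you run the code):

```python
import os

currentDir = os.getcwd()                  # path to the current directory
fullPath = os.path.abspath('./a/b.txt')   # '.' is replaced by the current directory

print(currentDir)
print(fullPath)
print(fullPath == os.path.join(currentDir, 'a', 'b.txt'))
```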

Data

Aggregate term for information, especially when it's stored in a computer.

Python objects/values are highly structured information that we can act on using operators and functions to produce other values, but the term data refers to any kind of information. When loading data from a file, Python can't break it down into objects automatically, so it gets loaded as one big string instead. If you know the format of the data, you can write code that will break apart the string into objects, or in some cases, like with CSV or JSON data, Python has built-in functions that can do that for you.

Descendant

A node that's below another node in a tree.

A direct descendant (also called a child) is directly connected to its parent node; an indirect descendant is connected via at least one other intermediary node.

If node A is a descendant of node B, then we can also say that node B is an ancestor of node A.

Directory

Folder

A named collection of files and/or sub-directories which is part of the file tree.

This is an operating system concept, not a Python type. Each file system has a root directory, and then because directories can contain other directories, the file system forms a tree structure.

File

See above.

Files are stored in a file system and organized within directories into a file tree.

File Input/Output

Operations that read from or write to one or more files.

The built-in open function is used to create a file object in either "read mode" or "write mode" that can then be used to either read or write data from/to a file. Unless using a special parser, data read from a file is read as a string, and data written to a file must be a string. We can therefore think of what's stored in a file as one long string, to be re-interpreted by a parser (and/or constructed by a formatter) if necessary.
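The sketch below writes two numbers to a scratch file and reads them back, to show that everything passes through strings on the way. The file name and the one-number-per-line format are assumptions made up for this example.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'scores.txt')  # hypothetical scratch file

fileObject = open(path, 'w')        # write mode
fileObject.write(str(12) + '\n')    # numbers must be converted to strings first
fileObject.write(str(30) + '\n')
fileObject.close()

fileObject = open(path, 'r')        # read mode
contents = fileObject.read()        # one long string: '12\n30\n'
fileObject.close()

# A tiny hand-written "parser": split into lines, convert each to an integer
scores = [int(line) for line in contents.splitlines()]
print(repr(contents), scores)
```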

File Object

A special object returned by the open function which can be used to read from or write to a file.

A file object is not synonymous with the file that it reads from/writes to. For example, one can have multiple file objects that read from or write to the same file at once. File objects provide methods like .read and .write which cause data to be read from or written to the underlying file. Note that data may not actually be written to the file until the file object is closed using the .close method; this is called "buffering." Code will typically do something like:

fileObject = open('myFile.txt', 'r')  # create file object in read mode
fileContentsString = fileObject.read()  # read from the file; store data in Python variable
fileObject.close()  # file object no longer needed
print(fileContentsString)  # use data read from file

File objects can be used with with statements to ensure they'll be closed properly when no longer needed.

File System

An organized collection of files stored in a computer, typically in a partition on a disk/drive.

Files in a file system are normally organized using directories into a file tree.

File Tree

A collection of files organized in directories starting from a root directory.

Each file belongs to a particular directory, and each directory (except the root directory) belongs in exactly one other directory, so the whole structure has the form of a tree, where the parent of each file is a directory, and each directory has a single parent directory until you get to the root directory.

This structure mirrors the organization of real-life paper files within folders in a filing cabinet, and by grouping files together conceptually, can help people remember where to find the files they're looking for. Folder names could refer to the type of files they contain (e.g., 'documents'), the origin of those files (e.g., 'downloads'), when/where those files were created (e.g., '2015_trip'), etc. At each layer of the tree, you only have to be able to pick the correct folder at that layer to find your file, so even if you don't remember everything about the file you're looking for, a file tree can help you find it by reminding you of the different sub-categories relevant to that file.

Leaf

Within a tree, a node that has no descendants.

Unless a tree consists of just a single node (which would be a leaf), each leaf in a tree has a parent node.

Node

An individual element within a larger collection, particularly one that's organized into a graph or tree.

Because files and directories are organized into a file tree, they both count as nodes.

In a tree, nodes are distinguished as either leaf nodes if they have no descendants or branch nodes if they have at least one descendant.

Open

The process of establishing a connection to a file so that data can either be read from or written to it.

The built-in open function does this and returns a file object that can be used to read or write data, depending on whether the file was opened in read mode or write mode.

Whenever a file is opened, it should eventually be closed. Using a with statement can ensure that this happens automatically.

Parent

A node in a tree which has at least one child node.

A node which isn't a parent is a leaf. A parent node B (of node C) may also be a child node (of another node A).

Since a file system is a kind of tree, this term (or more explicitly parent directory) may be used to refer to the directory that contains a particular file or sub-directory.

Path

File Path

A string which contains directory names separated by either / on Mac/Linux or \ on Windows, possibly with a file name at the end.

Used to specify a file or directory, either relative to the current directory, or as an absolute path starting from the root directory of the file system.

The special string '..' means "the parent of the current/previous directory" and can be used multiple times to refer to farther ancestor directories. So for example, the path:

../../homework/cs111/assignments.txt

refers to a file named "assignments.txt" which is located in a directory named "cs111" which is in the "homework" directory that's in the grandparent of the current directory.
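To see how '..' is resolved, the sketch below combines that relative path with a made-up current directory. It uses the built-in posixpath module so that / separators are used no matter which operating system runs the code; note that normpath only rearranges the text of the path and doesn't check whether the file exists.

```python
import posixpath  # path functions that always use Mac/Linux-style '/' separators

currentDir = '/Users/sam/projects/code'   # hypothetical current directory
relative = '../../homework/cs111/assignments.txt'

combined = posixpath.join(currentDir, relative)
resolved = posixpath.normpath(combined)   # collapses each '..' with the directory before it
print(resolved)  # /Users/sam/homework/cs111/assignments.txt
```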

Read (file operation)

Take data stored in a file and copy it into Python as a value that can be used within a program.

Since reading from a file makes a copy of the data, changing the variable where you store the result of the read does not change what's in the file.

In Python, the default mechanisms for reading, like the read method of a file object, automatically decode data stored in the file into a string, using a default encoding like UTF-8.

Relative Path

A path which does not start at the root of the file system. Any path which is not an absolute path is a relative path.
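Python can tell the two kinds of path apart with an isabs function. As a sketch, the example below uses the built-in posixpath and ntpath modules, which apply Mac/Linux and Windows rules respectively regardless of which system runs the code; the paths themselves are made up.

```python
import ntpath      # Windows-style path rules
import posixpath   # Mac/Linux-style path rules

print(posixpath.isabs('/home'))          # True: starts at the root
print(posixpath.isabs('a/b.txt'))        # False: relative to the current directory
print(posixpath.isabs('../notes.txt'))   # False: relative, starting from the parent
print(ntpath.isabs('C:\\Users'))         # True: starts with a drive and a backslash
print(ntpath.isabs('Documents\\a.txt'))  # False
```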

Root

Root Directory

The root of a tree is the only node in that tree which does not have a parent.

The outermost directory in a file system is the root of that file tree and so it is often called the root directory.

Subdirectory

Sub-Directory

A directory that is contained within another outer directory. Also sometimes called a child directory.

The directory that contains it is called a parent directory.

Tree

A graph in which each node has a single parent node, except for a single root node that doesn't have a parent. Since multiple nodes might share a parent, the whole thing forms a tree-like structure.

Usually drawn upside-down relative to real-life trees, with the root at the top and the leaves at the bottom.

with Statement

A statement using the with keyword that sets up a context manager to ensure that certain cleanup steps happen when leaving the associated block of code.

When a file object is used as a context manager, this ensures that the file will eventually be closed, even in cases where control flow leaves the block via a return statement or an exception.

The context manager is assigned to a variable as part of the with statement, so that it can be used within the statement body. For example:

with open('names.txt', 'r') as fileReader:
    for line in fileReader:
        print(line)

Write (file operation)

Take data stored in a Python value and copy it into a file, possibly overwriting old contents of the file.

Because of buffering (see file object), the data may not immediately be written into the file in some cases, so it's safest to close the file object you used to write the data before relying on the file's contents.

By default, Python requires you to use string data when writing to a file, and encodes that data using a default encoding like UTF-8.
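A small sketch of both points: opening an existing file in write mode replaces its old contents, and trying to write a non-string value fails with a TypeError. The scratch file name is arbitrary.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'log.txt')  # hypothetical scratch file

with open(path, 'w') as fileWriter:
    fileWriter.write('first version')

with open(path, 'w') as fileWriter:   # opening in write mode again...
    fileWriter.write('second')        # ...replaces the old contents

with open(path, 'r') as fileReader:
    contents = fileReader.read()

try:
    with open(path, 'w') as fileWriter:
        fileWriter.write(42)          # not a string
    wroteNumber = True
except TypeError:
    wroteNumber = False

print(contents, wroteNumber)          # second False
```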

Recursion

Base Case

Recursive Case

A base case is a case in a recursive function where no recursive call gets made.

A recursive case is a case where there is at least one recursive call.

Recursive functions typically start with a conditional which allows them to respond differently for different argument values. For some values, they will make a recursive call and thus revisit the conditional in another function call frame with different arguments, while for other values, they will not need to do this and can just do the necessary work immediately without recursion. The cases where recursive calls are needed are called recursive cases and the ones where they aren't needed are called base cases. If a function has only recursive cases, then it will always create new recursive function call frames, and eventually, Python will run out of memory for function call frames, resulting in a RecursionError. This can also happen even if there are one or more base cases, if the pattern of arguments established by the recursive cases does not ever reach any base case.

The code for the base case(s) is usually relatively simple, and it's often good to start a recursive definition by defining one or more base cases. Often, having more than one base case can prove helpful, although many problems only need one.

The case-by-case method for solving recursive problems begins by defining multiple base cases which help you see what pattern of recursive cases will be necessary, and then simplifying by removing unnecessary base cases once a suitable recursive case has been defined.
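As an illustration, here is a sketch of a recursive function with one base case and one recursive case. The function sumList is a hypothetical example, not something defined elsewhere in the course.

```python
def sumList(numbers):
    """Hypothetical example: adds up a list of numbers recursively."""
    if len(numbers) == 0:   # base case: nothing to add, no recursive call
        return 0
    else:                   # recursive case: first item plus the sum of the rest
        return numbers[0] + sumList(numbers[1:])

print(sumList([4, 5, 6]))   # 15
print(sumList([]))          # 0
```

Each recursive call gets a shorter list, so the pattern of arguments is guaranteed to reach the empty-list base case.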

Fractal

A self-similar structure, meaning that one or more smaller part(s) of the structure are reminiscent of the whole.

This self-similarity can persist at multiple (or even theoretically infinite) scales.

Many natural structures are fractal, for example trees: The entire tree is made up of a trunk that splits into branches, but those branches also split into smaller branches, which split into twigs, etc. A zoomed-in view of a small branch that splits into 3 twigs is similar to a zoomed out view of a whole tree that splits into 3 branches, even though they're not exactly the same.

When we call a recursive function, the resulting process is fractal, in the sense that parts of the process are reminiscent of the whole process. For this reason, recursive functions are often used to draw fractals.

Infinite Recursion

When a recursive function either does not have a base case, or it has a base case which is never reached, theoretically function call frames will keep generating more of themselves indefinitely.

Python actually has a limit on the number of function call frames allowed at once, and you will get a RecursionError if you exceed this limit, which any infinitely recursive function will do. Some programming languages are able to analyze the correct result of an infinitely recursive function in some cases, but Python does not support this, so we need to avoid infinite recursion when programming in Python.
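The sketch below shows the error in action with a deliberately broken function (countForever is a made-up name). The built-in sys.getrecursionlimit function reports the current limit on nested frames.

```python
import sys

def countForever(n):
    """Hypothetical function with no base case."""
    return countForever(n + 1)   # always makes another recursive call

try:
    countForever(0)
    hitLimit = False
except RecursionError:
    hitLimit = True

print(hitLimit)                  # True
print(sys.getrecursionlimit())   # the current limit (the exact number may vary)
```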

Recursion

Recursive

A repetitive process where one step in a larger process has the same general sequence as the larger process it is a part of.

In Python, recursion can be achieved by defining a custom function which includes a function call to itself. We call such function calls (which occur within the definition of the function being called) recursive calls, and the entire function is called a recursive function. Take the following count-down function as an example:

def countDown(n):
    if n == 0:  # base case
        print('Blast off!')
    else:  # recursive case
        print(n)
        countDown(n - 1)

Whenever this function is called, unless the argument is a 0, it will print whatever that argument is. But then after doing that, it will call itself again, with a smaller value of n. So if we call countDown(3) that will print 3, then it will call countDown(2), which will print 2, then call countDown(1), which will print 1, then call countDown(0), which will finally print "Blast off!" and finish without calling itself any more. The total effect is to print:

3
2
1
Blast off!

Note: The first branch of the if/else statement in this function is called a base case because it does not contain a recursive function call, while the second branch that does contain a recursive call is called a recursive case.

We could have achieved the same result via a loop, like this:

def countDownLoop(n):
    for i in range(n, 0, -1):
        print(i)
    print('Blast off!')

We can think of the process in two different ways. The iterative way of describing the process is: "To count down from N, print each number from N down to 1, and then print 'Blast off!'". The recursive way of describing the process is: "To count down from N, first print N, then count down from N - 1. If N is 0, instead print 'Blast off!'".

A recursive process is a kind of fractal since it is self-similar.

To avoid infinite recursion, we must ensure that every recursive function has a base case, and that the pattern of arguments generated by recursive calls will eventually reach that base case.

A call to a recursive function generates a nested series of function call frames, since the initial call's frame stays active while its recursive call is being resolved, and that one stays active while its recursive call is being resolved, and so on, until the frame which enters the base case. Once that frame returns, control flow returns to the penultimate frame, and so on back through the entire series of frames.
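One way to observe this nesting is to record a message when each frame starts and again just before it finishes. This is only a sketch, and the function name trace is made up for the example.

```python
events = []

def trace(n):
    events.append('enter ' + str(n))   # this frame has just started
    if n > 0:
        trace(n - 1)                   # this frame stays active while the call runs
    events.append('exit ' + str(n))    # reached only after the inner frames finish

trace(2)
print(events)
# ['enter 2', 'enter 1', 'enter 0', 'exit 0', 'exit 1', 'exit 2']
```

The frame for trace(0) is the last to start and the first to finish, and the original trace(2) frame is the last to finish.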

Data Formats

Bit

The smallest unit of computer memory and thus also data, holding a single 1 or 0.

On a disk, memory is usually organized into bytes, and it's usually not possible to read or write individual bits. Grouped together, bits can represent numbers, letters, or more complex values via an encoding scheme.

Byte

A group of eight bits.

On some historical systems bytes were sometimes made up of more or fewer bits, but eight has emerged as a nearly universal modern standard.

One byte is enough to encode a single letter in the extended ASCII encoding.
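As a small sketch of how bits, bytes, and letters relate, the code below encodes the letter 'A' using ASCII and shows the eight bits of the resulting byte:

```python
letter = 'A'
encoded = letter.encode('ascii')   # a byte string containing a single byte
number = encoded[0]                # the byte's value as an integer
bits = format(number, '08b')       # the same value written as eight bits

print(len(encoded), number, bits)  # 1 65 01000001
print(int(bits, 2))                # 65: the bits converted back into a number
print(2 ** 8)                      # 256 different values fit in one byte
```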

CSV

TSV

Comma-Separated Values

Tab-Separated Values

Comma-separated values (or CSV) is a file format for data organized into tables, in which entries that should be in different columns are separated by commas, while entries in different rows are on separate lines.

Because different entries may have different widths, you can't necessarily see things lined up within columns in a CSV file, but common programs like Microsoft Excel, Google Sheets, and Numbers on a Mac can all load CSV files and display their entries as tables.

In general, all entries are assumed to be strings, but some programs will load entries that can be converted into numbers as integers and/or floating-point numbers.

Interesting edge cases include:

How can we encode an entry that includes a comma as part of its text?
How can we encode an entry that includes a newline as part of its text?

In both cases, if an entry starts with a double-quote right after the previous comma (or at the start of a line) then that entry continues until the next double quote, even if that includes commas and/or newlines. This begets yet another question:

How can we encode a value which includes a comma, a newline, and a double-quote as part of its text?

To solve this problem, when an entry starts (and thus also ends) with a double-quote, then within that entry two double-quotes in a row will be interpreted as a single double quote that's part of that entry's text, rather than as the end of the entry.

CSV is a common format for tabular data, because it can be read and written by many different programs, and because it can be viewed or edited in a normal text editor if necessary. Plus, since it's just text data in a specific arrangement, it can even be copy-pasted via the clipboard. In Python, the built-in csv module allows you to easily read and write CSV files.

The related tab-separated values (or TSV) format uses tabs instead of commas to separate entries, but is otherwise pretty much the same. An advantage is that tab characters appear in table entries much less frequently than commas, and so fewer entries need to be quoted on average. The fact that the entry separators aren't immediately visible to users viewing the file in a text editor is a disadvantage though.
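The quoting rules described above can be seen by letting the csv module handle some awkward entries. In this sketch the table contents are made up, and an in-memory io.StringIO object stands in for a real file.

```python
import csv
import io

rows = [
    ['name', 'quote'],
    ['Ana', 'Hello, world'],              # entry contains a comma
    ['Bea', 'She said "hi"\nand left'],   # entry contains quotes and a newline
]

buffer = io.StringIO()                    # in-memory stand-in for a file
writer = csv.writer(buffer, lineterminator='\n')
writer.writerows(rows)
text = buffer.getvalue()
print(text)   # tricky entries are wrapped in double quotes; inner quotes are doubled

loaded = list(csv.reader(io.StringIO(text)))
print(loaded == rows)                     # True: the reader undoes the quoting
```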

Data

See above.

Technically, all digital data is composed of bits, usually grouped into bytes. In Python, a byte string can be used to represent this low-level view of data.

Decode

Encode

Formatting

Parsing

Decoding is the process of taking encoded data and turning it back into one or more objects according to the rules of a specific format (also called an "encoding scheme").

Encoding is the reverse process of arranging one or more objects into data.

Parsing is another term for decoding, usually used for more abstract or complex decoding, for example turning a string containing statements in a programming language into a program.

Formatting is another term for encoding. Like parsing, it's usually used for more abstract/complex encoding processes.

Usually the result of encoding is a byte string that specifies a specific sequence of bytes that can be stored in a file, although in the abstract we can talk about encoding from some more complex object (for example, a tree stored as a dictionary) into some simpler object (like a string or a list) that's still more complex than raw bytes.
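For strings, the built-in encode and decode methods perform this conversion. A minimal sketch using UTF-8, where the example word is arbitrary:

```python
text = 'caf\u00e9'               # the word "café"; \u00e9 is the letter e with an accent
data = text.encode('utf-8')      # encoding: string -> byte string
print(data)                      # b'caf\xc3\xa9'
print(len(text), len(data))      # 4 5: the accented letter takes two bytes

restored = data.decode('utf-8')  # decoding: byte string -> string
print(restored == text)          # True
```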

Decoder

Parser

A program that can convert one particular kind of data into a specific arrangement of objects. Python has built-in parsers available for common data formats like CSV and JSON.

"Decoder" is more often used when converting low-level data or when the data has a simpler structure, like when converting from bytes to a string, as opposed to converting from a string in a certain format to a more complex group of objects like a list of dictionaries (which would more likely be called a "parser").

Encoder

Formatter

The opposite of a parser: takes a specific arrangement of objects and turns it into formatted data, which a matching parser could turn back into equivalent objects.

"Encoder" is more often used for lower-level formatters, like one that converts a string into bytes. In contrast, a function that converts a dictionary into a string is more likely to be called a formatter than an encoder.

Encoding

Encoding Scheme

Format

A convention for how data should be decoded into objects.

A format/encoding is sometimes also called an encoding scheme.

When data is stored in a file, it is usually organized according to a known format, so that when we load the data from the file, we can convert it back from raw data into objects. Well-known formats like CSV or JSON have built-in functions for parsing them, but even the default process of loading file data as a string involves decoding it from the bytes stored in the file.

Technically, the Python programming language itself is a format, and when you run a Python program, it gets parsed into a series of statements before those statements get executed.

How individual bits/bytes are interpreted as values like integers or strings is also a kind of format/encoding. Strings in particular have multiple possible encodings like ASCII and Unicode (Python uses the UTF-8 Unicode encoding for strings by default).

JSON

JavaScript Object Notation

A format for storing several types of common programming values in a file or string based on the JavaScript programming language.

JavaScript and Python use very similar syntax, so the JSON format also looks a bit like Python code. One difference is that JSON only allows for values to be stored, not expressions or other statements.

JSON supports the following types:

None, encoded as null.
Booleans, encoded as true and false.
Integers and floating-point numbers encoded as usual in Python (for example, 5 or 3.14).
Strings, using double quotes only.
Lists of values, encoded using square brackets and commas as usual in Python.
Dictionaries, where the keys may only be strings, encoded using curly braces, colons, and commas as usual in Python.

Neither lists nor dictionaries allow an extra comma after the last element, although Python does allow that. No other types are allowed, although it's common to create a system for turning other types into a combination of allowed types and back again for storage as JSON. Notably, tuples and sets cannot be stored directly as JSON, and neither can dictionaries that include non-string keys.

The built-in json module makes it easy to load values from JSON or store them as JSON. Although JSON has its limitations, since it can store complex objects directly into a file and load them again later with minimal fuss, it's a convenient choice for storing values in a more permanent form. Note that when we store an object in a JSON file, we're making a copy of the data, and the same is true when loading an object from a JSON file. If you store an object and then load it, you'll have a copy of the original, but it won't be the same object. This also means that it's not possible to store objects that contain aliases and have them retain their original structure.
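The sketch below round-trips a small made-up dictionary through the json module's dumps and loads functions. It illustrates two of the limitations mentioned above: Python's json module writes a tuple as a JSON list (so it comes back as a list), and aliases come back as separate copies.

```python
import json

shared = [1, 2]
original = {'point': (3, 4), 'a': shared, 'b': shared, 'done': None}

text = json.dumps(original)    # format the dictionary as a JSON string
print(text)   # {"point": [3, 4], "a": [1, 2], "b": [1, 2], "done": null}

loaded = json.loads(text)      # parse the string back into Python objects
print(loaded['point'])                  # [3, 4]: the tuple came back as a list
print(original['a'] is original['b'])   # True: aliases in the original
print(loaded['a'] is loaded['b'])       # False: equal values, separate objects
```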

Errors, Testing, and Debugging

Bisect (debugging)

A debugging technique where you first comment out one half of the code and then the other half, in order to locate where an error originates.

Once you know which half of the code the error is in, you can repeat the process with the two quarters that make up that half, and so on, allowing you to quickly narrow down the exact location of an error.

Of course, commenting out half your code is likely to cause more problems than the one that you're trying to locate, so you have to be careful about interpreting the errors you get. Usually it's faster to use targeted probes to establish where an error is occurring, but in extreme cases where that's not working, bisecting the code is an option.

You can also use this with data structures instead of code. For example, suppose your code produces a list with 1 million elements, and then calls an external library function to process the list, but the library function indicates the list is invalid without telling you which element is the problem. Printing out the list is impractical here, but you could instead try to use slicing to send only the first 500,000 elements, then if that has no error, the second 500,000, and whichever of those has an error, start narrowing your slice ranges by halves until you have few enough elements that it's reasonable to print them out and look at them.
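The data version of this technique could be sketched as follows. Everything here is hypothetical: process stands in for the unhelpful library function, and findBadIndex assumes there is exactly one invalid element in the list.

```python
def process(items):
    """Hypothetical stand-in for a library function that rejects bad data
    without saying which element is the problem."""
    for item in items:
        if not isinstance(item, int):
            raise ValueError('invalid list')

def hasError(items):
    """Returns True if process fails on the given items."""
    try:
        process(items)
        return False
    except ValueError:
        return True

def findBadIndex(items):
    """Narrows down by halves; assumes exactly one invalid element."""
    low, high = 0, len(items)      # the bad element is somewhere in items[low:high]
    while high - low > 1:
        middle = (low + high) // 2
        if hasError(items[low:middle]):
            high = middle          # the problem is in the first half
        else:
            low = middle           # otherwise it must be in the second half
    return low

data = list(range(1000))
data[617] = 'oops'                 # plant one invalid element
print(findBadIndex(data))          # 617
```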

Exception

See above.

When an exception occurs, Python starts terminating function call frames one by one, starting with the innermost frame, to generate a traceback. If it finds that the exception occurred inside of a try block, then instead of crashing, the program will continue from after the end of the try block, possibly executing any matching except blocks and/or a finally block if one is present. This means that if you know your code is going to generate an exception under certain conditions, you can customize the response to that exception instead of having the program crash.
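A minimal sketch of customizing the response to an exception, using a made-up safeDivide function that records which blocks ran:

```python
def safeDivide(a, b):
    """Hypothetical helper showing try/except/finally."""
    steps = []
    try:
        result = a / b
        steps.append('divided')        # skipped if the division raises an exception
    except ZeroDivisionError:
        result = None                  # custom response instead of a crash
        steps.append('handled error')
    finally:
        steps.append('cleanup')        # runs whether or not an exception occurred
    return result, steps

print(safeDivide(6, 3))   # (2.0, ['divided', 'cleanup'])
print(safeDivide(1, 0))   # (None, ['handled error', 'cleanup'])
```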

Probe (debugging)

To add a statement using the print function to a program so that it displays the value of an important variable or expression as the program runs, in order to help debug it.

Sometimes when a program crashes the traceback will contain enough information to figure out what the error is and fix it. However, if there is an error which does not result in a crash, and/or if the information from the traceback is confusing or insufficient, using a probe to help figure out more information is usually a good next step. For example, consider this program:

def f(a, b):
    return a + b

x = 5
y = 'hello'

print(f(x, y))

If we run it, we get the following traceback:

Traceback (most recent call last):
  File "probe.py", line 7, in <module>
    print(f(x, y))
  File "probe.py", line 2, in f
    return a + b
TypeError: unsupported operand type(s) for +: 'int' and 'str'

Although the traceback lets us know that the a + b operation failed on line 2 in the f function, that's not really the cause of the error, as there's nothing wrong with the f function (other than its mysterious name, poorly named parameters, and lack of documentation). Instead, the issue is that we've tried to mix together two different types of values, but that's due to the particular values of x and y which aren't shown in the traceback itself. Of course, this simple example is probably one where we could figure things out quickly from looking at line 7 and then above it, but imagine if the definitions of x and y were even farther away from the point of failure. Perhaps they got their values from complex functions that we didn't fully understand...

In order to start figuring out where the real issue lies, given this traceback, it might make sense to probe the values of x and y just before line 7, which we could do by adding a print:

def f(a, b):
    return a + b

x = 5
y = 'hello'

print('XY', x, y)
print(f(x, y))

In this example we've included a tag in the probe so that in the output, we'll know to look for the line that starts with 'XY'.

Normally, adding a probe will give you some extra information about an error, but you may want to add more probes or try other debugging techniques to get a better understanding of an error. If you add a probe but don't see the output from it when you run the program again, there are several things to check for:

Is it possible you've saved multiple versions of a file, and the one that you're editing is not the same as the one that you're running?
Did you add the probe at a point in the code after a crash happens?
Did you put your probe in a function, conditional, or loop? In that case, maybe that part of the code isn't actually running at all. Try putting another probe earlier in the code, like at the top of the same function or top of the whole file.

If you end up adding more probes throughout your code, you may effectively be tracing the code.

Regression (testing)

A situation where an error that occurred earlier in development was fixed, but new changes to the code cause the same error to occur again.

A good suite of test cases helps detect regressions quickly so that harmful changes can be undone before the code is released as a new version and users encounter the error.

Tag (debugging)

A short string that is added into a probe or other debugging print statement, to identify that particular statement within the program's output.

For example, a probe might look like:

print('X', x)

The string 'X' here is the tag. If instead we didn't use a tag, we could do this:

print(x)

However, without the tag, as we add more probes, or if the program already prints other output when it runs, it may be hard to tell which lines of output come from the probe and which are other stuff, especially if the value of x is not what we were expecting, or is a hard-to-notice value like an empty string. The tag helps avoid this problem since it will be very clear which line of output was produced by the probe.

Test

A piece of code designed to check whether some other code works or not.

Writing tests can help us avoid errors, and it's also a good idea to come up with some tests early on in the design process (this is called test-driven development).

When debugging code, it can be helpful to isolate the behavior of one function at a time, and create new tests that express what it should do, so that you can be sure that an error isn't happening in one part of the code. Tests also help avoid (or at least quickly detect) regression.
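As a rough sketch of what a test for a single function might look like, here's an example that uses Python's built-in assert statement. The countVowels function and its test are made up for illustration, and using assert is just one simple option; a dedicated testing library would work differently.

```python
def countVowels(text):
    """
    Returns the number of vowels (a, e, i, o, u) in the given string,
    ignoring case.
    """
    count = 0
    for letter in text.lower():
        if letter in 'aeiou':
            count += 1
    return count


def testCountVowels():
    """
    Checks countVowels against a few hand-computed examples. An assert
    statement causes an AssertionError if its condition is False.
    """
    assert countVowels('hello') == 2
    assert countVowels('HELLO') == 2
    assert countVowels('') == 0  # edge case: empty string
    assert countVowels('rhythm') == 0  # edge case: no vowels
    return True

print('TEST', testCountVowels())  # prints "TEST True"
```

If a later change broke countVowels, running the test again would crash with an AssertionError pointing at the example that no longer works.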

Test-Driven Development

A software development strategy that involves writing tests and documentation before the code that will satisfy those tests.

For some kinds of programming, it's often easy to think of concrete examples of what the finished product will be able to do, and laying out a few examples to work from often helps scaffold the work of writing the code that will satisfy those examples.

Test-driven development aims to help you think about edge cases early, as well as nailing down what kinds of parameters your code will accept and what kinds of return values your code will produce. This in turn helps you avoid dead-end design choices and write code faster, plus you'll be able to debug it faster with a ready-to-go suite of tests as soon as it's complete.

It's generally a more challenging approach when it's not easy to specify these things (for example, if you're building a game with lots of randomness, the same inputs don't always produce a predictable output).

Toggle (debugging)

Temporarily disabling (or re-enabling) a line of code for debugging purposes by adding a hash sign to turn it into a comment.

This technique is very useful when you're not quite sure exactly what is going on, and/or when there's a line of code that other techniques like probing indicate may be problematic but you're not sure why/how. Trying to just completely disable a line of code and seeing what will happen when you do that can generate an informative traceback or, even in the case that nothing changes, point towards some other part of the code as being the place where the error is located.

In the extreme case, we can toggle many lines of code at once to hunt down an error, which is called bisecting.

Trace

A sequential record of events that happen as part of a larger process.

It's also the name of a debugging technique where you add print calls to your code so that it prints out a trace of its process as it runs. In a loop or recursive function, even adding a single statement that uses print can produce a trace since that line of code may be executed repeatedly. You can print different kinds of relevant things as part of a trace, such as the values of key variables or just messages to indicate which branch of a conditional was taken (see probing).
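For instance, here's a small sketch of tracing a recursive function (the factorial function is just an illustration). Each print includes a 'TRACE' tag so the trace is easy to pick out from other output:

```python
def factorial(n):
    """
    Returns n * (n - 1) * ... * 1, printing a trace as it runs.
    """
    print('TRACE start', n)
    if n <= 1:
        print('TRACE base case')
        return 1
    result = n * factorial(n - 1)
    print('TRACE result', n, result)
    return result

answer = factorial(3)
# Output:
# TRACE start 3
# TRACE start 2
# TRACE start 1
# TRACE base case
# TRACE result 2 2
# TRACE result 3 6
```

Even though there are only three print calls in the code, the trace has six lines, because the function body runs once per recursive call.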

try Statement

except Statement

finally Statement

A try statement just consists of the keyword try followed by a colon, and then an indented block. If any code within the block generates an exception, instead of causing the program to crash, the code will skip to the end of the try block and continue executing code outside the block as if the exception had not occurred, as long as there is at least one except statement that matches the type of the exception that was generated. If none of the except statements match an exception, that exception continues to terminate function call frames until it finds another outer try block or the program crashes.

If the try statement has a finally block, the code in that block is executed no matter what: If no exception occurs, it gets executed when control flow reaches the end of the try block; if an exception does occur and gets matched, the finally block is executed after any matching except blocks, and if an exception does occur but is not matched, the finally block gets executed before the exception terminates the current function call frame. finally blocks are thus useful for ensuring that critical code gets run even if something unexpected happens.

Each except block uses the keyword except followed by one or more exception types. Any exception of one of the listed types will match that block, causing the code in that block to run. Except blocks are checked in order, and only the first matching block gets run.

For example:

try:
    x = 3
    y = x / 0  # causes a ZeroDivisionError
except ValueError:
    print('nope')  # doesn't get run
except ZeroDivisionError:
    print('yup')  # this block matches
except Exception:  # would match any kind of exception
    print('nope')  # doesn't run because block above it matched first
finally:
    print('done')  # runs at the end, even if there's no exception
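To illustrate what happens when none of the except blocks match, here's a sketch with two made-up functions named inner and outer. The exception skips the non-matching except block in inner, runs inner's finally block, terminates that function call frame, and then gets matched by the try block in outer:

```python
def inner():
    """
    Causes a ValueError that its own try statement does not match.
    """
    try:
        return int('abc')  # causes a ValueError
    except ZeroDivisionError:
        print('inner except')  # doesn't match, so doesn't get run
    finally:
        print('inner finally')  # runs before this frame is terminated


def outer():
    """
    Calls inner and handles the ValueError that escapes from it.
    """
    try:
        return inner()
    except ValueError:
        print('outer except')  # matches the exception from inner
        return -1

result = outer()  # prints "inner finally" and then "outer except"
print(result)  # prints -1
```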

Unpack (debugging)

To take a complicated expression and break it down into multiple smaller expressions by storing intermediate results in variables.

Not directly related to unpacking (tuples).

For example, if we have a complex expression like this:

x = (128 * y + int(z)) / 34

We could unpack that into:

yVal = 128 * y
numerator = yVal + int(z)
x = numerator / 34

Unpacking has several benefits:

If an error is occurring on a complicated line of code, after unpacking it may be easier to see which specific function call or operation is causing the error.
It offers more opportunities for probing or tracing intermediate values.
If sub-expressions are being re-used in several places, computing them once and storing them in a variable can help isolate the cause of an error and also helps make the code easier to read and maintain according to the DRY principle.

Program Design Concepts

Descriptive Naming

Picking variable names that describe what kind of values they will hold.

For example, calling something numberOfCopies instead of x. This has a huge impact on the readability of your code and helps make debugging faster. If you find yourself struggling to think of a good descriptive name for a variable, this can be a sign that you need to stop and think more about what you intend to use that variable for, and/or what process is producing its value.

DRY

Don't Repeat Yourself

A design principle that states that repetition of code should be minimized.

In general, it's good to avoid copy-pasting code, and to use intermediate variables whenever multiple expressions share identical sub-parts. If you find yourself typing similar lines of code again and again, consider bundling them into a function and using parameters to allow a single function to accomplish multiple similar sub-steps shared by different processes.

The benefits of following the DRY principle are:

Your code will be easier to debug, as errors will be closer to the code that caused them, and you won't have to fix the same error in multiple places throughout your code.
Your code will be easier to read, since it will be shorter, and it will be broken into smaller pieces that each have a simple purpose, and there won't be sections that repeat the same functionality.
Your code will be easier to test, since you'll be able to test and verify shared functionality in a single location.
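As a small illustration of bundling repeated code into a function (the circleArea name is made up for this sketch), compare these two ways of computing the same three values:

```python
import math

# Repetitive version: the same formula is typed out three times.
area1 = math.pi * 2 * 2
area2 = math.pi * 5 * 5
area3 = math.pi * 10 * 10


# DRY version: the shared steps live in one function, and a parameter
# captures the part that differs.
def circleArea(radius):
    """
    Returns the area of a circle with the given radius.
    """
    return math.pi * radius * radius

dryAreas = [circleArea(2), circleArea(5), circleArea(10)]
print(dryAreas == [area1, area2, area3])  # prints True
```

If the formula had a mistake in it, the repetitive version would need to be fixed in three places, while the DRY version would only need one fix.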

Extra: Custom Types

(We don't expect you to know these terms as part of this class.)

Class

A statement using the class keyword that defines a custom type; also refers to custom types defined this way and to built-in types.

For example, if you run type(4) you'll see the text <class 'int'> because int is a class. In this sense class is interchangeable with type.

Within a class statement, an indented block defines the fields and methods of the custom type. A special method named __init__ defines how new objects of that type can be created. Just like a function, a class should have a docstring that explains its purpose and something about how it works, although each method should also have its own docstring. __str__ is another special method which will be called whenever an object of the class needs to be converted into a string (for example, by the built-in str function).

Here's an example:

class Pair:
    """
    A pair of numbers to be displayed with a custom separator between
    them. The `between` field defines the separator that will be used.
    """
    between = "-"
    """
    This field defines the separator to use between the numbers. It can
    be changed with the `setSeparator` method.
    """

    def __init__(self, a, b):
        """
        Sets up the pair using `a` and `b` as the numbers.
        """
        self.a = a
        self.b = b

    def __str__(self):
        """
        Formats this pair as a string, using the `between` field as the
        separator.
        """
        return str(self.a) + self.between + str(self.b)

    def setSeparator(self, separator):
        """
        Sets up a new custom separator string for this pair.
        """
        self.between = separator

    def add(self, other):
        """
        Adds the `a` and `b` values from another pair to this pair's `a`
        and `b` values, returning a new pair.
        """
        newA = self.a + other.a
        newB = self.b + other.b
        return Pair(newA, newB)

    def first(self):
        """
        Returns the first number in the pair.
        """
        return self.a

    def second(self):
        """
        Returns the second number in the pair.
        """
        return self.b

p1 = Pair(1, 2)
p2 = Pair(2, 3)
p3 = p1.add(p2)
p3.setSeparator('~')
print(p1, p2, p3)  # prints "1-2 2-3 3~5"

Notice that:

When we use the name of the class as if it were a function by putting parentheses with argument expressions after it, we create a new object and Python automatically calls the __init__ method to set up that object, using the fresh object as self and the additional arguments we supplied as the rest of __init__'s arguments. So the line that says p1 = Pair(1, 2) will create a new object and call __init__ with the new object as self, 1 as a, and 2 as b.
All of the methods use a parameter called self, but when we call these methods, we only provide the other arguments, not the self argument (for example, the line p3.setSeparator('~') only supplies one argument even though the setSeparator method has two parameters including self). Python automatically sets the object that the method is called on as the self value when calling a method. So for that line, self would be set to the object stored in the p3 variable.
Not all fields are declared as part of the class. In particular, the __init__ method (or any other code) can create a new field simply by assigning to it using a dot, as in the line that says self.a = a. Any field declared as part of the class (like between in this case) will be added to each object of that class that gets created, but the __init__ method may add further fields, and you could even have other methods that add new fields when called. Typically, it's best to create fields only by declaring them as part of the class definition or as part of the __init__ method, and to document as part of the class docstring any fields that other code might want to use (here between is documented but a and b are not).

Construct (verb)

To create a new instance of a class.

This can be a built-in class like list, or a custom class with its own __init__ method that will dictate what happens. Each time a new instance is constructed, new memory is used for the resulting object.
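A quick way to see that each construction produces a separate object is to compare two freshly constructed instances, as in this small sketch:

```python
a = list()  # construct a new empty list
b = list()  # construct another new empty list

print(a == b)  # prints True (both are empty)
print(a is b)  # prints False (they are two different objects)

a.append(1)
print(b)  # prints [] (changing a does not affect b)
```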

__init__

A special method which defines what happens when constructing a new instance of a custom class.

The first parameter should always be called self, and will automatically be set to a newly-constructed object when this method is called. Any additional parameters require matching arguments when creating an object of the class.

Rather than calling __init__ directly, it gets called automatically when we write a class name followed by parentheses (and possibly arguments in those parentheses).

Here's an example:

class Powers:
    """
    Stores a number along with the same number to the second and third
    power.
    """
    def __init__(self, base):
        """
        You must specify the base number to use.
        """
        self.base = base
        self.second = base * base
        self.third = base * base * base

p = Powers(3)
print(p.third)  # prints 27

In this example, we write the class name (Powers) followed by (3) to pass 3 as the value of the base parameter when creating our new object. The self parameter gets filled in automatically with a new blank object that will become the result of the setup process. In this example, that setup process just creates three fields called base (value 3), second (value 9) and third (value 27).

Field

A variable that's attached to each instance of a class, possibly holding different values for each instance.

A class definition can define fields and give them default values, and additional fields can be attached to an instance in the __init__ method. Fields can be accessed and/or updated by typing an expression for an object followed by a dot and then the field name. For example:

class Example:
    """
    An example class.
    """
    field = 3  # field defined as part of class

    def __init__(self, v):
        """
        You must provide a value.
        """
        self.value = v  # field added during setup

e1 = Example(7)  # create an instance
e2 = Example(10)  # create another instance

print(e1.value)  # access field value (prints 7)
print(e2.value)  # prints 10

print(e1.field)  # prints 3
e2.field = 5  # update field value
print(e2.field)  # prints 5
print(e1.field)  # still prints 3

When a field is defined as part of a class instead of being attached during setup in the __init__ method, each instance of that class gets an alias of the same value for that field. This means that if you define a field with a mutable value like a list, every instance of the class will share a single list, and mutating one of them will appear to change all of them (since in reality, there's only one). For this reason, it's usually better to create fields within the __init__ method rather than as part of the class definition.
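Here's a sketch of that difference, using two made-up classes named SharedList and OwnList:

```python
class SharedList:
    """
    Defines its list as part of the class, so every instance ends up
    with an alias of the same list.
    """
    items = []


class OwnList:
    """
    Creates a new list for each instance during setup.
    """
    def __init__(self):
        """
        Sets up a fresh empty list.
        """
        self.items = []

s1 = SharedList()
s2 = SharedList()
s1.items.append('x')  # mutates the one shared list
print(s2.items)  # prints ['x']

o1 = OwnList()
o2 = OwnList()
o1.items.append('x')  # mutates only o1's list
print(o2.items)  # prints []
```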

Instance

An object as defined relative to its class.

If we create three objects from a custom class called Example we would say that each of them is "an instance of the Example class." Or whenever we create a new Example object we might say that we're "creating a new instance of the Example class."

The term implies that many identical or similar instances may exist or be created, and indeed custom classes allow us to easily create many similar objects that share certain properties, but which each have their own identity and which can be modified separately from other instances.

Method

See above.

A function definition that is inside of a class statement automatically becomes a method.

Objects of the custom type defined by that class can use any of its methods. Write the object first, then a dot, and then the method name, followed by the usual parentheses and argument expressions. For a method, the first parameter should be named self, but you will supply one fewer argument when calling it. The specific object that appeared before the dot will automatically be provided as the self argument.

Python also supports static methods and class methods using decorators, but these are beyond the scope of this glossary.

For an example of methods in action, consider the following code:

class Counter:
    """
    A counter that holds a number. You can add to the counter using the
    `count` method and use the `value` method to get the current value.
    """
    n = 0

    def count(self, num):
        """
        Adds the given number to the counter.
        """
        self.n = self.n + num

    def value(self):
        """
        Returns the counter's current value.
        """
        return self.n

c1 = Counter()  # starts at 0
c2 = Counter()  # starts at 0
c1.count(3)  # now at 3
c2.count(5)  # now at 5
c1.count(1)  # now at 4
print("C1 is at:", c1.value())  # prints 4
print("C2 is at:", c2.value())  # prints 5

In this code, when we write c1.count(3) how does Python know which object's number to increase? The c1 before the dot is a variable whose value is the first Counter we created, and in the self.n = self.n + num code of the count method, self will be set to the same value as c1, because that's what appears before the dot. So we would say this code "applies the count method to c1." The argument we supplied (3) is passed in as the second argument num, since the first argument for a method is assigned automatically to the object the method is being applied to. So count is defined with two parameters, but only gets called with one argument. At the same time we need to call count using a base object and a dot, because otherwise there would be no self object available.

self

self is by convention the name used for the first parameter of a method, which gets supplied automatically rather than via an argument.

In theory this parameter could have any name you want; it's the position that matters, not the name, but using self for these automatic parameters is a convention that helps make it easier for everyone to read your code.
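To see that it really is the position that matters, here's a sketch using a made-up Doubler class whose __init__ method breaks the convention on purpose. It also shows that the automatic argument can be supplied by hand if we access the method through the class instead of through an instance:

```python
class Doubler:
    """
    Stores a number and can report double that number.
    """
    def __init__(this, n):
        """
        Works, but naming the first parameter `this` instead of `self`
        makes the code harder for others to read.
        """
        this.n = n

    def doubled(self):
        """
        Returns twice the stored number.
        """
        return self.n * 2

d = Doubler(4)
print(d.doubled())  # prints 8 (d is supplied as self automatically)
print(Doubler.doubled(d))  # prints 8 (d is supplied as self by hand)
```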

Special Method

A method which has one of a few reserved names that start and end with double underscores, and which gets used automatically by Python under certain circumstances.

This list of special method names includes all special methods that Python recognizes. The two most ubiquitous special methods are __init__ for defining how new objects get set up, and __str__ for defining how any object can be converted to a string when the built-in str function is applied to it. Other special methods can define how an object should work when used with operators or when being hashed.
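As a sketch of special methods that control operators, here's a made-up Money class. The names __add__ and __eq__ are real special method names (used for the + and == operators), but the class itself is just an illustration:

```python
class Money:
    """
    An amount of money stored as a whole number of cents.
    """
    def __init__(self, cents):
        """
        You must provide the number of cents.
        """
        self.cents = cents

    def __add__(self, other):
        """
        Used automatically for the + operator. Returns a new Money
        object holding the combined amount.
        """
        return Money(self.cents + other.cents)

    def __eq__(self, other):
        """
        Used automatically for the == operator.
        """
        return self.cents == other.cents

total = Money(150) + Money(275)  # Python calls __add__ here
print(total.cents)  # prints 425
print(total == Money(425))  # prints True (Python calls __eq__ here)
```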

__str__

__repr__

A special method which defines how an object of a custom class should be converted into a string.

When using the str or print functions, for example, Python automatically converts objects to strings. This also happens inside format strings. In these cases when it encounters a custom object, it will use a default format that just names the object's class and shows where in memory the object is stored, like this:

<__main__.a object at 0x...>

However, we might want to have our custom object be displayed with more detail as a string, and so we can define a __str__ method for it. This method has just one parameter: self, and it needs to return a string. For example, we might define a class like this:

class Greeter:
    """
    Stores a name and includes that name along with a greeting when
    converted to a string.
    """
    def __init__(self, name):
        """
        You must provide the name to store.
        """
        self.name = name

    def __str__(self):
        """
        Converts to a string by including a greeting along with the name.
        """
        return "Hello " + self.name

g = Greeter('Peter')
print(g)  # prints "Hello Peter"

When print is called with g as the argument, Python needs to convert g to a string automatically. Since it's a custom class, Python calls that class' __str__ method with g as the self argument, and gets the string "Hello Peter" as the result, which is what it displays.

The related __repr__ special method also takes just a self argument and is used to convert an object to a string whenever the built-in repr function is called on an object. Whenever possible, the repr result should be a string that if used in code, would result in an equivalent object. So for the class we defined above, we might define:

    def __repr__(self):
        """
        Returns a string that could be used in code to construct an
        equivalent Greeter.
        """
        return "Greeter(" + repr(self.name) + ")"

Note how repr is used within the __repr__ special method: the representation result for a string will include quotation marks (and will correctly handle cases where the string itself includes quotation marks), so the result of calling repr(g) after the code shown above would be "Greeter('Peter')" and this matches the line of code that was used to construct g in the first place.

Not all objects can realistically have a repr result that can be used to reconstruct an equivalent object, but to the extent that this is possible, it's the convention to do so. repr is mostly used for debugging purposes or otherwise displaying an object in a more technical context.

Extra: Data Formats

(We don't expect you to know these terms as part of this class.)

Archive

Extract

TAR

Zip File

An archive is a single file that can be extracted to produce multiple files and/or directories.

Archives are often compressed to save disk space and transmission bandwidth.

A zip file is an archive in the zip format. TAR (for "Tape ARchive"; sometimes called a "tarball") is another common format for code archives. The built-in Python zipfile module can be used to interact with zip archives.
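As a minimal sketch of using the zipfile module, the code below creates a small compressed archive and then reads it back. To avoid creating real files, it stores the archive in an in-memory io.BytesIO object, which is an assumption made for this example; normally you would give ZipFile a filename instead. The file names inside the archive are made up.

```python
import io
import zipfile

# An in-memory stand-in for a file on disk, so nothing gets written.
buffer = io.BytesIO()

# Create an archive holding two files, compressing their contents.
with zipfile.ZipFile(buffer, 'w', zipfile.ZIP_DEFLATED) as archive:
    archive.writestr('notes/a.txt', 'first file')
    archive.writestr('notes/b.txt', 'second file')

# Re-open the same archive and look at what's inside.
with zipfile.ZipFile(buffer, 'r') as archive:
    names = archive.namelist()
    contents = archive.read('notes/a.txt')

print(names)  # prints ['notes/a.txt', 'notes/b.txt']
print(contents)  # prints b'first file'
```

Note that reading a file from the archive gives back a byte string, not a regular string.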

ASCII

American Standard Code for Information Interchange

An older, simple encoding for storing text as binary data.

It assigns each integer from 0 to 127 to a specific symbol, so that a sequence of numbers (for example, a byte string) can be displayed as text. Negative numbers and numbers beyond 127 are not assigned any symbol, so you may get an error if they're included in data that you try to decode using ASCII. "Extended ASCII" includes 128 additional symbols, going up to 255, but there are several different versions that were designed to support the alphabets of different countries. Languages that use characters which aren't part of ASCII need to use other encodings; the Unicode standard defines an encoding which includes many languages from all over the world as well as special symbols and emoji.

In theory, because each ASCII code is between 0 and 127, and all of those numbers can be represented using some combination of 7 bits, ASCII text could be stored using 7 bits per character. These days, an 8-bit byte has become standard, so ASCII text is often stored inefficiently using 8 bits per character.

This page has a chart of all ASCII codes.
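In Python, one way to explore ASCII is with the encode and decode methods of strings and byte strings. The sketch below (with made-up variable names) converts text to its ASCII codes and shows the errors you get for symbols and numbers outside of the ASCII range:

```python
data = 'Hi!'.encode('ascii')  # convert text to a byte string
codes = list(data)  # the integer code for each character
print(codes)  # prints [72, 105, 33]
print(data.decode('ascii'))  # prints "Hi!"

try:
    'caf\u00e9'.encode('ascii')  # the last letter (é) is not in ASCII
    encodeOutcome = 'encoded'
except UnicodeEncodeError:
    encodeOutcome = 'not ASCII'
print(encodeOutcome)  # prints "not ASCII"

try:
    b'\xf3'.decode('ascii')  # byte value 243 is beyond 127
    decodeOutcome = 'decoded'
except UnicodeDecodeError:
    decodeOutcome = 'not ASCII'
print(decodeOutcome)  # prints "not ASCII"
```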

Byte String

A Python type that holds a sequence of bytes.

Whereas a regular string is a sequence of characters (which themselves are really single-character strings), a byte string is instead a sequence of integers. Python doesn't have a dedicated type to represent a byte, but since each byte is made of 8 bits, and if we interpret those bits as a number, it will be a number between 0 and 255, Python uses integers between 0 and 255 to represent bytes.

Byte strings can be written in Python by writing a 'b' before a normal string, and using only ASCII and escape sequences. For example:

b'a0\xf3'

The byte values in this string will be 97 (ASCII code for letter 'a'), 48 (ASCII code for digit '0'), and 243 (escape sequence \xf3 encodes a byte with hexadecimal value f3, which is 15*16 + 3 = 243).
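You can check those values yourself with a few lines of code. This short sketch converts the byte string to a list to show its integers, and uses the built-in bytes function to go the other way:

```python
data = b'a0\xf3'
print(list(data))  # prints [97, 48, 243]
print(len(data))  # prints 3
print(data[2])  # prints 243 (indexing a byte string gives an integer)
print(bytes([97, 48, 243]) == data)  # prints True
```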

Compress

To take data in one format and put it into another format in which it takes up less space.

Escape Character

Escape Sequence

A series of characters in a string or bytes in a byte string which has a special meaning. These start with an escape character, typically backslash.

For example, in a string, the sequence '\n' is an escape sequence that represents a single newline character. If you type a backslash followed by an n in your code as part of a string, Python will turn that into a newline character in the resulting program.

In byte strings, values above 127 must be entered using escape sequences when using the b'...' syntax for creating a byte string, because only ASCII characters are supported.

Here is the full list of escape sequences that Python accepts.

Graph (data structure)

Edge (graph)

A collection of nodes linked together by edges.

The colloquial term "graph" usually refers to a chart or diagram, often drawn with one or more axes and points plotted in space to show relationships among values. But a graph data structure is an unrelated concept, holding nodes connected by edges. For example, a "social graph" might represent users of a social media platform as nodes, and edges could be added wherever one user follows another.

Graphs come in different flavors depending on their edges: a directed graph has edges which are directional, whereas an undirected graph has edges that don't have a direction. In a directed graph, if there's an edge connecting node A to node B, it doesn't necessarily mean that node B connects back to node A (that would be a separate edge in the other direction). But in an undirected graph, those two edges are the same thing, and there's only one possible edge between two nodes.

Graphs can be used to represent many kinds of real-world relationships, such as which users interact with each other in a multi-user application, which countries border each other on a map, which chemicals are used to synthesize which other chemicals in a factory, etc. Accordingly, there are a number of graph algorithms that analyze various properties of a graph and which can be useful in many different scenarios. Examples include a reachability algorithm which determines whether one node can be reached from another by crossing one or more edges, or a shortest-path algorithm that determines the shortest possible path from one node to another (i.e., the path that traverses the fewest edges).

Python has no built-in data structure that can store a graph, nor does it have dedicated syntax for creating graphs. However, dictionaries and/or lists can be used to store graphs, and there are open-source third-party libraries like networkx that can be used to create and analyze graphs.
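As a rough sketch of the dictionary approach, here's a small directed "social graph" where each key is a node and each value is a list of the nodes that node has edges to. The names and the canReach function are made up for illustration; this is one simple way to check reachability, not the only one. Note that this version also counts a node as reachable from itself.

```python
# A small directed graph: each key is a node, and its list holds the
# nodes that its outgoing edges point to.
follows = {
    'ana': ['ben', 'cat'],
    'ben': ['cat'],
    'cat': [],
    'dev': ['ana'],
}


def canReach(graph, start, goal):
    """
    Returns True if goal can be reached from start by crossing zero or
    more edges of the graph, or False if not.
    """
    toVisit = [start]  # nodes we still need to look at
    seen = set()  # nodes we've already looked at
    while len(toVisit) > 0:
        node = toVisit.pop()
        if node == goal:
            return True
        if node not in seen:
            seen.add(node)
            toVisit.extend(graph[node])
    return False

print(canReach(follows, 'dev', 'cat'))  # prints True
print(canReach(follows, 'cat', 'dev'))  # prints False
```

Keeping track of which nodes we've already seen ensures that the loop will eventually stop even if the graph contains a cycle.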

Interpolation

A mathematical process for blending between two different values according to a third number, which usually takes the form of a floating-point number between 0 and 1, where 0 indicates 100% of the first value, 1 indicates 100% of the second value, and numbers in between indicate different blends of the two.

See also the unrelated term string interpolation.

An example of numerical interpolation might be interpolating between two 2-D points, like this:

# Point 1
x1 = 1
y1 = 1
# Point 2
x2 = 5
y2 = -5
# Interpolation variable
t = 0.7
# Result:
# t = 0 gives us (x1, y1)
# t = 1 gives us (x2, y2)
# In-between values are somewhere in between the two points
x = x1 * (1 - t) + x2 * t
y = y1 * (1 - t) + y2 * t
print(x, y)

Numerical interpolation is called "linear" when the formula proceeds directly from one point to the other at a steady rate. We could also use other kinds of formulas, for example quadratic or cubic interpolation, where an exponent is added to the interpolation variable that changes how quickly a change in the interpolation variable translates into a change in the mixture between the two things being interpolated.

Unicode

A widely-adopted, regularly updated encoding for storing text as binary data which supports many languages, mathematical symbols, and emoji.

In contrast to ASCII, which uses 7 bits per symbol, Unicode assigns each symbol a number between 0 and 1114111, and its simplest version (UTF-32) uses 32 bits per symbol to store that number. There are also 8-bit and 16-bit versions of Unicode (UTF-8 and UTF-16) which use one or more 8-bit or 16-bit units per symbol, and which are more efficient if you are mostly using characters from a limited part of the encoding space; for English text UTF-8 (the 8-bit version) uses a single byte for each ASCII character, so it is almost as efficient as ASCII.

In Python, characters in strings are stored using Unicode, so any Unicode symbol can be included in a string, and escape codes can be used to include a symbol according to its numerical code, even if you can't type it on your keyboard.

The official Unicode website has lots of details, including downloadable charts of character codes and an index of character codes by name.
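For example, this sketch uses the \u escape sequence (which takes four hexadecimal digits) to include symbols by their numerical codes, and uses the built-in ord and chr functions to convert between single-character strings and their codes. The variable names are made up for illustration.

```python
snowman = '\u2603'  # the snowman symbol, whose code is hexadecimal 2603
print(len(snowman))  # prints 1 (the escape sequence is one character)
print(ord(snowman))  # prints 9731 (hexadecimal 2603 as a decimal number)
print(chr(9731) == snowman)  # prints True

accented = '\u00e9'  # the letter e with an acute accent
print(len('e'.encode('utf-8')))  # prints 1 (ASCII characters take 1 byte)
print(len(accented.encode('utf-8')))  # prints 2
print(len(snowman.encode('utf-8')))  # prints 3
```

The last three lines show how UTF-8 uses more bytes for symbols with larger codes, while plain English letters take just one byte each.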

Extra: Software and Hardware

(We don't expect you to know these terms as part of this class.)

Analog

Data which has a smooth, continuous value, rather than being broken up into discrete units like digital data.

Analog data can often be described or approximated by mathematical equations.

Bandwidth

The amount of data that can be transmitted via a certain connection per unit time, usually measured in bits per second (or a larger denomination like bytes, megabytes, gigabytes, etc. per second).

Bandwidth determines how fast data can be transferred, and for certain applications like streaming audio or video, whether the experience will be smooth or not. (Streaming refers to sending data for immediate consumption, like sending video data to be displayed immediately rather than to be stored in a file and played back later.)

Bandwidth is typically limited, so using compression to save bandwidth is the norm.
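To make this concrete, the sketch below estimates a transfer time and then uses the standard `zlib` module to compress some data. The file size and connection speed are made-up numbers chosen only for illustration.

```python
import zlib

# Hypothetical numbers: a 5 megabyte file over a 10 megabit-per-second link
file_bits = 5 * 1_000_000 * 8
bandwidth_bits_per_second = 10 * 1_000_000
seconds = file_bits / bandwidth_bits_per_second

# Repetitive data compresses well, so fewer bytes need to be sent
data = b"hello " * 1000
compressed = zlib.compress(data)
print(seconds, len(data), len(compressed))
```

The receiver can call `zlib.decompress` to recover the original bytes, so the same information arrives while using less bandwidth.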

Binary

A number system that uses only the digits 0 and 1 to represent numbers. Whereas in the decimal system each place-value represents a power of 10 (ones, tens, hundreds, and so on), in binary each place-value represents a power of 2 (ones, twos, fours, and so on).
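As a quick sketch, Python's built-in `int` and `bin` functions can convert between binary digits and ordinary numbers. The number 13 is just an arbitrary example.

```python
# The binary digits 1101 mean 1*8 + 1*4 + 0*2 + 1*1
by_hand = 1 * 2**3 + 1 * 2**2 + 0 * 2**1 + 1 * 2**0

# int() with base 2 reads a string of binary digits
parsed = int("1101", 2)

# bin() goes the other way; Python marks binary with a "0b" prefix
text = bin(13)
print(by_hand, parsed, text)
```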

Sometimes also used as a synonym for digital, since digital data is also made of ones and zeroes. Sometimes also used as a synonym for executable, because the contents of an executable are "raw" CPU instructions which humans don't usually read directly, so compared to code, they're thought of as an arbitrary mass of binary data.

Boot

The startup process for a computer.

This is short for "bootstrap" from the idiom "pull yourself up by your bootstraps" meaning "to start from nothing and build something." When a computer is first powered on, it must begin from a saved sequence of CPU instructions. Typically, these instructions locate a default drive and load further instructions from that drive to start the operating system.

Corruption

A situation where data that should be in one format has been cut off or changed so that it doesn't match the original format any more.

Attempting to parse corrupted data will usually fail, although this could mean anything from an error to a situation where the parsing appears to succeed, but some of the objects it produced have the wrong values.
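Both outcomes can be sketched using the standard `json` module. The data here is invented for the example, and real corruption is usually less tidy than this.

```python
import json

original = '{"name": "Ada", "score": 95}'

# Cutting the data off partway through usually causes a parsing error
truncated = original[:15]
try:
    json.loads(truncated)
    truncated_failed = False
except json.JSONDecodeError:
    truncated_failed = True

# Changing a single character may still parse, but give a wrong value
altered = original.replace("95", "15")
altered_score = json.loads(altered)["score"]
print(truncated_failed, altered_score)
```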

CPU

Central Processing Unit

CPU Instruction

A central processing unit or CPU is a physical device in a computer which receives and transmits patterns of electricity via wires.

A CPU instruction is a special pattern of electricity which tells the CPU to carry out a specific operation; CPUs are designed so that when they receive CPU instructions one after the other, they transmit patterns of electricity which encode the results of those instructions. By connecting a CPU to devices like a keyboard, mouse, and monitor, the computer can run an operating system program which provides an interface for a user to run and manage other executables.

Digital

Composed of bits, that is, ones or zeroes.

In contrast to analog data, which records a continuous value over time and might be expressed as a smooth equation, digital data is composed of bits. Analog data is typically more true to what is happening in the real world, but more difficult to interpret, whereas digital data is typically only an approximation of the real world but is easier to interpret and work with.
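One way to picture the difference is to sample a smooth signal at a few moments and round each sample to a small integer. This is only a sketch: the function name `analog_signal`, the sampling interval, and the choice of 16 levels are all assumptions made for the example.

```python
import math

def analog_signal(t):
    """A smooth, continuous value described by an equation."""
    return math.sin(t)

# Sample at a few points in time and round to one of 16 levels (4 bits each)
samples = []
for step in range(8):
    value = analog_signal(step * 0.5)    # between -1 and 1
    level = round((value + 1) / 2 * 15)  # integer from 0 to 15
    samples.append(level)
print(samples)
```

The list of integers is an approximation of the original curve: it only records eight moments in time, and each value has been rounded.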

Digital data can be stored directly on a disk, whereas analog data would have to be encoded somehow, or stored on a different kind of storage device.

Disk

Drive

Disk refers to a physical object that stores data. Many modern disks are no longer disc-shaped, but the term comes from an era when most data storage devices were disc-shaped and data was accessed by spinning them around. Music records were the model for computer disks.

A drive (or disk drive) is a slot that a disk can be placed into to make its contents accessible to a computer. On Windows (and earlier, in DOS, which stands for "Disk Operating System"), each disk drive is identified by a single letter, and "C" (the third drive) is usually where the main file system is stored on a hard drive or solid-state drive, with drives A and B being used for ejectable floppy disks.

In the digital realm, both terms can refer to the data that is stored on a disk and made available via a disk drive, usually split into one or more partitions each containing a file system. So the phrase "C drive" might really refer to a disk, a drive, a partition, a file system, or even the root directory of that file system...

Executable

Script

An executable is a file containing CPU instructions which can be directly executed by a computer's CPU.

In contrast, a script is a file containing code which relies on an interpreter to run.

Python programs are scripts which run via the Python interpreter.

Floppy

Floppy Disk

Floppy Drive

A floppy disk (or just floppy) is a disk made of a flexible magnetic material used to store data, usually held in a plastic shell so that the disc itself is only exposed when inserted into a drive (called a floppy drive) in a computer.

HDD

Hard Drive

Hard Disk

Hard Disk Drive

A disk drive which uses rigid disks storing magnetic data, typically permanently contained within an enclosure. Unlike a floppy disk, the disk part is permanently stored in the drive rather than being ejectable, so hard disk, hard drive, and hard disk drive can all refer to the same object.

In contrast to a solid state drive, a hard drive stores information using magnetic fields that are read (or written) using a tiny needle-like device that travels over the surface of the disk as it spins around. The physical limitations in terms of rotation speed, read/write head size, and so on limit how densely information can be stored and how fast it can be read or written. Hard drives are typically slower than solid state drives, but they are also typically more durable.

Hardware

The physical device(s) that make up a computer, in contrast to software.

Hardware determines how a device is capable of interacting with the real world and what physical inputs it can accept and outputs it can generate. However, most computational devices don't do much without complicated software to control them.

Interpreter

An executable which reads from a script file and allows that script to be run on a CPU.

The Python interpreter is an executable program that allows Python code to run on a CPU. When you tell an IDE to run your program, it will run the Python interpreter and point it at your program as the script to run.
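If you're curious which interpreter is running your program, the standard `sys` module can tell you. This is a small sketch, and the output will be different on each computer.

```python
import sys

# Path to the interpreter executable that is running this script
print(sys.executable)

# Version of that interpreter, as (major, minor, patch)
interpreter_version = tuple(sys.version_info[:3])
print(interpreter_version)
```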

Some other programming languages, called "compiled languages," instead first compile a program from the base language into an executable made of CPU instructions that can be run directly on the CPU.

Partition

Mount

A portion of a physical disk that stores data, separate from the data stored on other partitions.

Each partition usually contains its own file system, although some special partitions may contain data that's not organized into a file system. Different partitions of the same physical disk may be made available as different directories within the same operating system. For example, on Windows, two partitions of the same disk might be labeled as the C and D drives, even though they aren't actually two separate physical drives. It's also possible to store different operating systems on different partitions and select which operating system to run during the boot process when turning on the computer.

The process of making a partition's data available via an operating system is called mounting the partition. In some cases, an operating system may make it appear as though a file system stored in one partition contains directories that are actually other file systems mounted from other partitions. This means that a file tree can span multiple file systems.

Normally, the data stored in a partition cannot be accessed unless it is mounted, although some low-level tools can do things like copy all of the data to a new partition without mounting it.

RAM

Random Access Memory

RAM stands for Random Access Memory, which is a kind of physical circuit that can be used to store digital information. A RAM circuit only stores information as long as it is powered on, so whatever is stored there disappears when the computer shuts down. In a computer, these circuits are used to hold temporary memory, while anything that needs to be stored permanently should be written to a disk, usually by writing to a file in a file system.

All objects used in a Python program are stored in RAM, and do not persist beyond the end of the program. If you want to save information permanently, you can write it to a file.
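A minimal sketch of saving information to a file and reading it back is shown below. The file name `scores.txt` is made up, and a temporary directory is used only so that the example cleans up after itself; a real program would normally pick a permanent location.

```python
import os
import tempfile

# This list lives in RAM and disappears when the program ends
scores = [90, 85, 77]

# Writing to a file stores the information on a disk instead
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "scores.txt")
    with open(path, "w") as f:
        for score in scores:
            f.write(str(score) + "\n")

    # Later (even in a different program) the file can be read back
    with open(path) as f:
        loaded = [int(line) for line in f]

print(loaded)
```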

Server

A computer that's used to relay data to other computers, rather than being controlled directly by a user.

For example, when you load a web page, the data for that page is typically sent by a server that has no keyboard or mouse connected to it, and which is designed to just respond to web page requests.

In an application that involves both server and client programs, if the server becomes unavailable, the client programs will typically display some kind of error message and the application won't work. Similarly, even if the server is working, if a particular user cannot make a connection to it (maybe they're using a cell phone and out of service range) their application won't work because the client program can't function without being able to exchange data with the server program.

For large applications with many users, there will typically be many different interchangeable servers that users can connect to so that even if some fail, others can pick up the slack and the application will still function. When servers fail or when a connection is unreliable, data corruption can result.

Software

Programs, in contrast to hardware.

While hardware refers to the physical devices that allow for computation to happen, software refers to the programs like operating systems and applications that run on the hardware and determine much of its actual user-facing functionality. Given specific hardware, we can run certain software, and when running certain software, the user can accomplish certain tasks. In some cases the same software can run on different kinds of hardware, in other cases the same hardware can be used to run different kinds of software.

SSD

Solid-State Drive

A data storage drive that uses flash memory circuits to store information.

Unlike RAM, a solid-state drive retains information even when powered off, but unlike a hard drive, it has no moving parts and uses electrical charge instead of magnetic fields to store data.

Because it has no moving parts, an SSD can typically be read from and written to much faster than a hard drive. However, the circuits used degrade over time, particularly when data is overwritten, and so SSDs fail sooner than hard drives. When memory cells in an SSD wear out, they may not store the data that they are supposed to, possibly leading to data corruption, although modern drives have mechanisms to detect and warn about cell failure.

Extra: Software Development

(We don't expect you to know these terms as part of this class.)

Build

The process of preparing the code of a package for release, often also combining the relevant files and directories into an archive.

In some programming languages that are compiled, the build process includes compiling the code into one or more executable or shared library files.

While Python code can optionally be compiled into "bytecode," this is not necessary for building a package, and instead the process mostly consists of bundling together code and metadata into an archive that will be formatted correctly for use by a package repository.

Compile

The process of turning code into CPU instructions to produce an executable.

Programming languages which rely on a compilation step are called "compiled languages," in contrast to languages like Python that use an interpreter, which are called "interpreted languages."

Technically, Python code can be compiled into "bytecode," but this process is optional as the Python interpreter can produce the bytecode it needs as a program is being run.
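As a small sketch of this, Python's built-in `compile` function produces a code object containing bytecode, which the interpreter can then run. The expression and the `"<example>"` label are arbitrary choices for this illustration.

```python
# compile() turns source text into a code object holding bytecode
code_object = compile("3 + 4", "<example>", "eval")

# co_code is the raw bytecode, which is not meant to be read by humans
bytecode = code_object.co_code

# eval() asks the interpreter to run the compiled code object
result = eval(code_object)
print(type(bytecode), result)
```

Note that bytecode is still run by the interpreter; it is not the same thing as the CPU instructions in an executable.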

Dependency

A package that another package or program depends on in order to function correctly.

While user-facing programs often use libraries to support their functionality, libraries also use other libraries to handle parts of what they need. For example, a photo-processing application might use an image-loading library to load images in different formats. That library might in turn rely on individual format-specific libraries like a JPG library or a PNG library to read data stored in different formats. In this case, the image-loading library is a dependency of the application, and the format-specific libraries are dependencies of the image-loading library.

Taken together, the direct and indirect dependencies of any program form a tree. In order to run the program, all of the packages it depends on need to be available. Typically, a package manager will automatically manage dependencies, figuring out what's required and installing everything needed based on a request for a single package.
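The photo-processing example above can be sketched as a dictionary that maps each package to its direct dependencies. All of the package names here are hypothetical, and the `all_dependencies` function is only a simplified illustration of one step a package manager performs (it ignores versions entirely).

```python
# Hypothetical packages: each name maps to its direct dependencies
dependencies = {
    "photo-app": ["image-loader"],
    "image-loader": ["jpg-lib", "png-lib"],
    "jpg-lib": [],
    "png-lib": [],
}

def all_dependencies(package):
    """Return the direct and indirect dependencies of a package."""
    found = []
    for dep in dependencies[package]:
        if dep not in found:
            found.append(dep)
        # Recursively collect the dependencies of each dependency
        for indirect in all_dependencies(dep):
            if indirect not in found:
                found.append(indirect)
    return found

print(all_dependencies("photo-app"))
```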

The situation gets complicated when versions are considered. Sometimes one package depends on a specific version (or range of versions) of another, but a second package depends on an incompatible version. In this case, there's no way to satisfy the dependencies of both packages at once.

Developer

Software Development

Software Engineering

A developer or software engineer is someone who develops/engineers software, usually also a programmer.

Software development/engineering is the process of building programs to solve particular problems or otherwise satisfy design goals.

There are aspects of software development beyond just programming, such as interface design and system design, so on a larger team, not every developer is necessarily a programmer. The role is distinct from a producer whose job is to manage scheduling and planning plus take care of other team organizational tasks, without worrying about the actual design or implementation details directly. Of course, it's possible to be both a developer and a producer if you're doing work that fits into both categories.

Outside of the computer science context, of course, developer may refer to people who do other kinds of development. The "engineering" form emphasizes that software is built to satisfy external design constraints which originate in both humans and nature, and that we seek to build software that operates reliably within those constraints and fails safely when it encounters an error or is pushed beyond its limits. Some software development is more craft- or art-like (e.g., developing a video game) while other software development is more engineering-like (e.g., building a controller for an industrial assembly arm).

There are many different ways to approach software development, including team-based and solo strategies, different ways of discovering and organizing your goals, and different strategies for how to build complex programs.

Interface

API

Application Programming Interface

UI

User Interface

GUI

Graphical User Interface

A physical mechanism or software abstraction that allows a user to interact with a computer in some way.

For example, a keyboard is an interface that allows for typing in text, while a mouse or touchscreen might allow for indicating positions in two dimensions and sending "click" or "tap" actions. On the software side, a web browser is an interface for viewing web pages; a web page could also be opened with a text editor, but you wouldn't be able to interact with it in the same way.

Every program has one or more interfaces, even if it's just a supporting library designed to be used by other programmers. In that case, we call the collection of functions and other variables that the library makes available an application programming interface or API because it can be used by others to program an application.

Programs that interact with their users directly (for example, by accepting text input or responding to mouse clicks) have an interface which encompasses all of the ways they respond to user inputs and all of the things they display to help the user understand what to input or provide the user with other information or feedback. This is called a user interface or UI. A graphical user interface or GUI is an interface that displays graphics like images beyond just text, and which will often allow for interaction via a mouse or touch screen rather than just accepting text input from a keyboard.

Library

A collection of code designed to be imported and used by other code, rather than to run as a program by itself. Can be organized into one or more modules.

Python has a substantial built-in standard library of modules that can all be imported by any Python program, and the Python Package Index holds a huge number of additional packages that can be used to install more libraries and programs.
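For instance, `math` is one of the modules in the standard library. The short sketch below imports it and uses two of the things it makes available; the specific numbers are arbitrary.

```python
# math is a module in Python's standard library
import math

# The functions and variables it provides form its API
root = math.sqrt(16)
circle_area = math.pi * 2 ** 2
print(root, circle_area)
```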

License

Copyleft

A license is a document that specifies how one is allowed to use a program.

Licenses may require payment or otherwise restrict the use of software, or they may instead allow for it to be used and shared freely, with some conditions (like providing attribution to the original author). Such permissive licenses are used by open-source projects.

A copyleft license is one that requires that anyone who distributes a modified version also allow others to distribute modified versions of the new version, so that nobody can copy the original, make some changes, and then distribute their version with a more restrictive license. Some licenses also restrict certain uses, or restrict the ability to charge money for access or use. The GNU General Public License is an example of a copyleft license (it does not restrict commercial use), while the Creative Commons Attribution NonCommercial ShareAlike license used for this page is an example of a copyleft-style license that also restricts commercial use.

The Wikipedia page on software licenses has many more details. The Creative Commons also publishes a list of licenses with various restrictions that are not particular to software and are often used for art.

Maintainer

Someone who handles long-term maintenance tasks for a piece of software, such as a library.

As the versions of dependencies evolve and new bugs are discovered, or features need to be added, there are always more changes that need to be made to keep things working, and so maintainers are necessary for any software that gets used by a significant audience over several years or more.

Metadata

Data which is extraneous to the core purpose of a file or other collection of data.

For example, an image file needs to store color data, but it may also store information like the name of the person who created the file, when the file was created, or if it's a photo, where it was taken and what camera settings were used. None of this extra data is necessary to display or even edit the image itself, but it's useful for other purposes, like searching through a collection of images. Some file formats include a way to store metadata within the file, or in other cases, metadata may be stored in an archive or otherwise stored somewhere else.

When metadata is stored separately from the main data it pertains to, it can get lost (for example, if metadata about song authors is stored in a database by your music player, if you share a song file with a friend that information may not be included). On the other hand, when metadata is stored in the same file as the data it pertains to, it may be shared even when the user doesn't intend to share that much information (for example, when posting a selfie to social media, metadata about where the picture was taken may be included automatically, allowing other users to figure out where you live). Web sites that handle images often strip out metadata by default to avoid this problem.

Open Source

FOSS

Free and Open Source Software

Software or hardware is open-source when its code or hardware specifications are freely available and can be shared by anyone.

In contrast, many applications that cost money or are developed by corporations are provided as executables where the original "source" code is not available.

The developer of a piece of open-source software may still charge money for it via a restrictive license, but in many cases, open-source software is free and open-source software, or FOSS, meaning that anyone is allowed to use it without charge, and the code is available to be shared freely as well. FOSS is generally produced by volunteers and hobbyists rather than by corporations, although some FOSS developers are paid for their work via crowdfunding or other sources.

A large part of the software that makes up modern operating systems is free and open-source software, maintained by volunteers. The Python programming language itself is FOSS.

OS

Operating System

A management program that starts as soon as the computer is turned on during the boot process and which enables basic user interfaces that allow the person using the computer to launch other programs, access files on file systems, and close programs or shut down and restart the computer when necessary.

Typically, each program is designed to work with one particular operating system, although there are some programs that can work on multiple operating systems. Because Python interpreter programs exist for many different operating systems, a program written in Python can run on any of them.

Package

A collection of code that can be installed or uninstalled using a package manager.

A package may contain a single program, or it may contain one or more modules designed to be imported and used as libraries. Packages may depend on other packages, and a package manager will install all required dependencies when a package is requested.

Package Manager

A tool for downloading, installing, and uninstalling packages from one or more package repositories.

Using a package manager makes it easy to install and upgrade packages, since it can keep track of different versions and dependencies between them, and can automatically install all required dependencies when the programmer wants to install a specific package.

Package Repository

A collection of packages, usually available via the internet, including metadata like descriptions and license information and an API so that they can be accessed by a package manager.

A package repository usually contains only free software (often mostly open-source) so that packages can be installed without worrying about payment.

The Python Package Index is the standard repository for Python packages not published by the Python developers themselves.

Producer

Someone who manages the logistical aspects of a software development team.

A producer is not typically the manager for the team in charge of hiring and promotions, but does have authority to set schedules, call meetings, etc. Their job is to make sure that everyone else on the team has what they need when they need it to do their work, and that team members communicate and collaborate effectively.

Producers typically need some level of technical expertise to understand the work their colleagues are doing, but since they aren't necessarily developers they don't need to be experts at writing code or designing software.

Prototype

Paper Prototype

An incomplete application designed to be used to solicit feedback about the design or otherwise test some of the full functionality before the entire thing is ready.

This can range from a paper prototype where a mock-up of the user interface is made out of paper and manipulated by a designer to mimic intended functionality, all the way to a full-fledged alpha version that is nearly complete but still undergoing final testing and revisions. Code prototypes are usually assigned version numbers less than 1; paper prototypes usually do not have version numbers except when doing very extensive user testing.

Release

Upload a build of a package to a package repository.

More generally, making some code available to a broader audience. As a noun, release usually refers to a specific version of a package, for example "the 3.12.10 release of Python."

User Interface Design

The process of designing, testing, and revising a user interface, sometimes with the goal of making it easy to use.

Since an application can accept input and provide feedback to its users in a huge variety of ways, it can be tricky to figure out what design is easiest for users to use. User testing and prototyping can be valuable for understanding how a certain design will actually be used and then improving it. Accordingly, user interface design is usually a cyclical process of developing a new design prototype, testing it, and then improving it before testing it again.

The domain of human-computer interaction studies user interfaces and best practices in user interface design.

Quite often, the goals of the interface designer may be counter to the goals of the user. For example, the designer of a banking application may want as many users as possible to opt into a different account structure that has more hidden fees, even though this is not what the users want. In these cases, the goal of the designer is no longer "ease of use" and interfaces may become more difficult to use, especially where monopolies exist that avoid the risk of users switching to a competitor's application. More generally, there are a wide range of design goals besides "easy to use" that designers often care about.

User Testing

The process of letting users use an application or prototype and observing how they interact with it, usually in order to understand how to improve its user interface during the user interface design process.

Typically, a designer will observe the user's interactions without interfering at all, since once an application is published, most users will not have a chance to get guidance from the designer on how to use it. This can be frustrating for the designer when the user who is testing does not understand some aspect of the interface or gets stuck, but such occurrences point to places where it may be necessary to provide more information or context, or otherwise change the interaction design to avoid them.

Version

Version Number

A specific collection of code for a package that's been saved at a particular point in time, usually identified by a version number.

As maintainers keep fixing and upgrading packages, they periodically build a specific version of the code and release it via a package repository. Certain versions of one package may depend on a specific range of versions of other packages. A package manager can figure out which version(s) of which packages are needed as dependencies for a particular version of a desired package, and can manage package upgrades.


Since software is continually evolving, instructions that used to work months or years ago may not work any more with the most up-to-date versions of the packages involved. Things get even more complicated when packages depend on hardware features that might no longer be available.

A standard called semantic versioning is used for many version numbers in modern packages. A semantic version number is made up of one to three integers separated by periods, identifying a "major," "minor," and "patch" version. If there are minor fixes or improvements to a package, that will typically be indicated by a new patch version, with the major & minor versions remaining the same. If there are new features added but all the old features still work the same way, that will be a new minor version. If major changes are made, especially if the new version doesn't support all the same operations that the previous version did, or if the kinds of results for some operations changed, a new major version number will be used.

Python itself follows this pattern: for example, the differences between versions 3.11.4 and 3.11.5 will be minor fixes, differences between versions 3.11 and 3.12 would include new features, and differences between versions 2 and 3 include major changes to how statements are written so that code which works with version 2 might not work at all with version 3, and vice versa.
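Version numbers are easier to compare in code once they've been split into integers. Here is a minimal sketch; the function name `parse_version` is made up for this example, and real package managers handle many more cases (like "alpha" and "beta" labels).

```python
def parse_version(text):
    """Turn a string like '3.11.4' into a tuple of integers (3, 11, 4)."""
    return tuple(int(part) for part in text.split("."))

old = parse_version("3.11.4")
new = parse_version("3.12.0")

# Tuples compare piece by piece, so the major number is checked first
is_newer = new > old

# Same major version but different minor version: new features expected
same_major = new[0] == old[0]
print(old, new, is_newer, same_major)
```

Comparing the original strings directly would give misleading results, because as text "3.9" counts as greater than "3.11".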

Terms like "alpha" and "beta" can also be used to categorize versions based on their maturity.

Extra: CS Disciplines & Careers

(We don't expect you to know these terms as part of this class.)

AI

Artificial Intelligence

A sub-field of computer science that involves developing programs that appear to be intelligent.

Because "intelligence" has many different definitions and might involve things like creativity, solving puzzles, learning from mistakes, etc., there are many different branches of AI research that focus on different things.

The term AI has also become (around 2020) a synonym for particular technologies that generate text or images based on prompts, namely large language models for generating text and generative adversarial networks for generating images. These particular techniques are only a small part of the broader field of AI within computer science.

Compilers

A sub-field of computer science that focuses on the design and optimization of programs that compile code into executables.

Any improvement to the efficiency of executables generated by a compiler can be automatically applied to any program written in the programming language that the compiler uses, so new compiler optimizations can have a big impact. Unfortunately, they often represent tradeoffs rather than strict improvements, and in some cases, they can cause the abstractions provided by the compiler to leak, meaning that to take advantage of them programmers need to understand the details of the hardware their code is being compiled for.

The study of compilers is closely related to the study of programming languages.

Computer Engineering

Electrical Engineering

Computer Engineering is a field related to computer science that studies how to build computer hardware.

Also a career in the design and manufacturing of computers and related systems.

Electrical engineering is broader and covers all kinds of electrical systems, including non-computer electronics like power grids, lighting, and heating circuits.

Computer Science Education

A sub-field of computer science focused on how to teach computer science concepts.

Includes questions such as what programming languages are best for beginners, what order different concepts should be taught in, and how to evaluate the impact of different teaching technologies.

Teaching any subject can be approached as a craft and/or as a science, and most computer science professors do not necessarily do computer science education research, despite being practitioners.

Computer Systems

A sub-field of computer science that involves the study of hardware, operating systems, and other less-abstract aspects of computers.

Often focuses on speed, power efficiency, or other performance metrics, and ideally finds breakthroughs that can be applied to make computers more capable or efficient while maintaining existing abstractions so that existing software does not need to change.

Computer systems is related to computer engineering but places more emphasis on the combination of hardware and software rather than focusing on the hardware alone.

Data Science

A career involving gathering, managing, and analyzing data.

Data scientist positions can be found in corporations, nonprofits, government, journalism, and elsewhere. Almost any large organization can benefit from analyzing its own data to improve efficiency and/or analyzing customer or user data to improve its services. Of course, such analysis is often used to serve the interests of the organization even when they run counter to the interests of the people outside the organization that it serves or interacts with.

Despite the name, data scientists often skip over steps or ignore the scientific method, in part simply because a strictly scientific approach limits the insights one can gain from data, and because organizations are willing to make changes based on potential data patterns and then react to the results, rather than waiting for a strict scientific conclusion (which may be impossible to reach) before taking action.

GAN

Generative Adversarial Network

A machine learning system which uses opposing "generator" and "detector" networks (the detector is more commonly called a "discriminator") to generate images (or other data like music or videos) that match the style of a set of input images used as training data.

Along with large language models, generative adversarial networks are part of the contemporary idea of AI circa 2020-2025. As with large language models, once trained on a sufficiently large dataset, GANs can reproduce a wide range of images in different styles based on a short text prompt.

GANs, as currently deployed by large corporations, have all of the same ethical issues as LLMs.

HCI

Human-Computer Interaction

A sub-field of computer science that focuses on user interfaces and user interface design, with the goal of understanding how people interact with computers and how the design of applications can shape those interactions.

Draws on methodology from psychology and anthropology since it studies human activities, perceptions, and reactions, but also often involves technical programming work to prototype new or alternative interfaces. Often uses the scientific method to develop formal theories of how people use computers.

IT

Information Technology

Devices that manage and exchange information, typically computers (including desktops, laptops, cell phones, and all sorts of "smart" devices).

Also refers to people or departments who work with these devices to maintain or upgrade them.

Also refers to an academic discipline studying these devices which is very close to computer science but with more emphasis on technology and less on theory (in some contexts, it may just be a synonym for computer science).

Information Technology Operations

DevOps

IT Operations is a career involving deploying and maintaining information technology systems that support an application, web site, or an organization's employees.

Overlaps with system administration.

An IT operations role might involve purchasing and setting up physical hardware like servers to support an application being built by software engineers, then installing operating systems and the new application server software onto those machines and making sure that they can connect to the internet, and finally maintaining them and fixing problems with software or hardware as the application gets released.

DevOps is a career that integrates software development with operations, so that the same team which develops an application is also responsible for deploying and maintaining it.

LLM

Large Language Model

A statistical model of word sequences that can predict text that might follow a prompt.

LLMs are a machine learning technique which is the underlying technology behind AI "chatbots" that give the illusion of a conversation by predicting what text might occur next in a conversation after the user types in their message. When trained on hundreds or thousands of examples, their predictions are very limited and humans can easily see their limitations, but when trained on billions of examples, they can start to give a convincing illusion of intelligence.
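
As a rough illustration of "predicting what text might occur next," here is a minimal sketch of a far simpler statistical model of word sequences: it only counts which word most often follows each word in a tiny made-up training text. Real LLMs use neural networks and vastly more data, so this is not how they are actually built; the function names and the training text are invented for this example.

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.split()
    followers = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        followers[current][following] += 1
    return followers


def predict_next(model, word):
    """Return the word that most often followed `word` in training, or None."""
    if word not in model:
        return None  # this word never had a follower in the training text
    return model[word].most_common(1)[0][0]


def continue_prompt(model, prompt, length=5):
    """Extend the prompt one predicted word at a time."""
    words = prompt.split()
    for _ in range(length):
        next_word = predict_next(model, words[-1])
        if next_word is None:
            break
        words.append(next_word)
    return " ".join(words)


# Made-up training text (an assumption for this example)
training_text = "the cat sat on the mat and the cat sat on the rug"
model = train_bigram_model(training_text)
print(continue_prompt(model, "the", length=3))  # prints: the cat sat on
```

With so little training data, the model can only echo fragments of what it was trained on, which hints at why models trained on few examples have such obviously limited predictions.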

Large language models have some inherent limitations, most notably the fact that they will sometimes produce text that's a verbatim copy of the examples they were trained on, which can be a copyright issue, and the fact that since they are just predicting word combinations, they do not reason about the truth of the text they predict, and can therefore produce true-sounding statements which are completely false (this is sometimes called "hallucination" although that term makes it sound like the LLMs are thinking rather than making statistical predictions).

Modern LLMs circa 2020-2025 are haunted by several ethical issues:

The methods by which corporations acquire and filter the data necessary to train large language models are unethical, including piracy and worker abuses.
When these models produce verbatim copies of their training data, the result is plagiarism, but current designs have no way of alerting the user to this.
Chatbots built using these models have encouraged their users to commit suicide and otherwise generated harmful text, despite attempts to prevent this.
The ever-increasing energy and water demands of datacenters planned to support projected future LLM use are incompatible with the changes in energy and water use necessary to combat global climate change.
Despite being trained on human writing without any compensation to the authors of the training data, and despite their mediocre performance including regular significant errors, LLMs are competing with humans for jobs in copywriting and related fields, and humans are using LLMs to generate short stories and books which make it more difficult for human authors to sell their work.
Websites generated using LLMs containing a mix of truths and untruths have begun to crowd out older human-authored websites in search results, making accurate information harder to find on the internet. Other uses of LLMs for spam and information warfare cause further harms.

This is a partial list of issues with how large corporations are deploying LLMs. These are not all underlying issues with the technology itself; many of these just result from cost-cutting measures.

Machine Learning

A sub-field of artificial intelligence that involves developing algorithms for computers to "learn" from patterns in data (called "training data") so that they can accomplish tasks without being explicitly programmed for that specific task.

Large Language Models and Generative Adversarial Networks are machine learning techniques that are colloquially referred to as "AI," although that term is much broader than those specific technologies.
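
To make the idea of "learning" from training data a bit more concrete, here is a minimal sketch of one very simple technique (sometimes called a nearest-centroid classifier). The program is never told what temperature counts as "cold" or "warm"; it works that out from labeled examples. The data values and function names are made up for illustration, and real machine learning systems are far more complex.

```python
def train(examples):
    """Compute the average value for each label in the training data."""
    totals = {}
    counts = {}
    for value, label in examples:
        totals[label] = totals.get(label, 0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: totals[label] / counts[label] for label in totals}


def predict(averages, value):
    """Pick the label whose training average is closest to the new value."""
    return min(averages, key=lambda label: abs(averages[label] - value))


# Made-up training data: (temperature in degrees Celsius, label)
training_data = [
    (2, "cold"), (5, "cold"), (8, "cold"),
    (22, "warm"), (25, "warm"), (28, "warm"),
]
averages = train(training_data)
print(predict(averages, 10))  # prints: cold
print(predict(averages, 20))  # prints: warm
```

Changing the training data changes the predictions without changing any of the code, which is the sense in which the program was not "explicitly programmed" for this specific task.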

Software Engineering (career)

Software Development (career)

A career involving writing code and designing applications.

See the entry on software engineering above.

Programming Languages (discipline)

A sub-field of computer science focused on the study of programming languages, including how they are designed and how they can be made more powerful or easier to use.

System Administration

A career involving maintaining shared computer systems, including keeping their hardware and operating systems running smoothly and managing accounts and permissions for their users.

System administration uses some of the same skills as information technology operations, but typically involves managing one or several systems for a base of users who form a shared community, rather than for customers.