Glossary
This page defines special CS jargon that you'll encounter throughout the class. It's organized by concept topics, so that you can see what will be covered each week, and alphabetized within each topic. Where relevant it includes links to the quick reference page which has short examples and descriptions of the various Python functions and keywords you'll use when programming. It also includes some links to the official Python documentation which includes its own official glossary, although these pages assume a high level of programming knowledge. This page defines terminology we use to talk about programs, rather than the keywords we actually write in our programs.
Use the "find" feature of your browser to quickly jump to the term you're looking for. If you're looking for a concept but are not sure what it's called, feel free to check the index, browse by topic, or reach out to an instructor for a pointer. Also, please reach out if you run into a term whose meaning you're not sure of and you don't see it on this page.
Note: The text of this page is copyright 2025 Peter Mawhorter & other CS 111 team members, licensed under the Creative Commons Attribution NonCommercial ShareAlike 4.0 International license. This means that you're free to share this work, create your own versions, and distribute them, as long as you provide credit, link to the license, indicate if changes were made, distribute your modifications under the same license, and do not use the material for commercial purposes.
General Terminology
- Abstraction
-
The use of more general or less-specific terms to describe something which also has a more-specific or concrete description.
Also, the creation of a technical system which allows users to specify things at a more abstract level and have a more detailed process be deduced from the abstract specification.
For example, "add 3 and 4 to get 7" is a concrete description; "add two numbers together" is a more abstract description of the same thing, where there could be several different concrete descriptions that match the abstract description. A computer system that allows the user to type in "add 3 and 4" and which will produce the result 7 might be an abstraction relative to the actual pattern of electrical signals that happens in the physical computer to achieve that result. In the same way, many different physical computers with different designs might be able to produce the same result given different technical systems for translating "add 3 and 4" into electrical patterns.
The benefit of an abstraction is that the user does not need to know all the technical details of the underlying system in order to use it. We can type "x = 3 + 4" in Python to assign the value 7 to the variable x, without needing to understand the details of how 3, 4, and 7 will be represented using electrical signals and wires in the computer, and what exact process will happen in order to produce the 7 from the 3 and the 4. Someone who did understand those things built an abstraction for us to use, and we can use it without needing to know the details.
Since an abstraction is not a full description of what's actually going on, it may "leak," meaning that some detail of the underlying process becomes relevant to the user of the abstraction in an unexpected way. In this case, the abstraction breaks down, and the user must understand the underlying actual process in order to figure out how to fix things.
Abstractions can occur in multiple layers. For example, the C programming language is an abstraction from the physical hardware of your computer. The standard implementation of Python (called CPython) is written in the C language, so the abstractions it provides are built on top of the abstractions provided by C.
- Algorithm
-
A series of instructions that can be followed to answer a question or solve a problem.
An algorithm does not have to involve computers at all, and even processes like "how to add two multi-digit numbers together by hand" are algorithms (see the Wikipedia entry for Algorithm on the origin of the term). In computer science, the term algorithm is used to denote the abstract process used to achieve a desired result, and the same algorithm may be implemented by several different programs.
Whereas a program specifies the exact steps a computer should take to produce a certain result, an algorithm might specify the same steps at a higher level of abstraction. To give a concrete example, a step in an algorithm might be "add two numbers together to get their sum" whereas a computer program might accomplish this by "assigning the first number to variable x, then assigning the second number to variable y, then adding x and y and storing the result in variable z."
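As a sketch, here are two different Python programs that carry out that same algorithm step ("add two numbers together to get their sum") for the example numbers 3 and 4, which are chosen arbitrarily. The first follows the variable-by-variable description above; the second uses the built-in sum function instead, showing that one algorithm can have more than one implementation.

```python
# Program 1: store each number in a variable, then add them.
x = 3
y = 4
z = x + y
print(z)  # prints 7

# Program 2: a different program implementing the same algorithm step.
total = sum([3, 4])
print(total)  # prints 7
```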
- App
- Application
-
A program or collection of programs that allows users to interact with a computer to achieve particular outcomes.
For example, "TextEdit" is an application for the MacOS operating system that consists of a single program that allows users to edit text files. Meanwhile, "YouTube" is an application for viewing and publishing videos that consists of several programs: a server program that accepts requests from client programs and sends or receives data to make them work, a client program within the "youtube.com" web page that allows for viewing or uploading videos via a web browser (which is a separate enabling application), and various client apps for different kinds of tablets or cell phones that each enable similar features to the web page (though not all exactly the same). Users think of "YouTube" as a single thing they interact with, but there are actually multiple programs running on different devices that work together to make those interactions possible.
The term "app," especially in this short form, has also come to mean a specific client program installed on a computer, cell phone, or tablet, possibly from an "app store." You might be "required to use the YouTube app" to access a certain feature that's not available on the youtube.com web page. Further confusing the term, a "web app" refers to a web page that allows for interaction, as opposed to a regular web page that just displays information.
An application often offers a range of different functionality related to some central purpose, with the goal of helping users (who are also customers for paid apps) accomplish some particular task and deal with related tasks. For example, a taxi app might be designed to facilitate ride requests and payment, and it might also handle related tasks like giving ratings to drivers and/or riders and reporting abuse. Such an application would likely include at least three programs: one program on a central server that stores the necessary data to coordinate between drivers and customers, another running on the driver's phone or other device that lets them know where to pick up the next customer and where to drive to, and a third running on the customer's phone allowing them to input their request and destination and pay for the trip. In some contexts these would be referred to as separate apps, and in others as a single app, so the term is a bit ambiguous.
An application typically has a recognizable icon and can be run by either clicking/tapping its icon or searching for it by name in an operating system's launcher menu. When run, it will open up one or more windows that the user can interact with to use the application. Some applications might instead run automatically when the computer starts, and/or run in the background without direct interaction, like an antivirus. Programs that run in the background and only interact with other programs instead of interacting with the user directly are called "services," "daemons," or "background processes." For a Python program to have an icon and be launched directly from the operating system, it must be bundled. Alternatively, you can use a dedicated Python application (or an integrated development environment) to load and run a Python program.
In this class, we'll focus on creating simple individual programs that accomplish a very limited single purpose, much simpler than most applications you may be familiar with. Creating an entire application with a complex user interface that handles multiple related sub-tasks requires practice with user interface design and software development methods for larger, more complex programs.
- Bug
- Defect
- Debug
-
A defect is an issue with a program's code that can cause an error or which produces other undesirable behavior, like making the program run very slowly or consume a lot of memory.
A defect is often referred to as a bug, and the process of fixing defects is called debugging.
In the simplest case, a defect will cause a crash, and the associated traceback will point the programmer to where the defect is. However, sometimes a defect causes a crash indirectly, and the defect is not near the line where a program crashes. Even worse, a defect might cause an incorrect result without crashing the program, so that the programmer has to investigate things in detail to figure out how to fix it.
Debugging techniques like probing, tracing, unpacking, toggling, and bisecting can be used to track down difficult-to-find bugs.
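Here's a small made-up example of the trickiest case: a defect that produces a wrong answer without crashing. The function names buggyAverage and fixedAverage are hypothetical, invented for this illustration.

```python
def buggyAverage(a, b):
    '''Should return the average of 'a' and 'b'.'''
    # Defect: / happens before +, so this computes a + (b / 2).
    return a + b / 2

def fixedAverage(a, b):
    '''Returns the average of 'a' and 'b'.'''
    # Parentheses make the addition happen first.
    return (a + b) / 2

print(buggyAverage(4, 6))  # prints 7.0: wrong, but no crash or traceback
print(fixedAverage(4, 6))  # prints 5.0
```

Since nothing crashes, no traceback points at the bad line; you'd have to notice that 7.0 is wrong and then probe or trace the code to find out why.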
- Code
-
A general term for the words, letters, and symbols that we write to construct a program. We can enter code piece-by-piece in a temporary way using a shell, or save a bunch of code together in a file and run it all at once. Sometimes we contrast code with comments which are parts of a program file that get ignored when running the file but are useful for humans reading the file to understand how it works.
- Computer Science
-
Originally, a branch of mathematics studying the theoretical capabilities and limitations of programs (not a science).
Nowadays, a very broad discipline spanning mathematics, science, and engineering that deals with all aspects of computer software, from design to implementation to theory (Computer Engineering and Electrical Engineering deal with hardware).
A Computer Science major typically covers software engineering and the theory of algorithms, and offers electives covering things like human-computer interaction, as well as specific applications to domains like networking, machine learning, operating systems, and/or cryptography.
Computer scientists use diverse methodologies, and the field is so broad that many computer scientists are totally unfamiliar with core methodologies of other computer scientists. For example, someone interested in user interface design might conduct scientific experiments using tools from psychology involving humans using a program and measuring their effectiveness and/or reactions. Meanwhile, a theoretical computer scientist might invent new algorithms and write mathematical proofs of their effectiveness or efficiency without doing any scientific experimentation. Similarly, a machine learning expert might figure out how to apply an existing algorithm to new data to enable some key new capability, and validate that by developing a new benchmark and showing that their setup works better than other possible solutions. These three examples use tools from science, mathematics, and engineering that are very different, and typically every computer scientist is going to specialize in the specific techniques and methodologies that interest them.
Careers in computer science are also diverse, spanning research, software engineering, technical design (like user interface or website design), technical operations (for example, IT operations or system administration), and data analysis. Institutions from small businesses to huge multinational corporations employ people in these roles, as do nonprofits, and there are also self-employed consultants. People who major in computer science may end up in other careers that don't require a computer science background but which benefit from it, like software production or digital game design.
- Crash
-
When a program stops running due to an error.
In Python, this is almost always caused by an exception.
Typically, a crash is unintended behavior, and is caused by a defect that the programmer should fix.
- Documentation
-
The comments and docstrings that are part of a program which describe how it works, along with any separate files that describe how it works or how to use it, like a 'README' file.
Documentation can include entire websites (for example, the Python language documentation site), or it may only exist as comments and/or docstrings within the code itself.
Generally, documentation that explains how to use a program is for users, while documentation that explains how a program works is for programmers who might work on the code later, including notes-to-self by the author intended to remind themself of something.
Good documentation is critical to efficient work when programming, and writing the documentation before the rest of the program is good practice as it forces you to write a high-level description of what your code is trying to do before you write the code itself. This strategy can often save time since you'll be aware of certain edge cases or other important things to watch out for from the start.
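As a sketch of that strategy, here's a hypothetical function (countVowels is an invented name) whose docstring was written before its body. Writing the docstring first flags two edge cases, upper-case letters and the empty string, which the code then handles.

```python
def countVowels(text):
    '''
    Returns the number of vowels (a, e, i, o, or u, in either upper or
    lower case) in the string 'text'. Returns 0 for an empty string.
    '''
    count = 0
    for letter in text:
        # lower() lets 'A' and 'a' both count as vowels.
        if letter.lower() in 'aeiou':
            count = count + 1
    return count

print(countVowels('Apple'))  # prints 2
print(countVowels(''))       # prints 0
```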
- Error
-
A problem with or problematic unplanned behavior of a program, including crashes and incorrect results.
Can also specifically refer to an exception.
Not all defects result in errors, since a defect may also cause a program to run slowly or otherwise exhibit undesirable behavior that's not technically incorrect.
When a program crashes, it's usually obvious to the programmer and it produces an error message that can be used to help understand why the crash happened and possibly debug the issue. On the other hand, sometimes a program crashes without an error message, or an error happens without stopping the program. In these cases, it can be more difficult to figure out what happened and fix the issue.
- IDE
- Integrated Development Environment
-
A program that allows the user to write code, save it in a file, and also run the code.
Often specific to one programming language; for example Thonny is an IDE for Python. Usually a program could be written using a general-purpose text editor, but an IDE includes features like syntax highlighting that make writing code easier, and its ability to run the program directly (rather than using another command to run the program once saved) makes development smoother.
- Program
-
A sequence of instructions for a computer to carry out to achieve some purpose. Typically defined by a file containing code that contains a sequence of statements that the computer can run to accomplish the program's purpose.
The term program is ambiguous. It is sometimes used to refer to the program's code (for example, in a statement like "the program is 100 lines long"). On the other hand, it can also refer to a process being carried out by a computer as it follows instructions specified by code (for example, in a statement like "the program crashed").
- Programmer
-
A person who created (or helped create) a program by writing code.
Often contrasted with the users who later use the program.
- Programming Language
-
A set of rules for interpreting written statements as instructions for a computer to carry out.
Typically we write the statements using a specialized editor like Thonny and save them in a file.
- User
-
A term for the person who is running a program.
When we build and test our own programs, we are both the programmer/designer and the user during testing. Ultimately, we want to design programs so that anyone can use them, and we have to consider how other people might interact with our programs differently than we do.
- Window
-
A region of the screen where you can enter input and see output from a program, usually rectangular.
In modern operating systems for desktop and laptop computers, most programs you run will create one or more windows that you can use to interact with them, and these windows can be dragged around, resized, and even minimized to hide them until they're needed later.
Thonny, the integrated development environment we'll use for this class, has one main window split into several parts for writing code and seeing printed output from your programs, and it uses secondary windows for displaying drawings or for showing extra information when debugging.
Python Basics
- Argument
-
A concrete value which is provided to a function for the function to work with while resolving a particular function call frame. We write argument(s) between parentheses, separated from each other by commas, directly after the name of the function to use. So for example, if we wrote max(3, 4), then the arguments are 3 and 4.

This can also sometimes refer to the expression used to compute an argument value, so for example in print(x + 3) we might say that x + 3 is the argument. But Python simplifies that expression to a value before actually running the function, so it's more correct to say (assuming x is 5, for example) that the argument in this situation is 8.

Note: Functions which don't have parameters don't need arguments, but still need the () after their name when we want to call them.
When Python calls a function to get its result value, it must first simplify all expressions that are part of the code for the function call. After that, each argument position has a concrete value to be used, and Python sets up a function call frame by assigning each parameter of the function to the corresponding argument value. So for example, if we have the following code:
    def hypotenuse(a, b):
        '''
        Returns the hypotenuse of a right triangle with side lengths 'a'
        and 'b' touching the right angle.
        '''
        return (a**2 + b**2)**0.5

    x = 2
    print(hypotenuse(3, x * 2))
- Python first defines the function hypotenuse.
- Next it assigns 2 as the value of the variable x.
- Next it is asked to simplify the expression print(hypotenuse(3, x * 2)). It does this using the following steps:
  - First, simplify x using variable substitution, so we get: print(hypotenuse(3, 2 * 2))
  - Next, do multiplication, getting: print(hypotenuse(3, 4))
  - Now, we've got argument values ready for the hypotenuse function call (we're still working on the argument for print). So we can jump over to that definition to get a result before continuing here.
    a. To prepare our function call frame, we look at our arguments (3 and 4) and match them with our parameters (a and b). We assign a = 3 and b = 4.
    b. In the hypotenuse function there's just one line of code: a return statement with an expression to simplify. So we have to simplify (a**2 + b**2)**0.5. For brevity we'll skip the 6 steps of that process, but the result is the floating-point number 5.0.
    c. Now that we've simplified the return expression for hypotenuse, we jump back to the place where it was called and replace the function call with its result.
  - We get print(5.0), so we're ready to call print now that we know its argument value.
  - print gets called, resulting in 5.0 appearing in the output. The return value of print is None, and that's the final result of the simplification process for print(hypotenuse(3, x * 2)). We didn't store that result in a variable or do anything else with it, so it gets ignored, which makes sense, because it's None.
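If you'd like to check this walkthrough in Thonny, the short sketch below reuses the same hypotenuse function but stores intermediate results in variables (the names length and result are just illustrative). It lets you see that the argument x * 2 becomes 4 before hypotenuse runs, and that print itself returns None.

```python
def hypotenuse(a, b):
    '''
    Returns the hypotenuse of a right triangle with side lengths 'a'
    and 'b' touching the right angle.
    '''
    return (a**2 + b**2)**0.5

x = 2
# The argument expression x * 2 is simplified to 4 before hypotenuse runs,
# so inside the function call frame, a is 3 and b is 4.
length = hypotenuse(3, x * 2)
print(length)  # prints 5.0

# print displays its argument as a side effect, but its return value is None.
result = print(length)  # prints 5.0 again
print(result)  # prints None
```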
- Assign/Assignment
-
A type of statement which uses = to create a variable or update the value of an existing variable. Also refers to other situations where the value of a variable is first provided or changed, such as when calling a function or when beginning an iteration in a loop, although these are not "assignment statements," just "assignments."

To simplify an assignment statement, Python first simplifies the expression on the right-hand side of the = to a concrete value. Then it updates the variable named on the left-hand side of = to hold that value.

Unlike in algebra, the left-hand side of an assignment statement may only consist of a variable name, so something like y + 3 = x is NOT permitted. Also unlike in algebra, multiple assignments to the same variable are allowed, even with different values. Each assignment statement happens when Python reaches that line of code, updating the value of the variable. While evaluating the right-hand side of an assignment statement, Python uses current variable values, as the assignment statement has not yet taken effect. So for example, if we run:

    x = 5
    print(x)
    x = x + 1
    print(x)

...we first create a variable x with value 5, then print the number 5, then update x to hold 6, and then print 6.

Assignment happens automatically during function calls and for loops. When a function is called, after simplifying each argument expression to get argument values, Python creates a function call frame and, before running the code of the function, it assigns each parameter to one of the argument values, matching them up by order. (Python has a few more advanced mechanisms for argument matching, but we don't cover those here.)
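Here's a small sketch of those two automatic kinds of assignment. The function name double and the variable names are made up for illustration.

```python
def double(number):
    # Calling double(7) assigns number = 7 in a new function call frame
    # before this return statement runs.
    return number * 2

print(double(7))  # prints 14

# Each time the loop begins an iteration, it assigns the next item to
# letter, as if we had written letter = 'a', then letter = 'b', and so on.
for letter in ['a', 'b', 'c']:
    print(letter)

# After the loop ends, letter still holds the last value assigned to it.
print(letter)  # prints c
```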
- Bracket
- Square Bracket
- Curly Brace
- Parenthesis
-
These words are sometimes used interchangeably in other contexts, but in Python each refers to a specific pair of symbols:
- [] are open- and close- brackets, sometimes explicitly called square brackets.
- {} are open- and close- curly braces, sometimes just called braces.
- () are open- and close- parentheses.
In Python, each of these is used in several different ways depending on the context. For example, brackets are used both to create lists and to do indexing, as well as to look up items in a dictionary. Parentheses can be used for grouping parts of an expression, but they're also used for calling functions and for creating tuples. Curly braces are used to create dictionaries and sets but are also used in format strings.
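The sketch below puts each of those uses on its own line so you can compare them side by side; the variable names and values are just examples.

```python
numbers = [10, 20, 30]         # brackets create a list
first = numbers[0]             # brackets also do indexing
ages = {'Ana': 19, 'Ben': 20}  # curly braces create a dictionary
anaAge = ages['Ana']           # brackets look up an item in a dictionary
vowels = {'a', 'e', 'i'}       # curly braces can also create a set
total = (2 + 3) * 4            # parentheses group part of an expression
biggest = max(3, 4)            # parentheses call a function
pair = (1, 2)                  # parentheses create a tuple
message = f'Ana is {anaAge}'   # curly braces inside an f-string (a format string)
print(message)                 # prints Ana is 19
```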
- Call
- Invoke
-
In programming, we use the word "call" specifically to mean a step where the code has to simplify a function call with known arguments, and so it needs to go run the code of that function before coming back with a result. "Invoke" is a synonym of "call."
So if you write len('he' + 'llo'), the first step is to join the 'he' and 'llo' strings together to form the string 'hello', and the second step is to call the len function, measuring the length of 'hello' to get the integer 5. That first step is just a built-in operation, while the second step is a function call.
- Camel Case
- Snake Case
-
Camel case describes a word with internal capitalization that's used to suggest multiple words put together, as in "camelCase."
Snake case describes multiple words joined together with underscores, as in "snake_case."
When naming variables we often want to provide a multi-word description of their purpose or typical contents, like "next number" or "total bananas" or such. However, variable names can only contain letters, underscores, and digits (and they may not start with a digit). One approach to multi-word names involves capitalizing letters in the middle of the name to visually break it into multiple words, as in "totalBananas." Another approach is to use underscores instead of spaces to separate words, as in "total_bananas." The latter is called snake case, and it is the widespread convention in the Python programming language, in contrast to camel case (evoking the humps of a camel), which is the convention in several other languages, such as Java.
In this class, we want you to use camel case, because the underscores in snake case make it unwieldy when read by a screen reader. However, when using built-in Python functions, you'll find many of them are named using snake case, or in some cases, you'll find functions with names in all lower case (like "isupper") where it's more difficult to distinguish the word boundaries.
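A quick sketch of these naming rules (the names are invented for this example); the commented-out line shows a name Python would reject.

```python
totalBananas = 12    # camel case: the style we ask you to use in this class
total_bananas = 12   # snake case: also a valid Python name
# 2ndPlace = 'Sam'   # invalid if commented in: names may not start with a digit

# isupper is a built-in string method whose name is all lower case,
# so its word boundaries ("is" + "upper") are harder to see.
print('HELLO'.isupper())  # prints True
```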
- Comment
- Comment Out
- Comment In
-
Part of a program file that is not evaluated as a statement but instead contains notes for humans reading the program to help them understand it. In Python, a hash sign marks the beginning of a comment, and anything after that on the same line will be ignored when running the program, as it won't be considered part of the code.
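As a small illustration (the variable names are invented), here are a few hash-sign comments. The second-to-last comment is actually a line of code that has been turned into a comment, so Python skips it.

```python
price = 4  # dollars per pound; everything after the hash sign is ignored
pounds = 3
# Compute the total cost before tax.
total = price * pounds
# total = total * 1.06
print(total)  # prints 12
```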
At the start of a file or function, a block of text surrounded by triple quotes (''' or """) is also a kind of comment, but these special comments are used in help text and are called "docstrings".

To comment out a line of code, just add a hash sign (#) at the start, turning it into a comment. A commented-out line of code can be re-enabled by removing the hash sign, which we call commenting in. Your editor will have a shortcut key for toggling all selected lines between comments and code. When we comment out code as part of the debugging process, we call this toggling.
- Concatenate
-
Join two pieces of text together.
If we write "ab" + "cd", Python will simplify those two strings into one string "abcd" when it evaluates that expression.
- Docstring
-
A special kind of comment, marked by triple quotes, that occurs at the start of a file or a custom function.
See the second docstring entry below for more details.
- Dot
-
A period (.).
In Python, periods are used to indicate that one function or variable "belongs to" another value. For example, methods belong to a particular type, and can be used with values of that type by writing a dot after the value and then writing the method name, like this:
    word = "loud"
    print(word.upper())
We'd read the second line as "print word dot upper."
- Evaluate
-
The process of simplifying an expression to get a result value. Normally, Python runs a program by running each statement from top to bottom, and if a statement consists of just an expression, it will simplify that expression before moving on in the program.
Most other statements involve evaluating part of the statement as well. For example, assignment statements need to evaluate the expression on the right-hand side of the equals sign so they can update the variable named on the left-hand side with the resulting value.
Evaluation is a process that starts with code, and then simplifies it by:
- Replacing variable names with the current values of those variables (e.g., x => 3 when x holds 3).
- Replacing operations with their results (e.g., 3 + 4 => 7).
- Replacing function calls with their results, after creating a function call frame and running the function's code to get a result.
The end result of evaluation is always a single Python value, like an integer, floating-point number, or string.
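To see these replacement steps in order, here's a sketch with one expression that uses all three kinds. The comments trace one step-by-step simplification following Python's left-to-right order; the variable names are just examples.

```python
x = 3
result = len('ab') + x * 4
# How Python simplifies the right-hand side, one replacement at a time:
#   len('ab') + x * 4
#   => 2 + x * 4    (function call replaced by its result)
#   => 2 + 3 * 4    (variable replaced by its current value)
#   => 2 + 12       (multiplication happens before addition)
#   => 14           (addition gives a single final value)
print(result)  # prints 14
```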
- Exception
-
A situation where Python cannot continue to execute code because the instructions it is trying to follow don't make sense, are in conflict with each other, or ask it to do something impossible.
Also refers to the message Python generates to indicate an exception has occurred. Just like values, exceptions in Python have their own types. A few relevant types of exception are:
- SyntaxError - Indicates that your code could not be run at all, because of an instruction that didn't make sense. For example, the code x = "hi is missing a closing " mark, so its meaning is unclear and we can't run it.
- TypeError - Indicates that an operation was attempted with a value whose type was incompatible with that operation. For example, if we try to evaluate "hi" + 3 we get a TypeError, because the + operator cannot combine a string and an integer: those types are incompatible for that operator.
- ValueError - Indicates that an operation was attempted where the types make sense but a specific value used is not compatible with the operation. For example, int('123') works and results in the number 123, but int('abc') fails: the type (string) is permissible, but the specific value 'abc' is not.
There are other more specific types of exception for some situations, like ZeroDivisionError if you try to divide something by zero.

When an exception occurs, Python prints an error message and stops (this is also called crashing). The error message will include a traceback which shows which line(s) of code were active when the error occurred.
- Expression
-
A piece of code that can be simplified to a value. Often part of a larger statement (for example, the right-hand side of an equals sign in an assignment statement is an expression).
An expression alone on a line of code counts as a simple statement where all Python does is simplify the expression (usually it will have some side effects). Expressions as part of other statements each have a specific meaning, depending on the statement they're used with and where they occur.
Smaller parts of an expression are also expressions of their own. So for example, x + 3 is an expression, but (x + 3) * y is a larger expression that includes x + 3 as a sub-expression. Even just individual values when typed in code are basic expressions that evaluate to their values, so 'hello' is an expression whose value is the string 'hello'.

Unlike in mathematics, expressions do not have a fixed result value. The same expression might simplify to two different values when run in different contexts (this is particularly common for expressions that are part of function definitions). If we simplify the expression x + 3 in a context where the value of x is 4, we get 7 as the result. On the other hand, if we simplify the same expression again in a context where the value of x is 0, we get 3. So the same expression can simplify to different values if it's evaluated multiple times in the same program.
- File
-
A collection of symbols/data stored together in the computer. We can open different kinds of files with different applications. Notable for this class, files with names ending in .py conventionally contain Python code, and we can open these with Thonny to show us the code and let us edit it and/or run it as a program.

See the file input/output entry for more advanced info on different types of files, which becomes relevant when covering file input and output.
- Float/Floating Point Number
-
A number that is stored in a format which allows for a decimal/fractional part, in contrast to an integer which can only be a whole number. This is one of Python's basic data types. So 5.1 and 3.0 are floating point numbers (also called "floats") while 5 and 3 would be integers. Notice that 3.0 and 3 are different representations of the same number. In terms of data type one is a float and one is an integer, but they both represent the same mathematical concept.

A few operations in Python can only deal with floats or integers, while most can deal with either. Notably, the result of / (division) is always a float, while the result of // (integer division) is an integer when both of its operands are integers (if either one is a float, as in 7.0 // 2, the result is a whole number stored as a float, 3.0). Also, when repeating a string using *, the repeat count must be an integer.

Floating point numbers are stored using limited precision, which means certain values get rounded off and cannot be stored exactly. For example, the fraction 1/3 should be represented using an infinitely repeating sequence of 3's in the decimal part. However, Python rounds this off (it displays 1/3 as 0.3333333333333333, with just 16 3's), so it's not an exact representation, and over many operations, slight errors can creep into the math because of this. For example, if you run x = 1/3 to store that inexact number in a variable, and then evaluate x + x + x + x + x + x, you'll get the number 1.9999999999999998 instead of 2.0 like you should. The precise operations used do matter though: x * 6 will give 2.0 as expected. These floating point errors mean that in situations where exact precision is necessary, like banking or certain scientific calculations, other specialized data types need to be used.
- Function
-
A unit of code that can be reused by writing its name followed by parentheses. Many functions require one or more arguments. All functions have a return value, so that when you use them, you get a result back (that result might change depending on the arguments used).
For example the built-in
len
function computes the length of a string. If you use it by runninglen('hello')
then the string'hello'
gets sent in as the argument, and the return value will be the number5
since that's how many letters are in'hello'
.Python has a large number of built-in functions, and you can also define your own. Besides operators which accomplish a few basic operations on values, functions are the main way to get stuff done in Python.
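Here's a small sketch of calling a function and using its return value. It only uses the built-in len function; the variable names letterCount and total are just illustrative choices.

```python
# Call the built-in len function with the argument 'hello'.
# The call simplifies to its return value, which we store in a variable.
letterCount = len('hello')
print(letterCount)  # displays 5

# A function call can also be part of a bigger expression:
# len('hello') + len('hi') simplifies to 5 + 2
total = len('hello') + len('hi')
print(total)  # displays 7
```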
Besides result values, some functions have side effects. For example, the built-in print function causes text to appear in the program's output. In some cases, a function's result value may be ignored, and we use it just to get its side effect.
- Integer
-
A whole number (including negative numbers). This is one of Python's basic data types. Unlike a floating-point number, an integer cannot have a decimal part (not even a .0), so 4 is an integer but 4.5 and 4.0 are floating point numbers.
Python stores integers of arbitrarily large size, but using extremely large integers (e.g., integers with 20+ digits) may be slower than using smaller integers.
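A quick sketch contrasting integers and floats, using only built-in operations; the variable names are just for illustration.

```python
# 4 is an integer, while 4.0 is a float, even though they represent the same number
print(type(4))    # <class 'int'>
print(type(4.0))  # <class 'float'>
print(4 == 4.0)   # True: same mathematical value

# / always produces a float; // with two integers produces an integer
half = 7 / 2       # 3.5
whole = 7 // 2     # 3
even = 8 / 2       # 4.0 (still a float, even though there's no fractional part)
floored = 7.0 // 2 # 3.0: a float, because one operand is a float

# Integers can be arbitrarily large and are stored exactly
big = 2 ** 100
print(big)  # 1267650600228229401496703205376
```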
- Keyword
-
A reserved word in Python syntax that has a special meaning, such as def, if, or for.
These words cannot be used as the name of a variable even though they meet all the normal requirements for variable names. Each has a special meaning, and it must be used in exactly the allowed way for it to work.
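As a sketch, Python's standard keyword module can tell us whether a word is reserved, and compile lets us check that using a keyword as a variable name is a syntax error without crashing the program. The word 'whale' and the name usableAsName are just illustrative.

```python
import keyword

# keyword.iskeyword checks whether a word is reserved in Python
print(keyword.iskeyword('def'))    # True
print(keyword.iskeyword('whale'))  # False

# Trying to use a keyword as a variable name is a syntax error.
# compile() checks the code without running it, so we can catch the error.
try:
    compile('for = 5', '<example>', 'exec')
    usableAsName = True
except SyntaxError:
    usableAsName = False
print(usableAsName)  # False
```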
- Memory
-
Storage capacity in a computer. It is used to store the current values of variables, and it is also used to store the code the computer is running.
Memory may include temporary memory which gets reset every time the computer shuts off (which is used for most things a program does) as well as permanent memory that will remain even when we restart the computer (used to store files). We call the temporary memory "RAM" which stands for "Random Access Memory" and the permanent memory a "hard disk" or "hard drive."
Memory is measured in bits (each bit can hold a single 1 or 0) or in bytes (each byte is 8 bits). In Python, the exact amount of memory taken up by different kinds of values is very complex, and usually depends on what the value is, not just on its type.
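To see that memory use depends on the value and not just its type, here's a rough sketch using sys.getsizeof, which reports approximately how many bytes a value takes up. The exact numbers vary between Python versions, so this only compares them rather than promising specific sizes.

```python
import sys

shortText = 'a'
longText = 'a' * 100

# Both values are strings, but they take up different amounts of memory
print(sys.getsizeof(shortText), sys.getsizeof(longText))

# The longer string needs more bytes than the shorter one
print(sys.getsizeof(longText) > sys.getsizeof(shortText))  # True
```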
We draw memory diagrams to indicate abstractly what values are currently stored in memory by a program.
- Memory Diagram
-
A diagram that shows how Python stores values in the computer's memory. We use boxes with text labels to show the contents of each variable, and inside the box we either write a 'small' value like an integer or float, or an arrow pointing to a 'big' value like a string. The Introduction to Python lecture covers basic memory diagrams, and more advanced memory diagrams are covered when we cover things like lists.
- None
- NoneType
-
None is a special value in Python used to mean "there isn't anything here" in places where a value is required by the structure of the language. For example, all functions must return a value, but if a function doesn't really have any value to return, it can return None to indicate that (a function that never reaches a return statement returns None automatically). None still counts as a real value and can be stored in a variable just like other values. It has its own special type called NoneType.
- Operator
-
A symbol that instructs Python to perform a specific simple operation on one or more values, like + or %. The operators section of the quick reference contains a list of key operators in Python.
Most typically, an operator somehow combines the values on its left and right sides into one value during simplification. For example, the + operator adds two numbers together (or concatenates two strings). During simplification, both values and the operator are combined, and then that entire part of the expression is replaced by the result. For example, 3 + 4 * 5 first simplifies the 4 * 5 to 20, so it becomes 3 + 20, and then simplifies that to just 23.
As in normal mathematics, there's an order of operations which determines which operators apply in what order, and you can use parentheses to specify what should happen first. The order of mathematical operators matches that of normal math, with tied operators getting evaluated from left to right (the exception is ** for exponentiation, which groups from right to left). Python also has a number of programming-specific operators; this operator precedence table specifies the total precedence order of all operators that Python supports.
Official (highly technical) documentation on expressions, including operators and "primaries."
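Here's a small sketch of the order of operations in action. Each comment shows the simplification steps; the variable names are just labels for the results.

```python
# Multiplication happens before addition, just like in math
a = 3 + 4 * 5      # 3 + 20 -> 23
# Parentheses force the addition to happen first
b = (3 + 4) * 5    # 7 * 5 -> 35
# Tied operators are applied left to right: (10 - 4) - 3
c = 10 - 4 - 3     # 6 - 3 -> 3
# ** is the exception: it groups right to left, so this is 2 ** (3 ** 2)
d = 2 ** 3 ** 2    # 2 ** 9 -> 512
# % gives the remainder after division
e = 17 % 5         # 2
print(a, b, c, d, e)
```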
- Output
-
The text displayed when we run a program. The print function causes the argument(s) you give it to be displayed so that someone running the program can see them. "Output" refers to all of the text that gets displayed this way over the course of the program. Unlike values stored in variables, output is not something that we can perform further operations on or refer back to (at least, unless we capture it somehow). Since output is not a value, it does not have a type: when we output the number 12 we get the same thing as when we output the string '12'.
- Print
-
Cause text to appear as part of a program's output. Also the name of the built-in print function which accomplishes this for us. Instructions might say "write some code that prints 5," in which case we could accomplish that by writing print(5).
- Quotes / Quotation Marks
-
Quotation marks (either single quotes ' or double quotes ") are used in Python code to indicate that part of the code should be treated as a literal block of text (a.k.a., a string). Anything in between quotation marks is not interpreted as program instructions, but instead is used as-is as part of a string value. You can use either a single or double quote to start a string, and then you must use the same mark to end it (and it must end on the same line of code).
Python has an alternative for multi-line text: triple quotes. By writing three single or three double quotes in a row, like ''', you can start a string which only ends when you write three more of the same kind of quote in a row, even spanning multiple lines of code. When a triple-quoted string spans multiple lines, the text includes those newline characters and will print out as a multi-line string as well.
- Run / Execute
-
Tell Python to follow some code instructions step-by-step. We can run either an entire file as a single program containing multiple lines of code, in which case Python starts at the top and works its way to the bottom, or we can use the shell to ask Python to run just a single line of code at a time.
As Python follows the provided instructions, it identifies the type of statement used on each line, and then follows special rules for that statement to determine what it does. For example, when it encounters an assignment statement, it simplifies the right-hand side of the equals sign and then updates the variable named on the left-hand side.
Once Python reaches the last line of code (or if it encounters an error or runs the special exit function), it will stop. If something prevents any of those things from happening (for example, a loop that never ends), it may continue until you force it to stop.
Normally, when running code in a file, we start fresh each time, forgetting any variables defined by a previous run of the code. In contrast, when working in Jupyter notebooks, running a cell will not start from scratch, but will retain any variables already defined by other cells that we ran before. This means that in some cases, certain cells must be run before others in order to assign the variables those other cells depend on. Restarting the kernel in a Jupyter notebook will erase any existing variables and reset things.
- Shell
-
An interactive window where you can enter Python code one line at a time and immediately see the results of each line. Typically the prompt for code will be >>>, and if you just type an expression instead of a more complex statement, Python will display the result of that expression automatically (unlike in the shell, this automatic display of results does not happen when running an entire program at once).
- Side Effect
-
Something that happens as a result of calling a function other than fetching a result value and simplifying the function call. This can be anything from displaying text in the output window to drawing a line or playing a sound.
In some cases, a function can have a side effect which modifies the value of a variable, although this does not normally happen, since arguments are sent to a function as values, not variables. When a mutable value (like a list) is used, however, a function may change it, and that would be a type of side effect.
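A minimal sketch of both cases, using two hypothetical functions (addGreeting and addOne are made-up names). The first changes the list it's given, which is a side effect; the second can only change its own local copy of a number.

```python
# Hypothetical function whose only purpose is its side effect:
# it modifies the list passed in rather than returning a new value.
def addGreeting(messages):
    messages.append('hello')   # changes the list itself (a mutable value)

words = ['hi']
result = addGreeting(words)
print(words)   # ['hi', 'hello']: the list was changed by the call
print(result)  # None: the function has no return statement

# Integers can't be changed in place, so this only affects the local variable n
def addOne(n):
    n = n + 1

count = 5
addOne(count)
print(count)   # 5: the caller's variable is unchanged
```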
- Simplify
-
Another term for evaluate.
The process Python follows when evaluating an expression is similar to the process that is used when simplifying an expression in algebra. Note that Python only does simplification, not solving, so it won't do things like subtract a value from both sides of an equation to figure out what a variable will be. The programmer has to do all the solving stuff and leave Python with just simplification steps. So the code y + 2 = x + 6 is invalid and will result in an error, but if we have already defined x = 10 and we write the solved version of the equation as y = x + 4, Python is able to simplify in two steps: x + 4 => 10 + 4 (variable substitution) and then 10 + 4 => 14 (addition).
Once an expression is fully simplified, Python will usually do something with the result (in the example above, we'd store the result 14 in the variable y, throwing away any old value that y might have had previously).
The result of simplification is always a value. In algebra, simplification might leave you with variables that you don't know the values of yet. However, in Python, if the simplification process encounters a variable with an unknown value, Python will stop and print out an error message (a NameError).
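Here's the example above as a runnable sketch. compile is used to check the unsolved equation without crashing, and undefinedName is a deliberately hypothetical variable that was never assigned.

```python
x = 10
y = x + 4      # x + 4 => 10 + 4 => 14
print(y)       # 14

# The unsolved equation is not valid Python syntax
try:
    compile('y + 2 = x + 6', '<example>', 'exec')
    validCode = True
except SyntaxError:
    validCode = False
print(validCode)  # False

# Simplifying an expression that uses a variable with no value is an error
try:
    z = undefinedName + 1
except NameError as error:
    print('Error:', error)
```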
- Statement
-
A piece of code which specifies a single instruction for the computer to follow. There may be multiple steps to the instruction (like a big expression that will take lots of steps to evaluate) but each statement has a specific meaning. For example, an assignment statement sets (or updates) the value of a variable, or a definition statement begins the definition of a new custom function.
A program is made up of multiple statements in order, and Python executes each statement one by one from top to bottom. A statement can be just an expression, in which case Python just simplifies that expression and discards the result, but other kinds of statements each have some special effect.
- String
-
A piece of text stored as a single value. Strings are one of Python's basic data types, and there are several operators that can be used with them, like + for concatenation.
- Syntax
-
The rules for how symbols need to be arranged in order for Python to successfully interpret them as code. For example, one rule is that for an assignment statement, the left-hand side of the equals sign may only contain a variable name. If you write x = 3 you are obeying this rule, but if you write 3 = x you are violating it and you'll get a SyntaxError. In general, the syntax rules of Python are very strict, so you have to get used to following them in order to get the computer to follow the instructions you intend to give it.
- Traceback
-
A listing that names specific line numbers in specific code files indicating which line(s) of code are active (usually displayed when an error occurs). Multiple lines of code can be active at once because when a function gets called, the line of code calling the function remains paused while the code in the function is running.
The top of the traceback starts with the first line of code to activate, and as you read down the traceback, you're seeing more and more recent lines, ending with the final line of code, which is what Python is actively evaluating right now (or when an error occurred). In addition to file names and line numbers, the traceback includes the actual code of the line that's active, to help you see what's going on quickly. An example traceback might look like this:
Traceback (most recent call last):
  File "...\traceback.py", line 17, in <module>
    print(outer(4, 'hello'))
  File "...\traceback.py", line 7, in outer
    r2 = inner(b)
  File "...\traceback.py", line 14, in inner
    return x * (x - 1)
TypeError: unsupported operand type(s) for -: 'str' and 'int'
The '...' here would normally include the full path of folders containing the file where the error occurred.
Click here for tips on debugging custom functions using tracebacks.
You can often infer a lot from a careful study of a traceback.
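The traceback doesn't show the whole file, so here is one hypothetical version of code that could produce it. The bodies of outer and inner (including the r1 line) are guesses based only on the lines the traceback shows, and the line numbers Python reports for this sketch will differ from 17, 7, and 14. Instead of letting the error stop the program, the sketch catches it so we can look at the message.

```python
import traceback

def outer(a, b):
    # Hypothetical body: only the r2 line appears in the traceback
    r1 = inner(a)
    r2 = inner(b)
    return r1 + r2

def inner(x):
    # x - 1 is simplified first, and it fails if x is a string
    return x * (x - 1)

try:
    print(outer(4, 'hello'))
except TypeError as error:
    message = str(error)
    # Display the same kind of traceback Python shows for an uncaught error
    traceback.print_exc()
```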
From this traceback we can see that on line 17, outer was called with 4 and 'hello' as arguments, and that line remained active while we ran the code in the outer function. outer itself, on line 7 of the same file, called inner, during which, on line 14 of the file, we ran into an error. On that line, we are trying to subtract 1 from x, but we get a TypeError because one of them is a string and the other is an integer.
In this case, the only subtraction on the active line was in x - 1, and since we can see that 1 is an integer, we can infer x must have been a string. We can't see where x came from in the traceback, but it's suspicious that on line 17 we are using both an integer and a string as arguments. We'd have to look at the code to know for sure, but it's possible that 'hello' on line 17 should have been an integer instead of text. In debugging based on this traceback, you should first look at line 14 to see where x comes from, looking back upward in the code to find where x was most recently assigned a value. If x is defined as a parameter of inner, then we should look at line 7 of the file, where b is used as the argument to inner: where does b get its value? Again, if b is a parameter of outer, we should then look at line 17, where outer is being called with two arguments. If b is the second parameter in the definition of outer, then we can conclude that 'hello' on line 17 should not have been a string.
- Type
-
A category of data that Python can store. Every value has a type, and the type changes how it works when operators are applied. For example, '3' is a string whereas 3 is an integer, so '3' + '3' is '33' (concatenation of strings) while 3 + 3 is 6 (addition of integers). Functions like str, int, and float can be used to convert between types, although there are limitations as to what values can be converted (e.g., you can't convert 'hello' to an integer, but you could convert '123').
Python will give you a TypeError if you try an operation on incompatible types, like '15' + 15.
- Underscore
-
The 'underline' character _.
Underscores can be used as part of variable or function names in Python, and some built-in function names use them. In this class we prefer using camelCase to using underscores because underscores make for very long names when read by a screen reader.
Python uses names that start and end with two underscores (like __name__) for things that have a special meaning. This design makes it unlikely that someone would accidentally give their variable the same name as a special built-in value.
- Value
-
A concrete thing stored in a variable or which is computed as the result of evaluating an expression. For example, 3 or 'hello'. Values are the results of expressions, but an expression like 3 + 3 is not itself a value (and thus cannot be stored in a variable: if we write x = 3 + 3, what gets stored is the value 6). Every value has a type which influences how operators apply to it.
- Variable
-
A name used to store and refer back to a specific value, but it may also be reassigned a different value, so its value may change as a program runs. An assignment statement is used to define a new variable or give an existing variable a new value. Python fully simplifies the right-hand side of the statement before storing the result in the variable.
For example, if we have the code:
whale = 3
giraffe = 2 + 8
whale = whale + 1
we have defined 2 variables named whale and giraffe. The value of whale starts as 3, but then it changes to become 4 once the third line runs (the old value is used when simplifying the expression for a reassignment). Meanwhile, giraffe has the value 10.
Note that variables in Python behave differently than variables in algebra. In algebra, a variable has a single value and we might have multiple equations that help us narrow down that value, but it's impossible to say something like x = 3 and also x = 4. In Python, it's fine to say x = 3 on one line of code and x = 4 on another line of code: the variable takes on one value, and later changes its value to a different value. This means that in Python, the question "What is the value of variable x?" is incoherent: you need to instead ask "What is the value of variable x when we're on line y of the code?"
Variables also behave differently than cells in a spreadsheet. In a spreadsheet, a cell can have both a formula and a value. If it has a formula, the value of the cell is computed using that formula, and when any of the other cells used in that formula change values, the formula is re-evaluated. So if cell A2 has formula "=A1 + 1" and cell A1 changes value from 2 to 3, cell A2 in a spreadsheet would change value from 3 to 4. However, Python variables do NOT work like that. If we have the code:
car = 10
bike = car * 2
car = 12
On line 1, we assign a new variable car the value 10. On line 2, we simplify car * 2 to the value 20 and store that in a new variable bike. But bike doesn't remember the formula that was used, only the value. So when we change car to 12 on the next line of code, bike does NOT change to 24, but instead remains 20. In Python, if we want to re-compute a formula, we have to include another line of code which re-runs that formula. So in the following code, the final value of bike will be 24:
car = 10
bike = car * 2
car = 12
bike = car * 2
Importing and Modules
- Import
-
Loading code from another file for use in your program. The import statement specifies a name to import, and Python looks for a file with that name plus a .py ending. So if you say import fox, Python will look for a file called fox.py. When you import a file, Python runs that file's code, and then collects any variables it defines (including functions) into a module. Functions from that module can be accessed using the . operator, so if the fox.py file defines a function called play, we can say fox.play to refer to that function after we do import fox.
Python will look for the file to import in the current directory as well as in other directories where built-in modules and modules installed via the package manager are located.
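Since the fox module is only an example, this sketch uses Python's standard math module instead. Importing it works the same way, and the . operator reaches the functions and variables it defines.

```python
import math

# math.sqrt is a function defined in the math module
root = math.sqrt(16)
print(root)     # 4.0

# Modules can define variables as well as functions
print(math.pi)  # 3.141592653589793
```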
- Module
-
A program defined in a file which can be imported by another file. Variables and functions defined by this program can be used by another program which imports them ("module" also refers to this collection of variables and functions).
A module will often not actually do anything when run by itself, but instead just define many functions that other programs can use.
- Backslash
-
The \ symbol (whereas / is a "forward slash"; yes, the distinction is arbitrary).
It has a special meaning when it occurs inside quotation marks: it changes the meaning of the next character to an alternate meaning. For example, in \n the 'n' becomes a newline character. Other special characters include:
\t is the "tab" character, which is also what you get if you hit the 'tab' key.
\r is the "carriage return" character, which moves back to the beginning of the current line of text before continuing to print.
\a is the "bell" character, which sometimes makes a sound when it gets printed (but lots of computers don't do that any more, sadly).
\" and \' are normal quotation marks, but they DON'T end a string as they usually would. So the code 'don\'t' is a string that has a quotation mark inside it, even though without the backslash 'don't' would be an error, since Python thinks that the quote between the 'n' and the 't' is the end of a string 'don' and then doesn't know what to do with the extra 't' and quote.
There are many more special characters available, but we won't cover them all here.
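A short sketch of a few escape sequences; the variable names are just illustrative. Notice that each backslash sequence counts as a single character in the final string.

```python
# Each escape sequence becomes a single character in the string
newline = '\n'
print(len(newline))  # 1

# \t puts a tab character between the two words
print('left\tright')

# \' lets a single quote appear inside a single-quoted string
contraction = 'don\'t'
print(contraction)             # don't
print(contraction == "don't")  # True

# \a is the bell character; ord shows its character code instead of printing it
bell = '\a'
print(ord(bell))  # 7
```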
Another use of backslash is that if it's NOT inside quotes and it's at the end of a line of code, then it prevents the end of the line from happening normally, and so the next line of code counts as if it were part of the line above it. For example, you can write:
x = \
    5
and that will work the same as writing x = 5 on one line. For this application, the backslash MUST be the very last character in the line.
- Character
-
A single letter or other symbol within a string, including special things like a space or a line break.
In Python, unlike some other programming languages, there is no separate type to represent a single character. Instead, we can use a string with just one character in it.
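A small sketch showing that a single character taken out of a string is itself just a string. It uses indexing with square brackets, which the class may cover later; the names word and first are illustrative.

```python
word = 'hello'
first = word[0]     # indexing a string gives back a one-character string
print(first)        # h
print(type(first))  # <class 'str'>
print(len(first))   # 1
```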
- Format String
-
A string that includes an 'f' before the opening quotation mark, allowing the use of curly braces inside the string to interpolate expressions to form a final string value.
For example, if we write:
x = 5
st = f'x = {x} and x + 2 = {x + 2}'
The final value of the st variable will be computed by simplifying the expressions inside the curly braces within the string, converting the results to strings, and then replacing the curly brace regions with those results. Since the first set of curly braces contains the expression x, which simplifies to 5, and the second set has the expression x + 2, which simplifies to 7, the value of st will be the text "x = 5 and x + 2 = 7".
Format strings are a convenient way to assemble messages from multiple variables or complex expressions, where otherwise you'd have to manually convert the results of those expressions to strings and use concatenation to combine everything. Without using a format string, to achieve the same result you'd have to do:
x = 5
st = "x = " + str(x) + " and x + 2 = " + str(x + 2)
Format strings are often used as the final step in formatter programs.
- Hash Sign
-
The '#' symbol.
In Python, everything following a hash sign until the end of that line is considered a comment. A hash sign inside quotation marks doesn't start a comment, of course, so you can still use hash signs in strings.
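A tiny sketch showing both cases: a hash sign that starts a comment, and one inside quotes that's just part of a string. The names are illustrative.

```python
count = 3  # this comment is ignored by Python
label = 'Item #' + str(count)  # this # is inside quotes, so it's part of the string
print(label)  # Item #3
```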
- Line Break
- Newline Character
-
When dealing with strings, we sometimes want to include multiple lines of text. A line break occurs between each line. To represent this, there's a special character called the newline character that represents moving down to the beginning of a new line. This can be written in code as '\n', where the backslash means "the next character has a special meaning" and the special meaning of 'n' is "new line."
For example, if we write "red\nblue\nyellow" in our code, the corresponding value has the words "red," "blue," and "yellow" on three separate lines, and will appear that way if we use it with the print function (in other contexts it may be displayed with quotes around it and \n inside, just like it was entered).
Python also allows for triple-quoted strings (see quotes) that can span multiple lines and therefore include line breaks naturally.
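Here's a sketch of the "red\nblue\nyellow" example, comparing how print shows it with how the value looks in the shell (repr gives that shell-style display), and checking that a triple-quoted string holds the same text.

```python
colors = "red\nblue\nyellow"
print(colors)        # shows red, blue, and yellow on three separate lines
print(repr(colors))  # 'red\nblue\nyellow' (how the shell displays the value)

# A triple-quoted string spanning three lines contains the same newline characters
sameColors = """red
blue
yellow"""
print(colors == sameColors)  # True
```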
- String Interpolation
-
The process of putting values together along with fixed scaffolding into a single string.
For example, if we do:
x = 5
print("The value is: " + str(x))
We are interpolating the value of x into the string that we print.
The term interpolation also refers to blending between two values using a number to control the result.
Dealing with Text
Defining Functions
- Argument
- Block
-
One or more lines of code which are all indented at the same level as each other, or are further indented than the base level. Function definitions are based on indented blocks, among other things. For example in this code:
def addThree(n):
    n = n + 3
    return n
the lines n = n + 3 and return n are both part of the same indented block.
One block may contain smaller blocks when there are multiple levels of indentation. For example, in this code there are three blocks:
n = 43
if n > 50:
    if n % 2 == 0:
        print('even')
    else:
        print('odd')
- The if n % 2 == 0 line is the start of a block which encompasses 4 lines (that line, the else line, and both print lines).
- Each of the print lines is its own block (both of which are also part of the first block).
- Body
-
An indented block of code which follows a special kind of statement such that the statement above it controls that code. For example a definition statement uses the indented block of code below it as its body. For a function definition, we run the body code when the function is called.
- Call
- Dead Code
-
Code which will never execute because of where it is placed in the program structure.
For example, since a return statement stops execution of the current function call frame, any code at the same indentation level after a return statement is dead code: based on its placement, it should run next after the return happens, but because return will stop the function it's in, the code will never be reached.
Dead code is almost always a mistake and should usually be removed (or the program should be restructured so that it's reachable).
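A minimal sketch with a hypothetical function named double: the print line after the return statement is dead code, so calling the function never displays that message.

```python
def double(n):
    result = n * 2
    return result
    print('This line is dead code')  # never runs: return already ended the call

value = double(4)
print(value)  # 8 (the dead print line never displays anything)
```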
- Definition/Define (functions)
-
A special statement that defines a custom function using the keyword def. The definition must include a function name followed by parentheses, and any parameters the function will require must be given names inside those parentheses. The line must end with a colon, and then at least the next line of code must be indented farther to the right. That line, and any subsequent lines indented at the same level or further, will become the body of the function.
After a custom function is defined, it can be called just like any other function: write its name followed by parentheses, and supply any necessary arguments within the parentheses. When a function is called, we run the body code, and then simplify the function call by replacing it with the return value of the function. In some cases, we will use that result, while in others it may be ignored and we just want to call the function for its side effects.
For example, in this code:
def plural(word):
    return word + 's'

print(plural('whale'))
the name of the function being defined is plural and it has one required parameter called word. When we run it, it just evaluates a single return statement that determines the result value by concatenating the argument value with the letter 's.' After the definition, we call it as part of an expression that uses the built-in print function. Python will simplify that last line by doing the following:
- Starting from print(plural('whale')), it first figures out that 'whale' is just a piece of text. Having simplified the argument expression, it's ready to actually call plural (whose result is needed for the print call, so plural must happen first).
- To figure out what plural('whale') will simplify to, Python runs the body of the plural function until it gets to a return statement, and then returns the value it gets by simplifying the expression after the return keyword. In this case, since 'whale' was the argument, within the function the variable word will have the value 'whale', so when we simplify word + 's' we get 'whale' + 's', and then that simplifies to 'whales'. That becomes the "return value" of the function.
- Now that we're done with the function, we can replace the function call code plural('whale') with the result that we got. So print(plural('whale')) becomes print('whales').
- Finally, we are ready to call print, which displays the word whales as output, and simplifies to the value None.
Here is the highly technical official documentation for function definitions.
- Docstring
-
A string, usually written with triple quotes, placed as the very first statement in the body of a function (or at the top of a module) to describe what it does. Docstrings will automatically be displayed when the help function is used on the module or function they belong to.
Every custom function you define should include a docstring, and writing the docstring before you write the function's code is a good way to force yourself to be more deliberate in your planning, which will usually save you time in the end.
A docstring for a custom function may contain doctests.
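A minimal sketch of a function with a docstring (plural here is just an example function). Python stores the docstring with the function in its __doc__ attribute, which is where help gets the text it displays.

```python
def plural(word):
    """
    Returns the plural of the given word by adding an 's' to the end.
    Only handles regular nouns.
    """
    return word + 's'

# The docstring is stored with the function; in the shell,
# help(plural) would display this same text
print(plural.__doc__)
```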
- Doctest
-
An example of how a custom function should work that's written as part of a docstring. It's written using three > symbols to emulate the Python default prompt, plus some code followed by the expected results of that code, exactly as if that code had been run in the Python shell.
So for example, if you are using the Python shell to do some math you might see the following:
>>> 3 + 4
7
>>> 5 * 2
10
If you copied those four lines and pasted them into a docstring, then if you wrote the following at the end of your file:
import doctest
doctest.testmod()
Python would run the code after the >>> lines and confirm that the results matched the two result lines, printing out an error message if they did not. Here's a complete example of a custom function definition that includes a doctest as part of its docstring:
def hypotenuse(a, b):
    """
    Computes the hypotenuse of a triangle with near sides a and b.
    Returns a floating-point number. For example:

    >>> hypotenuse(3, 4)
    5.0
    >>> hypotenuse(6, 8)
    10.0
    >>> hypotenuse(2, 3)
    3.605551275463989
    """
    return (a**2 + b**2)**0.5

# These lines run all doctests for functions in this file
import doctest
doctest.testmod()
Doctests are useful because you can confirm that your code will work in a variety of situations. Writing doctests before you write your function can be a good way to plan ahead for the different kinds of inputs/arguments you will need to handle.
Note that your output must match what Python would display EXACTLY, although you can enable a special "ELLIPSIS" setting to get around this for certain things. Refer to the official documentation on the doctest module for more details.
- Frame (Function Call Frame)
-
A separate unit of memory used when evaluating the result of a function. Each time a function is called Python creates a new function call frame, and the frame is destroyed when the function ends and returns.
Local variables used by a function, including its parameters, are stored in this function call frame, which means that they are separate from variables used by other functions, even if they share the same name.
Even when calling the same function multiple times, each call gets its own separate function call frame. This feature makes recursion much more powerful, since multiple function call frames for the same function can stack up and may include different values.
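A small sketch of frames stacking up, using recursion (a function that calls itself), as mentioned above. The factorial function is just an illustration: each call gets its own frame with its own separate n, so the calls don't interfere with each other.

```python
def factorial(n):
    # Each call has its own frame, so each call has its own separate n
    if n <= 1:
        return 1
    # This call pauses while factorial(n - 1) runs in a brand new frame
    return n * factorial(n - 1)

print(factorial(5))  # 120
```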
- Indentation
-
Indentation is space and/or tab characters at the beginning of a line of code.
In Python, indentation determines which block of code each line belongs to. For code in the same block, indentation must be consistent (or must be more than the base indentation level for sub-blocks).
Theoretically you can pick any number of spaces and/or tabs for indentation, but practically using four spaces for each block level is a widespread convention that you should almost always follow.
- Local Variable
- Non-Local Variable
-
Any variable that's assigned a value within a function, including all parameters. These variables are stored in the function's function call frame and have separate values from other variables (even other variables with the same name). These variables are forgotten when their function call frame ends.
Note that when a function refers to the value of a variable that it does not assign (and which is not a parameter), we call this a "non-local variable." Functions are not allowed to modify the values of non-local variables; if they include an assignment statement for a non-local variable, then that creates a new local variable with the same name instead. It's an error to first refer to a non-local variable you have not yet assigned, and then assign a local variable with the same name. For example, this code will generate an exception:
x = 5

def f():
    print(x)  # error on this line (local x not yet defined)
    x = 3     # this line implies x MUST be local
    print(x)

f()
In contrast, the code below does not generate an exception, but instead defines x as a local variable in function f and separately uses x as a non-local variable in function g:
x = 5

def f():
    x = 3     # local variable; doesn't affect non-local variable x
    print(x)  # prints 3

def g():
    print(x)  # prints 5 (non-local variable)

f()
g()
These should usually be avoided, but the keywords `global` and `nonlocal` can be used to declare that you will be using a non-local or global variable so that you can modify its value instead of creating a new local variable when you do an assignment statement. `global` refers to variables defined outside of any function definition, while `nonlocal` can be used when multiple functions are defined inside each other to refer to the closest local variable that's outside the current function definition.

We will not use these keywords in this class, nor will we ever need to define functions inside each other.
Here are the official reference pages for the global and nonlocal statements.
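Purely for reference (this class does not use these keywords), here is a minimal sketch of the difference `global` makes. Both function names are made up for illustration.

```python
count = 0  # a variable defined outside any function

def setWithoutGlobal():
    count = 100  # creates a NEW local variable; the outer count is untouched
    return count

def addOneWithGlobal():
    global count  # assignments below now change the outer count
    count = count + 1
    return count

setWithoutGlobal()
print(count)  # 0
addOneWithGlobal()
print(count)  # 1
```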
- Parameter
-
A special kind of local variable whose name is included in the definition of a function and which gets its value from an argument passed to the function when the function is called. For example, consider the following code:
```python
def play(squirrel, robin):
    print("Squirrel start", squirrel)
    print("Robin start", robin)
    squirrel = squirrel + 1
    robin = robin * squirrel
    print("Squirrel end", squirrel)
    print("Robin end", robin)
    print("Result is", squirrel + robin)
    print("---")

play(3, 4)
play(5, 2)
```
In this code, we define one function named `play`, and then we call it twice. Each separate call to a function will set its own values for the parameters of that function. On the `def` line, we wrote `squirrel, robin` inside the parentheses, so that defines `play` as a two-parameter function. Any time it is called, two arguments must be supplied, otherwise `play`'s two parameters would not be able to get their values. So calling `play` as `play(1)` or `play(1, 2, 3)` would generate an exception, since `play` needs exactly two arguments. The values of `squirrel` and `robin` are assigned based on the values of the two arguments for each particular call made to `play`, using the order of arguments and the order of parameters to match them up. So in the first function call above, `squirrel` is `3` and `robin` is `4`, while in the second function call `squirrel` is `5` and `robin` is `2`. Parameters are very powerful because they allow us to run the same code with different starting variables just by providing different arguments to a function.

Note that like any other local variable, an assignment statement can be used within a function to update or replace the value of a parameter.
For the code above, the output will be:

```
Squirrel start 3
Robin start 4
Squirrel end 4
Robin end 16
Result is 20
---
Squirrel start 5
Robin start 2
Squirrel end 6
Robin end 12
Result is 18
---
```
Parameters are a bit of a slippery concept. You can try copying the code above into your editor and running it with various changes:
- What happens if you change the values used as arguments?
- What happens if you change the name of a parameter?
- What happens if you change the number of parameters used?
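As a starting point for those experiments, here is a small sketch with a made-up two-parameter function `order`. Swapping the arguments changes which parameter gets which value, and supplying the wrong number of arguments raises a `TypeError`.

```python
def order(first, second):
    # Arguments are matched to parameters by position.
    return first * 10 + second

print(order(3, 4))  # 34: first is 3, second is 4
print(order(4, 3))  # 43: same values, opposite order

try:
    order(1)  # only one argument for two parameters
except TypeError as error:
    print("TypeError:", error)
```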
- Pass
-
Send an argument into a function so that one of that function's parameters gets a certain value. This happens when we call a function.
For example, when we write:
```python
def f(x):
    return x + 1

print(f(3))
print(f(10))
```
On the 4th line, we are passing the value `3` as the argument for the parameter `x`, and on the 5th line, we are passing the value `10` instead. We could say something like "When 3 is passed to `f`, the result is 4."

Note that there is also a `pass` statement, which is a statement that does nothing. It's useful when Python requires a statement but you don't want the code to actually do anything.
- Return
-
Generate a result value from a function.
The result gets sent back to where the function was called so that evaluation can proceed there, since during evaluation we need to replace function calls with their results. We use the `return` statement to set what the result will be: the expression after the word `return` gets evaluated and the resulting value becomes the result of the function.

If there is no expression after `return`, then the value `None` is used as the result value, and this also happens if we reach the end of a function's body without executing a `return` statement.

If we do execute a `return` statement, we immediately return to the code that called the function, destroying the function call frame whose result we now know. This means that any code that comes after a `return` statement in the same block is dead code that will never be executed. This also means that using `return` will stop progress through a chained conditional statement or a loop.
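The sketch below shows both situations: a `return` that exits a loop early, and a function whose result is `None` because no `return` runs. The names `firstNegative` and `greet` are hypothetical examples.

```python
def firstNegative(nums):
    for n in nums:
        if n < 0:
            return n  # leaves the loop AND the function immediately
    # If the loop finishes, we reach the end of the body with no return,
    # so the result is None.

def greet(name):
    print("Hello,", name)  # no return statement at all

print(firstNegative([3, -2, -7]))  # -2
print(firstNegative([1, 2]))       # None
result = greet("Ada")              # prints: Hello, Ada
print(result)                      # None
```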
Booleans and Conditionals
- Boolean
-
A type of value which is either `True` or `False`, with no other possibilities.

Boolean values are used to represent the outcome of tests or decisions, so for example a comparison like `3 < 5` simplifies to `True`, and if we write `bigger = x > y`, then `bigger` will hold either `True` or `False` as its value, depending on what the values of `x` and `y` were when evaluating that assignment.
- Branch
-
A block of code associated with a particular conditional.
Particularly when multiple `if`/`elif`/`else` statements are chained together, each of the associated blocks is sometimes referred to as a "branch" or "case." For example, in this code:

```python
if n > 5:
    print(2 * n)
else:
    print(n - 1)
```

...we might refer to the first `print` as being the "n > 5 branch" or the "greater than five case" or the like.
- Case
-
Can be used as a synonym for branch.
Case can also refer to a particular set of arguments or other specific program state in which a program produces a certain output or crashes due to a certain error. For example, you might hear things like "In the case where x = 2..."
- Chained
-
A situation where two or more `if`/`elif`/`else` branches of a conditional statement are placed next to each other such that control flow for the later branch depends on the earlier branch. For example, an `elif` placed after an `if` can only trigger if the condition of the `if` is falsey. Chaining is limited by certain rules; for example, an `if` never chains with the block above it.

See conditional for full details.
- Condition
-
An expression which is used as part of a conditional statement to determine what will happen.
Conditions are also found in other kinds of statements, like while loops.
Typically, a condition should simplify to a boolean value, but if it doesn't, the decision that depends on the condition will be made based on whether the value is truthy or falsey.
- Conditional
-
A statement which uses one of the keywords `if`, `elif`, or `else` to control the flow of execution within a program.

Normally, lines of code are executed in order from top to bottom, perhaps jumping around to figure out the results of function calls. When `if`, `elif`, and/or `else` come into the picture, certain blocks of code may be skipped depending on the values of condition expressions. For example, with `if`, when the simplified value of the condition is truthy, the block of code after the `if` statement gets run normally, but if the value of the condition is not truthy, that block of code gets skipped.
`if`, `elif`, and `else` statements at the same level of indentation form connected groups: `if` statements chain with `elif` or `else` statements directly below them, `elif` statements chain with `if` or `elif` statements above them and with `elif` or `else` statements below them, and `else` statements chain with `if` or `elif` statements above them. The connected statements form a chained conditional. Note that `if` statements do NOT chain with statements above them, nor do `else` statements chain with statements below them, and any statement other than `if`, `elif`, or `else` at the same indentation level will break a chain. For example, the following code contains 5 conditional statements grouped into 3 chained conditionals:

```python
if A:
    print('A')
if B:
    print('B')
elif C:
    print('C')
if D:
    print('D')
else:
    print('E')
```
The first `if` is alone, because the `if` below it cannot connect with anything above it. That second `if` does connect with the `elif` below it, but that `elif` cannot connect to an `if` below itself. That third `if` connects to just the `else` below it.

The rules of control flow for a chained conditional are:
- Check the first condition on the initial `if` (every group begins with an `if`). If it's truthy, run that first block of code and skip all the rest.
- If the condition of the first statement was not truthy, repeat this process starting from the second statement in the chain, again skipping all the rest if that statement's condition is truthy.
- If you get to an `else` without executing any block of code yet, run the `else` block (and then you're done with that chain, since `else` can only appear last in a chain).
- If you get to some other code that's not an `elif` or `else` at the same indentation level as the conditionals in the chain, then the chained conditional is over and you resume running that code normally. This includes the case where there's an `if` on the line below the final block of a chained conditional, since `if` will NOT connect to conditionals above it.
As an example, if we look at the code above and assume that `A` and `C` are both `True` while `B` and `D` are `False`, it would check `A`, then print 'A' as a result of that check, then it would check `B`, and since `B` is `False`, it would skip printing 'B' and check `C`, whose value would cause it to print 'C'. Then it would check `D`, and since that's not `True`, it would run the `else` block (which only depends on the check of `D`, not any of the other checks) and it would print 'E'.
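To check this kind of walkthrough by running it, the same three chains can be wrapped in a hypothetical function `trace` that records which letters would be printed instead of printing them. This is just a sketch for experimenting with different combinations of `A`, `B`, `C`, and `D`.

```python
def trace(A, B, C, D):
    """Return the letters the three chained conditionals would print."""
    printed = []
    if A:            # chain 1: this if stands alone
        printed.append('A')
    if B:            # chain 2 starts here; it does not connect to `if A`
        printed.append('B')
    elif C:          # only checked when B is falsey
        printed.append('C')
    if D:            # chain 3 starts here
        printed.append('D')
    else:            # runs whenever D is falsey
        printed.append('E')
    return printed

print(trace(True, False, True, False))  # ['A', 'C', 'E']
print(trace(False, True, True, True))   # ['B', 'D']
```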
- Control Flow
-
The order in which lines of code are executed when running a program. A "control flow statement" is a statement which alters the normal flow of control.
Normally, after executing one statement, Python moves down to the next statement in the code ("normal control flow"). However, special statements or expressions like function calls or conditionals can cause something different to happen.
- Edge Case
-
A specific set of circumstances (case in the second sense) which is either unlikely to happen, or which despite being likely, causes weird behavior or represents an odd configuration of inputs.
For example, pages of a book are not usually given negative numbers, so a negative number as input to a function which expects a book page as the argument might be an edge case. It often takes extra effort to anticipate and correctly handle edge cases, so errors happen more frequently in edge cases.
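Here is a small hypothetical example (not from the course materials) of anticipating an edge case: averaging an empty list would divide by zero, so the function checks for it first. Returning `None` is just one possible way to handle it.

```python
def average(nums):
    if len(nums) == 0:
        # Edge case: with no numbers, sum(nums) / len(nums) would divide by zero.
        return None
    return sum(nums) / len(nums)

print(average([2, 4, 9]))  # 5.0
print(average([]))         # None
```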
- Equivalent
-
A situation where two values are similar enough to count as the same value when using the `==` operator, either because they are actually the same, or because they are different values but their structures/contents are effectively the same.

Contrast `==` with the `is` operator, which ignores equivalence and only checks whether two values are the very same object.

When using the `==` operator, the result is `True` if the two values being compared are equivalent, and `False` if not (the result is always a boolean). Some examples:
- `3 == 3` - Values are the same, so are equivalent. Evaluates to `True`.
- `3 == 3.0` - Values are not identical, but are equivalent. One is an integer and another is a floating point number but they represent the same underlying number, so they count as equivalent. This evaluates to `True`.
- `3 == 3.1` - Values are neither identical nor equivalent. This evaluates to `False`.
- `1 == True` - Values are different types, but are considered equivalent. This evaluates to `True`. `0 == False` also evaluates to `True`.
- `2 == True` - Values are different types, and are NOT considered equivalent, even though both are truthy. This evaluates to `False`.
- `[1, 2] == [1, 2]` - Values are two different objects, but their contents are equivalent, so they are considered equivalent. This evaluates to `True`.
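A quick sketch contrasting `==` (equivalence) with `is` (same object), plus the examples above evaluated in one line:

```python
a = [1, 2]
b = [1, 2]   # a different list object with equivalent contents
c = a        # not a new list: c refers to the same object as a

print(a == b, a is b)  # True False
print(a == c, a is c)  # True True
print(3 == 3.0, 3 == 3.1, 1 == True, 0 == False, 2 == True)
# True False True True False
```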
- Mutually Exclusive
-
A situation where different parts of the code are set up such that only one of a group can be run.
Usually a chained conditional is used to achieve this. When an `if` statement is followed by one or more `elif` statements, the blocks of code are mutually exclusive, because once the condition of an `if` or `elif` is truthy, Python skips every later `elif` and/or `else` block in that chain. Note that using several `if` statements in a row does NOT make those blocks mutually exclusive, since the conditions of subsequent `if`s will be evaluated whether or not earlier `if`s were true or false, so it's possible to have multiple blocks of code run if multiple conditions are true.

The use of `return` statements (which stop execution of further statements in the same function) can also be one way to achieve mutual exclusivity, as can careful structuring of multiple conditions (even in cases where code isn't sequential). For example, in the following code, the `print('A')` and `print('B')` are mutually exclusive, because it's impossible for the conditions of their respective `if`s to both be satisfied.

x = 5 if x > 10: print('A') print('middle') if x
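The difference between a chain and separate `if`s can be seen side by side in this sketch. The functions `chained` and `separate` and their thresholds are made up for illustration; each returns a list of the blocks that ran.

```python
def chained(n):
    ran = []
    if n > 10:
        ran.append('big')
    elif n > 5:       # only checked when n > 10 was falsey
        ran.append('medium')
    else:
        ran.append('small')
    return ran        # always exactly one label: the branches are mutually exclusive

def separate(n):
    ran = []
    if n > 10:
        ran.append('big')
    if n > 5:         # checked no matter what happened above
        ran.append('medium')
    return ran        # may hold zero, one, or two labels

print(chained(12), separate(12))  # ['big'] ['big', 'medium']
print(chained(3), separate(3))    # ['small'] []
```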
- Predicate
-
A function which always returns a boolean.
A predicate can have any number of parameters, but usually has at least one. Calling a predicate gives a true/false answer about its argument(s), so it can be thought of as a procedure that provides the answer to some question about its argument(s). For example, we might define:
```python
def isEven(num):
    return num % 2 == 0
```
This function is named "isEven" because it answers the question "is this number even?" and if the number is even, the result will be `True`, while if the number is not even, the result will be `False`. Predicates are often used in conditions.
- Truthy/Falsey
-
A property of all values which determines what happens when they're the result of a condition.
Values which count as "truthy" cause the code that included the condition to do one thing (e.g., execute a block of code below an `if`) and values which count as "falsey" cause something else to happen (e.g., skip that block). Normally, we should use boolean values in these situations, where `True` is truthy and `False` is falsey. However, every value counts as either "truthy" or "falsey":
- For integers and floating point numbers, the number `0` (or `0.0`) counts as falsey, and all other numbers (including negative numbers) count as truthy.
- For strings, the empty string `''` counts as falsey and all other strings count as truthy (the empty string has no characters between the quotation marks, not even a space, so its length is 0).
- The special value `None` counts as falsey.
- Most other special values (like functions) always count as truthy.

Some truthy values count as equivalent to `True`, and some falsey values count as equivalent to `False`, but these equivalences don't hold for all truthy/falsey values (for example, `2` is truthy, but `2 == True` simplifies to `False`, so it's not equivalent to `True`).
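One way to explore these rules is a tiny helper (the name `truthiness` is hypothetical) that lets an `if` decide which branch to run for each value:

```python
def truthiness(value):
    # The if statement applies the truthy/falsey rules to decide which branch runs.
    if value:
        return "truthy"
    else:
        return "falsey"

for v in [0, 0.0, -3, '', ' ', 'hi', None, True, False]:
    print(repr(v), "is", truthiness(v))

print(truthiness(2), 2 == True)  # truthy False
```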
Sequences and Loops
- Collection
-
A group of individual values.
Unlike a sequence, a collection does not necessarily have an order (or that order is incidental). Values in a collection are often called elements.
- Construct (noun)
-
The general term for different kinds of statements. Refers not just to the line of code where the statement appears, but to all lines of code controlled by that statement too, and can be used to refer to a combination of statements in a common pattern.
For example, both conditionals and loops are code constructs. We might also talk about an `if`/`elif`/`else` construct composed of multiple statements.
- Element
-
One value that's part of a larger collection.
Since sequences are a kind of collection, this also applies to values in a sequence, where the elements may be identified by their indices.
- For Loop
-
A loop using the `for` keyword which iterates its loop body once for each element in a specific collection.

Often used with strings, lists, ranges, and other ordered sequences, as it will go through items in order one at a time. For example:

```python
for x in [1, 2, 3]:
    print(x)
```

In this `for` loop, `x` is the loop variable and `[1, 2, 3]` is the loop sequence. Python will run the `print` line three times, with 1, 2, and then 3 as the values of `x` during each run respectively.

For loops get their name from the fact that they iterate "for each" element in their loop sequence.
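A `for` loop works the same way over a string, where each element is a single-letter string. This sketch (with a made-up word) counts how many times the letter 'a' appears:

```python
word = 'banana'
count = 0
for letter in word:       # loop variable: letter; loop sequence: word
    if letter == 'a':
        count = count + 1
print(count)  # 3
```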
- Index
-
An integer indicating a position within a sequence.
In Python, these numbers start at 0 for the 1st position, 1 for the second position, and so on. Python in most situations also allows for a negative number to refer to one of the last items in a sequence, so -1 refers to the last item, -2 to the second-to-last, and so on.
Quite often when dealing with sequences, we will use an index variable along with an indexing operation to read and/or update values in the sequence.
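A short sketch of positive and negative indices, followed by the index-variable pattern for reading and updating each value in turn:

```python
letters = ['a', 'b', 'c', 'd']
print(letters[0])   # prints a (first item)
print(letters[2])   # prints c (third item)
print(letters[-1])  # prints d (last item)
print(letters[-2])  # prints c (second-to-last item)

# Using an index variable to read and update values:
i = 0
while i < len(letters):
    letters[i] = letters[i].upper()
    i = i + 1
print(letters)  # ['A', 'B', 'C', 'D']
```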
- Indexing
-
The act of pulling a specific item out of a sequence using an index.
Python uses square brackets for this, written directly after the value (or expression) that we want to pull an item out of. For example:
```python
index = 0
nums = [1, 2, 3]
one = nums[index]
```
In this code, the square brackets on the second line are NOT an indexing operation, but instead define a new list. Then the square brackets on the third line index that list, pulling out the first item.
- Iteration
-
The process of running some code repeatedly. Also refers to a single repetition within that process.
For example, "solve this problem using iteration" is the first sense, while "what is x on the first iteration of this loop?" is the second sense.
An iteration table can be used to show the values of key variables on different iterations of a loop, with one row of the table per iteration of the loop.
The verb iterate refers to this process.
- Iteration Table
-
A written table which shows the values of key variables on different iterations of a loop.
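Iteration tables are usually written by hand, but this sketch builds one for a simple summing loop (the list values are made up), storing one row per iteration:

```python
nums = [4, 7, 1]
total = 0
rows = []  # one row per iteration: (iteration number, x, total after this iteration)
for x in nums:
    total = total + x
    rows.append((len(rows) + 1, x, total))

print("iteration | x | total")
for row in rows:
    print(row[0], "|", row[1], "|", row[2])
```

The printed table has the rows `1 | 4 | 4`, `2 | 7 | 11`, and `3 | 1 | 12`.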
- List
-
A mutable data type that holds multiple values in a fixed order.
Lists are a kind of sequence. Each entry in the list holds a single value, just like a variable can, and just like a variable, it can hold any type of value. Often, however, we use a list where each entry has the same type. Lists are defined using square brackets, with commas to separate the values. You can use expressions in the brackets and Python will simplify those expressions and store the resulting values in the list.
Some examples:
- `[1, 2, 3]` is a simple list where each of the 3 elements is an integer.
- `['ab', 'c', 'hi']` is a list of strings.
- `[1, True, -4.5, 'st']` is a list that has multiple different types of values.
- `[[1, 2], [3, 4]]` is a list with two values, each of which is itself a list (of two values). We call this a nested list. The inner lists are sometimes called "sublists."
Values in lists can be accessed via indexing which uses square brackets that appear immediately after another expression (often a variable reference). For example:
```python
nums = [1, 2, 3]  # creates a list
nums[0] = 7       # uses indexing to modify an element in the list
print(nums[1])    # uses indexing to access an element in the list
```
In this code, `nums[0] = ...` changes the value of the first element of the list, while `nums[1]` refers to the value of the second element (without changing it, since there's no equals sign).

In contrast to strings (which are also sequences), lists support several methods that can change their contents, including `append`, `insert`, and `pop`.

Since each list entry is like a variable, in a memory diagram each list entry gets its own arrow (or holds a simple value like an integer or Boolean).
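A brief sketch of the three list methods named above, each changing the same list in place:

```python
nums = [1, 2, 3]
nums.append(4)       # adds 4 to the end: [1, 2, 3, 4]
nums.insert(0, 0)    # inserts 0 at index 0: [0, 1, 2, 3, 4]
last = nums.pop()    # removes and returns the last element (4)
first = nums.pop(0)  # removes and returns the element at index 0 (0)
print(nums, last, first)  # [1, 2, 3] 4 0
```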
- Loop
-
A code construct that allows you to repeat a block of code.
The two types of loops in Python are for loops and while loops. Both types have a loop body which is a block of code that gets repeated.
- Loop Condition
-
In a while loop the repetition is controlled by a loop condition which appears after the word `while` and before the colon in the `while` statement at the top of the loop. Before the first and each subsequent iteration, the condition is simplified, and if it is truthy then the iteration proceeds, but if it's not, the loop stops.

Because the loop keeps iterating as long as the condition is truthy at the start of each iteration, the loop condition can also be called a continuation condition.
Note that the condition is only checked at the beginning of each iteration. If the condition becomes false during the execution of the body of the loop, the loop will still continue to the end of that iteration (and if the condition becomes true again before the end of that iteration, it will continue to the next iteration as well).
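This sketch shows the condition becoming false partway through an iteration while the rest of that iteration still runs; the variable names are arbitrary.

```python
x = 0
seen = []
while x < 3:        # checked only at the start of each iteration
    x = x + 5       # now x < 3 is False...
    seen.append(x)  # ...but the rest of this iteration still runs
print(seen)  # [5]
print(x)     # 5
```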
- Loop Sequence
- Loop Variable
-
In a for loop, the loop is controlled by a loop sequence and the elements of that sequence are stored, one by one, in a loop variable.
- Method
-
A function that's associated with a particular data type and which is used by writing a dot after a value of that type followed by the method name and then parentheses where arguments may be supplied.
For example, in this code:
```python
x = 'HEllo'.upper()
y = x.lower()
```
there are two methods being used: the `.upper` and `.lower` methods for the string type. These methods are associated with the string type because after simplifying, the value which appears before the dot in each case is a string. Neither accepts any arguments, but the parentheses are still required in order to call them. The whole expression before the dot, plus the method and its arguments, will simplify to the result of the method (established via `return` as usual for a function). The `upper` method returns an upper-case version of the string it's applied to, while `lower` returns a lower-case version, so in this code, `x` will have the value `'HELLO'` while `y` will have the value `'hello'`.
- Memory Diagram
-
When diagramming a list, m
- Mutable
- Immutable
-
Every Python value is either mutable or immutable. A mutable value is one that can be changed, an immutable value cannot be changed. This concept is very slippery, however, because a variable can always be assigned a new value. So variables are always able to change, even when the value inside them may not be.
The distinguishing feature of a mutable value is thus that it's possible to change the value without needing to create a new variable or change the value of a variable. For example, a list is a mutable value, and we can append to it or pop from it which changes it, without changing any variables, as in this code:
```python
orig = [1, 2, 3]
nums = orig     # an alias of the original
nums.append(4)  # value of 'nums' has changed, without writing "nums ="
                # value of 'orig' has also changed since both variables
                # store the same value
```
In contrast, a tuple is an immutable value, so there is no way to change it: we cannot append or pop, and to "add to" a tuple, we need to construct an entirely new, longer tuple based on the original one. We could do:
```python
orig = (1, 2)
nums = orig           # an alias of the original
nums = nums + (3, 4)  # 'nums' gets a new value based on old value, but
                      # the value of 'orig' is not changed
```
This means that when dealing with a mutable type, it's possible to use any alias to change the value, but for an immutable type, even if an alias is created, there's no way to use that alias to change the value stored in another variable, and this limitation is one of the primary advantages of immutable types because it prevents certain types of subtle errors.
In Python, tuples and strings are immutable, along with integers and floating-point numbers. Other data types like lists and dictionaries are mutable.
Most immutable types are hashable whereas most mutable types are not.
One final wrinkle: for immutable types that can contain other values, like tuples, it's possible to store a mutable value inside them, such that the entire value cannot be changed at the top level, but parts within it can be. In this case, the entire value will no longer be hashable.
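A minimal sketch of that wrinkle, using a made-up tuple that holds a list: the inner list can change, the tuple's own slots cannot, and the tuple cannot be hashed.

```python
t = ([1, 2], 'x')
t[0].append(3)  # allowed: changes the list stored inside the tuple
print(t)        # ([1, 2, 3], 'x')

try:
    t[0] = [9]  # not allowed: the tuple itself is immutable
except TypeError as error:
    print("TypeError:", error)

try:
    hash(t)     # not allowed: the tuple contains a mutable list
except TypeError as error:
    print("TypeError:", error)
```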
- Nested
-
Two things are nested if one is inside the other.
This can apply to data as well as code. For example, one conditional could be nested inside another, or one list could be nested inside another list, or a conditional could be nested inside of a loop.
The term is usually used when two of the same thing appear, one inside the other, but that's not necessary for it to apply.
An example of a nested conditional:
if x
- Range
-
A sequence of integers constructed using the built-in `range` function.
- Sequence
-
Any kind of value that can be viewed as a series of individual parts in some order. Strings, lists, and tuples are all sequences. Indexing requires a sequence to operate on, but works the same way on any kind of sequence. Different kinds of sequences are composed of different individual elements: tuples and lists can have any kind of value stored in each spot, while the parts of a string are themselves single-letter strings. Every sequence is a collection but not every collection is a sequence (unordered collections are not sequences).
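A quick sketch of `range` producing integer sequences, and of indexing working the same way on a string, a list, and a tuple (the example values are arbitrary):

```python
print(list(range(5)))         # [0, 1, 2, 3, 4]
print(list(range(2, 10, 3)))  # [2, 5, 8]

# Indexing works the same way on different kinds of sequences:
word = 'cat'
nums = [10, 20, 30]
pair = ('left', 'right')
print(word[0], nums[0], pair[0])     # c 10 left
print(word[-1], nums[-1], pair[-1])  # t 30 right
```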
- String
-
In Python, a string is a kind of sequence and so it can be indexed. In a somewhat weird twist, the elements of a string are also strings: they're single-letter strings. For example:
```python
var = 'word'
print(var[2])  # prints 'r' (counts from 0)
oneLetter = var[2]
print(oneLetter)  # also prints 'r'
print(oneLetter[0])  # also prints 'r'
print(oneLetter[0][0][0][0][0])  # also prints 'r'
```
Since a character extracted from a string still counts as a string, you can technically extract the "first letter" of a single-letter string as many times as you want and you'll still have the same string.
Python strings are made of Unicode characters, so each element can represent any character that's part of the Unicode standard, including symbols from many different languages as well as things like mathematical symbols and emojis. However, Python strings don't contain formatting information, so you can't specify a font or make them bold or italic.
- Tuple
-
An immutable data type that holds multiple values in a fixed order.
Works a lot like a list, except that you cannot add, remove, or change the values in it. A tuple is defined using parentheses with commas to separate the values, or in some cases, commas alone can be used. Some examples:
- `(1, 2, 3)` is a tuple with three integers in it.
- `1, 2, 3` also defines the same tuple.
- `('hi', 'bye')` is a tuple with two strings in it.
- `([1, 2, 3], [4, 5, 6])` is a tuple containing two lists. Because lists are mutable, this tuple won't be hashable.
- `()` is a zero-element tuple.
- `(1,)` is a single-element tuple. The comma is necessary, because otherwise the parentheses are treated as redundant grouping parentheses.
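The sketch below checks several of those examples directly, including what happens without the comma and what happens on an attempt to change a tuple:

```python
t = (1, 2, 3)
u = 1, 2, 3      # commas alone also make a tuple
empty = ()       # zero elements
single = (1,)    # one element: the comma is required
notATuple = (1)  # just the integer 1 inside grouping parentheses

print(t == u)                         # True
print(len(empty), len(single))        # 0 1
print(type(single), type(notATuple))  # <class 'tuple'> <class 'int'>

try:
    t[0] = 99    # tuples are immutable
except TypeError as error:
    print("TypeError:", error)
```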
- While Loop
-
A loop using the `while` keyword which iterates its loop body repeatedly as long as its loop condition continues to be truthy.

The condition is only evaluated once before each iteration of the loop, not continuously as the loop body executes, so even if it becomes false during an iteration, that iteration will continue to the end of the loop body. For example:
x = 2 while x
Memory Model
- Alias
-
Two variables are called aliases of one another when they both refer to the same object.
In this situation, they will always be equivalent, since they are actually identical. Using the `id` function on either variable will yield the same result.

Contrast with clones.
- Clone
- Copy
-
Two objects are clones of each other if they are equivalent and indistinguishable. If we inspect their values by printing them, the situation may seem the same as with aliases. However, because they are really two different objects, they'll have different IDs, and if they are mutable and we modify one, they will no longer be equivalent.
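A minimal sketch contrasting an alias with a clone; `list(orig)` is one way to build a new list with the same elements, and the variable names are arbitrary.

```python
orig = [1, 2, 3]
alias = orig        # same object
clone = list(orig)  # a new, equivalent list object

print(alias == orig, id(alias) == id(orig))  # True True
print(clone == orig, id(clone) == id(orig))  # True False

orig.append(4)
print(alias)  # [1, 2, 3, 4]: the alias sees the change
print(clone)  # [1, 2, 3]: the clone does not, so it's no longer equivalent
```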
- ID
- Identity
-
Each object has an identity which by default is just the memory location at which it is stored. The `id` function returns the identity of any object.

Because two expressions that refer to the same object will have the same ID, whereas two expressions that refer to different objects will have different IDs, the `id` function can be used to determine whether two equivalent expressions are clones or aliases.
- Instance
-
One specific value, relative to a type.
Each object is an instance.
For example, if we write `x = 1`, then the value `1` which is stored in the variable `x` is an instance of the integer type. Note that the object/value is an instance, not the variable.

If we write:

```python
x = 1
y = x
```

Then the two variables `x` and `y` are each holding the same instance. For mutable types like lists, the notion of an instance is more important, since you can have two equivalent instances which are not the same object. If we write:

```python
nums = [1, 2, 3]
nums2 = [1, 2, 3]
nums3 = nums
```

then in this case we have two list instances, each of which contains the numbers 1, 2, and 3 in order. These two instances are equivalent to each other, so they count as clones. We also have three variables. Variable `nums3` is an alias of `nums` (and vice versa).
- Object
-
A value which is stored in memory along with extra bookkeeping information. In Python, all values are objects. Other programming languages make a distinction between objects and primitive types, but Python does not have this distinction. A class defines a new type which can then be used to construct new, custom objects.
If we say that two things are "the same object" then we mean that they are stored in the same place in memory and thus have the same ID. For mutable objects, since they are aliases of each other, any change made via one reference will also be visible via the other reference.
In contrast, if we say that two things are "the same value," then we mean that they are equivalent, which does not distinguish whether they are clones or aliases of each other. For example:

```python
x = [1, 2]
y = [1, 2]
z = x
```

In this example we have two list objects, each with two slots containing integers. Because of how the standard Python implementation handles small integers, there are actually only two integer objects, which are shared between the two lists. Since integers are not mutable, this doesn't matter much. We can say that `x` and `y` have "the same value" but they are not "the same object." On the other hand, `x` and `z` are "the same object" and are aliases of each other. `x` and `z` of course also have "the same value."
- Refer
- Reference
-
A variable refers to the object that it contains.
When we say that a variable "holds" or "contains" a particular object, those are other ways of saying it refers to that object.
Variable reference has a different, unrelated meaning: each variable reference is one place in your code that the name of a variable is written. Because there can be both local and non-local variables with the same name, the same written letters in code can refer to different variables depending on their context.
- Variable
-
Variables actually refer to objects: specific places in memory where values are stored. This distinction is important in situations where cloning occurs: two different variables can refer to two different equivalent objects, so that they have "the same value" but refer to different objects.
Dictionaries and Sets
- Collection
-
The keys and values of a dictionary are collections. The items in a set are a collection.
- Hash
- Hashable
-
A hash is an integer that is computed based on some more complicated value such that you get the same integer every time for the same value (at least within a single run of a program), but different values are likely to yield different integers. Python's built-in `hash` function can give you a hash from any hashable value.

A value is hashable if a hash can be created from it. Values that are mutable are usually not hashable, because if we got a hash and then changed the value, the hash would be invalid. To avoid that situation, Python disallows the use of the `hash` function on its built-in mutable types, making them unhashable. Even if a value has an immutable type, it may still not be hashable if it contains another value that is mutable. So a tuple containing integers or strings would be hashable, since those types are immutable (and thus hashable themselves), but a tuple that contained a list would not be hashable, since lists are mutable and not hashable. Technically, it's possible to create a custom type that's both mutable and hashable, but none of Python's built-in types are like that.

Hash values are used to speed up comparisons between complex values. If two values have different hashes, we know they cannot be the same as each other. If they have the same hash, then we need to double-check that they're really equivalent, since it's possible for two different values to have the same hash, but since this is unlikely, we can save a lot of time by using hashes for comparisons. This idea is part of the reason that dictionary lookup can be so fast.
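This sketch shows hashability in practice with dictionary keys: a tuple of integers works as a key, while a list is rejected with a `TypeError`. The dictionary name and contents are made up.

```python
print(hash((1, 'a')) == hash((1, 'a')))  # True: equivalent values, equal hashes

locations = {}
locations[(3, 4)] = 'treasure'  # a tuple of integers is hashable, so it works as a key
try:
    locations[[3, 4]] = 'oops'  # a list is mutable and unhashable
except TypeError as error:
    print("TypeError:", error)
print(locations)  # {(3, 4): 'treasure'}
```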
- Key
-
A hashable value used in a dictionary to keep track of an associated value so that we can later look up the associated value using the key.
For example, if we wanted to store a record of how many points were scored by different players on a sports team, we could use players' names as keys and integers as the associated values.
Anything used as a key in a dictionary must be hashable.
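For example, a sketch of the sports-team idea, with made-up player names as keys and point totals as values. Tuples also work as keys because they're hashable, but trying to use a list as a key raises a TypeError.

```python
points = {'Ana': 12, 'Ben': 7}        # player name -> points scored
points['Ana'] = points['Ana'] + 3     # look up with the key, then store a new value
points['Cai'] = 4                     # a new key adds a new key-value pair
print(points)                         # {'Ana': 15, 'Ben': 7, 'Cai': 4}

gamePoints = {('Ana', 1): 12}         # tuple key: (player, game number)
print(gamePoints[('Ana', 1)])         # 12

try:
    points[['Ana']] = 1               # a list can't be a key
except TypeError:
    print('lists are unhashable, so they cannot be keys')
```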
- Lookup
-
The process of using one piece of information, called a key, to find an associated piece of information (called that key's value). It can also refer to searching for a value that matches some criteria within a collection of values.
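To contrast the two kinds of lookup, here's a sketch that searches a list of (key, value) pairs one by one, next to a dictionary holding the same associations. The function name findInPairs is hypothetical. A dictionary does the same lookup directly with square brackets, or with its get method, which returns None for a missing key instead of raising a KeyError.

```python
def findInPairs(pairs, key):
    """Hypothetical helper: linear search through (key, value) pairs."""
    for k, v in pairs:        # may have to check every pair
        if k == key:
            return v
    return None               # no pair had a matching key

pairs = [('Ana', 12), ('Ben', 7), ('Cai', 4)]
scores = dict(pairs)          # same associations, stored as a dictionary

print(findInPairs(pairs, 'Ben'))  # 7
print(scores['Ben'])              # 7, found via the key's hash instead of a loop
print(scores.get('Zed'))          # None: missing key
```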
- Dictionary
-
A mutable data type that can associate keys with values, and quickly look up a value based on a key.
When keeping track of associations, a dictionary is more efficient than a list, because to look up something in a list requires iterating through it.
The keys used in a dictionary must be hashable, so tuples are often used when complex keys are needed. A dictionary can only associate a single key with one value at a time. However, it's possible to store any kind of value under a key, so to store "multiple values" we can store a single list and then put more than one value inside that list.
Dictionaries are declared using curly braces, with colons separating keys from values and commas separating key-value pairs from each other.
Some examples:
scores = {'abc': 3, 'xyz': 4, 'tlmn': 12}  # string keys, integer values
mixed = {1: 'hi', 's': 4.5, (4, 5): ['one', 'two']}  # This dictionary has lots of different kinds of keys and values
abcScore = scores['abc']  # look up value for an existing key; value is 3
scores['newkey'] = 5  # Brackets on left of = set the value for a new key
scores['abc'] = 8  # Same syntax updates an existing key's value
See this quickref section on dictionary methods for things you can do with a dictionary besides using square brackets to add, change, or look up values.
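As described above, a dictionary holds only one value per key, but that value can be a list. This sketch (with made-up team and player names, and a hypothetical helper addPlayer) stores several players under each team, and also uses tuples as keys for a grid of positions.

```python
def addPlayer(teams, team, player):
    """Hypothetical helper: append player to the list stored under team."""
    if team not in teams:
        teams[team] = []          # first player for this team: start a new list
    teams[team].append(player)    # the key still maps to one value: a list

teams = {}
addPlayer(teams, 'red', 'Ana')
addPlayer(teams, 'red', 'Ben')
addPlayer(teams, 'blue', 'Cai')
print(teams)  # {'red': ['Ana', 'Ben'], 'blue': ['Cai']}

grid = {(0, 0): 'start', (2, 3): 'goal'}  # tuple keys: (row, column)
print(grid[(2, 3)])                        # goal
```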
- Difference
- Symmetric Difference
-
How much two things differ, or how far it is between them. For example, the difference between 5 and 3 is 2.
When used with sets, there are two kinds: normal set difference and symmetric difference. Set difference refers to all elements that are in one set but not in another. This notion isn't symmetric, since the set difference between A and B doesn't include elements in B that aren't in A, and vice versa. Symmetric difference of sets is all elements that are in either set but not in both, and as its name indicates, it's symmetric.
The - operator is used to compute differences for both sets and other types; the ^ operator is used for symmetric set differences.
For example, if we have sets A = {1, 2, 3} and B = {2, 3, 4}, then the difference A - B is the set {1}, the difference B - A is the set {4}, and the symmetric difference A ^ B (equivalent to B ^ A) is the set {1, 4}.
- Intersection
-
The overlap between two sets. This is a set that has all of the elements that appear in common between two sets, but which does not include any elements that are only in one of the two. The & operator is used to get the intersection of two sets.
For example, if we have sets A = {1, 2, 3} and B = {2, 3, 4}, then the intersection A & B is the set {2, 3}.
Contrast with union and difference.
- Set
-
A collection of hashable values.
Unlike a list, the values in a set are not ordered, so a set is NOT a sequence, and you cannot index it to retrieve an element. Sets are useful in cases where order doesn't matter and the in operator is used, as they can return an answer faster than a list. Sets cannot have repeated elements: adding the same element to a set more than once has no effect.
A set is constructed using curly braces, without the use of colons to separate keys and values (which would be a dictionary instead). For example:
nums = {1, 2, 3}
nums.add(4)  # add new element
nums.add(3)  # already in set; no effect
print(4 in nums)  # True
print(5 in nums)  # False
empty = set()  # Can't use '{}' for an empty set; that creates an empty dictionary
Sets can be combined using a number of methods/operators like |, & and -.
- Union
-
The combination of two sets. This is a set that has all of the elements that appear in either of two sets. The | operator is used to get the union of two sets.
For example, if we have sets A = {1, 2, 3} and B = {2, 3, 4}, then the union A | B is the set {1, 2, 3, 4}.
Contrast with intersection and difference.
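The set operators from the Difference, Intersection, and Union entries can be checked directly in Python. This sketch uses the same example sets; sorted is used only so the printed order is predictable, since sets themselves are unordered. The named methods at the end are equivalent to the operators.

```python
A = {1, 2, 3}
B = {2, 3, 4}

print(sorted(A | B))  # [1, 2, 3, 4]  union
print(sorted(A & B))  # [2, 3]        intersection
print(sorted(A - B))  # [1]           difference
print(sorted(B - A))  # [4]           difference the other way
print(sorted(A ^ B))  # [1, 4]        symmetric difference

# Method versions of the same operations (prints True True True True)
print(A.union(B) == A | B, A.intersection(B) == A & B,
      A.difference(B) == A - B, A.symmetric_difference(B) == A ^ B)
```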
- Value
-
We also use the term "value" to refer to an item stored in a dictionary associated with a particular key. Since those items are themselves values in the original sense, this isn't that confusing, although technically, the keys of a dictionary are themselves also values in the original sense, which does make the overlap tricky to think about.
Files and Directories
- Absolute Path
-
A path that starts from the root of the file system.
For example, on Windows, "C:\Users" is an absolute path because it starts with "C:", which identifies the drive to look in. On MacOS, "/home" would be an absolute path, because it starts with /, which is the root of the file system.
- Ancestor
-
A node in a tree which is somewhere above another node, called its descendant.
- Child
-
A node in a tree which is a direct descendant of another node, called its parent.
In a tree, each node may have zero or more children, but each child node has exactly one parent.
Since a file system is a kind of tree, this term (or more explicitly child directory) may be used to refer to a subdirectory.
- Close
-
Ends a file object's access to a file, forcing any pending data for that file to be written.
If a file has not been closed yet, pending data may not yet be in the file, and if the program crashes, this may lead to file corruption, so it's important to eventually close every file you open. Using a with statement can ensure that this will happen automatically.
- Context Manager
-
An object that manages the context of a with statement.
Python will automatically call certain methods of this object when entering and exiting the with block. For example, when a file object is used as a context manager for a with block, it will automatically be closed when control flow exits that block.
- Current Directory
- Current Working Directory
-
The directory that Python is working in right now.
By default, any files you create will be created in this directory, and if you attempt to import any modules or open any existing files, they must be in this directory in order to be loaded (although built-in modules and modules installed via the package manager can be imported from elsewhere).
The os.getcwd function returns the path to this directory. At the start of a path string, a single period refers to this directory, so for example the path "./a/b.txt" refers to a file named "b.txt" that's in a subdirectory named "a" which is in the current directory.
- Data
-
Aggregate term for information, especially when it's stored in a computer.
Python objects/values are highly structured information that we can act on using operators and functions to produce other values, but the term data refers to any kind of information. When loading data from a file, Python can't break it down into objects automatically, so it gets loaded as one big string instead. If you know the format of the data, you can write code that will break apart the string into objects, or in some cases, like with CSV or JSON data, Python has built-in functions that can do that for you.
- Descendant
-
A node that's below another node in a tree.
A direct descendant (also called a child) is directly connected to its parent node; an indirect descendant is connected via at least one other intermediary node.
If node A is a descendant of node B, then we can also say that node B is an ancestor of node A.
- Directory
- Folder
-
A named collection of files and/or sub-directories which is part of the file tree.
This is an operating system concept, not a Python type. Each file system has a root directory, and then because directories can contain other directories, the file system forms a tree structure.
- File
-
Files are stored in a file system and organized within directories into a file tree.
- File Input/Output
-
Operations that read from or write to one or more files.
The built-in open function is used to create a file object in either "read mode" or "write mode" that can then be used to either read or write data from/to a file. Unless using a special parser, data read from a file is read as a string, and data written to a file must be a string. We can therefore think of what's stored in a file as one long string, to be re-interpreted by a parser (and/or constructed by a formatter) if necessary.
- File Object
-
A special object returned by the open function which can be used to read from or write to a file.
A file object is not synonymous with the file that it reads from/writes to. For example, one can have multiple file objects that read from or write to the same file at once. File objects provide methods like .read and .write which cause data to be read from or written to the underlying file. Note that data may not actually be written to the file until the file object is closed using the .close method; this is called "buffering." Code will typically do something like:
fileObject = open('myFile.txt', 'r')  # create file object in read mode
fileContentsString = fileObject.read()  # read from the file; store data in Python variable
fileObject.close()  # file object no longer needed
print(fileContentsString)  # use data read from file
File objects can be used in with statements to ensure they'll be closed properly when no longer needed.
- File System
-
An organized collection of files stored in a computer, typically in a partition on a disk/drive.
Files in a file system are normally organized using directories into a file tree.
- File Tree
-
A collection of files organized in directories starting from a root directory.
Each file belongs to a particular directory, and each directory (except the root directory) belongs in exactly one other directory, so the whole structure has the form of a tree, where the parent of each file is a directory, and each directory has a single parent directory until you get to the root directory.
This structure mirrors the organization of real-life paper files within folders in a filing cabinet, and by grouping files together conceptually, can help people remember where to find the files they're looking for. Folder names could refer to the type of files they contain (e.g., 'documents'), the origin of those files (e.g., 'downloads'), when/where those files were created (e.g., '2015_trip'), etc. At each layer of the tree, you only have to be able to pick the correct folder at that layer to find your file, so even if you don't remember everything about the file you're looking for, a file tree can help you find it by reminding you of the different sub-categories relevant to that file.
- Leaf
-
Within a tree, a node that has no descendants.
Unless a tree consists of just a single node (which would be a leaf), each leaf in a tree has a parent node.
- Node
-
An individual element within a larger collection, particularly one that's organized into a graph or tree.
Because files and directories are organized into a file tree, they both count as nodes.
In a tree, nodes are distinguished as either leaf nodes if they have no descendants or branch nodes if they have at least one descendant.
- Open
-
The process of establishing a connection to a file so that data can either be read from or written to it.
The built-in open function does this and returns a file object that can be used to read or write data, depending on whether the file was opened in read mode or write mode.
Whenever a file is opened, it should eventually be closed. Using a with statement can ensure that this happens automatically.
- Parent
-
A node in a tree which has at least one child node.
A node which isn't a parent is a leaf. A parent node B (of node C) may also be a child node (of another node A).
Since a file system is a kind of tree, this term (or more explicitly parent directory) may be used to refer to the directory that contains a particular file or sub-directory.
- Path
- File Path
-
A string which contains directory names separated by either / on Mac/Linux or \ on Windows, possibly with a file name at the end. Used to specify a file or directory, either relative to the current directory, or as an absolute path starting from the root directory of the file system.
The special string '..' means "the parent of the current/previous directory" and can be used multiple times to refer to farther ancestor directories. So for example, the path:
../../homework/cs111/assignments.txt
refers to a file named "assignments.txt" which is located in a directory named "cs111" which is in the "homework" directory that's in the grandparent of the current directory.
- Read (file operation)
-
Take data stored in a file and copy it into Python as a value that can be used within a program.
Since reading from a file makes a copy of the data, changing the variable where you store the result of the read does not change what's in the file.
In Python, the default mechanisms for reading, like the read method of a file object, automatically decode data stored in the file into a string, using a default encoding like UTF-8.
- Relative Path
-
A path which does not start at the root of the file system. Any path which is not an absolute path is a relative path.
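Python's pathlib module (a tool not otherwise covered on this page) can check whether a path is absolute without looking at any real files. PurePosixPath follows Mac/Linux rules and PureWindowsPath follows Windows rules, so this sketch gives the same answers on any computer. Joining os.getcwd() with a relative path is one way to turn it into an absolute path.

```python
import os
from pathlib import PurePosixPath, PureWindowsPath

print(PurePosixPath('/home').is_absolute())        # True: starts at the root
print(PureWindowsPath('C:\\Users').is_absolute())  # True: drive plus root
print(PurePosixPath('a/b.txt').is_absolute())      # False: relative path

rel = PurePosixPath('../../homework/cs111/assignments.txt')
print(rel.parts)  # ('..', '..', 'homework', 'cs111', 'assignments.txt')
print(rel.name)   # assignments.txt

full = os.path.join(os.getcwd(), 'a', 'b.txt')  # relative path made absolute
print(os.path.isabs(full))                      # True
```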
- Root
- Root Directory
-
The root of a tree is the only node in that tree which does not have a parent.
The outermost directory in a file system is the root of that file tree and so it is often called the root directory.
- Subdirectory
- Sub-Directory
-
A directory that is contained within another outer directory. Also sometimes called a child directory.
The directory that contains it is called a parent directory.
- Tree
-
A graph in which each node has a single parent node, except for a single root node that doesn't have a parent. Since multiple nodes might share a parent, the whole thing forms a tree-like structure.
Usually drawn upside-down relative to real-life trees, with the root at the top and the leaves at the bottom.
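One way to sketch a file tree in Python (an assumption for illustration, not how the operating system actually stores it) is as nested dictionaries: each directory maps names to its children, and None marks a file. The hypothetical function listLeaves recursively collects the path to every leaf, meaning every file or empty directory.

```python
# Hypothetical file tree: dictionaries are directories, None marks a file.
fileTree = {
    'homework': {
        'cs111': {'assignments.txt': None, 'notes.txt': None},
        'math': {},  # empty directory: also a leaf, since it has no children
    },
    'photos': {'2015_trip': {'beach.jpg': None}},
}

def listLeaves(children, prefix=''):
    """Return the path of every leaf below a directory, given its children."""
    leaves = []
    for name, subtree in children.items():
        path = prefix + '/' + name
        if not subtree:       # None (a file) or {} (empty directory): a leaf
            leaves.append(path)
        else:                 # a directory with children: recurse into it
            leaves.extend(listLeaves(subtree, path))
    return leaves

print(listLeaves(fileTree))
# ['/homework/cs111/assignments.txt', '/homework/cs111/notes.txt',
#  '/homework/math', '/photos/2015_trip/beach.jpg']
```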
- with Statement
-
A statement using the with keyword that sets up a context manager to ensure that certain cleanup steps happen when leaving the associated block of code.
When a file object is used as a context manager, this ensures that the file will eventually be closed, even in cases where control flow leaves the block via a return statement or an exception.
The context manager is assigned to a variable as part of the with statement, so that it can be used within the statement body. For example:
with open('names.txt', 'r') as fileReader:
    for line in fileReader:
        print(line)
- Write (file operation)
-
Take data stored in a Python value and copy it into a file, possibly overwriting old contents of the file.
Because of buffering (see file object), the data may not be written into the file immediately, so it's safest to close the file object you used to write the data before relying on what's in the file.
By default, Python requires you to use string data when writing to a file, and encodes that data using a default encoding like UTF-8.
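Putting the open, write, read, and close steps together, here's a sketch that writes a small file and reads it back using with statements. It uses a temporary directory (via the standard tempfile module) so the example doesn't leave files behind; the file name notes.txt is arbitrary.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'notes.txt')

    # 'w' mode creates the file, or overwrites any old contents
    with open(path, 'w') as writer:
        writer.write('first line\n')   # data must be a string
        writer.write('second line\n')
    print(writer.closed)  # True: leaving the with block closed (and flushed) the file

    with open(path, 'r') as reader:
        contents = reader.read()       # the whole file as one string

print(contents.splitlines())  # ['first line', 'second line']
```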
Recursion
- Base Case
- Recursive Case
-
A base case is a case in a recursive function where no recursive call gets made.
A recursive case is a case where there is at least one recursive call.
Recursive functions typically start with a conditional which allows them to respond differently for different argument values. For some values, they will make a recursive call and thus revisit the conditional in another function call frame with different arguments, while for other values, they will not need to do this and can just do the necessary work immediately without recursion. The cases where recursive calls are needed are called recursive cases and the ones where they aren't needed are called base cases. If a function has only recursive cases, then it will always create new recursive function call frames, and eventually Python will hit its limit on the number of function call frames allowed at once, resulting in a RecursionError. This can also happen even if there are one or more base cases, if the pattern of arguments established by the recursive cases never reaches any base case.
The code for the base case(s) is usually relatively simple, and it's often good to start a recursive definition by defining one or more base cases. Having more than one base case can sometimes prove helpful, although many problems only need one.
The case-by-case method for solving recursive problems begins by defining multiple base cases which help you see what pattern of recursive cases will be necessary, and then simplifying by removing unnecessary base cases once a suitable recursive case has been defined.
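As a small sketch of the base case/recursive case pattern (the function name sumList is just for illustration), this function adds up a list: the empty list is the base case, and every other list is handled by adding its first element to the sum of the rest.

```python
def sumList(numbers):
    if len(numbers) == 0:   # base case: no recursive call
        return 0
    else:                   # recursive case: a smaller list each time
        return numbers[0] + sumList(numbers[1:])

print(sumList([4, 5, 6]))  # 15
print(sumList([]))         # 0
```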
- Fractal
-
A self-similar structure, meaning that one or more smaller part(s) of the structure are reminiscent of the whole.
This self-similarity can persist at multiple (or even theoretically infinite) scales.
Many natural structures are fractal, for example trees: The entire tree is made up of a trunk that splits into branches, but those branches also split into smaller branches, which split into twigs, etc. A zoomed-in view of a small branch that splits into 3 twigs is similar to a zoomed out view of a whole tree that splits into 3 branches, even though they're not exactly the same.
When we call a recursive function, the resulting process is fractal, in the sense that parts of the process are reminiscent of the whole process. For this reason, recursive functions are often used to draw fractals.
- Infinite Recursion
-
When a recursive function either does not have a base case, or it has a base case which is never reached, theoretically function call frames will keep generating more of themselves indefinitely.
Python actually has a limit on the number of function call frames allowed at once, and you will get a RecursionError if you exceed this limit, which any infinitely recursive function will do. Some programming languages are able to analyze the correct result of an infinitely recursive function in some cases, but Python does not support this, so we need to avoid infinite recursion when programming in Python.
- Recursion
- Recursive
-
A repetitive process where one step in a larger process has the same general sequence as the larger process it is a part of.
In Python, recursion can be achieved by defining a custom function which includes a function call to itself. We call such function calls (which occur within the definition of the function being called) recursive calls, and the entire function is called a recursive function. Take the following count-down function as an example:
def countDown(n):
    if n == 0:  # base case
        print('Blast off!')
    else:  # recursive case
        print(n)
        countDown(n - 1)
Whenever this function is called, unless the argument is 0, it will print whatever that argument is. But then after doing that, it will call itself again, with a smaller value of n. So if we call countDown(3), that will print 3, then it will call countDown(2), which will print 2, then call countDown(1), which will print 1, then call countDown(0), which will finally print "Blast off!" and finish without calling itself any more. The total effect is to print:
3
2
1
Blast off!
Note: The first branch of the if/else statement in this function is called a base case because it does not contain a recursive function call, while the second branch that does contain a recursive call is called a recursive case.
We could have achieved the same result via a loop, like this:
def countDownLoop(n):
    for i in range(n, 0, -1):
        print(i)
    print('Blast off!')
We can think of the process in two different ways. The iterative way of describing the process is: "To count down from N, print each number from N down to 1, and then print 'Blast off!'". The recursive way of describing the process is: "To count down from N, first print N, then count down from N - 1. If N is 0, instead print 'Blast off!'".
A recursive process is a kind of fractal since it is self-similar.
To avoid infinite recursion, we must ensure that every recursive function has a base case, and that the pattern of arguments generated by recursive calls will eventually reach that base case.
A call to a recursive function generates a nested series of function call frames, since the initial call's frame stays active while its recursive call is being resolved, and that one stays active while its recursive call is being resolved, and so on, until the frame which enters the base case. Once that frame returns, control flow returns to the penultimate frame, and so on back through the entire series of frames.
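Here's a sketch of how a base case can exist but never be reached. countDownList is a hypothetical variant of countDown that returns its output as a list instead of printing it. Called with -1, it moves on to -2, -3, and so on, never hitting 0, until Python raises a RecursionError (sys.getrecursionlimit reports the limit, which is 1000 by default in CPython). Using n <= 0 as the base case fixes this.

```python
import sys

def countDownList(n):
    if n == 0:              # base case, but only reached from non-negative n
        return ['Blast off!']
    else:                   # recursive case
        return [n] + countDownList(n - 1)

def safeCountDownList(n):
    if n <= 0:              # base case now also catches negative numbers
        return ['Blast off!']
    else:
        return [n] + safeCountDownList(n - 1)

print(countDownList(3))       # [3, 2, 1, 'Blast off!']
print(sys.getrecursionlimit())

try:
    countDownList(-1)         # -1, -2, -3, ... never reaches 0
except RecursionError:
    print('infinite recursion: base case never reached')

print(safeCountDownList(-1))  # ['Blast off!']
```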
Data Formats
- Bit
-
The smallest unit of computer memory and thus also data, holding a single 1 or 0.
On a disk, memory is usually organized into bytes, and it's usually not possible to read or write individual bits. Grouped together, bits can represent numbers, letters, or more complex values via an encoding scheme.
- Byte
-
A group of eight bits.
On some historical systems bytes were sometimes made up of more or fewer bits, but eight has emerged as a nearly universal modern standard.
One byte is enough to encode a single letter in the extended ASCII encoding.
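To see the bits inside a byte, this sketch encodes the letter 'A' as a single byte and displays that byte's eight bits. With eight bits, a byte can hold 2 ** 8 = 256 different patterns, which Python shows as the integers 0 through 255.

```python
letterBytes = 'A'.encode('ascii')  # a byte string containing one byte
print(len(letterBytes))            # 1
value = letterBytes[0]             # indexing a byte string gives an integer
print(value)                       # 65
print(format(value, '08b'))        # 01000001: the eight bits of that byte
print(2 ** 8)                      # 256 possible values, 0 through 255
print(bytes([value]))              # b'A': back from the integer to a byte
```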
- CSV
- TSV
- Comma-Separated Values
- Tab-Separated Values
-
Comma-separated values (or CSV) is a file format for data organized into tables, in which entries that should be in different columns are separated by commas, while entries in different rows are on separate lines.
Because different entries may have different widths, you can't necessarily see things lined up within columns in a CSV file, but common programs like Microsoft Excel, Google Sheets, and Numbers on a Mac can all load CSV files and display their entries as tables.
In general, all entries are assumed to be strings, but some programs will load entries that can be converted into numbers as integers and/or floating-point numbers.
Interesting edge cases include:
- How can we encode an entry that includes a comma as part of its text?
- How can we encode an entry that includes a newline as part of its text?
In both cases, if an entry starts with a double-quote right after the previous comma (or at the start of a line) then that entry continues until the next double quote, even if that includes commas and/or newlines. This begets yet another question:
- How can we encode a value which includes a comma, a newline, and a double-quote as part of its text?
To solve this problem, when an entry starts (and thus also ends) with a double-quote, then within that entry two double-quotes in a row will be interpreted as a single double quote that's part of that entry's text, rather than as the end of the entry.
CSV is a common format for tabular data, because it can be read and written by many different programs, and because it can be viewed or edited in a normal text editor if necessary; plus, since it's just text data in a specific arrangement, it can even be copy-pasted via the clipboard. In Python, the built-in csv module allows you to easily read and write CSV files.
The related tab-separated values (or TSV) format uses tabs instead of commas to separate entries, but is otherwise pretty much the same. An advantage is that tab characters appear in table entries much less frequently than commas, and so fewer entries need to be quoted on average. A disadvantage, though, is that the entry separators aren't immediately visible to users viewing the file in a text editor.
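Here's a sketch of the edge cases above using the built-in csv module. To stay self-contained it reads from an in-memory string with io.StringIO rather than a real file (when using a real file, the csv documentation recommends opening it with newline=''). The data is made up.

```python
import csv
import io

text = (
    'name,quote\n'
    'Ada,"Hello, world"\n'          # comma inside a quoted entry
    'Bob,"Line one\nline two"\n'    # newline inside a quoted entry
    'Cy,"She said ""hi"""\n'        # doubled quotes mean one literal quote
)
rows = list(csv.reader(io.StringIO(text)))
for row in rows:
    print(row)  # each row is a list of strings

# Writing the rows back out applies the same quoting rules automatically.
out = io.StringIO()
csv.writer(out, lineterminator='\n').writerows(rows)
print(out.getvalue() == text)  # True

# TSV: same module, with a tab as the delimiter
print(list(csv.reader(io.StringIO('a\tb\n1\t2\n'), delimiter='\t')))  # [['a', 'b'], ['1', '2']]
```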
- Data
-
Technically, all digital data is composed of bits, usually grouped into bytes. In Python, a byte string can be used to represent this low-level view of data.
- Decode
- Encode
- Formatting
- Parsing
-
Decoding is the process of taking encoded data and turning it back into one or more objects according to the rules of a specific format (also called an "encoding scheme").
Encoding is the reverse process of arranging one or more objects into data.
Parsing is another term for decoding, usually used for more abstract or complex decoding, for example turning a string containing statements in a programming language into a program.
Formatting is another term for encoding. Like parsing, it's usually used for more abstract/complex encoding processes.
Usually the result of encoding is a byte string that specifies a specific sequence of bytes that can be stored in a file, although in the abstract we can talk about encoding from some more complex object (for example, a tree stored as a dictionary) into some simpler object (like a string or a list) that's still more complex than raw bytes.
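A sketch of encoding and decoding text with the str.encode and bytes.decode methods. Decoding the same bytes with the wrong format (here Latin-1 instead of UTF-8) produces different, garbled text, which is why you need to know the format of some data before decoding it.

```python
text = 'café'
data = text.encode('utf-8')   # encode: string -> bytes
print(data)                   # b'caf\xc3\xa9': 'é' takes two bytes in UTF-8
print(len(text), len(data))   # 4 5

print(data.decode('utf-8'))   # café: decoding with the matching format
print(data.decode('latin-1')) # cafÃ©: same bytes, wrong format

try:
    text.encode('ascii')      # ASCII has no code for 'é'
except UnicodeEncodeError:
    print('cannot encode this string as ASCII')
```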
- Decoder
- Parser
-
A program that can convert one particular kind of data into a specific arrangement of objects. Python has built-in parsers available for common data formats like CSV and JSON.
"Decoder" is more often used when converting low-level data or when the data has a simpler structure, like when converting from bytes to a string, as opposed to converting from a string in a certain format to a more complex group of objects like a list of dictionaries (which would more likely be called a "parser").
- Encoder
- Formatter
-
The opposite of a parser: takes a specific arrangement of objects and turns it into formatted data, which a matching parser could turn back into equivalent objects.
"Encoder" is more often used for lower-level formatters, like one that converts a string into bytes. In contrast, a function that converts a dictionary into a string is more likely to be called a formatter than an encoder.
- Encoding
- Encoding Scheme
- Format
-
A convention for how data should be decoded into objects.
A format/encoding is sometimes also called an encoding scheme.
When data is stored in a file, it is usually organized according to a known format, so that when we load the data from the file, we can convert it back from raw data into objects. Well-known formats like CSV or JSON have built-in functions for parsing them, but even the default process of loading file data as a string involves decoding it from the bytes stored in the file.
Technically, the Python programming language itself is a format, and when you run a Python program, it gets parsed into a series of statements before those statements get executed.
How individual bits/bytes are interpreted as values like integers or strings is also a kind of format/encoding. Strings in particular have multiple possible encodings, like ASCII and UTF-8 (UTF-8 is an encoding for Unicode text, and it's what Python uses by default when converting strings to bytes with methods like encode).
- JSON
- Javascript Object Notation
-
A format for storing several types of common programming values in a file or string based on the Javascript programming language.
Javascript and Python use very similar syntax, so the JSON format also looks a bit like Python code. One difference is that JSON only allows for values to be stored, not expressions or other statements.
JSON supports the following types:
- None, encoded as null.
- Booleans, encoded as true and false.
- Integers and floating-point numbers, encoded as usual in Python (for example, 5 or 3.14).
- Strings, using double quotes only.
- Lists of values, encoded using square brackets and commas as usual in Python.
- Dictionaries, where the keys may only be strings, encoded using curly braces, colons, and commas as usual in Python.
Neither lists nor dictionaries allow an extra comma after the last element, although Python does allow that. No other types are allowed, although it's common to create a system for turning other types into a combination of allowed types and back again for storage as JSON. Notably, tuples and sets cannot be stored directly as JSON, and neither can dictionaries that include non-string keys. (Python's json module quietly converts tuples into lists and keys like integers into strings, so those values don't come back unchanged; sets cause an error.)
The built-in
json
module makes it easy to load values from JSON or store them as JSON. Although JSON has its limitations, since it can store complex objects directly into a file and load them again later with minimal fuss, it's a convenient choice for storing values in a more permanent form. Note that when we store an object in a JSON file, we're making a copy of the data, and the same is true when loading an object from a JSON file. If you store an object and then load it, you'll have a copy of the original, but it won't be the same object. This also means that it's not possible to store objects that contain aliases and have them retain their original structure.
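A sketch of saving and loading values with the json module's dumps and loads functions, which work with strings (json.dump and json.load do the same with file objects). The data is made up. Note how the loaded value is equivalent to, but not the same object as, the original, and how a tuple comes back as a list.

```python
import json

record = {'name': 'Ada', 'scores': [3, 4], 'active': True, 'nickname': None}
text = json.dumps(record)      # format: Python objects -> JSON string
print(text)  # {"name": "Ada", "scores": [3, 4], "active": true, "nickname": null}

loaded = json.loads(text)      # parse: JSON string -> new Python objects
print(loaded == record)        # True: equivalent values
print(loaded is record)        # False: a copy, not the same object

print(json.loads(json.dumps((1, 2))))   # [1, 2]: the tuple became a list
print(json.dumps({1: 'one'}))           # {"1": "one"}: integer key became a string

try:
    json.dumps({1, 2})                  # sets aren't supported
except TypeError:
    print('sets cannot be stored as JSON')
```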
Errors, Testing, and Debugging
- Bisect (debugging)
-
A debugging technique where you first comment out one half of the code and then the other half, in order to locate where an error originates.
Once you know which half of the code the error is in, you can repeat the process with the two quarters that make up that half, and so on, allowing you to quickly narrow down the exact location of an error.
Of course, commenting out half your code is likely to cause more problems than the one that you're trying to locate, so you have to be careful about interpreting the errors you get. Usually it's faster to use targeted probes to establish where an error is occurring, but in extreme cases where that's not working, bisecting the code is an option.
You can also use this with data structures instead of code. For example, suppose your code produces a list with 1 million elements, and then calls an external library function to process the list, but the library function indicates the list is invalid without telling you which element is the problem. Printing out the list is impractical here, but you could instead try to use slicing to send only the first 500,000 elements, then if that has no error, the second 500,000, and whichever of those has an error, start narrowing your slice ranges by halves until you have few enough elements that it's reasonable to print them out and look at them.
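The list-bisecting idea can be automated. In this sketch, batchOk stands in for the external library's check (it's a hypothetical function you would supply; here allInts plays that role), and the sketch assumes a batch fails exactly when it contains at least one bad element. Each step tests the first half of the remaining range and keeps whichever half contains the problem.

```python
def findBadIndex(items, batchOk):
    """Return the index of the first element that makes batchOk fail,
    or None if the whole list passes. Assumes a batch fails exactly
    when it contains at least one bad element."""
    if batchOk(items):
        return None
    lo, hi = 0, len(items)        # items[lo:hi] always contains a bad element
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if not batchOk(items[lo:mid]):
            hi = mid              # problem is in the first half
        else:
            lo = mid              # first half is fine, so it's in the second half
    return lo

# Hypothetical check standing in for the library: every element must be an int.
def allInts(batch):
    return all(isinstance(x, int) for x in batch)

data = list(range(1_000_000))
data[765_432] = 'oops'              # one bad element hidden in a big list
print(findBadIndex(data, allInts))  # 765432
```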
- Exception
-
When an exception occurs, Python starts terminating function call frames one by one, starting with the innermost frame, and records them in a traceback. If it finds that the exception occurred inside of a try block with a matching except block, then instead of crashing, the program will run that except block and continue from after the end of the try statement. A finally block, if one is present, runs whether or not an exception occurred. This means that if you know your code is going to generate an exception under certain conditions, you can customize the response to that exception instead of having the program crash.
- Probe (debugging)
-
To add a statement using the print function to a program in order to display the value of an important variable or expression as it runs, to help debug it.
Sometimes when a program crashes, the traceback will contain enough information to figure out what the error is and fix it. However, if there is an error which does not result in a crash, and/or if the information from the traceback is confusing or insufficient, using a probe to help figure out more information is usually a good next step. For example, consider this program:
def f(a, b):
    return a + b


x = 5
y = 'hello'
print(f(x, y))
If we run it, we get the following traceback:
Traceback (most recent call last):
  File "probe.py", line 7, in <module>
    print(f(x, y))
  File "probe.py", line 2, in f
    return a + b
TypeError: unsupported operand type(s) for +: 'int' and 'str'
Although the traceback lets us know that the a + b operation failed on line 2 in the f function, that's not really the cause of the error, as there's nothing wrong with the f function (other than its mysterious name, poorly named parameters, and lack of documentation). Instead, the issue is that we've tried to mix together two different types of values, but that's due to the particular values of x and y, which aren't shown in the traceback itself. Of course, this simple example is probably one where we could figure things out quickly from looking at line 7 and then above it, but imagine if the definitions of x and y were even farther away from the point of failure. Perhaps they got their values from complex functions that we didn't fully understand...
In order to start figuring out where the real issue lies, given this traceback, it might make sense to probe the values of x and y just before line 7, which we could do by adding a print:
def f(a, b):
    return a + b


x = 5
y = 'hello'
print('XY', x, y)
print(f(x, y))
In this example we've included a tag in the probe so that in the output, we'll know to look for the line that starts with 'XY'.
Normally, adding a probe will give you some extra information about an error, but you may want to add more probes or try other debugging techniques to get a better understanding of an error. If you add a probe but don't see the output from it when you run the program again, there are several things to check for:
- Is it possible you've saved multiple versions of a file, and the one that you're editing is not the same as the one that you're running?
- Did you add the probe at a point in the code after a crash happens?
- Did you put your probe in a function, conditional, or loop? If so, maybe that part of the code isn't actually running at all. Try putting another probe earlier in the code, like at the top of the same function or the top of the whole file.
If you end up adding more probes throughout your code, you may effectively be tracing the code.
- Regression (testing)
-
A situation where an error that occurred earlier in development was fixed, but new changes to the code cause the same error to occur again.
A good suite of test cases helps detect regressions quickly so that harmful changes can be undone before the code is released as a new version and users encounter the error.
- Tag (debugging)
-
A short string that is added into a probe or other debugging print statement, to identify that particular statement within the program's output.
For example, a probe might look like:
print('X', x)
The string 'X' here is the tag. If we didn't use a tag, we could instead write:
print(x)
However, without the tag, once we add more probes, or if other print statements already in the program also produce output, it may be hard to tell which lines of output come from this probe and which come from elsewhere, especially if the value of x is not what we were expecting, or is a hard-to-notice value like an empty string. The tag helps avoid this problem, since it will be very clear which line of output was produced by the probe.
- Test
-
A piece of code designed to check whether some other code works or not.
Writing tests can help us avoid errors, and it's also a good idea to come up with some tests early on in the design process (this is called test-driven development).
When debugging code, it can be helpful to isolate the behavior of one function at a time and create new tests that express what it should do, so that you can tell whether or not the error is happening in that part of the code. Tests also help avoid (or at least quickly detect) regressions.
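As a minimal sketch of what a test can look like in Python, the snippet below checks a small function using plain assert statements. The function name average and the particular test values are hypothetical, chosen only for illustration.

```python
def average(numbers):
    """
    Returns the average (arithmetic mean) of a non-empty list of numbers.
    """
    return sum(numbers) / len(numbers)


# Each assert is a tiny test: it does nothing if its condition is True,
# and raises an AssertionError (crashing the program) if it is False.
assert average([2, 4]) == 3
assert average([5]) == 5
assert average([1, 2, 3, 4]) == 2.5
print('All tests passed.')
```

If a later change to average makes one of these asserts fail, that is exactly the kind of regression the tests are meant to catch.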
- Test-Driven Development
-
A software development strategy that involves writing tests and documentation before the code that will satisfy those tests.
For some kinds of programming, it's often easy to think of concrete examples of what the finished product will be able to do, and laying out a few examples to work from often helps scaffold the work of writing the code that will satisfy those examples.
Test-driven development aims to help you think about edge cases early, as well as nailing down what kinds of parameters your code will accept and what kinds of return values your code will produce. This in turn helps you avoid dead-end design choices and write code faster, plus you'll be able to debug it faster with a ready-to-go suite of tests as soon as it's complete.
It's generally a more challenging approach when it's not easy to specify these things (for example, if you're building a game with lots of randomness, the same inputs don't always produce a predictable output).
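A minimal sketch of the test-driven workflow, using a hypothetical function called countVowels. In practice the docstring and the asserts would be written first (and would fail until the function works), and the body would be filled in afterwards to satisfy them; in the code they appear after the function only because Python needs the function defined before the asserts can run. Writing the tests up front is what surfaces edge cases such as an empty string or uppercase letters.

```python
def countVowels(text):
    """
    Returns the number of vowels (a, e, i, o, u, in either upper or
    lower case) in the given string.
    """
    # Step 2: this body was written after the tests below, to satisfy them.
    count = 0
    for letter in text.lower():
        if letter in 'aeiou':
            count += 1
    return count


# Step 1: these tests were written first, based on the docstring.
assert countVowels('hello') == 2
assert countVowels('') == 0          # edge case: empty string
assert countVowels('AEIOU') == 5     # edge case: uppercase letters
assert countVowels('rhythm') == 0    # edge case: no vowels at all
```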
- Toggle (debugging)
-
Temporarily disabling (or re-enabling) a line of code for debugging purposes by adding a hash sign to turn it into a comment.
This technique is very useful when you're not quite sure exactly what is going on, and/or when there's a line of code that other techniques like probing indicate may be problematic but you're not sure why or how. Completely disabling a line of code and seeing what happens can generate an informative traceback or, even if nothing changes, point towards some other part of the code as the location of the error.
In the extreme case, we can toggle many lines of code at once to hunt down an error, which is called bisecting.
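A minimal sketch of toggling, using a hypothetical program: suppose it printed a negative total and we suspect the discount line. Adding a hash sign at the start of that line disables it without deleting it, so we can rerun the program and compare; removing the hash sign re-enables it.

```python
prices = [10, 20, 30]
total = 0
for price in prices:
    total = total + price
# total = total - 100   # toggled off while debugging; remove the '#' to re-enable
print('TOTAL', total)   # tagged probe: prints TOTAL 60 while the line above is off
```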
- Trace
-
A sequential record of events that happen as part of a larger process.
It's also the name of a debugging technique where you add print calls to your code so that it prints out a trace of its process as it runs. In a loop or recursive function, even adding a single statement that uses print can produce a trace, since that line of code may be executed repeatedly. You can print different kinds of relevant things as part of a trace, such as the values of key variables or just messages to indicate which branch of a conditional was taken (see probing).
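Here is a minimal sketch of tracing a loop, using a hypothetical function that finds the largest number in a list. The single tagged print inside the loop runs once per iteration, so together its output lines form a trace of how the variables change step by step.

```python
def largest(numbers):
    """
    Returns the largest number in a non-empty list.
    """
    best = numbers[0]
    for n in numbers:
        if n > best:
            best = n
        print('TRACE n =', n, 'best =', best)  # runs once per iteration
    return best


result = largest([3, 8, 2, 9])
```

Running this prints four trace lines: TRACE n = 3 best = 3, then TRACE n = 8 best = 8, then TRACE n = 2 best = 8, and finally TRACE n = 9 best = 9.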
- try Statement
- except Statement
- finally Statement
-
A try statement consists of the keyword try followed by a colon and then an indented block. If any code within the block generates an exception, then instead of the program crashing, the rest of the try block is skipped and the first except block that matches the type of the exception runs; afterwards, execution continues with the code after the whole try statement as if the exception had not occurred. This only happens if at least one except block matches the exception. If none of the except blocks match, the exception continues to terminate function call frames until it reaches another enclosing try statement that can handle it, or the program crashes.
If the try statement has a finally block, the code in that block is executed no matter what: if no exception occurs, it runs when control flow reaches the end of the try block; if an exception does occur and gets matched, the finally block runs after the matching except block; and if an exception occurs but is not matched, the finally block runs before the exception terminates the current function call frame. finally blocks are thus useful for ensuring that critical code gets run even if something unexpected happens.
Each except block uses the keyword except followed by one or more exception types (multiple types are written in parentheses, separated by commas). Any exception of one of the listed types, or of a more specific subtype of one of them, will match that block, causing the code in that block to run. Except blocks are checked in order, and only the first matching block gets run.
For example:
try:
    x = 3
    y = x / 0  # causes a ZeroDivisionError
except ValueError:
    print('nope')  # doesn't get run
except ZeroDivisionError:
    print('yup')  # this block matches
except Exception:  # would match nearly any kind of exception
    print('nope')  # doesn't run because the block above it matched first
finally:
    print('done')  # runs at the end, even if there's no exception
- Unpack (debugging)
-
To take a complicated expression and break it down into multiple smaller expressions by storing intermediate results in variables.
Not directly related to unpacking (tuples).
For example, if we have a complex expression like this:
x = (128 * y + int(z)) / 34
We could unpack that into:
yVal = 128 * y
numerator = yVal + int(z)
x = numerator / 34
Unpacking has several benefits:
- If an error is occurring on a complicated line of code, after unpacking it may be easier to see which specific function call or operation is causing the error.
- It offers more opportunities for probing or tracing intermediate values.
- If sub-expressions are being re-used in several places, computing them once and storing them in a variable can help isolate the cause of an error and also helps make the code easier to read and maintain according to the DRY principle.
Program Design Concepts
- Descriptive Naming
-
Picking variable names that describe what kind of values they will hold.
For example, calling something numberOfCopies instead of x. This has a huge impact on the readability of your code and helps make debugging faster. If you find yourself struggling to think of a good descriptive name for a variable, this can be a sign that you need to stop and think more about what you intend to use that variable for, and/or what process is producing its value.
- DRY
- Don't Repeat Yourself
-
A design principle that states that repetition of code should be minimized.
In general, it's good to avoid copy-pasting code, and to use intermediate variables whenever multiple expressions share identical sub-parts. If you find yourself typing similar lines of code again and again, consider bundling them into a function and using parameters to allow a single function to accomplish multiple similar sub-steps shared by different processes.
The benefits of following the DRY principle are:
- Your code will be easier to debug, as errors will be closer to the code that caused them, and you won't have to fix the same error in multiple places throughout your code.
- Your code will be easier to read, since it will be shorter, and it will be broken into smaller pieces that each have a simple purpose, and there won't be sections that repeat the same functionality.
- Your code will be easier to test, since you'll be able to test and verify shared functionality in a single location.
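A minimal sketch of applying the DRY principle, using hypothetical names and values: the first version repeats the same formatting steps for each person, while the second bundles those steps into a function with parameters, so a change to the format only has to be made in one place.

```python
# Repetitive version: the same steps are copy-pasted for each person.
print('Name: ' + 'Sam'.upper() + ' (age ' + str(36) + ')')
print('Name: ' + 'Kai'.upper() + ' (age ' + str(41) + ')')


# DRY version: the shared steps live in a single function.
def describe(name, age):
    """
    Returns a one-line description of a person.
    """
    return 'Name: ' + name.upper() + ' (age ' + str(age) + ')'


print(describe('Sam', 36))  # prints Name: SAM (age 36)
print(describe('Kai', 41))  # prints Name: KAI (age 41)
```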
Extra: Custom Types
(We don't expect you to know these terms as part of this class.)
- Class
-
A statement using the class keyword that defines a custom type; the term also refers to custom types defined this way and to built-in types.
For example, if you run type(4) you'll see the text <class 'int'> because int is a class. In this sense, class is interchangeable with type.
Within a class statement, an indented block defines the fields and methods of the custom type. A special method named __init__ defines how new objects of that type can be created. Just like a function, a class should have a docstring that explains its purpose and something about how it works, although each method should also have its own docstring. __str__ is another special method, which will be called whenever an object of the class needs to be converted into a string (for example, by the built-in str function).
Here's an example:
class Pair:
    """
    A pair of numbers to be displayed with a custom separator between
    them. The `between` field defines the separator that will be used.
    """
    between = "-"
    """
    This field defines the separator to use between the numbers. It can be
    changed with the `setSeparator` method.
    """

    def __init__(self, a, b):
        """
        Sets up the pair using `a` and `b` as the numbers.
        """
        self.a = a
        self.b = b

    def __str__(self):
        """
        Formats this pair as a string, using the `between` field as the
        separator.
        """
        return str(self.a) + self.between + str(self.b)

    def setSeparator(self, separator):
        """
        Sets up a new custom separator string for this pair.
        """
        self.between = separator

    def add(self, other):
        """
        Adds the `a` and `b` values from another pair to this pair's `a`
        and `b` values, returning a new pair.
        """
        newA = self.a + other.a
        newB = self.b + other.b
        return Pair(newA, newB)

    def first(self):
        """
        Returns the first number in the pair.
        """
        return self.a

    def second(self):
        """
        Returns the second number in the pair.
        """
        return self.b


p1 = Pair(1, 2)
p2 = Pair(2, 3)
p3 = p1.add(p2)
p3.setSeparator('~')
print(p1, p2, p3)  # prints "1-2 2-3 3~5"
Notice that:
- When we use the name of the class as if it were a function, by putting parentheses with argument expressions after it, we create a new object and Python automatically calls the __init__ method to set up that object, using the fresh object as self and the additional arguments we supplied as the rest of __init__'s arguments. So the line that says p1 = Pair(1, 2) will create a new object and call __init__ with the new object as self, 1 as a, and 2 as b.
- All of the methods use a parameter called self, but when we call these methods, we only provide the other arguments, not the self argument (for example, the line p3.setSeparator('~') only supplies one argument even though the setSeparator method has two parameters, including self). Python automatically uses the object that the method is called on as the self value when calling a method. So for that line, self would be set to the object stored in the p3 variable.
- Not all fields are declared as part of the class. In particular, the __init__ method (or any other code) can create a new field simply by assigning to it using a dot, as in the line that says self.a = a. Any field declared as part of the class (like between in this case) will be available on each object of that class that gets created, but the __init__ method may add further fields, and you could even have other methods that add new fields when called. Typically, it's best to create fields only by declaring them as part of the class definition or as part of the __init__ method, and to document in the class docstring any fields that other code might want to use (here between is documented, but a and b are not).
- Construct (verb)
-
The process of creating a new instance of a class.
This can be a built-in class like list, or a custom class with its own __init__ method that will dictate what happens. Each time a new instance is constructed, new memory is used for the resulting object.
- __init__
-
A special method which defines what happens when constructing a new instance of a custom class.
The first parameter should always be called self, and will automatically be set to a newly-constructed object when this method is called. Any additional parameters require matching arguments when creating an object of the class.
Rather than being called directly, __init__ gets called automatically when we write a class name followed by parentheses (and possibly arguments in those parentheses).
Here's an example:
class Powers:
    """
    Stores a number along with the same number to the second and third
    power.
    """

    def __init__(self, base):
        """
        You must specify the base number to use.
        """
        self.base = base
        self.second = base * base
        self.third = base * base * base


p = Powers(3)
print(p.third)  # prints 27
In this example, we write the class name (Powers) followed by (3) to pass 3 as the value of the base parameter when creating our new object. The self parameter gets filled in automatically with a new blank object that will become the result of the setup process. In this example, that setup process just creates three fields called base (value 3), second (value 9), and third (value 27).
- Field
-
A variable that's attached to each instance of a class, possibly holding different values for each instance.
A class definition can define fields and give them default values, and additional fields can be attached to an instance in the __init__ method. Fields can be accessed and/or updated by writing an expression for an object followed by a dot and then the field name. For example:
class Example:
    """
    An example class.
    """
    field = 3  # field defined as part of class

    def __init__(self, v):
        """
        You must provide a value.
        """
        self.value = v  # field added during setup


e1 = Example(7)  # create an instance
e2 = Example(10)  # create another instance
print(e1.value)  # access field value (prints 7)
print(e2.value)  # prints 10
print(e1.field)  # prints 3
e2.field = 5  # update field value
print(e2.field)  # prints 5
print(e1.field)  # still prints 3
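The main pitfall with fields defined as part of a class shows up when the value is mutable. In this minimal sketch (the Roster and SafeRoster classes are hypothetical), a list defined at the class level is shared by every instance, whereas a list created in __init__ belongs to just one instance.

```python
class Roster:
    """
    Hypothetical class whose `names` list is defined at the class level.
    """
    names = []  # a single list shared by every Roster instance

    def add(self, name):
        """
        Adds a name to this roster's list.
        """
        self.names.append(name)


class SafeRoster:
    """
    Hypothetical class that gives each instance its own `names` list.
    """

    def __init__(self):
        """
        Creates a new, empty list for this instance only.
        """
        self.names = []

    def add(self, name):
        """
        Adds a name to this roster's list.
        """
        self.names.append(name)


r1 = Roster()
r2 = Roster()
r1.add('Sam')
print(r2.names)  # prints ['Sam'], because r1 and r2 share one list

s1 = SafeRoster()
s2 = SafeRoster()
s1.add('Sam')
print(s2.names)  # prints [], because s2 has its own list
```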
When a field is defined as part of a class instead of being attached during setup in the __init__ method, every instance of that class gets an alias of the same value for that field. This means that if you define a field with a mutable value like a list, every instance of the class will share a single list, and mutating that list through one instance will appear to change it for all of them (since in reality, there's only one). Assigning a new value to the field through one instance (as with e2.field = 5 above) is different: it gives that instance its own separate field, which is why e1.field still printed 3. For this reason, it's usually better to create fields within the __init__ method rather than as part of the class definition.
- Instance
-
An object as defined relative to its class.
If we create three objects from a custom class called Example, we would say that each of them is "an instance of the Example class." Or whenever we create a new Example object, we might say that we're "creating a new instance of the Example class."
The term implies that many identical or similar instances may exist or be created, and indeed custom classes allow us to easily create many similar objects that share certain properties, but which each have their own identity and which can be modified separately from other instances.
- Method
-
A function definition that is inside of a class statement automatically becomes a method.
Objects of the custom type defined by that class can use any of its methods. Write the object first, then a dot, and then the method name, followed by the usual parentheses and argument expressions. For a method, the first parameter should be named self, but you will supply one fewer argument when calling it: the specific object that appeared before the dot will automatically be provided as the self argument.
Python also supports static methods and class methods using decorators, but these are beyond the scope of this glossary.
For an example of methods in action, consider the following code:
class Counter:
    """
    A counter that holds a number. You can add to the counter using the
    `count` method and use the `value` method to get the current value.
    """
    n = 0

    def count(self, num):
        """
        Adds the given number to the counter.
        """
        self.n = self.n + num

    def value(self):
        """
        Returns the counter's current value.
        """
        return self.n


c1 = Counter()  # starts at 0
c2 = Counter()  # starts at 0
c1.count(3)  # now at 3
c2.count(5)  # now at 5
c1.count(1)  # now at 4
print("C1 is at:", c1.value())  # prints 4
print("C2 is at:", c2.value())  # prints 5
In this code, when we write c1.count(3), how does Python know which object's number to increase? The c1 before the dot is a variable whose value is the first Counter we created, and in the self.n = self.n + num code of the count method, self will be set to the same value as c1, because that's what appears before the dot. So we would say this code "applies the count method to c1." The argument we supplied (3) is passed in as the value of the second parameter, num, since the first parameter of a method is assigned automatically to the object the method is being applied to. So count is defined with two parameters, but only gets called with one argument. At the same time, we need to call count using a base object and a dot, because otherwise there would be no self object available.
- self
-
self is by convention the name used for the first parameter of a method, which gets supplied automatically rather than via an argument.
In theory this parameter could have any name you want, since it's the position that matters, not the name; but using self for these automatic parameters is a convention that makes it easier for everyone to read your code.
- Special Method
-
A method which has one of a few reserved names that start and end with double underscores, and which gets used automatically by Python under certain circumstances.
This list of special method names includes all special methods that Python recognizes. The two most ubiquitous special methods are __init__, for defining how new objects get set up, and __str__, for defining how any object can be converted to a string when the built-in str function is applied to it. Other special methods can define how an object should work when used with operators or when being hashed.
- __str__
- __repr__
-
A special method which defines how an object of a custom class should be converted into a string.
When using the str or print functions, for example, Python automatically converts objects to strings. This also happens inside format strings. In these cases, when it encounters a custom object that doesn't define how it should be converted, it will use a default format that just names the object's class and shows where in memory the object is stored, like this: <__main__.A object at 0x...> (where the digits after 0x are a memory address).
However, we might want our custom object to be displayed in more detail as a string, so we can define a __str__ method for it. This method has just one parameter, self, and it needs to return a string. For example, we might define a class like this:
class Greeter:
    """
    Stores a name and includes that name along with a greeting when
    converted to a string.
    """

    def __init__(self, name):
        """
        You must provide the name to store.
        """
        self.name = name

    def __str__(self):
        """
        Converts to a string by including a greeting along with the name.
        """
        return "Hello " + self.name


g = Greeter('Peter')
print(g)  # prints "Hello Peter"
When print is called with g as the argument, Python needs to convert g to a string automatically. Since it's a custom class, Python calls that class's __str__ method with g as the self argument, and gets the string "Hello Peter" as the result, which is what it displays.
The related __repr__ special method also takes just a self argument and is used to convert an object to a string whenever the built-in repr function is called on it. Whenever possible, the repr result should be a string that, if used in code, would produce an equivalent object. So for the class we defined above, we might add this method:
def __repr__(self):
    """
    Returns a string that could be used in code to construct an
    equivalent Greeter.
    """
    return "Greeter(" + repr(self.name) + ")"
Note how repr is used within the __repr__ special method: the representation of a string includes quotation marks (and correctly handles cases where the string itself includes quotation marks), so after adding this method to the Greeter class, the result of calling repr(g) would be "Greeter('Peter')", which matches the line of code that was used to construct g in the first place.
Not all objects can realistically have a repr result that can be used to reconstruct an equivalent object, but to the extent that this is possible, it's the convention to do so. repr is mostly used for debugging purposes or otherwise displaying an object in a more technical context.
Extra: Data Formats
(We don't expect you to know these terms as part of this class.)
- Archive
- Extract
- TAR
- Zip File
-
An archive is a single file that can be extracted to produce multiple files and/or directories.
Archives are often compressed to save disk space and transmission bandwidth.
A zip file is an archive in the zip format. TAR (for "Tape ARchive"; a TAR file is sometimes called a "tarball") is another common format for code archives. The zipfile module in Python's standard library can be used to interact with zip archives.
- ASCII
- American Standard Code for Information Interchange
-
An older, simple encoding for storing text as binary data.
It assigns each integer from 0 to 127 to a specific symbol, so that a sequence of numbers (for example, a byte string) can be displayed as text. Negative numbers and numbers beyond 127 are not assigned any symbol, so you may get an error if they're included in data that you try to decode using ASCII. "Extended ASCII" includes 128 additional symbols, going up to 255, but there are several different versions that were designed to support the alphabets of different countries. Languages that use characters which aren't part of ASCII need to use other encodings; the Unicode standard defines an encoding which includes many languages from all over the world as well as special symbols and emoji.
In theory, because each ASCII code is between 0 and 127, and all of those numbers can be represented using some combination of 7 bits, ASCII text could be stored using 7 bits per character. These days, an 8-bit byte has become standard, so ASCII text is often stored inefficiently using 8 bits per character.
This page has a chart of all ASCII codes.
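In Python, the built-in ord and chr functions convert between a single character and its numeric code, and the str.encode and bytes.decode methods convert whole strings to and from byte strings using a named encoding such as 'ascii'. A minimal sketch:

```python
# ord gives the code for a character; chr goes the other way.
print(ord('a'))   # prints 97
print(chr(48))    # prints 0

# Encoding a string as ASCII produces a byte string of codes.
data = 'hi!'.encode('ascii')
print(list(data))  # prints [104, 105, 33]

# Decoding turns the codes back into text.
print(data.decode('ascii'))  # prints hi!

# Byte 243 is beyond 127, so it has no ASCII symbol and decoding fails.
try:
    b'\xf3'.decode('ascii')
except UnicodeDecodeError:
    print('243 is not a valid ASCII code')
```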
- Byte String
-
A Python type that holds a sequence of bytes.
Whereas a regular string is a sequence of characters (which themselves are really single-character strings), a byte string is instead a sequence of integers. Python doesn't have a dedicated type to represent a single byte. Since each byte is made of 8 bits, interpreting those bits as a number always gives a number between 0 and 255, so Python uses integers between 0 and 255 to represent bytes.
Byte strings can be written in Python by putting a b before the opening quotation mark of a normal string, and using only ASCII characters and escape sequences inside it. For example:
b'a0\xf3'
The byte values in this string will be 97 (ASCII code for the letter 'a'), 48 (ASCII code for the digit '0'), and 243 (the escape sequence \xf3 encodes the byte with hexadecimal value f3, which is 15*16 + 3 = 243).
- Compress
-
To take data in one format and put it into another format in which it takes up less space.
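As a minimal sketch, Python's standard-library zlib module can compress a byte string and later decompress it back to exactly the original data. Repetitive data like the example below compresses well; data with little repetition may barely shrink at all.

```python
import zlib

# Highly repetitive data: 1,000 bytes made of the same two-byte pattern.
original = b'ab' * 500
compressed = zlib.compress(original)

print(len(original))    # prints 1000
print(len(compressed))  # much smaller than 1000

# Decompressing recovers exactly the original bytes.
restored = zlib.decompress(compressed)
print(restored == original)  # prints True
```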
- Escape Character
- Escape Sequence
-
A series of characters in a string, or bytes in a byte string, which has a special meaning. These start with an escape character, typically backslash.
For example, in a string, the sequence '\n' is an escape sequence that represents a single newline character. If you type a backslash followed by an n as part of a string in your code, Python will turn that into a newline character in the resulting program.
In byte strings, values above 127 must be entered using escape sequences when using the b'...' syntax for creating a byte string, because only ASCII characters are supported there.
Here is the full list of escape sequences that Python accepts.
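A minimal sketch showing that an escape sequence written as several characters in code turns into a single character (or byte) in the resulting value:

```python
newline = '\n'          # backslash + n in the code...
print(len(newline))     # ...is a single character, so this prints 1

tabSeparated = 'a\tb'   # \t is a tab character
print(len(tabSeparated))  # prints 3

backslash = '\\'        # two backslashes in code represent one backslash
print(len(backslash))   # prints 1

print('\x41')           # hexadecimal 41 is 65, the code for 'A'; prints A

raw = b'\xf3'           # a byte above 127 must be written as an escape
print(raw[0])           # prints 243
```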
- Graph (data structure)
- Edge (graph)
-
A collection of nodes linked together by edges.
The colloquial term "graph" usually refers to a chart or diagram, often drawn with one or more axes and points plotted in space to show relationships among values. But a graph data structure is an unrelated concept, holding nodes connected by edges. For example, a "social graph" might represent users of a social media platform as nodes, and edges could be added wherever one user follows another.
Graphs come in different flavors depending on their edges: a directed graph has edges which are directional, whereas an undirected graph has edges that don't have a direction. In a directed graph, if there's an edge from node A to node B, it doesn't necessarily mean that node B connects back to node A (that would be a separate edge in the other direction). But in an undirected graph, those two edges are the same thing, and there's only one possible edge between two nodes.
Graphs can be used to represent many kinds of real-world relationships, such as which users interact with each other in a multi-user application, which countries border each other on a map, which chemicals are used to synthesize which other chemicals in a factory, etc. Accordingly, there are a number of graph algorithms that analyze various properties of a graph and which can be useful in many different scenarios. Examples include a reachability algorithm which determines whether one node can be reached from another by crossing one or more edges, or a shortest-path algorithm that determines the shortest possible path from one node to another (i.e., the path that traverses the fewest edges).
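One common way to store a directed graph in plain Python, assumed here for illustration, is a dictionary mapping each node to a list of the nodes its edges lead to. The hypothetical canReach function below sketches a simple reachability algorithm (a breadth-first search): it repeatedly follows the edges of nodes it has already reached until it finds the goal or runs out of new nodes.

```python
def canReach(graph, start, goal):
    """
    Returns True if `goal` can be reached from `start` by following
    directed edges (or if start and goal are the same node).
    `graph` maps each node to a list of nodes it has edges to.
    """
    reached = {start}   # nodes we know we can get to
    toVisit = [start]   # reached nodes whose edges we haven't followed yet
    while len(toVisit) > 0:
        node = toVisit.pop(0)
        if node == goal:
            return True
        for neighbor in graph.get(node, []):
            if neighbor not in reached:
                reached.add(neighbor)
                toVisit.append(neighbor)
    return False


# Directed "follows" graph: A follows B, B follows C, D follows A.
follows = {
    'A': ['B'],
    'B': ['C'],
    'C': [],
    'D': ['A'],
}
print(canReach(follows, 'A', 'C'))  # prints True (A -> B -> C)
print(canReach(follows, 'C', 'A'))  # prints False (edges only go one way)
```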
Python has no built-in data structure that can store a graph, nor does it have dedicated syntax for creating graphs. However, dictionaries and/or lists can be used to store graphs, and there are open-source third-party libraries like networkx that can be used to create and analyze graphs.
- Interpolation
-
A mathematical process for blending between two different values according to a third number, which usually takes the form of a floating-point number between 0 and 1, where 0 indicates 100% of the first value, 1 indicates 100% of the second value, and numbers in between indicate different blends of the two.
See also the unrelated term string interpolation.
An example of numerical interpolation might be interpolating between two 2-D points, like this:
# Point 1
x1 = 1
y1 = 1
# Point 2
x2 = 5
y2 = -5
# Interpolation variable
t = 0.7
# Result:
# t = 0 gives us (x1, y1)
# t = 1 gives us (x2, y2)
# In-between values are somewhere in between the two points
x = x1 * (1 - t) + x2 * t
y = y1 * (1 - t) + y2 * t
print(x, y)
Numerical interpolation is called "linear" when the formula proceeds directly from one point to the other at a steady rate. We could also use other kinds of formulas, for example quadratic or cubic interpolation, where an exponent is added to the interpolation variable that changes how quickly a change in the interpolation variable translates into a change in the mixture between the two things being interpolated.
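A minimal sketch of the difference, assuming the simplest version of the idea described above: squaring the interpolation variable before blending. The hypothetical functions lerp and quadraticLerp agree at t = 0 and t = 1, but the squared version starts slowly and then speeds up.

```python
def lerp(a, b, t):
    """
    Linear interpolation: blends from a (t = 0) to b (t = 1) at a steady rate.
    """
    return a * (1 - t) + b * t


def quadraticLerp(a, b, t):
    """
    Same endpoints as lerp, but squares t first, so the blend changes
    slowly near t = 0 and quickly near t = 1.
    """
    return lerp(a, b, t ** 2)


for t in [0, 0.25, 0.5, 0.75, 1]:
    print(t, lerp(0, 100, t), quadraticLerp(0, 100, t))
```

At t = 0.5, for instance, the linear version is halfway (50.0) while the quadratic version is only a quarter of the way (25.0).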
- Unicode
-
A widely-adopted, regularly updated encoding for storing text as binary data which supports many languages, mathematical symbols, and emoji.
In contrast to ASCII, which uses 7 bits per symbol and covers only 128 symbols, Unicode assigns each symbol a number called a code point, between 0 and 1,114,111 (hexadecimal 10FFFF), so up to 21 bits are needed per symbol. The UTF-32 encoding stores every symbol using 32 bits, while UTF-8 and UTF-16 are variable-length encodings that use between one and four 8-bit units, or one or two 16-bit units, per symbol. These are more efficient if you are mostly using characters from a limited part of the code space; for English text, UTF-8 uses a single byte per character, the same as ASCII stored in 8-bit bytes.
In Python, characters in strings are stored using Unicode, so any Unicode symbol can be included in a string, and escape sequences can be used to include a symbol according to its numerical code, even if you can't type it on your keyboard.
The official Unicode website has lots of details, including downloadable charts of character codes and an index of character codes by name.
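A minimal sketch of working with Unicode in Python: ord and chr convert between characters and their code points, the \u and \U escape sequences write a character by its code, and encoding with UTF-8 shows how the number of bytes per character varies.

```python
# Code points: 'é' is 233 (hexadecimal E9).
print(ord('é'))           # prints 233
print('\u00e9' == 'é')    # prints True: \u takes exactly 4 hex digits

# Code points above FFFF use \U with exactly 8 hex digits.
smile = '\U0001F600'      # the "grinning face" emoji
print(ord(smile))         # prints 128512

# UTF-8 uses a different number of bytes depending on the character.
for ch in ['A', 'é', smile]:
    print(ord(ch), len(ch.encode('utf-8')))  # 1, 2, and 4 bytes respectively
```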
Extra: Software and Hardware
(We don't expect you to know these terms as part of this class.)
- Analog
-
Data which has a smooth, continuous value, rather than being broken up into discrete units like digital data.
Analog data can often be described or approximated by mathematical equations.
- Bandwidth
-
The amount of data that can be transmitted via a certain connection per unit time, usually measured in bits per second (or a larger denomination like bytes, megabytes, gigabytes, etc. per second).
Bandwidth determines how fast data can be transferred, and for certain applications like streaming audio or video, whether the experience will be smooth or not. (Streaming refers to sending data for immediate consumption, like sending video data to be displayed immediately rather than to be stored in a file and played back later.)
Bandwidth is typically limited, so using compression to save bandwidth is the norm.
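Because bandwidth is a rate, the time a transfer takes can be estimated by dividing the amount of data by the bandwidth, after converting both to the same units (8 bits per byte). This minimal sketch uses a hypothetical helper and ignores real-world overhead such as network congestion:

```python
def transferSeconds(megabytes, megabitsPerSecond):
    """
    Estimates how many seconds it takes to send `megabytes` of data over a
    connection with the given bandwidth, ignoring any overhead.
    """
    megabits = megabytes * 8  # 8 bits per byte
    return megabits / megabitsPerSecond


print(transferSeconds(100, 10))  # 100 MB at 10 Mbit/s: prints 80.0
print(transferSeconds(50, 10))   # same data compressed to half the size: prints 40.0
```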
- Binary
-
A number system that uses only the digits 0 and 1 to represent numbers. Whereas in the decimal system each place value is a power of 10 (ones, tens, hundreds, and so on), in binary each place value is a power of 2 (ones, twos, fours, eights, and so on).
Sometimes also used as a synonym for digital, since digital data is also made of ones and zeroes. Sometimes also used as a synonym for executable, because the contents of an executable are "raw" CPU instructions which humans don't usually read directly, so compared to code, they're thought of as an arbitrary mass of binary data.
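A minimal sketch of converting between decimal and binary in Python. The built-in bin function and int with a base of 2 do the conversions, and the hypothetical toBinary function spells out the place-value idea by repeatedly dividing by 2.

```python
print(bin(13))         # prints 0b1101  (8 + 4 + 0 + 1 = 13)
print(int('1101', 2))  # prints 13


def toBinary(n):
    """
    Returns the binary digits of a non-negative integer as a string.
    """
    if n == 0:
        return '0'
    digits = ''
    while n > 0:
        digits = str(n % 2) + digits  # the remainder is the next digit from the right
        n = n // 2
    return digits


print(toBinary(13))    # prints 1101
```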
- Boot
-
The startup process for a computer.
This is short for "bootstrap" from the idiom "pull yourself up by your bootstraps" meaning "to start from nothing and build something." When a computer is first powered on, it must begin from a saved sequence of CPU instructions. Typically, these instructions locate a default drive and load further instructions from that drive to start the operating system.
- Corruption
-
A situation where data that should be in one format has been cut off or changed so that it doesn't match the original format any more.
Attempting to parse corrupted data will usually fail, although this could mean anything from an error to a situation where the parsing appears to succeed, but some of the objects it produced have the wrong values.
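A minimal sketch of what parsing corrupted data can look like, using Python's standard-library zlib module: if compressed data is cut off partway through, decompressing it raises an error instead of producing the original bytes. Other kinds of corruption may not be detected this cleanly.

```python
import zlib

original = b'Some important text. ' * 20
compressed = zlib.compress(original)

# Simulate corruption by cutting off the last 10 bytes.
truncated = compressed[:-10]

try:
    zlib.decompress(truncated)
    print('Decompressed without an error')
except zlib.error as err:
    print('Corrupted data:', err)
```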
- CPU
- Central Processing Unit
- CPU Instruction
-
A central processing unit or CPU is a physical device in a computer which receives and transmits patterns of electricity via wires.
A CPU instruction is a special pattern of electricity which tells the CPU to carry out a specific operation; CPUs are designed so that when they receive CPU instructions one after the other, they transmit patterns of electricity which encode the results of those instructions. By connecting a CPU to devices like a keyboard, mouse, and monitor, the computer can run an operating system program which provides an interface for a user to run and manage other executables.
- Digital
-
Composed of bits, that is, ones or zeroes.
In contrast to analog data, which records a continuous value over time and might be expressed as a smooth equation, digital data is composed of bits. Analog data is typically more true to what is happening in the real world, but more difficult to interpret, whereas digital data is typically only an approximation of the real world but is easier to interpret and work with.
Digital data can be stored directly on a disk, whereas analog data would have to be encoded somehow, or stored on a different kind of storage device.
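As a rough illustration of how analog data becomes digital, this Python sketch samples a smooth sine wave (standing in for an analog signal) at a few evenly spaced times and rounds each sample to a whole number. The number of samples and the range of levels are arbitrary choices for the example; real formats such as digital audio use far more samples.

```python
import math

# A stand-in for "analog" data: a smooth, continuous function of time.
def analog_signal(t):
    return math.sin(2 * math.pi * t)

SAMPLES = 8    # how many measurements to take (an arbitrary choice)
LEVELS = 127   # samples are rounded to whole numbers from -127 to 127

# Sample the signal at evenly spaced times and round each value to an integer.
# The result is digital: a short list of whole numbers that can be stored as bits.
digital = [round(analog_signal(i / SAMPLES) * LEVELS) for i in range(SAMPLES)]
print(digital)

# Converting back only approximates the original: the values between samples,
# and the fine detail lost to rounding, are gone.
approx = [value / LEVELS for value in digital]
print(approx[1], analog_signal(1 / SAMPLES))
```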
- Disk
- Drive
-
Disk refers to a physical object that stores data. Many modern disks are no longer disc-shaped, but the term comes from an era when most data storage devices were disc-shaped and data was accessed by spinning them around. Music records were the model for computer disks.
A drive (or disk drive) is a slot that a disk can be placed into to make its contents accessible to a computer. On Windows (and earlier, in DOS, which stands for "Disk Operating System"), each disk drive is identified by a single letter, and "C" (the third drive) is usually where the main file system is stored on a hard drive or solid-state drive, with drives A and B being used for ejectable floppy disks.
In the digital realm, both terms can refer to the data, stored on a disk, made available via a disk drive, and usually split into one or more partitions each containing a file system. So the phrase "C drive" might really refer to a disk, a drive, a partition, a file system, or even the root directory of that file system...
- Executable
- Script
-
An executable is a file containing CPU instructions which can be directly executed by a computer's CPU.
In contrast, a script is a file containing code which relies on an interpreter to run.
Python programs are scripts which run via the Python interpreter.
- Floppy
- Floppy Disk
- Floppy Drive
-
A floppy disk (or just floppy) is a disk made of a flexible magnetic material used to store data, usually held in a plastic shell so that the disc itself is only exposed when inserted into a drive (called a floppy drive) in a computer.
- HDD
- Hard Drive
- Hard Disk
- Hard Disk Drive
-
A disk drive which uses rigid disks storing magnetic data, typically permanently contained within an enclosure. Unlike a floppy disk, the disk part is permanently stored in the drive rather than being ejectable, so hard disk, hard drive, and hard disk drive can all refer to the same object.
In contrast to a solid state drive, a hard drive stores information using magnetic fields that are read (or written) using a tiny needle-like device that travels over the surface of the disk as it spins around. The physical limitations in terms of rotation speed, read/write head size, and so on limit how densely information can be stored and how fast it can be read or written. Hard drives are typically slower than solid state drives, but they are also typically more durable.
- Hardware
-
The physical device(s) that make up a computer, in contrast to software.
Hardware determines how a device is capable of interacting with the real world and what physical inputs it can accept and outputs it can generate. However, most computational devices don't do much without complicated software to control them.
- Interpreter
-
An executable which reads from a script file and allows that script to be run on a CPU.
The Python interpreter is an executable program that allows Python code to run on a CPU. When you tell an IDE to run your program, it will run the Python interpreter and point it at your program as the script to run.
Some other programming languages, called "compiled languages," instead first compile a program from the base language into an executable made of CPU instructions that can be run directly on the CPU.
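Roughly what an IDE does when it runs your program can be sketched in Python itself. The code below writes a one-line script to a temporary file, then starts the Python interpreter (whose location is given by `sys.executable`) and points it at that file. The file name `hello.py` is just an example, and a real IDE may pass additional options.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# sys.executable is the path to the Python interpreter running this code.
print("Interpreter:", sys.executable)

with tempfile.TemporaryDirectory() as folder:
    # Write a tiny script to a temporary file.
    script = Path(folder) / "hello.py"
    script.write_text('print("Hello from a script!")\n')

    # Run the interpreter and point it at the script, capturing its output.
    result = subprocess.run(
        [sys.executable, str(script)],
        capture_output=True,
        text=True,
    )

print(result.stdout)  # Hello from a script!
```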
- Partition
- Mount
-
A portion of a physical disk that stores data, separate from the data stored on other partitions.
Each partition usually contains its own file system, although some special partitions may contain data that's not organized into a file system. Different partitions of the same physical disk may be made available as different directories within the same operating system. For example, on Windows, two partitions of the same disk might be labeled as the C and D drives, even though they aren't actually two separate physical drives. It's also possible to store different operating systems on different partitions and select which operating system to run during the boot process when turning on the computer.
The process of making a partition's data available via an operating system is called mounting the partition. In some cases, an operating system may make it appear as though a file system stored in one partition contains directories that are actually other file systems mounted from other partitions. This means that a file tree can span multiple file systems.
Normally, the data stored in a partition cannot be accessed unless it is mounted, although some low-level tools can do things like copy all of the data to a new partition without mounting it.
- RAM
- Random Access Memory
-
RAM stands for Random Access Memory, which is a kind of physical circuit that can be used to store digital information. A RAM circuit only stores information as long as it is powered on, so whatever is stored there disappears when the computer shuts down. In a computer, these circuits are used to hold temporary memory, while anything that needs to be stored permanently should be written to a disk, usually by writing to a file in a file system.
All objects used in a Python program are stored in RAM, and do not persist beyond the end of the program. If you want to save information permanently, you can write it to a file.
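A minimal Python sketch of saving data so it outlives the program: the list exists only in RAM, so it is written to a text file and then read back. The file name `high_scores.txt` and the one-number-per-line format are assumptions made for this example.

```python
from pathlib import Path

# This list lives in RAM: it disappears when the program ends.
high_scores = [120, 95, 80]

# To keep the information after the program ends, write it to a file
# in a file system, which is normally stored on a disk.
save_file = Path("high_scores.txt")
save_file.write_text("\n".join(str(score) for score in high_scores))

# A later run of the program (or a different program) can read it back.
loaded = [int(line) for line in save_file.read_text().splitlines()]
print(loaded)  # [120, 95, 80]
```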
- Server
-
A computer that's used to relay data to other computers, rather than being controlled directly by a user.
For example, when you load a web page, the data for that page is typically sent by a server that has no keyboard or mouse connected to it, and which is designed just to respond to web page requests.
In an application that involves both server and client programs, if the server becomes unavailable, the client programs will typically display some kind of error message and the application won't work. Similarly, even if the server is working, if a particular user cannot make a connection to it (maybe they're using a cell phone and out of service range) their application won't work because the client program can't function without being able to exchange data with the server program.
For large applications with many users, there will typically be many different interchangeable servers that users can connect to so that even if some fail, others can pick up the slack and the application will still function. When servers fail or when a connection is unreliable, data corruption can result.
- Software
-
Programs, in contrast to hardware.
While hardware refers to the physical devices that allow for computation to happen, software refers to the programs like operating systems and applications that run on the hardware and determine much of its actual user-facing functionality. Given specific hardware, we can run certain software, and when running certain software, the user can accomplish certain tasks. In some cases the same software can run on different kinds of hardware, in other cases the same hardware can be used to run different kinds of software.
- SSD
- Solid-State Drive
-
A data storage drive that uses flash memory circuits to store information.
Unlike RAM, a solid-state drive retains information even when powered off, but unlike a hard drive, it has no moving parts and uses electrical charge instead of magnetic fields to store data.
Because it has no moving parts, an SSD can typically be read from and written to much faster than a hard drive. However, the circuits used degrade over time, particularly when data is overwritten, and so SSDs tend to fail sooner than hard drives. When memory cells in an SSD wear out, they may not store the data that they are supposed to, possibly leading to data corruption, although modern drives have mechanisms to detect and warn about cell failure.
Extra: Software Development
(We don't expect you to know these terms as part of this class.)
- Build
-
The process of preparing the code of a package for release, often also combining the relevant files and directories into an archive.
In some programming languages that are compiled, the build process includes compiling the code into one or more executable or shared library files.
While Python code can optionally be compiled into "bytecode," this is not necessary for building a package, and instead the process mostly consists of bundling together code and metadata into an archive that will be formatted correctly for use by a package repository.
- Compile
-
The process of turning code into CPU instructions to produce an executable.
Programming languages which rely on a compilation step are called "compiled languages," in contrast to languages like Python that use an interpreter which are called "interpreted languages."
Technically, Python code can be compiled into "bytecode," but this process is optional as the Python interpreter can produce the bytecode it needs as a program is being run.
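You can watch Python's optional compilation step using the built-in `compile` function and the standard-library `dis` module. Note that Python bytecode is not made of CPU instructions: it is a lower-level set of instructions that the Python interpreter itself carries out, so this sketch shows compilation to bytecode, not to an executable.

```python
import dis

source = "total = 3 + 4\nprint(total)"

# compile() turns source code (a string) into a code object holding bytecode.
# The "<example>" argument is just a label used in error messages.
code = compile(source, "<example>", "exec")

# dis.dis() lists the bytecode instructions in a human-readable form.
# The exact instructions vary between Python versions.
dis.dis(code)

# The interpreter can then run the compiled bytecode; a fresh dictionary
# holds the variables that the code creates.
namespace = {}
exec(code, namespace)       # prints 7
print(namespace["total"])   # 7
```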
- Dependency
-
A package that another package or program depends on in order to function correctly.
While user-facing programs often use libraries to support their functionality, libraries also use other libraries to handle parts of what they need. For example, a photo-processing application might use an image-loading library to load images in different formats. That library might in turn rely on individual format-specific libraries like a JPG library or a PNG library to read data stored in different formats. In this case, the image-loading library is a dependency of the application, and the format-specific libraries are dependencies of the image-loading library.
Taken together, the direct and indirect dependencies of any program form a tree. In order to run the program, all of the packages it depends on need to be available. Typically, a package manager will automatically manage dependencies, figuring out what's required and installing everything needed based on a request for a single package.
The situation gets complicated when versions are considered. Sometimes one package depends on a specific version (or range of versions) of another, but a second package depends on an incompatible version. In this case, there's no way to satisfy the dependencies of both packages at once.
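Part of a package manager's job, figuring out what order to install things in, can be sketched with Python's standard-library `graphlib` module (Python 3.9 or later). The package names below are hypothetical, loosely following the photo-application example above, and this sketch ignores versions entirely, which a real package manager cannot do.

```python
from graphlib import TopologicalSorter  # available in Python 3.9+

# Hypothetical packages: each one maps to the set of packages it depends on.
dependencies = {
    "photo-app": {"image-loader"},
    "image-loader": {"jpg-lib", "png-lib"},
    "jpg-lib": set(),
    "png-lib": set(),
}

# Dependencies must be installed before the packages that need them.
# static_order() lists every package after all of the packages it depends on.
install_order = list(TopologicalSorter(dependencies).static_order())
print(install_order)
```

Here `jpg-lib` and `png-lib` come first, then `image-loader`, and finally `photo-app`.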
- Developer
- Software Development
- Software Engineering
-
A developer or software engineer is someone who develops/engineers software, usually also a programmer.
Software development/engineering is the process of building programs to solve particular problems or otherwise satisfy design goals.
There are aspects of software development beyond just programming, such as interface design and system design, so on a larger team, not every developer is necessarily a programmer. The role is distinct from a producer whose job is to manage scheduling and planning plus take care of other team organizational tasks, without worrying about the actual design or implementation details directly. Of course, it's possible to be both a developer and a producer if you're doing work that fits into both categories.
Outside of the computer science context, of course, developer may refer to people who do other kinds of development. The "engineering" form emphasizes that software is built to satisfy external design constraints which originate in both humans and nature, and that we seek to build software that operates reliably within those constraints and fails safely when it encounters an error or is pushed beyond its limits. Some software development is more craft- or art-like (e.g., developing a video game) while other software development is more engineering-like (e.g., building a controller for an industrial assembly arm).
There are many different ways to approach software development, including team-based and solo strategies, different ways of discovering and organizing your goals, and different strategies for how to build complex programs.
- Interface
- API
- Application Programming Interface
- UI
- User Interface
- GUI
- Graphical User Interface
-
A physical mechanism or software abstraction that allows a user to interact with a computer in some way.
For example, a keyboard is an interface that allows for typing in text, while a mouse or touchscreen might allow for indicating positions in two dimensions and sending "click" or "tap" actions. On the software side, a web browser, for example, is an interface for viewing web pages; a web page could also be opened with a text editor, but you wouldn't be able to interact with it in the same way.
Every program has one or more interfaces, even if it's just a supporting library designed to be used by other programmers. In that case, we call the collection of functions and other variables that the library makes available an application programming interface or API because it can be used by others to program an application.
Programs that interact with their users directly (for example, by accepting text input or responding to mouse clicks) have an interface which encompasses all of the ways they respond to user inputs and all of the things they display to help the user understand what to input or provide the user with other information or feedback. This is called a user interface or UI. A graphical user interface or GUI is an interface that displays graphics like images beyond just text, and which will often allow for interaction via a mouse or touch screen rather than just accepting text input from a keyboard.
- Library
-
A collection of code designed to be imported and used by other code, rather than to run as a program by itself. Can be organized into one or more modules.
Python has a substantial built-in standard library of modules that can all be imported by any Python program, and the Python Package Index holds a huge number of additional packages that can be used to install more libraries and programs.
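Using a library means importing its modules. In this small sketch, `math` and `statistics` come from the standard library, so they work without installing anything; a package from the Python Package Index would first need to be installed (for example, with `pip`) before importing it would work.

```python
# These modules are part of Python's standard library,
# so any Python program can import them without installing anything.
import math
import statistics

print(math.sqrt(16))                  # 4.0
print(statistics.mean([1, 2, 3, 4]))  # 2.5
```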
- License
- Copyleft
-
A license is a document that specifies how one is allowed to use a program.
Licenses may require payment or otherwise restrict the use of software, or they may instead allow for it to be used and shared freely, with some conditions (like providing attribution to the original author). Such permissive licenses are used by open-source projects.
A copyleft license is one that requires that anyone who distributes a modified version also allow others to distribute modified versions of the new version, so that nobody can copy the original, make some changes, and then distribute their version with a more restrictive license. Some licenses also restrict certain uses, or restrict the ability to charge money for access or use. The GNU General Public License is a well-known example of a copyleft license; note that it does not forbid commercial use, as long as modified versions that are distributed are shared under the same license.
The Wikipedia page on software licenses has many more details. The Creative Commons also publishes a list of licenses with various restrictions that are not particular to software and are often used for art.
- Maintainer
-
Someone who handles long-term maintenance tasks for a piece of software, such as a library.
As the versions of dependencies evolve and new bugs are discovered, or features need to be added, there are always more changes that need to be made to keep things working, and so maintainers are necessary for any software that gets used by a significant audience over several years or more.
- Metadata
-
Data which is extraneous to the core purpose of a file or other collection of data.
For example, an image file needs to store color data, but it may also store information like the name of the person who created the file, when the file was created, or if it's a photo, where it was taken and what camera settings were used. None of this extra data is necessary to display or even edit the image itself, but it's useful for other purposes, like searching through a collection of images. Some file formats include a way to store metadata within the file, or in other cases, metadata may be stored in an archive or otherwise stored somewhere else.
When metadata is stored separately from the main data it pertains to, it can get lost (for example, if metadata about song authors is stored in a database by your music player, if you share a song file with a friend that information may not be included). On the other hand, when metadata is stored in the same file as the data it pertains to, it may be shared even when the user doesn't intend to share that much information (for example, when posting a selfie to social media, metadata about where the picture was taken may be included automatically, allowing other users to figure out where you live). Web sites that handle images often strip out metadata by default to avoid this problem.
- Open Source
- FOSS
- Free and Open Source Software
-
Software or hardware is open-source when its code or hardware specifications are freely available and can be shared by anyone.
In contrast, many applications that cost money or are developed by corporations are provided as executables where the original "source" code is not available.
The developer of a piece of open-source software may still charge money for it via a restrictive license, but in many cases, open-source software is free and open-source software, or FOSS, meaning that anyone is allowed to use it without charge, and the code is available to be shared freely as well. Much FOSS is produced by volunteers and hobbyists rather than by corporations, although some FOSS developers are paid for their work via crowdfunding or other sources.
A large part of the software that makes up modern operating systems is free and open-source software, maintained by volunteers. The Python programming language itself is FOSS.
- OS
- Operating System
-
A management program that starts as soon as the computer is turned on during the boot process and which enables basic user interfaces that allow the person using the computer to launch other programs, access files on file systems, and close programs or shut down and restart the computer when necessary.
Typically, each program is designed to work with one particular operating system, although there are some programs that can work on multiple operating systems. Because Python interpreter programs exist for many different operating systems, a program written in Python can often run on any of them without changes, as long as it doesn't rely on features specific to one operating system.
- Package
-
A collection of code that can be installed or uninstalled using a package manager.
A package may contain a single program, or it may contain one or more modules designed to be imported and used as libraries. Packages may depend on other packages, and a package manager will install all required dependencies when a package is requested.
- Package Manager
-
A tool for downloading, installing, and uninstalling packages from one or more package repositories.
Using a package manager makes it easy to install and upgrade packages, since it can keep track of different versions and dependencies between them, and can automatically install all required dependencies when the programmer wants to install a specific package.
- Package Repository
-
A collection of packages, usually available via the internet, including metadata like descriptions and license information and an API so that they can be accessed by a package manager.
A package repository usually contains only free software (often mostly open-source) so that packages can be installed without worrying about payment.
The Python Package Index is the standard repository for Python packages not published by the Python developers themselves.
- Producer
-
Someone who manages the logistical aspects of a software development team.
A producer is not typically the manager for the team in charge of hiring and promotions, but does have authority to set schedules, call meetings, etc. Their job is to make sure that everyone else on the team has what they need when they need it to do their work, and that team members communicate and collaborate effectively.
Producers typically need some level of technical expertise to understand the work their colleagues are doing, but since they aren't necessarily developers they don't need to be experts at writing code or designing software.
- Prototype
- Paper Prototype
-
An incomplete application designed to be used to solicit feedback about the design or otherwise test some of the full functionality before the entire thing is ready.
This can range from a paper prototype where a mock-up of the user interface is made out of paper and manipulated by a designer to mimic intended functionality, all the way to a full-fledged alpha version that is nearly complete but still undergoing final testing and revisions. Code prototypes are usually assigned version numbers less than 1; paper prototypes usually do not have version numbers except when doing very extensive user testing.
- Release
-
Upload a build of a package to a package repository.
More generally, making some code available to a broader audience. As a noun, release usually refers to a specific version of a package, for example "the 3.12.10 release of Python."
- User Interface Design
-
The process of designing, testing, and revising a user interface, sometimes with the goal of making it easy to use.
Since an application can accept input and provide feedback to its users in a huge variety of ways, it can be tricky to figure out what design is easiest for users to use. User testing and prototyping can be valuable for understanding how a certain design will actually be used and then improving it. Accordingly, user interface design is usually a cyclical process of developing a new design prototype, testing it, and then improving it before testing it again.
The domain of human-computer interaction studies user interfaces and best practices in user interface design.
Quite often, the goals of the interface designer may be counter to the goals of the user. For example, the designer of a banking application may want as many users as possible to opt into a different account structure that has more hidden fees, even though this is not what the users want. In these cases, the goal of the designer is no longer "ease of use" and interfaces may become more difficult to use, especially where monopolies exist that avoid the risk of users switching to a competitor's application. More generally, there are a wide range of design goals besides "easy to use" that designers often care about.
- User Testing
-
The process of letting users use an application or prototype and observing how they interact with it, usually in order to understand how to improve its user interface during the user interface design process.
Typically, a designer will observe the user's interactions without interfering at all, since once an application is published, most users will not have a chance to get guidance from the designer on how to use it. This can be frustrating for the designer when the user who is testing does not understand some aspect of the interface or gets stuck, but such occurrences point to places where it may be necessary to provide more information or context, or otherwise change interaction design to avoid them.
- Version
- Version Number
-
A specific collection of code for a package that's been saved at a particular point in time, usually identified by a version number.
As maintainers keep fixing and upgrading packages, they periodically build a specific version of the code and release it via a package repository. Certain versions of one package may depend on a specific range of versions of other packages. A package manager can figure out which version(s) of which packages are needed as dependencies for a particular version of a desired package, and can manage package upgrades.
Since software is continually evolving, instructions that used to work months or years ago may not work any more with the most up-to-date versions of the packages involved. Things get even more complicated when packages depend on hardware features that might no longer be available.
A standard called semantic versioning is used for many version numbers in modern packages. A semantic version number is made up of three integers separated by periods (sometimes shortened to one or two in casual use), identifying a "major," "minor," and "patch" version. If there are bug fixes or small improvements to a package, that will typically be indicated by a new patch version, with the major & minor versions remaining the same. If there are new features added but all the old features still work the same way, that will be a new minor version. If major changes are made, especially if the new version doesn't support all the same operations that the previous version did, or if the kinds of results for some operations changed, a new major version number will be used.
Python's own version numbers follow a similar pattern, although Python does not follow semantic versioning strictly: For example, the differences between versions 3.11.4 and 3.11.5 are bug fixes, differences between versions 3.11 and 3.12 include new features, and differences between versions 2 and 3 include major changes to how statements are written, so that code which works with version 2 might not work at all with version 3, and vice versa.
Terms like "alpha" and "beta" can also be used to categorize versions based on their maturity.
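Here is a minimal Python sketch of reading version numbers the way the semantic versioning scheme describes. The helper functions `parse_version` and `change_kind` are hypothetical, written only for this example, and they ignore pre-release labels like "alpha" or "beta". The sketch also shows why version numbers should be compared as numbers rather than as text.

```python
import sys

def parse_version(text):
    """Split a version string like "3.12.10" into a tuple of integers.

    Missing minor or patch numbers are treated as 0, so "3" becomes (3, 0, 0).
    """
    parts = [int(part) for part in text.split(".")]
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)

def change_kind(old, new):
    """Report whether going from old to new is a major, minor, or patch change."""
    old_v, new_v = parse_version(old), parse_version(new)
    if new_v[0] != old_v[0]:
        return "major"
    if new_v[1] != old_v[1]:
        return "minor"
    if new_v[2] != old_v[2]:
        return "patch"
    return "same"

print(change_kind("3.11.4", "3.11.5"))  # patch
print(change_kind("3.11", "3.12"))      # minor
print(change_kind("2.7", "3.0"))        # major

# Tuples compare element by element, so version tuples sort correctly,
# whereas comparing the strings would put "3.9" after "3.12".
print(parse_version("3.9") < parse_version("3.12"))  # True
print("3.9" < "3.12")                                 # False

# The version of the Python interpreter running this code.
print(sys.version_info[:3])
```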
Extra: CS Disciplines & Careers
(We don't expect you to know these terms as part of this class.)
- AI
- Artificial Intelligence
-
A sub-field of computer science that involves developing programs that appear to be intelligent.
Because "intelligence" has many different definitions and might involve things like creativity, solving puzzles, learning from mistakes, etc., there are many different branches of AI research that focus on different things.
The term AI has also become (around 2020) a synonym for particular technologies that generate text or images based on prompts, namely large language models for generating text, and generative adversarial networks and related models (such as diffusion models) for generating images. These particular techniques are only a small part of the broader field of AI within computer science.
- Compilers
-
A sub-field of computer science that focuses on the design and optimization of programs that compile code into executables.
Any improvement to the efficiency of executables generated by a compiler can be automatically applied to any program written in the programming language that the compiler uses, so new compiler optimizations can have a big impact. Unfortunately, they often represent tradeoffs rather than strict improvements, and in some cases, they can cause the abstractions provided by the compiler to leak, meaning that to take advantage of them programmers need to understand the details of the hardware their code is being compiled for.
The study of compilers is closely related to the study of programming languages.
- Computer Engineering
- Electrical Engineering
-
Computer Engineering is a field related to computer science that studies how to build computer hardware.
Also a career in the design and manufacturing of computers and related systems.
Electrical engineering is broader and covers all kinds of electrical systems, including non-computer electronics like power grids, lighting, and heating circuits.
- Computer Science Education
-
A sub-field of computer science focused on how to teach computer science concepts.
Includes questions such as what programming languages are best for beginners, what order different concepts should be taught in, and how to evaluate the impact of different teaching technologies.
Teaching any subject can be approached as a craft and/or as a science, and most computer science professors do not do computer science education research, even though they practice computer science teaching.
- Computer Systems
-
A sub-field of computer science that involves the study of hardware, operating systems, and other less-abstract aspects of computers.
Often focuses on speed, power efficiency, or other performance metrics, and ideally finds breakthroughs that can be applied to make computers more capable or efficient while maintaining existing abstractions so that existing software does not need to change.
Computer systems is related to computer engineering but places more emphasis on the combination of hardware and software rather than focusing on the hardware alone.
- Data Science
-
A career involving gathering, managing, and analyzing data.
Data scientist positions can be found in corporations, nonprofits, government, journalism, and elsewhere. Almost any large organization can benefit from analyzing its own data to improve efficiency and/or analyzing customer or user data to improve its services. Of course, such analysis is often used to serve the interests of the organization even when they run counter to the interests of the people outside the organization that it serves or interacts with.
Despite the name, data scientists often skip over steps or ignore the scientific method, in part simply because a strictly scientific approach limits the insights one can gain from data, and because organizations are willing to make changes based on potential data patterns and then react to the results, rather than waiting for a strict scientific conclusion (which may be impossible to reach) before taking action.
- GAN
- Generative Adversarial Network
-
A machine learning system which uses opposing "generator" and "discriminator" networks to generate images (or other data like music or videos) that match the style of a set of input images used as training data.
Along with large language models, generative adversarial networks are part of the contemporary idea of AI circa 2020-2025. As with large language models, once trained on a sufficiently large dataset, GANs can reproduce a wide range of images in different styles based on a short text prompt.
GANs, as currently deployed by large corporations, have all of the same ethical issues as LLMs.
- HCI
- Human-Computer Interaction
-
A sub-field of computer science that focuses on user interfaces and user interface design, with the goal of understanding how people interact with computers and how the design of applications can shape those interactions.
Draws on methodology from psychology and anthropology since it studies human activities, perceptions, and reactions, but also often involves technical programming work to prototype new or alternative interfaces. Often uses the scientific method to develop formal theories of how people use computers.
- IT
- Information Technology
-
Devices that manage and exchange information, typically computers (including desktops, laptops, cell phones, and all sorts of "smart" devices).
Also refers to people or departments who work with these devices to maintain or upgrade them.
Also refers to an academic discipline studying these devices which is very close to computer science but with more emphasis on technology and less on theory (in some contexts, it may just be a synonym for computer science).
- Information Technology Operations
- DevOps
-
IT Operations is a career involving deploying and maintaining information technology systems that support an application, web site, or an organization's employees.
Overlaps with system administration.
An IT operations role might involve purchasing and setting up physical hardware like servers to support an application being built by software engineers, then installing operating systems and the new application server software onto those machines and making sure that they can connect to the internet, and finally maintaining them and fixing problems with software or hardware as the application gets released.
DevOps is a career that integrates software development with operations, so that the same team which develops an application is also responsible for deploying and maintaining it.
- LLM
- Large Language Model
-
A statistical model of word sequences that can predict text that might follow a prompt.
LLMs are a machine learning technique which is the underlying technology behind AI "chatbots" that give the illusion of a conversation by predicting what text might occur next in a conversation after the user types in their message. When trained on hundreds or thousands of examples, their predictions are very limited and humans can easily see their limitations, but when trained on billions of examples, they can start to give a convincing illusion of intelligence.
Large language models have some inherent limitations. Most notably, they will sometimes produce text that's a verbatim copy of the examples they were trained on, which can be a copyright issue. And since they are just predicting word combinations, they do not reason about the truth of the text they predict, so they can produce true-sounding statements which are completely false (this is sometimes called "hallucination," although that term makes it sound like the LLMs are thinking rather than making statistical predictions).
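To make "a statistical model of word sequences" more concrete, here is a minimal sketch in Python of a bigram model. It is a toy ancestor of LLMs that predicts the next word using only the single word before it. Real LLMs use neural networks, are trained on vastly more text, and consider much longer stretches of context, so this only illustrates the idea of predicting text from statistics. The function names `train_bigrams`, `predict_next`, and `generate` are made up for this example.

```python
import random
from collections import defaultdict


def train_bigrams(text):
    """Record which word followed each word in the training text.

    Repeats are kept, so a word that appeared twice as a follower is
    twice as likely to be chosen later.
    """
    words = text.split()
    followers = defaultdict(list)
    for current, following in zip(words, words[1:]):
        followers[current].append(following)
    return followers


def predict_next(followers, word):
    """Pick a word that followed `word` in the training data, or None if
    `word` was never followed by anything."""
    options = followers.get(word)  # .get does not create a new empty entry
    if not options:
        return None
    return random.choice(options)


def generate(followers, start, length):
    """Build up to `length` words by repeatedly predicting the next word."""
    words = [start]
    for _ in range(length - 1):
        next_word = predict_next(followers, words[-1])
        if next_word is None:
            break  # no statistics for this word, so stop predicting
        words.append(next_word)
    return " ".join(words)


training_text = "the cat sat on the mat and the cat ate the fish"
model = train_bigrams(training_text)
print(model["cat"])  # ['sat', 'ate']
print(generate(model, "the", 6))  # output is random and varies between runs
```

Every pair of adjacent words in the generated text is copied directly from the training text. With this little data, the model mostly regurgitates its input, which is a small-scale version of the verbatim-copying problem described above. Nothing in the code checks whether the output is true. It only tracks which words tended to follow each other.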
Modern LLMs circa 2020-2025 are haunted by several ethical issues:
- The methods by which corporations acquire and filter the data necessary to train large language models are unethical, including piracy and worker abuses.
- When these models produce verbatim copies of their training data, the result is plagiarism, but current designs have no way of alerting the user to this.
- Chatbots built using these models have encouraged their users to commit suicide and otherwise generated harmful text, despite attempts to prevent this.
- The ever-increasing energy and water demands of datacenters planned to support projected future LLM use are incompatible with the changes in energy and water use necessary to combat global climate change.
- Despite being trained on human writing without any compensation to the authors of the training data, and despite their mediocre performance including regular significant errors, LLMs are competing with humans for jobs in copywriting and related fields, and humans are using LLMs to generate short stories and books which make it more difficult for human authors to sell their work.
- Websites generated using LLMs containing a mix of truths and untruths have begun to crowd out older human-authored websites in search results, making accurate information harder to find on the internet. Other uses of LLMs for spam and information warfare cause further harms.
This is a partial list of issues with how large corporations are deploying LLMs. These are not all underlying issues with the technology itself; many of these just result from cost-cutting measures.
- Machine Learning
-
A sub-field of artificial intelligence that involves developing algorithms for computers to "learn" from patterns in data (called "training data") so that they can accomplish tasks without being explicitly programmed for that specific task.
Large Language Models and Generative Adversarial Networks are machine learning techniques that are colloquially referred to as "AI," although that term is much broader than those specific technologies.
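As a minimal illustration of "learning" from training data, the sketch below implements 1-nearest-neighbor classification, one of the simplest machine learning techniques and far simpler than LLMs or GANs. The training data and the `nearest_neighbor` function are hypothetical examples made up for this glossary. The program is never given a rule for what makes an exam result "pass" or "fail". Instead, it predicts a label for a new point by copying the label of the most similar training example.

```python
import math

# Made-up training data for illustration only:
# (hours of sleep, hours of study) paired with an exam outcome label.
training_data = [
    ((8.0, 5.0), "pass"),
    ((7.0, 4.0), "pass"),
    ((4.0, 1.0), "fail"),
    ((5.0, 0.5), "fail"),
]


def nearest_neighbor(training_data, point):
    """Predict a label for `point` by finding the closest training example
    (straight-line distance) and returning its label."""
    best_label = None
    best_distance = math.inf
    for features, label in training_data:
        distance = math.dist(features, point)  # math.dist needs Python 3.8+
        if distance < best_distance:
            best_distance = distance
            best_label = label
    return best_label


print(nearest_neighbor(training_data, (7.5, 3.0)))  # pass
```

Changing the training data changes the predictions without changing any code, which is the sense in which the program "learns" from data rather than being explicitly programmed for the task.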
- Software Engineering (career)
- Software Development (career)
-
A career involving writing code and designing applications.
- Programming Languages (discipline)
-
A sub-field of computer science focused on the study of programming languages, including how they are designed and how they can be made more powerful or easier to use.
- System Administration
-
A career involving maintaining shared computer systems, including keeping their hardware and operating systems running smoothly and managing accounts and permissions for their users.
System administration uses some of the same skills as information technology operations, but typically involves managing one or several systems for a base of users who form a shared community, rather than for customers.