@extends('template')
@section('title')
Lab 13: Working with Data
@stop
@section('content')
# Lab 13: Working with Data
## Summary
As usual, we'll work on this week's B notebook today.
@include('/labs/lab13/_toc')
## Big Questions {.bigqs}
- **Why is a list of dictionaries a natural format for storing data?**
There's no single best answer here, but at a basic level, it mimics a
spreadsheet: each list entry is one row, and each dictionary's keys
label that row's columns. In the code, it's easy to iterate through
the rows and then access several parts of each row by name.
Why are spreadsheets so popular? They're a good way for
non-programmers to record, inspect, and even analyze data, and they
can represent a lot of different kinds of data in one common format.
One caveat: although the list-of-dictionaries format is conceptually
simple, it's inefficient, since every row is a separate dictionary
that stores its own table of keys. A library like `pandas` is a better
choice for processing a lot of data, but for small data sets the
difference probably won't matter.
- **Why do we want to store data in files, as opposed to directly in
Python variables?**
The important thing about files is that they persist: they will still
be there even after we quit the program or turn the computer off and
on again. If we stored our data in Python variables, we'd have to
keep the program running for as long as we wanted that data to
stay around. Another advantage of files: you can send them in an
email or put them on a USB drive and transfer them around, which you
can't do with Python variables.
A related question: why store data in CSV or JSON format instead of
just writing a `.py` file that defines the data structure we want when
it runs? There are two big reasons. The first is interoperability:
lots of programming languages and programs can read and write CSV
files, but only Python (or a person who knows Python) can make sense
of a Python data file, so programs written in other languages probably
wouldn't be able to use it.
The second reason is security: if we ask you to run a Python file
that defines our data, we could sneak in other Python commands that
act as a virus. So if someone you don't trust sends you a `.py` file,
it's best not to run it until you have looked inside and understood
what the code does. In contrast, nothing in a CSV or JSON file gets
run when you load it: these files just hold data. You might still want
to be cautious about what that data says, but it's much safer to open
it up and look at it or load it into a processing program.
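The list-of-dictionaries format from the first question can be sketched in a few lines of Python. The fruit data and the `fruits_with_color` function below are made up for illustration; the point is that the loop visits one row at a time and reads each column by its key name, much like scanning a spreadsheet.

```python
# Made-up sample data: each dictionary is one row of a "spreadsheet",
# and its keys ("fruit", "color", "price") play the role of column labels.
rows = [
    {"fruit": "apple", "color": "red", "price": 0.50},
    {"fruit": "banana", "color": "yellow", "price": 0.25},
    {"fruit": "cherry", "color": "red", "price": 3.00},
]

def fruits_with_color(data, color):
    """Return the 'fruit' value of every row whose 'color' column matches."""
    matches = []
    for row in data:                # visit the rows one at a time
        if row["color"] == color:   # look up a column by its label
            matches.append(row["fruit"])
    return matches

print(fruits_with_color(rows, "red"))  # ['apple', 'cherry']
```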
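For the second question, here is a minimal sketch of saving a list of dictionaries to a CSV file and a JSON file and reading it back with Python's standard `csv` and `json` modules. The filenames `fruits.csv` and `fruits.json`, the sample data, and the helper functions are hypothetical. Running it leaves both files on disk, so a later run of the program (or a program written in another language) can load them again.

```python
import csv
import json

# Made-up sample data: one dictionary per row, same keys in every row.
rows = [
    {"fruit": "apple", "color": "red", "price": 0.50},
    {"fruit": "banana", "color": "yellow", "price": 0.25},
    {"fruit": "cherry", "color": "red", "price": 3.00},
]

def save_csv(data, path):
    """Write a list of dictionaries to CSV, using the keys as the header row."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(data[0].keys()))
        writer.writeheader()
        writer.writerows(data)

def load_csv(path):
    """Read a CSV file back into a list of dictionaries (all values are strings)."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def save_json(data, path):
    """Write a JSON-compatible structure (here, a list of dictionaries) to a file."""
    with open(path, "w") as f:
        json.dump(data, f, indent=2)

def load_json(path):
    """Read a JSON file back; numbers come back as numbers."""
    with open(path) as f:
        return json.load(f)

# Hypothetical filenames in the current folder; the files remain after the program exits.
save_csv(rows, "fruits.csv")
save_json(rows, "fruits.json")

from_csv = load_csv("fruits.csv")
from_json = load_json("fruits.json")
print(repr(from_csv[0]["price"]))   # '0.5'  (CSV stores text)
print(repr(from_json[0]["price"]))  # 0.5    (JSON keeps the number)

# Loading data never runs it: this text looks like Python code,
# but json.loads just turns it into an ordinary string.
suspicious = '{"note": "import os; os.remove(\'homework.py\')"}'
parsed = json.loads(suspicious)
print(type(parsed["note"]))  # <class 'str'>
```

Notice that CSV stores everything as text, so `price` comes back as the string `'0.5'`, while JSON keeps it as a number. Also, `json.loads` only parses its input: the text that looks like Python code stays an ordinary string and is never run.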
@stop