Key Word(s): SQL
SQLite
Exercises¶
Today you will work with the candidates.txt
and contributors.txt
datasets to
create a database in python
using SQLite
. This is a hands-on lecture. For
once, you may do these tasks in a Jupyter notebook.
The exercises will consist of a sequence of steps to help illustrate basic commands.
Exercise Deliverables¶
- Make a copy of this notebook and call it
L21_Exercises.ipynb
.- For each step in the exercise notebook, there are instructions labeled "Do the following:" (except for Setup and Interlude). Put all the code for those instructions in code cell(s) immediately following the instructions. The code in those cells should look like a
python
script. You must place comments where appropriate that describe your intentions and understanding. When in doubt add a comment. Note: To get thepandas
tables to display in a cell, usedisplay()
.- Save and close your database. Be sure to upload your database with the lecture exercises. You must name your database
L21DB.sqlite
.- Commit your completed work to your class
git
repo inside the directorylectures/lecture21
on themain
(ormaster
) branch.
import sqlite3
We will also use a basic a pandas
feature to display tables in the database.
import pandas as pd
pd.set_option('display.width', 500)
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', 100)
pd.set_option('display.notebook_repr_html', True)
from IPython.display import display
Now we create the tables in the database (just like last time).
db = sqlite3.connect('L21DB.sqlite')
cursor = db.cursor()
cursor.execute("DROP TABLE IF EXISTS candidates")
cursor.execute("DROP TABLE IF EXISTS contributors")
cursor.execute("PRAGMA foreign_keys=1")
cursor.execute('''CREATE TABLE candidates (
id INTEGER PRIMARY KEY NOT NULL,
first_name TEXT,
last_name TEXT,
middle_init TEXT,
party TEXT NOT NULL)''')
db.commit() # Commit changes to the database
cursor.execute('''CREATE TABLE contributors (
id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
last_name TEXT,
first_name TEXT,
middle_name TEXT,
street_1 TEXT,
street_2 TEXT,
city TEXT,
state TEXT,
zip TEXT,
amount REAL,
date DATETIME,
candidate_id INTEGER NOT NULL,
FOREIGN KEY(candidate_id) REFERENCES candidates(id))''')
db.commit()
Next we load the data for the candidates:
with open ("candidates.txt") as candidates:
next(candidates) # jump over the header
for line in candidates.readlines():
cid, first_name, last_name, middle_name, party = line.strip().split('|')
vals_to_insert = (int(cid), first_name, last_name, middle_name, party)
cursor.execute('''INSERT INTO candidates
(id, first_name, last_name, middle_init, party)
VALUES (?, ?, ?, ?, ?)''', vals_to_insert)
db.commit()
Interlude¶
Now that you have values in the tables of the database, it would be convenient to be able to visualize those tables in some way. We'll write a little helper function to accomplish this.
def viz_tables(cols, query):
q = cursor.execute(query).fetchall()
framelist = dict()
for i, col_name in enumerate(cols):
framelist[col_name] = [row[i] for row in q]
return pd.DataFrame.from_dict(framelist)
Here's how we can use our helper function. It gives a pretty nice visualization
of our table. You should do the same thing with the contributors
table.
candidate_cols = [col[1] for col in cursor.execute("PRAGMA table_info(candidates)")]
query = '''SELECT * FROM candidates'''
viz_tables(candidate_cols, query)
Do the following:¶
- Load the data in the
contributors
table by loading the data from thecontributors.txt
file. You are not allowed to use afor
-loop in this task (comprehensions are allowed). - Display the contributors table.
Step 2: Various Queries¶
We can query our database for entries with certain characteristics. For
example, we can query the candidates
table for entries whose middle name
fields are not empty.
query = '''SELECT * FROM candidates WHERE middle_init <> ""'''
viz_tables(candidate_cols, query)
We can also see how many entries satisfy the query:
print("{} candidates have a middle initial.".format(viz_tables(candidate_cols, query).shape[0]))
This used the shape
method on the returned pandas
table. You'll get to
practice counting in SQL later.
Do the following queries:¶
- Display the contributors where the state is "PA"
- Display the contributors where the amount contributed is greater than $\$1000.00$.
- Display the contributors from the state "UT" where the amount contributed is greater than $\$1000.00$.
- Display the contributors who didn't list their state
- Hint: Match
state
to the empty string
- Hint: Match
- Display the contributors from "WA" or "PA"
- Display the contributors who contributed between $\$100.00$ and $\$200.00$.
- Hint: You can use the
BETWEEN 100.00 and 200.00
clause.
- Hint: You can use the
Step 3: Sorting¶
It could be beneficial to sort by one of the attributes in the database. The following cell contains a basic sorting demo. Run it and try to understand what happened.
query = '''SELECT * FROM candidates ORDER BY id DESC'''
viz_tables(candidate_cols, query)
Do the following sorts on the contributors
table:¶
- Sort the
candidates
table bylast_name
. - Sort the
contributors
table by theamount
in descending order whereamount
is restricted to be between $\$1000.00$ and $\$5000.00$.- Hint: In your SQL command, start with getting the amount between the specified range followed by the sort. This will all be done in one line.
- Sort the contributors who donated between $\$1000.00$ and $\$5000.00$ by
candidate_id
and then byamount
in descending order.- Hint: Multiple orderings can be accomplished by separating requests
after
ORDER BY
with commas. - e.g.
ORDER BY amount ASC, last_name DESC
- Hint: Multiple orderings can be accomplished by separating requests
after
Step 4: Selecting Columns¶
So far, we've been selecting all columns from a table (i.e. SELECT * FROM
).
Often, we just want to select specific columns (e.g. SELECT amount FROM
).
query = '''SELECT last_name, party FROM candidates'''
viz_tables(['last_name', 'party'], query)
Using the DISTINCT
clause, you remove duplicate rows.
query = '''SELECT DISTINCT party FROM candidates'''
viz_tables(['party'], query)
Do the following:¶
- Get the first and last name of contributors. Make sure each row has distinct values.
Step 5: Altering Tables¶
The ALTER
clause allows us to modify tables in our database. Here, we add a
new column to our candidates table called full_name
.
cursor.execute('''ALTER TABLE candidates ADD COLUMN full_name TEXT''')
candidate_cols = [col[1] for col in cursor.execute("PRAGMA table_info(candidates)")]
viz_tables(candidate_cols, '''SELECT * FROM candidates''')
What if we want to rename or delete a column? It can't be done with SQLite
with a single command. We need to follow some roundabout steps (see SQLite
ALTER TABLE). We won't
consider this case at the moment.
For now, let's put a few commands together to populate the full_name
column.
candidate_cols = [col[1] for col in cursor.execute("PRAGMA table_info(candidates)")] # regenerate columns with full_name
query = '''SELECT id, last_name, first_name FROM candidates''' # Select a few columns
full_name_and_id = [(attr[1] + ", " + attr[2], attr[0]) for attr in cursor.execute(query).fetchall()] # List of tuples: (full_name, id)
update = '''UPDATE candidates SET full_name = ? WHERE id = ?''' # Update the table
for rows in full_name_and_id:
cursor.execute(update, rows)
query = '''SELECT * FROM candidates'''
viz_tables(candidate_cols, query)
Here's another update, this time on an existing column.
update = '''UPDATE candidates SET full_name = "WINNER" WHERE last_name = "Obama"'''
cursor.execute(update)
update = '''UPDATE candidates SET full_name = "RUNNER-UP" WHERE last_name = "McCain"'''
cursor.execute(update)
viz_tables(candidate_cols, query)
Do the following:¶
- Add a new column to the contributors table called
full_name
. The value in that column should be in the formlast_name, first_name
. - Change the value in the
full_name
column to the string"Too Much"
if someone donated more than $\$1000.00$.
Step 6: Aggregation¶
You can perform reduction operations on the values in the database. For example, you can compute the maximum, minimum, sum or the total number from multiple input values. Here's a little example:
contributor_cols = [col[1] for col in cursor.execute("PRAGMA table_info(contributors)")] # You've already done this part in your solution.
function = '''SELECT *, MAX(amount) AS max_amount FROM contributors'''
viz_tables(contributor_cols, function)
Do the following:¶
- Modify the demo to only output the max amount.
- Hints:
- Instead of using
SELECT *, MAX(amount) ...
you can trySELECT MAX(amount) ...
- You will want to use
cursor.execute()
here and thenfetchall()
. - Do not display your results in a table. It is sufficient to write the answer out to the screen as a single number.
- Instead of using
- Hints:
- Count how many donations there were above $\$1000.00$.
- Hint: There is a
COUNT
function.
- Hint: There is a
- Calculate the average positive donation.
- Hint: There is an
AVG
function.
- Hint: There is an
- Calculate the average contribution from each state and display in a table.
Restrict to positive values again.
- Hint: Use code that looks like:
"SELECT state,SUM(amount) FROM contributors GROUP BY state"
.
- Hint: Use code that looks like:
Step 7: DELETE¶
We have already noted that SQLite
can't drop columns in a straightforward
manner. However, it can delete rows quite simply. Here's the syntax:
deletion = '''DELETE FROM table_name WHERE condition'''
query = '''SELECT * FROM candidates LIMIT 3'''
viz_tables(candidate_cols, query)
query = '''SELECT * FROM candidates LIMIT 4 OFFSET 5'''
viz_tables(candidate_cols, query)
query = '''SELECT * FROM candidates ORDER BY last_name LIMIT 4 OFFSET 5'''
viz_tables(candidate_cols, query)
Do the following:¶
- Query and display the ten most generous donors.
- Query and display the ten least generous donors who donated a positive amount of money (since the data we have has some negative numbers in it...).
Save¶
Don't forget to save all of these changes to your database using db.commit()
.
Before closing your editor or IDE, be sure to close the database connection with
db.close()
.