Key Word(s): SQL
SQLite Exercises¶
Today you will work with the candidates.txt and contributors.txt datasets to
create a database in python using SQLite. This is a hands-on lecture. For
once, you may do these tasks in a Jupyter notebook.
The exercises will consist of a sequence of steps to help illustrate basic commands.
Exercise Deliverables¶
- Make a copy of this notebook and call it
L21_Exercises.ipynb.- For each step in the exercise notebook, there are instructions labeled "Do the following:" (except for Setup and Interlude). Put all the code for those instructions in code cell(s) immediately following the instructions. The code in those cells should look like a
pythonscript. You must place comments where appropriate that describe your intentions and understanding. When in doubt add a comment. Note: To get thepandastables to display in a cell, usedisplay().- Save and close your database. Be sure to upload your database with the lecture exercises. You must name your database
L21DB.sqlite.- Commit your completed work to your class
gitrepo inside the directorylectures/lecture21on themain(ormaster) branch.
import sqlite3
We will also use a basic a pandas feature to display tables in the database.
import pandas as pd
pd.set_option('display.width', 500)
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', 100)
pd.set_option('display.notebook_repr_html', True)
from IPython.display import display
Now we create the tables in the database (just like last time).
db = sqlite3.connect('L21DB.sqlite')
cursor = db.cursor()
cursor.execute("DROP TABLE IF EXISTS candidates")
cursor.execute("DROP TABLE IF EXISTS contributors")
cursor.execute("PRAGMA foreign_keys=1")
cursor.execute('''CREATE TABLE candidates (
id INTEGER PRIMARY KEY NOT NULL,
first_name TEXT,
last_name TEXT,
middle_init TEXT,
party TEXT NOT NULL)''')
db.commit() # Commit changes to the database
cursor.execute('''CREATE TABLE contributors (
id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
last_name TEXT,
first_name TEXT,
middle_name TEXT,
street_1 TEXT,
street_2 TEXT,
city TEXT,
state TEXT,
zip TEXT,
amount REAL,
date DATETIME,
candidate_id INTEGER NOT NULL,
FOREIGN KEY(candidate_id) REFERENCES candidates(id))''')
db.commit()
Next we load the data for the candidates:
with open ("candidates.txt") as candidates:
next(candidates) # jump over the header
for line in candidates.readlines():
cid, first_name, last_name, middle_name, party = line.strip().split('|')
vals_to_insert = (int(cid), first_name, last_name, middle_name, party)
cursor.execute('''INSERT INTO candidates
(id, first_name, last_name, middle_init, party)
VALUES (?, ?, ?, ?, ?)''', vals_to_insert)
db.commit()
Interlude¶
Now that you have values in the tables of the database, it would be convenient to be able to visualize those tables in some way. We'll write a little helper function to accomplish this.
def viz_tables(cols, query):
q = cursor.execute(query).fetchall()
framelist = dict()
for i, col_name in enumerate(cols):
framelist[col_name] = [row[i] for row in q]
return pd.DataFrame.from_dict(framelist)
Here's how we can use our helper function. It gives a pretty nice visualization
of our table. You should do the same thing with the contributors table.
candidate_cols = [col[1] for col in cursor.execute("PRAGMA table_info(candidates)")]
query = '''SELECT * FROM candidates'''
viz_tables(candidate_cols, query)
Do the following:¶
- Load the data in the
contributorstable by loading the data from thecontributors.txtfile. You are not allowed to use afor-loop in this task (comprehensions are allowed). - Display the contributors table.
Step 2: Various Queries¶
We can query our database for entries with certain characteristics. For
example, we can query the candidates table for entries whose middle name
fields are not empty.
query = '''SELECT * FROM candidates WHERE middle_init <> ""'''
viz_tables(candidate_cols, query)
We can also see how many entries satisfy the query:
print("{} candidates have a middle initial.".format(viz_tables(candidate_cols, query).shape[0]))
This used the shape method on the returned pandas table. You'll get to
practice counting in SQL later.
Do the following queries:¶
- Display the contributors where the state is "PA"
- Display the contributors where the amount contributed is greater than $\$1000.00$.
- Display the contributors from the state "UT" where the amount contributed is greater than $\$1000.00$.
- Display the contributors who didn't list their state
- Hint: Match
stateto the empty string
- Hint: Match
- Display the contributors from "WA" or "PA"
- Display the contributors who contributed between $\$100.00$ and $\$200.00$.
- Hint: You can use the
BETWEEN 100.00 and 200.00clause.
- Hint: You can use the
Step 3: Sorting¶
It could be beneficial to sort by one of the attributes in the database. The following cell contains a basic sorting demo. Run it and try to understand what happened.
query = '''SELECT * FROM candidates ORDER BY id DESC'''
viz_tables(candidate_cols, query)
Do the following sorts on the contributors table:¶
- Sort the
candidatestable bylast_name. - Sort the
contributorstable by theamountin descending order whereamountis restricted to be between $\$1000.00$ and $\$5000.00$.- Hint: In your SQL command, start with getting the amount between the specified range followed by the sort. This will all be done in one line.
- Sort the contributors who donated between $\$1000.00$ and $\$5000.00$ by
candidate_idand then byamountin descending order.- Hint: Multiple orderings can be accomplished by separating requests
after
ORDER BYwith commas. - e.g.
ORDER BY amount ASC, last_name DESC
- Hint: Multiple orderings can be accomplished by separating requests
after
Step 4: Selecting Columns¶
So far, we've been selecting all columns from a table (i.e. SELECT * FROM).
Often, we just want to select specific columns (e.g. SELECT amount FROM).
query = '''SELECT last_name, party FROM candidates'''
viz_tables(['last_name', 'party'], query)
Using the DISTINCT clause, you remove duplicate rows.
query = '''SELECT DISTINCT party FROM candidates'''
viz_tables(['party'], query)
Do the following:¶
- Get the first and last name of contributors. Make sure each row has distinct values.
Step 5: Altering Tables¶
The ALTER clause allows us to modify tables in our database. Here, we add a
new column to our candidates table called full_name.
cursor.execute('''ALTER TABLE candidates ADD COLUMN full_name TEXT''')
candidate_cols = [col[1] for col in cursor.execute("PRAGMA table_info(candidates)")]
viz_tables(candidate_cols, '''SELECT * FROM candidates''')
What if we want to rename or delete a column? It can't be done with SQLite
with a single command. We need to follow some roundabout steps (see SQLite
ALTER TABLE). We won't
consider this case at the moment.
For now, let's put a few commands together to populate the full_name column.
candidate_cols = [col[1] for col in cursor.execute("PRAGMA table_info(candidates)")] # regenerate columns with full_name
query = '''SELECT id, last_name, first_name FROM candidates''' # Select a few columns
full_name_and_id = [(attr[1] + ", " + attr[2], attr[0]) for attr in cursor.execute(query).fetchall()] # List of tuples: (full_name, id)
update = '''UPDATE candidates SET full_name = ? WHERE id = ?''' # Update the table
for rows in full_name_and_id:
cursor.execute(update, rows)
query = '''SELECT * FROM candidates'''
viz_tables(candidate_cols, query)
Here's another update, this time on an existing column.
update = '''UPDATE candidates SET full_name = "WINNER" WHERE last_name = "Obama"'''
cursor.execute(update)
update = '''UPDATE candidates SET full_name = "RUNNER-UP" WHERE last_name = "McCain"'''
cursor.execute(update)
viz_tables(candidate_cols, query)
Do the following:¶
- Add a new column to the contributors table called
full_name. The value in that column should be in the formlast_name, first_name. - Change the value in the
full_namecolumn to the string"Too Much"if someone donated more than $\$1000.00$.
Step 6: Aggregation¶
You can perform reduction operations on the values in the database. For example, you can compute the maximum, minimum, sum or the total number from multiple input values. Here's a little example:
contributor_cols = [col[1] for col in cursor.execute("PRAGMA table_info(contributors)")] # You've already done this part in your solution.
function = '''SELECT *, MAX(amount) AS max_amount FROM contributors'''
viz_tables(contributor_cols, function)
Do the following:¶
- Modify the demo to only output the max amount.
- Hints:
- Instead of using
SELECT *, MAX(amount) ...you can trySELECT MAX(amount) ... - You will want to use
cursor.execute()here and thenfetchall(). - Do not display your results in a table. It is sufficient to write the answer out to the screen as a single number.
- Instead of using
- Hints:
- Count how many donations there were above $\$1000.00$.
- Hint: There is a
COUNTfunction.
- Hint: There is a
- Calculate the average positive donation.
- Hint: There is an
AVGfunction.
- Hint: There is an
- Calculate the average contribution from each state and display in a table.
Restrict to positive values again.
- Hint: Use code that looks like:
"SELECT state,SUM(amount) FROM contributors GROUP BY state".
- Hint: Use code that looks like:
Step 7: DELETE¶
We have already noted that SQLite can't drop columns in a straightforward
manner. However, it can delete rows quite simply. Here's the syntax:
deletion = '''DELETE FROM table_name WHERE condition'''
query = '''SELECT * FROM candidates LIMIT 3'''
viz_tables(candidate_cols, query)
query = '''SELECT * FROM candidates LIMIT 4 OFFSET 5'''
viz_tables(candidate_cols, query)
query = '''SELECT * FROM candidates ORDER BY last_name LIMIT 4 OFFSET 5'''
viz_tables(candidate_cols, query)
Do the following:¶
- Query and display the ten most generous donors.
- Query and display the ten least generous donors who donated a positive amount of money (since the data we have has some negative numbers in it...).
Save¶
Don't forget to save all of these changes to your database using db.commit().
Before closing your editor or IDE, be sure to close the database connection with
db.close().