{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Title :\n",
"Exercise: CS109A Olympics\n",
"\n",
"## Description :\n",
"\n",
"
\n",
"\n",
"## Data Description:\n",
"\n",
"## Instructions:\n",
"\n",
"- In this exercise, you will simulate the 100m sprint race discussed during the lecture.\n",
"- We have already defined for you a Sprinter() class which has two characteristics for each sprinter:\n",
" - Base time \n",
" - Performance variance \n",
"- Run the code cell that makes four instances of the `Sprinter()` class. You will work with those for the entire exercise.\n",
"- Call the time attribute of the helper class to get the time taken by a competitor in the actual race.\n",
"- First run the race simulation five times; you will do this by creating a dictionary with participant name as keys, and time taken in a simulated race as the values. You will sort this dictionary by values and determine the winner of the simulated race.\n",
"- Repeat the simulation of the race for 10,000 times and count who won the race for how many times. Based on this observation, you will then investigate why a particular participant won as many times?\n",
"- Repeat the simulation for 10,000 times, but this time get the distribution of times for each participant over these runs. \n",
"- Calculate the mean race time, standard deviation of the race time and the confidence interval for each participant.\n",
"- Use the helper code to observe a plot similar to the one given below:\n",
" \n",
"
\n",
"\n",
"## Hints: \n",
"\n",
"Counter()\n",
"Helps accumulating counts of objects in a certain data structure.\n",
"\n",
"np.mean()\n",
"Used to calculate the mean of an array.\n",
"\n",
"sorted()\n",
"Used to sort data.\n",
"\n",
"np.std()\n",
"Used to calculate the std deviation of an array.\n",
"\n",
"np.percentile\n",
"Used to calculate percentile of data inbetween a given range. Frequently used for calculating confidence intervals."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## CS109A Olympics : 100m dash\n",
"We are going to have 4 of our team members compete against each other in the 100m dash."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Importing libraries\n",
"import numpy as np\n",
"from time import sleep\n",
"import os\n",
"from IPython.display import clear_output\n",
"from collections import Counter\n",
"from helper import Sprinter\n",
"from helper import run_sim\n",
"import matplotlib.pyplot as plt\n",
"from prettytable import PrettyTable\n",
"plt.xkcd(scale=0,randomness=4)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Taking a look at the competitors\n",
"Each participant has a characteristic assigned to them. The characteristic has 2 parts :\n",
"\n",
"1. Base speed : This is the time they gave in a non-competitive environment.\n",
"2. Performance variance : Based on the mood, weather and other conditions this measure determines how much a participant's time will vary."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"# Name of sprinters\n",
"sprinters = ['Pavlos','Tale','Varshini','Hayden']\n",
"\n",
"# Defining charactersistics, ('Base pace','performance variance')\n",
"characteristics = [(13,0.25),(12.5,0.5),(12.25,1),(14.5,1)]\n",
"sprinters_dict = {}\n",
"\n",
"for idx,sprinter in enumerate(sprinters):\n",
"\n",
" # Take note of the * before characteristics\n",
" sprinters_dict[sprinter] = Sprinter(*characteristics[idx])\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Running a race\n",
"`sprinters_dict` has keys as the name of each participant, and the value as a class. The `time` attribute of the class is the time taken by that person to run a race. \n",
"- Call `sprinters_dict['Pavlos'].time` for 10 different times."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"# Call time attribute\n",
"___\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## ⏸ Pause & Think\n",
"Run the cell above, once again. What do you observe?\n",
"\n",
"A. Output is different because the python compile memory location has changed\n",
"\n",
"B. Output is the same\n",
"\n",
"C. Output changes because it is a new sample from random process"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {},
"outputs": [],
"source": [
"### edTest(test_chow0) ###\n",
"# Submit an answer choice as a string below (eg. if you choose option A put 'A')\n",
"answer = '___'\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- Get the times for each participant by calling the `time` attribute and create a dictionary called `race`, which has the key as the name of the participant and value as the time taken by participant to run the race.\n",
"- Sort `race.items()` according to time and get the item in dictionary with the least time taken to finish and assign it to `winner`. "
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"### edTest(test_race) ###\n",
"# Get the times for each participant and make a dictionary\n",
"race = ___\n",
"\n",
"# Sort the items of the dictionary to get the winner\n",
"# Hint: Remember to sort by the values and not the keys\n",
"winner = ___\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Race simulation\n",
"\n",
"As you would have noticed, every time you make a new dictionary `race`, the results would differ.\n",
"\n",
"Redefine the `race` dictionary, and run the cell below for a simulation of the race! "
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"# Get the times for each participant and make a dictionary\n",
"race = {sprinter:dash.time for sprinter,dash in sprinters_dict.items()}\n",
"\n",
"# Sort the items of the dictionary to get the winner\n",
"winner = sorted(race.items(),key=lambda x:x[1])[0]\n",
"\n",
"# Uncomment and execute the following code\n",
"# run_sim(race,winner)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Multiple simulations\n",
"Earlier was just one race, we want to find out who performs better over multiple races. So let's run the race 5 times\n",
"\n",
"- Run a loop for 5 times\n",
"- In each loop generate the race dictionary as done earlier, and get the winner after sorting `race.items()`\n",
"- Append name of the winners to the `winner_list`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Keep track of everyone's timings"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"# Run the simulation and append winners to the winner_list\n",
"# Create an empty list\n",
"winner_list = []\n",
"\n",
"# Run a simulation for 5 loops\n",
"for simulation in range(5):\n",
"\n",
" # Create a race dictionary\n",
" race = {k:v.time for k,v in sprinters_dict.items()}\n",
"\n",
" # Sort the items\n",
" winner = sorted(race.items(),key=lambda x:x[1])[0]\n",
"\n",
" # Append the name of the winner to winners_list\n",
" winner_list.append(winner)\n",
" \n",
"# Take a look at the winners list\n",
"winner_list \n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Even more simulations\n",
"\n",
"We will run 10,000 simulations and use the `Counter` to see who wins how many times.\n",
"\n",
"Check the hints for how to use `Counter()`."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"# Run the simulation and append winners to the winner_list\n",
"# Create an empty list\n",
"winner_list = []\n",
"\n",
"# Run a simulation for 10000 loops\n",
"for simulation in range(10000):\n",
"\n",
" # Create race dictionary\n",
" race = {k:v.time for k,v in sprinters_dict.items()}\n",
"\n",
" # Sort the items\n",
" winner = sorted(race.items(),key=lambda x:x[1])[0]\n",
"\n",
" # Append the name of the winner to winners_list\n",
" winner_list.append(winner[0])\n",
" \n",
"# Display first 5 entries from winner_list\n",
"winner_list___ \n"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"### edTest(test_wins) ###\n",
"# Get the counts for each person winning the race\n",
"# Hint: Use counter, look at the hints \n",
"wins = Counter(winner_list)\n",
"\n",
"# Print wins to see the output of the simulation\n",
"print(___)\n"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"# Helper code to plot the wins of each sprinter\n",
"plt.bar(list(wins.keys()),list(wins.values()),alpha=0.5)\n",
"plt.xlabel('Sprinters')\n",
"plt.ylabel('Race wins',rotation=0,labelpad=30)\n",
"plt.show();\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Why is Varshini winning so much ?\n",
"\n",
"Let us analyze why exactly is Varshini winning so frequently in our simulations.\n",
"But first, we will need to record the sprint timings for each sprinter in every simulation."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We will again run 10,000 simulations but this time record the individual sprint timings for each simulation instead.\n",
"\n",
"- Make a new dictionary `race_results` with keys as the name of sprinters and the value as an empty list. We will append race results to this list after each simulation.\n",
"- Run a simulation loop for 10000 times\n",
"- In each simulation loop over `sprinters_dict.items()` and for each participant:\n",
" - Calculate time by calling `.time` \n",
" - `append` time to the list for particular key of `race_results`"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"# Run the earlier simulation loop for 10000 times\n",
"# Loop over the sprinters_dict items and for each participant\n",
"# Call time and append to the corresponding list in race_results\n",
"\n",
"race_results= {k:[] for k in sprinters_dict.keys()}\n",
"for simulation in range(10000):\n",
" for sprinter,dash in sprinters_dict.items():\n",
"\n",
" # For a given participant call the .time attribute\n",
" sprint_timing = dash.time\n",
" race_results[sprinter].append(sprint_timing) \n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Sample mean $\\bar{x}$ sample standard deviation $s$\n",
"\n",
"\n",
"Now we have a list of times given by each participant. We have the complete distribution, so let's calculate the mean, standard deviation and confidence interval.\n",
"\n",
"As discussed in the lecture, if we have a given sample, we can quickly compute the mean and standard deviation using `np.mean()` and `np.std()`.\n",
"\n",
"Let's begin with the race results for `Pavlos`."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"# Using the race_results dictionary, find the mean\n",
"# and std for 'Pavlos'\n",
"pavlos_mean = ___\n",
"pavlos_std = ___\n",
"print(f'The average pace of Pavlos is {pavlos_mean:.2f} and the sample std is {pavlos_std:2f}')\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Sample mean $\\bar{x}$ sample standard deviation $s$ for all sprinters\n",
"\n",
"For each sprinter in the `race_results` dicitionary, find the mean and standard deviation of the 10,000 simulations using the `np.mean()` and `np.std()` functions.\n",
"\n",
"Store your findings in a new dictionary called `race_stats`."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"# Calculate mean and std of each participant\n",
"\n",
"# Initialize an empty dictionary\n",
"race_stats = {}\n",
"\n",
"# Loop over race_results.keys()\n",
"for sprinter in race_results.keys():\n",
" sprinter_mean = np.mean(race_results[sprinter])\n",
" sprinter_std = np.std(race_results[sprinter])\n",
"\n",
" # Store it as a list [mean,std] corresponding to each \n",
" # participant key in race_stats\n",
" race_stats[sprinter] = [sprinter_mean,sprinter_std]\n"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"# Use the helper code below to print your findings\n",
"pt = PrettyTable()\n",
"\n",
"pt.field_names = [\"Sprinter\", \"Sample mean\", \"Sample std\"]\n",
"\n",
"for sprinter,stats in race_stats.items():\n",
" pt.add_row([sprinter, round(stats[0],3),round(stats[1],3)])\n",
"\n",
"print(pt)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Confidence Interval\n",
"Confidence interval is the range of values for which we can claim a certain confidence level(95% mostly). The confidence interval represents values for the population parameter for which the difference between the parameter and the observed estimate is not significant at the 5% level.\n",
"\n",
"- Calculate the 95% CI by getting `np.percentile` at 2.5 and 97.5.\n",
"- Calculate and append these to the list of stats in the `race_stats` dictionary, for each participant"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
"#By using the race_results dictionary defined above,\n",
"# Find the 2.5 and 97.5 percentile of Tale's race runs.\n",
"# Hint : Use race_results['Tale's']\n",
"CI = np.percentile(___,[___,___])\n",
"print(f'The 95% confidence interval for Tale is {round(CI[0],2),round(CI[1],2)}')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Confidence intervals for all sprinters.\n",
"\n",
"Let's repeat the above for each sprinter.\n",
"You will add this information to your `race_stats` dictionary.\n",
"\n",
"We expect you to extend stats list with the $2.5$ and the $97.5$ percentile values for each sprinter.\n",
"\n",
"For e.g., if for `Pavlos`, we have `mean=13.00`, `std=0.1`, and CI as `(12.8,13.2)`, your `race_stats['Pavlos']` must look like: `[13.00,0.1,12.8,13.2]`."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [],
"source": [
"# Repeat the same as above, but for every sprinter\n",
"# run through the race_results dictionary for each sprinter\n",
"# find the confidence interval, and add it to the race_stats dictionary \n",
"# defined above\n",
"\n",
"for sprinter,runs in race_results.items():\n",
" ci = np.percentile(runs,[2.5,97.5])\n",
"\n",
" # Hint: You can use the .extend() method to add it to the \n",
" # existing list of stats\n",
" race_stats[sprinter].extend(ci)\n"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [],
"source": [
"# Use the helper code below to print your findings\n",
"pt = PrettyTable()\n",
"\n",
"pt.field_names = [\"Sprinter\", \"Sample mean\", \"Sample std\",\"95% CI\"]\n",
"\n",
"for sprinter,stats in race_stats.items():\n",
" mean = round(stats[0],3)\n",
" std = round(stats[1],3)\n",
" confidence_interval = (round(stats[2],3),round(stats[3],3))\n",
" pt.add_row([sprinter, mean,std,confidence_interval])\n",
"\n",
"print(pt)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Histogram plot for each sprinter\n",
"\n",
"Run the following cell to get a cool plot for distribution of times."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [],
"source": [
"# Helper code to plot the distribution of times\n",
"fig = plt.gcf()\n",
"fig.set_size_inches(10,6)\n",
"bins = np.linspace(10, 17, 50)\n",
"\n",
"for sprinter,runs in race_results.items():\n",
" height, bins, patches = plt.hist(runs, bins, alpha=0.5, \\\n",
" label=sprinter,density=True,edgecolor='k')\n",
" plt.fill_betweenx([0, height.max()], race_stats[sprinter][2], race_stats[sprinter][3], alpha=0.2)\n",
"plt.legend(loc='upper left',fontsize=16)\n",
"plt.xlabel('Seconds')\n",
"plt.ylabel('Frequency',rotation=0,labelpad=25)\n",
"ax = plt.gca()\n",
"ax.spines['right'].set_visible(False)\n",
"ax.spines['top'].set_visible(False)\n",
"ax.set_title('Time distribution for sprinters')\n",
"plt.show()\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## ⏸ Pause & Think\n",
"\n",
"Take a look at the histograms for each participant and comment on why do you think is Varshini winning more races?"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {},
"outputs": [],
"source": [
"### edTest(test_chow1) ###\n",
"# Write your answer as a string below\n",
"answer = '___'\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## ⏸ What **one parameter** should Tale change in order to win more races?\n",
"\n",
"**Note : Pick one that is most influential**\n",
"\n",
"A. Improve consistency\n",
"\n",
"B. Reduce base time\n",
"\n",
"C. Increase base time\n",
"\n",
"D. Relax and hydrate before the race"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {},
"outputs": [],
"source": [
"### edTest(test_chow2) ###\n",
"# Submit an answer choice as a string below (eg. if you choose option A put 'A')\n",
"answer = '___'\n"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {},
"outputs": [],
"source": [
"# Before you click mark, please comment out the run_sim function above\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 👩🏻🎓 Bonus (Not graded)\n",
"\n",
"Find out who among has would have the most podium finishes (top 3)."
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [],
"source": [
"# Your code here\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.5"
}
},
"nbformat": 4,
"nbformat_minor": 5
}