{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Title :\n",
"Exercise: CS109A Olympics\n",
"\n",
"## Description :\n",
"\n",
"
\n",
"\n",
"## Data Description:\n",
"\n",
"## Instructions:\n",
"\n",
"- In this exercise, you will simulate the 100m sprint race discussed during the lecture.\n",
"- We have already defined for you a Sprinter() class which has two characteristics for each sprinter:\n",
" - Base time \n",
" - Performance variance \n",
"- Run the code cell that makes four instances of the `Sprinter()` class. You will work with those for the entire exercise.\n",
"- Call the time attribute of the helper class to get the time taken by a competitor in the actual race.\n",
"- First run the race simulation five times; you will do this by creating a dictionary with participant name as keys, and time taken in a simulated race as the values. You will sort this dictionary by values and determine the winner of the simulated race.\n",
"- Repeat the simulation of the race for 10,000 times and count who won the race for how many times. Based on this observation, you will then investigate why a particular participant won as many times?\n",
"- Repeat the simulation for 10,000 times, but this time get the distribution of times for each participant over these runs. \n",
"- Calculate the mean race time, standard deviation of the race time and the confidence interval for each participant.\n",
"- Use the helper code to observe a plot similar to the one given below:\n",
" \n",
"
\n",
"\n",
"## Hints: \n",
"\n",
"Counter()\n",
"Helps accumulating counts of objects in a certain data structure.\n",
"\n",
"np.mean()\n",
"Used to calculate the mean of an array.\n",
"\n",
"sorted()\n",
"Used to sort data.\n",
"\n",
"np.std()\n",
"Used to calculate the std deviation of an array.\n",
"\n",
"np.percentile\n",
"Used to calculate percentile of data inbetween a given range. Frequently used for calculating confidence intervals."
]
},
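{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of two of the hints above, using made-up names and times rather than the `Sprinter` class: `sorted()` with a `key` function orders dictionary items by value, and `Counter()` tallies repeated entries in a list. The names `toy_race` and `toy_winners` are hypothetical.\n",
"\n",
"```python\n",
"from collections import Counter\n",
"\n",
"# Hypothetical finishing times in seconds (made-up numbers, not from the Sprinter class)\n",
"toy_race = {'Ann': 12.9, 'Bo': 12.4, 'Cy': 13.1}\n",
"\n",
"# sorted() with a key function orders the (name, time) pairs by time\n",
"ranking = sorted(toy_race.items(), key=lambda item: item[1])\n",
"fastest_name, fastest_time = ranking[0]\n",
"print(fastest_name, fastest_time)\n",
"\n",
"# Counter tallies how often each element appears in a list\n",
"toy_winners = ['Bo', 'Ann', 'Bo', 'Cy', 'Bo']\n",
"tally = Counter(toy_winners)\n",
"print(tally.most_common(1))\n",
"```\n",
"\n",
"With these toy inputs, the first element of `ranking` is the pair with the smallest time, and `tally` maps each name to the number of times it appears."
]
},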
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## PyDS Olymipics : 100m dash\n",
"We are going to have 4 of our team members compete against each other in the 100m dash."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"# Importing libraries\n",
"import numpy as np\n",
"from time import sleep\n",
"import os\n",
"from IPython.display import clear_output\n",
"from collections import Counter\n",
"from helper import Sprinter\n",
"import matplotlib.pyplot as plt\n",
"from prettytable import PrettyTable\n",
"plt.xkcd(scale=0,randomness=4)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Taking a look at the competitors\n",
"Each participant has a characteristic assigned to him. The characteristic has 2 parts :\n",
"\n",
"1. Base speed : This is the time they gave in a non-competitive environment.\n",
"2. Performance variance : Based on the mood, weather and other conditions this measure determines how much a participant's time will vary."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"# Name of sprinters\n",
"sprinters = ['Pavlos','Hargun','Joy','Hayden']\n",
"# Defining charactersistics, ('Base pace','performance variance')\n",
"characteristics = [(13,0.25),(12.5,0.5),(12.25,1),(14.5,1)]\n",
"sprinters_dict = {}\n",
"for idx,sprinter in enumerate(sprinters):\n",
" sprinters_dict[sprinter] = Sprinter(*characteristics[idx])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Running a race\n",
"`sprinters_dict` has keys as the name of each participant, and the value as a class. The `time` attribute of the class is the time taken by that person to run a race. \n",
"- Call `sprinters_dict['Pavlos'].time` for 10 different times."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"# Call time attribute\n",
"___"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"- Get the times for each participant by calling the `time` attribute.\n",
"- Create a dictionary called `race`, which has the key as the name of the participant and value as the time taken by participant to run the race.\n",
"- Sort `race.items()` according to time and get the item in dictionary with the least time taken to finish and assign it to `winner`. \n",
"\n",
"Note: The time taken by a participant to finish the race is the value of the dictionary so remember to sort by values"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"# Get the times for each participant and make a dictionary\n",
"race = ___\n",
"# Then sort the items of the dictionary to get the winner\n",
"# Hint: Remember to sort by the values and not the keys\n",
"winner = ___ "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Race simulation\n",
"\n",
"As you would have noticed, every time you make a new dictionary `race`, the results would differ.\n",
"\n",
"Redefine the `race` dictionary, and run the cell below for a simulation of the race! "
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"# Again get the times for each participant and make a dictionary\n",
"race = ___\n",
"# Then sort the items of the dictionary to get the winner\n",
"winner = ___\n",
"\n",
"# Execute the following code\n",
"for i in range(1,11):\n",
" clear_output(wait=True)\n",
" print(\"|START|\"+\"\\n|START|\".join(['----'*min(10,int((15*i)/race[runner]))+ ' '*(10-min(10,int((15*i)/race[runner])))+'|'+runner for runner in race.keys()]))\n",
" sleep(0.5)\n",
" \n",
"print(f'\\nThe winner is {winner[0]} with a time of {winner[1]:.2f}s!') "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Multiple simulations\n",
"Earlier was just one race, we want to find out who performs better over multiple races. So let's run the race 5 times\n",
"\n",
"- Run a loop for 5 times\n",
"- In each loop generate the race dictionary as done earlier, and get the winner after sorting `race.items()`\n",
"- Append winners to the `winner_list`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Keep track of everyone's timings"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"# Run the simulation and append winners to the winner_list\n",
"winner_list = []\n",
"for simulation in range(5):\n",
" race = ___\n",
" winner = ___\n",
" ___\n",
" \n",
"winner_list "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Even more simulations\n",
"\n",
"We will run 10,000 simulations and use the `Counter` to see who wins how many times.\n",
"\n",
"Check the hints for how to use `Counter()`."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"# Run the simulation and append winners to the winner_list\n",
"___"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"# Get the counts for each person winning the race\n",
"wins = Counter(___)\n",
"print(wins)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"# Execute the code \n",
"plt.bar(list(wins.keys()),list(wins.values()),alpha=0.5)\n",
"plt.xlabel('Sprinters')\n",
"plt.ylabel('Race wins',rotation=0,labelpad=30)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Why is Joy winning so much ?\n",
"\n",
"Let us analyze why exactly is Joy winning so frequently in our simulations.\n",
"But first, we will need to record the sprint timings for each sprinter in every simulation."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We will again run 10,000 simulations but this time record the individual sprint timings for each simulation instead.\n",
"\n",
"- Make a new dictionary `race_results` with keys as the name of sprinters and the value as an empty list. We will append race results to this list after each simulation.\n",
"- Inside the simulation loop, loop through the items of the `race_results` dictionary, and for each participant :\n",
" - Calculate time by calling `.time`\n",
" - `append` time to the list for participant in `race_results`"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"# Run the earlier simulation and store all 10000 times given by a participant\n",
"# race_results has a list of times as values for a given key( i.e participant)\n",
"# So for a key it has a corresponding list of times for that participant.\n",
"\n",
"race_results= {___:___ for ___ in sprinters_dict.___}\n",
"for simulation in range(10000):\n",
" for sprinter,dash in sprinters_dict.items():\n",
" sprint_timing = ___\n",
" race_results[___].append(___) "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Sample mean $\\bar{x}$ sample standard deviation $s$\n",
"\n",
"\n",
"Now we have a list of times given by each participant. We have the complete distribution, so let's calculate the mean, std and confidence interval.\n",
"\n",
"As discussed in the lecture, if we have a given sample, we can quickly compute the mean and standard deviation using `np.mean()` and `np.std()`.\n",
"\n",
"Let's begin with the race results for `Pavlos`."
]
},
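{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough illustration of `np.mean()` and `np.std()`, the sketch below uses a toy sample drawn from a normal distribution. The parameters, the seed, and the name `toy_times` are assumptions made for this example and are not the `race_results` from this exercise. Note that `np.std()` divides by $n$ by default, while passing `ddof=1` divides by $n-1$.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"# Toy sample: 1,000 draws from a normal distribution (assumed parameters, fixed seed)\n",
"rng = np.random.default_rng(0)\n",
"toy_times = rng.normal(loc=13.0, scale=0.5, size=1000)\n",
"\n",
"toy_mean = np.mean(toy_times)\n",
"# np.std defaults to ddof=0 (divide by n); ddof=1 divides by n-1 instead\n",
"toy_std_population = np.std(toy_times)\n",
"toy_std_sample = np.std(toy_times, ddof=1)\n",
"print(f'mean={toy_mean:.2f}, std(ddof=0)={toy_std_population:.3f}, std(ddof=1)={toy_std_sample:.3f}')\n",
"```\n",
"\n",
"For a sample this large, the two standard deviation estimates are nearly identical, so the choice of `ddof` matters little here."
]
},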
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"# Using the race_results dictionary, find the mean\n",
"# and std for 'Pavlos'\n",
"pavlos_mean = np.mean(___)\n",
"pavlos_std = np.std(___)\n",
"print(f'The average pace of Pavlos is {pavlos_mean:.2f} and the sample std is {pavlos_std:2f}')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Sample mean $\\bar{x}$ sample standard deviation $s$ for all sprinters\n",
"\n",
"For each sprinter in the `race_results` dicitionary, find the mean and standard deviation of the 10,000 simulations using the `np.mean()` and `np.std()` functions.\n",
"\n",
"Store your findings in a new dictionary called `race_stats` as a list. So the `race_stats` dictionary has a list of corresponding stats for each participant(key)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"# loop through the keys of race_results\n",
"# calculate mean and std of each participant using np.mean() and np.std()\n",
"# Assign these stats to the key, as a list\n",
"race_stats = {}\n",
"for sprinter in race_results.keys():\n",
" sprinter_mean = ___\n",
" sprinter_std = ___\n",
" race_stats[sprinter] = [___,___]"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"# Use the helper code below to print your findings\n",
"pt = PrettyTable()\n",
"\n",
"pt.field_names = [\"Sprinter\", \"Sample mean\", \"Sample std\"]\n",
"\n",
"for sprinter,stats in race_stats.items():\n",
" pt.add_row([sprinter, round(stats[0],3),round(stats[1],3)])\n",
"\n",
"print(pt)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Confidence Interval\n",
"Confidence interval is the range of values for which we can claim a certain confidence level(95% mostly). The confidence interval represents values for the population parameter for which the difference between the parameter and the observed estimate is not significant at the 5% level.\n",
"\n",
"- Use `np.percentile()` to calculate the 95% CI.\n",
"- Calculate `np.percentile` at 2.5 and 97.5 to get the interval.\n",
"- Calculate and append these to the list of stats in the `race_stats` dictionary, for each participant"
]
},
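{
"cell_type": "markdown",
"metadata": {},
"source": [
"The percentile approach described above can be sketched on toy data. The sample below is drawn from a normal distribution with assumed parameters and a fixed seed; `toy_times`, `lower`, `upper` and `inside` are hypothetical names and are not part of the exercise.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"# Toy sample: 10,000 draws from a normal distribution (assumed parameters, fixed seed)\n",
"rng = np.random.default_rng(1)\n",
"toy_times = rng.normal(loc=13.0, scale=0.5, size=10000)\n",
"\n",
"# The 2.5th and 97.5th percentiles bracket the central 95% of the sample\n",
"lower, upper = np.percentile(toy_times, [2.5, 97.5])\n",
"\n",
"# Fraction of the sample that falls inside the interval\n",
"inside = np.mean((toy_times >= lower) & (toy_times <= upper))\n",
"print(f'interval=({lower:.2f}, {upper:.2f}), fraction inside={inside:.3f}')\n",
"```\n",
"\n",
"The fraction of values inside the interval should come out very close to 0.95, which is a quick sanity check on the two percentiles."
]
},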
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
"#By using the race_results dictionary defined above,\n",
"# Find the 2.5 and 97.5 percentile of Hargun's race runs.\n",
"CI = np.percentile(___,[___,___])\n",
"print(f'The 95% confidence interval for Hargun is {round(CI[0],2),round(CI[1],2)}')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Confidence intervals for all sprinters.\n",
"\n",
"Let's repeat the above for each sprinter.\n",
"You will add this information to your `race_stats` dictionary.\n",
"\n",
"We expect you to append the $2.5$ and the $97.5$ percentile values to the existing stats list for each sprinter.\n",
"\n",
"For e.g., if for `Pavlos`, we have `mean=13.00`, `std=0.1`, and CI as `(12.8,13.2)`, your `race_stats['Pavlos']` must look like: `[13.00,0.1,12.8,13.2]`."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [],
"source": [
"# Now lets repeat the same, but for every sprinter\n",
"# run through the race_results dictionary for each sprinter\n",
"# find the confidence interval, and add it to the race_stats dictionary \n",
"# defined above\n",
"# Hint: You can use the .extend() method to add it to the existing list of stats\n",
"for sprinter,runs in race_results.items():\n",
" ci = np.percentile(___)\n",
" race_stats[___].___"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [],
"source": [
"# Use the helper code below to print your findings\n",
"pt = PrettyTable()\n",
"\n",
"pt.field_names = [\"Sprinter\", \"Sample mean\", \"Sample std\",\"95% CI\"]\n",
"\n",
"for sprinter,stats in race_stats.items():\n",
" mean = round(stats[0],3)\n",
" std = round(stats[1],3)\n",
" confidence_interval = (round(stats[2],3),round(stats[3],3))\n",
" pt.add_row([sprinter, mean,std,confidence_interval])\n",
"\n",
"print(pt)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Histogram plot for each sprinter\n",
"\n",
"Run the following cell to get a cool plot for distribution of times."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [],
"source": [
"fig = plt.gcf()\n",
"fig.set_size_inches(10,6)\n",
"bins = np.linspace(10, 17, 50)\n",
"\n",
"for sprinter,runs in race_results.items():\n",
" height, bins, patches = plt.hist(runs, bins, alpha=0.5, \\\n",
" label=sprinter,density=True,edgecolor='k')\n",
" plt.fill_betweenx([0, height.max()], race_stats[sprinter][2], race_stats[sprinter][3], alpha=0.2)\n",
"plt.legend(loc='upper left',fontsize=16)\n",
"plt.xlabel('Seconds')\n",
"plt.ylabel('Frequency',rotation=0,labelpad=25)\n",
"ax = plt.gca()\n",
"ax.spines['right'].set_visible(False)\n",
"ax.spines['top'].set_visible(False)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## ⏸ Take a look at the histograms for each participant and comment on why do you think Joy is winning the most races?\n",
"\n",
"A. Very consistent distribution\n",
"\n",
"B. Low base time and not a very high spread\n",
"\n",
"C. High base time but variation causes lower times to show more frequently\n",
"\n",
"D. Joy is not winning the most races\n"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {},
"outputs": [],
"source": [
"### edTest(test_chow1) ###\n",
"# Submit an answer choice as a string below (eg. if you choose option A put 'A')\n",
"\n",
"answer = '___'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## ⏸ What **one parameter** should Hargun change in order to win more races?\n",
"\n",
"A. Reduce base time\n",
"\n",
"B. Reduce consistency\n",
"\n",
"C. Relax before the race\n",
"\n",
"D. Increase consistency"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {},
"outputs": [],
"source": [
"### edTest(test_chow2) ###\n",
"# Submit an answer choice as a string below (eg. if you choose option A put 'A')\n",
"\n",
"answer = '___'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 👩🏻🎓 Bonus (Not graded)\n",
"\n",
"Find out who among has would have the most podium finishes (top 3)."
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [],
"source": [
"# Your code here"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.5"
}
},
"nbformat": 4,
"nbformat_minor": 5
}