\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0;32mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mfloat_list\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m10\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;31mIndexError\u001b[0m: list index out of range"
]
}
],
"source": [
"print(float_list[10])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can find the length of a list using the built-in function `len`:"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[1.0, 3.0, 5.0, 4.0, 2.0]\n"
]
},
{
"data": {
"text/plain": [
"5"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"print(float_list)\n",
"len(float_list)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Indexing and slicing lists"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since Python is zero-indexed, the last element of `float_list` is at index `len(float_list) - 1`:"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"2.0"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"float_list[len(float_list)-1]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It is more idiomatic in Python to use `-1` for the last element, `-2` for the second-to-last, and so on:"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"2.0"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"float_list[-1]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can use the `:` operator to access a subset of the list. This is called **slicing**. A slice `lst[start:stop]` includes the element at `start` and excludes the one at `stop`."
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[3.0, 5.0, 4.0, 2.0]\n",
"[1.0, 3.0]\n"
]
}
],
"source": [
"print(float_list[1:5])\n",
"print(float_list[0:2])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Below is a summary of list slicing operations:\n",
"\n",
"*(Figure not recovered: summary table of list slicing operations.)*"
]
},
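{
"cell_type": "markdown",
"metadata": {},
"source": [
"The common slicing forms can be sketched as follows. This is a minimal illustration rather than the original summary table, and the list `letters` is a hypothetical example introduced here.\n",
"\n",
"```python\n",
"letters = ['a', 'b', 'c', 'd', 'e']  # hypothetical example list\n",
"\n",
"first_two = letters[:2]        # from the start up to (not including) index 2\n",
"from_two = letters[2:]         # from index 2 to the end\n",
"middle = letters[1:4]          # indices 1, 2 and 3\n",
"last_two = letters[-2:]        # the last two elements\n",
"every_other = letters[::2]     # every second element\n",
"reversed_copy = letters[::-1]  # a reversed copy\n",
"\n",
"print(first_two, from_two, middle, last_two, every_other, reversed_copy)\n",
"```\n",
"\n",
"Each slice returns a new list, so `letters` itself is left unchanged."
]
},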
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['hi', 7]"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"lst = ['hi', 7, 'c', 'cat', 'hello', 8]\n",
"lst[:2]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can slice \"backwards\" as well:"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[1.0, 3.0, 5.0]"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"float_list[:-2] # up to second last"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[1.0, 3.0, 5.0, 4.0]"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"float_list[:4] # up to but not including 5th element"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can also slice with a stride:"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[1.0, 5.0]"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"float_list[:4:2] # above but skipping every second element"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can iterate through a list using a loop. Here's a **for loop.**"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1.0\n",
"3.0\n",
"5.0\n",
"4.0\n",
"2.0\n"
]
}
],
"source": [
"for ele in float_list:\n",
"    print(ele)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"What if you wanted the index as well?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Use the built-in Python function `enumerate`, which yields tuples of the form `(index, value)` as you iterate over the list."
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(0, 1.0)\n",
"(1, 3.0)\n",
"(2, 5.0)\n",
"(3, 4.0)\n",
"(4, 2.0)\n"
]
}
],
"source": [
"for i, ele in enumerate(float_list):\n",
"    print(i, ele)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Appending and deleting"
]
},
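{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can add items to the end of a list using the `+` operator or the `append` method. Note that `+` returns a new list and leaves the original unchanged, while `append` modifies the list in place."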
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[1.0, 3.0, 5.0, 4.0, 2.0, 0.333]"
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"float_list + [.333]"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"float_list.append(.444)"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[1.0, 3.0, 5.0, 4.0, 2.0, 0.444]\n"
]
},
{
"data": {
"text/plain": [
"6"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"print(float_list)\n",
"len(float_list)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, run the cell with `float_list.append()` a second time. Then run the subsequent cell. What happens?\n",
"\n",
"To remove an item from the list, use `del`."
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[1.0, 3.0, 4.0, 2.0, 0.444]\n"
]
}
],
"source": [
"del float_list[2]\n",
"print(float_list)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can also insert an element (`elem`) at a specific position (`index`) in the list using the `insert` method:"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[1.0, '3.14', 3.0, 4.0, 2.0, 0.444]"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"elem = '3.14'\n",
"index = 1\n",
"float_list.insert(index, elem)\n",
"float_list"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### List Comprehensions\n",
"\n",
"Lists can be constructed in a compact way using a *list comprehension*. Here's a simple example."
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"squaredlist = [i*i for i in int_list]\n",
"squaredlist"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And here's a more complicated one, requiring a conditional."
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[8, 32, 72, 128, 200]\n"
]
}
],
"source": [
"comp_list1 = [2*i for i in squaredlist if i % 2 == 0]\n",
"print(comp_list1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This is entirely equivalent to creating `comp_list1` using a loop with a conditional, as below:"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[8, 32, 72, 128, 200]\n"
]
}
],
"source": [
"comp_list2 = []\n",
"for i in squaredlist:\n",
"    if i % 2 == 0:\n",
"        comp_list2.append(2*i)\n",
"print(comp_list2)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The list comprehension syntax\n",
"\n",
"```\n",
"result = [expression for item in list if conditional]\n",
"```\n",
"\n",
"is equivalent to the loop\n",
"\n",
"```\n",
"result = []\n",
"for item in list:\n",
"    if conditional:\n",
"        result.append(expression)\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Exercise 2 (do at home):** Build a list that contains every prime number between 1 and 100, in two different ways:\n",
"\n",
"- 2.1 Using for loops and conditional if statements.\n",
"- 2.2 **(Stretch Goal)** Using a list comprehension. You should be able to do this in one line of code. **Hint:** it might help to look up the function `all()` in the documentation."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[2,\n",
" 3,\n",
" 5,\n",
" 7,\n",
" 11,\n",
" 13,\n",
" 17,\n",
" 19,\n",
" 23,\n",
" 29,\n",
" 31,\n",
" 37,\n",
" 41,\n",
" 43,\n",
" 47,\n",
" 53,\n",
" 59,\n",
" 61,\n",
" 67,\n",
" 71,\n",
" 73,\n",
" 79,\n",
" 83,\n",
" 89,\n",
" 97]"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"primes = []\n",
"for i in range(1,101):\n",
"    # skip i if any previously found prime divides it\n",
"    if sum([(i % p) == 0 for p in primes]) > 0:\n",
"        continue\n",
"    if i != 1:\n",
"        primes.append(i)\n",
"primes"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[2,\n",
" 3,\n",
" 5,\n",
" 7,\n",
" 11,\n",
" 13,\n",
" 17,\n",
" 19,\n",
" 23,\n",
" 29,\n",
" 31,\n",
" 37,\n",
" 41,\n",
" 43,\n",
" 47,\n",
" 53,\n",
" 59,\n",
" 61,\n",
" 67,\n",
" 71,\n",
" 73,\n",
" 79,\n",
" 83,\n",
" 89,\n",
" 97]"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"[i for i in range(2,101) if all(i % j != 0 for j in range(2,i))]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# %load ../solutions/exercise2_1.py\n",
"N = 100\n",
"\n",
"# using loops and if statements\n",
"primes = []\n",
"for j in range(2, N):\n",
"    # count the divisors of j in 2..j-1\n",
"    count = 0\n",
"    for i in range(2, j):\n",
"        if j % i == 0:\n",
"            count = count + 1\n",
"    if count == 0:\n",
"        primes.append(j)\n",
"print(primes)\n"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# %load ../solutions/exercise2_2.py\n",
"primes_lc = [j for j in range(2, N) if all(j % i != 0 for i in range(2, j))]\n",
"\n",
"print(primes)\n",
"print(primes_lc)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Simple Functions\n",
"\n",
"A *function* object is a reusable block of code that does a specific task. Functions are commonplace in Python, either standalone or attached to objects (as methods). To invoke a function `func`, you call it as `func(arguments)`.\n",
"\n",
"We've seen built-in Python functions and methods (details below). For example, `len()` and `print()` are built-in Python functions. And at the beginning, you called `np.mean()` to calculate the mean of three numbers, where `mean()` is a function in the numpy module and numpy was abbreviated as `np`. This syntax allows us to have multiple \"mean\" functions in different modules; calling this one as `np.mean()` guarantees that we will execute numpy's mean function, as opposed to a mean function from a different module."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### User-defined functions\n",
"\n",
"We'll now learn to write our own user-defined functions. Below is the syntax for defining a basic function with one input argument and one output. You can also define functions with no input or output arguments, or multiple input or output arguments.\n",
"\n",
"```\n",
"def name_of_function(arg):\n",
"    ...\n",
"    return output\n",
"```\n",
"\n",
"We can write functions with one input and one output argument. Here are two such functions."
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(25, 125)"
]
},
"execution_count": 42,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"def square(x):\n",
"    x_sqr = x*x\n",
"    return x_sqr\n",
"\n",
"def cube(x):\n",
"    x_cub = x*x*x\n",
"    return x_cub\n",
"\n",
"square(5), cube(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"What if you want to return two variables at a time? The usual way is to return a tuple:"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(25, 125)"
]
},
"execution_count": 43,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"def square_and_cube(x):\n",
"    x_cub = x*x*x\n",
"    x_sqr = x*x\n",
"    return x_sqr, x_cub\n",
"\n",
"square_and_cube(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Lambda functions\n",
"\n",
"Often we quickly define mathematical functions with a one-line function called a *lambda* function. Lambda functions let us write functions without having to name them, i.e., they're *anonymous*.\n",
"No return statement is needed; the value of the expression is returned automatically.\n"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"9\n"
]
},
{
"data": {
"text/plain": [
"25"
]
},
"execution_count": 44,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# create an anonymous function and assign it to the variable square\n",
"square = lambda x: x*x\n",
"print(square(3))\n",
"\n",
"# note: despite its name, this returns the squared hypotenuse, x^2 + y^2\n",
"hypotenuse = lambda x, y: x*x + y*y\n",
"\n",
"## Same as\n",
"# def hypotenuse(x, y):\n",
"#     return x*x + y*y\n",
"\n",
"hypotenuse(3, 4)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Methods\n",
"A function that belongs to an object is called a *method*. By \"object,\" we mean an \"instance\" of a class (e.g., list, integer, or floating point variable).\n",
"\n",
"For example, when we invoke `append()` on an existing list, `append()` is a method.\n",
"\n",
"In other words, a *method* is a function on a specific *instance* of a class (i.e., *object*). In this example, our class is a list. `float_list` is an instance of a list (thus, an object), and the `append()` function is technically a *method* since it pertains to the specific instance `float_list`."
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[1.0, 2.09, 4.0, 2.0, 0.444]\n"
]
},
{
"data": {
"text/plain": [
"[1.0, 2.09, 4.0, 2.0, 0.444, 56.7]"
]
},
"execution_count": 45,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"float_list = [1.0, 2.09, 4.0, 2.0, 0.444]\n",
"print(float_list)\n",
"float_list.append(56.7) \n",
"float_list"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Exercise 3 (do at home):** Write an `isprime()` function and use it to list the primes below 100\n",
"\n",
"In Exercise 2, above, you wrote code that generated a list of the prime numbers between 1 and 100. Now, write a function called `isprime()` that takes in a positive integer $N$, and determines whether or not it is prime. Return `True` if it's prime and return `False` if it isn't. Then, using a list comprehension and `isprime()`, create a list `myprimes` that contains all the prime numbers less than 100. "
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [],
"source": [
"# your code here\n",
"def isprime(n):\n",
"    # a prime is an integer greater than 1 with no divisor in 2..n-1\n",
"    return n > 1 and all(n % i != 0 for i in range(2, n))"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[2,\n",
" 3,\n",
" 5,\n",
" 7,\n",
" 11,\n",
" 13,\n",
" 17,\n",
" 19,\n",
" 23,\n",
" 29,\n",
" 31,\n",
" 37,\n",
" 41,\n",
" 43,\n",
" 47,\n",
" 53,\n",
" 59,\n",
" 61,\n",
" 67,\n",
" 71,\n",
" 73,\n",
" 79,\n",
" 83,\n",
" 89,\n",
" 97]"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"[n for n in range(2,100) if isprime(n)]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# %load ../solutions/exercise3.py\n",
"def isprime(N):\n",
"    # only integers greater than 1 can be prime\n",
"    if not isinstance(N, int):\n",
"        return False\n",
"    if N <= 1:\n",
"        return False\n",
"    # count the divisors of N in 2..N-1\n",
"    count = 0\n",
"    for i in range(2, N):\n",
"        if N % i == 0:\n",
"            count = count + 1\n",
"    return count == 0\n",
"\n",
"print(isprime(3.0), isprime(\"pavlos\"), isprime(0), isprime(-1), isprime(1), isprime(2), isprime(93), isprime(97))\n",
"myprimes = [j for j in range(1, 100) if isprime(j)]\n",
"print(myprimes)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Introduction to Numpy\n",
"Scientific Python code uses a fast array structure, called the numpy array. Those who have programmed in Matlab will find this very natural. For reference, the numpy documentation can be found [here](https://docs.scipy.org/doc/numpy/reference/). \n",
"\n",
"Let's make a numpy array:"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([1, 2, 3, 4])"
]
},
"execution_count": 48,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"my_array = np.array([1, 2, 3, 4])\n",
"my_array"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"4"
]
},
"execution_count": 49,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# works as it would with a standard list\n",
"len(my_array)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `shape` attribute of an array is very useful (we'll see more of it later when we talk about 2D arrays -- matrices -- and higher-dimensional arrays)."
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(4,)"
]
},
"execution_count": 50,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"my_array.shape"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Numpy arrays are **typed**. This means that all the elements of an array share a single type (e.g., integer, float, string)."
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"dtype('int64')"
]
},
"execution_count": 51,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"my_array.dtype"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Numpy arrays offer much of the same functionality as lists! Below, we compute the length, slice the array, and iterate through it (the same operations work identically on a list)."
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"4\n",
"[3 4]\n",
"1\n",
"2\n",
"3\n",
"4\n"
]
}
],
"source": [
"print(len(my_array))\n",
"print(my_array[2:4])\n",
"for ele in my_array:\n",
"    print(ele)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There are two ways to manipulate numpy arrays: a) by calling the array's own methods (e.g., `my_array.mean()`), or b) by applying a numpy function (e.g., `np.mean()`) with the array as an argument."
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"2.5\n",
"2.5\n"
]
}
],
"source": [
"print(my_array.mean())\n",
"print(np.mean(my_array))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A ``constructor`` is a general programming term that refers to the mechanism for creating a new object (e.g., list, array, String).\n",
"\n",
"There are many other efficient ways to construct numpy arrays. Here are some commonly used numpy array constructors. Read more details in the numpy documentation."
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])"
]
},
"execution_count": 54,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.ones(10) # generates 10 floating point ones"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Numpy gains a lot of its efficiency from being typed. That is, all elements in the array have the same type, such as integer or floating point. The default type, as can be seen above, is a float. (NumPy's default float type is `float64`, so each element uses 64 bits, i.e., 8 bytes, of memory; the size of the default integer type, by contrast, can depend on the platform.)"
]
},
{
"cell_type": "code",
"execution_count": 55,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"8"
]
},
"execution_count": 55,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.dtype(float).itemsize # in bytes (remember, 1 byte = 8 bits)"
]
},
{
"cell_type": "code",
"execution_count": 56,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1])"
]
},
"execution_count": 56,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.ones(10, dtype='int') # generates 10 integer ones"
]
},
{
"cell_type": "code",
"execution_count": 57,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])"
]
},
"execution_count": 57,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.zeros(10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Often, you will want random numbers. Use the functions in the `np.random` module!"
]
},
{
"cell_type": "code",
"execution_count": 58,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([ 0.85115672, 0.37346821, 0.3298871 , 0.47496563, 0.69940192,\n",
" 0.97207796, 0.91488615, 0.36063927, 0.81240722, 0.16128617])"
]
},
"execution_count": 58,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.random.random(10) # uniform on the half-open interval [0, 1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can generate random numbers from a normal distribution with mean 0 and variance 1:"
]
},
{
"cell_type": "code",
"execution_count": 59,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The sample mean and standard deviation are 0.025195 and 1.026880, respectively.\n"
]
}
],
"source": [
"normal_array = np.random.randn(1000)\n",
"print(\"The sample mean and standard deviation are %f and %f, respectively.\" %(np.mean(normal_array), np.std(normal_array)))"
]
},
{
"cell_type": "code",
"execution_count": 60,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1000"
]
},
"execution_count": 60,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"len(normal_array)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can sample with and without replacement from an array. Let's first construct an array of evenly spaced values:"
]
},
{
"cell_type": "code",
"execution_count": 61,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([ 0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. ])"
]
},
"execution_count": 61,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"grid = np.arange(0., 1.01, 0.1)\n",
"grid"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Without replacement (the sample cannot be larger than the population):"
]
},
{
"cell_type": "code",
"execution_count": 62,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([ 0.3, 0.8, 0.7, 1. , 0. ])"
]
},
"execution_count": 62,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.random.choice(grid, 5, replace=False)"
]
},
{
"cell_type": "code",
"execution_count": 63,
"metadata": {},
"outputs": [
{
"ename": "ValueError",
"evalue": "Cannot take a larger sample than population when 'replace=False'",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mrandom\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mchoice\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mgrid\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m20\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mreplace\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mFalse\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;32mmtrand.pyx\u001b[0m in \u001b[0;36mmtrand.RandomState.choice\u001b[0;34m()\u001b[0m\n",
"\u001b[0;31mValueError\u001b[0m: Cannot take a larger sample than population when 'replace=False'"
]
}
],
"source": [
"np.random.choice(grid, 20, replace=False)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"With replacement:"
]
},
{
"cell_type": "code",
"execution_count": 64,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([ 0. , 1. , 0.3, 1. , 0. , 0.9, 1. , 0.7, 0.2, 0.7, 0.4,\n",
" 0.6, 0.1, 0.6, 0.4, 0.3, 0.6, 0.3, 0.8, 1. ])"
]
},
"execution_count": 64,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.random.choice(grid, 20, replace=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Tensors\n",
"\n",
"We can think of tensors as a name to include multidimensional arrays of numerical values. While tensors first emerged in the 20th century, they have since been applied to numerous other disciplines, including machine learning. In this class you will only be using **scalars**, **vectors**, and **2D arrays**, so you do not need to worry about the name 'tensor'.\n",
"\n",
"We will use the following naming conventions:\n",
"\n",
"- scalar = just a number = rank 0 tensor ($a$ ∈ $F$,)\n",
"
\n",
"- vector = 1D array = rank 1 tensor ( $x = (\\;x_1,...,x_i\\;)⊤$ ∈ $F^n$ )\n",
"
\n",
"- matrix = 2D array = rank 2 tensor ( $\\textbf{X} = [a_{ij}] ∈ F^{m×n}$ )\n",
"
\n",
"- 3D array = rank 3 tensor ( $\\mathscr{X} =[t_{i,j,k}]∈F^{m×n×l}$ )\n",
"
\n",
"- $\\mathscr{N}$D array = rank $\\mathscr{N}$ tensor ( $\\mathscr{T} =[t_{i1},...,t_{i\\mathscr{N}}]∈F^{n_1×...×n_\\mathscr{N}}$ ) \n",
"\n",
"\n",
"### Slicing a 2D array"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"*(Figure not recovered: illustration of slicing a 2D array.)*\n",
"\n",
"[source:oreilly](https://www.oreilly.com/library/view/python-for-data/9781449323592/ch04.html)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# how do we get just the second row of the above array?"
]
},
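{
"cell_type": "markdown",
"metadata": {},
"source": [
"One way to answer the question above, sketched with a hypothetical 3x3 array `arr2d` standing in for the array in the figure (the name and values are assumptions made here for illustration):\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"# hypothetical 3x3 array: [[0, 1, 2], [3, 4, 5], [6, 7, 8]]\n",
"arr2d = np.arange(9).reshape(3, 3)\n",
"\n",
"second_row = arr2d[1]       # row index 1; same as arr2d[1, :]\n",
"second_col = arr2d[:, 1]    # every row, column index 1\n",
"top_right = arr2d[:2, 1:]   # first two rows, last two columns\n",
"\n",
"print(second_row)\n",
"print(second_col)\n",
"print(top_right)\n",
"```\n",
"\n",
"The first index selects rows and the second selects columns; a bare `:` keeps everything along that axis."
]
},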
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Numpy supports vector operations\n",
"\n",
"What does this mean? It means that instead of adding two arrays, element by element, you can just say: add the two arrays. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"first = np.ones(5)\n",
"second = np.ones(5)\n",
"first + second # element-wise addition; returns a new array"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that this behavior is very different from Python lists, where `+` concatenates."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"first_list = [1., 1., 1., 1., 1.]\n",
"second_list = [1., 1., 1., 1., 1.]\n",
"first_list + second_list # concatenation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"On some computer chips, this numpy addition actually happens in parallel and can yield significant increases in speed. But even on regular chips, the advantage of greater readability is important."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Broadcasting\n",
"\n",
"Numpy supports a concept known as *broadcasting*, which dictates how arrays of different sizes are combined together. There are too many rules to list here, but importantly, multiplying an array by a number multiplies each element by the number. Adding a number adds the number to each element."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"first + 1"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"first*5"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This means that if you wanted samples from a normal distribution with mean 5 and standard deviation 7, you could do:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"normal_5_7 = 5 + 7*normal_array\n",
"np.mean(normal_5_7), np.std(normal_5_7)"
]
},
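{
"cell_type": "markdown",
"metadata": {},
"source": [
"Broadcasting is not limited to scalars. NumPy compares the shapes of two arrays dimension by dimension, starting from the right, and two dimensions are compatible when they are equal or one of them is 1. A minimal sketch, using two small hypothetical arrays `row` and `col` introduced here for illustration:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"row = np.array([0, 1, 2])       # shape (3,)\n",
"col = np.array([[10], [20]])    # shape (2, 1)\n",
"\n",
"# shapes (2, 1) and (3,) broadcast to a common shape of (2, 3)\n",
"table = col + row\n",
"print(table.shape)\n",
"print(table)\n",
"```\n",
"\n",
"Each entry of `table` is the sum of one element of `col` and one element of `row`, without writing an explicit loop."
]
},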
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Multiplying two arrays multiplies them element by element:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"(first +1) * (first*5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You might have wanted to compute the dot product instead:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"np.dot((first +1) , (first*5))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Probability Distributions from `scipy.stats` and `statsmodels`\n",
"\n",
"Two useful statistics libraries in Python are `scipy` and `statsmodels`.\n",
"\n",
"For example, to load the z-test for proportions:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import statsmodels\n",
"from statsmodels.stats.proportion import proportions_ztest"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"x = np.array([74,100])\n",
"n = np.array([152,266])\n",
"\n",
"zstat, pvalue = proportions_ztest(x, n)\n",
"print(\"Two-sided z-test for proportions: \\n\",\"z =\",zstat,\", pvalue =\",pvalue)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# `%matplotlib inline` ensures that plots are rendered inline in the notebook.\n",
"%matplotlib inline\n",
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's get the normal distribution namespace from `scipy.stats`. See here for [Documentation](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.norm.html)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"from scipy.stats import norm"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's create 1,000 points between -10 and 10:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"x = np.linspace(-10, 10, 1000) # linspace() returns evenly-spaced numbers over a specified interval\n",
"x[0:10], x[-10:]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's get the pdf of a normal distribution with a mean of 1 and standard deviation 3, and plot it using the grid points computed before:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"pdf_x = norm.pdf(x, 1, 3)\n",
"plt.plot(x, pdf_x);"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can also draw random samples (random variates) from the distribution using the `rvs` method."
]
},
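{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of drawing samples with `rvs`, reusing the mean of 1 and standard deviation of 3 from the pdf example. The sample size of 1000 and the `random_state` seed are arbitrary choices made here for reproducibility:\n",
"\n",
"```python\n",
"from scipy.stats import norm\n",
"\n",
"# draw 1000 samples from a normal distribution with mean 1 and standard deviation 3\n",
"samples = norm.rvs(loc=1, scale=3, size=1000, random_state=0)\n",
"\n",
"# the sample statistics should be close to, but not exactly, 1 and 3\n",
"print(samples.mean(), samples.std())\n",
"```\n",
"\n",
"Here `loc` and `scale` play the same roles as the second and third arguments of `norm.pdf`."
]
},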
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### References\n",
"\n",
"A useful book by Jake Vanderplas: [PythonDataScienceHandbook](https://jakevdp.github.io/PythonDataScienceHandbook/).\n",
"\n",
"You may also benefit from using [Chris Albon's web site](https://chrisalbon.com) as a reference. It contains lots of useful information."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Dictionaries\n",
"A dictionary is another data structure (aka storage container), arguably the most powerful. Like a list, a dictionary is a collection of items. Unlike a list, its items are accessed with keys rather than integer positions. (Since Python 3.7, dictionaries preserve insertion order, but they still cannot be indexed by position.)\n",
"\n",
"Dictionaries are the closest data structure we have to a database.\n",
"\n",
"Let's make a dictionary with a few Harvard courses and their corresponding enrollment numbers."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"enroll2017_dict = {'CS50': 692, 'CS109A / Stat 121A / AC 209A': 352, 'Econ1011a': 95, 'AM21a': 153, 'Stat110': 485}\n",
"enroll2017_dict"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"One can obtain the value corresponding to a key via:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"enroll2017_dict['CS50']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you try to access a key that isn't present, your code will yield an error:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"enroll2017_dict['CS630']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Alternatively, the `.get()` function allows one to gracefully handle these situations by providing a default value if the key isn't found:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"enroll2017_dict.get('CS630', 5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note, this does not _store_ a new value for the key; it only provides a value to return if the key isn't found."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"enroll2017_dict['CS630']"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"enroll2017_dict.get('C730', None)"
]
},
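{
"cell_type": "markdown",
"metadata": {},
"source": [
"The ways of handling a missing key can be sketched side by side. The dictionary `enrollment` below is a small hypothetical stand-in for `enroll2017_dict`, so the snippet runs on its own.\n",
"\n",
"```python\n",
"# Hypothetical stand-in for enroll2017_dict\n",
"enrollment = {'CS50': 692, 'Stat110': 485}\n",
"\n",
"# Option 1: test for membership before indexing\n",
"has_course = 'CS630' in enrollment\n",
"\n",
"# Option 2: index directly and catch the KeyError\n",
"try:\n",
"    enrollment['CS630']\n",
"except KeyError as err:\n",
"    message = f'missing key: {err}'\n",
"\n",
"# Option 3: fall back to a default with .get(); the dictionary is left unchanged\n",
"default_value = enrollment.get('CS630', 5)\n",
"still_missing = 'CS630' not in enrollment\n",
"\n",
"print(has_course, message, default_value, still_missing)\n",
"```\n",
"\n",
"Which option fits best depends on whether a missing key is an expected case (`in` or `.get()`) or a genuine error (`try`/`except`)."
]
},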
{
"cell_type": "markdown",
"metadata": {},
"source": [
"All sorts of iterations are supported:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"enroll2017_dict.values()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"enroll2017_dict.items()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can iterate over the tuples obtained above:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"for key, value in enroll2017_dict.items():\n",
" print(\"%s: %d\" %(key, value))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Simply iterating over a dictionary gives us the keys. This is useful when we want to do something with each item:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"second_dict={}\n",
"for key in enroll2017_dict:\n",
" second_dict[key] = enroll2017_dict[key]\n",
"second_dict"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The above is an actual __copy__ of _enroll2017_dict's_ allocated memory, unlike, `second_dict = enroll2017_dict` which would have made both variables label the same memory location."
]
},
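{
"cell_type": "markdown",
"metadata": {},
"source": [
"The difference between copying and aliasing can be checked directly. This is a minimal sketch using a hypothetical dictionary `original`; it uses the built-in `dict.copy()` method, which makes the same kind of (shallow) copy as the loop above.\n",
"\n",
"```python\n",
"# Hypothetical stand-in for enroll2017_dict\n",
"original = {'CS50': 692, 'Stat110': 485}\n",
"\n",
"alias = original           # a second name for the same dictionary object\n",
"shallow = original.copy()  # a new dictionary with the same keys and values\n",
"\n",
"# Adding a key through the alias changes the original, but not the copy\n",
"alias['AM21a'] = 153\n",
"\n",
"print(alias is original)    # True: both names refer to one object\n",
"print('AM21a' in original)  # True: the change is visible through either name\n",
"print('AM21a' in shallow)   # False: the copy is independent\n",
"```\n",
"\n",
"Note that the copy is shallow: if the values were themselves mutable objects such as lists, both dictionaries would still share them."
]
},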
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the previous dictionary example, the keys were strings corresponding to course names. Keys don't have to be strings, though; they can be other _immutable_ data type such as numbers or tuples (not lists, as lists are mutable).\n",
"\n",
"### Dictionary comprehension: \"Do not try this at home\"\n",
"\n",
"You can construct dictionaries using a *dictionary comprehension*, which is similar to a list comprehension. Notice the brackets {} and the use of `zip` (see next cell for more on `zip`)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"float_list = [1., 3., 5., 4., 2.]\n",
"int_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]\n",
"\n",
"my_dict = {k:v for (k, v) in zip(int_list, float_list)}\n",
"my_dict"
]
},
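{
"cell_type": "markdown",
"metadata": {},
"source": [
"Two details of the comprehension above are worth checking. Because `zip` stops at the shorter of its inputs, only the first five integers become keys. The sketch below also compares the comprehension with the shorter built-in form `dict(zip(...))`; the name `same_dict` is introduced here purely for illustration.\n",
"\n",
"```python\n",
"float_list = [1., 3., 5., 4., 2.]\n",
"int_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]\n",
"\n",
"# Dictionary comprehension, as in the cell above\n",
"my_dict = {k: v for (k, v) in zip(int_list, float_list)}\n",
"\n",
"# Equivalent construction that passes the (key, value) pairs straight to dict()\n",
"same_dict = dict(zip(int_list, float_list))\n",
"\n",
"print(len(my_dict))          # 5: one entry per element of the shorter list\n",
"print(my_dict == same_dict)  # True\n",
"```\n",
"\n",
"A comprehension becomes the better choice once the keys or values need to be transformed or filtered along the way."
]
},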
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Creating tuples with `zip`\n",
"\n",
"`zip` is a Python built-in function that returns an iterator that aggregates elements from each of the iterables. This is an iterator of tuples, where the i-th tuple contains the i-th element from each of the argument sequences or iterables. The iterator stops when the shortest input iterable is exhausted. The `set()` built-in function returns a `set` object, optionally with elements taken from another iterable. By using `set()` you can make `zip` printable. In the example below, the iterables are the two lists, `float_list` and `int_list`. We can have more than two iterables."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"float_list = [1., 3., 5., 4., 2.]\n",
"int_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]\n",
"\n",
"viz_zip = set(zip(int_list, float_list))\n",
"viz_zip"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"type(viz_zip)"
]
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}