Preparation
To get the most out of CS109A, we suggest knowledge of multivariate calculus, probability theory, statistics, and some basic linear algebra (e.g., matrix operations and eigenvectors). Below are some resources for self-assessment and review:
- Multivariate Calculus: multiple exams with solutions
- Probability: exams with solutions and problem sets with solutions
- Statistics: multiple pairs of exam questions and answers (Q1, A1, Q2, A2, Q3, A3)
Here is a useful textbook for reviewing many of the above topics: Mathematics for Machine Learning
Note: you can be successful in the course (assignments, quizzes, etc.) with the listed prerequisites, but some of the material presented in lectures may be easier to follow with more background.
This course dives right into core machine learning topics, so students are expected to be fluent in Python programming. You can familiarize yourself with Python by completing online tutorials. There are also free Python basics courses that cover all the necessary Python topics. You are also expected to be able to manipulate data using pandas DataFrames, perform basic operations with NumPy arrays, and use basic plotting functions (e.g., line chart, histogram, scatter plot, bar chart) from Matplotlib.
Python Basics
Throughout this course, we will be using Python as our primary programming language for exercises, labs, and homework. Thus, you must have basic Python programming knowledge. The following are the topics you need to cross off your checklist before the course begins:
Variables, data types, strings, file operations, data structures such as lists, dictionaries, and tuples, and classes.
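As a rough self-check, the short sketch below touches each item on the checklist once. All names and values in it (the `Student` class, the lab count, the temporary file name) are made up for illustration and are not taken from the course materials. If every line reads naturally to you, you likely have the Python basics covered.

```python
import os
import tempfile

# Variables and basic data types (values are illustrative only)
course = "CS109A"   # str
n_labs = 12         # int
weight = 0.25       # float
is_ready = True     # bool

# Strings: methods and f-string formatting
label = f"{course.lower()}-lab-{n_labs:02d}"

# Data structures
scores = [88, 92, 79]             # list: ordered and mutable
point = (3, 4)                    # tuple: ordered and immutable
grades = {"ana": 88, "ben": 92}   # dict: maps keys to values
scores.append(95)
best = max(grades, key=grades.get)  # key with the highest value


# A small class with one method (hypothetical example)
class Student:
    def __init__(self, name, scores):
        self.name = name
        self.scores = scores

    def mean_score(self):
        return sum(self.scores) / len(self.scores)


student = Student("ana", scores)

# File operations: write a line, read it back, then clean up
path = os.path.join(tempfile.gettempdir(), "cs109a_checklist_demo.txt")
with open(path, "w") as f:
    f.write(label + "\n")
with open(path) as f:
    first_line = f.readline().strip()
os.remove(path)
```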
Pandas Basics
Most of the exercises you will encounter in this course work with various datasets. Pandas is an open-source data analysis and manipulation tool built on top of Python. We provide the necessary support material and resources for working with Pandas in this course. However, we highly recommend that you familiarize yourself with basic data manipulation in Pandas to ensure a smooth learning experience.
NumPy Basics
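To give a sense of what "basic data manipulation" might look like, here is a minimal sketch on a tiny, invented DataFrame. The column names and values are assumptions made for illustration and do not come from any course dataset. It builds a table, fills a missing value, filters rows, and computes a grouped mean.

```python
import pandas as pd

# A tiny, made-up dataset (illustrative values only)
df = pd.DataFrame({
    "city": ["Boston", "Boston", "Cambridge", "Cambridge"],
    "year": [2020, 2021, 2020, 2021],
    "rent": [2400.0, 2500.0, 2600.0, None],
})

# Fill the missing rent with the mean of the observed rents
df["rent"] = df["rent"].fillna(df["rent"].mean())

# Boolean filtering: keep only the 2021 rows
recent = df[df["year"] == 2021]

# Grouped aggregation: mean rent per city
mean_rent = df.groupby("city")["rent"].mean()
```

If selecting columns, filtering with boolean masks, handling missing values, and `groupby` all feel familiar, you are probably well prepared for the exercises.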
NumPy is a library for the Python programming language that provides support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions that operate on them. Because this course includes extensive exercises, it is important to use NumPy to solve problems efficiently and to obtain results that match the expected ones. Although this course aims to support individuals with no prior NumPy knowledge, you should go through the basics of this library to avoid any hiccups.
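The sketch below shows a few of the basic array operations you might want to be comfortable with: creating arrays, matrix-vector products, reductions along an axis, and broadcasting. The arrays are invented for illustration. It also seeds a random generator, which is one common way to make results repeatable from run to run; whether the course exercises rely on seeding is an assumption here.

```python
import numpy as np

# A small 3x2 array and a weight vector (illustrative values only)
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
w = np.array([0.5, -1.0])

# Matrix-vector product: one value per row of X
y = X @ w

# Reduction along axis 0 gives one mean per column
col_means = X.mean(axis=0)

# Broadcasting: subtract the column means from every row
X_centered = X - col_means

# A seeded generator produces the same draws on every run
rng = np.random.default_rng(seed=0)
noise = rng.normal(size=3)
```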
Matplotlib Basics
A large portion of this course uses different graphs and charts to explain topics and validate results. Matplotlib is a plotting library for Python, and it was used to create all the graphs you will see throughout the course. The exercises and homework are also structured in a way that integrates this library. Therefore, we highly recommend that you get acquainted with the basics of Matplotlib.
For this course, we will be using Jupyter Notebooks. You can familiarize yourself with Jupyter Notebooks by reading the following tutorial:
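As a quick check of the plotting basics, the following sketch draws the four chart types mentioned earlier (line chart, histogram, scatter plot, bar chart) on one figure. The data is synthetic and generated only for this example. In a Jupyter notebook the figure typically renders automatically; in a plain script you would call `plt.show()` or `fig.savefig(...)` afterwards.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic data, for illustration only
rng = np.random.default_rng(seed=0)
x = np.linspace(0, 10, 50)
y = np.sin(x)
samples = rng.normal(size=200)

# One figure with a 2x2 grid of axes
fig, axes = plt.subplots(2, 2, figsize=(8, 6))

axes[0, 0].plot(x, y)
axes[0, 0].set_title("Line chart")

axes[0, 1].hist(samples, bins=20)
axes[0, 1].set_title("Histogram")

axes[1, 0].scatter(x, y + rng.normal(scale=0.2, size=x.size))
axes[1, 0].set_title("Scatter plot")

axes[1, 1].bar(["a", "b", "c"], [3, 7, 5])
axes[1, 1].set_title("Bar chart")

# Adjust spacing so titles and tick labels do not overlap
fig.tight_layout()
```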
A Beginner’s Tutorial to Jupyter Notebooks
Finally, we assume that students have a strong foundation in calculus, linear algebra, statistics, and probability. You should review these concepts before the course begins. Here is one useful resource: