{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Lecture 10\n",
"## Tuesday, October 8th 2019\n",
"### Automatic Differentiation: The Forward Mode"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"#### References\n",
"* [A Hitchhikerâ€™s Guide to Automatic Differentiation](https://link.springer.com/article/10.1007/s11075-015-0067-6)\n",
"* Griewank, A. and Walther, A., 2008. Evaluating derivatives: principles and techniques of algorithmic differentiation (Vol. 105). Siam.\n",
"* Nocedal, J. and Wright, S., 2001. Numerical Optimization, Second Edition. Springer."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"# Load libraries that we'll need\n",
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"%matplotlib notebook\n",
"# or inline\n",
"\n",
"def f(x):\n",
" # Hard-coded f(x)\n",
" return x - np.exp(-2.0 * np.sin(4.0*x) * np.sin(4.0*x))\n",
"\n",
"def dfdx(x):\n",
" # Hard-coded \"Jacobian matrix\" of f(x)\n",
" return 1.0 + 16.0 * np.exp(-2.0 * np.sin(4.0*x) * np.sin(4.0*x)) * np.sin(4.0*x) * np.cos(4.0*x)\n",
"\n",
"def dfdx_h(x, epsilon):\n",
" return (f(x + epsilon) - f(x)) / epsilon"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Introduction and Motivation"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Differentiation is one of the most important operations in science. Finding extrema of functions and determining zeros of functions are central to optimization. Numerically solving differential equations forms a cornerstone of modern science and engineering and is intimately linked with predictive science."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"# Recap\n",
"Last time:\n",
"* Provided introduction and motivation for automatic differentiation (AD)\n",
" - Root-finding (Newton's method)\n",
" - The Jacobian\n",
" - The finite difference method"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"# Today\n",
"1. The basics of AD\n",
" - The computational graph\n",
" - Evaluation trace\n",
" - Computing derivatives of a function of one variable using the forward mode\n",
" - Computing derivatives of a function of two variables using the forward mode"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"2. Just beyond the basics\n",
" - The Jacobian in forward mode\n",
" - What the forward mode actually computes\n",
" - Dual numbers\n",
" - Towards implementation"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Note on Exercises\n",
"Today's exercises are mainly on pencil and paper. You do not need to turn these in."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## Automatic Differentiation: The Mechanics and Basic Ideas\n",
"In the introduction, we motivated the need for computational techniques to compute derivatives. The focus in the introduction was on the finite difference method, but we also computed a symbolic derivative. The finite difference approach is nice because it is quick and easy. However, it suffers from accuracy and stability problems. On the other hand, symbolic derivatives can be evaluated to machine precision, but can be costly to evaluate. We'll have more to say about cost of symbolic differentiation later."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Automatic differentiation (AD) overcomes both of these deficiencies. It is less costly than symbolic differentiation while evaluating derivatives to machine precision. There are two modes of automatic differentiation: forward and reverse. This course will be primarily concerned with the forward mode. Time-permitting, we will give an introduction to the reverse mode. In fact, the famous backpropagation algorithm from machine learning is a special case of the reverse mode of automatic differentiation."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"## Review of the Chain Rule\n",
"\n",
"At the heart of AD is the famous *chain rule* from calculus."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Back to the Beginning\n",
"Suppose we have a function $h\\left(u\\left(t\\right)\\right)$ and we want the derivative of $h$ with respect to $t$. The derivative is $$\\dfrac{\\partial h}{\\partial t} = \\dfrac{\\partial h}{\\partial u}\\dfrac{\\partial u}{\\partial t}.$$ This is the rule that we used in symbolically computing the derivative of the function $f\\left(x\\right) = x - \\exp\\left(-2\\sin^{2}\\left(4x\\right)\\right)$ last time."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Adding an Argument\n",
"Now suppose $h$ has another argument so that we have $h\\left(u\\left(t\\right),v\\left(t\\right)\\right)$. Once again, we want the derivative of $h$ with respect to $t$. Applying the chain rule in this case gives\n",
"\\begin{align}\n",
" \\displaystyle \n",
" \\frac{\\partial h}{\\partial t} = \\frac{\\partial h}{\\partial u}\\frac{\\partial u}{\\partial t} + \\frac{\\partial h}{\\partial v}\\frac{\\partial v}{\\partial t}.\n",
"\\end{align}"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### The Gradient\n",
"What if we replace $t$ by a vector $x\\in\\mathbb{R}^{m}$? Now we want the gradient of $h$ with respect to $x$. We write $h = h\\left(u\\left(x\\right),v\\left(x\\right)\\right)$ and the derivative is now \n",
"\\begin{align}\n",
" \\nabla_{x} h = \\frac{\\partial h}{\\partial u}\\nabla u + \\frac{\\partial h}{\\partial v} \\nabla v\n",
"\\end{align}\n",
"where we have written $\\nabla_{x}$ on the left hand side to avoid any confusion with arguments. The gradient operator on the right hand side is clearly with respect to $x$ since $u$ and $v$ have no other arguments."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### The (Almost) General Rule\n",
"In general $h = h\\left(y\\left(x\\right)\\right)$ where $y\\in\\mathbb{R}^{n}$ and $x\\in\\mathbb{R}^{m}$. Now $h$ is a function of possibly $n$ other functions themselves a function of $m$ variables. The gradient of $h$ is now given by \n",
"\\begin{align}\n",
" \\nabla_{x}h = \\sum_{i=1}^{n}{\\frac{\\partial h}{\\partial y_{i}}\\nabla y_{i}\\left(x\\right)}.\n",
"\\end{align}"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"The next section introduces the computational graph of our example function. Following that, the derivative of our example function is evaluated using the forward and reverse modes of AD."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"## The Computational Graph\n",
"Consider again the example function $$f\\left(x\\right) = x - \\exp\\left(-2\\sin^{2}\\left(4x\\right)\\right).$$ We'd like to evalute $f$ at the point $a$. \n",
"\n",
"* First, we'll generate the \"evaluation trace\".\n",
"* Then we'll generate the \"evaluation graph\". In the graph, we indicate the input value as $x$ and the output value as $f$. Note that $x$ should take on whatever value you want it to."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Let's find $f\\left(\\dfrac{\\pi}{16}\\right)$. The evaluation trace looks like:\n",
"\n",
"| Trace | Elementary Operation | Numerical Value |\n",
"| :------: | :----------------------: | :------------------------------: |\n",
"| $x_{1}$ | $\\dfrac{\\pi}{16}$ | $\\dfrac{\\pi}{16}$ |\n",
"| $x_{2}$ | $4x_{1}$ | $\\dfrac{\\pi}{4}$ |\n",
"| $x_{3}$ | $\\sin\\left(x_{2}\\right)$ | $\\dfrac{\\sqrt{2}}{2}$ |\n",
"| $x_{4}$ | $x_{3}^{2}$ | $\\dfrac{1}{2}$ |\n",
"| $x_{5}$ | $-2x_{4}$ | $-1$ |\n",
"| $x_{6}$ | $\\exp\\left(x_{5}\\right)$ | $\\dfrac{1}{e}$ |\n",
"| $x_{7}$ | $-x_{6}$ | $-\\dfrac{1}{e}$ |\n",
"| $x_{8}$ | $x_{1} + x_{7}$ | $\\dfrac{\\pi}{16} - \\dfrac{1}{e}$ |\n",
"\n",
"Recall: $f\\left(x\\right) = x - \\exp\\left(-2\\sin^{2}\\left(4x\\right)\\right).$"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"\n",
"Of course, the computer holds floating point values. The value of $f\\left(\\dfrac{\\pi}{16}\\right)$ is $-0.17152990032208026$. We can check this with our function."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"outputs": [
{
"data": {
"text/plain": [
"-0.17152990032208032"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"f(np.pi / 16.0)"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"-0.17152990032208026"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.pi / 16.0 - 1.0 / np.exp(1.0)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"One way to visualize what is going on is to represent the evaluation trace with a graph.\n",
"\n",
"![comp-graph](../fig/Computational-Graph.png)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"## Carrying Derivatives Along\n",
"Let's compute $f^{\\prime}\\left(\\dfrac{\\pi}{16}\\right)$ where $f^{\\prime}\\left(x\\right) = \\dfrac{\\partial f}{\\partial x}$. We'll extend the table to calculate the derivatives as we go along. Note that here the overdot $\\dot{\\left(\\cdot\\right)}$ signifies a derivative of an elemental function with respect to its argument."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"| Trace | Elementary Operation | Derivative | $\\left(f\\left(a\\right), f^{\\prime}\\left(a\\right)\\right)$ |\n",
"| :------: | :----------------------: | :------------------------------: | :------------------------------: |\n",
"| $x_{1}$ | $\\dfrac{\\pi}{16}$ | $1$ | $\\left(\\dfrac{\\pi}{16}, 1\\right)$ |\n",
"| $x_{2}$ | $4x_{1}$ | $4\\dot{x}_{1}$ | $\\left(\\dfrac{\\pi}{4}, 4\\right)$ |\n",
"| $x_{3}$ | $\\sin\\left(x_{2}\\right)$ | $\\cos\\left(x_{2}\\right)\\dot{x}_{2}$ | $\\left(\\dfrac{\\sqrt{2}}{2}, 2\\sqrt{2}\\right)$ |\n",
"| $x_{4}$ | $x_{3}^{2}$ | $2x_{3}\\dot{x}_{3}$ | $\\left(\\dfrac{1}{2}, 4\\right)$ |\n",
"| $x_{5}$ | $-2x_{4}$ | $-2\\dot{x}_{4}$ | $\\left(-1, -8\\right)$ |\n",
"| $x_{6}$ | $\\exp\\left(x_{5}\\right)$ | $\\exp\\left(x_{5}\\right)\\dot{x}_{5}$ | $\\left(\\dfrac{1}{e}, - \\dfrac{8}{e}\\right)$ |\n",
"| $x_{7}$ | $-x_{6}$ | $-\\dot{x}_{6}$ | $\\left(-\\dfrac{1}{e}, \\dfrac{8}{e}\\right)$ |\n",
"| $x_{8}$ | $x_{1} + x_{7}$ | $\\dot{x}_{1} + \\dot{x}_{7}$ | $\\left(\\dfrac{\\pi}{16} - \\dfrac{1}{e}, 1 + \\dfrac{8}{e}\\right)$ |"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Believe it or not, that's all there is to forward mode. Notice that at each evaluation step, we also evaluate the derivative with the chain rule. Therefore, if our function $f\\left(x\\right)$ is composed of elemental functions for which we know the derivatives, it is a simple task to compute the derivative."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"#### Notes\n",
"* It is not necessary, or desirable, in practice to form the computational graph explicitly. We did so here simply to provide some intuition."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"* In fact, a software implementation will perform such tasks automatically."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"* Moreover, it is not necessary to store function values and derivatives at each node. Once all the children of a node have been evaluated, the parent node can discard (or overwrite) its value."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Comments\n",
"Let's reiterate what we just did. We had a function $f: \\mathbb{R} \\mapsto \\mathbb{R}$. This function can be written as a composition of $7$ functions $$f\\left(a\\right) = \\varphi_{7}\\left(\\varphi_{6}\\left(\\varphi_{5}\\left(\\varphi_{4}\\left(\\varphi_{3}\\left(\\varphi_{2}\\left(\\varphi_{1}\\left(a\\right)\\right)\\right)\\right)\\right)\\right)\\right).$$ The derivative becomes, by the chain rule, $$f^{\\prime}\\left(a\\right) = \\varphi_{7}^{\\prime}\\left(\\cdot\\right)\\varphi_{6}^{\\prime}\\left(\\cdot\\right)\\varphi_{5}^{\\prime}\\left(\\cdot\\right)\\varphi_{4}^{\\prime}\\left(\\cdot\\right)\\varphi_{3}^{\\prime}\\left(\\cdot\\right)\\varphi_{2}^{\\prime}\\left(\\varphi_{1}\\left(a\\right)\\cdot\\right)\\varphi_{1}^{\\prime}\\left(a\\right)$$ where the $\\left(\\cdot\\right)$ should be apparent from the context. In the evaluation trace, we computed the elemental functions $\\varphi_{i}$ as we went along with the derivatives."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Remember: $\\left(\\varphi\\left(x\\right)\\right)^{\\prime}$ represents the derivative of the function with respect to its input.\n",
"\n",
"For example:\n",
"\\begin{align*}\n",
" \\dfrac{\\mathrm{d}}{\\mathrm{d}x}\\varphi_{2}\\left(\\varphi_{1}\\left(x\\right)\\right) = \\dfrac{\\mathrm{d}\\varphi_{2}}{\\mathrm{d}\\varphi_{1}}\\left(\\varphi_{1}\\left(x\\right)\\right)\\dfrac{\\mathrm{d}\\varphi_{1}}{\\mathrm{d}x}\\left(x\\right) = \\varphi_{2}^{\\prime}\\left(\\varphi_{1}\\left(x\\right)\\right)\\varphi_{1}^{\\prime}\\left(x\\right)\n",
"\\end{align*}"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Exercise\n",
"You will work with the following function for this exercise,\n",
"\\begin{align}\n",
" f\\left(x,y\\right) = \\exp\\left(-\\left(\\sin\\left(x\\right) - \\cos\\left(y\\right)\\right)^{2}\\right).\n",
"\\end{align}\n",
"\n",
"1. Draw the computational graph for the function.\n",
" - Note: This graph will have $2$ inputs.\n",
"2. Create the evaluation trace for this function.\n",
"3. Use the graph / trace to evaluate $f\\left(\\dfrac{\\pi}{2}, \\dfrac{\\pi}{3}\\right)$.\n",
"3. Compute $\\dfrac{\\partial f}{\\partial x}\\left(\\dfrac{\\pi}{2}, \\dfrac{\\pi}{3}\\right)$ and $\\dfrac{\\partial f}{\\partial y}\\left(\\dfrac{\\pi}{2}, \\dfrac{\\pi}{3}\\right)$ using the forward mode of AD."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"**Hints:**\n",
"* $\\sin\\left(\\dfrac{\\pi}{3}\\right) = \\dfrac{\\sqrt{3}}{2}$, $\\cos\\left(\\dfrac{\\pi}{3}\\right) = \\dfrac{1}{2}$\n",
"* When calculating $\\dot{x}_{1}$ with respect to $y$ we say $\\dot{x}_{1} = 0$ and $\\dot{x}_{2} = 1$.\n",
"* Feel free to combine steps 3 and 4.\n",
"\n",
"Start of evaluation trace:\n",
"\n",
"| Trace | Elementary Function | Current Value | Elementary Function Derivative | $\\nabla_{x}$ Value | $\\nabla_{y}$ Value |\n",
"| :---: | :-----------------: | :-----------: | :----------------------------: | :-----------------: | :-----------------: |\n",
"| $x_{1}$ | $x_{1}$ | $\\dfrac{\\pi}{2}$ | $\\dot{x}_{1}$ | $1$ | $0$ |\n",
"| $x_{2}$ | $x_{2}$ | $\\dfrac{\\pi}{3}$ | $\\dot{x}_{2}$ | $0$ | $1$ |"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"# Just Beyond the Basics"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"## Comments on Notation\n",
"The \"dot\" notation is somewhat standard notation in the AD community (as much as any terminology is standard in the AD community). You should feel free to use whatever notation makes you happy in this class. Note, too, that when we start into the reverse mode, the intermediate steps are no longer denoted by a \"dot\". In the reverse mode, intermediate steps are denoted with an \"overbar\". This differing notation helps in distinguishing the two modes."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"## Towards the Jacobian Matrix\n",
"Time to take a step back and recap. In the very beginning of the lecture, we motivated calculation of derivatives by Newton's method for solving a system of nonlinear equations. One of the key elements in that method is the Jacobian matrix $J = \\dfrac{\\partial f_{i}}{\\partial x_{j}}$. So far, you have learned how to manually perform the forward mode of automatic differentiation both for scalar functions of a single variable and scalar functions of multiple variables. The solution of systems of equations requires differentation of a vector-function of multiple variables."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Some Notation and the Seed Vector\n",
"Note that every time we computed the derivative using the forward mode, we \"seeded\" the derivative with a value."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"We chose this value to be $1$."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Let's look at a very simple function $f\\left(x,y\\right) = xy$."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Clearly $\\nabla f = \\begin{bmatrix} y \\\\ x \\end{bmatrix}$ for $f\\left(x,y\\right) = xy$.\n",
"\n",
"The evaluation trace consists of $f = x_{3} = x_{1}x_{2}$.\n",
"\n",
"![](../fig/fxy_graph.png)"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Let's introduce a seed vector $p$ and recall the definition of the directional derivative to be $$D_{p}x_{3} = \\sum_{j=1}^{2}{\\dfrac{\\partial x_{3}}{\\partial x_{j}}p_{j}}.$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This tells us the derivative of $x_{3}$ ($f$) in the direction of $p$."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Expanding the sum gives\n",
"\\begin{align}\n",
" D_{p}x_{3} &= \\dfrac{\\partial x_{3}}{\\partial x_{1}}p_{1} + \\dfrac{\\partial x_{3}}{\\partial x_{2}}p_{2} \\\\\n",
" &= x_{2}p_{1} + x_{1}p_{2}.\n",
"\\end{align}"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Two choices for $p$:\n",
"1. $p = \\left(1,0\\right)$ gives us $\\dfrac{\\partial f}{\\partial x}$\n",
"2. $p = \\left(0,1\\right)$ gives us $\\dfrac{\\partial f}{\\partial y}$"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Of course, we aren't required to choose the seed vectors to be unit vectors. However, the utility should be apparent."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Note that this could be written as $$D_{p}x_{3} = \\nabla x_{3}\\cdot p.$$ So, **the forward mode of AD is really computing the *product* of the gradient of our function with the seed vector**!"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"## Towards the Jacobian\n",
"Time to take a step back and recap. In the very beginning of the lecture, we motivated calculation of derivatives by Newton's method for solving a system of nonlinear equations. One of the key elements in that method is the Jacobian matrix $J = \\dfrac{\\partial f_{i}}{\\partial x_{j}}$. So far, you have learned how to manually perform the forward mode of automatic differentiation both for scalar functions of a single variable and scalar functions of multiple variables. The solution of systems of equations requires differentation of a vector-function of multiple variables.\n",
"\n",
"### Some Notation and the Seed Vector\n",
"Note that every time we computed the derivative using the forward mode, we \"seeded\" the derivative with a value. We chose this value to be $1$. Let's look at a very simple function $f\\left(x,y\\right) = xy$. Clearly $$\\nabla f = \\begin{bmatrix} y \\\\ x \\end{bmatrix}.$$ The evaluation trace consists of $f = x_{3} = x_{1}x_{2}$ where $x_{1}$ and $x_{2}$ take on the values at the point we wish to evaluate the function and its derivative. Let's introduce a seed vector $p$ and define the directional derivative to be $$D_{p}x_{3} = \\sum_{j=1}^{2}{\\dfrac{\\partial x_{3}}{\\partial x_{j}}p_{j}}.$$ Expanding the sum gives\n",
"\\begin{align}\n",
" D_{p}x_{3} &= \\dfrac{\\partial x_{3}}{\\partial x_{1}}p_{1} + \\dfrac{\\partial x_{3}}{\\partial x_{2}}p_{2} \\\\\n",
" &= x_{2}p_{1} + x_{1}p_{2}.\n",
"\\end{align}\n",
"Notice that if we choose the seed vector to be $p = \\left(1,0\\right)$ we get $\\dfrac{\\partial f}{\\partial x}$ and if we choose $p = \\left(0,1\\right)$ we get $\\dfrac{\\partial f}{\\partial y}$. Of course, we aren't required to choose the seed vectors to be unit vectors. However, the utility should be apparent.\n",
"\n",
"Note that this could be written as $$D_{p}x_{3} = \\nabla x_{3}\\cdot p.$$ So, the forward mode of AD is really computing the *product* of the gradient of our function with the seed vector!\n",
"\n",
"### What Forward AD Actually Computes\n",
"The forward mode of automatic differentiation actually computes the product $\\nabla f \\cdot p$. If $f$ is a vector, then the forward mode actually computes $Jp$ where $J = \\dfrac{\\partial f_{i}}{\\partial x_{j}}, \\quad i = 1,\\ldots, n, \\quad j = 1,\\ldots,m$ is the Jacobian matrix. In many applications, we really only want the \"action\" of the Jacobian on a vector. That is, we just want to compute the matrix-vector product $Jp$ for some vector $p$. Of course, if we really want to form the entire Jacobian, this is readily performed by forming a set of $m$ unit seed vectors $\\left\\{p_{1},\\ldots p_{m}\\right\\}$ where $p_{k}$ consists of all zeros except for a single $1$ in position $k$."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### What Forward AD Actually Computes\n",
"The forward mode of automatic differentiation actually computes the product $\\nabla f \\cdot p$."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"If $f$ is a vector, then the forward mode actually computes $Jp$ where $J = \\dfrac{\\partial f_{i}}{\\partial x_{j}}, \\quad i = 1,\\ldots, n, \\quad j = 1,\\ldots,m$ is the Jacobian matrix."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"In many applications, we really only want the \"action\" of the Jacobian on a vector. That is, we just want to compute the matrix-vector product $Jp$ for some vector $p$."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Of course, if we really want to form the entire Jacobian, this is readily performed by forming a set of $m$ unit seed vectors $\\left\\{p_{1},\\ldots p_{m}\\right\\}$ where $p_{k}\\in \\mathbb{R}^{m}$ consists of all zeros except for a single $1$ in position $k$."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Exercise\n",
"That was a lot of jargon. Let's do a little example to give the basic ideas. Let\n",
"\\begin{align}\n",
" f\\left(x,y\\right) = \n",
" \\begin{bmatrix}\n",
" xy + \\sin\\left(x\\right) \\\\\n",
" x + y + \\sin\\left(xy\\right)\n",
" \\end{bmatrix}.\n",
"\\end{align}"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"The computational graph will have two inputs $x$ and $y$ and two outputs, this time $x_{7}$ and $x_{8}$.\n",
"\n",
"In the notation below $x_{7}$ corresponds to $f_{1}$ and $x_{8}$ corresponds to $f_{2}$."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Let's start by computing the gradient of $f_{1}$. Using the directional derivative we have \n",
"\\begin{align}\n",
" D_{p}x_{7} = \\left(\\cos\\left(x_{1}\\right) + x_{2}\\right)p_{1} + x_{1}p_{2}.\n",
"\\end{align}\n",
"**Note:** You should fill in the details!\n",
"- Draw the graph\n",
"- Write out the trace\n",
"- You can do this by evaluating the function at $\\left(a,b\\right)$, although this isn't strictly necessary for this problem."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Choosing $p = (1,0)$ gives $\\dfrac{\\partial f_{1}}{\\partial x}$ and choosing $p = (0,1)$ gives $\\dfrac{\\partial f_{1}}{\\partial y}$."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"These form the first row of the Jacobian! The second row of the Jacobian can be computed similarly by working with $x_{8}$."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"The take-home message is that **we can form the full Jacobian by using $m$ seed vectors** where $m$ is the number of independent variables."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"This is **independent** of the number of functions!"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"### Exercise\n",
"That was a lot of jargon. Let's do a little example to give the basic ideas. Let\n",
"\\begin{align}\n",
" f\\left(x,y\\right) = \n",
" \\begin{bmatrix}\n",
" xy + \\sin\\left(x\\right) \\\\\n",
" x + y + \\sin\\left(xy\\right)\n",
" \\end{bmatrix}.\n",
"\\end{align}\n",
"The computational graph will have two inputs $x$ and $y$ and two outputs this time $x_{7}$ and $x_{8}$. In the notation below $x_{7}$ corresponds to $f_{1}$ and $x_{8}$ corresponds to $f_{2}$. Let's start by computing the gradient of $f_{1}$. Using the directional derivative we have \n",
"\\begin{align}\n",
" D_{p}x_{7} = \\left(\\cos\\left(x_{1}\\right) + x_{2}\\right)p_{1} + x_{1}p_{2}.\n",
"\\end{align}\n",
"**Note:** You should fill in the details!\n",
"Choosing $p = (1,0)$ gives $\\dfrac{\\partial f_{1}}{\\partial x}$ and choosing $p = (0,1)$ gives $\\dfrac{\\partial f_{1}}{\\partial y}$. These form the first row of the Jacobian! The second row of the Jacobian can be computed similarly by working with $x_{8}$.\n",
"\n",
"The take-home message is that we can form the full Jacobian by using $m$ seed vectors where $m$ is the number of independent variables."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Automatic Differentiation and Dual Numbers\n",
"A dual number is an extension of the real numbers. Written out, the form looks similar to a complex number."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"## Review of Complex Numbers\n",
"Recall that a complex number has the form $$z = a + ib$$ where we *define* the number $i$ so that $i^{2} = -1$."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"No real number has this property but it is a useful property for a number to have. Hence the introduction of complex numbers."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Visually, you can think of a real number as a number lying on a straight line. Then, we \"extend\" the real line \"up\". The new axis is called the *imaginary* axis."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Complex numbers have several properties that we can use.\n",
"* Complex conjugate: $z^{*} = a - ib$.\n",
"* Magnitude of a complex number: $\\left|z\\right|^{2} = zz^{*} = \\left(a+ib\\right)\\left(a-ib\\right) = a^{2} + b^{2}$.\n",
"* Polar form: $z = \\left|z\\right|\\exp\\left(i\\theta\\right)$ where $\\displaystyle \\theta = \\tan^{-1}\\left(\\dfrac{b}{a}\\right)$."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"## Towards Dual Numbers\n",
"A dual number has a real part and a dual part. We write $$z = a + \\epsilon b$$ and refer to $b$ as the dual part."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"We *define* the number $\\epsilon$ so that $\\epsilon^{2} = 0$."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"**This does not mean that $\\epsilon$ is zero!** $\\epsilon$ is not a real number."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"#### Some properties of dual numbers:\n",
"* Conjugate: $z^{*} = a - \\epsilon b$.\n",
"* Magnitude: $\\left|z\\right|^{2} = zz^{*} = \\left(a+\\epsilon b\\right)\\left(a-\\epsilon b\\right) = x^{2}$.\n",
"* Polar form: $z = a\\left(1 + \\dfrac{b}{a}\\right)$."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Example\n",
"Recall that the derivative of $y=x^{2}$ is $y^{\\prime} = 2x$."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "fragment"
}
},
"source": [
"Now if we extend $x$ so that it has a real part and a dual part ($x\\leftarrow a + \\epsilon b$) and evaluate $y$ we have\n",
"\\begin{align}\n",
" y &= \\left(a + \\epsilon b\\right)^{2} \\\\\n",
" &= x^{2} + 2ab\\epsilon + \\underbrace{b^{2}\\epsilon^{2}}_{=0} \\\\\n",
" &= a^{2} + 2ab\\epsilon.\n",
"\\end{align}"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"#### Notice that the dual part contains the derivative of our function evaluated at $a$!!"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Example\n",
"Evaluate $y = \\sin\\left(x\\right)$ when $x\\leftarrow a + \\epsilon b$.\n",
"\n",
"We have\n",
"\\begin{align}\n",
" y & = \\sin\\left(a + \\epsilon b\\right) \\\\\n",
" & = \\sin\\left(a\\right)\\cos\\left(\\epsilon b\\right) + \\cos\\left(a\\right)\\sin\\left(\\epsilon b\\right).\n",
"\\end{align}"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"Expanding $\\cos$ and $\\sin$ in their Taylor series gives \n",
"\\begin{align}\n",
" \\sin\\left(\\epsilon b\\right) &= \\sum_{n=0}^{\\infty}{\\left(-1\\right)^{n}\\dfrac{\\left(\\epsilon b\\right)^{2n+1}}{\\left(2n+1\\right)!}} = \\epsilon b + \\dfrac{\\left(\\epsilon b\\right)^{3}}{3!} + \\cdots = \\epsilon b \\\\\n",
" \\cos\\left(\\epsilon b\\right) &= \\sum_{n=0}^{\\infty}{\\left(-1\\right)^{n}\\dfrac{\\left(\\epsilon b\\right)^{2n}}{\\left(2n\\right)!}} = 1 + \\dfrac{\\left(\\epsilon b\\right)^{2}}{2} + \\cdots = 1.\n",
"\\end{align}\n",
"Note that the definition of $\\epsilon$ was used which resulted in the collapsed sum."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"So we see that \n",
"\\begin{align}\n",
" y & = \\sin\\left(a\\right) + \\cos\\left(a\\right) b \\epsilon.\n",
"\\end{align}\n",
"And once again the real component is the function and the dual component is the derivative."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "skip"
}
},
"source": [
"# Automatic Differentiation and Dual Numbers\n",
"A dual number is an extension of the real numbers. Written out, the form looks similar to a complex number.\n",
"\n",
"## Review of Complex Numbers\n",
"Recall that a complex number has the form $$z = a + ib$$ where we *define* the number $i$ so that $i^{2} = -1$. No real number has this property but it is a useful property for a number to have. Hence the introduction of complex numbers. Visually, you can think of a real number as a number lying on a straight line. Then, we \"extend\" the real line \"up\". The new axis is called the *imaginary* axis.\n",
"\n",
"Complex numbers have several properties that we can use.\n",
"* Complex conjugate: $z^{*} = a - ib$.\n",
"* Magnitude of a complex number: $\\left|z\\right|^{2} = zz^{*} = \\left(a+ib\\right)\\left(a-ib\\right) = a^{2} + b^{2}$.\n",
"* Polar form: $z = \\left|z\\right|\\exp\\left(i\\theta\\right)$ where $\\displaystyle \\theta = \\tan^{-1}\\left(\\dfrac{b}{a}\\right)$.\n",
"\n",
"## Towards Dual Numbers\n",
"A dual number has a real part and a dual part. We write $$z = a + \\epsilon b$$ and refer to $b$ as the dual part. We *define* the number $\\epsilon$ so that $\\epsilon^{2} = 0$. **This does not mean that $\\epsilon$ is zero!** $\\epsilon$ is not a real number.\n",
"\n",
"#### Some properties of dual numbers:\n",
"* Conjugate: $z^{*} = a - \\epsilon b$.\n",
"* Magnitude: $\\left|z\\right|^{2} = zz^{*} = \\left(a+\\epsilon b\\right)\\left(a-\\epsilon b\\right) = x^{2}$.\n",
"* Polar form: $z = a\\left(1 + \\dfrac{b}{a}\\right)$.\n",
"\n",
"### Example\n",
"Recall that the derivative of $y=x^{2}$ is $y^{\\prime} = 2x$.\n",
"\n",
"Now if we extend $x$ so that it has a real part and a dual part ($x\\leftarrow a + \\epsilon a$) and evaluate $y$ we have\n",
"\\begin{align}\n",
" y &= \\left(a + \\epsilon b\\right)^{2} \\\\\n",
" &= a^{2} + 2ab\\epsilon + \\underbrace{b^{2}\\epsilon^{2}}_{=0} \\\\\n",
" &= a^{2} + 2ab\\epsilon.\n",
"\\end{align}\n",
"#### Notice that the dual part contains the derivative of our function!!\n",
"\n",
"### Example\n",
"Evaluate $y = \\sin\\left(x\\right)$ when $x\\leftarrow a + \\epsilon b$.\n",
"\n",
"We have\n",
"\\begin{align}\n",
" y & = \\sin\\left(a + \\epsilon b\\right) \\\\\n",
" & = \\sin\\left(a\\right)\\cos\\left(\\epsilon b\\right) + \\cos\\left(a\\right)\\sin\\left(\\epsilon b\\right).\n",
"\\end{align}\n",
"Expanding $\\cos$ and $\\sin$ in their Taylor series gives \n",
"\\begin{align}\n",
" \\sin\\left(\\epsilon b\\right) &= \\sum_{n=0}^{\\infty}{\\left(-1\\right)^{n}\\dfrac{\\left(\\epsilon b\\right)^{2n+1}}{\\left(2n+1\\right)!}} = \\epsilon b + \\dfrac{\\left(\\epsilon b\\right)^{3}}{3!} + \\cdots = \\epsilon b \\\\\n",
" \\cos\\left(\\epsilon b\\right) &= \\sum_{n=0}^{\\infty}{\\left(-1\\right)^{n}\\dfrac{\\left(\\epsilon b\\right)^{2n}}{\\left(2n\\right)!}} = 1 + \\dfrac{\\left(\\epsilon b\\right)^{2}}{2} + \\cdots = 1.\n",
"\\end{align}\n",
"Note that the definition of $\\epsilon$ was used which resulted in the collapsed sum.\n",
"\n",
"So we see that \n",
"\\begin{align}\n",
" y & = \\sin\\left(a\\right) + \\cos\\left(a\\right) b \\epsilon.\n",
"\\end{align}\n",
"And once again the real component is the function and the dual component is the derivative."
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "subslide"
}
},
"source": [
"### Exercise\n",
"Using dual numbers, find the derivative of $$ y = e^{x^{2}}.$$ **Show your work!**"
]
}
],
"metadata": {
"celltoolbar": "Slideshow",
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.7"
}
},
"nbformat": 4,
"nbformat_minor": 2
}