Key Word(s): Automatic differentiation, Forward mode, Dual numbers
References¶
- A Hitchhiker’s Guide to Automatic Differentiation
- Griewank, A. and Walther, A., 2008. Evaluating derivatives: principles and techniques of algorithmic differentiation (Vol. 105). Siam.
- Nocedal, J. and Wright, S., 2001. Numerical Optimization, Second Edition. Springer.
# Load libraries that we'll need
import matplotlib.pyplot as plt
import numpy as np
%matplotlib notebook
# or inline
def f(x):
# Hard-coded f(x)
return x - np.exp(-2.0 * np.sin(4.0*x) * np.sin(4.0*x))
def dfdx(x):
# Hard-coded "Jacobian matrix" of f(x)
return 1.0 + 16.0 * np.exp(-2.0 * np.sin(4.0*x) * np.sin(4.0*x)) * np.sin(4.0*x) * np.cos(4.0*x)
def dfdx_h(x, epsilon):
return (f(x + epsilon) - f(x)) / epsilon
Recap¶
Last time:
- The basics of AD
- The computational graph
- Evaluation trace
- Computing derivatives of a function of one variable using the forward mode
- Computing derivatives of a function of two variables using the forward mode
- The Jacobian in forward mode
- What the forward mode actually computes
Today¶
- Dual numbers
- Towards implementation
Automatic Differentiation and Dual Numbers¶
A dual number is an extension of the real numbers. Written out, the form looks similar to a complex number.
Review of Complex Numbers¶
Recall that a complex number has the form $$z = a + ib$$ where we define the number $i$ so that $i^{2} = -1$.
No real number has this property but it is a useful property for a number to have. Hence the introduction of complex numbers.
Visually, you can think of a real number as a number lying on a straight line. Then, we "extend" the real line "up". The new axis is called the imaginary axis.
Complex numbers have several properties that we can use.
- Complex conjugate: $z^{*} = a - ib$.
- Magnitude of a complex number: $\left|z\right|^{2} = zz^{*} = \left(a+ib\right)\left(a-ib\right) = a^{2} + b^{2}$.
- Polar form: $z = \left|z\right|\exp\left(i\theta\right)$ where $\displaystyle \theta = \tan^{-1}\left(\dfrac{b}{a}\right)$.
Towards Dual Numbers¶
A dual number has a real part and a dual part. We write $$z = a + \epsilon b$$ and refer to $b$ as the dual part.
We define the number $\epsilon$ so that $\epsilon^{2} = 0$.
This does not mean that $\epsilon$ is zero! $\epsilon$ is not a real number.
Some properties of dual numbers:¶
- Conjugate: $z^{*} = a - \epsilon b$.
- Magnitude: $\left|z\right|^{2} = zz^{*} = \left(a+\epsilon b\right)\left(a-\epsilon b\right) = a^{2}$.
- Polar form: $z = a\left(1 + \dfrac{b}{a}\right)$.
Example¶
Recall that the derivative of $y=x^{2}$ is $y^{\prime} = 2x$.
Now if we extend $x$ so that it has a real part and a dual part ($x\leftarrow a + \epsilon b$) and evaluate $y$ we have \begin{align} y &= \left(a + \epsilon b\right)^{2} \\ &= a^{2} + 2ab\epsilon + \underbrace{b^{2}\epsilon^{2}}_{=0} \\ &= a^{2} + 2ab\epsilon. \end{align}
Notice that the dual part contains the derivative of our function evaluated at $a$!!¶
Example¶
Evaluate $y = \sin\left(x\right)$ when $x\leftarrow a + \epsilon b$.
We have \begin{align} y & = \sin\left(a + \epsilon b\right) \\ & = \sin\left(a\right)\cos\left(\epsilon b\right) + \cos\left(a\right)\sin\left(\epsilon b\right). \end{align}
Expanding $\cos$ and $\sin$ in their Taylor series gives \begin{align} \sin\left(\epsilon b\right) &= \sum_{n=0}^{\infty}{\left(-1\right)^{n}\dfrac{\left(\epsilon b\right)^{2n+1}}{\left(2n+1\right)!}} = \epsilon b + \dfrac{\left(\epsilon b\right)^{3}}{3!} + \cdots = \epsilon b \\ \cos\left(\epsilon b\right) &= \sum_{n=0}^{\infty}{\left(-1\right)^{n}\dfrac{\left(\epsilon b\right)^{2n}}{\left(2n\right)!}} = 1 + \dfrac{\left(\epsilon b\right)^{2}}{2} + \cdots = 1. \end{align} Note that the definition of $\epsilon$ was used which resulted in the collapsed sum.
So we see that \begin{align} y & = \sin\left(a\right) + \cos\left(a\right) b \epsilon. \end{align} And once again the real component is the function and the dual component is the derivative.
Automatic Differentiation and Dual Numbers¶
A dual number is an extension of the real numbers. Written out, the form looks similar to a complex number.
Review of Complex Numbers¶
Recall that a complex number has the form $$z = a + ib$$ where we define the number $i$ so that $i^{2} = -1$. No real number has this property but it is a useful property for a number to have. Hence the introduction of complex numbers. Visually, you can think of a real number as a number lying on a straight line. Then, we "extend" the real line "up". The new axis is called the imaginary axis.
Complex numbers have several properties that we can use.
- Complex conjugate: $z^{*} = a - ib$.
- Magnitude of a complex number: $\left|z\right|^{2} = zz^{*} = \left(a+ib\right)\left(a-ib\right) = a^{2} + b^{2}$.
- Polar form: $z = \left|z\right|\exp\left(i\theta\right)$ where $\displaystyle \theta = \tan^{-1}\left(\dfrac{b}{a}\right)$.
Towards Dual Numbers¶
A dual number has a real part and a dual part. We write $$z = a + \epsilon b$$ and refer to $b$ as the dual part. We define the number $\epsilon$ so that $\epsilon^{2} = 0$. This does not mean that $\epsilon$ is zero! $\epsilon$ is not a real number.
Some properties of dual numbers:¶
- Conjugate: $z^{*} = a - \epsilon b$.
- Magnitude: $\left|z\right|^{2} = zz^{*} = \left(a+\epsilon b\right)\left(a-\epsilon b\right) = a^{2}$.
- Polar form: $z = a\left(1 + \dfrac{b}{a}\right)$.
Example¶
Recall that the derivative of $y=x^{2}$ is $y^{\prime} = 2x$.
Now if we extend $x$ so that it has a real part and a dual part ($x\leftarrow a + \epsilon a$) and evaluate $y$ we have \begin{align} y &= \left(a + \epsilon b\right)^{2} \\ &= a^{2} + 2ab\epsilon + \underbrace{b^{2}\epsilon^{2}}_{=0} \\ &= a^{2} + 2ab\epsilon. \end{align}
Notice that the dual part contains the derivative of our function!!¶
Example¶
Evaluate $y = \sin\left(x\right)$ when $x\leftarrow a + \epsilon b$.
We have \begin{align} y & = \sin\left(a + \epsilon b\right) \\ & = \sin\left(a\right)\cos\left(\epsilon b\right) + \cos\left(a\right)\sin\left(\epsilon b\right). \end{align} Expanding $\cos$ and $\sin$ in their Taylor series gives \begin{align} \sin\left(\epsilon b\right) &= \sum_{n=0}^{\infty}{\left(-1\right)^{n}\dfrac{\left(\epsilon b\right)^{2n+1}}{\left(2n+1\right)!}} = \epsilon b + \dfrac{\left(\epsilon b\right)^{3}}{3!} + \cdots = \epsilon b \\ \cos\left(\epsilon b\right) &= \sum_{n=0}^{\infty}{\left(-1\right)^{n}\dfrac{\left(\epsilon b\right)^{2n}}{\left(2n\right)!}} = 1 + \dfrac{\left(\epsilon b\right)^{2}}{2} + \cdots = 1. \end{align} Note that the definition of $\epsilon$ was used which resulted in the collapsed sum.
So we see that \begin{align} y & = \sin\left(a\right) + \cos\left(a\right) b \epsilon. \end{align} And once again the real component is the function and the dual component is the derivative.
Exercise 1¶
Using dual numbers, find the derivative of $$ y = e^{x^{2}}.$$ Show your work!
You do not need to turn this exercise in.
Thoughts on Implementation¶
There are different ways of implementing automatic differentiation. Two predominant approaches are:
- code translation through pre-processor directives and
- operator-overloading.
We won't much to say about code translation. It can be quite complicated, even going so far as to write a compiler. The high-level idea is that the compiler writes additional code to compute derivatives of functions. The benefits are that it can be very efficient and fast.
Our approach will use operator overloading.
Operator Overloading¶
The Wikipedia article on operator overloading states:
...operator overloading, sometimes termed operator ad hoc polymorphism, is a specific case of polymorphism, where different operators have different implementations depending on their arguments.
You have already seen the basics of this although we didn't call it operator overloading. Python
allows operator overloading via the special methods. When you wrote the circle class, some of you tried to define the __add__
method (and __radd__
for communtivity) to add two circle objects. In this way, you overloaded the additon operator (+
).
Example¶
Let's create a class to do complex numbers again.
# Complex class again
class Complex():
def __init__(self, a, b):
# Constructor to set up real and imag parts
self.a = a
self.b = b
# Create some complex numbers
z1 = Complex(0, 1)
z2 = Complex(2, 0)
# And add them
z = z1 + z2
print(z.a, z.b)
That didn't work. python
doesn't know how to apply the addition operator (+
) to Complex
objects! Let's remedy that.
# Complex class again
class Complex():
def __init__(self, a, b):
# Constructor to st up real and imag parts
self.a = a
self.b = b
def __add__(self, other):
# Implementating complex addition
return Complex(self.a + other.a, self.b + other.b)
z1 = Complex(0, 1)
z2 = Complex(2, 0)
z = z1 + z2
print(z.a, z.b)
Okay, much better. Of course, there are still many problems here:
- No convenient way to print out a complex number (where's
__str__
?) - What happens when you try to add a complex number to a real number (e.g.
2 + z
)?- Try it!
- Then check out
__radd__
. - This is how you get commutivity.
- Many operations are missing (e.g. where's multiplication?).
Nevertheless, the addition operator has been (partly) successfully overloaded (we didn't get commutivity yet).
Comments on __add__
¶
- Python calls
__add__
to add two object instancesa
andb
. - When you write
a+b
Python really doesa.__add__(b)
. - So if you define
__add__
, you can add things to and object. - But, you can't add the object to things!
More Comments on __add__
¶
- Suppose you have
object1
andobject2
. These both have__add__
defined. - So there's no problem for Python to call
object1.__add__(object2)
. - But if you tried to do
object1 + 1.0
you would (probably) get an error because thefloat
type probably doesn't have the same kind of addition asobject1
. - So you would have to enhance
__add__
with duck-typing viatry-except
blocks.- Homework!
Comments on __radd__
¶
- But wait, there's more! Suppose you successfully support both
object1 + object2
andobject1 + 1.0
. - Many things in life are commutative, so you'd really like to be able to do
1.0 + object1
. - So Python calls
1.0.__add__(object1)
and you get an attribute error! - To solve this, you must use
__radd__
- Homework!
- First Python tries
1.0.__add__(object1)
and when that fails it callsobject1.__radd__(1.0)
. - If
__radd__
just calls__add__
, then all your bases are covered.
See Emulating Numeric Types for more.
Implementation with the Duals¶
This section will give you a hint on how you may go about implementing the forward mode. I will be deliberately terse. In fact, I will not even define a class or use any operator overloading. What I'm about to show you is almost purely conceptual. You will do the hard work of coming up with a user-friendly implementation for your project.
Let's start with a baby example. Suppose at some point in our evaluation trace we have $$x_{3} = x_{1}x_{2}.$$ We want to evaluate $x_{3}$ and the derivative. Following our usual conventions we have $$\dot{x}_{3} = x_{1}\dot{x}_{2} + \dot{x}_{1}x_{2}.$$ For our purposes here, suppose we have that $x_{1} = 2$ and $x_{2} = 3$ as well as $\dot{x}_{1} = 1$ and $\dot{x}_{2} = 4$. It doesn't matter where these values came from at the moment.
Let's define an object that has two components. The first component will hold the value of the function and the second component will hold the value of the derivative.
# x1 x1 dot
x1 = [2, 1] # value and derivative
x2 = [3, 4]
It's clear how to compute the value of the function at the point $a$. Simply use the first component of our object to compute the function value.
f = [0, 0] # Allocate the values of the function and its derivative
# f(a) = x1 * x2
f[0] = x1[0] * x2[0]
print(f)
But to calculate the derivative, we need to overload multiplication. Any time we encounter a multiplication of something in the second slot, we need to use the chain rule.
#f'(a) = x1 * x2dot + x1dot * x2
f[1] = x1[0] * x2[1] + x1[1] * x2[0]
print(f)
Let's recap. We'll use OOP vernacular.
- We instantiated two gradient objects $x_{1}$ and $x_{2}$ (here just lists).
- These objects have two name attributes.
- The first attribute is the function value.
- The second attribute is the derivative.
- When performing a multiplication, the usual multiplication operator can be used for the function value part of the object.
- However, for the derivative part, one must overload the multiplication operator.
- Note that this could be accomplished in a class by implementing the
__mul__
and__rmul__
special methods and returning the gradient object (like we did for__add__
in theComplex
class example).
- Note that this could be accomplished in a class by implementing the
Exercise 2¶
Your turn. Using the same approach as above, write some code to do forward mode automatic differentiation of the function $$f\left(x\right) = x^{r}$$ where $r \in \mathbb{R}$. Evaluate the derivative and function at $a = 3$ and $r = 4$.
You can turn this into a function or closure if that helps you with the concepts.
Hints¶
- Remember, your $x$ value will actually be a data structure with two values (one for the evaluation point, and one for the derivative).
- Write out the derivative of your function by hand first. Don't forget the chain rule!
- Note that this function would be considered an elemental function and would therefore be just one little piece of your overall software.
Some Other Ideas if Time Permits¶
- Turn this into a class
- Overload some methods (e.g.
__pow__
)
Deliverables¶
- Submit your solution as
L11E2.py
.