Lecture 11

Thursday, October 10th 2019

Automatic Differentiation: The Forward Mode

References

  • A Hitchhiker’s Guide to Automatic Differentiation
  • Griewank, A. and Walther, A., 2008. Evaluating derivatives: principles and techniques of algorithmic differentiation (Vol. 105). Siam.
  • Nocedal, J. and Wright, S., 2001. Numerical Optimization, Second Edition. Springer.
In [1]:
# Load libraries that we'll need
import matplotlib.pyplot as plt
import numpy as np
%matplotlib notebook
# or inline

def f(x):
    # Hard-coded f(x)
    return x - np.exp(-2.0 * np.sin(4.0*x) * np.sin(4.0*x))

def dfdx(x):
    # Hard-coded "Jacobian matrix" of f(x)
    return 1.0 + 16.0 * np.exp(-2.0 * np.sin(4.0*x) * np.sin(4.0*x)) * np.sin(4.0*x) * np.cos(4.0*x)

def dfdx_h(x, epsilon):
    return (f(x + epsilon) - f(x)) / epsilon

Recap

Last time:

  • The basics of AD
    • The computational graph
    • Evaluation trace
    • Computing derivatives of a function of one variable using the forward mode
    • Computing derivatives of a function of two variables using the forward mode
    • The Jacobian in forward mode
    • What the forward mode actually computes

Today

  • Dual numbers
  • Towards implementation

Automatic Differentiation and Dual Numbers

A dual number is an extension of the real numbers. Written out, the form looks similar to a complex number.

Review of Complex Numbers

Recall that a complex number has the form $$z = a + ib$$ where we define the number $i$ so that $i^{2} = -1$.

No real number has this property but it is a useful property for a number to have. Hence the introduction of complex numbers.

Visually, you can think of a real number as a number lying on a straight line. Then, we "extend" the real line "up". The new axis is called the imaginary axis.

Complex numbers have several properties that we can use.

  • Complex conjugate: $z^{*} = a - ib$.
  • Magnitude of a complex number: $\left|z\right|^{2} = zz^{*} = \left(a+ib\right)\left(a-ib\right) = a^{2} + b^{2}$.
  • Polar form: $z = \left|z\right|\exp\left(i\theta\right)$ where $\displaystyle \theta = \tan^{-1}\left(\dfrac{b}{a}\right)$.

Towards Dual Numbers

A dual number has a real part and a dual part. We write $$z = a + \epsilon b$$ and refer to $b$ as the dual part.

We define the number $\epsilon$ so that $\epsilon^{2} = 0$.

This does not mean that $\epsilon$ is zero! $\epsilon$ is not a real number.

Some properties of dual numbers:

  • Conjugate: $z^{*} = a - \epsilon b$.
  • Magnitude: $\left|z\right|^{2} = zz^{*} = \left(a+\epsilon b\right)\left(a-\epsilon b\right) = a^{2}$.
  • Polar form: $z = a\left(1 + \dfrac{b}{a}\right)$.

Example

Recall that the derivative of $y=x^{2}$ is $y^{\prime} = 2x$.

Now if we extend $x$ so that it has a real part and a dual part ($x\leftarrow a + \epsilon b$) and evaluate $y$ we have \begin{align} y &= \left(a + \epsilon b\right)^{2} \\ &= a^{2} + 2ab\epsilon + \underbrace{b^{2}\epsilon^{2}}_{=0} \\ &= a^{2} + 2ab\epsilon. \end{align}

Notice that the dual part contains the derivative of our function evaluated at $a$!!

Example

Evaluate $y = \sin\left(x\right)$ when $x\leftarrow a + \epsilon b$.

We have \begin{align} y & = \sin\left(a + \epsilon b\right) \\ & = \sin\left(a\right)\cos\left(\epsilon b\right) + \cos\left(a\right)\sin\left(\epsilon b\right). \end{align}

Expanding $\cos$ and $\sin$ in their Taylor series gives \begin{align} \sin\left(\epsilon b\right) &= \sum_{n=0}^{\infty}{\left(-1\right)^{n}\dfrac{\left(\epsilon b\right)^{2n+1}}{\left(2n+1\right)!}} = \epsilon b + \dfrac{\left(\epsilon b\right)^{3}}{3!} + \cdots = \epsilon b \\ \cos\left(\epsilon b\right) &= \sum_{n=0}^{\infty}{\left(-1\right)^{n}\dfrac{\left(\epsilon b\right)^{2n}}{\left(2n\right)!}} = 1 + \dfrac{\left(\epsilon b\right)^{2}}{2} + \cdots = 1. \end{align} Note that the definition of $\epsilon$ was used which resulted in the collapsed sum.

So we see that \begin{align} y & = \sin\left(a\right) + \cos\left(a\right) b \epsilon. \end{align} And once again the real component is the function and the dual component is the derivative.

Automatic Differentiation and Dual Numbers

A dual number is an extension of the real numbers. Written out, the form looks similar to a complex number.

Review of Complex Numbers

Recall that a complex number has the form $$z = a + ib$$ where we define the number $i$ so that $i^{2} = -1$. No real number has this property but it is a useful property for a number to have. Hence the introduction of complex numbers. Visually, you can think of a real number as a number lying on a straight line. Then, we "extend" the real line "up". The new axis is called the imaginary axis.

Complex numbers have several properties that we can use.

  • Complex conjugate: $z^{*} = a - ib$.
  • Magnitude of a complex number: $\left|z\right|^{2} = zz^{*} = \left(a+ib\right)\left(a-ib\right) = a^{2} + b^{2}$.
  • Polar form: $z = \left|z\right|\exp\left(i\theta\right)$ where $\displaystyle \theta = \tan^{-1}\left(\dfrac{b}{a}\right)$.

Towards Dual Numbers

A dual number has a real part and a dual part. We write $$z = a + \epsilon b$$ and refer to $b$ as the dual part. We define the number $\epsilon$ so that $\epsilon^{2} = 0$. This does not mean that $\epsilon$ is zero! $\epsilon$ is not a real number.

Some properties of dual numbers:

  • Conjugate: $z^{*} = a - \epsilon b$.
  • Magnitude: $\left|z\right|^{2} = zz^{*} = \left(a+\epsilon b\right)\left(a-\epsilon b\right) = a^{2}$.
  • Polar form: $z = a\left(1 + \dfrac{b}{a}\right)$.

Example

Recall that the derivative of $y=x^{2}$ is $y^{\prime} = 2x$.

Now if we extend $x$ so that it has a real part and a dual part ($x\leftarrow a + \epsilon a$) and evaluate $y$ we have \begin{align} y &= \left(a + \epsilon b\right)^{2} \\ &= a^{2} + 2ab\epsilon + \underbrace{b^{2}\epsilon^{2}}_{=0} \\ &= a^{2} + 2ab\epsilon. \end{align}

Notice that the dual part contains the derivative of our function!!

Example

Evaluate $y = \sin\left(x\right)$ when $x\leftarrow a + \epsilon b$.

We have \begin{align} y & = \sin\left(a + \epsilon b\right) \\ & = \sin\left(a\right)\cos\left(\epsilon b\right) + \cos\left(a\right)\sin\left(\epsilon b\right). \end{align} Expanding $\cos$ and $\sin$ in their Taylor series gives \begin{align} \sin\left(\epsilon b\right) &= \sum_{n=0}^{\infty}{\left(-1\right)^{n}\dfrac{\left(\epsilon b\right)^{2n+1}}{\left(2n+1\right)!}} = \epsilon b + \dfrac{\left(\epsilon b\right)^{3}}{3!} + \cdots = \epsilon b \\ \cos\left(\epsilon b\right) &= \sum_{n=0}^{\infty}{\left(-1\right)^{n}\dfrac{\left(\epsilon b\right)^{2n}}{\left(2n\right)!}} = 1 + \dfrac{\left(\epsilon b\right)^{2}}{2} + \cdots = 1. \end{align} Note that the definition of $\epsilon$ was used which resulted in the collapsed sum.

So we see that \begin{align} y & = \sin\left(a\right) + \cos\left(a\right) b \epsilon. \end{align} And once again the real component is the function and the dual component is the derivative.

Exercise 1

Using dual numbers, find the derivative of $$ y = e^{x^{2}}.$$ Show your work!

You do not need to turn this exercise in.

Thoughts on Implementation

There are different ways of implementing automatic differentiation. Two predominant approaches are:

  1. code translation through pre-processor directives and
  2. operator-overloading.

We won't much to say about code translation. It can be quite complicated, even going so far as to write a compiler. The high-level idea is that the compiler writes additional code to compute derivatives of functions. The benefits are that it can be very efficient and fast.

Our approach will use operator overloading.

Operator Overloading

The Wikipedia article on operator overloading states:

...operator overloading, sometimes termed operator ad hoc polymorphism, is a specific case of polymorphism, where different operators have different implementations depending on their arguments.

You have already seen the basics of this although we didn't call it operator overloading. Python allows operator overloading via the special methods. When you wrote the circle class, some of you tried to define the __add__ method (and __radd__ for communtivity) to add two circle objects. In this way, you overloaded the additon operator (+).

Example

Let's create a class to do complex numbers again.

In [2]:
# Complex class again
class Complex():
    def __init__(self, a, b):
        # Constructor to set up real and imag parts
        self.a = a
        self.b = b

# Create some complex numbers
z1 = Complex(0, 1)
z2 = Complex(2, 0)

# And add them
z = z1 + z2
print(z.a, z.b)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
 in 
     11 
     12 # And add them
---> 13 z = z1 + z2
     14 print(z.a, z.b)

TypeError: unsupported operand type(s) for +: 'Complex' and 'Complex'

That didn't work. python doesn't know how to apply the addition operator (+) to Complex objects! Let's remedy that.

In [6]:
# Complex class again
class Complex():
    def __init__(self, a, b):
        # Constructor to st up real and imag parts
        self.a = a
        self.b = b
    
    def __add__(self, other):
        # Implementating complex addition
        return Complex(self.a + other.a, self.b + other.b)

z1 = Complex(0, 1)
z2 = Complex(2, 0)
z = z1 + z2
print(z.a, z.b)
2 1

Okay, much better. Of course, there are still many problems here:

  • No convenient way to print out a complex number (where's __str__?)
  • What happens when you try to add a complex number to a real number (e.g. 2 + z)?
    • Try it!
    • Then check out __radd__.
    • This is how you get commutivity.
  • Many operations are missing (e.g. where's multiplication?).

Nevertheless, the addition operator has been (partly) successfully overloaded (we didn't get commutivity yet).

Comments on __add__

  • Python calls __add__ to add two object instances a and b.
  • When you write a+b Python really does a.__add__(b).
  • So if you define __add__, you can add things to and object.
  • But, you can't add the object to things!

More Comments on __add__

  • Suppose you have object1 and object2. These both have __add__ defined.
  • So there's no problem for Python to call object1.__add__(object2).
  • But if you tried to do object1 + 1.0 you would (probably) get an error because the float type probably doesn't have the same kind of addition as object1.
  • So you would have to enhance __add__ with duck-typing via try-except blocks.
    • Homework!

Comments on __radd__

  • But wait, there's more! Suppose you successfully support both object1 + object2 and object1 + 1.0.
  • Many things in life are commutative, so you'd really like to be able to do 1.0 + object1.
  • So Python calls 1.0.__add__(object1) and you get an attribute error!
  • To solve this, you must use __radd__
    • Homework!
  • First Python tries 1.0.__add__(object1) and when that fails it calls object1.__radd__(1.0).
  • If __radd__ just calls __add__, then all your bases are covered.

See Emulating Numeric Types for more.

Implementation with the Duals

This section will give you a hint on how you may go about implementing the forward mode. I will be deliberately terse. In fact, I will not even define a class or use any operator overloading. What I'm about to show you is almost purely conceptual. You will do the hard work of coming up with a user-friendly implementation for your project.

Let's start with a baby example. Suppose at some point in our evaluation trace we have $$x_{3} = x_{1}x_{2}.$$ We want to evaluate $x_{3}$ and the derivative. Following our usual conventions we have $$\dot{x}_{3} = x_{1}\dot{x}_{2} + \dot{x}_{1}x_{2}.$$ For our purposes here, suppose we have that $x_{1} = 2$ and $x_{2} = 3$ as well as $\dot{x}_{1} = 1$ and $\dot{x}_{2} = 4$. It doesn't matter where these values came from at the moment.

Let's define an object that has two components. The first component will hold the value of the function and the second component will hold the value of the derivative.

In [4]:
#     x1   x1 dot
x1 = [2,     1] # value and derivative
x2 = [3,     4]

It's clear how to compute the value of the function at the point $a$. Simply use the first component of our object to compute the function value.

In [5]:
f = [0, 0] # Allocate the values of the function and its derivative
# f(a) = x1    * x2
f[0]   = x1[0] * x2[0]
print(f)
[6, 0]

But to calculate the derivative, we need to overload multiplication. Any time we encounter a multiplication of something in the second slot, we need to use the chain rule.

In [6]:
#f'(a) = x1    * x2dot + x1dot * x2
f[1]   = x1[0] * x2[1] + x1[1] * x2[0]
print(f)
[6, 11]

Let's recap. We'll use OOP vernacular.

  • We instantiated two gradient objects $x_{1}$ and $x_{2}$ (here just lists).
  • These objects have two name attributes.
    1. The first attribute is the function value.
    2. The second attribute is the derivative.
  • When performing a multiplication, the usual multiplication operator can be used for the function value part of the object.
  • However, for the derivative part, one must overload the multiplication operator.
    • Note that this could be accomplished in a class by implementing the __mul__ and __rmul__ special methods and returning the gradient object (like we did for __add__ in the Complex class example).

Exercise 2

Your turn. Using the same approach as above, write some code to do forward mode automatic differentiation of the function $$f\left(x\right) = x^{r}$$ where $r \in \mathbb{R}$. Evaluate the derivative and function at $a = 3$ and $r = 4$.

You can turn this into a function or closure if that helps you with the concepts.

Hints

  • Remember, your $x$ value will actually be a data structure with two values (one for the evaluation point, and one for the derivative).
  • Write out the derivative of your function by hand first. Don't forget the chain rule!
    • Note that this function would be considered an elemental function and would therefore be just one little piece of your overall software.

Some Other Ideas if Time Permits

  • Turn this into a class
  • Overload some methods (e.g. __pow__)

Deliverables

  • Submit your solution as L11E2.py.