CS109B Data Science 2: Advanced Topics in Data Science

Lecture 21: Adversarial Examples

Harvard University
Spring 2019
Instructors: Pavlos Protopapas and Mark Glickman


Code and ideas taken/borrowed/adapted from Deep Learning From Basics to Practice, by Andrew Glassner, https://dlbasics.com, http://glassner.com, Python utilities for saving and loading files, mostly images and Keras model weights

Authors: Pavlos Protopapas, Amil Merchant, Alex Lin, Thomas Chang, ZiZi Zhang

In [1]:
#RUN THIS CELL 
import requests
from IPython.core.display import HTML
styles = requests.get("https://raw.githubusercontent.com/Harvard-IACS/2018-CS109A/master/content/styles/cs109.css").text
HTML(styles)
Out[1]:

Adversarial Examples

Adversarial examples for neural networks are inputs that are specifically meant to fool neural networks but be descernible to the human eye.

Contents:

  1. Datasets - MNIST
  2. Cleverhans Integration
  3. FGSM (non-targeted attack)
  4. JSMA (targeted attack)
  5. Exercise: Repeat process with CIFAR

Datasets - MNIST / CIFAR

In this examples, we will explore some of these examples and a few attack methods on common datasets such as MNIST and CIFAR10. We first load the data in through Keras.

In [2]:
import numpy as np
from keras.datasets import mnist
from keras.datasets import cifar10
import matplotlib.pyplot as plt
import tensorflow as tf
import keras

session = tf.Session()
keras.backend.set_session(session)
Using TensorFlow backend.
/Users/pavlos/anaconda3/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: compiletime version 3.5 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.6
  return f(*args, **kwds)
In [3]:
(mn_x_train, mn_y_train), (mn_x_test, mn_y_test) = mnist.load_data()

As a refresher, MNIST is a dataset for black and white handwritten digits. This dataset is particularly simple, as we will see, neural networks can achieve very high levels of test accuracy.

Each image is 28 x 28, for a size of 784 for each input.

In [4]:
print ("Training Examples: %d" % len(mn_x_train))
print ("Test Examples: %d" % len(mn_x_test))
Training Examples: 60000
Test Examples: 10000
In [5]:
n_classes = 10
inds=np.array([mn_y_train==i for i in range(n_classes)])
f,ax=plt.subplots(2,5,figsize=(10,5))
ax=ax.flatten()
for i in range(n_classes):
    ax[i].imshow(mn_x_train[np.argmax(inds[i])].reshape(28,28))
    ax[i].set_title(str(i))
plt.show()

Neural Network Training

Keras is a high level library which can be used to train neural network models. It simplies coding neural networks for the datasets, and as installed, uses tensorflow for the backend. We use Keras for its simplicity and because these models can easily be linked into the cleverhans library to generate adversarial examples.

We shall start with MNIST as the models and the results are easy to see. The second half of the notebook will repeat the results with CIFAR10. For MNIST, we will use a very simple neural network which takes in the 28 x 28 input, uses a single hidden layer of size 512, and goes uses dense connections to lead to the 10 output classes. This should be familiar and may seem too simple, we will will build up to more complex examples when we get to CIFAR10.

In [6]:
from keras import models
from keras import layers

network = models.Sequential()
network.add(layers.Dense(512, activation='relu', input_shape=(28 * 28,)))
network.add(layers.Dense(10, activation='softmax'))

network.compile(optimizer='rmsprop',
                loss='categorical_crossentropy',
                metrics=['accuracy'])

The data must be pre-processed before analysis. We flatten the image and normalize the images so pixel values are between 0 and 1 instead of 0 and 255. The labels must also be inputted as one hot vectors, so we use built in keras functions.

In [14]:
train_images_1d = mn_x_train.reshape((60000, 28 * 28))
train_images_1d = train_images_1d.astype('float32') / 255

test_images_1d = mn_x_test.reshape((10000, 28 * 28))
test_images_1d = test_images_1d.astype('float32') / 255
In [15]:
from keras.utils import to_categorical #this just converts the labels to one-hot class
train_labels = to_categorical(mn_y_train)
test_labels = to_categorical(mn_y_test)

Training the network does not take long and can easily be done quickly on a CPU. Validation accuracy quickly rises to about 99%.

In [16]:
from keras.callbacks import ModelCheckpoint

h=network.fit(train_images_1d, 
              train_labels, 
              epochs=5, 
              batch_size=128, 
              shuffle=True, 
              callbacks=[ModelCheckpoint('tutorial_MNIST.h5',save_best_only=True)])
Epoch 1/5
60000/60000 [==============================] - 6s 107us/step - loss: 0.0283 - acc: 0.9920
Epoch 2/5
 1152/60000 [..............................] - ETA: 7s - loss: 0.0249 - acc: 0.9931
/Users/pavlos/anaconda3/lib/python3.6/site-packages/keras/callbacks.py:403: RuntimeWarning: Can save best model only with val_loss available, skipping.
  'skipping.' % (self.monitor), RuntimeWarning)
60000/60000 [==============================] - 7s 113us/step - loss: 0.0217 - acc: 0.9935
Epoch 3/5
60000/60000 [==============================] - 5s 89us/step - loss: 0.0168 - acc: 0.9949
Epoch 4/5
60000/60000 [==============================] - 6s 92us/step - loss: 0.0127 - acc: 0.9962
Epoch 5/5
60000/60000 [==============================] - 7s 114us/step - loss: 0.0097 - acc: 0.9973
In [17]:
# summarize history for accuracy
plt.plot(h.history['acc'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train'], loc='upper left')
plt.show()
In [18]:
score, acc = network.evaluate(test_images_1d, 
                            test_labels,
                            batch_size=128)

print ("Test Accuracy: %.5f" % acc)
10000/10000 [==============================] - 0s 41us/step
Test Accuracy: 0.98310
In [19]:
network.save('tutorial_MNIST.h5')

Cleverhans Integration

Cleverhans is a library written by researchers on adversarial examples, many of whom are with Google Brain. The library has wrappers that allow us to take the Keras model that we just made (or an already trained model) and create adversarial examples.

If the model has already been created and we do not want to recreate it, just run the code below to reload the model.

In [20]:
from keras.models import load_model
network = load_model('tutorial_MNIST.h5')
In [21]:
#%pip install -e git+https://github.com/tensorflow/cleverhans.git#egg=cleverhans
In [22]:
from cleverhans.utils_keras import KerasModelWrapper
wrap = KerasModelWrapper(network)

FGSM

The Fast Gradient Sign Method is a non-target attack. Using the parameters in the neural network we trained, the FGSM method calculates the $\nabla$ of the cost function for the particular input. By adding this gradient to the image times some parameter $\epsilon$, we know that the cost function will increase. If the gradient is large enough, then the predictor is likely to change.

In [23]:
x = tf.placeholder(tf.float32, shape=(None, 784))
y = tf.placeholder(tf.float32, shape=(None, 10))

As mentioned above, the $\epsilon \nabla$ is added to the image to create the adversarial example. $\epsilon$ is given by fgsm_rate in the code below. We chose this value since it leads low adversarial accuracy but the images are still easily discernible by eye. Too high of a value would make the images look like random noise, but too low values would leave the adversarial accuracy very high.

In [24]:
from cleverhans.attacks import FastGradientMethod
fgsm = FastGradientMethod(wrap, sess=session)

fgsm_rate = 0.08
fgsm_params = {'eps': fgsm_rate,'clip_min': 0.,'clip_max': 1.}
adv_x = fgsm.generate(x, **fgsm_params)
adv_x = tf.stop_gradient(adv_x)
adv_prob = network(adv_x)
In [25]:
fetches = [adv_prob]
fetches.append(adv_x)
outputs = session.run(fetches=fetches, feed_dict={x:test_images_1d}) 
adv_prob = outputs[0]
adv_examples = outputs[1]
In [26]:
adv_predicted = adv_prob.argmax(1)
adv_accuracy = np.mean(adv_predicted == mn_y_test)

print("Adversarial accuracy: %.5f" % adv_accuracy)
Adversarial accuracy: 0.32230
In [27]:
n_classes = 10
f,ax=plt.subplots(2,5,figsize=(10,5))
ax=ax.flatten()
for i in range(n_classes):
    ax[i].imshow(adv_examples[i].reshape(28,28))
    ax[i].set_title("Adv: %d, Label: %d" % (adv_predicted[i], mn_y_test[i]))
plt.show()

From the examples above, we can see that many of the examples are misclassified from the true labels. Since the changes are non-targetted, the adversarial labels do not seem to show significant trends but are clearly not the numbers in the picture.

JSMA

The JSMA method is slightly more complicated than FGSM and full explanations can be found in the following paper: https://arxiv.org/abs/1511.07528. One of the main differences is that this method is that the method generates targeted examples. For example, given a 4, we could perturb the image to register as any digit we desire between 0 and 9.

In [18]:
x = tf.placeholder(tf.float32, shape=(None, 784))
y = tf.placeholder(tf.float32, shape=(None, 10))

results = np.zeros((10, 10000))
perturbations = np.zeros((10, 10000))
grid_shape = (10, 10, 28, 28, 1)
grid_data = np.zeros(grid_shape)


from cleverhans.attacks import SaliencyMapMethod
jsma = SaliencyMapMethod(wrap, sess=session)

jsma_params = {'theta': 1., 
               'gamma': 0.1,
               'clip_min': 0., 
               'clip_max': 1.,
               'y_target': None}
In [19]:
from cleverhans.utils import other_classes, grid_visual

for index in range(int(len(mn_x_test) / 100)):
    sample = test_images_1d[index: index + 1]
    current = mn_y_test[index]
    target_classes = other_classes(10, current)
    grid_data[current, current, :, :, :] = np.reshape(
            sample, (28, 28, 1))
    
    for target in target_classes:
        one_hot_target = np.zeros((1, 10))
        one_hot_target[0, target] = 1
        jsma_params['y_target'] = one_hot_target
        adv_x = jsma.generate_np(sample, **jsma_params)
        
        grid_data[target, current, :, :, :] = np.reshape(
                adv_x, (28, 28, 1))
        
    if index % 10 == 0:
        print(index)
        print(sample.shape)
        print(one_hot_target)
        print(adv_x.shape)
0
(1, 784)
[[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]]
(1, 784)
10
(1, 784)
[[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]]
(1, 784)
20
(1, 784)
[[ 0.  0.  0.  0.  0.  0.  0.  0.  1.  0.]]
(1, 784)
30
(1, 784)
[[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]]
(1, 784)
40
(1, 784)
[[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]]
(1, 784)
50
(1, 784)
[[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]]
(1, 784)
60
(1, 784)
[[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]]
(1, 784)
70
(1, 784)
[[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]]
(1, 784)
80
(1, 784)
[[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]]
(1, 784)
90
(1, 784)
[[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  1.]]
(1, 784)
In [22]:
plt.figure(figsize = (80, 80))
_ = grid_visual(grid_data)

Each row in the above plot shows an example for a particular starting class / digit. Using JSMA, we were able to generate examples that adversarially were designed to be predicted as $0, 1, \dots$ where each column represents the adversarial target class.

CIFAR10

Below, you can repeat the attack mehtods described above for a neural network on CIFAR10. This library is decently larger and requires a more complex CNN to train properly.

In [23]:
import numpy as np
from keras.datasets import mnist
from keras.datasets import cifar10
import matplotlib.pyplot as plt
import tensorflow as tf
import keras

session = tf.Session()
keras.backend.set_session(session)

(c10_x_train, c10_y_train), (c10_x_test, c10_y_test) = cifar10.load_data()

As a refresher, CIFAR10 is a dataset for small color images which should be classifying into classes based on the object the image such as an airplane or boat.

Each image is 32 x 32 x 3, for a size of 3072 for each input.

In [24]:
print ("Training Examples: %d" % len(c10_x_train))
print ("Test Examples: %d" % len(c10_x_test))
Training Examples: 50000
Test Examples: 10000
In [25]:
n_classes = 10
inds=np.array([c10_y_train==i for i in range(n_classes)])
f,ax=plt.subplots(2,5,figsize=(10,5))
ax=ax.flatten()
for i in range(n_classes):
    ax[i].imshow(c10_x_train[np.argmax(inds[i])].reshape(32,32,3))
    ax[i].set_title(str(i))
plt.show()