CS-109B Advanced Data Science
Lab 11: Generative Adversarial Networks¶
Harvard University
Spring 2019
Lab instructor: Srivatsan Srinivasan
Instructors: Pavlos Protopapas and Mark Glickman
Authors: Srivatsan Srinivasan, Pavlos Protopapas
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats

from keras.models import Sequential, Model
from keras.layers.core import Dense, Dropout
from keras.layers import Input

# fix random seed for reproducibility
np.random.seed(1)
Learning Goals¶
In this lab we will look at Generative Adversarial Networks (GANs), their construction and training.
By the end of this lab, you should:
- know how to put together the building blocks used in GANs
- have a good understanding of generative models and the implicit distributions that generators learn
- understand the properties of GANs at a small scale
- understand the concepts of adversarial training (the min-max objective)
- be aware of the mode collapse problem in GANs
EXERCISE 1 : Generate 1-D Gaussian Distribution from Uniform Noise¶
In this exercise, we are going to generate a 1-D Gaussian distribution from an n-D uniform distribution. This is a toy exercise meant to illustrate the ability of GANs (generators) to generate arbitrary distributions from random noise.
Generate training data - Gaussian Distribution
def generate_data(n_samples=10000, n_dim=1):
    return np.random.randn(n_samples, n_dim)
Let us define a function that returns a Keras model of a general feedforward network based on the given parameters.
# Input is of dimension input_dim, passes through n_layers hidden layers of width hidden_dim,
# and the output is of dimension output_dim.
def set_model(input_dim, output_dim, hidden_dim=64, n_layers=1, activation='tanh',
              optimizer='adam', loss='binary_crossentropy'):
    #### YOUR CODE HERE ####
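    # A possible body (an assumption on our part, not necessarily the intended solution):
    # stack n_layers hidden Dense layers of width hidden_dim, then an output Dense layer.
    model = Sequential()
    model.add(Dense(hidden_dim, input_dim=input_dim, activation=activation))
    for _ in range(n_layers - 1):
        model.add(Dense(hidden_dim, activation=activation))
    # Assumption: the requested activation is also applied to the output layer, so the
    # discriminator (activation='sigmoid') emits a probability and the generator
    # (activation='tanh') emits values squashed into (-1, 1).
    model.add(Dense(output_dim, activation=activation))
    model.compile(loss=loss, optimizer=optimizer)
    return model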
# INPUT (z) is of dimension random_dim.
# OUTPUT should be a Keras model computing D(G(z)) - the discriminator's score for samples
# the generator produces from random noise.
def get_gan_network(discriminator, random_dim, generator, optimizer='adam'):
    ### YOUR CODE HERE ###
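    # A possible body (an assumption mirroring the MNIST GAN cell later in this lab):
    # freeze the discriminator so only the generator is updated through the combined model.
    discriminator.trainable = False
    gan_input = Input(shape=(random_dim,))
    generated = generator(gan_input)        # G(z)
    gan_output = discriminator(generated)   # D(G(z))
    gan = Model(inputs=gan_input, outputs=gan_output)
    gan.compile(loss='binary_crossentropy', optimizer=optimizer)
    return gan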
Let us now write the training function for a GAN
NOISE_DIM = 10
DATA_DIM = 1
G_LAYERS = 1
D_LAYERS = 1
def train_gan(epochs=1, batch_size=128):
    x_train = generate_data(n_samples=12800, n_dim=DATA_DIM)
    batch_count = x_train.shape[0] // batch_size
    generator = set_model(NOISE_DIM, DATA_DIM, n_layers=G_LAYERS, activation='tanh', loss='mean_squared_error')
    discriminator = set_model(DATA_DIM, 1, n_layers=D_LAYERS, activation='sigmoid')
    gan = get_gan_network(discriminator, NOISE_DIM, generator, 'adam')

    for e in range(1, epochs+1):
        # Noise is generated from a uniform distribution
        noise = np.random.rand(batch_size, NOISE_DIM)
        true_batch = x_train[np.random.choice(x_train.shape[0], batch_size, replace=False), :]

        generated_values = generator.predict(noise)
        X = np.concatenate([generated_values, true_batch])

        y_dis = np.zeros(2*batch_size)
        # One-sided label smoothing to avoid an overconfident discriminator. If the discriminator
        # relies on only a small set of features to detect real samples, the generator may produce
        # just those features to exploit it; the optimization becomes too greedy and yields no
        # long-term benefit. To avoid this we use 0.9 instead of 1.0 as the "real" label, which
        # penalizes the discriminator when its prediction for a real sample exceeds 0.9 (D(real) > 0.9).
        # Note that X above stacks the generated samples first and the true batch second.
        y_dis[batch_size:] = 0.9
        discriminator.trainable = True
        ### YOUR CODE HERE ###
        # One line: Train discriminator using train_on_batch
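        # A possible one-liner (our assumption, not necessarily the official solution):
        discriminator.train_on_batch(X, y_dis)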
        discriminator.trainable = False
        # Train generator. Noise is generated from a uniform distribution
        ### YOUR CODE HERE. A couple of lines; should call gan.train_on_batch() ###
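        # A possible implementation (an assumption on our part): draw fresh uniform noise and train
        # the combined model with "real" labels so the generator learns to fool the frozen discriminator.
        noise = np.random.rand(batch_size, NOISE_DIM)
        y_gen = np.ones(batch_size)
        gan.train_on_batch(noise, y_gen)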
    return generator, discriminator
generator, discriminator = train_gan()
Let us visualize what the generator has learned.
noise = np.random.rand(10000,NOISE_DIM)
generated_values = generator.predict(noise)
plt.hist(generated_values,bins=100)
true_gaussian = [np.random.randn() for x in range(10000)]
print('1st order moment - ', 'True : ', scipy.stats.moment(true_gaussian, 1) , ', GAN :', scipy.stats.moment(generated_values,1))
print('2nd order moment - ', 'True : ', scipy.stats.moment(true_gaussian, 2) , ', GAN :', scipy.stats.moment(generated_values,2))
print('3rd order moment - ', 'True : ', scipy.stats.moment(true_gaussian, 3) , ', GAN :', scipy.stats.moment(generated_values,3))
print('4th order moment - ', 'True : ', scipy.stats.moment(true_gaussian, 4) , ', GAN :', scipy.stats.moment(generated_values,4))
plt.show()
CONCLUSIONS¶
GANs are able to learn a generative model from arbitrary noise distributions.
Traditional GANs do not learn higher-order moments well. Possible reasons: the number of samples is limited, and approximating higher moments is inherently hard; GANs are usually known to under-estimate higher-order variance. If you are interested in why, read more about divergence measures between distributions (particularly the Wasserstein distance).
PLAY WITH IT WHEN YOU HAVE TIME !¶
- Try different noise dimensions and see what minimum dimension you need to learn this distribution well.
- Try to generate a multimodal distribution such as a Gaussian mixture instead of a simple Gaussian and see if the GAN is able to learn it well (a data-generation sketch follows this list).
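For the Gaussian mixture suggestion, here is a minimal data-generation sketch (our own illustration; the choice of two equal-weight components with means at -3 and +3 is arbitrary):
def generate_mixture_data(n_samples=10000, n_dim=1):
    # Equal-weight mixture of two unit-variance Gaussians centered at -3 and +3.
    centers = np.random.choice([-3.0, 3.0], size=(n_samples, n_dim))
    return centers + np.random.randn(n_samples, n_dim)
You can use its output in place of generate_data() inside train_gan() and check whether the generated histogram recovers both modes.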
EXERCISE 2 : MNIST GAN - Learn to generate MNIST digits¶
from keras.datasets import mnist
from keras.utils import np_utils
from keras.models import Sequential, Model
from keras.layers import Input, Dense, Dropout, Activation, Flatten
from keras.layers.advanced_activations import LeakyReLU
from keras.optimizers import Adam, RMSprop
import numpy as np
import matplotlib.pyplot as plt
import random
from tqdm import tqdm_notebook
# Dataset of 60,000 28x28 grayscale images of the 10 digits, along with a test set of 10,000 images.
(X_train, Y_train), (X_test, Y_test) = mnist.load_data()
Rescale the data to [0, 1] since we are using ReLU-family activations. WHY?
X_train = X_train.reshape(60000, 784)
X_test = X_test.reshape(10000, 784)
X_train = X_train.astype('float32')/255
X_test = X_test.astype('float32')/255
z_dim = 100
BUILD MODEL¶
We are using LeakyReLU activations.
We will build
a.) Generator b.) Discriminator c.) GAN
as feedforward networks with multiple layers, dropout, and LeakyReLU activations.
adam = Adam(lr=0.0002, beta_1=0.5)
#GENERATOR
g = Sequential()
# Build your generator: z_dim -> 256 -> 512 -> 1024 -> 784. LeakyReLU(0.2), adam
###YOUR CODE HERE###
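# A possible generator following the hint above (an assumption, not necessarily the official solution):
g.add(Dense(256, input_dim=z_dim))
g.add(LeakyReLU(0.2))
g.add(Dense(512))
g.add(LeakyReLU(0.2))
g.add(Dense(1024))
g.add(LeakyReLU(0.2))
g.add(Dense(784, activation='sigmoid'))  # pixels were rescaled to [0, 1], so sigmoid matches the data range
g.compile(loss='binary_crossentropy', optimizer=adam)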
#DISCRIMINATOR
# Build your discriminator: 784 -> 1024 -> 512 -> 256 -> 1. LeakyReLU(0.2), dropout, adam
d = Sequential()
###YOUR CODE HERE###
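# A possible discriminator following the hint above (an assumption; dropout added per the intro text):
d.add(Dense(1024, input_dim=784))
d.add(LeakyReLU(0.2))
d.add(Dropout(0.3))
d.add(Dense(512))
d.add(LeakyReLU(0.2))
d.add(Dropout(0.3))
d.add(Dense(256))
d.add(LeakyReLU(0.2))
d.add(Dropout(0.3))
d.add(Dense(1, activation='sigmoid'))
d.compile(loss='binary_crossentropy', optimizer=adam, metrics=['accuracy'])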
#GAN
d.trainable = False
inputs = Input(shape=(z_dim, ))
hidden = g(inputs)
output = d(hidden)
gan = Model(inputs, output)
gan.compile(loss='binary_crossentropy', optimizer=adam, metrics=['accuracy'])
Let us write some visualization code.
def plot_loss(losses):
    """
    @losses.keys():
        0: loss
        1: accuracy
    """
    d_loss = [v[0] for v in losses["D"]]
    g_loss = [v[0] for v in losses["G"]]

    plt.figure(figsize=(10, 8))
    plt.plot(d_loss, label="Discriminator loss")
    plt.plot(g_loss, label="Generator loss")
    plt.xlabel('Epochs')
    plt.ylabel('Loss')
    plt.legend()
    plt.show()
def plot_generated(n_ex=10, dim=(1, 10), figsize=(12, 2)):
    noise = np.random.normal(0, 1, size=(n_ex, z_dim))
    generated_images = g.predict(noise)
    generated_images = generated_images.reshape(n_ex, 28, 28)

    plt.figure(figsize=figsize)
    for i in range(generated_images.shape[0]):
        plt.subplot(dim[0], dim[1], i+1)
        plt.imshow(generated_images[i], interpolation='nearest', cmap='gray_r')
        plt.axis('off')
    plt.tight_layout()
    plt.show()
TRAIN THE MODEL
Generate noise, feed it through the generator, score the real and generated images with the discriminator, train the GAN, and REPEAT.
losses = {"D":[], "G":[]}
def train(epochs=1, plt_frq=1, BATCH_SIZE=128):
    batchCount = int(X_train.shape[0] / BATCH_SIZE)
    print('Epochs:', epochs)
    print('Batch size:', BATCH_SIZE)
    print('Batches per epoch:', batchCount)

    for e in tqdm_notebook(range(1, epochs+1)):
        if e == 1 or e % plt_frq == 0:
            print('-'*15, 'Epoch %d' % e, '-'*15)
        for _ in range(batchCount):  # tqdm_notebook(range(batchCount), leave=False):
            # Create a batch by drawing random index numbers from the training set
            image_batch = X_train[np.random.randint(0, X_train.shape[0], size=BATCH_SIZE)]
            # Create noise vectors for the generator
            noise = np.random.normal(0, 1, size=(BATCH_SIZE, z_dim))

            # Generate the images from the noise
            generated_images = g.predict(noise)
            X = np.concatenate((image_batch, generated_images))
            # Create Y labels similar to last exercise.
            ### YOUR CODE HERE ####
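            # A possible implementation (our assumption, mirroring exercise 1): the real images come
            # first in X, so they get the one-sided smoothed label 0.9; generated images get 0.
            y_dis = np.zeros(2*BATCH_SIZE)
            y_dis[:BATCH_SIZE] = 0.9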
            # Train the gan and discriminator similar to last exercise.
            ### YOUR CODE HERE ###
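            # A possible implementation (an assumption, not the official solution):
            d.trainable = True
            d_loss = d.train_on_batch(X, y_dis)        # returns [loss, accuracy]
            d.trainable = False
            noise = np.random.normal(0, 1, size=(BATCH_SIZE, z_dim))
            y_gen = np.ones(BATCH_SIZE)                # generator wants D to output "real"
            g_loss = gan.train_on_batch(noise, y_gen)  # returns [loss, accuracy]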
        # Only store losses from the final batch of each epoch
        losses["D"].append(d_loss)
        losses["G"].append(g_loss)

        # Update the plots
        if e == 1 or e % plt_frq == 0:
            plot_generated()
    plot_loss(losses)
train(epochs=200, plt_frq=40, BATCH_SIZE=128)
TAKE-HOME EXERCISE¶
- Try to build CNN models instead of feedforwards
- Try different noise dimensions
- Try implementing some training tricks suggested in https://github.com/soumith/ganhacks and study incremental improvements.
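As a starting point for the CNN suggestion, here is a minimal DCGAN-style generator sketch (our own illustration; the layer widths and kernel choices are arbitrary). Pairing it with the rest of this notebook would also require a convolutional discriminator and reshaping the data to (28, 28, 1) instead of flat 784-vectors.
from keras.layers import Reshape, Conv2DTranspose, BatchNormalization

g_cnn = Sequential()
g_cnn.add(Dense(128 * 7 * 7, input_dim=z_dim))
g_cnn.add(LeakyReLU(0.2))
g_cnn.add(Reshape((7, 7, 128)))
g_cnn.add(Conv2DTranspose(64, kernel_size=4, strides=2, padding='same'))  # 7x7 -> 14x14
g_cnn.add(BatchNormalization())
g_cnn.add(LeakyReLU(0.2))
g_cnn.add(Conv2DTranspose(1, kernel_size=4, strides=2, padding='same',
                          activation='sigmoid'))                          # 14x14 -> 28x28
g_cnn.compile(loss='binary_crossentropy', optimizer=adam)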
DISCUSSION: Why can GANs mode-collapse? Remember, there is no guarantee that mode collapse will happen.¶
TL;DR - There is inherently no "motivation" for the generator to produce a diverse set of samples, since the discriminator only penalizes it for producing "bad" samples. It is easier to learn a few modes than all the modes of a multimodal distribution.¶
Remember, the goal of the generator G is to fool the discriminator by making it assign the generated sample as high a probability of being real as possible. Mathematically, G tries to minimize $\mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$, or in other words, to generate the point $x = G(z)$ such that $x = \arg\max_x D(x)$ (of course, we are assuming that we hold the discriminator fixed for now; we are merely describing the optimization objective at a given timestep). Note that this $x$ is fixed regardless of the value of $z$, the input to the generator! It only depends on the discriminator at the given timestep. This means that, in expectation, there exists a single fixed point that the generator considers the most optimal point to generate regardless of the input noise we feed it; there is nothing in the objective function that explicitly forces the generator to produce different samples for different inputs. During this training step, stochastic gradient descent, again in expectation, would push the generator's weights toward generating this single ideal point.
This by itself does not immediately mean mode collapse; over the course of training, mode collapse may happen only partially or not at all. Since training is a stochastic process, in the early stages of training the generated samples will vary depending on z, and the samples drawn from the real distribution will also vary; this means the gradients backpropagated to the generator will differ between training steps depending on the generated and real samples. Moreover, the discriminator should ideally be able to identify generator mode collapse while it is happening and assign the collapse point a low probability, forcing the generator to spread out. This is why we do see training runs succeed in GAN/DCGAN-based models.
But in practice, especially in default GAN models, mode collapse happens quite often. The discriminator ends up not really forcing more diversity in the generator so much as simply pushing the partially collapsed generator to a different part of the output space: if it assigns the collapse point a low probability, the generator will simply move its collapsed distribution to focus on a new output point. And finally, in the case where the generator has actually collapsed to a single point, it cannot get out; you will have to restart your training. To see why, remember that the gradient updates to the generator are stochastic because its generated outputs vary based on z. In the world where the generator has already collapsed, it emits the same output for every z. This means that if you feed a batch of generator outputs to the discriminator and get the gradients back, the generator gradients will all be essentially identical, and they will all be racing towards the same maximum point x*! So the generator will continue to produce the same output regardless of input. Even if the discriminator identifies this and assigns that point a low probability, the identical gradient updates will cause all outputs of the generator to rush toward another single fixed point. At this point your training is ruined.
Thanks: This version of the answer is adapted from https://www.quora.com/What-causes-mode-collapse-in-GANs