"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# RUN THIS CELL TO PROPERLY HIGHLIGHT THE EXERCISES\n",
"import requests\n",
"from IPython.core.display import HTML\n",
"styles = requests.get(\"https://raw.githubusercontent.com/Harvard-IACS/2019-CS109B/master/content/styles/cs109.css\").text\n",
"HTML(styles)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Welcome to our New Virtual Classroom!! \n",
" \n",
"Chris Tanner, myself, and the lab TFs, have very much enjoyed your participation during our previous on-campus lab meetings, and will try to maintain interactivity via this new medium as best as we can. You can also do your part by:\n",
" \n",
"- using your real name, if possible, so as to recreate a classroom feeling :)\n",
"- turning off your video to conserve bandwith \n",
"- muting your microphone unless you are invited to speak \n",
"- **raising your hand** in the Chat when we invite questions\n",
"- writing comments and questions in the Chat\n",
" \n",
"If you have any questions after the lab is done, please post them on **Ed**, in the Lab7 section.\n",
" \n",
"
"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Learning Goals\n",
"\n",
"By the end of this lab, you should be able to:\n",
"\n",
"- Connect the representation that Principal Component Analysis produces to the that of an autoencoder (AE).\n",
"- Add tf.keras Functional API into your machine learning arsenal.\n",
"- Implement an autoencoder using `tf.keras`:\n",
" - build the encoder network/model\n",
" - build the decoder network/model\n",
" - decide on the latent/bottleneck dimension\n",
" - train your AE\n",
" - predict on unseen data\n",
" \n",
"### Note: To see solutions, uncomment and run the following: \n",
"\n",
"```\n",
"# %load solutions/exercise2.py\n",
"```\n",
"\n",
"First time you run will load solution, then you need to **run the cell again** to actually run the code."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" \n",
"\n",
"## Table of Contents\n",
"\n",
"- **Part 1**: [Autoencoders and their connection to Principal Component Analysis](#part1).\n",
"- **Part 2**: [Denoising Images using AEs](#part2).\n",
"- **Part 3**: [Visualizing Intermediate Layers of an AE](#part3)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from __future__ import annotations\n",
"import numpy as np\n",
"import seaborn as sns\n",
"import os\n",
"import datetime\n",
"\n",
"import matplotlib.pyplot as plt\n",
"plt.rcParams[\"figure.figsize\"] = (5,5)\n",
"%matplotlib inline"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import tensorflow as tf\n",
"from tensorflow import keras\n",
"from tensorflow.keras.models import Sequential, Model\n",
"from tensorflow.keras.layers import Dense, Conv2D, Conv1D, MaxPooling2D, MaxPooling1D,\\\n",
" Dropout, Flatten, Activation, Input, UpSampling2D\n",
"from tensorflow.keras.optimizers import Adam, SGD, RMSprop\n",
"from tensorflow.keras.utils import to_categorical\n",
"from tensorflow.keras.metrics import AUC, Precision, Recall, FalsePositives, \\\n",
" FalseNegatives, TruePositives, TrueNegatives\n",
"from tensorflow.keras.preprocessing import image\n",
"from tensorflow.keras.regularizers import l2"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"tf.keras.backend.clear_session() # For easy reset of notebook state.\n",
"print(tf.__version__) # You should see a > 2.0.0 here!\n",
"from tf_keras_vis.utils import print_gpus\n",
"print_gpus()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# set the seed for reproducability of results\n",
"seed = 109\n",
"np.random.seed(seed)\n",
"tf.random.set_seed(seed)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# install this if you want to play around with Tensorboard\n",
"#!pip install tf-keras-vis tensorflow\n",
"%load_ext tensorboard"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# remove old logs\n",
"!rm -rf ./logs/ "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" \n",
"\n",
"## Part 1: Autoencoders and the connection to Principal Component Analysis \n",
"\n",
"#### Principal Component Analysis (PCA)\n",
"\n",
"**PCA** decomposes a multivariate dataset in a set of eigenvectors - successive orthogonal coeefficients that explain a large amount of the variance. By using only a number of the highest valued vectors, let's say $N$ of them, we effectively reduce the dimensionality of our data to $N$ with minimal loss of information as measured by RMS (Root Mean Squared) Error. \n",
"\n",
"PCA in `sklearn` is a transformer that learns those $N$ components via its `.fit` method. It can then be used to project a new data object in these components. Remember from 109a that we always `.fit` only to the training set and `.transform` both training and test set.\n",
"\n",
"```\n",
"from sklearn.decomposition import PCA\n",
"k = 2 # number of components that we want to keep\n",
"\n",
"X_train, X_test = load_data()\n",
"pca = PCA(n_components=k)\n",
"\n",
"principal_components = pca.fit_transform(X_train)\n",
"principal_components = pca.transform(X_test)\n",
"```\n",
"\n",
"#### Autoencoders (AE)\n",
"\n",
"\n",
"\n",
"*image source: Deep Learning by Francois Collet*\n",
"\n",
"An **AE** maps its input, usually an image, to a latent vector space via an encoder function, and then decodes it back to an output that is the same as the input, via a decoder function. It’s effectively being trained to reconstruct the original input. By trying to minimize the reconstruction MSE error, on the output of the encoder, you can get the autoencoder to learn interesting latent representations of the data. Historically, autoencoders have been used for tasks such as dimentionality reduction, feature learning, and outlier detection. \n",
"\n",
"One type of architecture for an AE is to have the decoder network be a 'mirror image' of the encoder. It makes more sense this way but it is not necessary. \n",
"\n",
"We can say that AEs are self-supervised learning networks! \n",
"\n",
"\n",
"#### Understandind the connection between PCA and AEs \n",
"\n",
"If the hidden and output layers of an autoencoder are linear, the autoencoder will learn hidden units that are linear representations of the data, just like PCA does. If we have $M$ hidden units in our AE, those will span the same space as the $M$ first principal components. The hidden layers of the AE will not produce orthogonal representations of the data as PCA would but if we add non-linear components in our encoder-decoder networks we can represent a non-linear space/manifold; "
]
},
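{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make this connection concrete, here is a minimal sketch (not one of the lab exercises) of a purely **linear** autoencoder with a 2-unit bottleneck. After training, its bottleneck spans approximately the same subspace as the first 2 principal components, though not with orthogonal axes. It assumes a flattened `X_train_flat` of shape `(n, 784)`, as you will produce in Exercise 1 below; the optimizer and epoch count are arbitrary choices."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# sketch: a purely linear AE, analogous to PCA with 2 components\n",
"lin_inputs = Input(shape=(784,))\n",
"lin_code = Dense(2, activation='linear', name='bottleneck')(lin_inputs) # linear bottleneck\n",
"lin_outputs = Dense(784, activation='linear')(lin_code) # linear reconstruction\n",
"linear_ae = Model(lin_inputs, lin_outputs, name='linear_ae')\n",
"linear_ae.compile(optimizer=Adam(learning_rate=1e-3), loss='mse')\n",
"# uncomment to train once X_train_flat exists (see Exercise 1):\n",
"# linear_ae.fit(X_train_flat, X_train_flat, epochs=5, batch_size=256)"
]
},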
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Fashion-MNIST\n",
"\n",
"We will use the dataset of clothing article images (created by [Zalando](https://github.com/zalandoresearch/fashion-mnist)), consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a **28 x 28** grayscale image, associated with a label from **10 classes**. The names of the classes corresponding to numbers 0-9 are: \n",
"```\n",
"'T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat','Sandal', 'Shirt', 'Sneaker', 'Bag', and 'Ankle boot'\n",
"```\n",
"The creators intend Fashion-MNIST to serve as a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms. It shares the same image size and structure of training and testing splits. Each pixel is 8 bits so its value ranges from 0 to 255.\n",
"\n",
"Let's load and look at it!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# get the data from keras - how convenient!\n",
"fashion_mnist = tf.keras.datasets.fashion_mnist\n",
"\n",
"# load the data splitted in train and test! how nice!\n",
"(X_train, y_train),(X_test, y_test) = fashion_mnist.load_data()\n",
"\n",
"# normalize the data by dividing with pixel intensity\n",
"# (each pixel is 8 bits so its value ranges from 0 to 255)\n",
"X_train, X_test = X_train / 255.0, X_test / 255.0\n",
"\n",
"print(f'X_train shape: {X_train.shape}, X_test shape: {X_test.shape}')\n",
"print(f'y_train shape: {y_train.shape}, and y_test shape: {y_test.shape}')\n",
"\n",
"# classes are named 0-9 so define names for plotting clarity\n",
"class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',\n",
" 'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']\n",
"\n",
"plt.figure(figsize=(10,10))\n",
"for i in range(25):\n",
" plt.subplot(5,5,i+1)\n",
" plt.xticks([])\n",
" plt.yticks([])\n",
" plt.grid(False)\n",
" plt.imshow(X_train[i], cmap=plt.cm.binary)\n",
" plt.xlabel(class_names[y_train[i]])\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# choose one image to look at\n",
"i = 5\n",
"plt.imshow(X_train[i], cmap='gray');\n",
"plt.xlabel(class_names[y_train[i]]);"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Exercise 1: Calculate the dimensionality of the Fashion dataset. Then flatten it to prepare for PCA.
"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"## your code here\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# %load solutions/exercise1.py"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.decomposition import PCA"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Exercise 2: Find the 2 first principal components by fitting on the train set. Print the shape of the components matrix.
"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# your code here\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# %load solutions/exercise2.py"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": false
},
"outputs": [],
"source": [
"# print the variance explained by those components\n",
"pca.explained_variance_"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Note:** The first two components explain ~19+12 = 31% of the variance."
]
},
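{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an optional sketch, we can plot the cumulative explained variance ratio to see how much additional components would capture; the choice of 50 components below is arbitrary, and `X_train_flat` is assumed from Exercise 1."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# sketch: cumulative explained variance vs. number of components\n",
"pca_full = PCA(n_components=50).fit(X_train_flat)\n",
"plt.plot(np.cumsum(pca_full.explained_variance_ratio_))\n",
"plt.xlabel('number of components')\n",
"plt.ylabel('cumulative explained variance')\n",
"plt.show()"
]
},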
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Discussion: Comment on what you see here.
"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# transform the train and test set\n",
"X_train_pca = pca.transform(X_train_flat)\n",
"X_test_pca = pca.transform(X_test_flat)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"X_train_pca.shape, X_train_pca[1:5,0].shape"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": false
},
"outputs": [],
"source": [
"class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',\n",
" 'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']\n",
"\n",
"fig, ax1 = plt.subplots(1,1, figsize=(9,9))\n",
"sns.scatterplot(x=X_train_pca[:,0], y=X_train_pca[:,1], hue=y_train,\n",
" palette=sns.color_palette(\"deep\", 10), ax=ax1)\n",
"ax1.set_title(\"FASHION MNIST, First 2 principal components\");"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"\n",
"## Part 2: Denoise Images using AEs\n",
"\n",
"We will create an autoencoder which will accept \"noisy\" images as input and try to produce the original images. Since we do not have noisy images to start with, we will add random noise to our Fashion-MNIST images. To do this we will use the image augmentation library [imgaug docs](https://imgaug.readthedocs.io/en/latest/source/api_augmenters_arithmetic.html). \n",
"\n",
"From this library we will use `SaltAndPepper`, an augmenter which replaces pixels in images with salt/pepper noise (white/black-ish colors), randomly with probability passed as parameter `p`. Use the code below to install the library in your virtual environment."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# !conda install imgaug \n",
"# ### OR\n",
"# !pip install imgaug"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from imgaug import augmenters"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# NNs want the inputs to be 4D\n",
"X_train = X_train.reshape(-1, h, w, 1)\n",
"X_test = X_test.reshape(-1, h, w, 1)\n",
"\n",
"# Lets add sample noise - Salt and Pepper\n",
"noise = augmenters.SaltAndPepper(p=0.1, seed=seed)\n",
"seq_object = augmenters.Sequential([noise])\n",
"\n",
"# Augment the data (add the noise)\n",
"X_train_n = seq_object.augment_images(X_train * 255) / 255\n",
"X_test_n = seq_object.augment_images(X_test * 255) / 255"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"f, ax = plt.subplots(1,5, figsize=(20,10))\n",
"for i in range(5,10):\n",
" ax[i-5].imshow(X_train_n[i, :, :, 0].reshape(28, 28), cmap=plt.cm.binary)\n",
" ax[i-5].set_xlabel('Noisy '+class_names[y_train[i]])\n",
" \n",
"f, ax = plt.subplots(1,5, figsize=(20,10))\n",
"for i in range(5,10):\n",
" ax[i-5].imshow(X_train[i, :, :, 0].reshape(28, 28), cmap=plt.cm.binary)\n",
" ax[i-5].set_xlabel('Clean '+class_names[y_train[i]])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### `tf.keras.Sequential` API\n",
"\n",
"This is what we have been using so far for building our models. Its pros are: it's simple to use, it allows you to create models layer-by-layer. Its basic con is: it is not very flexible, and although it includes layers such as Merge, Concatenate, and Add that allow for a combination of models, it is difficult to make models with many inputs or shared-layers. All layers, as the name implies, are connected sequentially.\n",
"\n",
"#### Intro to `tf.keras.Functional` API\n",
"\n",
"https://www.tensorflow.org/guide/keras/functional. \n",
"\n",
"In this API, layers are built as graphs, with each layer indicating to which layer it is connected. Functional API helps us make more complex models which include non-sequential connections and multiple inputs or outputs.\n",
"\n",
"Let's say we have an image input with a shape of (28, 28, 1) and a classification task:\n",
"\n",
"```\n",
"num_classes = 10\n",
"\n",
"inputs = keras.Input(shape=(h, w, 1))\n",
"x = Dense(64, activation='relu')(inputs)\n",
"x = layers.Dense(64, activation='relu')(x)\n",
"outputs = Dense(num_classes, activation='softmax', name='output')(x)\n",
"\n",
"ae_model = Model(inputs=inputs, outputs=outputs, name='autoencoder')\n",
"\n",
"ae_model.summary()\n",
"```\n",
"\n"
]
},
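{
"cell_type": "markdown",
"metadata": {},
"source": [
"For contrast, here is a sketch of the same classifier written with the `Sequential` API (an illustration, not part of the lab solutions):\n",
"\n",
"```\n",
"seq_model = Sequential([\n",
"    Flatten(input_shape=(h, w, 1)),\n",
"    Dense(64, activation='relu'),\n",
"    Dense(64, activation='relu'),\n",
"    Dense(num_classes, activation='softmax'),\n",
"])\n",
"```\n",
"\n",
"With `Sequential`, each layer implicitly feeds the next; with the Functional API, you wire the graph explicitly, which is what will let us keep a handle on the encoder as a standalone model later in this lab."
]
},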
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Create the Encoder"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# input layer\n",
"input_layer = Input(shape=(h, w, 1))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Exercise 3: Create your \"encoder\" as a 2D CNN a follows:
\n",
"\n",
"- Use the Functional API\n",
"- Create a pair of layers consisting of a `Conv2D` and a `MaxPool` layer which takes in our `input_layer`. Choose the number of filters. \n",
"- Stack 3 of these layers, one after the other.\n",
"- Give this model the name `latent_model` (it's not your final model).\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# your code here\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# %load solutions/exercise3.py"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Exercise 4: Create your \"decoder\" as a 2D CNN as follows:
\n",
"\n",
"- repeat the structure of your encoder but in \"reverse\".\n",
"- What is the output layer activation function? What are the dimensions of the output? \n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# your code here\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# %load solutions/exercise4.py"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Exercise 5: Connect the two parts (encoder, decoder) to create your autoencoder. Compile and then train your autoencoder.
\n",
"\n",
"Choose an optimizer and a loss function. Use Early Stopping. To get good results you will need to run this about 20 epochs and this will take a long time depending on your machine. **For the purposes of this lab run only for 2 epochs**.\n",
"\n",
"(Optional: add Tensorboard). \n",
"\n",
"Here is how to connect the two models:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# create the model\n",
"ae_model = Model(input_layer, output_layer, name='ae_model')\n",
"ae_model.summary()"
]
},
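{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a reminder, `EarlyStopping` lives in `tf.keras.callbacks` and is not among our imports above. A minimal setup sketch (the `patience` value here is an arbitrary choice, not the official solution):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from tensorflow.keras.callbacks import EarlyStopping\n",
"\n",
"# stop when the validation loss stops improving, and keep the best weights\n",
"es = EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)\n",
"# then pass callbacks=[es] to ae_model.fit(...)"
]
},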
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# your code here\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# %load solutions/exercise5-1.py"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# %load solutions/exercise5-2.py"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Let's see how our AE did \n",
"fig, ax = plt.subplots(1,1, figsize=(10,6))\n",
"ax.plot(history.history['loss'], label='Train')\n",
"ax.plot(history.history['val_loss'], label='Val')\n",
"ax.set_xlabel(\"Epoch\", fontsize=20)\n",
"ax.set_ylabel(\"Loss\", fontsize=20)\n",
"ax.legend()\n",
"ax.set_title('Autoencoder Loss')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# start Tensorboard - requires grpcio>=1.24.3\n",
"#%tensorboard --logdir logs"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# save your model\n",
"ae_model.save_weights('ae_model.h5')"
]
},
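{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that `save_weights` stores only the weights, not the architecture. A sketch of restoring them later, assuming you have rebuilt the same `ae_model`:\n",
"\n",
"```\n",
"ae_model.load_weights('ae_model.h5')\n",
"```"
]
},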
{
"cell_type": "markdown",
"metadata": {},
"source": [
" \n",
"\n",
"## Part 3: Visualizing Intermediate Layers of AE"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Exercise 6: Let's now visualize the latent layer of our encoder network.
\n",
"\n",
"This is our \"encoder\" model which we have saved as:\n",
"```\n",
"encoder = Model(input_layer, latent_view, name='encoder_model')\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# your code here\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# %load solutions/exercise6.py"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Discussion: \n",
"
What do you see in the little images above?\n",
" Could we have included Dense layers as bottleneck instead of just Conv2D and MaxPoool/upsample?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### possible answers\n",
"We can have bottleneck layers in convolutional autoencoders that are not dense but simply a few stacked featuremaps such as above. They might have better generalizability due to only using shared weights. One interesting consequence is that without the dense layer you'll force translational equivariance on the latent representation (a particular feature in the top right corner will appear as an activation in the top right corner of the featuremaps at the level of the bottleneck, and if the feature is moved in the original image the activation in the bottleneck will move proportionally in the same direction). This isn't necessarily a problem, but you are enforcing some constraints on the relationships between the latent space directions that you wouldn't be with the presence of a dense layer."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Visualize Samples reconstructed by our AE"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"n = np.random.randint(0,len(X_test)-5)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"f, ax = plt.subplots(1,5)\n",
"f.set_size_inches(80, 40)\n",
"for i,a in enumerate(range(n,n+5)):\n",
" ax[i].imshow(X_test[a, :, :, 0].reshape(28, 28), cmap='gray')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"f, ax = plt.subplots(1,5)\n",
"f.set_size_inches(80, 40)\n",
"for i,a in enumerate(range(n,n+5)):\n",
" ax[i].imshow(X_test_n[a, :, :, 0].reshape(28, 28), cmap='gray')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"preds = ae_model.predict(X_test_n[n:n+5])\n",
"f, ax = plt.subplots(1,5)\n",
"f.set_size_inches(80, 40)\n",
"for i,a in enumerate(range(n,n+5)):\n",
" ax[i].imshow(preds[i].reshape(28, 28), cmap='gray')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Discussion: Comment on the predictions.
"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
}