


Title:

Pavlos Recurrent Unit

Description:

The goal of this exercise is to build the Pavlos Recurrent Unit discussed in class.

Alternative notation used in the exercise:

$PP_t = \sigma(X_t W_{XPP} + H_{t-1} W_{HPP})$

$H_t = \tanh(X_t W_{XH} + (PP_t \odot H_{t-1}) W_{HH})$

where $\sigma$ is the sigmoid activation and $\odot$ denotes element-wise multiplication.

Instructions:

  • Read the IMDB dataset from the helper code given.
  • Take a quick look at your training inputs and labels.
  • Pad the sequences to a fixed length max_words so that all inputs are the same size.
  • Fill in the helper code given to build the PRU cell.
  • Using the tensorflow.keras Functional API, build, compile and fit the PRU RNN and evaluate it on the test set.
  • For reference, also fit the same model with a vanilla RNN and with a GRU in place of the PRU.
  • Evaluate both of these models on the test set and compare their performance with the PRU.

Pavlos Recurrent Unit

In this exercise, we will build the PRU as discussed in class to perform sentiment analysis in tensorflow.keras. We will continue to use the custom dataset from the previous exercise.

In [1]:
# Import necessary libraries
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import backend as K
from tensorflow.keras.layers import RNN
from tensorflow.keras.models import Model,Sequential
from tensorflow.keras.layers import Input,Dense,Embedding
from tensorflow.keras.layers import SimpleRNN
from tensorflow.keras.preprocessing import sequence
import pickle
from tensorflow.keras.datasets import imdb
In [2]:
# We use the same dataset as the previous exercise 
with open('imdb_mini.pkl','rb') as f:
    X_train, y_train, X_test, y_test = pickle.load(f)
In [3]:
# Similar to the previous exercise, we will preprocess our review sequences
# We fix the vocabulary size to 5000 because our custom
# dataset was curated with that vocabulary size
vocabulary_size = 5000
# Max word length for each review will be 500
max_words = 500
# we set the embedding size to 32
embedding_size=32
# Pre-padding sequences to max_words length
X_train = sequence.pad_sequences(X_train, maxlen=max_words,padding='pre')
X_test = sequence.pad_sequences(X_test, maxlen=max_words,padding='pre')
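
As a quick optional check (a sketch, not part of the starter code), pad_sequences returns arrays whose second dimension is max_words, so both shapes should end in 500:

print(X_train.shape, X_test.shape)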
In [4]:
# We create the mapping between words and their integer ids
word2id = imdb.get_word_index()
# We need to adjust the mapping by 3 because of tensorflow.keras preprocessing
# more here: https://stackoverflow.com/questions/42821330/restore-original-text-from-keras-s-imdb-dataset
word2id = {k:(v+3) for k,v in word2id.items()}
word2id[""] = 0
word2id[""] = 1
word2id[""] = 2
word2id[""] = 3

# Reversing the key-value pairs gives the id2word mapping
id2word = {i: word for word, i in word2id.items()}
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb_word_index.json
1646592/1641221 [==============================] - 2s 1us/step
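
To take the suggested quick look at a training input, one optional sketch decodes the first review back to words, skipping the <PAD> id 0:

print(' '.join(id2word.get(i, '<UNK>') for i in X_train[0] if i != 0))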

⏸ For the current problem, if the memory state size is 5, what will be the dimension of $W_{xh}$?

A. (32,32)

B. (32,5)

C. (5,32)

D. (5,5)

In [40]:
### edTest(test_chow1) ###
# Submit an answer choice as a string below (e.g. if you choose option A, put 'A')
answer1 = '___'
In [5]:
# Complete the helper code below to build the Pavlos Recurrent Unit
# We do this by building a PRU cell unit
# which we can wrap around tf.keras.layers.RNN
# Read more here on layer subclassing https://keras.io/guides/making_new_layers_and_models_via_subclassing/

class PRUCell(tf.keras.layers.Layer):
    def __init__(self,units,**kwargs):
        self.units = units
        self.state_size = units
        self.activation = tf.math.tanh
        self.recurrent_activation = tf.math.sigmoid
        super(PRUCell, self).__init__(**kwargs)
        
                
    # In the build method we initialize the weights
    # which will be used for training
    def build(self, input_shape):
        
        # Initializing weights for candidate Ht
        ## W_{XH}
        self.kernel_h = self.add_weight(shape=(input_shape[-1], self.units),
                                      initializer='uniform',
                                      name='kernel')
        ## W_{HH}
        self.recurrent_kernel_h = self.add_weight(
            shape=(self.units, self.units),
            initializer='uniform',
            name='recurrent_kernel')
    
        
        # Initializing weights for PP gate
        ## W_{XPP} 
        self.kernel_pp = self.add_weight(shape=(input_shape[-1], self.units),
                                      initializer='uniform',
                                      name='PP_kernel')
        ## W_{HPP}
        self.recurrent_kernel_pp = self.add_weight(
            shape=(self.units, self.units),
            initializer='uniform',
            name='PP_recurrent_kernel')

        self.built = True
        
        # Note that we do not include a bias term for ease of understanding
        
    def call(self, inputs, states):
        ## inputs: X_t 
        ## states: h_{t-1}
        ## self.XXXX contains the weights (see above)
        # Previous output comes from states tuple, H_{t-1}
        prev_output = states[0]
        
        # First we compute the PPgate
        PP_XW = K.dot(___, ___)
        PP_HV = K.dot(___, ___)
        PPgate = self.recurrent_activation( ___ +  ___)
        
        # Now we use the PPgate as per the equation for candidate Ht
        nn_XW = K.dot(___, ___)
        dotted_output = ___*___
        nn_HV = K.dot(dotted_output, ___)
        output = self.activation(___ + ___)
        return output, [output]
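
If you want to check your work after filling in the blanks, one completion of call that is consistent with the PRU equations above is sketched here (try filling in the blanks yourself first):

    def call(self, inputs, states):
        # Previous hidden state H_{t-1} comes from the states tuple
        prev_output = states[0]

        # PPgate = sigmoid(X_t W_{XPP} + H_{t-1} W_{HPP})
        PP_XW = K.dot(inputs, self.kernel_pp)
        PP_HV = K.dot(prev_output, self.recurrent_kernel_pp)
        PPgate = self.recurrent_activation(PP_XW + PP_HV)

        # H_t = tanh(X_t W_{XH} + (PPgate * H_{t-1}) W_{HH})
        nn_XW = K.dot(inputs, self.kernel_h)
        dotted_output = PPgate * prev_output
        nn_HV = K.dot(dotted_output, self.recurrent_kernel_h)
        output = self.activation(nn_XW + nn_HV)
        return output, [output]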
In [6]:
# Now that we have our PRU RNN
# we will build a simple model similar to the previous exercise
# We will use the functional API to do this

hidden_state_units = 5 

# Specify the input dimensions HINT: It is max_words
inputs = Input(shape=(max_words,))
# The inputs will go in an embedding layer
embedding = Embedding(vocabulary_size,embedding_size, input_length=max_words)(inputs)

# The embeddings will be an input to the PRU layer
cell = PRUCell(hidden_state_units)
layer = RNN(cell)
hidden_output = layer(embedding)
# The output from the PRU block will go in a dense layer
output = Dense(1, activation='sigmoid')(hidden_output)
# Connecting the architecture using tf.keras.models.Model
pru_model = Model(inputs=inputs, outputs=output)

# Print the summary to check that your model is built correctly
pru_model.summary()
Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         [(None, 500)]             0         
_________________________________________________________________
embedding (Embedding)        (None, 500, 32)           160000    
_________________________________________________________________
rnn (RNN)                    (None, 5)                 370       
_________________________________________________________________
dense (Dense)                (None, 1)                 6         
=================================================================
Total params: 160,376
Trainable params: 160,376
Non-trainable params: 0
_________________________________________________________________

⏸ For the current PRU model, how many weights are associated with the PPgate?

Bias is not included

A. 370

B. 10

C. 185

D. 190

In [26]:
### edTest(test_chow2) ###
# Submit an answer choice as a string below (e.g. if you choose option A, put 'A')
answer2 = '____'
In [18]:
# Compile the model using 'binary_crossentropy' loss
# and the 'adam' optimizer; additionally, add the 'accuracy' metric
pru_model.compile(___)
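
A minimal completion sketch for the compile call above, matching the comment in the cell:

pru_model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])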
In [19]:
# Train the model with appropriate batch size and number of epochs
batch_size = 256
num_epochs = 3
pru_model.fit(___)
Epoch 1/3
40/40 [==============================] - 6s 146ms/step - loss: 0.6923 - accuracy: 0.5330
Epoch 2/3
40/40 [==============================] - 6s 152ms/step - loss: 0.6785 - accuracy: 0.6459
Epoch 3/3
40/40 [==============================] - 6s 144ms/step - loss: 0.6301 - accuracy: 0.6881
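The training log above is consistent with a fit call along these lines (a sketch, using the batch size and epoch count defined in the cell):

pru_model.fit(X_train, y_train, batch_size=batch_size, epochs=num_epochs)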
In [33]:
# Evaluate the model on the custom test set and report the accuracy
accuracy = pru_model.evaluate(X_test, y_test)[1]
print(f'The accuracy for the PRU model is {100*accuracy:.2f}%')
157/157 [==============================] - 2s 14ms/step - loss: 0.6337 - accuracy: 0.6324
The accuracy for the PRU model is 63.24%

🍲 Adding the bias to the PRU model

Go back and add bias terms to the PRUCell (one for the PPgate and another for the candidate $H_t$); a sketch follows below.
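
One way to extend the cell, as a sketch (the weight names bias_pp and bias_h are illustrative, not from the helper code): add two trainable bias vectors in build and include them inside the activations in call.

# In build(), add two extra trainable vectors (illustrative names):
self.bias_pp = self.add_weight(shape=(self.units,), initializer='zeros', name='PP_bias')
self.bias_h = self.add_weight(shape=(self.units,), initializer='zeros', name='bias')

# In call(), add them before applying the activations:
# PPgate = self.recurrent_activation(PP_XW + PP_HV + self.bias_pp)
# output = self.activation(nn_XW + nn_HV + self.bias_h)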

Does your model performance improve under the same training conditions?

In [27]:
### edTest(test_chow3) ###
# Type your answer within the quotes given
answer3 = '___'