Key Word(s): RNN, Recurrent Neural Network, SimpleRNN, Character-level Text Generation



Title:

RNN from scratch

Description:

The aim of this exercise is to understand what happens inside an RNN unit of the kind wrapped by tensorflow.keras.layers.SimpleRNN.

The idea is to write a Recurrent Neural Network from scratch that generates dinosaur names by training character-wise on the existing names.

Instructions:

  • Read the file dinos.txt and convert all the names in the file to lowercase.
  • Store the unique characters of the file as a list; this is the vocabulary.
  • Get the number of characters in the file and the vocabulary size, which is equal to the number of letters in the alphabet plus the newline character.
  • Define a dictionary char_to_ix where each key is a character of the sorted vocabulary and the value is the unique integer assigned to it.
  • Define a dictionary ix_to_char where each key is the unique integer assigned to a character and the value is the corresponding character of the sorted vocabulary.
  • To get the model parameters (weights and biases), call the get_weights function twice, once with each setting:
    • Set the random parameter to 1 to get random weights.
    • Set the random parameter to 0 and specify the number of iterations to get trained weights.
  • Define a function rnn_model that takes in the network parameters and outputs a generated dinosaur name based on the instructions in the scaffold.

Hints:

$$h_t = \tanh\left(U h_{t-1} + V x_t + \beta_1\right)$$

$$Y_t = \sigma\left(W h_t + \beta_2\right)$$

where $\sigma$ denotes the softmax function (imported from helper below).
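For intuition, one step of this recurrence is just two matrix products passed through non-linearities. Below is a minimal NumPy sketch with assumed dimensions; the real U, V, W, beta1, beta2 come from get_weights later in the notebook:

import numpy as np

n_h, vocab_size = 50, 27                     # assumed hidden size and vocabulary size
U = np.random.randn(n_h, n_h) * 0.01         # hidden-to-hidden weights
V = np.random.randn(n_h, vocab_size) * 0.01  # input-to-hidden weights
W = np.random.randn(vocab_size, n_h) * 0.01  # hidden-to-output weights
beta1 = np.zeros((n_h, 1))                   # hidden bias
beta2 = np.zeros((vocab_size, 1))            # output bias

x_t = np.zeros((vocab_size, 1))              # one-hot input character
x_t[0] = 1
h_prev = np.zeros((n_h, 1))                  # previous hidden state

h_t = np.tanh(np.dot(U, h_prev) + np.dot(V, x_t) + beta1)  # new hidden state
logits = np.dot(W, h_t) + beta2                            # softmax of these gives Y_t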

sorted() Return a new sorted list from the items in iterable.

enumerate() Allows you to loop over an iterable with an automatic counter.

lower() Return a copy of the string with all the cased characters converted to lowercase.

strip() Return a copy of the string with the leading and trailing characters removed.

np.random.shuffle() Modify a sequence in-place by shuffling its contents.

np.zeros() Return a new array of given shape and type, filled with zeros.

join() Return a string which is the concatenation of the strings in iterable.

np.tanh() Compute hyperbolic tangent element-wise.

np.dot() Returns the dot product of two arrays.
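A quick illustration of how a few of these string helpers combine on toy input (made-up names, not the actual dataset):

raw = "Aachenosaurus\nAardonyx\n"
data = raw.lower()               # lowercase everything
print(sorted(set(data)))         # unique characters; '\n' sorts before the letters
print(''.join(['r', 'e', 'x']))  # 'rex'
print('  rex \n'.strip())        # 'rex'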

In [1]:
# Import necessary libraries
import random
import numpy as np
from helper import softmax, get_weights
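The helper module itself is not shown here. A numerically stable softmax consistent with how it is used below would look roughly like this (a sketch of an assumed implementation, not the actual helper):

def softmax_sketch(z):
    # Shift by the maximum before exponentiating to avoid overflow,
    # then normalize so the entries sum to 1
    e = np.exp(z - np.max(z))
    return e / e.sum()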
In [1]:
# Function to predict the next set of characters which forms the dinosaur name
def rnn_model(parameters, char_to_ix):

    # Get the weights and biases from the parameters dictionary
    U, V, W = parameters['U'], parameters['V'], parameters['W']
    beta1, beta2 = parameters['beta1'], parameters['beta2']

    # Get the size of the vocabulary, i.e. 27:
    # one for each letter of the alphabet plus the newline character
    vocab_size = beta2.shape[0]

    # Get the size of the hidden state
    n_h = U.shape[1]

    # Initialize the input as an array of zeros with shape (vocab_size, 1)
    # This will hold the one-hot encoding of the input character
    x = np.zeros((vocab_size, 1))

    # Initialize the initial hidden state as an array of zeros with shape (n_h, 1)
    h_prev = np.zeros((n_h, 1))

    # Initialize a list to store the indices of the predicted characters
    indices = []
    
    # Initialize an idx variable to hold the index values of the characters 
    idx = -1 
    
    # Initialize a counter to fix the maximum length of the predicted word
    counter = 0

    # Get the value of the new line from the char_to_ix dictionary
    newline_character = char_to_ix['\n']
    
    # Loop until the newline_character is predicted or until the max length of the word is 50
    while (idx != newline_character and counter != 50):

        # Compute the new state h of the RNN unit using the equation given in the hints
        h = np.tanh(np.dot(U, h_prev) + np.dot(V, x) + beta1)

        # Compute the output of the RNN unit using the equation
        # given in the hints and the softmax function
        y = softmax(np.dot(W, h) + beta2)

        # Get the index value of the predicted/generated character
        # Instead of taking the argmax, we perform sampling on the probabilities 
        # got from the softmax function
        idx = np.random.choice(list(range(vocab_size)), p=y.ravel())

        # Append the index value to the indices list
        indices.append(idx)
        
        # Initialize an array of zeros with shape (vocab_size, 1)
        x = np.zeros((vocab_size, 1))

        # Set the idx position of x to 1.
        # This one-hot vector represents the sampled character and acts as the next input.
        x[idx] = 1
        
        # Update the previous state value with the current state
        h_prev = h
        
        # Increment the counter
        counter+=1
    
    # If the counter value reaches 50 append a newline character to the indices list
    if (counter == 50):
        indices.append(char_to_ix['\n'])
    
    # Return the list of indices
    return indices
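To see why the function samples from the softmax probabilities instead of taking the argmax, compare the two on a toy distribution (illustrative values only). Argmax always picks the same character, so every generated name would be identical; sampling introduces the variation that makes each name different:

probs = np.array([0.5, 0.3, 0.2])                 # toy softmax output
print(np.argmax(probs))                           # always 0
print(np.random.choice(list(range(3)), p=probs))  # 0 about half the time, else 1 or 2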
In [0]:
# Read the dinos.txt file
data = open('dinos.txt', 'r').read()

# Convert the data to lower case
data = data.lower()

# Get the list of unique characters in the data
chars = list(set(data))

# Get length of the file and length of the vocabulary
data_size, vocab_size = len(data), len(chars)

# Define a dictionary that maps each character of the sorted vocabulary
# to a unique integer using enumerate
char_to_ix = {ch: i for i, ch in enumerate(sorted(chars))}

# Define a dictionary that maps each unique integer back to the corresponding
# character of the sorted vocabulary using enumerate
ix_to_char = {i: ch for i, ch in enumerate(sorted(chars))}

# Call the get_weights function to get the model weights
# To get random weights set random=1
# To get the trained weights specify the number of iterations and set random=0
parameters = get_weights(num_iterations=1000, random=0)
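As a quick sanity check, here is how the two dictionaries behave on a hypothetical three-character vocabulary:

toy_chars = ['b', 'a', '\n']
toy_char_to_ix = {ch: i for i, ch in enumerate(sorted(toy_chars))}
toy_ix_to_char = {i: ch for i, ch in enumerate(sorted(toy_chars))}
print(toy_char_to_ix)   # {'\n': 0, 'a': 1, 'b': 2}
print(toy_ix_to_char)   # {0: '\n', 1: 'a', 2: 'b'}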
In [0]:
# Call the rnn_model function defined above, passing
# the parameters dictionary and the char_to_ix dictionary
sampled_indices = rnn_model(parameters, char_to_ix)

# Convert the list of indices returned by rnn_model to
# their respective characters and join them to form a word
txt = ''.join(ix_to_char[ix] for ix in sampled_indices)

# Capitalize the first character
txt = txt[0].upper() + txt[1:]

# Print the generated dinosaur name 
print('%s' % (txt, ), end='')
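To compare several samples drawn from the same parameters, here is a small usage sketch (each sampled name already ends with a newline, so end='' keeps the output tidy):

for _ in range(5):
    idxs = rnn_model(parameters, char_to_ix)
    name = ''.join(ix_to_char[ix] for ix in idxs)
    print(name.capitalize(), end='')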

⏸ What do you observe from the generated names when the number of iterations in the get_weights function is increased to 20,000 with random=0?

A. The length of the names generated increases proportionately with the number of iterations.

B. Insufficient storage because of large window size.

C. The names generated are better due to longer training.

D. Larger number of iterations causes the loss of the model to increase.

In [0]:
### edTest(test_chow1) ###
# Submit an answer choice as a string below (e.g. if you choose option A, put 'A')
answer1 = '___'

⏸ What is the difference in the generated name from when the model is given random weights and trained weights?

In [0]:
### edTest(test_chow2) ###
# Type your answer within the quotes given
answer2 = '___'