Key Word(s): seq2seq, attention



Title

Exercise: Attention

Description:

In this exercise, you will implement an Attention mechanism. We load three encoder hidden states into enc_states and one decoder hidden state into dec_state. Your task is to compute the final context_vector.

That is, you should calculate an Attention score for every encoder hidden state, exponentiate these scores, then normalize them so they sum to 1. These are your Attention weights. Then, produce the context vector by multiplying each Attention weight by its corresponding encoder hidden state and summing the resulting weighted states.
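In symbols, with $s_i = \text{score}(\text{enc}_i, \text{dec})$ denoting the raw score for encoder state $i$, the full computation is:

$$\alpha_i = \frac{e^{s_i}}{\sum_k e^{s_k}}, \qquad \text{context\_vector} = \sum_i \alpha_i \cdot \text{enc}_i$$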

**REMINDER**: After running every cell, be sure to auto-grade your work by clicking 'Mark' in the lower-right corner. Otherwise, no credit will be given.

In [63]:
# import useful libraries: math (for exp) and numpy (for vector arithmetic)
import math
import numpy as np

YOU DO NOT NEED TO EDIT THE CELL BELOW

The following code loads three encoder hidden states into the dictionary enc_states, where the keys are 0, 1, and 2, and each value is a list of 50 floats (one hidden state). The code also populates dec_state, a single list of 50 floats (the decoder hidden state).

In [64]:
# assumes we're passing in several encoder states but only 1 decoder state
def load_hidden_states(filename):
    enc_states = {}
    dec_state = []

    with open(filename) as f:
        for line in f:
            # the first token is a label such as "enc_0"; the remaining
            # tokens are the floats that make up the hidden state
            model, num = line.split()[0].split("_")
            if model == "enc":
                enc_states[int(num)] = [float(t) for t in line.split()[1:]]
            else:
                dec_state = [float(t) for t in line.split()[1:]]
    return enc_states, dec_state

enc_states, dec_state = load_hidden_states("hidden_states.txt")

YOU DO NOT NEED TO EDIT THE CELL BELOW

The following code computes the attention score as the dot product of the two passed-in vectors.

In [65]:
# calculates the attention score as the dot product
def calculate_attention_score(v1, v2):
    return sum(a*b for a, b in zip(v1, v2))
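As a quick sanity check (a toy example, not part of the graded cells), the dot product of [1, 2, 3] and [4, 5, 6] should be $1 \cdot 4 + 2 \cdot 5 + 3 \cdot 6 = 32$:

# toy vectors, purely illustrative
assert calculate_attention_score([1, 2, 3], [4, 5, 6]) == 32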

In the cell below, populate attention_scores with the exponentiated attention scores: $e^{\text{score}(\text{enc}_i,\ \text{dec}_j)}$. The main aspect to figure out is which hidden states to pass to calculate_attention_score().

In [ ]:
### edTest(test_a) ###
attention_scores = []

# YOUR CODE HERE
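If you are unsure where to start, here is one possible sketch (it assumes enc_states and dec_state are loaded as described above; your own solution may differ):

# score each encoder state against the single decoder state, then
# exponentiate; sorting the keys keeps the order 0, 1, 2 stable
attention_scores = [
    math.exp(calculate_attention_score(enc_states[i], dec_state))
    for i in sorted(enc_states)
]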

In the cell below, normalize each of the exponentiated scores and store them in attention_weights. The weights should sum to 1.

In [ ]:
### edTest(test_b) ###
attention_weights = []

# YOUR CODE HERE
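One way to normalize (a sketch; it assumes attention_scores was filled in above):

# divide each exponentiated score by their sum so the resulting
# weights form a probability distribution (the softmax step)
total = sum(attention_scores)
attention_weights = [score / total for score in attention_scores]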

In the cell below, create the final context vector context_vector.

In [ ]:
### edTest(test_c) ###

# YOUR CODE HERE

context_vector = # YOUR CODE HERE
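One possible sketch for this final step (it assumes attention_weights lines up with the sorted keys of enc_states, as in the earlier sketches):

# weighted sum of the encoder states: scale each encoder state by its
# attention weight, then add the scaled vectors elementwise; np.array
# makes the elementwise arithmetic concise, and the result has 50 entries
weighted_states = [
    w * np.array(enc_states[i])
    for w, i in zip(attention_weights, sorted(enc_states))
]
context_vector = sum(weighted_states)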