# Exercise: Attention

In this exercise, you will implement an Attention mechanism. We load three encoder hidden states into `enc_states`, and one decoder hidden state into `dec_state`. Your task is to compute the final `context_vector`.
That is, you should calculate an Attention score for every encoder hidden state, exponentiate these scores, then normalize them so they sum to 1. These are your Attention weights. Then, produce a context vector by multiplying each Attention weight by its corresponding encoder hidden state and summing the weighted states.
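The steps above can be sketched on toy data. The 3-dimensional vectors and variable names below are invented for this illustration (the exercise's real states are 50-dimensional and live in `enc_states` / `dec_state`):

```python
import math

# toy example: two 3-dimensional encoder states and one decoder state
toy_enc = {0: [1.0, 0.0, 0.5], 1: [0.0, 1.0, 0.5]}
toy_dec = [1.0, 1.0, 0.0]

# 1) dot-product score for each encoder state
scores = {i: sum(a * b for a, b in zip(h, toy_dec)) for i, h in toy_enc.items()}

# 2) exponentiate and normalize (a softmax) to get attention weights
exp_scores = {i: math.exp(s) for i, s in scores.items()}
total = sum(exp_scores.values())
weights = {i: e / total for i, e in exp_scores.items()}

# 3) context vector = weighted sum of the encoder states
context = [sum(weights[i] * h[d] for i, h in toy_enc.items())
           for d in range(len(toy_dec))]

print(weights)  # both scores are equal here, so each weight is 0.5
print(context)  # [0.5, 0.5, 0.5]
```

Because both toy encoder states score equally against the decoder state, the softmax splits the weight evenly and the context vector is the midpoint of the two states.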
**REMINDER**: After running every cell, be sure to auto-grade your work by clicking 'Mark' in the lower-right corner. Otherwise, no credit will be given.
# imports useful libraries
import math
import numpy as np
YOU DO NOT NEED TO EDIT THE CELL BELOW

The following code loads three encoder states into the dictionary `enc_states`, whose keys are 0, 1, and 2, and whose values are lists of 50 floats (one per hidden state). The code also populates a single list of 50 floats, `dec_state`, representing the decoder hidden state.
# assumes the file contains several encoder states but only 1 decoder state
def load_hidden_states(filename):
    enc_states = {}
    dec_state = []
    with open(filename) as f:
        for line in f:
            model, num = line.split()[0].split("_")
            if model == "enc":
                enc_states[int(num)] = [float(t) for t in line.split(" ")[1:]]
            else:
                dec_state = [float(t) for t in line.split(" ")[1:]]
    return enc_states, dec_state

enc_states, dec_state = load_hidden_states("hidden_states.txt")
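Judging from how the parser splits each line, the file presumably holds one state per line, e.g. `enc_0` followed by the floats. A self-contained check on a made-up two-dimensional file (the format and filename here are assumptions for this sketch, not the real data) could be:

```python
# write a tiny made-up states file (format inferred from the parser above)
with open("toy_states.txt", "w") as f:
    f.write("enc_0 1.0 2.0\nenc_1 3.0 4.0\ndec_0 5.0 6.0\n")

def load_hidden_states(filename):
    # same parser as above, repeated so this sketch runs on its own
    enc_states = {}
    dec_state = []
    with open(filename) as f:
        for line in f:
            model, num = line.split()[0].split("_")
            if model == "enc":
                enc_states[int(num)] = [float(t) for t in line.split(" ")[1:]]
            else:
                dec_state = [float(t) for t in line.split(" ")[1:]]
    return enc_states, dec_state

enc_states, dec_state = load_hidden_states("toy_states.txt")
print(enc_states)  # {0: [1.0, 2.0], 1: [3.0, 4.0]}
print(dec_state)   # [5.0, 6.0]
```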
YOU DO NOT NEED TO EDIT THE CELL BELOW

The following code simply computes the attention score as the dot product of the two passed-in embeddings.
# calculates the attention score as the dot product of two vectors
def calculate_attention_score(v1, v2):
    return sum(a * b for a, b in zip(v1, v2))
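For instance, on two short example vectors the function behaves like a standard dot product (the function is repeated here so the snippet runs on its own):

```python
def calculate_attention_score(v1, v2):
    return sum(a * b for a, b in zip(v1, v2))

# 1*4 + 2*5 + 3*6 = 32
score = calculate_attention_score([1, 2, 3], [4, 5, 6])
print(score)  # 32
```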
In the cell below, populate `attention_scores` with the exponentiated attention scores: $e^{\text{score}(\text{enc}_i,\ \text{dec}_j)}$. The main aspect to figure out is which hidden states to pass to `calculate_attention_score()`.
### edTest(test_a) ###
attention_scores = []
# YOUR CODE HERE
In the cell below, simply normalize each of the exponentiated scores and store them in `attention_weights`. They should sum to 1.
### edTest(test_b) ###
attention_weights = []
# YOUR CODE HERE
In the cell below, create the final context vector, `context_vector`.
### edTest(test_c) ###
# YOUR CODE HERE
context_vector = # YOUR CODE HERE