Key Word(s): seq2seq, attention



Title

Exercise: Self-Attention

Description:

In this exercise, you will implement a Self-Attention Head for the 3rd word of a 4-word input. From a pickled file, we load the query, key, and value vectors that correspond to the 4 inputs (thus, a total of 12 vectors). Specifically, these are loaded into 3 respective dicts.

You only need to calculate the final z "context" vector that corresponds to the 3rd word (i.e., z2, since indexing is 0-based).

**REMINDER**: After running every cell, be sure to auto-grade your work by clicking 'Mark' in the lower-right corner. Otherwise, no credit will be given.

In [1]:
# import useful libraries
import math
import pickle
import numpy as np

YOU DO NOT NEED TO EDIT THE CELL BELOW

The following code loads the query, key, and value vectors that correspond to 4 distinct words. Here, all vectors have a length of 25. Each of these variables (e.g., queries) is a dict indexed by the word number. For example, queries[2] corresponds to the 3rd word (the dicts are 0-indexed), and its value is a list of length 25, which is the actual query vector.

In [2]:
pickled_content = pickle.load(open("L25.p", "rb"))
queries, keys, values = [pickled_content[i] for i in range(3)]

# to illustrate, let's print the query, key, and value vectors that correspond to the 3rd word in the sentence:
print("query vector for 3rd word:", queries[2])
print("\nkey vector for 3rd word:", keys[2])
print("\nvalue vector for 3rd word:", values[2])
query vector for 3rd word: [0.15272, 0.36181, -0.22168, 0.066051, 0.13029, 0.37075, -0.75874, -0.44722, 0.22563, 0.10208, 0.054225, 0.13494, -0.43052, -0.2134, 0.56139, -0.21445, 0.077974, 0.10137, -0.51306, -0.40295, 0.40639, 0.23309, 0.20696, -0.12668, -0.50634]

key vector for 3rd word: [0.88387, -0.14199, 0.13566, 0.098682, 0.51218, 0.49138, -0.47155, -0.30742, 0.01963, 0.12686, 0.073524, 0.35836, -0.60874, -0.18676, 0.78935, 0.54534, 0.1106, -0.2923, 0.059041, -0.69551, -0.18804, 0.19455, 0.32269, -0.49981, 0.306]

value vector for 3rd word: [0.30045, 0.25006, -0.16692, 0.1923, 0.026921, -0.079486, -0.91383, -0.1974, -0.053413, -0.40846, -0.26844, -0.28212, -0.5, 0.1221, 0.3903, 0.17797, -0.4429, -0.40478, -0.9505, -0.16897, 0.77793, 0.33525, 0.3346, -0.1754, -0.12017]

YOU DO NOT NEED TO EDIT THE CELL BELOW

In [3]:
# returns the dot product of two passed-in vectors
def calculate_dot_product(v1, v2):
    return sum(a*b for a, b in zip(v1, v2))

In the cell below, populate the self_attention_scores list by calculating the four Attention scores that correspond to the 3rd word. Each Attention score is the dot product of the 3rd word's query vector with one of the four key vectors, divided by $\sqrt{d_k}$ (where $d_k$ represents the length of the key vector, i.e., 25). The ordering should be natural: the 1st item in self_attention_scores should be the Attention score for the 1st word, the 2nd item should be the Attention score for the 2nd word, and so on.

In [4]:
### edTest(test_a) ###
self_attention_scores = []

# YOUR CODE HERE
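
If you need a nudge, here is one possible sketch (illustrative, not necessarily the only accepted solution). It uses only the calculate_dot_product helper and the dicts loaded above:

# illustrative sketch: scaled dot-product of the 3rd word's query with every key
d_k = len(keys[2])  # dimensionality of a key vector (25)
self_attention_scores = [
    calculate_dot_product(queries[2], keys[i]) / math.sqrt(d_k)
    for i in range(4)
]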

In the cell below, populate the softmax_scores list by calculating the softmax of each of the four Attention scores found in the previous cell.

In [5]:
### edTest(test_b) ###
softmax_scores = []

# YOUR CODE HERE
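
As a sketch of one straightforward approach (assuming self_attention_scores is populated as above), exponentiate each score and normalize by the sum:

# illustrative sketch: softmax over the four attention scores
exp_scores = [math.exp(score) for score in self_attention_scores]
total = sum(exp_scores)
softmax_scores = [e / total for e in exp_scores]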

In the cell below, create the final $z2$ list that corresponds to the 3rd word: weight each of the four value vectors by its softmax score and sum them element-wise. $z2$ should have a length of 25.

In [6]:
### edTest(test_c) ###
z2 = []

# YOUR CODE HERE
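
One possible sketch, again assuming the variables defined in the previous cells:

# illustrative sketch: weight each value vector by its softmax score, then sum element-wise
z2 = [
    sum(softmax_scores[i] * values[i][j] for i in range(4))
    for j in range(len(values[2]))
]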