Exercise: Self-Attention

In this exercise, you will implement a Self-Attention Head for the 3rd word of a 4-word input. From a pickled file, we load the `query`, `key`, and `value` vectors that correspond to the 4 different inputs (thus, a total of 12 vectors). Specifically, these are loaded into 3 respective dicts.

You only need to calculate the final `z` "context" vector that corresponds to the 3rd word (i.e., `z2`).
**REMINDER**: After running every cell, be sure to auto-grade your work by clicking 'Mark' in the lower-right corner. Otherwise, no credit will be given.
# imports useful libraries
import math
import pickle
import numpy as np
YOU DO NOT NEED TO EDIT THE CELL BELOW
The following code loads the `queries`, `keys`, and `values` vectors that correspond to 4 distinct words. Here, all vectors have a length of 25. Each of these variables (e.g., `queries`) is a dict, indexed by the word number. For example, `queries[2]` corresponds to the 3rd word (it's 0-indexed), and its value is a list of length 25, which is the actual query vector.
pickled_content = pickle.load(open("L25.p", "rb"))
queries, keys, values = [pickled_content[i] for i in range(3)]
# to illustrate, let's print the query, key, and value vectors that correspond to the 3rd word in the sentence:
print("query vector for 3rd word:", queries[2])
print("\nkey vector for 3rd word:", keys[2])
print("\nvalue vector for 3rd word:", values[2])
YOU DO NOT NEED TO EDIT THE CELL BELOW
# returns the dot product of two passed-in vectors
def calculate_dot_product(v1, v2):
return sum(a*b for a, b in zip(v1, v2))
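For instance, the helper behaves like a standard dot product. The vectors below are made up purely for illustration (they are not the exercise data), and the helper is repeated here so the snippet runs on its own:

```python
# Same helper as above, repeated so this snippet is self-contained.
def calculate_dot_product(v1, v2):
    return sum(a * b for a, b in zip(v1, v2))

# 1*4 + 2*5 + 3*6 = 32
print(calculate_dot_product([1, 2, 3], [4, 5, 6]))  # 32
```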
In the cell below, populate the `self_attention_scores` list by calculating the four Attention scores that correspond to the 3rd word. Each Attention score should be divided by $\sqrt{d_k}$ (where $d_k$ represents the length of the key vector), and the ordering should be natural. That is, the 1st item in `self_attention_scores` should correspond to the Attention score for the 1st word, the 2nd item in `self_attention_scores` should correspond to the Attention score for the 2nd word, and so on.
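As a sketch of the scaling step: each score is the dot product of the word's query vector with another word's key vector, divided by $\sqrt{d_k}$. The 2-D vectors and names below are illustrative only, not the pickled exercise data (there, $d_k$ would be 25):

```python
import math

# Illustrative 2-D vectors; the real exercise uses the 25-D vectors
# loaded into `queries` and `keys`.
query = [1.0, 2.0]                    # query vector for the word of interest
keys_list = [[1.0, 0.0], [0.0, 1.0]]  # one key vector per word

d_k = len(keys_list[0])
scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d_k)
          for key in keys_list]
print(scores)  # dot products 1.0 and 2.0, each divided by sqrt(2)
```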
### edTest(test_a) ###
self_attention_scores = []
# YOUR CODE HERE
In the cell below, populate the `softmax_scores` list by calculating the softmax of each of the four Attention scores found in the previous cell.
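Softmax exponentiates each score and normalizes by the sum of the exponentials, so the results sum to 1. A minimal sketch with made-up scores (not the exercise's values):

```python
import math

scores = [1.0, 2.0, 3.0]  # illustrative attention scores
exps = [math.exp(s) for s in scores]
softmax_vals = [e / sum(exps) for e in exps]
print(softmax_vals)  # larger scores get larger weights; the weights sum to 1
```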
### edTest(test_b) ###
softmax_scores = []
# YOUR CODE HERE
In the cell below, create the final `z2` list that corresponds to the 3rd word. `z2` should have a length of 25.
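A context vector is the softmax-weighted sum of the value vectors. Here is a sketch with illustrative 2-D value vectors and weights (the exercise's vectors are 25-D, and there are four of them):

```python
weights = [0.2, 0.3, 0.5]                           # illustrative softmax weights
values_list = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # one value vector per word

# Component i of z is the weighted sum of component i across all value vectors.
z = [sum(w * v[i] for w, v in zip(weights, values_list))
     for i in range(len(values_list[0]))]
print(z)  # approximately [0.7, 0.8]
```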
### edTest(test_c) ###
z2 = []
# YOUR CODE HERE