Title
Exercise: Self-Attention
Description:
In this exercise, you will implement a Self-Attention Head for the 3rd word of a
4-word input. From a pickled file, we load the query, key, and
value vectors that correspond to the 4 input words (thus, a total of 12 vectors).
Specifically, these are loaded into 3 respective dicts.
You only need to calculate the final z "context" vector that corresponds to the 3rd word
(i.e., z2).
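In symbols, and assuming the standard scaled dot-product formulation, the three steps of the exercise can be summarized as below. The notation $q_2$, $k_j$, $v_j$, $s_{2,j}$, and $\alpha_{2,j}$ is introduced here only for illustration; word indices are 0-based, so the 3rd word has index 2.

$$
s_{2,j} = \frac{q_2 \cdot k_j}{\sqrt{d_k}}, \qquad
\alpha_{2,j} = \frac{e^{s_{2,j}}}{\sum_{m=0}^{3} e^{s_{2,m}}}, \qquad
z_2 = \sum_{j=0}^{3} \alpha_{2,j}\, v_j
$$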
**REMINDER**: After running every cell, be sure to auto-grade your work by clicking 'Mark' in the lower-right corner. Otherwise, no credit will be given.
# imports useful libraries
import math
import pickle
import numpy as np
YOU DO NOT NEED TO EDIT THE CELL BELOW
The following code loads the query, key, and value vectors
that correspond to 4 distinct words. Here, all vectors have a length of 25. Each of these variables
(e.g., queries) is a dict, indexed by the word number. For example, queries[2]
corresponds to the 3rd word (the dicts are 0-indexed), and its value is a list of length 25, which
is the actual query vector.
pickled_content = pickle.load(open("L25.p", "rb"))
queries, keys, values = [pickled_content[i] for i in range(3)]
# to illustrate, let's print the query, key, and value vectors that correspond to the 3rd word in the sentence:
print("query vector for 3rd word:", queries[2])
print("\nkey vector for 3rd word:", keys[2])
print("\nvalue vector for 3rd word:", values[2])
YOU DO NOT NEED TO EDIT THE CELL BELOW
# returns the dot product of two passed-in vectors
def calculate_dot_product(v1, v2):
    return sum(a*b for a, b in zip(v1, v2))
In the cell below, populate the self_attention_scores list by calculating the four
Attention scores that correspond to the 3rd word, i.e., the dot product of the 3rd word's
query vector with each word's key vector. Each Attention score should be
divided by $\sqrt{d_k}$ (where $d_k$ represents the length of the key vector), and the ordering
should be natural. That is, the 1st item in self_attention_scores should correspond to
the Attention score for the 1st word, the 2nd item in self_attention_scores should
correspond to the Attention score for the 2nd word, and so on.
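As a rough illustration of the scaling step, here is a minimal sketch on made-up vectors of length 4 rather than the exercise data. The names `toy_queries`, `toy_keys`, `target`, and `toy_scores` are hypothetical, and the dot-product helper is repeated so the snippet runs on its own.

```python
import math

# Same helper as in the notebook, repeated so this snippet is self-contained
def calculate_dot_product(v1, v2):
    return sum(a * b for a, b in zip(v1, v2))

# Hypothetical toy data: 3 words, vectors of length 4 (NOT the exercise data)
toy_queries = {0: [1.0, 0.0, 1.0, 0.0], 1: [0.0, 2.0, 0.0, 2.0], 2: [1.0, 1.0, 1.0, 1.0]}
toy_keys = {0: [1.0, 1.0, 0.0, 0.0], 1: [0.0, 1.0, 1.0, 0.0], 2: [2.0, 0.0, 0.0, 2.0]}

target = 1                   # score the 2nd toy word against every word
d_k = len(toy_keys[target])  # length of a key vector (4 here)

# One scaled score per word, in natural word order
toy_scores = [
    calculate_dot_product(toy_queries[target], toy_keys[j]) / math.sqrt(d_k)
    for j in sorted(toy_keys)
]
print(toy_scores)  # [1.0, 1.0, 2.0]
```

The query of the target word is held fixed while the key varies, which is why the result has one entry per word.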
### edTest(test_a) ###
self_attention_scores = []
# YOUR CODE HERE
In the cell below, populate the softmax_scores list by calculating the softmax of each of the
four Attention scores found in the previous cell.
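For reference, a softmax over a short list of scores might be sketched as follows. The scores are made up, and the names `softmax`, `toy_scores`, and `toy_weights` are hypothetical. Subtracting the maximum before exponentiating is an optional numerical-stability step assumed here; it does not change the result.

```python
import math

def softmax(scores):
    """Return the softmax of a list of floats (hypothetical helper)."""
    m = max(scores)                           # shift for numerical stability
    exps = [math.exp(s - m) for s in scores]  # exponentiate each shifted score
    total = sum(exps)
    return [e / total for e in exps]          # normalize so the weights sum to 1

toy_scores = [1.0, 1.0, 2.0]  # made-up scores, NOT the exercise data
toy_weights = softmax(toy_scores)
print(toy_weights)  # roughly [0.212, 0.212, 0.576]
```

Equal scores receive equal weights, and the largest score receives the largest weight.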
### edTest(test_b) ###
softmax_scores = []
# YOUR CODE HERE
In the cell below, create the final $z2$ list that corresponds to the 3rd word: the sum of the four value vectors, each weighted by its softmax score. $z2$ should have a length of 25.
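The weighted sum can be illustrated on a toy case with 3 words and value vectors of length 2. This is only a sketch under made-up inputs; `toy_weights`, `toy_values`, and `toy_z` are hypothetical names and not part of the exercise.

```python
# Made-up softmax weights (they sum to 1) and value vectors, NOT the exercise data
toy_weights = [0.25, 0.25, 0.5]
toy_values = {0: [4.0, 0.0], 1: [0.0, 4.0], 2: [2.0, 2.0]}

dim = len(toy_values[0])  # length of each value vector (2 here)

# For each position i, add up weight_j * value_j[i] over all words j
toy_z = [
    sum(toy_weights[j] * toy_values[j][i] for j in range(len(toy_weights)))
    for i in range(dim)
]
print(toy_z)  # [2.0, 2.0]
```

The output has the same length as a single value vector, not the number of words.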
### edTest(test_c) ###
z2 = []
# YOUR CODE HERE