Reading Discussion 7
Key Word(s): Language Modelling, Attention, Transformers
Selected Readings
- Expository
  - Adam Kosiorek: Attention in Neural Networks and How to Use It
  - Lilian Weng: Attention? Attention!
  - Jay Alammar: The Illustrated Transformer, The Illustrated BERT, ELMo, and co. (How NLP Cracked Transfer Learning), and A Visual Guide to Using BERT for the First Time
  - Yannic Kilcher's videos explaining the Transformer, BERT, and GPT-3 papers
  - Chris McCormick's BERT Research Series: a YouTube playlist covering word embeddings, attention, positional encodings, masked language models, and fine-tuning
- Use Cases
  - spaCy: an excellent library for using language models in production (a minimal usage sketch follows this list)
    - spaCy meets Transformers: Fine-tune BERT, XLNet and GPT-2
  - Write With Transformer: Hugging Face's interactive demonstration of the predictive power of GPT-2 and XLNet (a local text-generation sketch follows this list)
  - Gwern Branwen's GPT-3 page: discussions of how GPT-3 is programmed using prompts; its limitations; examples of poetry and prose generated in the style of famous authors, philosophers, etc.; and its performance on logic and arithmetic tasks
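As a taste of the spaCy entry above, the following minimal sketch runs one of spaCy's transformer-backed pipelines on a sentence. It assumes spaCy v3 with the pretrained en_core_web_trf pipeline installed (pip install spacy, then python -m spacy download en_core_web_trf); the example sentence is arbitrary.

```python
# Minimal spaCy sketch: run a transformer-backed pipeline on one sentence.
# Assumes spaCy v3 and the en_core_web_trf pipeline are installed.
import spacy

nlp = spacy.load("en_core_web_trf")  # transformer-based English pipeline
doc = nlp("BERT and GPT-2 changed how NLP systems are built.")

# Token-level annotations come from heads trained on top of the transformer.
for token in doc:
    print(token.text, token.pos_, token.dep_)

# Named entities recognised by the pipeline's NER head.
for ent in doc.ents:
    print(ent.text, ent.label_)
```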
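Write With Transformer's interactive completions can be approximated locally with Hugging Face's transformers library. The sketch below samples two continuations from the small GPT-2 checkpoint; it assumes transformers and PyTorch are installed (pip install transformers torch), and the prompt is an arbitrary example.

```python
# Local analogue of Write With Transformer: sample continuations from GPT-2.
# Model weights are downloaded from the Hugging Face hub on first use.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
prompt = "Attention mechanisms let a model"
outputs = generator(prompt, max_length=40, num_return_sequences=2, do_sample=True)

for out in outputs:
    print(out["generated_text"])
```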
- Research
  - Vaswani et al. (2017), 'Attention Is All You Need'. Introduces the Transformer, the neural network architecture used by the most powerful language models. Sasha Rush has an excellent line-by-line PyTorch implementation of this paper. (A sketch of the paper's scaled dot-product attention follows this list.)
  - OpenAI (2020), 'Language Models are Few-Shot Learners' (the GPT-3 paper)
  - Big Bird: Transformers for Longer Sequences
  - Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
  - One Model To Learn Them All
  - How to Fine-Tune BERT for Text Classification (a minimal fine-tuning sketch follows this list)
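The core operation introduced by Vaswani et al. is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. The sketch below is a minimal PyTorch rendering of that formula, not a full Transformer (for the latter, see Sasha Rush's line-by-line implementation); the tensor shapes at the bottom are illustrative assumptions.

```python
# Scaled dot-product attention from 'Attention Is All You Need':
#   Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    d_k = q.size(-1)
    # Similarity of every query with every key, scaled to stabilise the softmax.
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = F.softmax(scores, dim=-1)  # attention weights; each row sums to 1
    return weights @ v                   # weighted sum of the values

# Illustrative shapes: a batch of 2 sequences, 5 tokens, 64-dimensional heads.
q = torch.randn(2, 5, 64)
k = torch.randn(2, 5, 64)
v = torch.randn(2, 5, 64)
out = scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([2, 5, 64])
```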
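The last entry studies fine-tuning BERT for classification. The following is a minimal sketch of that general setup using Hugging Face's transformers library, not the paper's exact recipe: a fresh classification head is placed on a pretrained BERT encoder and a single gradient step is taken. The two-example dataset, its labels, and the learning rate are purely illustrative assumptions.

```python
# Minimal sketch of BERT fine-tuning for text classification with the
# Hugging Face transformers library (pip install transformers torch).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # fresh classification head on top of BERT
)

texts = ["great movie", "terrible plot"]  # purely illustrative examples
labels = torch.tensor([1, 0])             # purely illustrative labels
batch = tokenizer(texts, padding=True, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
outputs = model(**batch, labels=labels)  # forward pass returns the loss
outputs.loss.backward()                  # one gradient step of fine-tuning
optimizer.step()
print(float(outputs.loss))
```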
* Next presentations: select from Research or Use Cases.