Reading Discussion 7
Key Word(s): Language Modelling, Attention, Transformers
Selected Readings
- Expository
  - Adam Kosiorek: Attention in Neural Networks and How to Use It
  - Lilian Weng: Attention? Attention!
  - Jay Alammar: The Illustrated Transformer, The Illustrated BERT, ELMo, and co. (How NLP Cracked Transfer Learning), and A Visual Guide to Using BERT for the First Time
  - Yannic Kilcher's videos explaining the Transformer, BERT, and GPT-3 papers
  - Chris McCormick's BERT Research Series: a YouTube playlist covering word embeddings, attention, positional encodings, masked language models, and fine-tuning
- Use Cases
  - spaCy: an excellent library for using language models in production (a minimal usage sketch follows this list)
    - spaCy meets Transformers: Fine-tune BERT, XLNet and GPT-2
  - Write With Transformer: Hugging Face's interactive demonstration of the predictive power of GPT-2 and XLNet (a local text-generation sketch follows this list)
  - Gwern Branwen's GPT-3 page: discussions of how GPT-3 is programmed using prompts; its limitations; examples of poetry and prose generated in the style of famous authors, philosophers, etc.; and its performance on logic and arithmetic tasks
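As a taste of the spaCy entry above, the following minimal sketch runs one of spaCy's transformer-backed pipelines on a sentence. It assumes spaCy v3 with the pretrained en_core_web_trf pipeline installed (pip install spacy, then python -m spacy download en_core_web_trf); the example sentence is arbitrary.

```python
# Minimal spaCy sketch: run a transformer-backed pipeline on one sentence.
# Assumes spaCy v3 and the en_core_web_trf pipeline are installed.
import spacy

nlp = spacy.load("en_core_web_trf")  # transformer-based English pipeline
doc = nlp("BERT and GPT-2 changed how NLP systems are built.")

# Token-level annotations come from heads trained on top of the transformer.
for token in doc:
    print(token.text, token.pos_, token.dep_)

# Named entities recognised by the pipeline's NER head.
for ent in doc.ents:
    print(ent.text, ent.label_)
```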
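Write With Transformer's interactive completions can be approximated locally with Hugging Face's transformers library. The sketch below samples two continuations from the small GPT-2 checkpoint; it assumes transformers and PyTorch are installed (pip install transformers torch), and the prompt is an arbitrary example.

```python
# Local analogue of Write With Transformer: sample continuations from GPT-2.
# Model weights are downloaded from the Hugging Face hub on first use.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
prompt = "Attention mechanisms let a model"
outputs = generator(prompt, max_length=40, num_return_sequences=2, do_sample=True)

for out in outputs:
    print(out["generated_text"])
```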
- Research
  - Vaswani et al. (2017), 'Attention Is All You Need'. Introduces the Transformer, the neural network architecture used by the most powerful language models. Sasha Rush has an excellent line-by-line PyTorch implementation of this paper. (A sketch of the paper's scaled dot-product attention follows this list.)
  - OpenAI (2020), 'Language Models are Few-Shot Learners' (the GPT-3 paper)
  - Big Bird: Transformers for Longer Sequences
  - Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
  - One Model To Learn Them All
  - How to Fine-Tune BERT for Text Classification (a minimal fine-tuning sketch follows this list)
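The core operation introduced by Vaswani et al. is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. The sketch below is a minimal PyTorch rendering of that formula, not a full Transformer (for the latter, see Sasha Rush's line-by-line implementation); the tensor shapes at the bottom are illustrative assumptions.

```python
# Scaled dot-product attention from 'Attention Is All You Need':
#   Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    d_k = q.size(-1)
    # Similarity of every query with every key, scaled to stabilise the softmax.
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = F.softmax(scores, dim=-1)  # attention weights; each row sums to 1
    return weights @ v                   # weighted sum of the values

# Illustrative shapes: a batch of 2 sequences, 5 tokens, 64-dimensional heads.
q = torch.randn(2, 5, 64)
k = torch.randn(2, 5, 64)
v = torch.randn(2, 5, 64)
out = scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([2, 5, 64])
```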
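The last entry studies fine-tuning BERT for classification. The following is a minimal sketch of that general setup using Hugging Face's transformers library, not the paper's exact recipe: a fresh classification head is placed on a pretrained BERT encoder and a single gradient step is taken. The two-example dataset, its labels, and the learning rate are purely illustrative assumptions.

```python
# Minimal sketch of BERT fine-tuning for text classification with the
# Hugging Face transformers library (pip install transformers torch).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # fresh classification head on top of BERT
)

texts = ["great movie", "terrible plot"]  # purely illustrative examples
labels = torch.tensor([1, 0])             # purely illustrative labels
batch = tokenizer(texts, padding=True, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
outputs = model(**batch, labels=labels)  # forward pass returns the loss
outputs.loss.backward()                  # one gradient step of fine-tuning
optimizer.step()
print(float(outputs.loss))
```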
* Next presentations: select from Research or Use Cases.