The Illustrated BERT, ELMo, and co. (How NLP Cracked Transfer Learning) – Jay Alammar – Visualizing machine learning one concept at a time.
jalammar.github.io/illustrated-bert
The Illustrated BERT, ELMo, and co. (How NLP Cracked Transfer Learning)
Discussions:
Hacker News (98 points, 19 comments), Reddit r/MachineLearning (164 points, 20 comments)
Translations: Chinese (Simplified), French 1, French 2, Japanese, Korean, Persian, Russian, Spanish
2021 Update: I created this brief and highly accessible video intro to BERT
The year 2018 has been an inflection point for machine learning models handling text (or more accurately, Natural Language Processing, or NLP for short). Our conceptual understanding of how best to represent words and sentences in a way that captures their underlying meanings and relationships is rapidly evolving. Moreover, the NLP community has been putting forward incredibly powerful components that you can freely download and use in your own models and pipelines (this has been referred to as NLP's ImageNet moment, referencing how, years ago, similar developments accelerated machine learning in Computer Vision tasks).
(ULM-FiT has nothing to do with Cookie Monster. But I couldn't think of anything else…)
One of the latest milestones in this development is the release of BERT , an event described as marking the beginning of a new era in NLP. BERT is a model that broke several records for how well models can handle language-based tasks. Soon after the release of the paper describing the model, the team also open-sourced the code of the model, and made available for download versions of the model that were already pre-trained on massive datasets. This is a momentous development since it enables anyone building a machine learning model involving language processing to use this powerhouse as a readily-available component – saving the time, energy, knowledge, and resources that would have gone to training a language-processing model from scratch.
The two steps of how BERT is developed. You can download the model pre-trained in step 1 (trained on un-annotated data), and only worry about fine-tuning it for step 2. [ Source for book icon].
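To make the two-step recipe concrete, here is a minimal sketch (not the article's code, and not actual BERT) of the pattern it describes: step 1 produces an encoder whose weights you reuse frozen, and step 2 trains only a small task-specific head on top of its features. The "pre-trained" encoder below is a stand-in random embedding table with mean pooling; in practice you would download real BERT weights instead.

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1 stand-in: a frozen "pre-trained" encoder. Here it is just a random
# embedding table plus mean pooling; real BERT weights would come pre-trained
# on massive un-annotated text.
EMBED = rng.normal(size=(1000, 16))  # pretend vocabulary of 1000 tokens

def encode(token_ids):
    """Frozen encoder: mean-pool the embeddings of a token-id sequence."""
    return EMBED[token_ids].mean(axis=0)

# Step 2: fine-tune only a small classification head (logistic regression)
# on labeled data for the downstream task. Data here is synthetic.
X = np.stack([encode(rng.integers(0, 1000, size=12)) for _ in range(64)])
y = rng.integers(0, 2, size=64).astype(float)

w, b = np.zeros(16), 0.0
for _ in range(500):                        # plain gradient descent
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid predictions
    grad = p - y                            # dLoss/dlogits for cross-entropy
    w -= 0.1 * (X.T @ grad) / len(y)        # only the head's weights move;
    b -= 0.1 * grad.mean()                  # EMBED stays frozen throughout

acc = ((p > 0.5) == y).mean()
print(f"training accuracy of the fine-tuned head: {acc:.2f}")
```

The key point the figure makes survives even in this toy: the expensive part (the encoder) is trained once by someone else, and your task only pays for the cheap head.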
BERT builds on top of a number of clever ideas that have been bubbling up in the NLP community recently – including but not limited to Semi-supervised Sequence Learning (by Andrew Dai and Quoc Le), ELMo (by Matthew Peters and researchers from AI2 and UW CSE), ULMFiT (by fast.ai founder Jeremy Howard and Sebastian Ruder), the OpenAI Transformer (by OpenAI researchers Radford, Narasimhan, Salimans, and Sutskever), and the Transformer (Vaswani et al).
There are a number of concepts one needs to be aware of to properly wrap one's head around what BERT is. So let's start by looking at ways you can use BERT before diving into the concepts involved.
... (truncated, 19 KB total)