Transformer
Basic concepts behind the architecture of Transformer models.
Prerequisite
- Linear Algebra: Understanding of vectors, matrices, and operations such as matrix multiplication and dot products is essential (see the attention sketch after this list).
- Natural Language Processing (NLP): Basic concepts in NLP, such as tokenization, embeddings, and sequence modeling, are useful for understanding how Transformers handle text data.
- Self-Supervised Learning
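To show why matrix multiplication and dot products matter here, below is a minimal sketch of scaled dot-product attention, the core operation inside a Transformer layer. The array shapes and random values are illustrative assumptions, not taken from any real model.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention for one sequence. Q, K, V have shape (seq_len, d_k)."""
    d_k = Q.shape[-1]
    # Dot products between every query and every key, scaled by sqrt(d_k)
    scores = Q @ K.T / np.sqrt(d_k)                      # (seq_len, seq_len)
    # Softmax over the key dimension turns scores into attention weights
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Weighted sum of the values is one more matrix multiplication
    return weights @ V                                   # (seq_len, d_k)

# Toy example: 3 tokens with 4-dimensional representations (made-up numbers)
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(3, 4)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)       # (3, 4)
```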
What is a Transformer?
Transformer models such as GPT, BERT, BART, and T5 have been trained as language models.
This means they have been trained on large amounts of raw text in a self-supervised fashion.
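As a concrete illustration of what "trained as a language model" means, the sketch below asks a pre-trained causal model for its next-token prediction. The choice of gpt2 as the checkpoint and the prompt text are assumptions for the example only.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Any causal language model checkpoint works; gpt2 is just an example.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

text = "Transformers are trained on large amounts of raw"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits            # (1, seq_len, vocab_size)

# The self-supervised objective: predict the next token from the context.
next_token_id = logits[0, -1].argmax().item()
print(tokenizer.decode(next_token_id))         # prints the model's guess for the next token
```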
Basic Transformer Structure
From Hugging Face:
https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter1/transformers_blocks.svg
https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter1/transformers.svg
Encoder
Decoder
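As a rough sketch of the two blocks pictured above, PyTorch ships stackable encoder and decoder layers. The dimensions and layer counts below are arbitrary assumptions, not the sizes of any particular pre-trained model.

```python
import torch
import torch.nn as nn

d_model, nhead = 64, 4  # illustrative sizes only

# Encoder: self-attention over the input sequence
encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead, batch_first=True)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)

# Decoder: self-attention over the output so far, plus cross-attention to the encoder output
decoder_layer = nn.TransformerDecoderLayer(d_model=d_model, nhead=nhead, batch_first=True)
decoder = nn.TransformerDecoder(decoder_layer, num_layers=2)

src = torch.randn(1, 10, d_model)   # embedded input sequence
tgt = torch.randn(1, 7, d_model)    # embedded (shifted) target sequence

memory = encoder(src)               # encoder output, consumed by the decoder
out = decoder(tgt, memory)
print(out.shape)                    # torch.Size([1, 7, 64])
```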
Models
- Encoder-only models
- Decoder-only models
- Encoder-decoder models (sequence-to-sequence models)
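To make the three families above concrete, the snippet below loads one representative checkpoint of each kind with the Hugging Face Auto classes. The specific checkpoint names (bert-base-uncased, gpt2, t5-small) are examples, not recommendations.

```python
from transformers import AutoModel, AutoModelForCausalLM, AutoModelForSeq2SeqLM

# Encoder-only: suited to understanding tasks (classification, NER, extractive QA)
encoder_only = AutoModel.from_pretrained("bert-base-uncased")

# Decoder-only: suited to generative tasks (text generation)
decoder_only = AutoModelForCausalLM.from_pretrained("gpt2")

# Encoder-decoder (sequence-to-sequence): suited to translation, summarization
encoder_decoder = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

print(type(encoder_only).__name__, type(decoder_only).__name__, type(encoder_decoder).__name__)
```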
Tips
- When using a pre-trained model for a specific application, we need to go through a process called transfer learning. During this process, the model is fine-tuned in a supervised way, that is, using human-annotated labels, on a given task (see the fine-tuning sketch after this list).
- There are two types of language modeling: causal and masked. Causal language models are frequently used for text generation, and you can use them for creative applications like a choose-your-own text adventure or an intelligent coding assistant like Copilot or CodeParrot. The generation sketch after this list illustrates causal language modeling.
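A minimal fine-tuning sketch for the first tip, assuming a binary text-classification task with human-annotated labels. The checkpoint, the imdb dataset, the subset size, and the hyperparameters are placeholder assumptions chosen only to keep the example small.

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

checkpoint = "bert-base-uncased"                      # example pre-trained model
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

dataset = load_dataset("imdb")                        # example labeled dataset

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

tokenized = dataset.map(tokenize, batched=True)

args = TrainingArguments(output_dir="out",
                         num_train_epochs=1,
                         per_device_train_batch_size=8)

# Supervised fine-tuning (transfer learning) on a small labeled subset
trainer = Trainer(model=model, args=args,
                  train_dataset=tokenized["train"].shuffle(seed=0).select(range(1000)))
trainer.train()
```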
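For the second tip, a minimal text-generation sketch with a causal language model via the Hugging Face pipeline API. The checkpoint and prompt are illustrative assumptions.

```python
from transformers import pipeline

# Any causal language model works here; gpt2 is only an example checkpoint.
generator = pipeline("text-generation", model="gpt2")

# The model repeatedly predicts the next token conditioned on everything before it.
result = generator("In a distant galaxy, the last starship", max_new_tokens=30)
print(result[0]["generated_text"])
```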