Transformer
Basic concepts behind the architecture of Transformer models.
Prerequisite
- Linear Algebra: Understanding of vectors, matrices, and operations such as matrix multiplication and dot products is essential (see the attention sketch after this list).
- Natural Language Processing (NLP): Basic concepts in NLP, such as tokenization, embeddings, and sequence modeling, are useful for understanding how Transformers handle text data.
- Self-Supervised Learning
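To show why matrix multiplication and dot products matter here, below is a minimal sketch of scaled dot-product attention, the core operation inside a Transformer layer. The array shapes and random values are illustrative assumptions, not taken from any real model.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention for one sequence. Q, K, V have shape (seq_len, d_k)."""
    d_k = Q.shape[-1]
    # Dot products between every query and every key, scaled by sqrt(d_k)
    scores = Q @ K.T / np.sqrt(d_k)                      # (seq_len, seq_len)
    # Softmax over the key dimension turns scores into attention weights
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Weighted sum of the values is one more matrix multiplication
    return weights @ V                                   # (seq_len, d_k)

# Toy example: 3 tokens with 4-dimensional representations (made-up numbers)
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(3, 4)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)       # (3, 4)
```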
What is a Transformer?
Transformer models such as GPT, BERT, BART, and T5 have been trained as language models.
This means they have been trained on large amounts of raw text in a self-supervised fashion.
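As a concrete illustration of what "trained as a language model" means, the sketch below asks a pre-trained causal model for its next-token prediction. The choice of gpt2 as the checkpoint and the prompt text are assumptions for the example only.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Any causal language model checkpoint works; gpt2 is just an example.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

text = "Transformers are trained on large amounts of raw"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits            # (1, seq_len, vocab_size)

# The self-supervised objective: predict the next token from the context.
next_token_id = logits[0, -1].argmax().item()
print(tokenizer.decode(next_token_id))         # prints the model's guess for the next token
```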
Basic Transformer Structure
From Hugging Face:
https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter1/transformers_blocks.svg
https://huggingface.co/datasets/huggingface-course/documentation-images/resolve/main/en/chapter1/transformers.svg
Encoder
Decoder
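As a rough sketch of the two blocks pictured above, PyTorch ships stackable encoder and decoder layers. The dimensions and layer counts below are arbitrary assumptions, not the sizes of any particular pre-trained model.

```python
import torch
import torch.nn as nn

d_model, nhead = 64, 4  # illustrative sizes only

# Encoder: self-attention over the input sequence
encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead, batch_first=True)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)

# Decoder: self-attention over the output so far, plus cross-attention to the encoder output
decoder_layer = nn.TransformerDecoderLayer(d_model=d_model, nhead=nhead, batch_first=True)
decoder = nn.TransformerDecoder(decoder_layer, num_layers=2)

src = torch.randn(1, 10, d_model)   # embedded input sequence
tgt = torch.randn(1, 7, d_model)    # embedded (shifted) target sequence

memory = encoder(src)               # encoder output, consumed by the decoder
out = decoder(tgt, memory)
print(out.shape)                    # torch.Size([1, 7, 64])
```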
Models
- Encoder-only models
- Decoder-only models
- Encoder-decoder models (sequence-to-sequence models)
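To make the three families above concrete, the snippet below loads one representative checkpoint of each kind with the Hugging Face Auto classes. The specific checkpoint names (bert-base-uncased, gpt2, t5-small) are examples, not recommendations.

```python
from transformers import AutoModel, AutoModelForCausalLM, AutoModelForSeq2SeqLM

# Encoder-only: suited to understanding tasks (classification, NER, extractive QA)
encoder_only = AutoModel.from_pretrained("bert-base-uncased")

# Decoder-only: suited to generative tasks (text generation)
decoder_only = AutoModelForCausalLM.from_pretrained("gpt2")

# Encoder-decoder (sequence-to-sequence): suited to translation, summarization
encoder_decoder = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

print(type(encoder_only).__name__, type(decoder_only).__name__, type(encoder_decoder).__name__)
```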
Tips
- When using a pre-trained model for a specific application, we need to go through a process called transfer learning. During this process, the model is fine-tuned in a supervised way, that is, using human-annotated labels, on a given task (see the fine-tuning sketch after this list).
- There are two types of language modeling: causal and masked. Causal language models are frequently used for text generation, and you can use them for creative applications like a choose-your-own text adventure or an intelligent coding assistant like Copilot or CodeParrot. The generation sketch after this list illustrates causal language modeling.
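A minimal fine-tuning sketch for the first tip, assuming a binary text-classification task with human-annotated labels. The checkpoint, the imdb dataset, the subset size, and the hyperparameters are placeholder assumptions chosen only to keep the example small.

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

checkpoint = "bert-base-uncased"                      # example pre-trained model
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

dataset = load_dataset("imdb")                        # example labeled dataset

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

tokenized = dataset.map(tokenize, batched=True)

args = TrainingArguments(output_dir="out",
                         num_train_epochs=1,
                         per_device_train_batch_size=8)

# Supervised fine-tuning (transfer learning) on a small labeled subset
trainer = Trainer(model=model, args=args,
                  train_dataset=tokenized["train"].shuffle(seed=0).select(range(1000)))
trainer.train()
```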
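For the second tip, a minimal text-generation sketch with a causal language model via the Hugging Face pipeline API. The checkpoint and prompt are illustrative assumptions.

```python
from transformers import pipeline

# Any causal language model works here; gpt2 is only an example checkpoint.
generator = pipeline("text-generation", model="gpt2")

# The model repeatedly predicts the next token conditioned on everything before it.
result = generator("In a distant galaxy, the last starship", max_new_tokens=30)
print(result[0]["generated_text"])
```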