What Are LLMs (Large Language Models)

The Simple Explanation

An LLM is a next-token prediction machine. Given some text, it predicts what comes next — like autocomplete, but trained on the entire internet and absurdly good at it.

Input:  "The capital of France is"
Model:  P(Paris) = 95%, P(Lyon) = 1.5%, P(a) = 0.8%, ...
Output: "Paris"

That's it. Every conversation, every code generation, every essay — it's all next-token prediction happening thousands of times in sequence.
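
In code, that prediction step is just a softmax over scores followed by a pick. A minimal Python sketch (the logits are made-up numbers for illustration, not real model output):

```python
import math

# Hypothetical raw scores ("logits") a model might assign after seeing
# "The capital of France is" -- illustrative numbers, not from a real model.
logits = {"Paris": 6.0, "Lyon": 1.8, "a": 1.2, "the": 0.5}

# Softmax turns raw scores into a probability distribution.
total = sum(math.exp(v) for v in logits.values())
probs = {tok: math.exp(v) / total for tok, v in logits.items()}

# Greedy decoding: pick the most probable token.
next_token = max(probs, key=probs.get)
print(next_token)  # "Paris"
```

Real models repeat this loop, appending each chosen token to the input and predicting again, until the response is complete.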


How They're Built

Step 1: Architecture — The Transformer

The core innovation (Google, 2017 — "Attention Is All You Need" paper):

Input Text → Tokenize → Embeddings → [Transformer Layers × 96] → Probabilities → Next Token
                                          ↓
                                    Self-Attention:
                                    "Which words should I
                                     pay attention to when
                                     predicting the next one?"

Self-attention is the key mechanism: for each token, the model learns which OTHER tokens in the input are relevant. When processing "The cat sat on the ___", attention helps the model focus on "cat" and "sat" to predict "mat".
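
A toy version of that attention computation (single head, random weights, tiny sizes; real models stack many heads across many layers):

```python
import numpy as np

# Minimal single-head self-attention sketch with random weights.
rng = np.random.default_rng(0)
seq_len, d_model = 5, 8                     # 5 tokens, 8-dim embeddings

x = rng.normal(size=(seq_len, d_model))     # token embeddings
W_q = rng.normal(size=(d_model, d_model))   # learned projection matrices
W_k = rng.normal(size=(d_model, d_model))
W_v = rng.normal(size=(d_model, d_model))

Q, K, V = x @ W_q, x @ W_k, x @ W_v

# Scores: how relevant is each token to each other token?
scores = Q @ K.T / np.sqrt(d_model)

# Softmax each row into a distribution over which tokens to attend to.
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)

out = weights @ V                           # attention-weighted mix of values
print(weights.shape, out.shape)             # (5, 5) (5, 8)
```

Each row of `weights` sums to 1: for every token, a learned distribution over which other tokens matter for predicting what comes next.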

Step 2: Pre-Training

Feed the model trillions of tokens from the internet:

  • Books, Wikipedia, code repositories, web pages, academic papers
  • Training objective: predict the next token (self-supervised — no human labels needed)
  • Cost: $10M-$100M+ in compute (thousands of GPUs for months)
  • Result: a foundation model with broad knowledge
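
The "self-supervised" part is easy to see in code: the training targets come straight from the text itself. A sketch with a toy sentence and a hypothetical model probability:

```python
import math

# Self-supervised objective: the "label" for each position is just the
# next token in the raw text -- no human annotation required.
tokens = ["The", "cat", "sat", "on", "the", "mat"]

# Training pairs: (context so far, correct next token)
pairs = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]
print(pairs[0])  # (['The'], 'cat')

# Cross-entropy loss at one position, for a hypothetical model that
# assigns the true next token a probability of 0.25:
p_correct = 0.25
loss = -math.log(p_correct)  # lower loss = model was more confident and right
```

Run this objective over trillions of tokens and the weights gradually encode the patterns of language, code, and facts in the data.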

Step 3: Instruction Tuning (Fine-Tuning)

The raw pre-trained model just predicts text — it doesn't follow instructions well.

Fine-tune on instruction/response pairs:

Instruction: "Explain quantum computing in simple terms"
Response: "Quantum computing uses quantum bits (qubits) that can be..."

Now the model understands "when a human asks X, respond helpfully."
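
The training data for this step is just text rendered through a template. The template below is hypothetical; each model family uses its own format:

```python
# Sketch of turning instruction/response pairs into plain training text.
# This "### Instruction / ### Response" template is a made-up example --
# real models each define their own chat/instruction format.
examples = [
    {"instruction": "Explain quantum computing in simple terms",
     "response": "Quantum computing uses quantum bits (qubits) that can be..."},
]

def format_example(ex):
    return (f"### Instruction:\n{ex['instruction']}\n\n"
            f"### Response:\n{ex['response']}")

training_text = format_example(examples[0])
print(training_text.splitlines()[0])  # "### Instruction:"
```

Fine-tuning on many such formatted pairs is still next-token prediction; the model just learns that after an instruction, the likely continuation is a helpful response.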

Step 4: RLHF (Reinforcement Learning from Human Feedback)

Humans rank model outputs from best to worst. Train a reward model from rankings. Use RL to make the LLM produce outputs the reward model scores highly.

This is how models become helpful, harmless, and honest — not just good at text prediction.

Pre-trained model → knows a lot but rambles
+ Instruction tuning → follows instructions
+ RLHF → actually helpful and safe
= Claude, GPT-4, etc.
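
The reward-model step can be sketched as a pairwise preference loss: train the scorer so the output humans preferred gets the higher score. The scores below are hypothetical numbers, just to show the shape of the loss:

```python
import math

# Reward-model sketch: humans ranked output A above output B, so we want
# score(A) > score(B). These scores are hypothetical.
score_preferred = 2.0   # reward model's score for the human-preferred output
score_rejected = 0.5    # score for the output humans ranked lower

# Bradley-Terry style loss: probability the model ranks the pair correctly,
# pushed through -log. Low loss when score_preferred >> score_rejected.
margin = score_preferred - score_rejected
loss = -math.log(1 / (1 + math.exp(-margin)))
print(round(loss, 3))  # 0.201
```

The RL phase then tunes the LLM itself to produce outputs this reward model scores highly.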

What Parameters Mean

Model                  Parameters               Analogy
GPT-2                  1.5B                     Bicycle
Llama 3 8B             8B                       Car
Claude Sonnet          ~100B+ (estimated)       Airplane
GPT-4 / Claude Opus    ~400B-1T+ (estimated)    Rocket

Parameters = the learned weights in the neural network. More parameters = more capacity to store patterns and knowledge. But also: more compute, more memory, more cost.
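
The memory side of that cost is easy to estimate: just holding the weights at 2 bytes per parameter (fp16/bf16 precision) gives a quick lower bound, before activations, KV cache, or optimizer state. A back-of-envelope sketch:

```python
# Rough memory needed just to hold a model's weights in GPU memory.
# Ignores activations, KV cache, and optimizer state, which add more.
def weight_memory_gb(n_params, bytes_per_param=2):  # 2 bytes = fp16/bf16
    return n_params * bytes_per_param / 1e9

print(round(weight_memory_gb(8e9), 1))    # Llama 3 8B in fp16: ~16.0 GB
print(round(weight_memory_gb(1.5e9), 1))  # GPT-2 1.5B in fp16: ~3.0 GB
```

This is why an 8B model runs on a single consumer GPU while frontier-scale models need clusters of them.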


What LLMs Can and Can't Do

Can Do Well

  • Generate fluent, coherent text
  • Understand and follow complex instructions
  • Write, debug, and explain code
  • Translate between languages
  • Summarize documents
  • Reason through problems (with chain-of-thought)
  • Roleplay, brainstorm, creative writing

Can't Do (Fundamental Limits)

Limitation                       Why
Hallucination                    Model is confident about things it's wrong about — it generates "plausible" not "true"
Knowledge cutoff                 Training data has a date — model doesn't know what happened after
No real-time info                Can't browse the web (unless given tools)
No persistent memory             Each conversation starts fresh (unless explicitly given context)
Math errors                      Tokens aren't numbers — model "predicts" math results rather than calculating
Can't learn from conversations   Talking to it doesn't change its weights

The Hallucination Problem

You: "Who won the 2028 Olympics?"
LLM: "The 2028 Olympics were held in Los Angeles..." 
     (confidently generates plausible-sounding but potentially wrong details)

The model doesn't "know" things — it generates text that looks like it should follow your input. Sometimes what looks right IS right. Sometimes it isn't.


The Current Landscape (2025-2026)

Company      Models                            Known For
Anthropic    Claude Opus 4, Sonnet 4, Haiku    Safety, coding, long context (1M tokens)
OpenAI       GPT-4o, o1, o3                    Pioneered the field, broad capabilities
Google       Gemini 2.0, 2.5                   Multimodal, huge context (2M), integrated with Google
Meta         Llama 3, 4                        Open-source leader
Mistral      Mistral Large, Codestral          European, efficient, strong at code
DeepSeek     DeepSeek-V3, R1                   Chinese, very cost-efficient, reasoning

Key Insight for Using AI Well

LLMs are tools for augmenting your thinking, not replacing it.

  • They're best when you can verify the output
  • They're dangerous when you trust blindly
  • The better YOUR prompt, the better THEIR output
  • Treat them like a very fast, very knowledgeable junior dev — verify everything

Resources


Next: 02 - Tokens & Tokenization