What Are LLMs (Large Language Models)

The Simple Explanation

An LLM is a next-token prediction machine. Given some text, it predicts what comes next — like autocomplete, but trained on the entire internet and absurdly good at it.

Input:  "The capital of France is"
Model:  P(Paris) = 95%, P(Lyon) = 1.5%, P(a) = 0.8%, ...
Output: "Paris"

That's it. Every conversation, every code generation, every essay — it's all next-token prediction happening thousands of times in sequence.
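
In code, that prediction step is just a softmax over scores followed by a pick. A minimal Python sketch (the logits are made-up numbers for illustration, not real model output):

```python
import math

# Hypothetical raw scores ("logits") a model might assign after seeing
# "The capital of France is" -- illustrative numbers, not from a real model.
logits = {"Paris": 6.0, "Lyon": 1.8, "a": 1.2, "the": 0.5}

# Softmax turns raw scores into a probability distribution.
total = sum(math.exp(v) for v in logits.values())
probs = {tok: math.exp(v) / total for tok, v in logits.items()}

# Greedy decoding: pick the most probable token.
next_token = max(probs, key=probs.get)
print(next_token)  # "Paris"
```

Real models repeat this loop, appending each chosen token to the input and predicting again, until the response is complete.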


How They're Built

Step 1: Architecture — The Transformer

The core innovation (Google, 2017 — "Attention Is All You Need" paper):

Input Text → Tokenize → Embeddings → [Transformer Layers × 96] → Probabilities → Next Token
                                          ↓
                                    Self-Attention:
                                    "Which words should I
                                     pay attention to when
                                     predicting the next one?"

Self-attention is the key mechanism: for each token, the model learns which OTHER tokens in the input are relevant. When processing "The cat sat on the ___", attention helps the model focus on "cat" and "sat" to predict "mat".
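
A toy version of that attention computation (single head, random weights, tiny sizes; real models stack many heads across many layers):

```python
import numpy as np

# Minimal single-head self-attention sketch with random weights.
rng = np.random.default_rng(0)
seq_len, d_model = 5, 8                     # 5 tokens, 8-dim embeddings

x = rng.normal(size=(seq_len, d_model))     # token embeddings
W_q = rng.normal(size=(d_model, d_model))   # learned projection matrices
W_k = rng.normal(size=(d_model, d_model))
W_v = rng.normal(size=(d_model, d_model))

Q, K, V = x @ W_q, x @ W_k, x @ W_v

# Scores: how relevant is each token to each other token?
scores = Q @ K.T / np.sqrt(d_model)

# Softmax each row into a distribution over which tokens to attend to.
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)

out = weights @ V                           # attention-weighted mix of values
print(weights.shape, out.shape)             # (5, 5) (5, 8)
```

Each row of `weights` sums to 1: for every token, a learned distribution over which other tokens matter for predicting what comes next.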

Step 2: Pre-Training

Feed the model trillions of tokens from the internet:

  • Books, Wikipedia, code repositories, web pages, academic papers
  • Training objective: predict the next token (self-supervised — no human labels needed)
  • Cost: $10M-$100M+ in compute (thousands of GPUs for months)
  • Result: a foundation model with broad knowledge
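
The "self-supervised" part is easy to see in code: the training targets come straight from the text itself. A sketch with a toy sentence and a hypothetical model probability:

```python
import math

# Self-supervised objective: the "label" for each position is just the
# next token in the raw text -- no human annotation required.
tokens = ["The", "cat", "sat", "on", "the", "mat"]

# Training pairs: (context so far, correct next token)
pairs = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]
print(pairs[0])  # (['The'], 'cat')

# Cross-entropy loss at one position, for a hypothetical model that
# assigns the true next token a probability of 0.25:
p_correct = 0.25
loss = -math.log(p_correct)  # lower loss = model was more confident and right
```

Run this objective over trillions of tokens and the weights gradually encode the patterns of language, code, and facts in the data.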

Step 3: Instruction Tuning (Fine-Tuning)

The raw pre-trained model just predicts text — it doesn't follow instructions well.

Fine-tune on instruction/response pairs:

Instruction: "Explain quantum computing in simple terms"
Response: "Quantum computing uses quantum bits (qubits) that can be..."

Now the model understands "when a human asks X, respond helpfully."
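
The training data for this step is just text rendered through a template. The template below is hypothetical; each model family uses its own format:

```python
# Sketch of turning instruction/response pairs into plain training text.
# This "### Instruction / ### Response" template is a made-up example --
# real models each define their own chat/instruction format.
examples = [
    {"instruction": "Explain quantum computing in simple terms",
     "response": "Quantum computing uses quantum bits (qubits) that can be..."},
]

def format_example(ex):
    return (f"### Instruction:\n{ex['instruction']}\n\n"
            f"### Response:\n{ex['response']}")

training_text = format_example(examples[0])
print(training_text.splitlines()[0])  # "### Instruction:"
```

Fine-tuning on many such formatted pairs is still next-token prediction; the model just learns that after an instruction, the likely continuation is a helpful response.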

Step 4: RLHF (Reinforcement Learning from Human Feedback)

Humans rank model outputs from best to worst. Train a reward model from rankings. Use RL to make the LLM produce outputs the reward model scores highly.

This is how models become helpful, harmless, and honest — not just good at text prediction.

Pre-trained model → knows a lot but rambles
+ Instruction tuning → follows instructions
+ RLHF → actually helpful and safe
= Claude, GPT-4, etc.
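
The reward-model step can be sketched as a pairwise preference loss: train the scorer so the output humans preferred gets the higher score. The scores below are hypothetical numbers, just to show the shape of the loss:

```python
import math

# Reward-model sketch: humans ranked output A above output B, so we want
# score(A) > score(B). These scores are hypothetical.
score_preferred = 2.0   # reward model's score for the human-preferred output
score_rejected = 0.5    # score for the output humans ranked lower

# Bradley-Terry style loss: probability the model ranks the pair correctly,
# pushed through -log. Low loss when score_preferred >> score_rejected.
margin = score_preferred - score_rejected
loss = -math.log(1 / (1 + math.exp(-margin)))
print(round(loss, 3))  # 0.201
```

The RL phase then tunes the LLM itself to produce outputs this reward model scores highly.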

What Parameters Mean

Model                  Parameters               Analogy
GPT-2                  1.5B                     Bicycle
Llama 3 8B             8B                       Car
Claude Sonnet          ~100B+ (estimated)       Airplane
GPT-4 / Claude Opus    ~400B-1T+ (estimated)    Rocket

Parameters = the learned weights in the neural network. More parameters = more capacity to store patterns and knowledge. But also: more compute, more memory, more cost.
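
The memory side of that cost is easy to estimate: just holding the weights at 2 bytes per parameter (fp16/bf16 precision) gives a quick lower bound, before activations, KV cache, or optimizer state. A back-of-envelope sketch:

```python
# Rough memory needed just to hold a model's weights in GPU memory.
# Ignores activations, KV cache, and optimizer state, which add more.
def weight_memory_gb(n_params, bytes_per_param=2):  # 2 bytes = fp16/bf16
    return n_params * bytes_per_param / 1e9

print(round(weight_memory_gb(8e9), 1))    # Llama 3 8B in fp16: ~16.0 GB
print(round(weight_memory_gb(1.5e9), 1))  # GPT-2 1.5B in fp16: ~3.0 GB
```

This is why an 8B model runs on a single consumer GPU while frontier-scale models need clusters of them.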


What LLMs Can and Can't Do

Can Do Well

  • Generate fluent, coherent text
  • Understand and follow complex instructions
  • Write, debug, and explain code
  • Translate between languages
  • Summarize documents
  • Reason through problems (with chain-of-thought)
  • Roleplay, brainstorm, creative writing

Can't Do (Fundamental Limits)

Limitation                       Why
Hallucination                    Model is confident about things it's wrong about — it generates "plausible" not "true"
Knowledge cutoff                 Training data has a date — model doesn't know what happened after
No real-time info                Can't browse the web (unless given tools)
No persistent memory             Each conversation starts fresh (unless explicitly given context)
Math errors                      Tokens aren't numbers — model "predicts" math results rather than calculating
Can't learn from conversations   Talking to it doesn't change its weights

The Hallucination Problem

You: "Who won the 2028 Olympics?"
LLM: "The 2028 Olympics were held in Los Angeles..." 
     (confidently generates plausible-sounding but potentially wrong details)

The model doesn't "know" things — it generates text that looks like it should follow your input. Sometimes what looks right IS right. Sometimes it isn't.


The Current Landscape (2025-2026)

Company      Models                            Known For
Anthropic    Claude Opus 4, Sonnet 4, Haiku    Safety, coding, long context (1M tokens)
OpenAI       GPT-4o, o1, o3                    Pioneered the field, broad capabilities
Google       Gemini 2.0, 2.5                   Multimodal, huge context (2M), integrated with Google
Meta         Llama 3, 4                        Open-source leader
Mistral      Mistral Large, Codestral          European, efficient, strong at code
DeepSeek     DeepSeek-V3, R1                   Chinese, very cost-efficient, reasoning

Key Insight for Using AI Well

LLMs are tools for augmenting your thinking, not replacing it.

  • They're best when you can verify the output
  • They're dangerous when you trust blindly
  • The better YOUR prompt, the better THEIR output
  • Treat them like a very fast, very knowledgeable junior dev — verify everything

Resources


Next: 02 - Tokens & Tokenization