Advanced Prompting Techniques
Beyond the Basics
You know zero-shot, few-shot, and chain-of-thought from 07 - Prompt Engineering Fundamentals. These advanced techniques handle harder problems: multi-step reasoning, reliability at scale, and structured automation.
1. ReAct (Reason + Act)
What: The model alternates between reasoning about the problem and taking actions (tool calls, searches). It forms a loop: Think → Act → Observe → Think → Act → ...
Question: "What's the population of the capital of the country that
won the 2022 FIFA World Cup?"
Thought 1: I need to find who won the 2022 World Cup.
Action 1: search("2022 FIFA World Cup winner")
Observation 1: Argentina won the 2022 FIFA World Cup.
Thought 2: The capital of Argentina is Buenos Aires. I need its population.
Action 2: search("Buenos Aires population 2024")
Observation 2: Buenos Aires metro area population is ~15.5 million.
Thought 3: I have the answer.
Answer: Approximately 15.5 million people.
Why it matters: This is how AI agents work. Claude Code, ChatGPT with tools, and most agentic systems use ReAct-style loops internally.
When to use: Multi-step tasks requiring external data, tool use, or iterative problem-solving.
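The Think → Act → Observe loop above can be sketched as a plain Python loop. This is an illustrative sketch, not a production agent: `call_model` is stubbed with scripted turns (a real agent would call an LLM API each iteration), and `search` returns canned results instead of hitting a search engine.

```python
import re

def search(query):
    # Stubbed tool: a real agent would call a search API here.
    canned = {
        "2022 FIFA World Cup winner": "Argentina won the 2022 FIFA World Cup.",
        "Buenos Aires population 2024": "Buenos Aires metro area population is ~15.5 million.",
    }
    return canned.get(query, "No results.")

TOOLS = {"search": search}

# Scripted model turns stand in for real LLM calls so the loop runs offline.
SCRIPTED_TURNS = iter([
    'Thought: I need the 2022 World Cup winner.\nAction: search("2022 FIFA World Cup winner")',
    'Thought: The capital of Argentina is Buenos Aires; I need its population.\nAction: search("Buenos Aires population 2024")',
    "Thought: I have the answer.\nAnswer: Approximately 15.5 million people.",
])

def call_model(transcript):
    return next(SCRIPTED_TURNS)

def react_loop(question, max_steps=5):
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        turn = call_model(transcript)
        transcript += turn + "\n"
        done = re.search(r"Answer:\s*(.+)", turn)
        if done:  # the model decided it has enough information
            return done.group(1)
        action = re.search(r'Action:\s*(\w+)\("(.+)"\)', turn)
        if action:  # execute the tool, feed the observation back into the transcript
            tool, arg = action.groups()
            transcript += f"Observation: {TOOLS[tool](arg)}\n"
    return None

answer = react_loop("What's the population of the capital of the 2022 World Cup winner?")
```

The key design point: the transcript accumulates every thought, action, and observation, so each model call sees the full history of the loop so far.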
2. Tree-of-Thought (ToT)
What: Instead of one reasoning path, explore multiple paths simultaneously and pick the best one. Like a chess player considering several moves ahead.
Problem: "How should we migrate our monolith to microservices?"
Path A: Strangler Fig Pattern
→ Low risk, gradual
→ Takes 12-18 months
→ Score: 8/10
Path B: Big Bang Rewrite
→ High risk, clean slate
→ Takes 6-9 months
→ Score: 4/10
Path C: Domain-Driven Decomposition
→ Medium risk, strategic
→ Takes 9-12 months
→ Score: 7/10
Best path: A (Strangler Fig) — lowest risk, proven approach.
How to trigger it in a prompt:
"Consider 3 different approaches to solve this problem.
For each approach:
1. Describe the strategy
2. List pros and cons
3. Rate feasibility (1-10)
Then pick the best approach and explain why."
When to use: Complex decisions with trade-offs, architecture choices, debugging when the root cause is unclear.
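The explore-and-score pattern behind the prompt above can be sketched as: propose several candidate paths, score each one, keep the best. Both `propose_paths` and `score_path` are stubbed here with the migration example's values; a real system would make one model call per branch plus a judging call.

```python
def propose_paths(problem):
    # Stubbed: a real system would ask the model for N distinct strategies.
    return ["Strangler Fig Pattern", "Big Bang Rewrite", "Domain-Driven Decomposition"]

def score_path(problem, path):
    # Stubbed judge: a real system would ask the model to rate feasibility 1-10.
    scores = {"Strangler Fig Pattern": 8, "Big Bang Rewrite": 4, "Domain-Driven Decomposition": 7}
    return scores[path]

def tree_of_thought(problem):
    # Score every branch before committing, instead of following one chain of thought.
    paths = propose_paths(problem)
    best = max(paths, key=lambda p: score_path(problem, p))
    return best, score_path(problem, best)

best_path, best_score = tree_of_thought("How should we migrate our monolith to microservices?")
```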
3. Self-Consistency (Majority Voting)
What: Ask the same question multiple times (with temperature > 0), then pick the most common answer. Reduces random errors.
Question: "Is this code thread-safe?"
Run 1: "No — the shared counter has no lock" → NO
Run 2: "No — race condition on line 12" → NO
Run 3: "Yes — the GIL protects it" → YES
Run 4: "No — multiple threads can increment" → NO
Run 5: "No — needs a mutex or atomic operation" → NO
Majority vote: NO (4/5) → Final answer: Not thread-safe.
Implementation pattern (API):
```python
import collections

answers = []
for _ in range(5):
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,  # required by the Messages API
        temperature=0.7,
        messages=[{"role": "user", "content": prompt}],
    )
    answers.append(extract_answer(response))

final = collections.Counter(answers).most_common(1)[0][0]
```
When to use: High-stakes classification, math problems, any task where you need confidence in the answer. Trade-off: costs N times more tokens.
4. Structured Output
What: Force the model to respond in a specific machine-readable format (JSON, XML, YAML).
In the prompt:
"Analyze this error log and respond in this exact JSON format:
{
"error_type": "string",
"root_cause": "string",
"severity": "low | medium | high | critical",
"suggested_fix": "string",
"affected_files": ["string"]
}"
Via API (JSON mode):
```python
# OpenAI (JSON mode)
response = client.chat.completions.create(
    model="gpt-4o",
    response_format={"type": "json_object"},
    messages=[...],
)

# Anthropic (tool use for structured output)
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,  # required by the Messages API
    tools=[{
        "name": "analyze_error",
        "input_schema": {
            "type": "object",
            "properties": {
                "error_type": {"type": "string"},
                "severity": {"enum": ["low", "medium", "high", "critical"]},
            },
            "required": ["error_type", "severity"],
        },
    }],
    tool_choice={"type": "tool", "name": "analyze_error"},
    messages=[...],
)
```
When to use: Any time your code needs to parse the model's output. Always prefer structured output over regex-parsing free text.
5. Constitutional AI (Self-Critique)
What: The model generates an answer, then critiques it against a set of principles, then revises.
Step 1 — Generate:
"Here's a function to hash passwords: md5(password)"
Step 2 — Critique:
"This uses MD5 which is cryptographically broken.
It doesn't use salt. It's vulnerable to rainbow tables."
Step 3 — Revise:
"Use bcrypt with a work factor of 12:
bcrypt.hashpw(password.encode(), bcrypt.gensalt(rounds=12))"
How to trigger it:
"Answer the question below. Then critique your own answer for:
- Factual accuracy
- Security implications
- Missing edge cases
Finally, provide a revised answer addressing the critique."
When to use: Security-sensitive code, medical/legal content, any domain where errors are costly.
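The generate → critique → revise pipeline can be sketched as three chained model calls. `call_model` is stubbed with canned replies mirroring the password-hashing example above so the sketch runs offline; a real version would call an LLM API.

```python
PRINCIPLES = ["Factual accuracy", "Security implications", "Missing edge cases"]

def call_model(prompt):
    # Stubbed with canned replies so the sketch runs offline; swap in a real API call.
    if "revise" in prompt.lower():
        return "Use bcrypt: bcrypt.hashpw(password.encode(), bcrypt.gensalt(rounds=12))"
    if "critique" in prompt.lower():
        return "MD5 is cryptographically broken, unsalted, and vulnerable to rainbow tables."
    return "Here's a function to hash passwords: md5(password)"

def self_critique(question):
    draft = call_model(question)
    critique = call_model(
        f"Critique this answer against these principles: {', '.join(PRINCIPLES)}\n\n"
        f"Answer: {draft}"
    )
    revised = call_model(
        f"Revise the answer to address the critique.\n\nAnswer: {draft}\nCritique: {critique}"
    )
    return revised

final = self_critique("How should I hash passwords?")
```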
6. Meta-Prompting
What: Use AI to write or improve your prompts.
Prompt to the model:
"I need a prompt that will make an LLM consistently classify
customer support emails into these categories: billing, technical,
account, feedback, spam.
Write me an optimized prompt with:
- A clear role
- 5 few-shot examples
- Explicit output format
- Edge case handling"
Output: (a well-structured prompt you can use directly)
Practical workflow:
- Write your first attempt at a prompt
- Ask the model: "How can this prompt be improved? What ambiguities exist?"
- Apply the suggestions
- Test on edge cases
- Iterate
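The workflow above can be automated as a short loop: feed the current draft back to the model and ask for a critique-plus-rewrite each round. The lambda below is a stand-in model that just tags each revision; a real version would call an LLM API.

```python
def improve_prompt(draft, call_model, rounds=2):
    """Iteratively ask the model to rewrite a prompt, addressing ambiguities."""
    prompt = draft
    for _ in range(rounds):
        prompt = call_model(
            "How can this prompt be improved? What ambiguities exist? "
            f"Output only the rewritten prompt:\n\n{prompt}"
        )
    return prompt

# Stubbed model: appends a marker to the latest draft; swap in a real API call.
improved = improve_prompt(
    "Classify customer support emails.",
    lambda p: p.rsplit("\n\n", 1)[-1] + " [revised]",
)
```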
When to use: When building production prompts, when your prompt gives inconsistent results, when you're stuck on phrasing.
7. Prompt Chaining
What: Break a complex task into sequential steps. The output of one prompt feeds into the next.
Task: "Generate a technical blog post about WebSockets"
Chain:
┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐
│ Step 1: Outline │ ──→ │ Step 2: Draft │ ──→ │ Step 3: Review │
│ Generate 5-point │ │ Write each │ │ Check accuracy, │
│ outline │ │ section from │ │ improve examples │
│ │ │ the outline │ │ │
└──────────────────┘ └──────────────────┘ └──────────────────┘
Real engineering example:
Step 1: "Analyze this error log and identify the root cause."
→ Output: "Database connection pool exhausted due to leaked connections"
Step 2: "Given this root cause: [Step 1 output]. Find the code responsible.
Here's the codebase: [relevant files]"
→ Output: "db.py line 45: connection acquired but never released in error path"
Step 3: "Fix this bug: [Step 2 output]. Write the corrected code with
proper connection cleanup using context managers."
→ Output: (fixed code)
Why chaining beats one mega-prompt:
- Each step is simpler and more reliable
- You can inspect and verify intermediate results
- You can retry a single step without re-running everything
- Overall quality is higher than with one complex mega-prompt
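A chain is just a loop that threads each output into the next prompt template. This sketch keeps every intermediate result so individual steps can be inspected or retried; the model is stubbed with canned replies mirroring the debugging example above.

```python
def run_chain(steps, initial_input, call_model):
    """Run prompt templates in sequence; each receives the previous output as {prev}."""
    outputs = []
    prev = initial_input
    for template in steps:
        prev = call_model(template.format(prev=prev))
        outputs.append(prev)  # keep intermediates so any step can be inspected or retried
    return outputs

STEPS = [
    "Analyze this error log and identify the root cause:\n{prev}",
    "Given this root cause: {prev}. Find the code responsible.",
    "Fix this bug: {prev}. Write the corrected code with proper connection cleanup.",
]

# Canned replies stand in for real model calls so the sketch runs offline.
CANNED = iter([
    "Database connection pool exhausted due to leaked connections",
    "db.py line 45: connection acquired but never released in error path",
    "Wrap acquisition in a context manager so the connection is always released.",
])
results = run_chain(STEPS, "ERROR: TimeoutError acquiring connection", lambda prompt: next(CANNED))
```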
8. Multi-Turn Refinement
What: Iteratively improve output through conversation turns.
Turn 1: "Write a Python function to parse CSV files."
→ (basic function)
Turn 2: "Good, but add error handling for malformed rows
and support for custom delimiters."
→ (improved function)
Turn 3: "Now add type hints, a docstring, and make it
a generator for memory efficiency."
→ (production-ready function)
Strategy tips:
- Start broad, then narrow down
- Each turn should address 1-2 specific improvements
- Reference what was good: "Keep the error handling, but also..."
- If the model goes off track, give it the correct version and say "continue from here"
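Programmatically, multi-turn refinement just means resending the growing messages list each turn. `call_model` is stubbed here (it echoes the latest request); a real version would pass `messages` to a chat API.

```python
def call_model(messages):
    # Stubbed: a real implementation would send the full `messages` list to a chat API.
    return f"(model reply to: {messages[-1]['content']})"

def refine(initial_request, refinements):
    messages = [{"role": "user", "content": initial_request}]
    messages.append({"role": "assistant", "content": call_model(messages)})
    for followup in refinements:  # each turn targets 1-2 specific improvements
        messages.append({"role": "user", "content": followup})
        messages.append({"role": "assistant", "content": call_model(messages)})
    return messages[-1]["content"], messages

final, history = refine(
    "Write a Python function to parse CSV files.",
    [
        "Good, but add error handling for malformed rows and custom delimiters.",
        "Now add type hints, a docstring, and make it a generator.",
    ],
)
```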
Decision Table: When to Use Which Technique
| Technique | Best For | Complexity | Token Cost |
|---|---|---|---|
| Zero-shot | Simple tasks | Low | Low |
| Few-shot | Custom formats, classification | Low | Medium |
| Chain-of-thought | Math, logic, reasoning | Low | Medium |
| ReAct | Multi-step tasks needing tools | High | High |
| Tree-of-thought | Complex decisions with trade-offs | Medium | High |
| Self-consistency | High-stakes, need confidence | Low | Very High (Nx) |
| Structured output | Machine-readable responses | Low | Low |
| Self-critique | Safety-critical, accuracy-critical | Medium | Medium |
| Meta-prompting | Building production prompts | Low | Medium |
| Prompt chaining | Complex multi-step workflows | Medium | Medium-High |
| Multi-turn | Iterative refinement | Low | Medium |
Combining Techniques
Real-world prompts often combine several techniques:
Role prompting + Few-shot + Chain-of-thought + Structured output:
"You are a senior security engineer. (ROLE)
Analyze code for vulnerabilities. For each finding,
think through the attack vector step by step. (CoT)
Example: (FEW-SHOT)
Code: query = f"SELECT * FROM users WHERE id = {input}"
Analysis: Step 1: User controls 'input' variable.
Step 2: Input is interpolated directly into SQL.
Step 3: Attacker can inject: 1 OR 1=1
Finding: {"vuln": "SQL Injection", "severity": "critical",
"line": 1, "fix": "Use parameterized query"}
Now analyze: (STRUCTURED)
[your code here]
Respond as a JSON array of findings."
Practical Tips
- Start simple. Zero-shot first. Add complexity only when needed.
- Test with adversarial inputs. Not just happy paths.
- Version control your prompts. They're code. Treat them as such.
- Measure results. Run the same prompt 10 times. How consistent is it?
- Budget your tokens. Chain-of-thought and self-consistency cost more. Worth it for hard tasks, wasteful for easy ones.
- Document what works. Build a prompt library for your team.
Resources
- 📄 ReAct: Synergizing Reasoning and Acting (Yao et al., 2022)
- 📄 Tree of Thoughts (Yao et al., 2023)
- 📄 Self-Consistency (Wang et al., 2022)
- 📄 Constitutional AI (Bai et al., 2022)
- 🔗 Anthropic — Tool Use
- 🔗 OpenAI — Structured Outputs
Previous: 07 - Prompt Engineering Fundamentals | Next: 09 - System Prompts & Instructions