# AI APIs & SDKs

## Why Use an API?
The chatbot UI (claude.ai, chatgpt.com) is great for interactive use, but building real products means talking to the model programmatically. APIs let you:
- Embed AI into your own applications
- Control every parameter (temperature, max tokens, system prompt)
- Process thousands of requests in code
- Build custom UIs, pipelines, and agents
## Core Concepts — The Messages API
Both Anthropic and OpenAI use the same fundamental pattern: you send a list of messages with roles, and the model returns a response.
### The Three Roles

| Role | Purpose | Example |
|---|---|---|
| `system` | Hidden instructions that shape behavior | "You are a senior Python developer. Be concise." |
| `user` | The human's message | "Write a function to sort a list" |
| `assistant` | The model's response (or prefilled for continuation) | "Here's a sort function..." |

Messages alternate between `user` and `assistant`, with `system` set once at the top.
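For multi-turn conversations, you resend the full history on every call, since the API itself is stateless. A minimal sketch of the alternating shape (the content strings are illustrative):

```python
# A conversation is just a list of alternating user/assistant turns.
# The full history is resent with every request: the API keeps no state.
messages = [
    {"role": "user", "content": "Write a function to sort a list"},
    {"role": "assistant", "content": "Here's a sort function..."},
    {"role": "user", "content": "Now make it sort in descending order"},
]

# Sanity check: roles alternate, starting with "user"
roles = [m["role"] for m in messages]
assert roles == ["user", "assistant", "user"]
```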
## Anthropic API (Claude)

### Python SDK

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from env

message = client.messages.create(
    model="claude-sonnet-4-6-20250514",
    max_tokens=1024,
    system="You are a helpful coding assistant. Be concise.",
    messages=[
        {"role": "user", "content": "Explain async/await in Python in 3 sentences."}
    ],
)

print(message.content[0].text)  # The response text
print(message.usage)            # Usage(input_tokens=25, output_tokens=87)
```
### TypeScript SDK

```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from env

const message = await client.messages.create({
  model: "claude-sonnet-4-6-20250514",
  max_tokens: 1024,
  system: "You are a helpful coding assistant. Be concise.",
  messages: [
    { role: "user", content: "Explain async/await in Python in 3 sentences." },
  ],
});

console.log(message.content[0].text);
```
### Key Parameters

| Parameter | Type | Purpose |
|---|---|---|
| `model` | string | Which model to use (`claude-sonnet-4-6-20250514`, `claude-opus-4-20250514`) |
| `max_tokens` | int | Maximum tokens in the response (required) |
| `temperature` | float | Randomness: 0.0 = deterministic, 1.0 = creative |
| `system` | string | System prompt — sets behavior and constraints |
| `messages` | array | Conversation history with role/content pairs |
| `tools` | array | Tool definitions for function calling (see 15 - Tool Use & Function Calling) |
| `stop_sequences` | array | Custom strings that stop generation early |
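Several of these parameters combined in one request, as a sketch (the prompt text and values are illustrative, and the commented-out call itself requires a valid API key):

```python
# Assembling key parameters for a deterministic, early-stopping request.
# temperature=0.0 favors repeatability; stop_sequences ends generation
# as soon as the model emits the marker "###".
params = {
    "model": "claude-sonnet-4-6-20250514",
    "max_tokens": 200,
    "temperature": 0.0,
    "system": "You are a terse classifier.",
    "stop_sequences": ["###"],
    "messages": [{"role": "user", "content": "Classify the sentiment: 'great product'"}],
}

# message = client.messages.create(**params)  # unpack the dict into the API call
```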
## OpenAI API (GPT)

Very similar structure, different field names:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from env

response = client.chat.completions.create(
    model="gpt-4o",
    max_completion_tokens=1024,
    messages=[
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Explain async/await in Python in 3 sentences."},
    ],
)

print(response.choices[0].message.content)
print(response.usage)  # CompletionUsage(prompt_tokens=25, completion_tokens=82, total_tokens=107)
```
## API Comparison
| Feature | Anthropic | OpenAI |
|---|---|---|
| System prompt | Top-level system field | Message with role: "system" |
| Response location | message.content[0].text | response.choices[0].message.content |
| Token limit param | max_tokens | max_completion_tokens |
| Tool calling | tools + tool_use blocks | tools + function calls |
| Streaming | SSE via .stream() | SSE via stream=True |
## Streaming — Why It Matters
Without streaming, the user stares at a blank screen for 5-30 seconds. With streaming, tokens appear as they are generated — dramatically better UX.
### How It Works: Server-Sent Events (SSE)

```python
# Anthropic streaming
with client.messages.stream(
    model="claude-sonnet-4-6-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a haiku about coding"}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
```
Each SSE event delivers a small chunk. Your frontend appends chunks as they arrive, creating the "typing" effect you see in ChatGPT and Claude.
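The client-side append-and-re-render loop, simulated here with a local list standing in for the network stream (the chunk contents are illustrative):

```python
# Each SSE event carries a small text delta; the client accumulates them.
chunks = ["Code ", "flows ", "like ", "water"]  # stand-in for stream.text_stream

buffer = []
for chunk in chunks:
    buffer.append(chunk)        # append the delta as it arrives
    rendered = "".join(buffer)  # re-render the accumulated text so far

print(rendered)  # the full text once the stream ends
```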
## Structured Output
Sometimes you need JSON, not prose. Three approaches:
### 1. Prompt-Based (Simplest)

```python
messages=[{"role": "user", "content": """
Extract the person's info as JSON:
"John Smith is 30 years old and lives in NYC"

Respond ONLY with valid JSON: {"name": ..., "age": ..., "city": ...}
"""}]
```
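The prompt-based approach still leaves parsing and validation to you, since the model can wrap the JSON in prose or emit something malformed. A minimal sketch using a hypothetical reply string:

```python
import json

# Hypothetical raw model reply; real replies may include stray text or code fences.
raw = '{"name": "John Smith", "age": 30, "city": "NYC"}'

try:
    person = json.loads(raw)
except json.JSONDecodeError:
    person = None  # fall back: retry the request or repair the string
```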
### 2. Tool Use for Structured Extraction

Define a "tool" that the model "calls" — its arguments become your structured output:

```python
tools=[{
    "name": "extract_person",
    "description": "Extract structured person info from text",
    "input_schema": {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "age": {"type": "integer"},
            "city": {"type": "string"},
        },
        "required": ["name", "age", "city"],
    },
}]
```
The model returns a tool_use block with validated JSON matching your schema. This is the most reliable approach with Claude.
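Reading the result back out: the response's content is a list of blocks, and the tool_use block's `input` field holds the schema-validated arguments. A sketch over a hand-written sample shaped like the API's response (the values are illustrative):

```python
# Sample content blocks mimicking an Anthropic response that invoked the tool.
content = [
    {"type": "text", "text": "I'll extract the person's info."},
    {
        "type": "tool_use",
        "id": "toolu_01",
        "name": "extract_person",
        "input": {"name": "John Smith", "age": 30, "city": "NYC"},
    },
]

# Pull out the first tool_use block; its "input" is your structured output.
tool_block = next(b for b in content if b["type"] == "tool_use")
person = tool_block["input"]
```

With the actual SDK the blocks are typed objects, so you would check `block.type == "tool_use"` and read `block.input` as attributes rather than dict keys.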
### 3. OpenAI JSON Mode

```python
response = client.chat.completions.create(
    model="gpt-4o",
    response_format={"type": "json_object"},
    messages=[{"role": "user", "content": "Extract person info as JSON: ..."}],
)
```
## Error Handling
| Error Code | Meaning | What to Do |
|---|---|---|
| 400 | Bad request (invalid params) | Fix the request |
| 401 | Invalid API key | Check ANTHROPIC_API_KEY |
| 429 | Rate limited | Retry with exponential backoff |
| 500 | Server error | Retry after short delay |
| 529 | Overloaded (Anthropic) | Retry with backoff |
### Retry Pattern

```python
import time

import anthropic

def call_with_retry(fn, max_retries=3):
    for attempt in range(max_retries):
        try:
            return fn()
        except anthropic.RateLimitError:
            wait = 2 ** attempt  # 1s, 2s, 4s
            time.sleep(wait)
    raise Exception("Max retries exceeded")
```
Both SDKs have built-in retry logic. The Anthropic Python SDK retries automatically on 429 and 529 errors.
## Cost Management

### Track Token Usage

Every API response includes `usage` — input tokens and output tokens. Per the pricing below, output tokens cost roughly 4-5x more than input tokens.
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| Claude Sonnet 4 | $3 | $15 |
| Claude Opus 4 | $15 | $75 |
| Claude Haiku 3.5 | $0.80 | $4 |
| GPT-4o | $2.50 | $10 |
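Turning `usage` numbers into dollars is simple arithmetic over the table above; a sketch with prices hard-coded from that table (update them if pricing changes):

```python
# Per-million-token prices (USD), copied from the pricing table above.
PRICES = {
    "claude-sonnet-4": (3.00, 15.00),   # (input, output)
    "claude-haiku-3.5": (0.80, 4.00),
    "gpt-4o": (2.50, 10.00),
}

def estimate_cost(model, input_tokens, output_tokens):
    """Estimate a single request's cost in USD from its token counts."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# e.g. a Sonnet call with 25 input and 87 output tokens
cost = estimate_cost("claude-sonnet-4", 25, 87)
```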
### Tips for Controlling Costs

- Use cheaper models for simple tasks — Haiku for classification, Sonnet for coding
- Set `max_tokens` conservatively — don't request 4096 if you need 200
- Cache system prompts — Anthropic offers prompt caching (90% discount on cached tokens)
- Batch API — send many requests at once, get results in hours, 50% cheaper
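Prompt caching works by marking a long, stable prefix (system prompt, reference documents) with a `cache_control` breakpoint so later requests reuse it. The general shape on Anthropic's API, as a sketch (the system text is illustrative):

```python
# system passed as a list of content blocks; the cache_control marker tells the
# API to cache everything up to and including this block for reuse.
system = [
    {
        "type": "text",
        "text": "You are a support agent. <long policy document here>",
        "cache_control": {"type": "ephemeral"},
    }
]

# client.messages.create(model=..., max_tokens=..., system=system, messages=[...])
```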
### Batch API (Anthropic)

```python
# Create a batch of requests — results arrive within 24 hours
batch = client.messages.batches.create(
    requests=[
        {"custom_id": "req-1", "params": {"model": "claude-sonnet-4-6-20250514", ...}},
        {"custom_id": "req-2", "params": {"model": "claude-sonnet-4-6-20250514", ...}},
        # ... hundreds or thousands of requests
    ]
)
# Poll for results later — 50% cost savings
```
## Quick Start Checklist

- Install SDK: `pip install anthropic` or `npm install @anthropic-ai/sdk`
- Set environment variable: `export ANTHROPIC_API_KEY=sk-ant-...`
- Make first API call with `messages.create()`
- Try streaming for better UX
- Monitor token usage for cost control
- Add retry logic for production systems
Previous: 13 - Open Source Models | Next: 15 - Tool Use & Function Calling