# AI APIs & SDKs

## Why Use an API?
The chatbot UI (claude.ai, chatgpt.com) is great for interactive use, but building real products means talking to the model programmatically. APIs let you:
- Embed AI into your own applications
- Control every parameter (temperature, max tokens, system prompt)
- Process thousands of requests in code
- Build custom UIs, pipelines, and agents
## Core Concepts — The Messages API
Both Anthropic and OpenAI use the same fundamental pattern: you send a list of messages with roles, and the model returns a response.
### The Three Roles

| Role | Purpose | Example |
|---|---|---|
| `system` | Hidden instructions that shape behavior | "You are a senior Python developer. Be concise." |
| `user` | The human's message | "Write a function to sort a list" |
| `assistant` | The model's response (or prefilled for continuation) | "Here's a sort function..." |

Messages alternate between `user` and `assistant`, with `system` set once at the top.
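For multi-turn conversations, you resend the full history on every call, since the API itself is stateless. A minimal sketch of the alternating shape (the content strings are illustrative):

```python
# A conversation is just a list of alternating user/assistant turns.
# The full history is resent with every request: the API keeps no state.
messages = [
    {"role": "user", "content": "Write a function to sort a list"},
    {"role": "assistant", "content": "Here's a sort function..."},
    {"role": "user", "content": "Now make it sort in descending order"},
]

# Sanity check: roles alternate, starting with "user"
roles = [m["role"] for m in messages]
assert roles == ["user", "assistant", "user"]
```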
## Anthropic API (Claude)

### Python SDK

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from env

message = client.messages.create(
    model="claude-sonnet-4-6-20250514",
    max_tokens=1024,
    system="You are a helpful coding assistant. Be concise.",
    messages=[
        {"role": "user", "content": "Explain async/await in Python in 3 sentences."}
    ],
)

print(message.content[0].text)  # The response text
print(message.usage)            # Usage(input_tokens=25, output_tokens=87)
```
### TypeScript SDK

```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from env

const message = await client.messages.create({
  model: "claude-sonnet-4-6-20250514",
  max_tokens: 1024,
  system: "You are a helpful coding assistant. Be concise.",
  messages: [
    { role: "user", content: "Explain async/await in Python in 3 sentences." },
  ],
});

console.log(message.content[0].text);
```
### Key Parameters

| Parameter | Type | Purpose |
|---|---|---|
| `model` | string | Which model to use (`claude-sonnet-4-6-20250514`, `claude-opus-4-20250514`) |
| `max_tokens` | int | Maximum tokens in the response (required) |
| `temperature` | float | Randomness: 0.0 = deterministic, 1.0 = creative |
| `system` | string | System prompt — sets behavior and constraints |
| `messages` | array | Conversation history with role/content pairs |
| `tools` | array | Tool definitions for function calling (see 15 - Tool Use & Function Calling) |
| `stop_sequences` | array | Custom strings that stop generation early |
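Several of these parameters combined in one request, as a sketch (the prompt text and values are illustrative, and the commented-out call itself requires a valid API key):

```python
# Assembling key parameters for a deterministic, early-stopping request.
# temperature=0.0 favors repeatability; stop_sequences ends generation
# as soon as the model emits the marker "###".
params = {
    "model": "claude-sonnet-4-6-20250514",
    "max_tokens": 200,
    "temperature": 0.0,
    "system": "You are a terse classifier.",
    "stop_sequences": ["###"],
    "messages": [{"role": "user", "content": "Classify the sentiment: 'great product'"}],
}

# message = client.messages.create(**params)  # unpack the dict into the API call
```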
## OpenAI API (GPT)

Very similar structure, different field names:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from env

response = client.chat.completions.create(
    model="gpt-4o",
    max_completion_tokens=1024,
    messages=[
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Explain async/await in Python in 3 sentences."},
    ],
)

print(response.choices[0].message.content)
print(response.usage)  # CompletionUsage(prompt_tokens=25, completion_tokens=82, total_tokens=107)
```
## API Comparison
| Feature | Anthropic | OpenAI |
|---|---|---|
| System prompt | Top-level system field | Message with role: "system" |
| Response location | message.content[0].text | response.choices[0].message.content |
| Token limit param | max_tokens | max_completion_tokens |
| Tool calling | tools + tool_use blocks | tools + function calls |
| Streaming | SSE via .stream() | SSE via stream=True |
## Streaming — Why It Matters
Without streaming, the user stares at a blank screen for 5-30 seconds. With streaming, tokens appear as they are generated — dramatically better UX.
### How It Works: Server-Sent Events (SSE)

```python
# Anthropic streaming
with client.messages.stream(
    model="claude-sonnet-4-6-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a haiku about coding"}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
```
Each SSE event delivers a small chunk. Your frontend appends chunks as they arrive, creating the "typing" effect you see in ChatGPT and Claude.
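The client-side append-and-re-render loop, simulated here with a local list standing in for the network stream (the chunk contents are illustrative):

```python
# Each SSE event carries a small text delta; the client accumulates them.
chunks = ["Code ", "flows ", "like ", "water"]  # stand-in for stream.text_stream

buffer = []
for chunk in chunks:
    buffer.append(chunk)        # append the delta as it arrives
    rendered = "".join(buffer)  # re-render the accumulated text so far

print(rendered)  # the full text once the stream ends
```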
## Structured Output
Sometimes you need JSON, not prose. Three approaches:
### 1. Prompt-Based (Simplest)

```python
messages=[{"role": "user", "content": """
Extract the person's info as JSON:
"John Smith is 30 years old and lives in NYC"

Respond ONLY with valid JSON: {"name": ..., "age": ..., "city": ...}
"""}]
```
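The prompt-based approach still leaves parsing and validation to you, since the model can wrap the JSON in prose or emit something malformed. A minimal sketch using a hypothetical reply string:

```python
import json

# Hypothetical raw model reply; real replies may include stray text or code fences.
raw = '{"name": "John Smith", "age": 30, "city": "NYC"}'

try:
    person = json.loads(raw)
except json.JSONDecodeError:
    person = None  # fall back: retry the request or repair the string
```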
### 2. Tool Use for Structured Extraction

Define a "tool" that the model "calls" — its arguments become your structured output:

```python
tools=[{
    "name": "extract_person",
    "description": "Extract structured person info from text",
    "input_schema": {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "age": {"type": "integer"},
            "city": {"type": "string"},
        },
        "required": ["name", "age", "city"],
    },
}]
```
The model returns a tool_use block with validated JSON matching your schema. This is the most reliable approach with Claude.
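Reading the result back out: the response's content is a list of blocks, and the tool_use block's `input` field holds the schema-validated arguments. A sketch over a hand-written sample shaped like the API's response (the values are illustrative):

```python
# Sample content blocks mimicking an Anthropic response that invoked the tool.
content = [
    {"type": "text", "text": "I'll extract the person's info."},
    {
        "type": "tool_use",
        "id": "toolu_01",
        "name": "extract_person",
        "input": {"name": "John Smith", "age": 30, "city": "NYC"},
    },
]

# Pull out the first tool_use block; its "input" is your structured output.
tool_block = next(b for b in content if b["type"] == "tool_use")
person = tool_block["input"]
```

With the actual SDK the blocks are typed objects, so you would check `block.type == "tool_use"` and read `block.input` as attributes rather than dict keys.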
### 3. OpenAI JSON Mode

```python
response = client.chat.completions.create(
    model="gpt-4o",
    response_format={"type": "json_object"},
    messages=[{"role": "user", "content": "Extract person info as JSON: ..."}],
)
```
## Error Handling
| Error Code | Meaning | What to Do |
|---|---|---|
| 400 | Bad request (invalid params) | Fix the request |
| 401 | Invalid API key | Check ANTHROPIC_API_KEY |
| 429 | Rate limited | Retry with exponential backoff |
| 500 | Server error | Retry after short delay |
| 529 | Overloaded (Anthropic) | Retry with backoff |
### Retry Pattern

```python
import time

import anthropic

def call_with_retry(fn, max_retries=3):
    for attempt in range(max_retries):
        try:
            return fn()
        except anthropic.RateLimitError:
            wait = 2 ** attempt  # 1s, 2s, 4s
            time.sleep(wait)
    raise Exception("Max retries exceeded")
```
Both SDKs have built-in retry logic. The Anthropic Python SDK retries automatically on 429 and 529 errors.
## Cost Management

### Track Token Usage

Every API response includes `usage` — input tokens and output tokens. Per the pricing below, output tokens cost roughly 4-5x more than input tokens.
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| Claude Sonnet 4 | $3 | $15 |
| Claude Opus 4 | $15 | $75 |
| Claude Haiku 3.5 | $0.80 | $4 |
| GPT-4o | $2.50 | $10 |
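Turning `usage` numbers into dollars is simple arithmetic over the table above; a sketch with prices hard-coded from that table (update them if pricing changes):

```python
# Per-million-token prices (USD), copied from the pricing table above.
PRICES = {
    "claude-sonnet-4": (3.00, 15.00),   # (input, output)
    "claude-haiku-3.5": (0.80, 4.00),
    "gpt-4o": (2.50, 10.00),
}

def estimate_cost(model, input_tokens, output_tokens):
    """Estimate a single request's cost in USD from its token counts."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# e.g. a Sonnet call with 25 input and 87 output tokens
cost = estimate_cost("claude-sonnet-4", 25, 87)
```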
### Tips for Controlling Costs

- Use cheaper models for simple tasks — Haiku for classification, Sonnet for coding
- Set `max_tokens` conservatively — don't request 4096 if you need 200
- Cache system prompts — Anthropic offers prompt caching (90% discount on cached tokens)
- Batch API — send many requests at once, get results in hours, 50% cheaper
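Prompt caching works by marking a long, stable prefix (system prompt, reference documents) with a `cache_control` breakpoint so later requests reuse it. The general shape on Anthropic's API, as a sketch (the system text is illustrative):

```python
# system passed as a list of content blocks; the cache_control marker tells the
# API to cache everything up to and including this block for reuse.
system = [
    {
        "type": "text",
        "text": "You are a support agent. <long policy document here>",
        "cache_control": {"type": "ephemeral"},
    }
]

# client.messages.create(model=..., max_tokens=..., system=system, messages=[...])
```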
### Batch API (Anthropic)

```python
# Create a batch of requests — results arrive within 24 hours
batch = client.messages.batches.create(
    requests=[
        {"custom_id": "req-1", "params": {"model": "claude-sonnet-4-6-20250514", ...}},
        {"custom_id": "req-2", "params": {"model": "claude-sonnet-4-6-20250514", ...}},
        # ... hundreds or thousands of requests
    ]
)
# Poll for results later — 50% cost savings
```
## Quick Start Checklist

- Install SDK: `pip install anthropic` or `npm install @anthropic-ai/sdk`
- Set environment variable: `export ANTHROPIC_API_KEY=sk-ant-...`
- Make first API call with `messages.create()`
- Try streaming for better UX
- Monitor token usage for cost control
- Add retry logic for production systems
Previous: 13 - Open Source Models | Next: 15 - Tool Use & Function Calling