AI APIs & SDKs

Why Use an API?

The chatbot UI (claude.ai, chatgpt.com) is great for interactive use, but building real products means talking to the model programmatically. APIs let you:

  • Embed AI into your own applications
  • Control every parameter (temperature, max tokens, system prompt)
  • Process thousands of requests in code
  • Build custom UIs, pipelines, and agents

Core Concepts — The Messages API

Both Anthropic and OpenAI use the same fundamental pattern: you send a list of messages with roles, and the model returns a response.

The Three Roles

| Role | Purpose | Example |
|------|---------|---------|
| `system` | Hidden instructions that shape behavior | "You are a senior Python developer. Be concise." |
| `user` | The human's message | "Write a function to sort a list" |
| `assistant` | The model's response (or prefilled for continuation) | "Here's a sort function..." |

Messages alternate between user and assistant, with system set once at the top.
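For example, a two-turn exchange might be built like this (the content strings are illustrative; note that the API is stateless, so the full history is re-sent on every request):

```python
# System prompt is set once; messages alternate user/assistant, starting with user.
system = "You are a senior Python developer. Be concise."

messages = [
    {"role": "user", "content": "Write a function to sort a list"},
    {"role": "assistant", "content": "Here's a sort function: sorted(items)"},
    {"role": "user", "content": "Now sort it in reverse"},  # follow-up turn
]

roles = [m["role"] for m in messages]
assert roles == ["user", "assistant", "user"]  # alternation holds
```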


Anthropic API (Claude)

Python SDK

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from env

message = client.messages.create(
    model="claude-sonnet-4-6-20250514",
    max_tokens=1024,
    system="You are a helpful coding assistant. Be concise.",
    messages=[
        {"role": "user", "content": "Explain async/await in Python in 3 sentences."}
    ],
)

print(message.content[0].text)  # The response text
print(message.usage)            # Usage(input_tokens=25, output_tokens=87)
```

TypeScript SDK

```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from env

const message = await client.messages.create({
  model: "claude-sonnet-4-6-20250514",
  max_tokens: 1024,
  system: "You are a helpful coding assistant. Be concise.",
  messages: [
    { role: "user", content: "Explain async/await in Python in 3 sentences." },
  ],
});

console.log(message.content[0].text);
```

Key Parameters

| Parameter | Type | Purpose |
|-----------|------|---------|
| `model` | string | Which model to use (`claude-sonnet-4-6-20250514`, `claude-opus-4-20250514`) |
| `max_tokens` | int | Maximum tokens in the response (required) |
| `temperature` | float | Randomness: 0.0 = deterministic, 1.0 = creative |
| `system` | string | System prompt that sets behavior and constraints |
| `messages` | array | Conversation history with role/content pairs |
| `tools` | array | Tool definitions for function calling (see 15 - Tool Use & Function Calling) |
| `stop_sequences` | array | Custom strings that stop generation early |

OpenAI API (GPT)

Very similar structure, different field names:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from env

response = client.chat.completions.create(
    model="gpt-4o",
    max_completion_tokens=1024,
    messages=[
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Explain async/await in Python in 3 sentences."},
    ],
)

print(response.choices[0].message.content)
print(response.usage)  # CompletionUsage(prompt_tokens=25, completion_tokens=82, total_tokens=107)
```

API Comparison

| Feature | Anthropic | OpenAI |
|---------|-----------|--------|
| System prompt | Top-level `system` field | Message with `role: "system"` |
| Response location | `message.content[0].text` | `response.choices[0].message.content` |
| Token limit param | `max_tokens` | `max_completion_tokens` |
| Tool calling | `tools` + `tool_use` blocks | `tools` + function calls |
| Streaming | SSE via `.stream()` | SSE via `stream=True` |

Streaming — Why It Matters

Without streaming, the user stares at a blank screen for 5-30 seconds. With streaming, tokens appear as they are generated — dramatically better UX.

How It Works: Server-Sent Events (SSE)

```python
# Anthropic streaming
with client.messages.stream(
    model="claude-sonnet-4-6-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a haiku about coding"}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
```

Each SSE event delivers a small chunk. Your frontend appends chunks as they arrive, creating the "typing" effect you see in ChatGPT and Claude.
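Under the hood, each event arrives as an `event:`/`data:` line pair. A simplified, illustrative parser (the event shape below mirrors Anthropic's `content_block_delta` events, but the SDK normally handles this for you):

```python
import json

# Two SSE events as they might appear on the wire (simplified).
raw = (
    'event: content_block_delta\n'
    'data: {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "Hello"}}\n\n'
    'event: content_block_delta\n'
    'data: {"type": "content_block_delta", "delta": {"type": "text_delta", "text": " world"}}\n\n'
)

text = ""
for event_block in raw.strip().split("\n\n"):   # events are blank-line separated
    for line in event_block.split("\n"):
        if line.startswith("data: "):
            event = json.loads(line[len("data: "):])
            if event.get("delta", {}).get("type") == "text_delta":
                text += event["delta"]["text"]  # append the chunk, like a frontend would

print(text)  # Hello world
```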


Structured Output

Sometimes you need JSON, not prose. Three approaches:

1. Prompt-Based (Simplest)

```python
messages=[{"role": "user", "content": """
Extract the person's info as JSON:
"John Smith is 30 years old and lives in NYC"
Respond ONLY with valid JSON: {"name": ..., "age": ..., "city": ...}
"""}]
```
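One caveat with the prompt-based approach: models sometimes wrap the JSON in markdown fences anyway, so parse defensively. A common-practice sketch (the helper name is ours):

```python
import json

def parse_json_reply(text: str) -> dict:
    """Parse a model reply as JSON, stripping markdown fences if present."""
    text = text.strip()
    if text.startswith("```"):
        text = text.split("\n", 1)[1]    # drop the opening ```json line
        text = text.rsplit("```", 1)[0]  # drop the closing fence
    return json.loads(text)

reply = '```json\n{"name": "John Smith", "age": 30, "city": "NYC"}\n```'
person = parse_json_reply(reply)
assert person["age"] == 30
```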

2. Tool Use for Structured Extraction

Define a "tool" that the model "calls" — its arguments become your structured output:

```python
tools=[{
    "name": "extract_person",
    "description": "Extract structured person info from text",
    "input_schema": {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "age": {"type": "integer"},
            "city": {"type": "string"},
        },
        "required": ["name", "age", "city"],
    },
}]
```

The model returns a tool_use block with validated JSON matching your schema. This is the most reliable approach with Claude.
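Reading the arguments back means finding the `tool_use` block in `response.content`. A sketch, with `SimpleNamespace` standing in for the SDK's typed blocks:

```python
from types import SimpleNamespace

def extract_tool_input(content_blocks, tool_name):
    # Return the arguments of the first tool_use block for the given tool.
    for block in content_blocks:
        if block.type == "tool_use" and block.name == tool_name:
            return block.input
    raise ValueError(f"model did not call {tool_name}")

# Simulated response.content: a text block followed by the tool call.
fake_content = [
    SimpleNamespace(type="text", text="Extracting the person's info."),
    SimpleNamespace(type="tool_use", name="extract_person",
                    input={"name": "John Smith", "age": 30, "city": "NYC"}),
]

person = extract_tool_input(fake_content, "extract_person")
assert person == {"name": "John Smith", "age": 30, "city": "NYC"}
```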

3. OpenAI JSON Mode

```python
response = client.chat.completions.create(
    model="gpt-4o",
    response_format={"type": "json_object"},
    messages=[{"role": "user", "content": "Extract person info as JSON: ..."}],
)
```

Error Handling

| Error Code | Meaning | What to Do |
|------------|---------|------------|
| 400 | Bad request (invalid params) | Fix the request |
| 401 | Invalid API key | Check `ANTHROPIC_API_KEY` |
| 429 | Rate limited | Retry with exponential backoff |
| 500 | Server error | Retry after short delay |
| 529 | Overloaded (Anthropic) | Retry with backoff |

Retry Pattern

```python
import time

import anthropic

def call_with_retry(fn, max_retries=3):
    for attempt in range(max_retries):
        try:
            return fn()
        except anthropic.RateLimitError:
            wait = 2 ** attempt  # 1s, 2s, 4s
            time.sleep(wait)
    raise Exception("Max retries exceeded")
```

Both SDKs have built-in retry logic. The Anthropic Python SDK retries automatically on 429 and 529 errors.


Cost Management

Track Token Usage

Every API response includes usage — input tokens and output tokens. Output tokens cost several times more than input tokens (4-5x for the models listed here).

| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|-------|-----------------------|------------------------|
| Claude Sonnet 4 | $3 | $15 |
| Claude Opus 4 | $15 | $75 |
| Claude Haiku 3.5 | $0.80 | $4 |
| GPT-4o | $2.50 | $10 |
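A small helper makes per-request cost concrete (prices are taken from the table above; update them when pricing changes):

```python
# (input_price, output_price) in USD per 1M tokens, from the pricing table.
PRICES = {
    "claude-sonnet-4": (3.00, 15.00),
    "claude-haiku-3.5": (0.80, 4.00),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the dollar cost of one request from its usage counts."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# 25 input + 87 output tokens on Sonnet costs a fraction of a cent:
cost = estimate_cost("claude-sonnet-4", 25, 87)
assert cost < 0.01
```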

Tips for Controlling Costs

  • Use cheaper models for simple tasks — Haiku for classification, Sonnet for coding
  • Set max_tokens conservatively — don't request 4096 if you need 200
  • Cache system prompts — Anthropic offers prompt caching (90% discount on cached tokens)
  • Batch API — send many requests at once, get results in hours, 50% cheaper
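Prompt caching in practice: mark a large, stable system prompt as cacheable so repeated requests reuse it at the discounted rate. A sketch of the request shape following Anthropic's prompt-caching API (note that caching only kicks in above a minimum prompt size, around 1024 tokens for most models):

```python
# System becomes a list of content blocks; cache_control marks the cacheable prefix.
long_system = "You are a helpful coding assistant. Be concise."  # imagine a much longer prompt

request = {
    "model": "claude-sonnet-4-6-20250514",
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": long_system,
            "cache_control": {"type": "ephemeral"},  # cache everything up to here
        }
    ],
    "messages": [{"role": "user", "content": "Explain decorators."}],
}
# client.messages.create(**request) — subsequent calls that reuse the same
# system block read it from cache at a fraction of the normal input price.
```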

Batch API (Anthropic)

```python
# Create a batch of requests — results arrive within 24 hours
batch = client.messages.batches.create(
    requests=[
        {"custom_id": "req-1", "params": {"model": "claude-sonnet-4-6-20250514", ...}},
        {"custom_id": "req-2", "params": {"model": "claude-sonnet-4-6-20250514", ...}},
        # ... hundreds or thousands of requests
    ]
)
# Poll for results later — 50% cost savings
```

Quick Start Checklist

  • Install SDK: pip install anthropic or npm install @anthropic-ai/sdk
  • Set environment variable: export ANTHROPIC_API_KEY=sk-ant-...
  • Make first API call with messages.create()
  • Try streaming for better UX
  • Monitor token usage for cost control
  • Add retry logic for production systems


Previous: 13 - Open Source Models | Next: 15 - Tool Use & Function Calling