11 - Pods & Workloads
What Is a Pod?
A Pod is the smallest deployable unit in Kubernetes. It's a wrapper around one or more containers that:
- Share the same network namespace (same IP, same localhost)
- Share the same IPC namespace
- Can share volumes
- Are scheduled together on the same node
- Have a shared lifecycle
```
┌─── Pod (IP: 10.244.1.5) ───────────────────┐
│                                            │
│  ┌─────────────┐    ┌─────────────┐        │
│  │ Container 1 │    │ Container 2 │        │
│  │ (main app)  │◄──►│  (sidecar)  │        │
│  │    :8080    │    │    :9090    │        │
│  └──────┬──────┘    └──────┬──────┘        │
│         │                  │               │
│         └──── localhost ───┘               │
│                                            │
│  ┌─────────────────────────────────┐       │
│  │          Shared Volume          │       │
│  └─────────────────────────────────┘       │
└────────────────────────────────────────────┘
```
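To see the shared network namespace in practice, here is a minimal sketch of a two-container Pod in which one container reaches the other over `localhost`. The Pod name and container names are made up for illustration, and the loop is only a demonstration, not a recommended health check.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: shared-net-demo          # hypothetical name
spec:
  containers:
    - name: web
      image: nginx:alpine        # serves HTTP on port 80
    - name: client
      image: busybox:1.36
      # Both containers share one network namespace, so "localhost"
      # here reaches the nginx process in the "web" container.
      command: ['sh', '-c', 'while true; do wget -q -O /dev/null http://localhost:80 && echo reachable; sleep 5; done']
```

If the namespace is shared as described, `kubectl logs shared-net-demo -c client` should print `reachable` every few seconds.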
Pod Manifest
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app
  namespace: default
  labels:
    app: my-app
    version: v1
  annotations:
    description: "Main application pod"
spec:
  # --- Init Containers (run before main containers, sequentially) ---
  initContainers:
    - name: init-db
      image: busybox:1.36
      command: ['sh', '-c', 'until nc -z db-service 5432; do echo waiting for db; sleep 2; done']
    - name: init-migrations
      image: myapp:v1
      command: ['./migrate', 'up']

  # --- Main Containers ---
  containers:
    - name: app
      image: myapp:v1
      ports:
        - containerPort: 8080
          name: http
          protocol: TCP

      # --- Environment ---
      env:
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: db-credentials
              key: url
        - name: NODE_ENV
          value: "production"
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name   # Downward API
        - name: CPU_LIMIT
          valueFrom:
            resourceFieldRef:
              containerName: app
              resource: limits.cpu
      envFrom:
        - configMapRef:
            name: app-config
        - secretRef:
            name: app-secrets

      # --- Resources ---
      resources:
        requests:              # Minimum guaranteed
          cpu: "250m"          # 0.25 CPU cores
          memory: "256Mi"      # 256 MiB
        limits:                # Maximum allowed
          cpu: "1"             # 1 CPU core
          memory: "512Mi"      # OOM killed if exceeded

      # --- Probes ---
      startupProbe:
        httpGet:
          path: /health
          port: 8080
        failureThreshold: 30
        periodSeconds: 10
        # App has 300s to start before being killed
      livenessProbe:
        httpGet:
          path: /health
          port: 8080
        initialDelaySeconds: 0
        periodSeconds: 10
        timeoutSeconds: 3
        failureThreshold: 3
        # If 3 consecutive failures: restart container
      readinessProbe:
        httpGet:
          path: /ready
          port: 8080
        periodSeconds: 5
        timeoutSeconds: 2
        failureThreshold: 3
        # If failing: remove from Service endpoints (no traffic)

      # --- Volume Mounts ---
      volumeMounts:
        - name: data
          mountPath: /app/data
        - name: config
          mountPath: /etc/app/config
          readOnly: true
        - name: tmp
          mountPath: /tmp

      # --- Security ---
      securityContext:
        runAsNonRoot: true
        runAsUser: 1001
        readOnlyRootFilesystem: true
        allowPrivilegeEscalation: false
        capabilities:
          drop: ["ALL"]

    # --- Sidecar Container ---
    - name: log-shipper
      image: fluent/fluent-bit:2.2
      volumeMounts:
        - name: data
          mountPath: /app/data
          readOnly: true

  # --- Volumes ---
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: app-data-pvc
    - name: config
      configMap:
        name: app-config
    - name: tmp
      emptyDir: {}

  # --- Pod-level Settings ---
  restartPolicy: Always          # Always | OnFailure | Never
  terminationGracePeriodSeconds: 30
  serviceAccountName: my-app-sa

  # --- Scheduling ---
  nodeSelector:
    disktype: ssd
  tolerations:
    - key: "dedicated"
      operator: "Equal"
      value: "high-memory"
      effect: "NoSchedule"
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchExpressions:
                - key: app
                  operator: In
                  values: ["my-app"]
            topologyKey: kubernetes.io/hostname
```
Probes Deep Dive
Three Types of Probes
| Probe | Purpose | On Failure |
|---|---|---|
| Startup | Is the app finished starting? | Kill and restart |
| Liveness | Is the app alive and healthy? | Kill and restart |
| Readiness | Can the app serve traffic? | Remove from Service |
Probe Methods
```yaml
# HTTP GET (most common for web apps)
livenessProbe:
  httpGet:
    path: /health
    port: 8080
    httpHeaders:
      - name: Accept
        value: application/json

# TCP Socket (for non-HTTP services)
livenessProbe:
  tcpSocket:
    port: 5432

# Exec Command (custom check)
livenessProbe:
  exec:
    command:
      - cat
      - /tmp/healthy

# gRPC (for gRPC services)
livenessProbe:
  grpc:
    port: 50051
```
Probe Timeline
```
Container Start
               │
               ▼
┌─ Startup Probe ─────────────────┐
│ Runs until success or timeout   │
│ (other probes are disabled)     │
└──────────────┬──────────────────┘
               │ success
               ▼
┌─ Liveness Probe (periodic) ─────┐
│ Is the container healthy?       │
│ Failure → restart container     │
└─────────────────────────────────┘
               +
┌─ Readiness Probe (periodic) ────┐
│ Can it handle traffic?          │
│ Failure → remove from Service   │
└─────────────────────────────────┘
```
Resource Requests and Limits
```yaml
resources:
  requests:            # Scheduler uses this to find a node
    cpu: "250m"        # 250 millicores = 0.25 CPU
    memory: "256Mi"    # 256 Mebibytes
  limits:              # Maximum the container can use
    cpu: "1"           # 1 full CPU core
    memory: "512Mi"    # OOM killed if exceeded
```
| Aspect | CPU | Memory |
|---|---|---|
| Notation | millicores (m) | Mi, Gi (binary) or M, G (decimal) |
| Example | 500m = 0.5 CPU | 256Mi = ~268 MB |
| When the limit is exceeded | Container is throttled | Container is OOM-killed |
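The notations in the table are interchangeable ways of writing the same quantity. As a small illustrative fragment (the values are arbitrary and chosen only to show equivalences):

```yaml
resources:
  requests:
    cpu: "0.5"             # same quantity as "500m"
    memory: "268435456"    # plain bytes; same quantity as "256Mi" (256 * 1024 * 1024)
  limits:
    cpu: "1000m"           # same quantity as "1"
    memory: "1Gi"          # 1024^3 bytes; "1G" would be 1000^3 bytes, about 7% less
```

Mixing up `Mi`/`Gi` with `M`/`G` rarely breaks anything outright, but it makes limits slightly smaller than intended, so it is worth sticking to one family of suffixes.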
QoS Classes (determined by requests/limits):
| Class | Condition | Eviction Priority |
|---|---|---|
| Guaranteed | Every container sets both CPU and memory, with requests == limits | Last to be evicted |
| Burstable | At least one container sets a CPU or memory request or limit, but the Pod is not Guaranteed | Middle |
| BestEffort | No container sets any request or limit | First to be evicted |
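The three classes can be illustrated with three minimal Pods. This is a sketch: the Pod names are hypothetical and `myapp:v1` is the placeholder image used elsewhere in these notes.

```yaml
# Guaranteed: cpu and memory are both set, and requests == limits
apiVersion: v1
kind: Pod
metadata:
  name: qos-guaranteed       # hypothetical name
spec:
  containers:
    - name: app
      image: myapp:v1
      resources:
        requests: {cpu: "500m", memory: "256Mi"}
        limits:   {cpu: "500m", memory: "256Mi"}
---
# Burstable: something is set, but requests and limits differ
apiVersion: v1
kind: Pod
metadata:
  name: qos-burstable        # hypothetical name
spec:
  containers:
    - name: app
      image: myapp:v1
      resources:
        requests: {cpu: "250m", memory: "256Mi"}
        limits:   {memory: "512Mi"}
---
# BestEffort: no resources stanza in any container
apiVersion: v1
kind: Pod
metadata:
  name: qos-besteffort       # hypothetical name
spec:
  containers:
    - name: app
      image: myapp:v1
```

The class Kubernetes assigned can be read back from the Pod status, for example with `kubectl get pod qos-burstable -o jsonpath='{.status.qosClass}'`.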
Workload Types
1. Deployment (Stateless Apps)
Most common workload. Manages ReplicaSets and rolling updates.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
        - name: app
          image: myapp:v2
          ports:
            - containerPort: 8080
```
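How a rolling update proceeds is controlled by the Deployment's `strategy` field. The following sketch extends the Deployment above with explicit settings. The values are one conservative choice, not a recommendation; if the field is omitted, both `maxSurge` and `maxUnavailable` default to 25%.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1            # at most 1 extra pod above "replicas" during the update
      maxUnavailable: 0      # never drop below 3 available pods
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
        - name: app
          image: myapp:v2
          ports:
            - containerPort: 8080
```

With these settings the controller starts one new pod, waits for it to become ready, then removes one old pod, and repeats. Progress can be followed with `kubectl rollout status deployment/web-app`.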
2. ReplicaSet
Ensures N identical pods are running. ReplicaSets are created and managed by Deployments, so they are rarely used directly.
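For reference, a standalone ReplicaSet looks almost identical to a Deployment minus the update strategy. This is a minimal sketch with a hypothetical name; in practice a Deployment would generate an object like this for you.

```yaml
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: web-app-rs           # hypothetical name
spec:
  replicas: 3                # the "N" the controller keeps running
  selector:
    matchLabels:
      app: web-app           # must match the template labels below
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
        - name: app
          image: myapp:v2
```

Changing the image in this manifest does not replace running pods; only newly created pods pick it up, which is the main reason to prefer a Deployment.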
3. Job (One-Time Tasks)
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: data-migration
spec:
  backoffLimit: 3                # Retry up to 3 times
  activeDeadlineSeconds: 600     # Timeout after 10 minutes
  template:
    spec:
      containers:
        - name: migrate
          image: myapp:v1
          command: ["./migrate", "up"]
      restartPolicy: OnFailure
```
4. CronJob (Scheduled Tasks)
```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-backup
spec:
  schedule: "0 2 * * *"          # 2 AM daily
  concurrencyPolicy: Forbid      # Don't overlap
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: backup
              image: backup-tool:v1
              command: ["./backup.sh"]
          restartPolicy: OnFailure
```
5. DaemonSet (One Pod Per Node)
See 16 - StatefulSets & DaemonSets
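As a quick orientation before the dedicated note, a minimal DaemonSet sketch might look like this. The name and the choice of a log agent reading `/var/log` from the host are assumptions made for illustration.

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-log-agent       # hypothetical name
spec:
  # No "replicas" field: the controller runs one pod per eligible node.
  selector:
    matchLabels:
      app: node-log-agent
  template:
    metadata:
      labels:
        app: node-log-agent
    spec:
      containers:
        - name: agent
          image: fluent/fluent-bit:2.2
          volumeMounts:
            - name: varlog
              mountPath: /var/log
              readOnly: true
      volumes:
        - name: varlog
          hostPath:
            path: /var/log   # node's own log directory
```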
6. StatefulSet (Stateful Apps)
See 16 - StatefulSets & DaemonSets
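As a quick orientation before the dedicated note, a minimal StatefulSet sketch might look like this. The image `mydb:v1`, the headless Service name `db-headless`, and the storage size are all hypothetical placeholders.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db
spec:
  serviceName: db-headless   # assumed headless Service providing stable DNS names
  replicas: 3                # pods are named db-0, db-1, db-2
  selector:
    matchLabels:
      app: db
  template:
    metadata:
      labels:
        app: db
    spec:
      containers:
        - name: db
          image: mydb:v1     # hypothetical image
          volumeMounts:
            - name: data
              mountPath: /var/lib/data
  volumeClaimTemplates:      # one PersistentVolumeClaim per pod
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
```

Unlike a Deployment, each replica keeps its own name and its own volume across restarts and rescheduling.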
Multi-Container Patterns
Sidecar Pattern
A helper container that extends the main container:
```yaml
spec:
  containers:
    - name: app
      image: myapp:v1
      volumeMounts:
        - name: logs
          mountPath: /var/log/app
    - name: log-shipper              # Sidecar
      image: fluent/fluent-bit:2.2
      volumeMounts:
        - name: logs
          mountPath: /var/log/app
          readOnly: true
  volumes:
    - name: logs
      emptyDir: {}
```
Ambassador Pattern
A proxy that handles network communication:
```yaml
spec:
  containers:
    - name: app
      image: myapp:v1                # App connects to localhost:5432
    - name: cloud-sql-proxy          # Ambassador
      image: gcr.io/cloud-sql-connectors/cloud-sql-proxy:2
      args:
        - "--port=5432"
        - "project:region:instance"
```
Adapter Pattern
Transforms output to a standard format:
```yaml
spec:
  containers:
    - name: app
      image: legacy-app:v1           # Produces custom metrics format
    - name: prometheus-adapter       # Adapter
      image: metrics-adapter:v1      # Converts to Prometheus format
      ports:
        - containerPort: 9090
```
Pod Lifecycle
```
Pending → Running → Succeeded/Failed
   │         │
   │         └─ Containers running
   │
   └─ Scheduling, image pulling, init containers
```
| Phase | Description |
|---|---|
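| Pending | Accepted by the cluster, but one or more containers are not yet running (scheduling, pulling images, init containers) |
| Running | Bound to a node with all containers created; at least one is running, starting, or restarting |
| Succeeded | All containers exited with code 0 and will not be restarted |
| Failed | All containers have terminated, and at least one exited with a non-zero code or was killed by the system |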
| Unknown | Pod status can't be determined (node communication issue) |
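The terminal phases are easiest to observe with a one-shot Pod that is never restarted. This is a small sketch with a hypothetical Pod name.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: phase-demo           # hypothetical name
spec:
  restartPolicy: Never       # without this, the default (Always) keeps the pod Running
  containers:
    - name: task
      image: busybox:1.36
      # Exits 0 after 10 seconds, so the pod ends in Succeeded.
      # Change "exit 0" to "exit 1" to end in Failed instead.
      command: ['sh', '-c', 'echo working; sleep 10; exit 0']
```

Running `kubectl get pod phase-demo -o jsonpath='{.status.phase}'` repeatedly should show the phase move from `Pending` to `Running` to `Succeeded`.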
kubectl Pod Commands
```bash
# Create a pod
kubectl run nginx --image=nginx:alpine

# List pods
kubectl get pods
kubectl get pods -o wide               # More details
kubectl get pods -l app=myapp          # Filter by label
kubectl get pods --all-namespaces      # All namespaces

# Describe (events, conditions, details)
kubectl describe pod my-pod

# Logs
kubectl logs my-pod
kubectl logs my-pod -c sidecar         # Specific container
kubectl logs my-pod --previous         # Previous crashed container
kubectl logs -f my-pod                 # Stream logs
kubectl logs -l app=myapp              # All pods with label

# Exec
kubectl exec -it my-pod -- bash
kubectl exec -it my-pod -c app -- bash # Specific container

# Port forward
kubectl port-forward my-pod 8080:80

# Copy files
kubectl cp my-pod:/app/logs ./logs
kubectl cp ./file.txt my-pod:/app/

# Delete
kubectl delete pod my-pod
kubectl delete pod my-pod --grace-period=0 --force   # Immediate
```
FAANG Interview Angle
Common questions:
- "What's a Pod and why not just run containers directly?"
- "Explain the three types of probes"
- "What happens when a Pod exceeds its memory limit?"
- "Describe the sidecar pattern and when you'd use it"
- "What are QoS classes and how do they affect eviction?"
Key answers:
- Pod groups tightly coupled containers sharing network/storage; K8s schedules and manages pods, not containers
- Startup (holds off the other probes until the app has started; failure → restart), Liveness (is it alive? failure → restart), Readiness (can it serve? failure → removed from Service endpoints)
- Memory limit exceeded → OOM Kill → container restarted based on restartPolicy
- Sidecar: log shipping, service mesh proxy, config sync
- Guaranteed (requests = limits for CPU and memory in every container, last evicted), Burstable (some requests or limits set, evicted in between), BestEffort (nothing set, first evicted)