11 - Pods & Workloads

What Is a Pod?

A Pod is the smallest deployable unit in Kubernetes. It's a wrapper around one or more containers that:

  • Share the same network namespace (same IP, same localhost)
  • Share the same IPC namespace
  • Can share volumes
  • Are scheduled together on the same node
  • Have a shared lifecycle

┌─── Pod (IP: 10.244.1.5) ───────────────────┐
│                                            │
│  ┌─────────────┐    ┌─────────────┐        │
│  │ Container 1 │    │ Container 2 │        │
│  │ (main app)  │◄──►│ (sidecar)   │        │
│  │ :8080       │    │ :9090       │        │
│  └──────┬──────┘    └──────┬──────┘        │
│         │                  │               │
│         └──── localhost ───┘               │
│                                            │
│  ┌─────────────────────────────────┐       │
│  │ Shared Volume                   │       │
│  └─────────────────────────────────┘       │
└────────────────────────────────────────────┘

Pod Manifest

yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app
  namespace: default
  labels:
    app: my-app
    version: v1
  annotations:
    description: "Main application pod"
spec:
  # --- Init Containers (run before main containers, sequentially) ---
  initContainers:
    - name: init-db
      image: busybox:1.36
      command: ['sh', '-c', 'until nc -z db-service 5432; do echo waiting for db; sleep 2; done']
    - name: init-migrations
      image: myapp:v1
      command: ['./migrate', 'up']

  # --- Main Containers ---
  containers:
    - name: app
      image: myapp:v1
      ports:
        - containerPort: 8080
          name: http
          protocol: TCP

      # --- Environment ---
      env:
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: db-credentials
              key: url
        - name: NODE_ENV
          value: "production"
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name   # Downward API
        - name: CPU_LIMIT
          valueFrom:
            resourceFieldRef:
              containerName: app
              resource: limits.cpu
      envFrom:
        - configMapRef:
            name: app-config
        - secretRef:
            name: app-secrets

      # --- Resources ---
      resources:
        requests:            # Minimum guaranteed
          cpu: "250m"        # 0.25 CPU cores
          memory: "256Mi"    # 256 MiB
        limits:              # Maximum allowed
          cpu: "1"           # 1 CPU core
          memory: "512Mi"    # OOM killed if exceeded

      # --- Probes ---
      startupProbe:
        httpGet:
          path: /health
          port: 8080
        failureThreshold: 30
        periodSeconds: 10
        # App has 300s to start before being killed
      livenessProbe:
        httpGet:
          path: /health
          port: 8080
        initialDelaySeconds: 0
        periodSeconds: 10
        timeoutSeconds: 3
        failureThreshold: 3
        # If 3 consecutive failures: restart container
      readinessProbe:
        httpGet:
          path: /ready
          port: 8080
        periodSeconds: 5
        timeoutSeconds: 2
        failureThreshold: 3
        # If failing: remove from Service endpoints (no traffic)

      # --- Volume Mounts ---
      volumeMounts:
        - name: data
          mountPath: /app/data
        - name: config
          mountPath: /etc/app/config
          readOnly: true
        - name: tmp
          mountPath: /tmp

      # --- Security ---
      securityContext:
        runAsNonRoot: true
        runAsUser: 1001
        readOnlyRootFilesystem: true
        allowPrivilegeEscalation: false
        capabilities:
          drop: ["ALL"]

    # --- Sidecar Container ---
    - name: log-shipper
      image: fluent/fluent-bit:2.2
      volumeMounts:
        - name: data
          mountPath: /app/data
          readOnly: true

  # --- Volumes ---
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: app-data-pvc
    - name: config
      configMap:
        name: app-config
    - name: tmp
      emptyDir: {}

  # --- Pod-level Settings ---
  restartPolicy: Always          # Always | OnFailure | Never
  terminationGracePeriodSeconds: 30
  serviceAccountName: my-app-sa

  # --- Scheduling ---
  nodeSelector:
    disktype: ssd
  tolerations:
    - key: "dedicated"
      operator: "Equal"
      value: "high-memory"
      effect: "NoSchedule"
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchExpressions:
                - key: app
                  operator: In
                  values: ["my-app"]
            topologyKey: kubernetes.io/hostname

Probes Deep Dive

Three Types of Probes

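| Probe     | Purpose                          | On Failure          |
|-----------|----------------------------------|---------------------|
| Startup   | Is the app finished starting?    | Kill and restart    |
| Liveness  | Is the app alive and healthy?    | Kill and restart    |
| Readiness | Can the app serve traffic?       | Remove from Service |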

Probe Methods

yaml
# HTTP GET (most common for web apps)
livenessProbe:
  httpGet:
    path: /health
    port: 8080
    httpHeaders:
      - name: Accept
        value: application/json

# TCP Socket (for non-HTTP services)
livenessProbe:
  tcpSocket:
    port: 5432

# Exec Command (custom check)
livenessProbe:
  exec:
    command:
      - cat
      - /tmp/healthy

# gRPC (for gRPC services)
livenessProbe:
  grpc:
    port: 50051

Probe Timeline

Container Start
     │
     ▼
┌─ Startup Probe ─────────────────┐
│  Runs until success or timeout  │
│  (other probes are disabled)    │
└──────────────┬──────────────────┘
               │ success
               ▼
┌─ Liveness Probe (periodic) ─────┐
│  Is the container healthy?      │
│  Failure → restart container    │
└─────────────────────────────────┘
               +
┌─ Readiness Probe (periodic) ────┐
│  Can it handle traffic?         │
│  Failure → remove from Service  │
└─────────────────────────────────┘
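
The timing implied by the probe settings can be worked out by hand. The following is a minimal sketch, not kubelet code: the function names are hypothetical, and it assumes the startup budget is roughly `initialDelaySeconds + failureThreshold × periodSeconds` and that `timeoutSeconds` can be ignored for a rough estimate.

```python
def startup_budget_seconds(failure_threshold: int, period_seconds: int,
                           initial_delay_seconds: int = 0) -> int:
    """Approximate time a container gets to pass its startup probe."""
    return initial_delay_seconds + failure_threshold * period_seconds


def liveness_detection_seconds(failure_threshold: int, period_seconds: int) -> int:
    """Rough upper bound on how long a hung container keeps running
    before enough consecutive liveness failures trigger a restart
    (ignores timeoutSeconds)."""
    return failure_threshold * period_seconds


# Values taken from the Pod manifest in this note.
print(startup_budget_seconds(failure_threshold=30, period_seconds=10))     # 300
print(liveness_detection_seconds(failure_threshold=3, period_seconds=10))  # 30
```

With the manifest's values, the app has about 300 seconds to start, and a hung container is restarted after roughly 30 seconds of failed liveness checks.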

Resource Requests and Limits

yaml
resources:
  requests:            # Scheduler uses this to find a node
    cpu: "250m"        # 250 millicores = 0.25 CPU
    memory: "256Mi"    # 256 Mebibytes
  limits:              # Maximum the container can use
    cpu: "1"           # 1 full CPU core
    memory: "512Mi"    # OOM killed if exceeded

| Aspect                     | CPU             | Memory                             |
|----------------------------|-----------------|------------------------------------|
| Notation                   | millicores (m)  | Mi, Gi (binary) or M, G (decimal)  |
| Example                    | 500m = 0.5 CPU  | 256Mi ≈ 268 MB                     |
| When the limit is exceeded | Throttled       | OOM Killed                         |
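
To make the notation concrete, here is a small sketch that converts the quantities used above into plain numbers. The helper names are hypothetical, and the parser covers only a subset of the suffixes Kubernetes accepts; it is an illustration, not the Kubernetes quantity parser.

```python
from decimal import Decimal

# Binary (power-of-two) and decimal (power-of-ten) memory suffixes.
MEMORY_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def cpu_to_cores(quantity: str) -> Decimal:
    """Convert a CPU quantity such as "250m" or "1" to cores."""
    if quantity.endswith("m"):
        return Decimal(quantity[:-1]) / 1000
    return Decimal(quantity)


def memory_to_bytes(quantity: str) -> int:
    """Convert a memory quantity such as "256Mi" or "512M" to bytes."""
    for suffix, factor in MEMORY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(Decimal(quantity[: -len(suffix)]) * factor)
    return int(Decimal(quantity))  # no suffix: a plain number of bytes


print(cpu_to_cores("250m"))      # 0.25
print(memory_to_bytes("256Mi"))  # 268435456
```

The second result shows why 256Mi is about 268 MB: the binary suffix multiplies by 1,048,576 rather than 1,000,000.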

QoS Classes (determined by requests/limits):

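| Class      | Condition                                                                      | Eviction Priority   |
|------------|--------------------------------------------------------------------------------|---------------------|
| Guaranteed | Every container sets CPU and memory limits, with requests equal to limits      | Last to be evicted  |
| Burstable  | At least one container sets a request or limit, but the Pod is not Guaranteed  | Middle              |
| BestEffort | No container sets any requests or limits                                       | First to be evicted |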
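
The classification rules can be sketched as a small function. This is an illustrative model with a hypothetical name (`qos_class`), not the kubelet's implementation. It assumes each container is given as its `resources` dictionary, that an omitted request defaults to the limit, and that quantities are compared as strings (so "1" and "1000m" would wrongly count as different).

```python
def qos_class(containers: list[dict]) -> str:
    """Classify a Pod from the `resources` stanza of each container."""
    any_set = False
    guaranteed = True
    for resources in containers:
        limits = resources.get("limits", {})
        # A request that is omitted defaults to the limit for that resource.
        requests = {**limits, **resources.get("requests", {})}
        if requests or limits:
            any_set = True
        for name in ("cpu", "memory"):
            if name not in limits or requests[name] != limits[name]:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


# The resources stanza from this note: requests are lower than limits.
example = {"requests": {"cpu": "250m", "memory": "256Mi"},
           "limits": {"cpu": "1", "memory": "512Mi"}}
print(qos_class([example]))  # Burstable
```

Setting the requests equal to the limits for both CPU and memory in every container would move the same Pod into the Guaranteed class.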

Workload Types

1. Deployment (Stateless Apps)

Most common workload. Manages ReplicaSets and rolling updates.

yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
        - name: app
          image: myapp:v2
          ports:
            - containerPort: 8080

2. ReplicaSet

Ensures N identical pods are running. Managed by Deployments -- rarely used directly.

3. Job (One-Time Tasks)

yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: data-migration
spec:
  backoffLimit: 3                 # Retry up to 3 times
  activeDeadlineSeconds: 600      # Timeout after 10 minutes
  template:
    spec:
      containers:
        - name: migrate
          image: myapp:v1
          command: ["./migrate", "up"]
      restartPolicy: OnFailure

4. CronJob (Scheduled Tasks)

yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-backup
spec:
  schedule: "0 2 * * *"           # 2 AM daily
  concurrencyPolicy: Forbid       # Don't overlap
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: backup
              image: backup-tool:v1
              command: ["./backup.sh"]
          restartPolicy: OnFailure
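
To see how the five schedule fields (minute, hour, day of month, month, day of week) select a run time, here is a deliberately tiny matcher. It is a sketch only: `cron_matches` is a hypothetical helper that understands just `*` and plain integers, not ranges, lists, or steps, and it is unrelated to how the CronJob controller is implemented.

```python
from datetime import datetime


def cron_matches(schedule: str, moment: datetime) -> bool:
    """Return True if `moment` satisfies a five-field cron schedule.

    Supports only "*" and single integers in each field.
    """
    fields = schedule.split()
    if len(fields) != 5:
        raise ValueError("expected five fields: minute hour day month weekday")
    actual = [
        moment.minute,
        moment.hour,
        moment.day,
        moment.month,
        (moment.weekday() + 1) % 7,  # cron counts Sunday as 0; Python counts Monday as 0
    ]
    return all(field == "*" or int(field) == value
               for field, value in zip(fields, actual))


print(cron_matches("0 2 * * *", datetime(2024, 1, 15, 2, 0)))   # True
print(cron_matches("0 2 * * *", datetime(2024, 1, 15, 14, 0)))  # False
```

The schedule `0 2 * * *` therefore matches exactly one minute per day, 02:00, on every day of every month.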

5. DaemonSet (One Pod Per Node)

See 16 - StatefulSets & DaemonSets

6. StatefulSet (Stateful Apps)

See 16 - StatefulSets & DaemonSets

Multi-Container Patterns

Sidecar Pattern

A helper container that extends the main container:

yaml
spec:
  containers:
    - name: app
      image: myapp:v1
      volumeMounts:
        - name: logs
          mountPath: /var/log/app
    - name: log-shipper            # Sidecar
      image: fluent/fluent-bit:2.2
      volumeMounts:
        - name: logs
          mountPath: /var/log/app
          readOnly: true
  volumes:
    - name: logs
      emptyDir: {}

Ambassador Pattern

A proxy that handles network communication:

yaml
spec:
  containers:
    - name: app
      image: myapp:v1
      # App connects to localhost:5432
    - name: cloud-sql-proxy        # Ambassador
      image: gcr.io/cloud-sql-connectors/cloud-sql-proxy:2
      args:
        - "--port=5432"
        - "project:region:instance"

Adapter Pattern

Transforms output to a standard format:

yaml
spec:
  containers:
    - name: app
      image: legacy-app:v1         # Produces custom metrics format
    - name: prometheus-adapter     # Adapter
      image: metrics-adapter:v1    # Converts to Prometheus format
      ports:
        - containerPort: 9090

Pod Lifecycle

Pending → Running → Succeeded/Failed
   │         │
   │         └─ Containers running
   │
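   └─ Scheduling, image pulling, init containers

| Phase     | Description                                                                               |
|-----------|-------------------------------------------------------------------------------------------|
| Pending   | Accepted by the cluster but not yet running (scheduling, pulling images, init containers) |
| Running   | Bound to a node; at least one container is running, starting, or restarting               |
| Succeeded | All containers exited with code 0 and will not be restarted                               |
| Failed    | All containers have terminated and at least one exited with a non-zero code               |
| Unknown   | Pod status can't be determined (node communication issue)                                 |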
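
The phase rules can be modeled in a few lines. This is a simplified sketch under stated assumptions, not the kubelet's logic: `pod_phase` is a hypothetical helper, it assumes `restartPolicy: Never` (so exited containers stay exited), it assumes a Pod only reaches Failed once every container has terminated, and it ignores init containers and the Unknown phase.

```python
from typing import Optional


def pod_phase(exit_codes: list[Optional[int]],
              scheduled_and_started: bool = True) -> str:
    """Derive a Pod phase from container exit codes.

    Each entry is a container's exit code, or None if it is still running.
    """
    if not scheduled_and_started:
        return "Pending"  # still scheduling, pulling images, or running init containers
    if any(code is None for code in exit_codes):
        return "Running"  # at least one container has not exited yet
    if all(code == 0 for code in exit_codes):
        return "Succeeded"
    return "Failed"


print(pod_phase([None, 0]))  # Running
print(pod_phase([0, 0]))     # Succeeded
print(pod_phase([0, 1]))     # Failed
```

With `restartPolicy: Always`, as in the manifest earlier in this note, an exited container is restarted instead, so the Pod normally stays in Running rather than reaching Succeeded or Failed.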

kubectl Pod Commands

bash
# Create a pod
kubectl run nginx --image=nginx:alpine

# List pods
kubectl get pods
kubectl get pods -o wide                # More details
kubectl get pods -l app=myapp           # Filter by label
kubectl get pods --all-namespaces       # All namespaces

# Describe (events, conditions, details)
kubectl describe pod my-pod

# Logs
kubectl logs my-pod
kubectl logs my-pod -c sidecar          # Specific container
kubectl logs my-pod --previous          # Previous crashed container
kubectl logs -f my-pod                  # Stream logs
kubectl logs -l app=myapp               # All pods with label

# Exec
kubectl exec -it my-pod -- bash
kubectl exec -it my-pod -c app -- bash  # Specific container

# Port forward
kubectl port-forward my-pod 8080:80

# Copy files
kubectl cp my-pod:/app/logs ./logs
kubectl cp ./file.txt my-pod:/app/

# Delete
kubectl delete pod my-pod
kubectl delete pod my-pod --grace-period=0 --force   # Immediate

FAANG Interview Angle

Common questions:

  1. "What's a Pod and why not just run containers directly?"
  2. "Explain the three types of probes"
  3. "What happens when a Pod exceeds its memory limit?"
  4. "Describe the sidecar pattern and when you'd use it"
  5. "What are QoS classes and how do they affect eviction?"

Key answers:

  • Pod groups tightly coupled containers sharing network/storage; K8s schedules and manages pods, not containers
  • Startup (holds off the other probes until a slow app has started), Liveness (is it alive; failure → restart), Readiness (can it serve; failure → removed from Service endpoints)
  • Memory limit exceeded → OOM Kill → container restarted based on restartPolicy
  • Sidecar: log shipping, service mesh proxy, config sync
  • Guaranteed (requests = limits for every container, last evicted), Burstable (some requests or limits set, evicted in between), BestEffort (nothing set, first evicted)

Official Links