# 16 - StatefulSets & DaemonSets
## StatefulSets
For stateful applications that need:
- Stable, unique network identities (pod-0, pod-1, pod-2)
- Stable persistent storage (each pod gets its own PVC)
- Ordered deployment and scaling (pod-0 before pod-1 before pod-2)
- Ordered rolling updates (pod-2 before pod-1 before pod-0)
### Deployment vs StatefulSet
| Feature | Deployment | StatefulSet |
|---|---|---|
| Pod names | Random (web-abc123) | Ordinal (web-0, web-1) |
| Pod identity | Interchangeable | Unique, stable |
| Storage | Shared or no persistence | Dedicated PVC per pod |
| Scaling order | Parallel | Sequential (0, 1, 2...) |
| Update order | Any order | Reverse order (...2, 1, 0) |
| Use case | Stateless apps | Databases, queues |
### StatefulSet Manifest

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres-headless  # Required: headless service name
  replicas: 3
  selector:
    matchLabels:
      app: postgres
  # Pod template
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:16
          ports:
            - containerPort: 5432
          env:
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: pg-secret
                  key: password
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  # Each pod gets its own PVC (not shared)
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: fast-ssd
        resources:
          requests:
            storage: 50Gi
---
# Required headless service
apiVersion: v1
kind: Service
metadata:
  name: postgres-headless
spec:
  clusterIP: None  # Headless!
  selector:
    app: postgres
  ports:
    - port: 5432
```
### How StatefulSet Pods Are Named

```
StatefulSet: postgres, replicas: 3

Pod names:     DNS entries:
postgres-0     postgres-0.postgres-headless.default.svc.cluster.local
postgres-1     postgres-1.postgres-headless.default.svc.cluster.local
postgres-2     postgres-2.postgres-headless.default.svc.cluster.local

PVCs (auto-created):
data-postgres-0   # Bound to postgres-0
data-postgres-1   # Bound to postgres-1
data-postgres-2   # Bound to postgres-2
```
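The naming scheme is purely mechanical, so it can be sketched in a few lines of shell. This derives every name from the StatefulSet's `metadata.name`, `serviceName`, namespace, and replica count (illustrative only -- nothing here queries a cluster; the values mirror the postgres example above):

```shell
# Derive the pod, PVC, and DNS names a StatefulSet produces.
sts=postgres
svc=postgres-headless
ns=default
replicas=3
for i in $(seq 0 $((replicas - 1))); do
  echo "pod: ${sts}-${i}"                              # e.g. postgres-0
  echo "pvc: data-${sts}-${i}"                         # <claim>-<sts>-<ordinal>
  echo "dns: ${sts}-${i}.${svc}.${ns}.svc.cluster.local"
done
```

The DNS entries only resolve because the headless service (`clusterIP: None`) creates per-pod records instead of a single load-balanced VIP.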
### Scaling Behavior

```
Scale up (3 → 5):
postgres-0 ✓ (exists)
postgres-1 ✓ (exists)
postgres-2 ✓ (exists)
postgres-3 ← created (waits for postgres-2 to be ready)
postgres-4 ← created (waits for postgres-3 to be ready)

Scale down (5 → 3):
postgres-4 ← deleted first
postgres-3 ← deleted second
(PVCs are NOT deleted -- data preserved)
```
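The sequential behavior above is the default (`podManagementPolicy: OrderedReady`). When startup order doesn't matter for your application, `Parallel` launches and deletes all pods at once -- a hedged config fragment (the field is real; whether it's appropriate depends on the workload):

```yaml
spec:
  # Default is OrderedReady (one pod at a time, in ordinal order).
  # Parallel creates/deletes pods all at once; rolling updates are
  # unaffected and still proceed in reverse ordinal order.
  podManagementPolicy: Parallel
```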
### Update Strategies

```yaml
spec:
  updateStrategy:
    type: RollingUpdate  # Default
    rollingUpdate:
      partition: 1  # Only update pods with ordinal >= 1 (canary)
```
- RollingUpdate: Updates pods in reverse ordinal order (2, 1, 0)
- OnDelete: Only updates when you manually delete a pod
- Partition (a `rollingUpdate` field, not a separate strategy type): only updates pods with ordinal >= the partition value, which enables canary rollouts
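A common way to stage a canary with `partition` (assuming 3 replicas; the field is real, the staged workflow is a sketch):

```yaml
spec:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      # Stage 1: partition: 3 -> push the new image; no pods update yet.
      # Stage 2: partition: 2 -> only postgres-2 gets the new image (canary).
      # Stage 3: partition: 0 -> remaining pods roll in reverse order (1, 0).
      partition: 2
```

Each stage is just a patch to this one field, so you can pause, verify the canary, and roll forward (or back) without touching the pod template again.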
### Real-World Example: MongoDB Replica Set

The original version of this example ran `rs.initiate` from an init container, which cannot work: init containers run *before* `mongod` starts, so there is nothing to connect to. A `postStart` hook on mongo-0 is used instead (still illustrative -- production setups typically use an operator or a one-shot Job):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mongo
spec:
  serviceName: mongo-headless
  replicas: 3
  selector:
    matchLabels:
      app: mongo
  template:
    metadata:
      labels:
        app: mongo
    spec:
      containers:
        - name: mongo
          image: mongo:7
          command: ["mongod", "--replSet", "rs0", "--bind_ip_all"]
          ports:
            - containerPort: 27017
          volumeMounts:
            - name: data
              mountPath: /data/db
          lifecycle:
            postStart:
              exec:
                command:
                  - bash
                  - -c
                  - |
                    # Initialize the replica set from mongo-0 only
                    if [[ "$(hostname)" == "mongo-0" ]]; then
                      # Wait until the local mongod accepts connections
                      until mongosh --quiet --eval 'db.adminCommand({ping: 1})' >/dev/null 2>&1; do
                        sleep 2
                      done
                      # Skip if the replica set is already initialized
                      if ! mongosh --quiet --eval 'rs.status().ok' >/dev/null 2>&1; then
                        mongosh --eval '
                          rs.initiate({
                            _id: "rs0",
                            members: [
                              {_id: 0, host: "mongo-0.mongo-headless:27017"},
                              {_id: 1, host: "mongo-1.mongo-headless:27017"},
                              {_id: 2, host: "mongo-2.mongo-headless:27017"}
                            ]
                          })
                        '
                      fi
                    fi
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 100Gi
```
## DaemonSets
Run exactly one pod on every node (or a subset of nodes). Used for:
- Log collection (Fluentd, Filebeat)
- Monitoring agents (Prometheus Node Exporter, Datadog)
- Network plugins (Calico, Cilium)
- Storage daemons (Ceph, GlusterFS)
### DaemonSet Manifest

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      containers:
        - name: node-exporter
          image: prom/node-exporter:v1.7.0
          ports:
            - containerPort: 9100
              hostPort: 9100
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 250m
              memory: 256Mi
          volumeMounts:
            - name: proc
              mountPath: /host/proc
              readOnly: true
            - name: sys
              mountPath: /host/sys
              readOnly: true
      # Tolerate all taints (run on ALL nodes, including control plane)
      tolerations:
        - operator: Exists
      volumes:
        - name: proc
          hostPath:
            path: /proc
        - name: sys
          hostPath:
            path: /sys
```
### Running DaemonSet on Specific Nodes

```yaml
spec:
  template:
    spec:
      # Only on nodes with this label
      nodeSelector:
        disktype: ssd
      # Or use affinity for more complex rules
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: kubernetes.io/os
                    operator: In
                    values: ["linux"]
```
### DaemonSet Update Strategy

```yaml
spec:
  updateStrategy:
    type: RollingUpdate  # or OnDelete
    rollingUpdate:
      maxUnavailable: 1  # Update 1 node at a time
      maxSurge: 0        # Don't create extra pods
```
### Real-World Example: Fluentd Log Collector

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: logging
spec:
  selector:
    matchLabels:
      app: fluentd
  template:
    metadata:
      labels:
        app: fluentd
    spec:
      serviceAccountName: fluentd
      tolerations:
        - key: node-role.kubernetes.io/control-plane
          effect: NoSchedule
      containers:
        - name: fluentd
          image: fluent/fluentd-kubernetes-daemonset:v1.16
          env:
            - name: FLUENT_ELASTICSEARCH_HOST
              value: "elasticsearch.logging"
          volumeMounts:
            - name: varlog
              mountPath: /var/log
            - name: containers
              mountPath: /var/lib/docker/containers
              readOnly: true
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
        - name: containers
          hostPath:
            path: /var/lib/docker/containers
```
## kubectl Commands

```bash
# StatefulSets
kubectl get statefulsets
kubectl get sts                  # short name
kubectl describe sts postgres
kubectl scale sts postgres --replicas=5
kubectl rollout status sts postgres
kubectl rollout undo sts postgres

# DaemonSets
kubectl get daemonsets
kubectl get ds                   # short name
kubectl describe ds node-exporter
kubectl rollout status ds node-exporter

# See which pods are on which nodes
kubectl get pods -o wide
```
## FAANG Interview Angle
Common questions:
- "When would you use a StatefulSet vs Deployment?"
- "How does persistent storage work with StatefulSets?"
- "What guarantees does a StatefulSet provide?"
- "What is a DaemonSet and when would you use it?"
- "How do you update a StatefulSet safely?"
Key answers:
- StatefulSet for databases, message queues -- anything needing stable identity and storage
- Each pod gets its own PVC via volumeClaimTemplates; PVCs survive pod deletion
- Ordered deploy/scale/update, stable DNS names, dedicated persistent storage
- DaemonSet ensures one pod per node; used for logging, monitoring, networking
- Partition-based canary: set partition=N to only update pods with ordinal >= N