14 - Storage in Kubernetes
The Storage Problem
Pods are ephemeral: when a Pod is deleted or rescheduled, everything written to its containers' filesystems is lost. Kubernetes needs a way to:
- Persist data across Pod restarts
- Share data between containers in a Pod
- Provide different storage types (SSD, NFS, cloud disks)
Volume Types
emptyDir (Temporary Shared Storage)
Created when Pod starts, deleted when Pod is removed:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: shared-data
spec:
  containers:
    - name: writer
      image: alpine
      command: [sh, -c, "while true; do date >> /data/log.txt; sleep 1; done"]
      volumeMounts:
        - name: shared
          mountPath: /data
    - name: reader
      image: alpine
      command: [sh, -c, "tail -f /data/log.txt"]
      volumeMounts:
        - name: shared
          mountPath: /data
  volumes:
    - name: shared
      emptyDir: {}          # Regular disk
      # emptyDir:
      #   medium: Memory    # Uses RAM (tmpfs) -- faster, but counts against memory limits
      #   sizeLimit: 100Mi
```
hostPath (Node's Filesystem)
Maps a directory from the host node:
```yaml
volumes:
  - name: host-data
    hostPath:
      path: /var/log/app
      type: DirectoryOrCreate  # Directory | File | DirectoryOrCreate | FileOrCreate
```
Warning: hostPath ties Pods to specific nodes and exposes the host filesystem to the container. Avoid it in production unless the workload is genuinely node-scoped (log collection, node monitoring).
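One legitimate hostPath pattern is a node-level log agent run as a DaemonSet, where every Pod is *supposed* to read its own node's files. A minimal sketch (the image name and paths here are illustrative, not from a real agent):

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-agent
spec:
  selector:
    matchLabels:
      app: log-agent
  template:
    metadata:
      labels:
        app: log-agent
    spec:
      containers:
        - name: agent
          image: log-agent:v1        # hypothetical log-shipper image
          volumeMounts:
            - name: varlog
              mountPath: /var/log
              readOnly: true         # agent only reads node logs
      volumes:
        - name: varlog
          hostPath:
            path: /var/log           # the node's own log directory
            type: Directory
```

Because a DaemonSet schedules one Pod per node, the node-affinity problem of hostPath becomes a feature rather than a bug here.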
Persistent Volumes (PV) and Persistent Volume Claims (PVC)
The standard way to manage persistent storage:
```
Administrator creates             Developer requests
┌─────────────┐                   ┌─────────────┐
│ Persistent  │ ◄── binds ──►     │ Persistent  │
│ Volume (PV) │                   │ Volume      │
│             │                   │ Claim (PVC) │
│ 100Gi SSD   │                   │ "I need     │
│ on AWS EBS  │                   │  50Gi SSD"  │
└─────────────┘                   └──────┬──────┘
                                         │
                                  ┌──────┴──────┐
                                  │     Pod     │
                                  │ mounts PVC  │
                                  └─────────────┘
```
Persistent Volume
```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-ssd-100g
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteOnce                        # Mountable read-write by one node
  persistentVolumeReclaimPolicy: Retain    # Keep data after PVC deletion
  storageClassName: fast-ssd
  # Storage backend (one of these):
  hostPath:                      # Local (dev only)
    path: /data/pv-ssd
  # nfs:                         # NFS
  #   server: nfs.example.com
  #   path: /shared
  # awsElasticBlockStore:        # AWS EBS (legacy in-tree plugin; use the CSI driver)
  #   volumeID: vol-abc123
  #   fsType: ext4
  # gcePersistentDisk:           # GCP PD (legacy in-tree plugin; use the CSI driver)
  #   pdName: my-disk
```
Persistent Volume Claim
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
  storageClassName: fast-ssd  # Must match the PV's storageClassName
```
Using PVC in a Pod
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
    - name: app
      image: myapp:v1
      volumeMounts:
        - name: data
          mountPath: /app/data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: app-data
```
Access Modes
| Mode | Abbreviation | Description |
|---|---|---|
| ReadWriteOnce | RWO | One node can mount read-write |
| ReadOnlyMany | ROX | Many nodes can mount read-only |
| ReadWriteMany | RWX | Many nodes can mount read-write |
| ReadWriteOncePod | RWOP | Single pod can mount read-write |
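RWO still allows two Pods on the *same* node to mount the volume simultaneously; ReadWriteOncePod tightens this to a single Pod cluster-wide, which matters for single-writer databases. It requires a CSI driver and a recent Kubernetes version. A sketch (the claim name is illustrative):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: single-writer-data
spec:
  accessModes:
    - ReadWriteOncePod   # only one Pod in the entire cluster may mount it
  resources:
    requests:
      storage: 10Gi
```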
Reclaim Policies
| Policy | Behavior |
|---|---|
| Retain | PV data kept after PVC deletion (manual cleanup) |
| Delete | PV and storage deleted when PVC is deleted |
| Recycle | Deprecated. Use Delete with dynamic provisioning |
StorageClasses (Dynamic Provisioning)
Instead of pre-creating PVs, let Kubernetes create them automatically:
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: ebs.csi.aws.com   # CSI driver (the in-tree kubernetes.io/aws-ebs is deprecated)
parameters:
  type: gp3
  iops: "3000"                 # gp3 takes a flat iops value, not iopsPerGB
  encrypted: "true"
reclaimPolicy: Delete
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer  # Bind only when a Pod is scheduled
```
```yaml
# PVC referencing the StorageClass -- a PV is created automatically!
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: auto-provisioned
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: fast-ssd
  resources:
    requests:
      storage: 50Gi  # K8s automatically creates a 50Gi gp3 EBS volume
```
Common StorageClass Provisioners
| Cloud | Provisioner | Volume Type |
|---|---|---|
| AWS | ebs.csi.aws.com | EBS (block) |
| AWS | efs.csi.aws.com | EFS (file, RWX) |
| GCP | pd.csi.storage.gke.io | Persistent Disk |
| Azure | disk.csi.azure.com | Azure Disk |
| Azure | file.csi.azure.com | Azure Files (RWX) |
| Local | kubernetes.io/no-provisioner | Manual |
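The `no-provisioner` entry means no PV is created for you: an admin must create local PVs by hand, and the StorageClass exists mainly to delay binding until the Pod is scheduled, so the PVC binds to a PV on the node the Pod actually lands on. A minimal sketch:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner  # PVs are pre-created manually
volumeBindingMode: WaitForFirstConsumer    # bind only once a Pod is scheduled to a node
```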
CSI (Container Storage Interface)
The standard plugin interface for storage in K8s:
```
┌──────────┐     ┌─────────┐     ┌──────────────┐
│ kubelet  │ ──► │  CSI    │ ──► │  Storage     │
│          │     │ Driver  │     │  Backend     │
│          │     │ (plugin)│     │  (AWS EBS,   │
│          │     │         │     │   NFS, etc.) │
└──────────┘     └─────────┘     └──────────────┘
```
CSI replaces in-tree volume plugins with external, vendor-maintained drivers.
Volume Snapshots
Create point-in-time copies of volumes:
```yaml
# Create a snapshot
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: db-snapshot-2024-01
spec:
  volumeSnapshotClassName: csi-aws-vsc
  source:
    persistentVolumeClaimName: db-data
---
# Restore from the snapshot into a new PVC
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: db-data-restored
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: fast-ssd
  resources:
    requests:
      storage: 100Gi
  dataSource:
    name: db-snapshot-2024-01
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
```
Volume Expansion
Grow PVCs without downtime (if StorageClass allows):
```bash
# Edit the PVC to increase its size
kubectl patch pvc app-data -p '{"spec":{"resources":{"requests":{"storage":"100Gi"}}}}'

# Or edit the YAML directly
kubectl edit pvc app-data  # change storage from 50Gi to 100Gi
```
kubectl Storage Commands
```bash
# List persistent volumes
kubectl get pv

# List persistent volume claims
kubectl get pvc

# List storage classes
kubectl get sc

# Describe for details
kubectl describe pv pv-ssd-100g
kubectl describe pvc app-data

# See volume snapshots
kubectl get volumesnapshots
```
FAANG Interview Angle
Common questions:
- "Explain the PV/PVC model in Kubernetes"
- "What's the difference between static and dynamic provisioning?"
- "How would you handle persistent storage for a database in K8s?"
- "What access modes are available and when would you use each?"
- "How do you back up data in Kubernetes?"
Key answers:
- PV is the actual storage, PVC is a request for storage; decouples storage management from consumption
- Static: admin creates PVs manually. Dynamic: StorageClass auto-creates PVs on PVC creation
- StatefulSet with PVC template, StorageClass for dynamic provisioning, volume snapshots for backup
- RWO for single-node DBs, ROX for shared configs, RWX for shared file storage (NFS/EFS)
- Volume snapshots, backup tools (Velero), or application-level dumps
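The database answer above usually means a StatefulSet with a `volumeClaimTemplates` section, so each replica gets its own dynamically provisioned PVC that survives Pod restarts. A sketch, assuming the `fast-ssd` StorageClass from earlier exists (names are illustrative):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: db
          image: postgres:16
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:            # one PVC per replica: data-postgres-0, data-postgres-1, ...
    - metadata:
        name: data
      spec:
        accessModes: [ReadWriteOnce]   # each replica writes to its own volume
        storageClassName: fast-ssd
        resources:
          requests:
            storage: 50Gi
```

Deleting the StatefulSet does not delete these PVCs, which is exactly the decoupling the PV/PVC model is designed to provide.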