14 - Storage in Kubernetes

The Storage Problem

Pods are ephemeral. When a Pod dies, its local filesystem is gone. Kubernetes needs a way to:

  • Persist data across Pod restarts
  • Share data between containers in a Pod
  • Provide different storage types (SSD, NFS, cloud disks)

Volume Types

emptyDir (Temporary Shared Storage)

Created when Pod starts, deleted when Pod is removed:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: shared-data
spec:
  containers:
    - name: writer
      image: alpine
      command: [sh, -c, "while true; do date >> /data/log.txt; sleep 1; done"]
      volumeMounts:
        - name: shared
          mountPath: /data
    - name: reader
      image: alpine
      command: [sh, -c, "tail -f /data/log.txt"]
      volumeMounts:
        - name: shared
          mountPath: /data
  volumes:
    - name: shared
      emptyDir: {}        # Regular disk
      # emptyDir:
      #   medium: Memory  # Uses RAM (tmpfs) -- faster, limited size
      #   sizeLimit: 100Mi
```

hostPath (Node's Filesystem)

Maps a directory from the host node:

```yaml
volumes:
  - name: host-data
    hostPath:
      path: /var/log/app
      type: DirectoryOrCreate  # Directory | File | DirectoryOrCreate | FileOrCreate
```

Warning: hostPath ties Pods to specific nodes. Avoid in production unless necessary (log collection, node monitoring).
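In those legitimate cases, hostPath usually appears in a DaemonSet, so every node runs one copy of the collector. A minimal sketch (the image name and paths are illustrative, not a specific recommendation):

```yaml
# Hypothetical log-collector DaemonSet -- one Pod per node reads host logs
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-collector
spec:
  selector:
    matchLabels:
      app: log-collector
  template:
    metadata:
      labels:
        app: log-collector
    spec:
      containers:
        - name: collector
          image: fluent/fluent-bit:2.2  # example image
          volumeMounts:
            - name: varlog
              mountPath: /var/log
              readOnly: true            # collector only reads host logs
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
            type: Directory
```

Because the DaemonSet intentionally runs on every node, the usual "ties Pods to specific nodes" objection doesn't apply.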

Persistent Volumes (PV) and Persistent Volume Claims (PVC)

The standard way to manage persistent storage:

Administrator creates          Developer requests
┌─────────────┐               ┌─────────────┐
│ Persistent  │ ◄── binds ──► │ Persistent  │
│ Volume (PV) │               │ Volume      │
│             │               │ Claim (PVC) │
│ 100Gi SSD   │               │ "I need     │
│ on AWS EBS  │               │  50Gi SSD"  │
└─────────────┘               └──────┬──────┘
                                     │
                              ┌──────┴──────┐
                              │    Pod      │
                              │ mounts PVC  │
                              └─────────────┘

Persistent Volume

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-ssd-100g
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteOnce                        # Can be mounted by one node
  persistentVolumeReclaimPolicy: Retain    # Keep data after PVC deletion
  storageClassName: fast-ssd
  # Storage backend (one of these):
  hostPath:                  # Local (dev only)
    path: /data/pv-ssd
  # nfs:                     # NFS
  #   server: nfs.example.com
  #   path: /shared
  # awsElasticBlockStore:    # AWS EBS
  #   volumeID: vol-abc123
  #   fsType: ext4
  # gcePersistentDisk:       # GCP PD
  #   pdName: my-disk
```

Persistent Volume Claim

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
  storageClassName: fast-ssd  # Must match the PV's storageClassName
```

Using PVC in a Pod

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
    - name: app
      image: myapp:v1
      volumeMounts:
        - name: data
          mountPath: /app/data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: app-data
```

Access Modes

| Mode             | Abbreviation | Description                        |
|------------------|--------------|------------------------------------|
| ReadWriteOnce    | RWO          | One node can mount read-write      |
| ReadOnlyMany     | ROX          | Many nodes can mount read-only     |
| ReadWriteMany    | RWX          | Many nodes can mount read-write    |
| ReadWriteOncePod | RWOP         | A single Pod can mount read-write  |
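As a concrete example, a claim for shared file storage would request ReadWriteMany. This sketch assumes an RWX-capable StorageClass named `efs-sc` exists (e.g. backed by the EFS CSI driver); the names and size are illustrative:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-uploads
spec:
  accessModes:
    - ReadWriteMany          # many nodes mount read-write -- requires a file backend
  storageClassName: efs-sc   # hypothetical RWX-capable class (e.g. EFS, Azure Files, NFS)
  resources:
    requests:
      storage: 20Gi
```

A block-backed class (EBS, GCP PD, Azure Disk) would reject this claim, since block devices generally support only RWO/RWOP.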

Reclaim Policies

| Policy  | Behavior                                             |
|---------|------------------------------------------------------|
| Retain  | PV data kept after PVC deletion (manual cleanup)     |
| Delete  | PV and storage deleted when PVC is deleted           |
| Recycle | Deprecated. Use Delete with dynamic provisioning     |

StorageClasses (Dynamic Provisioning)

Instead of pre-creating PVs, let Kubernetes create them automatically:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: kubernetes.io/aws-ebs  # Or: ebs.csi.aws.com
parameters:
  type: gp3
  iopsPerGB: "3000"
  encrypted: "true"
reclaimPolicy: Delete
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer  # Bind when a Pod using the PVC is scheduled
```

```yaml
# PVC referencing the StorageClass -- PV created automatically!
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: auto-provisioned
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: fast-ssd
  resources:
    requests:
      storage: 50Gi  # K8s automatically creates a 50Gi gp3 EBS volume
```

Common StorageClass Provisioners

| Cloud | Provisioner                  | Volume Type        |
|-------|------------------------------|--------------------|
| AWS   | ebs.csi.aws.com              | EBS (block)        |
| AWS   | efs.csi.aws.com              | EFS (file, RWX)    |
| GCP   | pd.csi.storage.gke.io        | Persistent Disk    |
| Azure | disk.csi.azure.com           | Azure Disk         |
| Azure | file.csi.azure.com           | Azure Files (RWX)  |
| Local | kubernetes.io/no-provisioner | Manual             |

CSI (Container Storage Interface)

The standard plugin interface for storage in K8s:

┌──────────┐     ┌─────────┐     ┌──────────────┐
│ kubelet  │ ──► │ CSI     │ ──► │ Storage      │
│          │     │ Driver  │     │ Backend      │
│          │     │ (plugin)│     │ (AWS EBS,    │
│          │     │         │     │  NFS, etc.)  │
└──────────┘     └─────────┘     └──────────────┘

CSI replaces in-tree volume plugins with external, vendor-maintained drivers.

Volume Snapshots

Create point-in-time copies of volumes:

```yaml
# Create a snapshot
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: db-snapshot-2024-01
spec:
  volumeSnapshotClassName: csi-aws-vsc
  source:
    persistentVolumeClaimName: db-data
---
# Restore from the snapshot
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: db-data-restored
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: fast-ssd
  resources:
    requests:
      storage: 100Gi
  dataSource:
    name: db-snapshot-2024-01
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
```

Volume Expansion

Grow PVCs without downtime (if StorageClass allows):

```bash
# Edit the PVC to increase its size
kubectl patch pvc app-data -p '{"spec":{"resources":{"requests":{"storage":"100Gi"}}}}'

# Or edit the YAML directly
kubectl edit pvc app-data  # Change storage from 50Gi to 100Gi
```

kubectl Storage Commands

```bash
# List persistent volumes
kubectl get pv

# List persistent volume claims
kubectl get pvc

# List storage classes
kubectl get sc

# Describe for details
kubectl describe pv pv-ssd-100g
kubectl describe pvc app-data

# See volume snapshots
kubectl get volumesnapshots
```

FAANG Interview Angle

Common questions:

  1. "Explain the PV/PVC model in Kubernetes"
  2. "What's the difference between static and dynamic provisioning?"
  3. "How would you handle persistent storage for a database in K8s?"
  4. "What access modes are available and when would you use each?"
  5. "How do you back up data in Kubernetes?"

Key answers:

  • PV is the actual storage, PVC is a request for storage; decouples storage management from consumption
  • Static: admin creates PVs manually. Dynamic: StorageClass auto-creates PVs on PVC creation
  • StatefulSet with PVC template, StorageClass for dynamic provisioning, volume snapshots for backup
  • RWO for single-node DBs, ROX for shared configs, RWX for shared file storage (NFS/EFS)
  • Volume snapshots, backup tools (Velero), or application-level dumps
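The "StatefulSet with PVC template" answer can be sketched as follows (names, image, and sizes are illustrative); each replica gets its own dynamically provisioned PVC that survives Pod rescheduling:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres
  replicas: 3
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:16
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:         # one PVC per replica: data-postgres-0, -1, -2
    - metadata:
        name: data
      spec:
        accessModes: [ReadWriteOnce]   # each replica owns its own disk
        storageClassName: fast-ssd     # dynamic provisioning via StorageClass
        resources:
          requests:
            storage: 50Gi
```

Deleting or rescheduling a Pod leaves its PVC (and data) in place; the replacement Pod with the same ordinal reattaches to the same volume.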

Official Links