06 - Docker Volumes & Storage
The Problem: Container Data Is Ephemeral
By default, data written inside a container is lost when the container is removed:
```bash
# Write data inside a container and keep the container running
docker run -d --name test ubuntu bash -c "echo 'important data' > /data.txt && sleep infinity"

# The data exists as long as the container exists
docker exec test cat /data.txt
# important data

# Removing the container deletes the data permanently
docker rm -f test
```
Three Types of Storage
Host Machine
┌──────────────────────────────────────────────┐
│ │
│ 1. Volumes (managed by Docker) │
│ /var/lib/docker/volumes/my-vol/_data/ │
│ │ │
│ 2. Bind Mounts (any host path) │
│ /home/user/project/ │
│ │ │
│ 3. tmpfs (RAM only, Linux) │
│ (in memory) │
│ │ │
│ ┌─── Container ──┴─────────────────┐ │
│ │ /app/data ← mounted here │ │
│ └──────────────────────────────────┘ │
└──────────────────────────────────────────────┘
| Feature | Volumes | Bind Mounts | tmpfs |
|---|---|---|---|
| Location | Docker-managed dir | Any host path | RAM |
| Created by | docker volume create | User specifies path | Docker |
| Portable | Yes | No (host-dependent) | No |
| Backup | Easy | Manual | N/A |
| Performance | Native | Native on Linux; slower through Docker Desktop file sharing | Fastest |
| Use case | Production data | Dev (live code reload) | Secrets, temp data |
| Survives container removal | Yes | Yes (on host) | No |
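The three storage types in the table correspond to the `type=` field of the `--mount` flag. As a minimal sketch, the helper below assembles that flag value for each type. `mount_spec` is a hypothetical name used only for illustration (it is not a Docker command), and the volume name and paths in the usage comment are placeholders.

```bash
#!/usr/bin/env bash
# mount_spec is a hypothetical helper for illustration; it only builds a string.
# Usage: mount_spec <volume|bind|tmpfs> <target> [source]
mount_spec() {
  local type="$1" target="$2" source="${3:-}"
  case "$type" in
    volume|bind)
      # Volumes take a volume name as source; bind mounts take an absolute host path.
      if [ -z "$source" ]; then
        echo "mount_spec: $type requires a source" >&2
        return 1
      fi
      printf 'type=%s,source=%s,target=%s\n' "$type" "$source" "$target"
      ;;
    tmpfs)
      # tmpfs lives in memory, so it has no source.
      printf 'type=tmpfs,target=%s\n' "$target"
      ;;
    *)
      echo "mount_spec: unknown type: $type" >&2
      return 1
      ;;
  esac
}

# Example (requires Docker and the alpine image):
#   docker run --rm --mount "$(mount_spec volume /app/data my-vol)" alpine ls /app/data
```

Only the source differs between a volume and a bind mount: a volume name leaves the location to Docker, while a host path ties the container to one machine's directory layout.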
Volumes (Recommended for Production)
Creating and Managing Volumes
```bash
# Create a named volume
docker volume create my-data

# List volumes
docker volume ls

# Inspect a volume
docker volume inspect my-data
# Shows: Mountpoint: /var/lib/docker/volumes/my-data/_data

# Remove a volume
docker volume rm my-data

# Remove unused anonymous volumes
# (on Docker 23.0 and later, add --all to include unused named volumes)
docker volume prune
```
Using Volumes with Containers
```bash
# Mount a named volume (the postgres image requires POSTGRES_PASSWORD to start)
docker run -d --name db \
  -e POSTGRES_PASSWORD=secret \
  -v my-data:/var/lib/postgresql/data \
  postgres:16

# Same with --mount syntax (more explicit, recommended); use instead of the command above
docker run -d --name db \
  -e POSTGRES_PASSWORD=secret \
  --mount type=volume,source=my-data,target=/var/lib/postgresql/data \
  postgres:16

# Anonymous volume (Docker generates a random name)
docker run -d -e POSTGRES_PASSWORD=secret -v /var/lib/postgresql/data postgres:16

# Read-only volume
docker run -d \
  -v config-vol:/etc/app/config:ro \
  myapp
```
Volume with Multiple Containers
```bash
# Shared volume between containers
docker volume create shared-data

docker run -d --name writer -v shared-data:/data alpine \
  sh -c "while true; do date >> /data/log.txt; sleep 1; done"

docker run -d --name reader -v shared-data:/data:ro alpine \
  sh -c "while true; do cat /data/log.txt; sleep 5; done"
```
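When several containers share a volume, it helps to see which ones mount it before you remove or migrate it. The sketch below wraps the `volume` filter of `docker ps`. The function name `volume_users` is an assumption made for this example.

```bash
#!/usr/bin/env bash
# volume_users is a hypothetical helper name; it wraps the real `docker ps` volume filter.
# Prints the names of all containers (running or stopped) that mount the given volume.
volume_users() {
  local volume="$1"
  docker ps -a --filter "volume=${volume}" --format '{{.Names}}'
}

# Example (with the writer and reader containers from above):
#   volume_users shared-data
```

`docker volume rm` already refuses to delete a volume that a container still references. A listing like this shows which containers to remove first.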
Volume Drivers
```bash
# Local driver (default)
docker volume create --driver local my-vol

# NFS volume
docker volume create --driver local \
  --opt type=nfs \
  --opt o=addr=192.168.1.10,rw \
  --opt device=:/shared/data \
  nfs-vol

# Cloud storage plugins (examples)
# AWS EBS, Azure File, GCP Persistent Disk
docker plugin install rexray/ebs
docker volume create --driver rexray/ebs --opt size=100 ebs-vol
```
Bind Mounts (Best for Development)
Map a specific host directory into the container:
```bash
# Bind mount with -v (-w sets the working directory so npm finds package.json)
docker run -d --name dev \
  -v /Users/zineddine/project:/app \
  -v /Users/zineddine/project/node_modules:/app/node_modules \
  -w /app \
  node:20-alpine npm run dev

# Same mount with --mount (more explicit); use instead of the command above
docker run -d --name dev \
  --mount type=bind,source=/Users/zineddine/project,target=/app \
  -w /app \
  node:20-alpine npm run dev

# Read-only bind mount
docker run -d \
  --mount type=bind,source="$(pwd)"/config,target=/etc/app,readonly \
  myapp

# Bind mount a single file
docker run -d \
  -v "$(pwd)"/nginx.conf:/etc/nginx/nginx.conf:ro \
  nginx
```
Live Code Reload Example
```bash
# Development setup with hot reload
# Mounts the whole project so package.json is visible inside the container
docker run -d --name frontend \
  -v "$(pwd)":/app \
  -w /app \
  -p 3000:3000 \
  node:20-alpine \
  sh -c "npm install && npm run dev"

# Changes to src/ on your host are immediately visible in the container
```
tmpfs Mounts (In-Memory)
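Data is stored in host memory and is never written to the container's writable layer or to a volume (it can still be swapped out if the host has swap enabled):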
```bash
# tmpfs mount
docker run -d --name secure \
  --tmpfs /tmp:rw,size=100m \
  myapp

# With --mount syntax
docker run -d --name secure \
  --mount type=tmpfs,target=/tmp,tmpfs-size=100m \
  myapp
```
Use cases:
- Temporary files that shouldn't persist
- Sensitive data (credentials) during processing
- Performance-critical temporary storage
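To confirm that a path really is backed by tmpfs, you can inspect the mount from inside a throwaway container. This is a minimal sketch. `tmpfs_check` is a hypothetical helper name, and the default path `/scratch`, the default size, and the `1770` mode are arbitrary values chosen for illustration.

```bash
#!/usr/bin/env bash
# tmpfs_check is a hypothetical helper: it starts a throwaway alpine container
# with a tmpfs at the given path, then prints its size and its /proc/mounts entry.
# Usage: tmpfs_check [target-path] [size]
tmpfs_check() {
  local target="${1:-/scratch}" size="${2:-16m}"
  docker run --rm \
    --mount "type=tmpfs,target=${target},tmpfs-size=${size},tmpfs-mode=1770" \
    alpine sh -c "df -h ${target} && grep ' ${target} ' /proc/mounts"
}

# Example (requires Docker and the alpine image):
#   tmpfs_check /scratch 16m
```

The `/proc/mounts` line should name `tmpfs` as the filesystem type. The mount and its contents are discarded when the container stops.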
Storage Drivers (Image/Container Layers)
Storage drivers control how Docker stores image layers and each container's writable layer:
| Driver | Backing Filesystem | Maturity | Performance |
|---|---|---|---|
| overlay2 | xfs, ext4 | Production-ready | Excellent |
| btrfs | btrfs | Mature | Good |
| zfs | zfs | Mature | Good |
| devicemapper | direct-lvm | Deprecated (removed in Docker Engine 25.0) | Fair |
| vfs | Any | Testing and debugging only | Poor (no CoW) |
```bash
# Check current storage driver
docker info | grep "Storage Driver"
# Storage Driver: overlay2
```

Configure the driver in `/etc/docker/daemon.json`, then restart the Docker daemon:

```json
{
  "storage-driver": "overlay2"
}
```
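In scripts, a Go template is usually more robust than grepping the human-readable `docker info` output. The sketch below reads the driver name that way. `storage_driver` and `expect_storage_driver` are hypothetical helper names. Hosts that use the containerd image store may report a different name, such as `overlayfs`.

```bash
#!/usr/bin/env bash
# storage_driver prints the daemon's active storage driver name.
storage_driver() {
  docker info --format '{{.Driver}}'
}

# expect_storage_driver is a hypothetical check: it fails unless the active
# driver matches the expected name.
# Usage: expect_storage_driver <driver-name>
expect_storage_driver() {
  local expected="$1" actual
  actual="$(storage_driver)" || return 1
  if [ "$actual" != "$expected" ]; then
    echo "storage driver is ${actual}, expected ${expected}" >&2
    return 1
  fi
  echo "ok: ${actual}"
}

# Example:
#   expect_storage_driver overlay2
```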
How overlay2 Works
┌─────────────────────────┐
│ Container Layer │ ← UpperDir (read-write)
│ (writable) │
├─────────────────────────┤
│ Image Layer N │ ← LowerDir (read-only)
├─────────────────────────┤
│ Image Layer 2 │ ← LowerDir (read-only)
├─────────────────────────┤
│ Image Layer 1 │ ← LowerDir (read-only)
├─────────────────────────┤
│ Merged View │ ← MergedDir (union of all layers)
└─────────────────────────┘
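- Copy-on-Write (CoW): when a container modifies a file that lives in a lower layer, the whole file is first copied up to the upper (writable) layer, and the change is applied to that copy
- This is why write-heavy workloads should use volumes, which bypass the storage driver, rather than the container layer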
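You can see what ended up in a container's writable layer with `docker diff`, which lists each path as added (`A`), changed (`C`), or deleted (`D`) relative to the image. The sketch below counts those entries. `count_layer_changes` is a hypothetical helper name, and the container name `cow-demo` in the usage comment is a placeholder.

```bash
#!/usr/bin/env bash
# count_layer_changes is a hypothetical helper: it prints how many paths in the
# container's writable layer differ from the image (A = added, C = changed, D = deleted).
# Usage: count_layer_changes <container>
count_layer_changes() {
  local container="$1"
  # grep -c exits non-zero when the count is 0, so `|| true` keeps the function successful.
  docker diff "$container" | grep -c '^[ACD] ' || true
}

# Example (requires Docker and the alpine image):
#   docker run --name cow-demo alpine sh -c 'echo new > /data.txt'
#   docker diff cow-demo          # lists /data.txt as added (A)
#   count_layer_changes cow-demo
#   docker rm cow-demo
```

Paths under a mounted volume do not appear in this listing, because writes to a volume never go into the writable layer.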
Backup and Restore
Volume Backup
```bash
# Back up a volume to a tar file
docker run --rm \
  -v my-data:/source:ro \
  -v "$(pwd)":/backup \
  alpine tar czf /backup/my-data-backup.tar.gz -C /source .

# Restore a volume from backup
docker volume create my-data-restored
docker run --rm \
  -v my-data-restored:/target \
  -v "$(pwd)":/backup:ro \
  alpine tar xzf /backup/my-data-backup.tar.gz -C /target
```
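For recurring backups, the same tar-in-a-container pattern can be wrapped in a function that timestamps each archive. This is a sketch under stated assumptions. `backup_volume` is a hypothetical helper name, the archive naming scheme is arbitrary, and the destination must be an absolute host path because it is used as a bind mount source.

```bash
#!/usr/bin/env bash
# backup_volume is a hypothetical helper built on the tar-in-a-container pattern.
# It writes <volume>-<timestamp>.tar.gz into the destination directory and
# prints the archive path on success.
# Usage: backup_volume <volume> [absolute-backup-dir]
backup_volume() {
  local volume="$1" dest="${2:-$PWD}" stamp archive
  stamp="$(date +%Y%m%d-%H%M%S)"
  archive="${volume}-${stamp}.tar.gz"
  docker run --rm \
    -v "${volume}:/source:ro" \
    -v "${dest}:/backup" \
    alpine tar czf "/backup/${archive}" -C /source . || return 1
  echo "${dest}/${archive}"
}

# Example:
#   backup_volume my-data /var/backups
```

A file-level copy of a volume that is being written to may be inconsistent. For databases, stop the container first or prefer a logical dump.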
Database Backup Example
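```bash
# PostgreSQL backup
docker exec db pg_dump -U postgres mydb > backup.sql

# MySQL backup (single quotes so $MYSQL_ROOT_PASSWORD expands inside the container, not on the host)
docker exec db sh -c 'exec mysqldump -u root -p"$MYSQL_ROOT_PASSWORD" mydb' > backup.sql

# PostgreSQL restore
docker exec -i db psql -U postgres mydb < backup.sql
```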
Common Patterns
Named Volume for Database
```bash
docker volume create postgres-data

docker run -d --name db \
  -v postgres-data:/var/lib/postgresql/data \
  -e POSTGRES_PASSWORD=secret \
  postgres:16

# Database data persists across container restarts and recreations
docker rm -f db
docker run -d --name db \
  -v postgres-data:/var/lib/postgresql/data \
  -e POSTGRES_PASSWORD=secret \
  postgres:16
# Data is still there!
```
Bind Mount for Config Files
```bash
docker run -d --name web \
  -v "$(pwd)"/nginx.conf:/etc/nginx/conf.d/default.conf:ro \
  -v "$(pwd)"/ssl:/etc/nginx/ssl:ro \
  -p 443:443 \
  nginx
```
Volume for Shared Data Between Services
```bash
docker volume create uploads

docker run -d --name api \
  -v uploads:/app/uploads \
  myapi

docker run -d --name worker \
  -v uploads:/app/uploads:ro \
  myworker
```
FAANG Interview Angle
Common questions:
- "How does data persist in Docker containers?"
- "What's the difference between volumes and bind mounts?"
- "When would you use tmpfs?"
- "How does copy-on-write work in Docker?"
- "How would you back up data from a Docker volume?"
Key answers:
- Container filesystem is ephemeral; use volumes for persistence
- Volumes are Docker-managed and portable; bind mounts depend on a specific host path
- tmpfs is for sensitive temporary data: it lives in memory and disappears when the container stops
- CoW copies a file from a read-only layer to the writable layer on first write
- Backup: mount volume to a utility container and tar it