06 - Docker Volumes & Storage

The Problem: Container Data Is Ephemeral

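By default, data written inside a container lives in the container's writable layer and is lost when the container is removed:

```bash
# Start a long-running container and write data into its writable layer
docker run -d --name test ubuntu sleep infinity
docker exec test sh -c "echo 'important data' > /data.txt"

# Data exists while the container exists (even across a restart)
docker exec test cat /data.txt          # important data
docker restart test
docker exec test cat /data.txt          # important data

# Remove the container = data gone forever
docker rm -f test
docker run --rm ubuntu cat /data.txt    # cat: /data.txt: No such file or directory
```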

Three Types of Storage

Host Machine
┌──────────────────────────────────────────────┐
│                                              │
│  1. Volumes (managed by Docker)              │
│     /var/lib/docker/volumes/my-vol/_data/    │
│                    │                         │
│  2. Bind Mounts (any host path)              │
│     /home/user/project/                      │
│                    │                         │
│  3. tmpfs (RAM only, Linux)                  │
│     (in memory)                              │
│                    │                         │
│  ┌─── Container ──┴─────────────────┐        │
│  │  /app/data  ← mounted here       │        │
│  └──────────────────────────────────┘        │
└──────────────────────────────────────────────┘
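
| Feature | Volumes | Bind Mounts | tmpfs |
|---|---|---|---|
| Location | Docker-managed directory (/var/lib/docker/volumes/) | Any host path | Host memory (RAM) |
| Created by | docker volume create (or automatically on first use) | User specifies the path | Docker, when the container starts |
| Portable | Yes | No (host-dependent) | No |
| Backup | Easy | Manual | N/A |
| Performance | Native | Native on Linux; slower on Docker Desktop (macOS/Windows) | Fastest |
| Use case | Production data | Dev (live code reload) | Secrets, temp data |
| Survives container removal | Yes | Yes (on host) | No |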

Volumes (Recommended for Production)

Creating and Managing Volumes

```bash
# Create a named volume
docker volume create my-data

# List volumes
docker volume ls

# Inspect a volume
docker volume inspect my-data
# Shows: "Mountpoint": "/var/lib/docker/volumes/my-data/_data"

# Remove a volume (fails while a container still uses it)
docker volume rm my-data

# Remove unused volumes (Docker 23.0+: anonymous volumes only;
# add --all to also remove unused named volumes)
docker volume prune
```

Using Volumes with Containers

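```bash
# Mount a named volume (created automatically if it doesn't exist)
# postgres:16 refuses to initialize without POSTGRES_PASSWORD
docker run -d --name db \
  -e POSTGRES_PASSWORD=secret \
  -v my-data:/var/lib/postgresql/data \
  postgres:16

# Same with --mount syntax (more explicit, recommended).
# Alternative to the command above: both use the container name "db".
docker run -d --name db \
  -e POSTGRES_PASSWORD=secret \
  --mount type=volume,source=my-data,target=/var/lib/postgresql/data \
  postgres:16

# Anonymous volume (Docker generates a random name)
docker run -d -e POSTGRES_PASSWORD=secret -v /var/lib/postgresql/data postgres:16

# Read-only volume
docker run -d \
  -v config-vol:/etc/app/config:ro \
  myapp
```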
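To make the correspondence between the -v and --mount syntaxes concrete, here is a small hypothetical helper, v_to_mount, that rewrites a -v spec into the --mount form. It is a sketch under simple assumptions: it only handles the cases shown above (named volume, absolute host path, anonymous volume, optional :ro) and ignores relative paths, SELinux :z/:Z labels and Windows drive letters.

```bash
#!/usr/bin/env bash
# Hypothetical helper: translate a `-v` spec into the equivalent `--mount` value.
# Sketch only: handles "target", "source:target" and "source:target:ro".
set -euo pipefail

v_to_mount() {
  local spec=$1 src tgt mode type out
  IFS=: read -r src tgt mode <<< "$spec"

  # A single path with no source is an anonymous volume
  if [ -z "$tgt" ]; then
    printf 'type=volume,target=%s\n' "$src"
    return 0
  fi

  # An absolute host path means a bind mount; anything else is a volume name
  case $src in
    /*) type=bind ;;
    *)  type=volume ;;
  esac

  out="type=$type,source=$src,target=$tgt"
  if [ "$mode" = "ro" ]; then
    out="$out,readonly"
  fi
  printf '%s\n' "$out"
}

v_to_mount my-data:/var/lib/postgresql/data   # → type=volume,source=my-data,target=/var/lib/postgresql/data
v_to_mount config-vol:/etc/app/config:ro      # → type=volume,source=config-vol,target=/etc/app/config,readonly
v_to_mount /var/lib/postgresql/data           # → type=volume,target=/var/lib/postgresql/data
```

The explicit key=value form is why --mount is often preferred: the mount type and the read-only flag are spelled out instead of being inferred from the shape of the string.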

Volume with Multiple Containers

```bash
# Shared volume between containers
docker volume create shared-data

docker run -d --name writer -v shared-data:/data alpine \
  sh -c "while true; do date >> /data/log.txt; sleep 1; done"

docker run -d --name reader -v shared-data:/data:ro alpine \
  sh -c "while true; do cat /data/log.txt; sleep 5; done"
```

Volume Drivers

```bash
# Local driver (default)
docker volume create --driver local my-vol

# NFS volume
docker volume create --driver local \
  --opt type=nfs \
  --opt o=addr=192.168.1.10,rw \
  --opt device=:/shared/data \
  nfs-vol

# Cloud storage plugins (examples): AWS EBS, Azure File, GCP Persistent Disk
# rexray/ebs is a third-party plugin that needs AWS credentials configured;
# check that it is still maintained before depending on it
docker plugin install rexray/ebs
docker volume create --driver rexray/ebs --opt size=100 ebs-vol
```

Bind Mounts (Best for Development)

Map a specific host directory into the container:

```bash
# Bind mount with -v (-w sets the working directory so npm finds package.json)
docker run -d --name dev \
  -v /Users/zineddine/project:/app \
  -w /app \
  node:20-alpine npm run dev

# Bind mount with --mount (more explicit; unlike -v, it fails if the host path doesn't exist)
docker run -d --name dev \
  --mount type=bind,source=/Users/zineddine/project,target=/app \
  -w /app \
  node:20-alpine npm run dev

# Read-only bind mount (quote $(pwd) in case the path contains spaces)
docker run -d \
  --mount "type=bind,source=$(pwd)/config,target=/etc/app,readonly" \
  myapp

# Bind mount a single file
docker run -d \
  -v "$(pwd)/nginx.conf:/etc/nginx/nginx.conf:ro" \
  nginx
```
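Quoting matters whenever $(pwd) appears in a mount spec: left unquoted, the substitution is word-split, so a directory such as "my project" turns one -v argument into several and Docker receives a broken spec. The sketch below shows this without Docker by counting the arguments the shell would pass; count_args and the "my project" directory are hypothetical, for illustration only.

```bash
#!/usr/bin/env bash
# Show how an unquoted $(pwd) is split when the path contains a space.
set -euo pipefail

count_args() { echo "$#"; }   # hypothetical helper: prints how many args it received

dir=$(mktemp -d)/"my project"
mkdir -p "$dir"
cd "$dir"

# shellcheck disable=SC2046  # the unquoted expansion is the point of this demo
unquoted=$(count_args -v $(pwd)/config:/etc/app:ro)
quoted=$(count_args -v "$(pwd)/config:/etc/app:ro")

echo "unquoted: $unquoted args"   # 3: the path was split at the space
echo "quoted:   $quoted args"     # 2: -v plus one intact spec
```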

Live Code Reload Example

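```bash
# Development setup with hot-reload: mount the whole project so package.json,
# node_modules and src/ are all visible at /app
docker run -d --name frontend \
  -v "$(pwd)":/app \
  -w /app \
  -p 3000:3000 \
  node:20-alpine \
  sh -c "npm install && npm run dev"
# npm install runs inside the container, so node_modules is built for Linux.
# Changes to src/ on your host immediately reflect in the container; the dev
# server must listen on 0.0.0.0 for -p 3000:3000 to reach it.
```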

tmpfs Mounts (In-Memory)

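Data lives in host memory only: it is never written to the container's writable layer or to a volume, and it disappears when the container stops (like any tmpfs, it can still be paged out to swap if the host has swap enabled):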

```bash
# tmpfs mount
docker run -d --name secure \
  --tmpfs /tmp:rw,size=100m \
  myapp

# With --mount syntax (alternative to the command above)
docker run -d --name secure \
  --mount type=tmpfs,target=/tmp,tmpfs-size=100m \
  myapp
```

Use cases:

  • Temporary files that shouldn't persist
  • Sensitive data (credentials) during processing
  • Performance-critical temporary storage

Storage Drivers (Image/Container Layers)

How Docker stores image layers and the container writable layer:

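| Driver | Backing Filesystem | Maturity | Performance |
|---|---|---|---|
| overlay2 | xfs (with ftype=1), ext4 | Production-ready (default) | Excellent |
| btrfs | btrfs | Mature | Good |
| zfs | zfs | Mature | Good |
| devicemapper | direct-lvm | Deprecated (removed in Docker Engine 25.0) | Fair |
| vfs | Any | Testing/debugging only | Poor (no CoW: every layer is a full copy) |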
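
```bash
# Check the current storage driver
docker info | grep "Storage Driver"
# Storage Driver: overlay2

# Configure it in /etc/docker/daemon.json, then restart the daemon:
# {
#   "storage-driver": "overlay2"
# }
# Switching drivers makes existing images and containers inaccessible.
sudo systemctl restart docker
```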

How overlay2 Works

┌─────────────────────────┐
│   Merged View           │ ← MergedDir (what the container sees:
│                         │   union of all layers below)
├─────────────────────────┤
│   Container Layer       │ ← UpperDir (read-write)
│   (writable)            │
├─────────────────────────┤
│   Image Layer N         │ ← LowerDir (read-only)
├─────────────────────────┤
│   Image Layer 2         │ ← LowerDir (read-only)
├─────────────────────────┤
│   Image Layer 1         │ ← LowerDir (read-only)
└─────────────────────────┘
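  • Copy-on-Write (CoW): the first time a container modifies a file that lives in a lower (image) layer, overlay2 copies the whole file up into the writable layer and changes the copy; the image layer itself is never modified
  • This is why write-heavy workloads (databases, logs) should use volumes: volume writes bypass the storage driver, while writes to the container layer pay the copy-up cost and are lost with the container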
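The copy-up rule can be imitated with two plain directories standing in for LowerDir and UpperDir. This is a toy model, not overlayfs: the layer_read and layer_append helpers are hypothetical and skip whiteouts, metadata-only copy-up and the kernel's real lookup, but they show why the first write to an image file copies the whole file while the image layer stays unchanged.

```bash
#!/usr/bin/env bash
# Toy model of overlay2 copy-up with plain directories (no real overlayfs mount).
set -euo pipefail

root=$(mktemp -d)
lower="$root/lower"   # stands in for a read-only image layer (LowerDir)
upper="$root/upper"   # stands in for the container's writable layer (UpperDir)
mkdir -p "$lower" "$upper"
echo "from image" > "$lower/app.conf"

# Lookup: a file in the upper layer hides the same path in the lower layer
layer_read() {
  if [ -e "$upper/$1" ]; then cat "$upper/$1"; else cat "$lower/$1"; fi
}

# Write: on the first write, copy the whole file up, then modify only the copy
layer_append() {
  if [ ! -e "$upper/$1" ] && [ -e "$lower/$1" ]; then
    cp -p "$lower/$1" "$upper/$1"
  fi
  echo "$2" >> "$upper/$1"
}

layer_read app.conf                            # from image
layer_append app.conf "changed in container"
layer_read app.conf                            # from image / changed in container
cat "$lower/app.conf"                          # from image (lower layer untouched)
```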

Backup and Restore

Volume Backup

```bash
# Backup a volume to a tar file in the current directory
docker run --rm \
  -v my-data:/source:ro \
  -v "$(pwd)":/backup \
  alpine tar czf /backup/my-data-backup.tar.gz -C /source .

# Restore a volume from backup
docker volume create my-data-restored
docker run --rm \
  -v my-data-restored:/target \
  -v "$(pwd)":/backup:ro \
  alpine tar xzf /backup/my-data-backup.tar.gz -C /target
```
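The same tar invocation can be rehearsed without Docker by letting temporary directories play the roles of the source and restored volumes. In this sketch the directory names and sample files are illustrative; it also checks the round trip with diff -r, a cheap way to confirm a restore before relying on it.

```bash
#!/usr/bin/env bash
# Docker-free rehearsal of the volume backup/restore commands above.
# Temporary directories stand in for /source (my-data) and /target (my-data-restored).
set -euo pipefail

work=$(mktemp -d)
src="$work/source"; dst="$work/target"; backup="$work/backup"
mkdir -p "$src/sub" "$dst" "$backup"
echo "row 1" > "$src/table.dat"
echo "nested" > "$src/sub/notes.txt"

# -C "$src" . archives the *contents* of the volume, so archive paths are
# relative (./table.dat) rather than /source/table.dat
tar czf "$backup/my-data-backup.tar.gz" -C "$src" .

# List what was captured
tar tzf "$backup/my-data-backup.tar.gz"

# Restore into the empty "volume"
tar xzf "$backup/my-data-backup.tar.gz" -C "$dst"

# Identical trees: diff prints nothing and exits 0
diff -r "$src" "$dst" && echo "restore verified"
```

Because the archive holds relative paths, it can be extracted into any empty volume regardless of where that volume is mounted in the helper container.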

Database Backup Example

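```bash
# PostgreSQL backup (plain SQL dump written on the host)
docker exec db pg_dump -U postgres mydb > backup.sql

# MySQL backup: single quotes so $MYSQL_ROOT_PASSWORD expands inside the container
docker exec db sh -c 'exec mysqldump -u root -p"$MYSQL_ROOT_PASSWORD" mydb' > backup.sql

# PostgreSQL restore (-i keeps stdin open so the dump can be piped in)
docker exec -i db psql -U postgres mydb < backup.sql
```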
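Dumps are plain text and compress well, so a common variation is to stream them through gzip. The sketch below shows the pipeline shape only: dump_db and restore_db are hypothetical stand-ins so it runs without Docker, and in practice they would be the docker exec pg_dump and docker exec -i psql commands above.

```bash
#!/usr/bin/env bash
# Sketch: compress a dump on the fly, verify it, and restore from it.
# dump_db and restore_db are HYPOTHETICAL stand-ins; with the containers above they would be
#   docker exec db pg_dump -U postgres mydb
#   docker exec -i db psql -U postgres mydb
set -euo pipefail

work=$(mktemp -d)
dump_db()    { printf 'CREATE TABLE t (id int);\nINSERT INTO t VALUES (1);\n'; }
restore_db() { cat > "$work/restored.sql"; }   # here it only captures stdin

# Backup: the uncompressed dump never lands on disk
dump_db | gzip > "$work/backup.sql.gz"

# Check the archive's integrity before trusting it
gzip -t "$work/backup.sql.gz"

# Restore: decompress to stdout and feed the client's stdin (hence -i on docker exec)
gzip -dc "$work/backup.sql.gz" | restore_db
```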

Common Patterns

Named Volume for Database

```bash
docker volume create postgres-data

docker run -d --name db \
  -v postgres-data:/var/lib/postgresql/data \
  -e POSTGRES_PASSWORD=secret \
  postgres:16

# Database data persists across container restarts and recreations
docker rm -f db
docker run -d --name db \
  -v postgres-data:/var/lib/postgresql/data \
  -e POSTGRES_PASSWORD=secret \
  postgres:16
# Data is still there!
```

Bind Mount for Config Files

```bash
docker run -d --name web \
  -v "$(pwd)/nginx.conf:/etc/nginx/conf.d/default.conf:ro" \
  -v "$(pwd)/ssl:/etc/nginx/ssl:ro" \
  -p 443:443 \
  nginx
```

Volume for Shared Data Between Services

```bash
docker volume create uploads

# The API writes uploads; the worker only needs to read them
docker run -d --name api \
  -v uploads:/app/uploads \
  myapi

docker run -d --name worker \
  -v uploads:/app/uploads:ro \
  myworker
```

FAANG Interview Angle

Common questions:

  1. "How does data persist in Docker containers?"
  2. "What's the difference between volumes and bind mounts?"
  3. "When would you use tmpfs?"
  4. "How does copy-on-write work in Docker?"
  5. "How would you back up data from a Docker volume?"

Key answers:

  • Container filesystem is ephemeral; use volumes for persistence
  • Volumes are Docker-managed (portable), bind mounts are host-path dependent
  • tmpfs for sensitive temp data: it lives only in host memory and is gone when the container stops
  • CoW copies files from read-only layers to writable layer on first write
  • Backup: mount volume to a utility container and tar it

Official Links