03 - Docker Images & Dockerfile

What Is a Docker Image?

An image is a read-only template containing:

  • A minimal OS filesystem (e.g., Alpine, Debian)
  • Your application code
  • Dependencies and libraries
  • Environment variables and configuration
  • A default command to run

Think of it as a snapshot of everything your app needs to run.

Image Layers

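Each instruction in a Dockerfile adds a layer to the image. Instructions that change the filesystem (RUN, COPY, ADD) add content; the rest, such as CMD, record metadata only: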

┌─────────────────────────────┐
│  CMD ["node", "server.js"]  │  Layer 5 (metadata only)
├─────────────────────────────┤
│  COPY . /app                │  Layer 4 (~5 MB)
├─────────────────────────────┤
│  RUN npm install            │  Layer 3 (~50 MB)
├─────────────────────────────┤
│  COPY package*.json /app/   │  Layer 2 (~1 KB)
├─────────────────────────────┤
│  FROM node:20-alpine        │  Layer 1 (~180 MB) - base image
└─────────────────────────────┘

Key concepts:

  • Layers are cached -- unchanged layers are reused
  • Layers are shared between images (saves disk space)
  • Only the top container layer is writable (copy-on-write)

bash
# See layers of an image
docker history nginx

# See detailed layer info
docker inspect nginx | jq '.[0].RootFS.Layers'
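
One way to observe layer sharing is to dump each image's layer digests to a file and intersect the two lists. The helper below is a minimal sketch: `shared_layers` and the file names are hypothetical, and the `docker image inspect` calls in the comments assume both tags exist locally.

```shell
# shared_layers FILE_A FILE_B
# Each file holds one layer digest per line (hypothetical input files).
# Prints the digests that appear in both files, i.e. layers stored only once.
shared_layers() {
    # -F fixed strings, -x whole-line match, -f read the patterns from FILE_A
    grep -Fxf "$1" "$2"
}

# Possible usage (assumes myapp:v1 and myapp:dev were built from the same base):
#   fmt='{{join .RootFS.Layers "\n"}}'
#   docker image inspect --format "$fmt" myapp:v1  > v1-layers.txt
#   docker image inspect --format "$fmt" myapp:dev > dev-layers.txt
#   shared_layers v1-layers.txt dev-layers.txt
```

Any digest printed is a layer both images reference, so it takes up disk space only once.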

Image Naming Convention

[registry/][namespace/]repository[:tag|@digest]

Examples:
nginx                          → docker.io/library/nginx:latest
nginx:1.25-alpine              → docker.io/library/nginx:1.25-alpine
myuser/myapp:v2.1              → docker.io/myuser/myapp:v2.1
gcr.io/project/api:abc123      → gcr.io/project/api:abc123
ghcr.io/org/service@sha256:... → pinned by content digest

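Best practice: always use a specific tag and never rely on :latest in production. :latest is only the default tag name; it can point to a different image after any push, so deployments stop being reproducible.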
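
The expansion rules in the examples above can be sketched as a small shell function. This is a simplified model, not Docker's actual parser: `normalize_ref` is a hypothetical name, and the "first component looks like a host" test (contains a dot or colon, or equals localhost) is an assumption that covers the common cases.

```shell
# normalize_ref REF
# Expands a short image reference to registry/namespace/repository:tag.
normalize_ref() {
    ref=$1
    first=${ref%%/*}    # text before the first slash (or the whole reference)

    # Treat the first component as a registry only if it looks like a host.
    case $ref in
        */*)
            case $first in
                *.*|*:*|localhost) registry=$first;    rest=${ref#*/} ;;
                *)                 registry=docker.io; rest=$ref ;;
            esac ;;
        *)  registry=docker.io; rest=$ref ;;
    esac

    # Docker Hub images without a namespace live under "library".
    if [ "$registry" = docker.io ]; then
        case $rest in
            */*) ;;
            *)   rest=library/$rest ;;
        esac
    fi

    # Add the default tag when there is neither a tag nor a digest.
    last=${rest##*/}
    case $last in
        *:*|*@*) ;;
        *)       rest=$rest:latest ;;
    esac

    printf '%s/%s\n' "$registry" "$rest"
}

normalize_ref nginx               # docker.io/library/nginx:latest
normalize_ref myuser/myapp:v2.1   # docker.io/myuser/myapp:v2.1
```

Seeing `:latest` filled in silently is the point of the best practice: an untagged reference always resolves to a tag that can move.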

Dockerfile Basics

A Dockerfile is a text file with instructions to build an image.

Complete Instruction Reference

dockerfile
# ---------- Base Image ----------
FROM node:20-alpine AS builder
# Always start with FROM. Use specific tags, not :latest.
# AS names this stage for multi-stage builds.

# ---------- Metadata ----------
LABEL maintainer="you@example.com"
LABEL version="1.0"
LABEL description="My Node.js application"

# ---------- Arguments ----------
ARG NODE_ENV=production
# Build-time variables. NOT available at runtime.
# Override with: docker build --build-arg NODE_ENV=development

# ---------- Environment ----------
ENV NODE_ENV=${NODE_ENV}
ENV PORT=3000
# Available at build time AND runtime.
# Override at runtime: docker run -e PORT=8080

# ---------- Working Directory ----------
WORKDIR /app
# Sets the working directory for subsequent commands.
# Creates it if it doesn't exist. Prefer over "RUN mkdir && cd".

# ---------- Copy Files ----------
COPY package*.json ./
# Copies from build context to image.
# Respects .dockerignore

ADD https://example.com/file.tar.gz /tmp/
# Like COPY but can also:
#   - Extract local tar archives automatically
#   - Download URLs (downloaded archives are not extracted)
# Prefer COPY unless you need these features.

# ---------- Run Commands ----------
RUN npm ci --omit=dev
# Executes during build. Each RUN creates a new layer.
# --omit=dev replaces the deprecated --only=production.
# Shell form (runs in /bin/sh -c):
RUN echo "hello"
# Exec form (no shell; the preferred form for CMD and ENTRYPOINT):
RUN ["npm", "ci", "--omit=dev"]

# ---------- Expose Ports ----------
EXPOSE 3000
# Documentation only! Doesn't actually publish the port.
# You still need -p 3000:3000 at runtime.

# ---------- Volumes ----------
VOLUME ["/data"]
# Creates a mount point. Data here persists beyond container life.

# ---------- User ----------
USER node
# Switch to non-root user. Critical for security.

# ---------- Health Check ----------
HEALTHCHECK --interval=30s --timeout=3s --retries=3 \
  CMD wget -q --spider http://localhost:3000/health || exit 1
# Docker will check container health periodically.
# Alpine images include BusyBox wget; curl would have to be installed first.

# ---------- Entry Point ----------
ENTRYPOINT ["node"]
# The main executable. Hard to override (need --entrypoint flag).

# ---------- Default Command ----------
CMD ["server.js"]
# Default arguments to ENTRYPOINT.
# Easy to override: docker run myimage other.js
# Combined: ENTRYPOINT + CMD = node server.js

ENTRYPOINT vs CMD

dockerfile
# CMD only -- easy to override entirely
CMD ["node", "server.js"]
# docker run myimage            → node server.js
# docker run myimage bash       → bash (CMD replaced)

# ENTRYPOINT only -- always runs this executable
ENTRYPOINT ["node"]
# docker run myimage            → node
# docker run myimage server.js  → node server.js

# ENTRYPOINT + CMD -- best of both worlds
ENTRYPOINT ["node"]
CMD ["server.js"]
# docker run myimage                    → node server.js
# docker run myimage test.js            → node test.js (CMD replaced)
# docker run --entrypoint bash myimage  → bash (override ENTRYPOINT)
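
The combination rule can be modeled in a few lines of shell. This is only a sketch of the exec-form behavior described above: `effective_command` is a hypothetical helper, and it treats ENTRYPOINT and CMD as plain strings rather than JSON arrays.

```shell
# effective_command ENTRYPOINT CMD [RUN_ARGS...]
# Prints the command line a container would start with.
# Pass "" for an instruction the Dockerfile does not set.
effective_command() {
    entrypoint=$1
    cmd=$2
    shift 2

    # Arguments after the image name in `docker run` replace CMD entirely.
    if [ "$#" -gt 0 ]; then
        cmd=$*
    fi

    # ENTRYPOINT comes first, CMD (or its replacement) is appended as arguments.
    if [ -n "$entrypoint" ] && [ -n "$cmd" ]; then
        printf '%s %s\n' "$entrypoint" "$cmd"
    else
        printf '%s\n' "$entrypoint$cmd"
    fi
}

effective_command ""   "node server.js" bash     # bash
effective_command node server.js                 # node server.js
effective_command node server.js        test.js  # node test.js
```

The `--entrypoint` flag is outside this model; it corresponds to changing the first argument.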

Real-World Dockerfile Examples

Node.js Application

dockerfile
# Stage 1: production dependencies only
FROM node:20-alpine AS deps
WORKDIR /app
COPY package.json package-lock.json ./
# --omit=dev replaces the deprecated --only=production
RUN npm ci --omit=dev

# Stage 2: full install and build
FROM node:20-alpine AS builder
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci
COPY . .
RUN npm run build

# Stage 3: runtime image
FROM node:20-alpine AS runner
WORKDIR /app
ENV NODE_ENV=production

# Don't run as root
RUN addgroup --system --gid 1001 appgroup && \
    adduser --system --uid 1001 appuser

COPY --from=deps /app/node_modules ./node_modules
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/package.json ./

USER appuser
EXPOSE 3000

HEALTHCHECK --interval=30s --timeout=3s \
  CMD wget --no-verbose --tries=1 --spider http://localhost:3000/health || exit 1

CMD ["node", "dist/server.js"]

Python Application

dockerfile
FROM python:3.12-slim AS builder
WORKDIR /app

# Install dependencies in a virtual env
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

FROM python:3.12-slim
WORKDIR /app

# Copy virtual env from builder
COPY --from=builder /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

COPY . .

RUN useradd --create-home appuser
USER appuser

EXPOSE 8000
CMD ["gunicorn", "app:create_app()", "--bind", "0.0.0.0:8000"]

Go Application

dockerfile
FROM golang:1.22-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -ldflags="-s -w" -o /server ./cmd/server

# Scratch = empty image (smallest possible)
FROM scratch
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
COPY --from=builder /server /server
EXPOSE 8080
ENTRYPOINT ["/server"]

Java/Spring Boot

dockerfile
FROM eclipse-temurin:21-jdk-alpine AS builder
WORKDIR /app
COPY . .
RUN ./gradlew bootJar --no-daemon

FROM eclipse-temurin:21-jre-alpine
WORKDIR /app
RUN addgroup -S app && adduser -S app -G app
COPY --from=builder /app/build/libs/*.jar app.jar
USER app
EXPOSE 8080
ENTRYPOINT ["java", "-jar", "app.jar"]

Multi-Stage Builds

Multi-stage builds keep final images small by separating build-time dependencies from runtime:

dockerfile
# Stage 1: Build (has compilers, build tools)
FROM node:20 AS builder
WORKDIR /app
COPY . .
RUN npm ci && npm run build
# This stage might be 1 GB+

# Stage 2: Production (only runtime)
FROM node:20-alpine
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
CMD ["node", "dist/server.js"]
# This stage might be 150 MB

You can also copy from external images:

dockerfile
COPY --from=nginx:alpine /etc/nginx/nginx.conf /etc/nginx/
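
To put a number on what a multi-stage build saves, the sizes of the builder stage and the final image can be compared. The function below is a rough sketch: `size_reduction_percent` is a hypothetical helper, and the tags in the usage comments are assumed to have been built beforehand.

```shell
# size_reduction_percent BUILDER_BYTES FINAL_BYTES
# Prints how much smaller the final image is, as a whole-number percentage.
size_reduction_percent() {
    builder=$1
    final=$2
    # Integer arithmetic: the result is rounded down.
    echo $(( (builder - final) * 100 / builder ))
}

# Possible usage (assumes both tags exist locally):
#   docker build --target builder -t myapp:builder .
#   docker build -t myapp:v1 .
#   size_reduction_percent \
#     "$(docker image inspect --format '{{.Size}}' myapp:builder)" \
#     "$(docker image inspect --format '{{.Size}}' myapp:v1)"
```

With the rough figures from the comments in the multi-stage example (about 1 GB for the build stage, about 150 MB for the final stage), the result is 85.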

.dockerignore

Like .gitignore but for Docker build context:

# .dockerignore
node_modules
npm-debug.log
.git
.gitignore
.env
.env.*
Dockerfile
docker-compose*.yml
.dockerignore
README.md
.vscode
.idea
coverage
dist
*.md

Why it matters:

  • Reduces build context size (faster builds)
  • Prevents copying secrets into images
  • Avoids invalidating cache unnecessarily
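
Because a missing entry can leak secrets into an image, a quick check before building can help. The sketch below uses a hypothetical helper, `missing_ignores`, that only looks for exact lines in the file; it does not evaluate wildcard patterns the way Docker does.

```shell
# missing_ignores FILE PATTERN...
# Prints each PATTERN that does not appear as an exact line in FILE.
missing_ignores() {
    file=$1
    shift
    for pattern in "$@"; do
        # -F literal text, -x whole line, -q quiet; "--" protects patterns
        # that begin with a dash.
        grep -Fxq -- "$pattern" "$file" || printf '%s\n' "$pattern"
    done
}

# Possible usage in a build script:
#   missing=$(missing_ignores .dockerignore .env .git node_modules)
#   [ -z "$missing" ] || { echo "add to .dockerignore: $missing"; exit 1; }
```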

Building Images

bash
# Basic build
docker build -t myapp:v1 .

# Build with specific Dockerfile
docker build -t myapp:v1 -f Dockerfile.prod .

# Build with build arguments
docker build --build-arg NODE_ENV=development -t myapp:dev .

# Build specific stage only
docker build --target builder -t myapp:builder .

# Build without cache
docker build --no-cache -t myapp:v1 .

# Build and push in one command (BuildKit)
docker buildx build --push -t myuser/myapp:v1 .

# Multi-platform build
docker buildx build --platform linux/amd64,linux/arm64 -t myapp:v1 .

Managing Images

bash
# List local images
docker images
docker image ls

# Pull an image
docker pull nginx:1.25-alpine

# Push to a registry
docker tag myapp:v1 myuser/myapp:v1
docker push myuser/myapp:v1

# Remove an image
docker rmi myapp:v1
docker image rm myapp:v1

# Remove all unused images
docker image prune -a

# Inspect image details
docker inspect nginx:latest

# Save/load images (for air-gapped environments)
docker save myapp:v1 -o myapp.tar
docker load -i myapp.tar

# See image disk usage
docker system df

Layer Caching Strategy

Order Dockerfile instructions from least frequently changed to most frequently changed:

dockerfile
# GOOD: Dependencies first, code last
FROM node:20-alpine
WORKDIR /app

# 1. System deps (rarely change)
RUN apk add --no-cache curl

# 2. Package manifests (change occasionally)
COPY package.json package-lock.json ./

# 3. Install deps (cached if manifests unchanged)
RUN npm ci

# 4. App code (changes frequently -- only this rebuilds)
COPY . .
RUN npm run build

CMD ["node", "dist/server.js"]

dockerfile
# BAD: Copying everything first busts cache on any file change
FROM node:20-alpine
WORKDIR /app
# Any file change invalidates this COPY and ALL layers below it
# (comments must sit on their own line; Dockerfiles have no inline comments)
COPY . .
RUN npm ci
RUN npm run build
CMD ["node", "dist/server.js"]
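
Why ordering matters can be shown with a toy model of the cache. The sketch below is not Docker's real algorithm: `cache_keys` is a hypothetical function that chains a checksum through the instructions, and the `[src=v1]` style suffixes are made-up stand-ins for the checksum of the copied files.

```shell
# cache_keys: reads one instruction per line on stdin and prints one key per
# line. Each key hashes the previous key plus the current instruction, so a
# change to one step changes its key and every key after it.
cache_keys() {
    prev=base
    while IFS= read -r instruction; do
        prev=$(printf '%s\n%s\n' "$prev" "$instruction" | cksum | cut -d' ' -f1)
        printf '%s\n' "$prev"
    done
}

# GOOD ordering: only the source fingerprint differs between two builds.
printf '%s\n' 'FROM node:20-alpine' 'COPY package.json ./ [manifests=v1]' \
    'RUN npm ci' 'COPY . . [src=v1]' | cache_keys
printf '%s\n' 'FROM node:20-alpine' 'COPY package.json ./ [manifests=v1]' \
    'RUN npm ci' 'COPY . . [src=v2]' | cache_keys
```

In the two runs the first three keys match and only the last one differs, which corresponds to `npm ci` being served from cache. If `COPY . .` were moved up to the second position, every key from that point on would change.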

FAANG Interview Angle

Common questions:

  1. "How do Docker image layers work?"
  2. "What's the difference between COPY and ADD?"
  3. "Explain multi-stage builds and why they matter"
  4. "How would you optimize a Docker image for production?"
  5. "What's the difference between CMD and ENTRYPOINT?"

Key answers:

  • Layers are cached and shared; order matters for cache efficiency
  • Use COPY (not ADD) unless you need tar extraction
  • Multi-stage builds separate build deps from runtime, reducing image size
  • Use alpine/slim bases, non-root user, specific tags, .dockerignore
  • ENTRYPOINT sets the executable; CMD supplies default arguments that docker run arguments replace

Official Links