03 - Docker Images & Dockerfile

What Is a Docker Image?

An image is a read-only template containing:

A minimal OS filesystem (e.g., Alpine, Debian)
Your application code
Dependencies and libraries
Environment variables and configuration
A default command to run

Think of it as a snapshot of everything your app needs to run.

Image Layers

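Each Dockerfile instruction adds an entry to the image history. RUN, COPY, and ADD create filesystem layers; instructions such as CMD, ENV, and LABEL only change image metadata: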

┌─────────────────────────────┐
│  CMD ["node", "server.js"]  │  Layer 5 (metadata only)
├─────────────────────────────┤
│  COPY . /app                │  Layer 4 (~5 MB)
├─────────────────────────────┤
│  RUN npm install            │  Layer 3 (~50 MB)
├─────────────────────────────┤
│  COPY package*.json /app/   │  Layer 2 (~1 KB)
├─────────────────────────────┤
│  FROM node:20-alpine        │  Layer 1 (~180 MB) - base image
└─────────────────────────────┘

Key concepts:

Layers are cached -- unchanged layers are reused
Layers are shared between images (saves disk space)
Only the top container layer is writable (copy-on-write)

bash
# See layers of an image
docker history nginx

# See detailed layer info
docker inspect nginx | jq '.[0].RootFS.Layers'
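To check the claim that layers are shared between images, the sketch below counts how many layer digests two images have in common. The helper names `image_layers` and `count_shared` are invented for this example, and `image_layers` assumes a running Docker daemon with both images already pulled.

```shell
#!/bin/sh
# image_layers IMAGE: print one layer digest per line.
# Assumes a running Docker daemon and a locally available image.
image_layers() {
  docker image inspect --format '{{range .RootFS.Layers}}{{println .}}{{end}}' "$1"
}

# count_shared FILE_A FILE_B: count digests that appear in both files.
# Each file holds one digest per line, e.g. the output of image_layers.
count_shared() {
  { sort -u "$1"; sort -u "$2"; } | sort | uniq -d | wc -l | tr -d ' '
}
```

A possible use: save `image_layers` output for two images built from the same base into two files, then run `count_shared` on them. A non-zero result means those layers are stored only once on disk.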

Image Naming Convention

[registry/][namespace/]repository[:tag|@digest]

Examples:
nginx                          → docker.io/library/nginx:latest
nginx:1.25-alpine              → docker.io/library/nginx:1.25-alpine
myuser/myapp:v2.1              → docker.io/myuser/myapp:v2.1
gcr.io/project/api:abc123      → gcr.io/project/api:abc123
ghcr.io/org/service@sha256:... → pinned by content digest

Best practice: Pin a specific tag in production instead of relying on :latest. Tags are mutable, so pin by digest when the image must be exactly reproducible.
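The expansion rules behind the examples above can be sketched as a small shell function. `normalize_ref` is a hypothetical helper written for illustration, not part of the Docker CLI, and it only covers the defaults shown here (registry `docker.io`, namespace `library`, tag `latest`).

```shell
#!/bin/sh
# normalize_ref REF: print the fully qualified form of an image reference.
# Illustrative helper only; it mirrors the defaults listed in the examples.
normalize_ref() (
  ref="$1"
  registry="docker.io"
  path="$ref"
  first="${ref%%/*}"
  case "$ref" in
    */*)
      # The first component is a registry only if it looks like a host:
      # it contains a dot or a port, or is exactly "localhost".
      case "$first" in
        *.*|*:*|localhost)
          registry="$first"
          path="${ref#*/}"
          ;;
      esac
      ;;
  esac
  # Official Docker Hub images live in the implicit "library" namespace.
  if [ "$registry" = "docker.io" ]; then
    case "$path" in
      */*) ;;
      *) path="library/$path" ;;
    esac
  fi
  # Add the default tag when the last component has neither tag nor digest.
  case "${path##*/}" in
    *:*|*@*) ;;
    *) path="$path:latest" ;;
  esac
  printf '%s/%s\n' "$registry" "$path"
)
```

Calling `normalize_ref nginx` or `normalize_ref myuser/myapp:v2.1` reproduces the right-hand side of the corresponding example line.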

Dockerfile Basics

A Dockerfile is a text file with instructions to build an image.

Complete Instruction Reference

dockerfile
# ---------- Base Image ----------
FROM node:20-alpine AS builder
# Always start with FROM. Use specific tags, not :latest.
# AS names this stage for multi-stage builds.

# ---------- Metadata ----------
LABEL maintainer="you@example.com"
LABEL version="1.0"
LABEL description="My Node.js application"

# ---------- Arguments ----------
ARG NODE_ENV=production
# Build-time variables. NOT available at runtime.
# Override with: docker build --build-arg NODE_ENV=development

# ---------- Environment ----------
ENV NODE_ENV=${NODE_ENV}
ENV PORT=3000
# Available at build time AND runtime.
# Override at runtime: docker run -e PORT=8080

# ---------- Working Directory ----------
WORKDIR /app
# Sets the working directory for subsequent commands.
# Creates it if it doesn't exist. Prefer over "RUN mkdir && cd".

# ---------- Copy Files ----------
COPY package*.json ./
# Copies from build context to image.
# Respects .dockerignore

ADD https://example.com/file.tar.gz /tmp/
# Like COPY but can also:
# - Download URLs (remote archives are NOT extracted)
# - Extract local tar archives automatically
# Prefer COPY unless you need these features.

# ---------- Run Commands ----------
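RUN npm ci --omit=dev
# Executes during build. Each RUN creates a new layer.
# --omit=dev replaces the deprecated --only=production flag.

# Shell form (runs in /bin/sh -c; supports &&, pipes, and variable expansion):
RUN echo "hello"

# Exec form (no shell, so no variable expansion or &&):
RUN ["npm", "ci", "--omit=dev"]
# Shell form is the usual choice for RUN; exec form is preferred for CMD and ENTRYPOINT.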

# ---------- Expose Ports ----------
EXPOSE 3000
# Documentation only! Doesn't actually publish the port.
# You still need -p 3000:3000 at runtime.

# ---------- Volumes ----------
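VOLUME ["/data"]
# Declares a mount point. Unless you mount something here at run time, Docker
# creates an anonymous volume; its data outlives the container until the volume is removed.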

# ---------- User ----------
USER node
# Switch to non-root user. Critical for security.

# ---------- Health Check ----------
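HEALTHCHECK --interval=30s --timeout=3s --retries=3 \
  CMD wget -q --spider http://localhost:3000/health || exit 1
# Docker runs this check periodically and marks the container healthy or unhealthy.
# Alpine-based images ship BusyBox wget but not curl; install curl first if you prefer it.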

# ---------- Entry Point ----------
ENTRYPOINT ["node"]
# The main executable. Hard to override (need --entrypoint flag).

# ---------- Default Command ----------
CMD ["server.js"]
# Default arguments to ENTRYPOINT.
# Easy to override: docker run myimage other.js

# Combined: ENTRYPOINT + CMD = node server.js

ENTRYPOINT vs CMD

dockerfile
# CMD only -- easy to override entirely
CMD ["node", "server.js"]
# docker run myimage             → node server.js
# docker run myimage bash        → bash (CMD replaced)

# ENTRYPOINT only -- always runs this executable
ENTRYPOINT ["node"]
# docker run myimage             → node
# docker run myimage server.js   → node server.js

# ENTRYPOINT + CMD -- best of both worlds
ENTRYPOINT ["node"]
CMD ["server.js"]
# docker run myimage             → node server.js
# docker run myimage test.js     → node test.js (CMD replaced)
# docker run --entrypoint bash myimage  → bash (override ENTRYPOINT)
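The combination rules above can be modeled in a few lines of shell. This is a simplified sketch: `effective_command` is a made-up helper, it treats ENTRYPOINT and CMD as plain strings in exec form, and it ignores shell-form behavior and the `--entrypoint` flag.

```shell
#!/bin/sh
# effective_command ENTRYPOINT CMD [RUN_ARGS...]
# Prints the command line a container would start with.
# ENTRYPOINT and CMD are passed as plain strings; "" means "not set".
effective_command() (
  entrypoint="$1"
  cmd="$2"
  shift 2
  # Arguments after the image name in `docker run` replace CMD entirely.
  if [ "$#" -gt 0 ]; then
    cmd="$*"
  fi
  if [ -n "$entrypoint" ] && [ -n "$cmd" ]; then
    printf '%s %s\n' "$entrypoint" "$cmd"
  else
    printf '%s\n' "$entrypoint$cmd"
  fi
)

effective_command "" "node server.js"          # CMD only
effective_command "" "node server.js" bash     # CMD replaced by run arguments
effective_command "node" "server.js"           # ENTRYPOINT + CMD
effective_command "node" "server.js" test.js   # CMD replaced, ENTRYPOINT kept
```

The model makes the asymmetry visible: run arguments always replace CMD, while ENTRYPOINT stays in front unless it is overridden explicitly.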

Real-World Dockerfile Examples

Node.js Application

dockerfile
FROM node:20-alpine AS deps
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci --omit=dev

FROM node:20-alpine AS builder
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci
COPY . .
RUN npm run build

FROM node:20-alpine AS runner
WORKDIR /app
ENV NODE_ENV=production

# Don't run as root
RUN addgroup --system --gid 1001 appgroup && \
    adduser --system --uid 1001 --ingroup appgroup appuser

COPY --from=deps /app/node_modules ./node_modules
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/package.json ./

USER appuser
EXPOSE 3000

HEALTHCHECK --interval=30s --timeout=3s \
  CMD wget --no-verbose --tries=1 --spider http://localhost:3000/health || exit 1

CMD ["node", "dist/server.js"]

Python Application

dockerfile
FROM python:3.12-slim AS builder
WORKDIR /app

# Install dependencies in a virtual env
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

FROM python:3.12-slim
WORKDIR /app

# Copy virtual env from builder
COPY --from=builder /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

COPY . .

RUN useradd --create-home appuser
USER appuser

EXPOSE 8000
CMD ["gunicorn", "app:create_app()", "--bind", "0.0.0.0:8000"]

Go Application

dockerfile
FROM golang:1.22-alpine AS builder
WORKDIR /app

COPY go.mod go.sum ./
RUN go mod download

COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -ldflags="-s -w" -o /server ./cmd/server

# Scratch = empty image (smallest possible)
FROM scratch
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
COPY --from=builder /server /server

EXPOSE 8080
ENTRYPOINT ["/server"]

Java/Spring Boot

dockerfile
FROM eclipse-temurin:21-jdk-alpine AS builder
WORKDIR /app
COPY . .
RUN ./gradlew bootJar --no-daemon

FROM eclipse-temurin:21-jre-alpine
WORKDIR /app

RUN addgroup -S app && adduser -S app -G app
COPY --from=builder /app/build/libs/*.jar app.jar

USER app
EXPOSE 8080

ENTRYPOINT ["java", "-jar", "app.jar"]

Multi-Stage Builds

Multi-stage builds keep final images small by separating build-time dependencies from runtime:

dockerfile
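# Stage 1: Build (has compilers, build tools)
FROM node:20 AS builder
WORKDIR /app
COPY . .
RUN npm ci && npm run build && npm prune --omit=dev
# npm prune drops devDependencies after the build so they never reach the final image.
# This stage might be 1 GB+

# Stage 2: Production (only runtime)
# Same Debian base as the builder, so native modules compiled above still load.
# (Modules built against glibc can fail on Alpine's musl libc.)
FROM node:20-slim
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
CMD ["node", "dist/server.js"]
# This stage is often a fraction of the builder's size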
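One way to measure what a multi-stage build saves is to build the `builder` stage and the final image separately and compare their sizes. The sketch below assumes a local Docker daemon; `image_bytes` and `percent_saved` are hypothetical helper names, and the tags `myapp:builder` and `myapp:v1` are placeholders.

```shell
#!/bin/sh
# image_bytes IMAGE: size of a local image in bytes (needs a Docker daemon).
image_bytes() {
  docker image inspect --format '{{.Size}}' "$1"
}

# percent_saved BUILDER_BYTES FINAL_BYTES: whole-percent size reduction.
percent_saved() {
  echo $(( ($1 - $2) * 100 / $1 ))
}

# Example usage (uncomment on a machine with Docker):
# docker build --target builder -t myapp:builder .
# docker build -t myapp:v1 .
# percent_saved "$(image_bytes myapp:builder)" "$(image_bytes myapp:v1)"
```

The arithmetic is integer-only, so the result is rounded down to a whole percent.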

You can also copy from external images:

dockerfile
COPY --from=nginx:alpine /etc/nginx/nginx.conf /etc/nginx/

.dockerignore

Works like .gitignore, but for the Docker build context: files matching these patterns are never sent to the builder.

# .dockerignore
node_modules
npm-debug.log
.git
.gitignore
.env
.env.*
Dockerfile
docker-compose*.yml
.dockerignore
README.md
.vscode
.idea
coverage
dist
*.md

Why it matters:

Reduces build context size (faster builds)
Prevents copying secrets into images
Avoids invalidating cache unnecessarily
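To get a rough feel for how much an ignore list shrinks the build context, the sketch below tars a directory to stdout and counts the bytes, with and without exclusions. `context_bytes` is a hypothetical helper, and tar's `--exclude` matching is only an approximation of .dockerignore semantics.

```shell
#!/bin/sh
# context_bytes DIR [PATTERN...]
# Approximates build context size by tarring DIR to stdout and counting bytes,
# skipping paths that match the given patterns. tar's --exclude matching is
# only a rough stand-in for .dockerignore rules.
context_bytes() (
  dir="$1"
  shift
  # Rewrite each remaining argument into a tar --exclude option.
  for pattern in "$@"; do
    shift
    set -- "$@" "--exclude=$pattern"
  done
  tar -cf - -C "$dir" "$@" . | wc -c | tr -d ' '
)
```

For example, compare `context_bytes .` with `context_bytes . node_modules .git` in a project directory. A large gap suggests those entries belong in .dockerignore.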

Building Images

bash
# Basic build
docker build -t myapp:v1 .

# Build with specific Dockerfile
docker build -t myapp:v1 -f Dockerfile.prod .

# Build with build arguments
docker build --build-arg NODE_ENV=development -t myapp:dev .

# Build specific stage only
docker build --target builder -t myapp:builder .

# Build without cache
docker build --no-cache -t myapp:v1 .

# Build and push in one command (BuildKit)
docker buildx build --push -t myuser/myapp:v1 .

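# Multi-platform build (pushed to a registry, since the classic local image store cannot hold a multi-platform image)
docker buildx build --platform linux/amd64,linux/arm64 --push -t myuser/myapp:v1 .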

Managing Images

bash
# List local images
docker images
docker image ls

# Pull an image
docker pull nginx:1.25-alpine

# Push to a registry
docker tag myapp:v1 myuser/myapp:v1
docker push myuser/myapp:v1

# Remove an image
docker rmi myapp:v1
docker image rm myapp:v1

# Remove all unused images
docker image prune -a

# Inspect image details
docker inspect nginx:latest

# Save/load images (for air-gapped environments)
docker save myapp:v1 -o myapp.tar
docker load -i myapp.tar

# See image disk usage
docker system df

Layer Caching Strategy

Order Dockerfile instructions from least frequently changed to most frequently changed, because a changed layer invalidates the cache for every layer after it:

dockerfile
# GOOD: Dependencies first, code last
FROM node:20-alpine
WORKDIR /app

# 1. System deps (rarely change)
RUN apk add --no-cache curl

# 2. Package manifests (change occasionally)
COPY package.json package-lock.json ./

# 3. Install deps (cached if manifests unchanged)
RUN npm ci

# 4. App code (changes frequently -- only this rebuilds)
COPY . .

RUN npm run build
CMD ["node", "dist/server.js"]

dockerfile
# BAD: Copying everything first busts cache on any file change
FROM node:20-alpine
WORKDIR /app
# ↓ Any file change invalidates this layer and every layer after it
COPY . .
RUN npm ci
RUN npm run build
CMD ["node", "dist/server.js"]
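The difference between the two orderings can be shown with a toy model of cache keys. The sketch below is not Docker's real algorithm: it assumes a layer's key depends only on its parent's key, the instruction text, and the contents of the files it copies, and it uses `cksum` as a stand-in hash. All function names are invented for the example.

```shell
#!/bin/sh
# layer_key PARENT_KEY INSTRUCTION [FILE...]
# Toy cache key: checksum of the parent key, the instruction text, and the
# contents of any files the instruction copies. Not Docker's real algorithm.
layer_key() (
  parent="$1"
  instruction="$2"
  shift 2
  {
    printf '%s\n%s\n' "$parent" "$instruction"
    if [ "$#" -gt 0 ]; then cat "$@"; fi
  } | cksum | cut -d ' ' -f 1
)

# good_order_keys DIR: keys for "RUN npm ci" and "COPY . ." when the
# manifest is copied before the rest of the source.
good_order_keys() (
  dir="$1"
  base="$(layer_key "" "FROM node:20-alpine")"
  manifest="$(layer_key "$base" "COPY package.json ./" "$dir/package.json")"
  install="$(layer_key "$manifest" "RUN npm ci")"
  code="$(layer_key "$install" "COPY . ." "$dir/package.json" "$dir/server.js")"
  printf '%s %s\n' "$install" "$code"
)

# bad_order_keys DIR: key for "RUN npm ci" when everything is copied first.
bad_order_keys() (
  dir="$1"
  base="$(layer_key "" "FROM node:20-alpine")"
  code="$(layer_key "$base" "COPY . ." "$dir/package.json" "$dir/server.js")"
  layer_key "$code" "RUN npm ci"
)
```

Run each function on a directory containing `package.json` and `server.js`, edit `server.js`, and run them again. In the good ordering the `RUN npm ci` key stays the same and only the `COPY . .` key changes; in the bad ordering the `RUN npm ci` key changes too, which corresponds to a full dependency reinstall.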

FAANG Interview Angle

Common questions:

"How do Docker image layers work?"
"What's the difference between COPY and ADD?"
"Explain multi-stage builds and why they matter"
"How would you optimize a Docker image for production?"
"What's the difference between CMD and ENTRYPOINT?"

Key answers:

Layers are cached and shared; order matters for cache efficiency
Use COPY (not ADD) unless you need tar extraction
Multi-stage builds separate build deps from runtime, reducing image size
Use alpine/slim bases, non-root user, specific tags, .dockerignore
ENTRYPOINT sets the executable; CMD supplies default arguments that docker run arguments replace