InfoDive Labs
Cloud · Docker · Containers

Docker and Containerization: The Complete Production Guide

Master Docker for production with best practices for Dockerfiles, multi-stage builds, image security, container networking, and orchestration decisions.

November 20, 2025 · 7 min read

Containers have become the standard unit of deployment for modern applications. Docker made containerization accessible, but the gap between a working Dockerfile and a production-grade container strategy is substantial. Poorly built images are bloated, insecure, and slow to deploy. Container networking and orchestration decisions made early on determine how easily your infrastructure scales. This guide covers the practices that separate reliable container deployments from fragile ones, from Dockerfile authoring to production orchestration.

Dockerfile Best Practices

A Dockerfile is deceptively simple. A few lines of configuration can produce a working image, but those same lines can also produce an image that is 2 GB, takes minutes to build, and contains known vulnerabilities. Disciplined Dockerfile authoring pays dividends at every stage of the deployment pipeline.

Order instructions by change frequency. Docker caches each layer, and a change to any layer invalidates all subsequent layers. Place instructions that change rarely (installing system packages) before those that change frequently (copying application code).

# Good: dependencies before source code
FROM node:20-slim
 
WORKDIR /app
COPY package.json package-lock.json ./
# install all dependencies; the build step below needs devDependencies
RUN npm ci
 
COPY . .
RUN npm run build
 
CMD ["node", "dist/server.js"]

Use specific base image tags. Never use latest in production Dockerfiles. Pin to a specific version (node:20.11-slim) or, better yet, pin to a digest (node@sha256:abc123...). This prevents builds from silently breaking when a base image is updated.
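For example (the digest shown is a placeholder, not a real value):

```dockerfile
# Pin to an exact version
FROM node:20.11-slim

# Or pin to an immutable digest (placeholder shown; find the real digest with
# "docker buildx imagetools inspect node:20.11-slim")
# FROM node@sha256:<digest>
```

A version pin still floats if the tag is re-pushed; a digest pin is fully immutable.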

Minimize layer count and image size. Combine related RUN commands with && to reduce layers. Remove package manager caches in the same RUN instruction that installs packages. Use slim or distroless base images rather than full operating system images.

RUN apt-get update && \
    apt-get install -y --no-install-recommends curl ca-certificates && \
    rm -rf /var/lib/apt/lists/*

Run as a non-root user. By default, containers run as root, which is a security risk. Create a dedicated user and switch to it before the CMD instruction.

RUN addgroup --system appgroup && adduser --system --ingroup appgroup appuser
USER appuser

Use .dockerignore aggressively. Exclude node_modules, .git, test files, documentation, and any other files not needed in the runtime image. A smaller build context means faster builds and smaller images.
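A typical .dockerignore for a Node.js project, placed next to the Dockerfile, might look like this (entries are illustrative):

```
node_modules
.git
.env
coverage/
test/
*.md
Dockerfile
docker-compose*.yml
```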

Multi-Stage Builds for Minimal Production Images

Multi-stage builds are the single most impactful Dockerfile optimization. They allow you to use a full build environment (compilers, dev dependencies, build tools) in one stage and copy only the compiled output to a minimal runtime image.

# Stage 1: Build
FROM node:20-slim AS builder
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci
COPY . .
RUN npm run build
 
# Stage 2: Production
FROM node:20-slim AS production
WORKDIR /app
RUN addgroup --system app && adduser --system --ingroup app app
 
# Reinstall production dependencies only, rather than copying
# node_modules from the builder (which would include devDependencies)
COPY --from=builder /app/package.json /app/package-lock.json ./
RUN npm ci --omit=dev
COPY --from=builder /app/dist ./dist
 
USER app
EXPOSE 3000
CMD ["node", "dist/server.js"]

For compiled languages, the difference is even more dramatic. A Go application built with a multi-stage Dockerfile can produce a final image under 20 MB using scratch or distroless as the runtime base, compared to hundreds of megabytes with a single-stage build.

FROM golang:1.22 AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -o /server ./cmd/server
 
FROM gcr.io/distroless/static-debian12
COPY --from=builder /server /server
CMD ["/server"]

Multi-stage builds also improve security by reducing the attack surface. The production image contains no compilers, no package managers, and no build tools that an attacker could exploit.

Image Security Scanning

Every container image you deploy is a snapshot of an operating system and application dependencies, each of which may contain known vulnerabilities. Scanning is not optional for production workloads.

Trivy (by Aqua Security) is the most widely adopted open-source scanner. It checks OS packages, application dependencies, and IaC misconfigurations in a single tool.

# Scan a local image
trivy image myapp:latest
 
# Fail CI if critical vulnerabilities are found
trivy image --exit-code 1 --severity CRITICAL myapp:latest

Integrate scanning into CI/CD. Run scans on every image build. Block deployment of images with critical or high-severity vulnerabilities. Most CI platforms (GitHub Actions, GitLab CI) have Trivy integrations available as pre-built actions.
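As one sketch of this, a GitHub Actions job step using the Trivy action might look like the following (action name and inputs per the aquasecurity/trivy-action project; the version pin is illustrative, so verify against the current release):

```yaml
- name: Build image
  run: docker build -t myapp:${{ github.sha }} .

- name: Scan image with Trivy
  uses: aquasecurity/trivy-action@0.24.0
  with:
    image-ref: myapp:${{ github.sha }}
    exit-code: "1"
    severity: CRITICAL,HIGH
```

With exit-code set to 1, the job fails and blocks the pipeline whenever a critical or high-severity vulnerability is found.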

Scan base images separately. When Trivy reports vulnerabilities in the base OS layer, the fix is often to update the base image version. Automate base image updates with tools like Dependabot or Renovate.
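For Dependabot, a minimal configuration that watches Dockerfiles for base image updates looks like this:

```yaml
# .github/dependabot.yml
version: 2
updates:
  - package-ecosystem: "docker"
    directory: "/"
    schedule:
      interval: "weekly"
```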

Use distroless or minimal base images to reduce the number of packages that can have vulnerabilities. Google's distroless images contain only the application runtime and its dependencies, nothing else. Alpine-based images are another good option, though their use of musl libc can cause compatibility issues with some applications.

Sign and verify images with Cosign (part of the Sigstore project) to ensure that only images built by your CI pipeline are deployed. This prevents supply chain attacks where a compromised registry serves a tampered image.
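A keyless signing flow with Cosign looks roughly like this (registry, repository, and identity values are placeholders; check the Sigstore documentation for the flags supported by your Cosign version):

```
# Sign the image in CI, keyless, using the CI provider's OIDC identity
cosign sign myregistry.example.com/myapp:v2.3.1

# Verify before deploy that the image was signed by the expected CI workflow
cosign verify \
  --certificate-identity-regexp 'https://github.com/myorg/myapp/.*' \
  --certificate-oidc-issuer https://token.actions.githubusercontent.com \
  myregistry.example.com/myapp:v2.3.1
```

Running the verify step in your deployment pipeline means a tampered or unsigned image is rejected before it ever reaches production.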

Docker Compose for Development

Docker Compose provides a declarative way to define and run multi-container development environments. It ensures that every developer runs the same set of services with the same configuration.

# docker-compose.yml
services:
  app:
    build:
      context: .
      target: development
    ports:
      - "3000:3000"
    volumes:
      - .:/app
      - /app/node_modules
    environment:
      - DATABASE_URL=postgres://user:pass@db:5432/myapp
      - REDIS_URL=redis://cache:6379
    depends_on:
      db:
        condition: service_healthy
 
  db:
    image: postgres:16
    environment:
      POSTGRES_USER: user
      POSTGRES_PASSWORD: pass
      POSTGRES_DB: myapp
    volumes:
      - pgdata:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U user"]
      interval: 5s
      timeout: 3s
      retries: 5
 
  cache:
    image: redis:7-alpine
 
volumes:
  pgdata:

Key practices for development Compose files: use volume mounts for live code reloading, define health checks so services start in the correct order, separate development and production Compose files using profiles or override files, and store environment-specific configuration in .env files that are excluded from version control.
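Override files are the lightest way to separate environments: docker compose automatically merges docker-compose.override.yml on top of the base file. A sketch, reusing the service names from the example above:

```yaml
# docker-compose.override.yml — development-only additions, merged automatically
services:
  app:
    environment:
      - NODE_ENV=development
    command: npm run dev
```

The base docker-compose.yml stays production-shaped, and developers get live reloading without touching it.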

Container Networking and Production Orchestration

Container networking in development is straightforward: Docker Compose creates a bridge network and services reference each other by name. Production networking is more complex and depends on your orchestration platform.

Amazon ECS (Elastic Container Service) is the pragmatic choice for teams running on AWS who want container orchestration without the operational complexity of Kubernetes. ECS with Fargate provides serverless container execution where you define CPU and memory requirements and AWS handles the underlying infrastructure. Use ECS when you have a moderate number of services (under 50), your team does not have Kubernetes expertise, and you want tight integration with AWS services (ALB, CloudMap, IAM).

Kubernetes provides the most powerful and flexible container orchestration platform. It excels when you need advanced deployment strategies (canary, blue-green), run at significant scale (hundreds of services), need multi-cloud portability, or have workloads that benefit from the Kubernetes ecosystem (service mesh, custom operators, GitOps). The tradeoff is operational complexity: even managed Kubernetes (EKS, GKE, AKS) requires significant expertise to run well.

A decision framework:

Factor                  | ECS/Fargate      | Kubernetes
------------------------|------------------|------------------------
Operational complexity  | Low              | High
Team expertise required | AWS fundamentals | K8s + cloud
Scaling flexibility     | Good             | Excellent
Ecosystem/tooling       | AWS-native       | Massive open-source
Multi-cloud             | No               | Yes
Cost at small scale     | Lower            | Higher (control plane)

For most startups and mid-size teams on AWS, ECS with Fargate is the right starting point. Graduate to Kubernetes when scale, multi-cloud requirements, or workload complexity demands it.

Registry Management

A container registry stores, distributes, and manages your container images. In production, registry management involves more than pushing and pulling images.

Use a private registry rather than Docker Hub for production images. AWS ECR, Google Artifact Registry, and GitHub Container Registry offer private registries with IAM integration, vulnerability scanning, and geographic replication.

Implement an image lifecycle policy to automatically delete old, untagged images. Without a lifecycle policy, registry storage costs grow unbounded as every CI build pushes a new image.
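On AWS ECR, for example, a lifecycle policy is a JSON document attached to the repository. A minimal sketch that expires untagged images two weeks after they are pushed (the rule values are illustrative):

```json
{
  "rules": [
    {
      "rulePriority": 1,
      "description": "Expire untagged images after 14 days",
      "selection": {
        "tagStatus": "untagged",
        "countType": "sinceImagePushed",
        "countUnit": "days",
        "countNumber": 14
      },
      "action": { "type": "expire" }
    }
  ]
}
```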

Tag images meaningfully. Use the git SHA for unique identification and semantic version tags for releases. Never deploy latest in production. A clear tagging strategy enables fast rollbacks: if v2.3.1 has a bug, deploy v2.3.0 with confidence that you are getting exactly the image you expect.

# tag with both the git SHA and the release version, qualified with your registry host
docker build -t registry.example.com/myapp:abc1234 -t registry.example.com/myapp:v2.3.1 .
docker push registry.example.com/myapp:abc1234
docker push registry.example.com/myapp:v2.3.1
