Health Checks and Logging

A running container is not necessarily a healthy container. Your web server might be up but unable to reach the database. Your worker process might be hung on a deadlock while its process stays alive. Docker health checks let you define exactly what "healthy" means for your application. Docker Swarm uses that status to make routing and restart decisions; Kubernetes ignores the Dockerfile HEALTHCHECK and relies on its own liveness and readiness probes, which serve the same purpose. Logging is the companion story: once your containers are running, you need a way to understand what they are doing.

The HEALTHCHECK Instruction

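The HEALTHCHECK instruction in a Dockerfile defines a command Docker runs periodically inside the container to probe whether it is working correctly. Exit status 0 means the check passed; exit status 1 means it failed (exit status 2 is reserved and should not be used). A single failure does not flip the status: the container is marked unhealthy only after --retries consecutive failures, and a passing check returns it to healthy.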

Syntax

HEALTHCHECK [OPTIONS] CMD <command>

# Disable any inherited health check from the base image
HEALTHCHECK NONE

Options

Option	Default	Description
`--interval`	`30s`	How often to run the health check
`--timeout`	`30s`	Time to wait for a check to complete before marking it failed
`--start-period`	`0s`	Grace period for the container to start before checks count as failures
`--start-interval`	`5s`	How often to run the check during the start period
`--retries`	`3`	How many consecutive failures before marking unhealthy
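
To make the interaction between --retries and --start-period concrete, here is a simplified model of the status transitions written in JavaScript. It is an illustration of the documented behaviour, not Docker's implementation; the function name and the probesInStartPeriod parameter (how many probes ran inside the start period) are hypothetical.

```javascript
// Simplified model of health status transitions (illustrative only).
// exitCodes: results of successive probes, in order (0 = pass, non-zero = fail).
// probesInStartPeriod: hypothetical count of probes that ran inside --start-period.
function simulateHealth(exitCodes, { retries = 3, probesInStartPeriod = 0 } = {}) {
  let status = 'starting';
  let failingStreak = 0;
  let started = false; // true after the first pass or once the start period is over
  const history = [];

  exitCodes.forEach((code, index) => {
    if (index >= probesInStartPeriod) started = true;

    if (code === 0) {
      started = true;
      failingStreak = 0;
      status = 'healthy';
    } else if (started) {
      failingStreak += 1;
      if (failingStreak >= retries) status = 'unhealthy';
    }
    // Failures inside the start period, before any pass, are not counted.
    history.push(status);
  });

  return history;
}

console.log(simulateHealth([1, 1, 0, 1, 1, 1], { retries: 3, probesInStartPeriod: 2 }));
// [ 'starting', 'starting', 'healthy', 'healthy', 'healthy', 'unhealthy' ]
```

The two early failures are ignored because they fall inside the start period, and the container only becomes unhealthy after three consecutive failures following the first pass.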

Examples

HTTP service health check:

FROM node:20-alpine

WORKDIR /app
COPY . .
RUN npm ci --omit=dev

EXPOSE 3000

HEALTHCHECK --interval=30s --timeout=10s --start-period=30s --retries=3 \
  CMD wget -qO- http://localhost:3000/health || exit 1

CMD ["node", "src/index.js"]

PostgreSQL health check:

FROM postgres:16

HEALTHCHECK --interval=10s --timeout=5s --start-period=30s --retries=5 \
  CMD pg_isready -U "${POSTGRES_USER:-postgres}" -d "${POSTGRES_DB:-${POSTGRES_USER:-postgres}}" || exit 1

Redis health check:

FROM redis:7-alpine

HEALTHCHECK --interval=5s --timeout=3s --start-period=10s --retries=3 \
  CMD redis-cli ping | grep -q PONG || exit 1

Exposing a /health endpoint in Node.js

Your application should expose a dedicated health endpoint that checks its own dependencies:

// src/health.js
const express = require('express');
const { Pool } = require('pg');

const router = express.Router();
const pool = new Pool({ connectionString: process.env.DATABASE_URL });

router.get('/health', async (req, res) => {
  try {
    await pool.query('SELECT 1');
    res.json({
      status: 'healthy',
      timestamp: new Date().toISOString(),
      uptime: process.uptime(),
    });
  } catch (err) {
    res.status(503).json({
      status: 'unhealthy',
      error: err.message,
    });
  }
});

module.exports = router;

A health endpoint that only checks if the HTTP server is up (but not whether the database is reachable) gives you a false sense of security. Check actual dependencies.
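
When a service has several dependencies, it can help to run each check with a time limit and report them individually. The following is a minimal sketch using only the standard library; runChecks, summarize, withTimeout, and the check names are hypothetical helpers, not part of Express or pg.

```javascript
// Reject if the promise does not settle within `ms`, so one hung
// dependency cannot stall the whole health endpoint.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Pure helper: map per-dependency results to an HTTP status and body.
function summarize(results) {
  const healthy = Object.values(results).every((result) => result.ok);
  return {
    httpStatus: healthy ? 200 : 503,
    body: { status: healthy ? 'healthy' : 'unhealthy', checks: results },
  };
}

// checks: { name: function that throws or rejects on failure }
async function runChecks(checks, timeoutMs = 2000) {
  const names = Object.keys(checks);
  const settled = await Promise.allSettled(
    names.map((name) => withTimeout(Promise.resolve().then(checks[name]), timeoutMs))
  );
  const results = {};
  settled.forEach((outcome, i) => {
    results[names[i]] = outcome.status === 'fulfilled'
      ? { ok: true }
      : { ok: false, error: String(outcome.reason?.message ?? outcome.reason) };
  });
  return summarize(results);
}

module.exports = { runChecks, summarize, withTimeout };
```

Inside the route handler this might be used as const { httpStatus, body } = await runChecks({ db: () => pool.query('SELECT 1') }); followed by res.status(httpStatus).json(body);.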

Inspecting Container Health

# See health status in docker ps
docker ps
# CONTAINER ID   IMAGE     STATUS
# abc123         my-app    Up 2 minutes (healthy)
# def456         my-app    Up 30 seconds (starting)
# ghi789         my-app    Up 5 minutes (unhealthy)

# Detailed health information via inspect
docker inspect my-app | jq '.[0].State.Health'

docker inspect JSON output for health:

{
  "Status": "healthy",
  "FailingStreak": 0,
  "Log": [
    {
      "Start": "2024-11-15T10:00:30.123Z",
      "End": "2024-11-15T10:00:30.145Z",
      "ExitCode": 0,
      "Output": "{\"status\":\"healthy\"}"
    }
  ]
}

The Log array contains the last five health check results. When debugging an unhealthy container, this is the first place to look.
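
If you inspect health from a script rather than by eye, the Log array is easy to reduce to the most recent failure. This sketch assumes the object shape shown in the JSON above; summarizeHealth is a hypothetical helper name.

```javascript
// Reduce a State.Health object (from `docker inspect`) to the essentials.
function summarizeHealth(health) {
  const log = health.Log || [];
  // Search from the newest entry backwards for a failed probe.
  const lastFailure = [...log].reverse().find((entry) => entry.ExitCode !== 0);
  return {
    status: health.Status,
    failingStreak: health.FailingStreak,
    probes: log.length,
    lastFailure: lastFailure
      ? {
          at: lastFailure.End,
          exitCode: lastFailure.ExitCode,
          durationMs: new Date(lastFailure.End) - new Date(lastFailure.Start),
          output: lastFailure.Output.trim(),
        }
      : null,
  };
}

// Sample input with one failed probe (values are made up for illustration).
const sample = {
  Status: 'unhealthy',
  FailingStreak: 3,
  Log: [
    {
      Start: '2024-11-15T10:00:30.123Z',
      End: '2024-11-15T10:00:30.145Z',
      ExitCode: 1,
      Output: 'wget: server returned error: HTTP/1.1 503 Service Unavailable\n',
    },
  ],
};

console.log(summarizeHealth(sample));
```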

# Quick one-liner to get health status
docker inspect --format '{{.State.Health.Status}}' my-app

# Watch health status change (useful during startup)
watch -n 2 'docker inspect --format "{{.State.Health.Status}}" my-app'

Health Checks in Docker Compose

services:
  api:
    build: .
    depends_on:
      db:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "wget", "-qO-", "http://localhost:3000/health"]
      interval: 30s
      timeout: 10s
      start_period: 30s
      retries: 3

  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: secret
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      start_period: 20s
      retries: 5

With condition: service_healthy, Compose waits for db to pass its health check before starting api. This solves the classic race condition where your application tries to connect to a database before it is ready.

Docker Logs

Every container captures the stdout and stderr of its main process. The docker logs command reads those captured streams.

# Print all logs
docker logs my-app

# Stream logs (like tail -f)
docker logs -f my-app

# Last N lines
docker logs --tail 100 my-app

# Logs since a time
docker logs --since 1h my-app
docker logs --since "2024-11-15T10:00:00" my-app

# Logs until a time
docker logs --until 30m my-app

# Include timestamps
docker logs -t my-app

# Combine flags
docker logs -f -t --tail 50 my-app

Logs in Docker Compose

# All services
docker compose logs -f

# A specific service
docker compose logs -f api

# Last 50 lines from all services
docker compose logs --tail 50

Log Drivers

By default, Docker captures container logs using the json-file driver, which stores logs as JSON on disk at /var/lib/docker/containers/<id>/<id>-json.log. This is convenient for docker logs, but it has limitations: no log rotation by default, no forwarding, and disk can fill up with verbose services.
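
As a rough illustration of what the json-file driver stores, each line of that file is typically a JSON object with log, stream, and time keys. The sketch below parses one such line; the sample line is made up, and parseJsonFileLine is a hypothetical helper name.

```javascript
// Parse one line of a json-file log into a friendlier shape.
function parseJsonFileLine(line) {
  const { log, stream, time } = JSON.parse(line);
  return {
    stream,                           // "stdout" or "stderr"
    time,                             // timestamp string as written by Docker
    message: log.replace(/\n$/, ''),  // drop the trailing newline Docker keeps
  };
}

const sampleLine =
  '{"log":"Server started\\n","stream":"stdout","time":"2024-11-15T10:00:00.123456789Z"}';

console.log(parseJsonFileLine(sampleLine));
```

In practice docker logs does this decoding for you; reading the file directly is mainly useful when debugging a log shipper that tails it.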

Available log drivers

Driver	Description
`json-file`	Default. Stores JSON on disk. Supports `docker logs`.
`local`	Compressed binary format. More efficient than `json-file`, supports `docker logs`.
`syslog`	Forwards to syslog daemon on the host
`journald`	Forwards to systemd journal
`gelf`	Sends to a Graylog Extended Log Format endpoint
`fluentd`	Forwards to Fluentd
`awslogs`	Sends to AWS CloudWatch Logs
`gcplogs`	Sends to Google Cloud Logging
`splunk`	Sends to Splunk
`none`	Disables logging (no `docker logs`)

Configuring the log driver globally

Edit /etc/docker/daemon.json:

{
  "log-driver": "local",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}

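This configures log rotation: each log file is capped at 10 MB, and Docker keeps at most 3 files (about 30 MB per container). The new defaults apply only to containers created after the change; existing containers keep the logging configuration they were created with. Restart the daemon after changes: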

sudo systemctl restart docker
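
To budget disk space, multiply max-size by max-file and by the number of containers on the host. The sketch below does that arithmetic; it assumes size suffixes are 1024-based (k, m, g), and both function names are hypothetical.

```javascript
// Parse Docker-style size strings such as "10m" (assumed 1024-based).
function parseSize(text) {
  const match = /^(\d+)([kmg]?)b?$/i.exec(String(text).trim());
  if (!match) throw new Error(`unrecognized size: ${text}`);
  const exponent = { '': 0, k: 1, m: 2, g: 3 }[match[2].toLowerCase()];
  return Number(match[1]) * 1024 ** exponent;
}

// Upper bound on log bytes for `containerCount` containers with these options.
function maxLogBytes(logOpts, containerCount = 1) {
  const files = Number(logOpts['max-file'] ?? 1);
  return parseSize(logOpts['max-size']) * files * containerCount;
}

const opts = { 'max-size': '10m', 'max-file': '3' };
console.log(maxLogBytes(opts) / 1024 ** 2);     // 30 (MiB for one container)
console.log(maxLogBytes(opts, 20) / 1024 ** 2); // 600 (MiB for twenty containers)
```

Treat the result as an upper bound: drivers that compress rotated files may use less.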

Configuring the log driver per container

docker run -d \
  --log-driver json-file \
  --log-opt max-size=10m \
  --log-opt max-file=3 \
  --name my-app \
  my-app:latest

Sending logs to AWS CloudWatch

docker run -d \
  --log-driver awslogs \
  --log-opt awslogs-region=eu-west-1 \
  --log-opt awslogs-group=/my-app/production \
  --log-opt awslogs-stream=api-$(date +%Y%m%d) \
  --name my-app \
  my-app:latest

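Note: on Docker Engine versions before 20.10, docker logs works only with drivers that keep logs on the host (json-file, local, journald); with a remote driver such as awslogs it returns an error, and you must use your log aggregation system (CloudWatch console, Graylog, etc.) to view logs. From 20.10 onward, the daemon keeps a local cache of recent output (dual logging, enabled by default), so docker logs keeps working with remote drivers, although the aggregation system remains the complete record.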

Structured Logging Best Practices

For logs to be useful at scale, they must be structured (machine-readable) and consistent.

Use JSON log format in your application

// Using pino (a popular Node.js structured logger)
const pino = require('pino');
const logger = pino({
  level: process.env.LOG_LEVEL || 'info',
  // In production, output plain JSON
  // In development, use pino-pretty for human-readable output
  transport: process.env.NODE_ENV === 'development'
    ? { target: 'pino-pretty' }
    : undefined,
});

// Log with structured fields
logger.info({ requestId: '123', userId: 'u-456', duration: 42 }, 'Request completed');
// `err` is an Error caught elsewhere, for example in a catch block
logger.error({ err, requestId: '123' }, 'Database query failed');

JSON output (what Docker captures; pino also adds pid and hostname fields by default, omitted here for brevity):

{"level":30,"time":1700042400000,"requestId":"123","userId":"u-456","duration":42,"msg":"Request completed"}
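
Raw JSON lines are meant for machines. When you need to read them directly, a few lines of JavaScript can turn a line like the one above into something scannable. This sketch assumes pino's default numeric levels (10 trace through 60 fatal); formatLine is a hypothetical helper, not part of pino.

```javascript
const LEVEL_NAMES = { 10: 'TRACE', 20: 'DEBUG', 30: 'INFO', 40: 'WARN', 50: 'ERROR', 60: 'FATAL' };

// Turn one pino-style JSON line into "<time> <LEVEL> <msg> key=value ...".
function formatLine(line) {
  // pid and hostname are pulled out so they do not clutter the extras.
  const { level, time, msg, pid, hostname, ...fields } = JSON.parse(line);
  const extras = Object.entries(fields)
    .map(([key, value]) => `${key}=${JSON.stringify(value)}`)
    .join(' ');
  const head = `${new Date(time).toISOString()} ${LEVEL_NAMES[level] || level} ${msg}`;
  return extras ? `${head} ${extras}` : head;
}

console.log(formatLine(
  '{"level":30,"time":1700042400000,"requestId":"123","userId":"u-456","duration":42,"msg":"Request completed"}'
));
// 2023-11-15T10:00:00.000Z INFO Request completed requestId="123" userId="u-456" duration=42
```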

Log to stdout/stderr, not files

Containers are ephemeral. If your application writes logs to a file inside the container, those logs are lost when the container is removed. Always log to stdout (info/debug) and stderr (errors), and let Docker's log driver handle collection and forwarding.

// Good - logs go to stdout where Docker captures them
console.log(JSON.stringify({ level: 'info', message: 'Server started' }));

// Bad - logs go to a file in the ephemeral container filesystem
fs.appendFileSync('/var/log/app.log', 'Server started\n');

Recommended fields for every log entry

Field	Description
`timestamp`	ISO 8601 timestamp
`level`	`debug`, `info`, `warn`, `error`, `fatal`
`message`	Human-readable description
`requestId`	Unique identifier to trace a request across logs
`service`	Service name (useful in aggregated logs)
`environment`	`production`, `staging`, etc.
`err.message`	Error message if applicable
`err.stack`	Stack trace if applicable
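
As a sketch of how these fields might be assembled without a logging library, the following builds each entry as a plain object and writes it as one JSON line. SERVICE_NAME is a hypothetical environment variable, and buildEntry and log are illustrative names; in practice a library such as pino covers the same ground.

```javascript
// Fields attached to every entry. SERVICE_NAME is a hypothetical variable.
const base = {
  service: process.env.SERVICE_NAME || 'my-app',
  environment: process.env.NODE_ENV || 'development',
};

// Build one log entry; `now` is injectable to keep the function testable.
function buildEntry(level, message, fields = {}, now = new Date()) {
  const { err, ...rest } = fields;
  const entry = { timestamp: now.toISOString(), level, message, ...base, ...rest };
  if (err) entry.err = { message: err.message, stack: err.stack };
  return entry;
}

// Write the entry as a single JSON line: errors to stderr, the rest to stdout.
function log(level, message, fields) {
  const line = JSON.stringify(buildEntry(level, message, fields)) + '\n';
  const stream = level === 'error' || level === 'fatal' ? process.stderr : process.stdout;
  stream.write(line);
}

log('info', 'Request completed', { requestId: '123', duration: 42 });
log('error', 'Database query failed', { requestId: '123', err: new Error('connection refused') });
```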

Centralised Log Aggregation

For production environments with multiple containers, centralised logging is essential. Common stacks:

EFK Stack (Elasticsearch + Fluentd + Kibana):

services:
  app:
    image: my-app:latest
    logging:
      driver: fluentd
      options:
        fluentd-address: localhost:24224
        tag: my-app.{{.Name}}

Loki + Promtail + Grafana (lightweight alternative):

services:
  promtail:
    image: grafana/promtail:latest
    volumes:
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
      - /var/run/docker.sock:/var/run/docker.sock
      - ./promtail-config.yml:/etc/promtail/config.yml
    command: -config.file=/etc/promtail/config.yml

Loki with Promtail is particularly popular for Docker environments because Promtail can scrape Docker container logs directly via the Docker socket.

Summary

Goal	Command / Technique
Add a health check	`HEALTHCHECK CMD ...` in Dockerfile
See container health status	`docker ps` or `docker inspect --format '{{.State.Health.Status}}'`
Stream logs	`docker logs -f <container>`
Rotate logs automatically	Set `max-size` and `max-file` in daemon.json or per container
Ship logs to CloudWatch	`--log-driver awslogs`
Structured application logs	Log JSON to stdout using a library like pino or winston
Health-aware startup ordering	`depends_on: condition: service_healthy` in Compose

Observability is not optional in production. Health checks and structured logs are the minimum you need to know whether your application is running correctly and to diagnose problems when it is not.
