Skip to main content

Health Checks and Logging

A running container is not necessarily a healthy container. Your web server might be up but unable to reach the database. Your worker process might be stuck in a crash loop. Docker health checks let you define exactly what "healthy" means for your application, and orchestrators like Docker Swarm and Kubernetes use that information to make routing and restart decisions. Logging is the companion story: once your containers are running, you need a way to understand what they are doing.


The HEALTHCHECK Instruction

The HEALTHCHECK instruction in a Dockerfile defines a command Docker runs periodically to probe whether the container is working correctly. If the command exits with status 0, the container is healthy. If it exits with 1, the container is unhealthy.

Syntax

HEALTHCHECK [OPTIONS] CMD <command>

# Disable any inherited health check from the base image
HEALTHCHECK NONE

Options

OptionDefaultDescription
--interval30sHow often to run the health check
--timeout30sTime to wait for a check to complete before marking it failed
--start-period0sGrace period for the container to start before checks count as failures
--start-interval5sHow often to run the check during the start period
--retries3How many consecutive failures before marking unhealthy

Examples

HTTP service health check:

FROM node:20-alpine

WORKDIR /app
COPY . .
RUN npm ci --omit=dev

EXPOSE 3000

HEALTHCHECK --interval=30s --timeout=10s --start-period=30s --retries=3 \
CMD wget -qO- http://localhost:3000/health || exit 1

CMD ["node", "src/index.js"]

PostgreSQL health check:

FROM postgres:16

HEALTHCHECK --interval=10s --timeout=5s --start-period=30s --retries=5 \
CMD pg_isready -U "$POSTGRES_USER" -d "$POSTGRES_DB" || exit 1

Redis health check:

FROM redis:7-alpine

HEALTHCHECK --interval=5s --timeout=3s --start-period=10s --retries=3 \
CMD redis-cli ping | grep -q PONG || exit 1

Exposing a /health endpoint in Node.js

Your application should expose a dedicated health endpoint that checks its own dependencies:

// src/health.js
const express = require('express');
const { Pool } = require('pg');

const router = express.Router();
const pool = new Pool({ connectionString: process.env.DATABASE_URL });

router.get('/health', async (req, res) => {
try {
await pool.query('SELECT 1');
res.json({
status: 'healthy',
timestamp: new Date().toISOString(),
uptime: process.uptime(),
});
} catch (err) {
res.status(503).json({
status: 'unhealthy',
error: err.message,
});
}
});

module.exports = router;

A health endpoint that only checks if the HTTP server is up (but not whether the database is reachable) gives you a false sense of security. Check actual dependencies.


Inspecting Container Health

# See health status in docker ps
docker ps
# CONTAINER ID IMAGE STATUS
# abc123 my-app Up 2 minutes (healthy)
# def456 my-app Up 30 seconds (starting)
# ghi789 my-app Up 5 minutes (unhealthy)

# Detailed health information via inspect
docker inspect my-app | jq '.[0].State.Health'

docker inspect JSON output for health:

{
"Status": "healthy",
"FailingStreak": 0,
"Log": [
{
"Start": "2024-11-15T10:00:30.123Z",
"End": "2024-11-15T10:00:30.145Z",
"ExitCode": 0,
"Output": "{\"status\":\"healthy\"}"
}
]
}

The Log array contains the last five health check results. When debugging an unhealthy container, this is the first place to look.

# Quick one-liner to get health status
docker inspect --format '{{.State.Health.Status}}' my-app

# Watch health status change (useful during startup)
watch -n 2 'docker inspect --format "{{.State.Health.Status}}" my-app'

Health Checks in Docker Compose

services:
api:
build: .
depends_on:
db:
condition: service_healthy
healthcheck:
test: ["CMD", "wget", "-qO-", "http://localhost:3000/health"]
interval: 30s
timeout: 10s
start_period: 30s
retries: 3

db:
image: postgres:16
environment:
POSTGRES_PASSWORD: secret
healthcheck:
test: ["CMD-SHELL", "pg_isready -U postgres"]
interval: 10s
timeout: 5s
start_period: 20s
retries: 5

With condition: service_healthy, Compose waits for db to pass its health check before starting api. This solves the classic race condition where your application tries to connect to a database before it is ready.


Docker Logs

Every container captures the stdout and stderr of its main process. The docker logs command reads those captured streams.

# Print all logs
docker logs my-app

# Stream logs (like tail -f)
docker logs -f my-app

# Last N lines
docker logs --tail 100 my-app

# Logs since a time
docker logs --since 1h my-app
docker logs --since "2024-11-15T10:00:00" my-app

# Logs until a time
docker logs --until 30m my-app

# Include timestamps
docker logs -t my-app

# Combine flags
docker logs -f -t --tail 50 my-app

Logs in Docker Compose

# All services
docker compose logs -f

# A specific service
docker compose logs -f api

# Last 50 lines from all services
docker compose logs --tail 50

Log Drivers

By default, Docker captures container logs using the json-file driver, which stores logs as JSON on disk at /var/lib/docker/containers/<id>/<id>-json.log. This is convenient for docker logs, but it has limitations: no log rotation by default, no forwarding, and disk can fill up with verbose services.

Available log drivers

DriverDescription
json-fileDefault. Stores JSON on disk. Supports docker logs.
localCompressed binary format. More efficient than json-file, supports docker logs.
syslogForwards to syslog daemon on the host
journaldForwards to systemd journal
gelfSends to a Graylog Extended Log Format endpoint
fluentdForwards to Fluentd
awslogsSends to AWS CloudWatch Logs
gcplogsSends to Google Cloud Logging
splunkSends to Splunk
noneDisables logging (no docker logs)

Configuring the log driver globally

Edit /etc/docker/daemon.json:

{
"log-driver": "local",
"log-opts": {
"max-size": "10m",
"max-file": "3"
}
}

This configures log rotation: each log file caps at 10 MB, and Docker keeps the last 3 files (30 MB max per container). Restart the daemon after changes:

sudo systemctl restart docker

Configuring the log driver per container

docker run -d \
--log-driver json-file \
--log-opt max-size=10m \
--log-opt max-file=3 \
--name my-app \
my-app:latest

Sending logs to AWS CloudWatch

docker run -d \
--log-driver awslogs \
--log-opt awslogs-region=eu-west-1 \
--log-opt awslogs-group=/my-app/production \
--log-opt awslogs-stream=api-$(date +%Y%m%d) \
--name my-app \
my-app:latest

Note: when you use a non-json-file or non-local driver, docker logs stops working because logs are no longer stored locally. This is a common gotcha — you must use your log aggregation system (CloudWatch console, Graylog, etc.) to view them.


Structured Logging Best Practices

For logs to be useful at scale, they must be structured (machine-readable) and consistent.

Use JSON log format in your application

// Using pino (a popular Node.js structured logger)
const pino = require('pino');
const logger = pino({
level: process.env.LOG_LEVEL || 'info',
// In production, output plain JSON
// In development, use pino-pretty for human-readable output
transport: process.env.NODE_ENV === 'development'
? { target: 'pino-pretty' }
: undefined,
});

// Log with structured fields
logger.info({ requestId: '123', userId: 'u-456', duration: 42 }, 'Request completed');
logger.error({ err, requestId: '123' }, 'Database query failed');

JSON output (what Docker captures):

{"level":30,"time":1700042400000,"requestId":"123","userId":"u-456","duration":42,"msg":"Request completed"}

Log to stdout/stderr, not files

Containers are ephemeral. If your application writes logs to a file inside the container, those logs are lost when the container is removed. Always log to stdout (info/debug) and stderr (errors), and let Docker's log driver handle collection and forwarding.

// Good — logs go to stdout where Docker captures them
console.log(JSON.stringify({ level: 'info', message: 'Server started' }));

// Bad — logs go to a file in the ephemeral container filesystem
fs.appendFileSync('/var/log/app.log', 'Server started\n');

Recommended fields for every log entry

FieldDescription
timestampISO 8601 timestamp
leveldebug, info, warn, error, fatal
messageHuman-readable description
requestIdUnique identifier to trace a request across logs
serviceService name (useful in aggregated logs)
environmentproduction, staging, etc.
err.messageError message if applicable
err.stackStack trace if applicable

Centralised Log Aggregation

For production environments with multiple containers, centralised logging is essential. Common stacks:

EFK Stack (Elasticsearch + Fluentd + Kibana):

services:
app:
image: my-app:latest
logging:
driver: fluentd
options:
fluentd-address: localhost:24224
tag: my-app.{{.Name}}

Loki + Promtail + Grafana (lightweight alternative):

services:
promtail:
image: grafana/promtail:latest
volumes:
- /var/lib/docker/containers:/var/lib/docker/containers:ro
- /var/run/docker.sock:/var/run/docker.sock
- ./promtail-config.yml:/etc/promtail/config.yml
command: -config.file=/etc/promtail/config.yml

Loki with Promtail is particularly popular for Docker environments because Promtail can scrape Docker container logs directly via the Docker socket.


Summary

GoalCommand / Technique
Add a health checkHEALTHCHECK CMD ... in Dockerfile
See container health statusdocker ps or docker inspect --format '{{.State.Health.Status}}'
Stream logsdocker logs -f <container>
Rotate logs automaticallySet max-size and max-file in daemon.json or per container
Ship logs to CloudWatch--log-driver awslogs
Structured application logsLog JSON to stdout using a library like pino or winston
Health-aware startup orderingdepends_on: condition: service_healthy in Compose

Observability is not optional in production. Health checks and structured logs are the minimum you need to know whether your application is running correctly and to diagnose problems when it is not.