Deploy Open-Source AI Models with Docker
Complete guide to containerizing and deploying AI models with Docker
Docker provides a consistent, portable way to deploy AI models across any environment.
Prerequisites
- Docker installed (20.10+)
- NVIDIA Container Toolkit (for GPU support)
- Basic Docker knowledge
- Sufficient disk space (50GB+ recommended)
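A quick sanity check before you begin (the disk path assumes a default Linux Docker install):
# Verify the Docker version and that the daemon is reachable
docker --version
docker info --format '{{.ServerVersion}}'
# Check free space where Docker stores images and volumes
df -h /var/lib/docker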
Basic Deployment
Step 1: Pull Pre-built Image
# vLLM (recommended for LLMs)
docker pull vllm/vllm-openai:latest
# Text Generation Inference
docker pull ghcr.io/huggingface/text-generation-inference:latest
# Ollama
docker pull ollama/ollama:latest
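Unlike vLLM and TGI, Ollama pulls models after the container starts; the commands below follow the image's documented usage (the container name ollama is arbitrary):
# Start the Ollama server with a named volume for model storage
docker run -d --gpus all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
# Pull and chat with a model inside the running container
docker exec -it ollama ollama run llama3.1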
Step 2: Run Container
# Run Llama with vLLM (gated model: pass a Hugging Face token; a 70B model
# needs several GPUs, so match --tensor-parallel-size to your GPU count)
docker run --gpus all -p 8000:8000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -e HF_TOKEN=your-huggingface-token \
  vllm/vllm-openai:latest \
  --model meta-llama/Llama-3.1-70B-Instruct \
  --tensor-parallel-size 4
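Once the server is up, you can smoke-test vLLM's OpenAI-compatible API:
# List the served model
curl http://localhost:8000/v1/models
# Send a test chat completion
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "meta-llama/Llama-3.1-70B-Instruct", "messages": [{"role": "user", "content": "Hello"}]}'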
Custom Dockerfile
For LLM Deployment
FROM nvidia/cuda:12.1.0-runtime-ubuntu22.04
# Install Python and dependencies
RUN apt-get update && apt-get install -y python3.10 python3-pip git
# Install vLLM
RUN pip3 install vllm
# Download model at build time (optional, can be done at runtime instead;
# gated models like Llama need a Hugging Face token available during the build)
RUN python3 -c "from huggingface_hub import snapshot_download; snapshot_download('meta-llama/Llama-3.1-8B-Instruct')"
# Expose port
EXPOSE 8000
# Run vLLM server
CMD ["python3", "-m", "vllm.entrypoints.openai.api_server", "--model", "meta-llama/Llama-3.1-8B-Instruct"]
Build and Run
# Build image
docker build -t my-llama-model .
# Run container
docker run --gpus all -p 8000:8000 my-llama-model
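If you skip the build-time download, mount the cache and pass a token at runtime instead (the HF_TOKEN value is a placeholder):
# Run with a runtime model download instead of baking weights into the image
docker run --gpus all -p 8000:8000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -e HF_TOKEN=your-huggingface-token \
  my-llama-model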
Docker Compose
Multi-Service Setup
version: '3.8'
services:
  llama-model:
    image: vllm/vllm-openai:latest
    # A 70B model needs multiple GPUs; keep --tensor-parallel-size in sync with the device count below
    command: --model meta-llama/Llama-3.1-70B-Instruct --tensor-parallel-size 4
    ports:
      - "8000:8000"
    volumes:
      - model-cache:/root/.cache/huggingface
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 4
              capabilities: [gpu]
    environment:
      - HF_TOKEN=your-huggingface-token
  nginx:
    image: nginx:latest
    ports:
      - "80:80"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf
    depends_on:
      - llama-model
volumes:
  model-cache:
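The compose file mounts a local ./nginx.conf that is not shown above; a minimal reverse-proxy config for it might look like this (written with a heredoc for convenience):
cat > nginx.conf <<'EOF'
events {}
http {
  server {
    listen 80;
    location / {
      # Forward all traffic to the vLLM service by its compose service name
      proxy_pass http://llama-model:8000;
    }
  }
}
EOF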
Run with Compose
docker compose up -d   # legacy v1 CLI: docker-compose up -d
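To confirm both services came up and watch the model server load:
# Check service status
docker compose ps
# Follow the model server's startup logs
docker compose logs -f llama-model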
Optimization
Multi-Stage Builds
# Build stage
FROM python:3.10 AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --user -r requirements.txt
# Runtime stage
FROM python:3.10-slim
WORKDIR /app
COPY --from=builder /root/.local /root/.local
COPY . .
ENV PATH=/root/.local/bin:$PATH
CMD ["python", "app.py"]
Layer Caching
- Order Dockerfile commands from least to most frequently changing
- Use .dockerignore to exclude unnecessary files (example after this list)
- Leverage build cache with --cache-from
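A starting-point .dockerignore for a typical model-serving project (entries are illustrative; adjust to your repo):
cat > .dockerignore <<'EOF'
.git
__pycache__/
*.pyc
.venv/
tests/
# Keep large local weights out of the build context
*.safetensors
*.bin
EOF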
Resource Limits
# --shm-size matters for inference servers that use shared memory (e.g. NCCL)
docker run --gpus all --memory="16g" --cpus="4" --shm-size="8g" \
  -p 8000:8000 vllm/vllm-openai:latest
Monitoring
Health Checks
# curl must be present in the image for this check to work
HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
  CMD curl -f http://localhost:8000/health || exit 1
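You can then read the health state Docker records for the container:
# Show the container's current health status
docker inspect --format '{{.State.Health.Status}}' container-name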
Logging
# View logs
docker logs -f container-name
# Configure logging driver
docker run --log-driver=json-file --log-opt max-size=10m --log-opt max-file=3 ...
Resource Monitoring
# Monitor container stats
docker stats
# Inspect container
docker inspect container-name
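For GPU usage, query nvidia-smi inside the container (this assumes the NVIDIA runtime is active, which injects the binary):
# One-shot CPU/memory snapshot for all containers
docker stats --no-stream
# GPU utilization and memory as seen by the container
docker exec container-name nvidia-smi --query-gpu=utilization.gpu,memory.used,memory.total --format=csv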
Security
Best Practices
- Use official base images
- Run as non-root user
RUN useradd -m -u 1000 appuser
USER appuser
- Scan for vulnerabilities (docker scan is deprecated; Trivy is a common alternative)
docker scout cves my-image:latest
- Use secrets management (docker secret requires Swarm mode)
docker secret create hf_token token.txt
docker service create --secret hf_token my-service
Troubleshooting
GPU Not Detected
# Check NVIDIA Docker runtime
docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi
# Reinstall the NVIDIA Container Toolkit if needed (current apt repo setup;
# the old nvidia-docker apt-key method is deprecated)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
Out of Memory
- Increase --shm-size
- Reduce batch size
- Use model quantization (see the vLLM flags after this list)
- Add swap space
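For vLLM, several of these mitigations are runtime flags; the values below are illustrative starting points, not tuned recommendations:
# Limit vLLM's share of GPU memory and cap the context length; quantization
# additionally requires a quantized checkpoint (e.g. an AWQ or GPTQ model repo)
docker run --gpus all --shm-size="16g" -p 8000:8000 vllm/vllm-openai:latest \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --gpu-memory-utilization 0.85 \
  --max-model-len 8192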
Slow Performance
- Use volume mounts for model cache
- Enable GPU support
- Optimize batch size
- Use faster storage (SSD)
Production Checklist
- [ ] Use multi-stage builds
- [ ] Implement health checks
- [ ] Configure resource limits
- [ ] Set up logging
- [ ] Scan for vulnerabilities
- [ ] Use secrets management
- [ ] Implement restart policies (see the example after this checklist)
- [ ] Configure networking
- [ ] Set up monitoring
- [ ] Document deployment process
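Restart policies and a dedicated network from the checklist are one-liners; the network name ai-models is a placeholder:
# Create an isolated bridge network for the model stack
docker network create ai-models
# Restart automatically unless explicitly stopped
docker run -d --restart unless-stopped --network ai-models \
  --gpus all -p 8000:8000 vllm/vllm-openai:latest \
  --model meta-llama/Llama-3.1-8B-Instruct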
Related Guides
Deploy AI Models on AWS
Complete guide to deploying open-source AI models on Amazon Web Services
Deploy AI Models on Google Cloud Platform
Complete guide to deploying open-source AI models on GCP
Deploy AI Models on Microsoft Azure
Complete guide to deploying open-source AI models on Azure