📦 Dockerizing Local LLMs with Ollama – Challenges on macOS (Part 4)

Sat Jan 17 2026

Containerizing FastAPI and Ollama: Real-World Challenges, Especially on macOS


🧠 Series Context

This is Part 4 of our AI engineering series where we are building a local LLaMA-powered AI Wish Generator.

So far we have covered:

  • Part 1: Architecture & fundamentals
  • Part 2: Backend & prompts
  • Part 3: Frontend & UX

In this post we move into one of the most misunderstood areas of AI development:

Containerizing local LLMs and preparing them for cloud deployment.


📘 Series Roadmap


| Part   | Topic                                                |
| ------ | ---------------------------------------------------- |
| Part 1 | Architecture & fundamentals                          |
| Part 2 | Backend & prompts                                    |
| Part 3 | Frontend & UX                                        |
| Part 4 | Dockerization & cloud shipping                       |
| Part 5 | AI Engineering Lessons from Building a Local LLM App |


🎯 Objective of Part 4

By the end of this post you will understand:

  • Why Docker is essential for AI apps
  • How to containerize FastAPI cleanly
  • How to containerize Ollama + LLaMA
  • Why macOS causes unique problems
  • How to fix networking, IPv6 and timeout issues
  • How the same setup works flawlessly on Linux
  • How this architecture can be shipped to the cloud

🧠 Why Containerization Matters in AI

AI systems are not single applications.

They consist of:

  • Backend services
  • LLM runtimes
  • Model weights (GBs)
  • Networking layers
  • GPU / CPU dependencies

Docker provides:

  • Environment consistency
  • Easy deployment
  • Version control for infrastructure
  • Predictable runtime behavior

Without containers, AI apps tend to break whenever the host environment changes.


🧱 Container Architecture

┌──────────────────────────┐
│     Next.js Frontend     │
│        (optional)        │
└────────────▲─────────────┘
             │
┌────────────┴─────────────┐
│      FastAPI Backend     │
│        (Docker)          │
└────────────▲─────────────┘
             │ HTTP
┌────────────┴─────────────┐
│        Ollama LLM        │
│     (Docker container)   │
└────────────▲─────────────┘
             │
┌────────────┴─────────────┐
│      LLaMA Model Files   │
│        (volume)          │
└──────────────────────────┘

Each component runs independently, so it can be restarted or replaced without touching the others.


🐳 Dockerizing the FastAPI Backend

Dockerfile

FROM python:3.11-slim

WORKDIR /app

# curl is useful for debugging and health checks from inside the container
RUN apt-get update \
    && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*

COPY requirements.txt .

RUN pip install --no-cache-dir -r requirements.txt

COPY app ./app

EXPOSE 8000

CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]

Why this works well

  • Lightweight image
  • Predictable Python version
  • No dependency on the host's Python installation
  • Clean startup

🦙 Dockerizing Ollama

Ollama provides an official image:

image: ollama/ollama:latest

This container includes:

  • Ollama runtime
  • Model manager
  • REST API server

Model weights are stored in a Docker volume, so they survive container restarts and image upgrades.


📄 Docker Compose Setup

version: "3.9"

services:

  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    environment:
      - OLLAMA_HOST=0.0.0.0:11434

  api:
    build: .
    container_name: wish-api
    ports:
      - "8000:8000"
    depends_on:
      - ollama

volumes:
  ollama_data:

This setup works perfectly on Linux servers.
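
One caveat: in this short form, `depends_on` only controls start order. It does not wait until Ollama is ready to accept requests. A minimal sketch of a readiness check the backend could run at startup is shown here. The helper names `ollama_is_up` and `wait_until` are hypothetical, and using `/api/tags` as the health signal is an assumption rather than something this series prescribes.

```python
import time
import urllib.error
import urllib.request


def ollama_is_up(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if GET {base_url}/api/tags answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or HTTP error: not ready yet.
        return False


def wait_until(check, attempts: int = 10, delay: float = 1.0, sleep=time.sleep) -> bool:
    """Call check() up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

Inside the Compose network the backend would call something like `wait_until(lambda: ollama_is_up("http://ollama:11434"))`, where `ollama` is the service name from the file above.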


⚠️ The macOS Problem (Important)

If you are developing on macOS, you are likely to run into networking issues.

Common errors

  • Connection timeout to Ollama
  • curl http://ollama:11434 fails inside container
  • Ollama listening on [::]:11434
  • IPv6 binding only
  • Docker DNS resolution failures

Example log:

Listening on [::]:11434

This single line causes hours of debugging.


❌ Why Ollama Breaks on macOS Docker

macOS uses:

  • Docker Desktop
  • Linux VM under the hood

In our setup, the problems appeared to come from this combination:

  1. Ollama reports binding to the IPv6 wildcard address (::)
  2. Docker's default bridge network uses IPv4
  3. IPv6 → IPv4 routing can fail silently

Result:

Containers cannot talk to Ollama even though it is running.
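
To check whether the failure really depends on the address family, it can help to probe the port over IPv4 and IPv6 separately from inside the backend container. This is a minimal diagnostic sketch using only the standard library. `can_connect` and `probe` are hypothetical helper names, and the result only tells you which families are reachable, not why.

```python
import socket


def can_connect(host: str, port: int, family: int, timeout: float = 2.0) -> bool:
    """Try a TCP connection to host:port using only the given address family."""
    try:
        infos = socket.getaddrinfo(host, port, family=family, type=socket.SOCK_STREAM)
    except OSError:
        # The host has no address of this family (socket.gaierror is an OSError).
        return False
    for fam, socktype, proto, _canonname, sockaddr in infos:
        try:
            with socket.socket(fam, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return True
        except OSError:
            continue  # try the next resolved address, if any
    return False


def probe(host: str, port: int) -> dict:
    """Report reachability of host:port per address family."""
    return {
        "ipv4": can_connect(host, port, socket.AF_INET),
        "ipv6": can_connect(host, port, socket.AF_INET6),
    }
```

For example, `probe("ollama", 11434)` run inside the `wish-api` container shows whether the Ollama service name resolves and accepts connections over each family.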


✅ Workaround That Actually Works

Instead of relying on container-to-container communication, run Ollama directly on the host:

ollama serve

Then let Docker containers access it via:

host.docker.internal

Backend configuration

OLLAMA_URL = "http://host.docker.internal:11434/api/generate"

This bypasses Docker's container-to-container networking; the backend reaches Ollama through Docker Desktop's host gateway instead.
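
Hard-coding the URL ties the backend to macOS. One option is to read it from the environment, so the same image can point at `host.docker.internal` during development and at the `ollama` service on Linux. The following is a sketch under assumptions: the `OLLAMA_URL` environment variable and the helper names are illustrative, and it uses the standard library rather than whichever HTTP client the backend actually uses.

```python
import json
import os
import urllib.request

# Default matches the macOS workaround; on Linux with Compose you could set
# OLLAMA_URL=http://ollama:11434/api/generate (assumed variable name).
DEFAULT_OLLAMA_URL = "http://host.docker.internal:11434/api/generate"


def ollama_url() -> str:
    """Resolve the Ollama endpoint at call time so env changes take effect."""
    return os.environ.get("OLLAMA_URL", DEFAULT_OLLAMA_URL)


def build_generate_request(prompt: str, model: str = "llama3") -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        ollama_url(),
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt and return the generated text from the "response" field."""
    with urllib.request.urlopen(build_generate_request(prompt), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

The generous timeout is deliberate: the first request after startup often has to load the model into memory, which can take far longer than a typical HTTP call.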


🧠 Why This Is Not a Hack

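This pattern is documented or widely used across:

  • Docker Desktop
  • LangChain
  • LlamaIndex
  • Ollama community

It is the recommended approach on macOS. Running Ollama natively also lets it use the Mac's GPU, which containers under Docker Desktop cannot access.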


🐧 Why Linux Has No Issues

On Linux:

  • Docker runs natively
  • No VM layer
  • Proper IPv4 networking
  • GPU access supported (NVIDIA GPUs, via the NVIDIA Container Toolkit)

Containerized Ollama works perfectly.

This is why cloud deployments are stable.


☁️ Shipping to Cloud

Once containerized, deployment becomes straightforward.

Any Linux host with Docker works, for example:

  • AWS EC2
  • Azure VM
  • GCP Compute Engine
  • DigitalOcean
  • Hetzner

Steps:

docker compose up -d
docker exec ollama ollama pull llama3

That's it.
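
Before sending traffic, it can be worth confirming that the model is actually listed by the server. The sketch below assumes the response shape of Ollama's `/api/tags` endpoint: a `models` list whose entries carry a `name` such as `llama3:latest`. `has_model` and `fetch_tags` are hypothetical helpers, and the base URL is an assumption that matches the published port in the Compose file.

```python
import json
import urllib.request


def model_names(tags_payload: dict) -> list:
    """Extract model names from a /api/tags style payload."""
    return [m.get("name", "") for m in tags_payload.get("models", [])]


def has_model(tags_payload: dict, wanted: str) -> bool:
    """True if `wanted` matches a listed model, with or without a ':tag' suffix."""
    for name in model_names(tags_payload):
        if name == wanted or name.split(":", 1)[0] == wanted:
            return True
    return False


def fetch_tags(base_url: str = "http://localhost:11434", timeout: float = 5.0) -> dict:
    """Fetch the list of locally available models from a running Ollama server."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

On the server, `has_model(fetch_tags(), "llama3")` returning False suggests the pull did not complete or went to a different volume.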


🚀 GPU Acceleration (Optional)

On Linux servers with NVIDIA GPUs and the NVIDIA Container Toolkit installed, add this to the ollama service:

runtime: nvidia
environment:
  - NVIDIA_VISIBLE_DEVICES=all

This enables:

  • Faster inference
  • Larger models
  • Lower latency

Docker Desktop on macOS does not expose the GPU to containers.


🔄 Final Deployment Architecture

Internet
   ↓
NGINX / Load Balancer
   ↓
FastAPI (Docker)
   ↓
Ollama (Docker)
   ↓
LLaMA Model

The same design scales horizontally: run more FastAPI and Ollama containers behind the load balancer.


⚑ Key Engineering Learnings

  • Dockerizing AI apps is harder than web apps
  • LLM runtimes behave differently across operating systems
  • macOS is for development, not hosting
  • Linux is the true AI deployment platform

Understanding this saves days of debugging.