⚙️ Building AI Backends with FastAPI & Prompt Engineering (Part 2)

Wed Jan 14 2026


FastAPI, Local LLaMA Integration & Prompt Engineering for Reliable AI Output


🧠 Blog Series Context

This is Part 2 of our hands-on AI engineering series where we build a local LLM–powered AI Wish Generator from scratch.

If you haven’t read Part 1 yet, start here:

👉 Part 1 – Architecture, Local LLMs & AI Engineering Fundamentals


📘 Series Breakdown


| Part   | Topic                                                |
| ------ | ---------------------------------------------------- |
| Part 1 | Architecture, local LLMs & fundamentals              |
| Part 2 | Backend & prompt engineering                         |
| Part 3 | Frontend with Next.js                                |
| Part 4 | Dockerization & cloud shipping                       |
| Part 5 | AI engineering lessons from building a local LLM app |


🎯 Objective of Part 2

In this post we will focus on the core intelligence layer of the application:

  • Designing a clean backend API
  • Integrating local LLaMA via Ollama
  • Engineering prompts for predictable output
  • Enforcing emotional and platform guardrails
  • Generating multiple high‑quality wish options

This is where AI engineering actually happens.


🧱 Backend Architecture Overview

Client (UI)
   ↓ JSON
FastAPI Backend
   ↓ Prompt Builder
Ollama HTTP API
   ↓
Local LLaMA Model

The backend acts as:

  • Input validator
  • Prompt compiler
  • AI behavior controller
  • Output normalizer

The LLM never talks directly to the frontend.


🐍 Why Python + FastAPI?

FastAPI is ideal for AI systems because:

  • Extremely fast (built on Starlette)
  • Async-first architecture
  • Automatic Swagger documentation
  • Strong typing with Pydantic
  • Clean separation of concerns

Most production AI platforms use Python-based services because LLM tooling, embeddings, and orchestration frameworks are Python-native.


🧩 Creating the FastAPI Application

Let’s start with a minimal but production-friendly FastAPI setup.

main.py

from fastapi import FastAPI
from app.schemas import WishRequest
from app.prompt import build_prompt
from app.llm import generate_wishes

app = FastAPI(
    title="WishCraft AI Backend",
    version="1.0.0"
)


@app.post("/generate-wish")
def generate_wish(request: WishRequest):
    prompt = build_prompt(
        category=request.category,
        tone=request.tone,
        platform=request.platform,
        with_emojis=request.with_emojis,
    )

    wishes = generate_wishes(prompt)

    return {
        "suggestions": wishes
    }

🔍 What’s happening here?

  • FastAPI exposes a clean REST endpoint
  • User intent is accepted as structured JSON
  • Prompt creation is isolated from API logic
  • LLM invocation is abstracted into a separate layer

This keeps the backend testable and scalable.
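Because the LLM call is hidden behind `generate_wishes`, the endpoint logic can be unit-tested with a stub in place of the real model. A minimal sketch (the simplified `build_prompt` and the stub below are illustrations, not the real implementations):

```python
def build_prompt(category, tone, platform, with_emojis):
    # Simplified stand-in for app/prompt.py
    return f"{category}|{tone}|{platform}|{with_emojis}"

def fake_generate_wishes(prompt):
    # Stub replacing the Ollama call in app/llm.py
    return ["wish A", "wish B", "wish C"]

def generate_wish_logic(payload: dict) -> dict:
    # Mirrors the body of the /generate-wish route
    prompt = build_prompt(
        payload["category"], payload["tone"],
        payload["platform"], payload.get("with_emojis", True),
    )
    return {"suggestions": fake_generate_wishes(prompt)}

result = generate_wish_logic(
    {"category": "Birthday", "tone": "Soft", "platform": "Slack"}
)
print(result["suggestions"])  # three stubbed wishes, no model required
```

The test runs in milliseconds and never touches a GPU — that is the payoff of keeping the layers separate.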


📦 Request Validation with Pydantic

Instead of accepting raw text, we define a strict schema.

schemas.py

from pydantic import BaseModel


class WishRequest(BaseModel):
    category: str
    tone: str
    platform: str
    with_emojis: bool = True

✅ Benefits

  • Prevents malformed input
  • Reduces the prompt injection surface
  • Enables frontend auto-completion
  • Documents API automatically

Strong input typing is essential in AI systems.
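As a quick sketch of what the schema buys you, a malformed payload is rejected before any prompt is built — FastAPI turns the `ValidationError` into an automatic 422 response:

```python
from pydantic import BaseModel, ValidationError

class WishRequest(BaseModel):
    category: str
    tone: str
    platform: str
    with_emojis: bool = True

# Valid payload: with_emojis falls back to its default
req = WishRequest(category="Birthday", tone="Soft", platform="WhatsApp")
print(req.with_emojis)  # True

# Missing required field: rejected before the LLM is ever invoked
try:
    WishRequest(category="Birthday", tone="Soft")
except ValidationError as err:
    print("rejected:", err.errors()[0]["loc"])
```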


📁 Backend Folder Structure

backend/
│
├── app/
│   ├── main.py        # API routes
│   ├── schemas.py     # Request validation
│   ├── prompt.py      # Prompt engineering logic
│   └── llm.py         # Ollama integration
│
└── requirements.txt

Each responsibility is isolated — a key engineering principle.


🧩 Designing the API Contract

The frontend sends structured intent — not raw text.

Example request

{
  "category": "Birthday",
  "tone": "Soft",
  "platform": "WhatsApp",
  "with_emojis": true
}

This structure allows:

  • Better validation
  • Controlled prompt construction
  • Predictable output



🦙 Integrating Local LLaMA via Ollama

Ollama exposes local LLMs through a simple HTTP interface.

POST http://localhost:11434/api/generate

Example payload

{
  "model": "llama3",
  "prompt": "Your compiled prompt here",
  "stream": false
}

This makes LLMs behave like any internal microservice.


🔌 LLM Integration Layer

llm.py

import requests

OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "llama3"


def generate_wishes(prompt: str) -> list[str]:
    response = requests.post(
        OLLAMA_URL,
        json={
            "model": MODEL,
            "prompt": prompt,
            "stream": False,
        },
        timeout=120,
    )
    response.raise_for_status()  # fail fast if Ollama is down or errors

    output = response.json()["response"]

    # The prompt asks the model to separate wishes with "~",
    # so split on it and drop empty fragments.
    wishes = [
        line.strip()
        for line in output.split("~")
        if line.strip()
    ]

    return wishes[:3]

🔍 Explanation

  • Ollama runs fully locally
  • No SDK or cloud dependency
  • Simple HTTP-based inference
  • Output is normalized before returning

This abstraction allows you to swap models easily later.
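The normalization step can also be pulled out as a pure function, which makes it trivially testable in isolation (a refactoring sketch, not code from the article's `llm.py`):

```python
def parse_wishes(raw: str, limit: int = 3) -> list[str]:
    """Split the model's '~'-separated output into clean wish strings."""
    return [part.strip() for part in raw.split("~") if part.strip()][:limit]

# Models sometimes emit extra fragments; the limit keeps the contract intact.
raw = "Happy birthday! ~ Wishing you joy today. ~ Cheers! ~ stray extra text"
print(parse_wishes(raw))
```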


🧠 Prompt Engineering Is Not a String

Most beginner AI apps fail here.

❌ Bad prompt engineering:

Write a birthday wish.

This produces:

  • Inconsistent output
  • Uncontrolled tone
  • Emoji overload
  • Formatting breaks

✅ Structured Prompt Engineering

A production-grade prompt must behave like configuration.

prompt.py

def build_prompt(category: str, tone: str, platform: str, with_emojis: bool) -> str:
    emoji_rule = (
        "Use platform-specific emoticons."
        if with_emojis
        else "Do not use emoticons."
    )

    return f"""
You are a professional human message writer.

Generate exactly 3 multi-line wishes.

Each wish must:
- Follow the selected tone
- Respect the platform writing style
- Use correct emoticon syntax

Platform rules:
- Slack → :tada: :rocket:
- WhatsApp → 😊🎉
- Instagram → ✨🌸

Occasion: {category}
Tone: {tone}
Platform: {platform}
Emoji rule: {emoji_rule}

Return only the wishes separated by ~ lines.
"""

🧠 Why this works

The prompt explicitly defines:

  • Output count
  • Formatting rules
  • Emoji governance
  • Platform context

This turns the LLM from a free-form chatbot into a constrained, far more predictable generator.


🧩 Prompt as a Control System

Our prompt contains:

  • Output contract
  • Line limits
  • Tone rules
  • Platform emoji rules
  • Safety guardrails

Example conceptually:

Generate exactly 3 wishes
Each must be multiline
Use Slack emoticons only
Never mix emoji formats
Max one emoji per line

The LLM's output becomes far more predictable.


🎭 Platform-Specific Emoticon Control

One of the most powerful improvements:

| Platform  | Emoticon Style  |
| --------- | --------------- |
| WhatsApp  | Unicode 😊🎉    |
| Instagram | Aesthetic ✨🌸   |
| Facebook  | Simple 🙂🎈     |
| Slack     | :tada: :rocket: |
| Teams     | (y) (clap)      |

This dramatically improves output realism.
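One way to keep these rules maintainable is to store them as data and render them into the prompt. The dict below is a sketch of that idea — the article itself embeds the rules directly in the prompt text:

```python
# Platform -> allowed emoticon style (illustrative values from the table above)
PLATFORM_EMOJI = {
    "WhatsApp": "😊🎉",
    "Instagram": "✨🌸",
    "Facebook": "🙂🎈",
    "Slack": ":tada: :rocket:",
    "Teams": "(y) (clap)",
}

def emoji_rule(platform: str, with_emojis: bool) -> str:
    if not with_emojis:
        return "Do not use emoticons."
    style = PLATFORM_EMOJI.get(platform)
    if style is None:
        return "Use plain text only."  # unknown platform: fail safe
    return f"Use only this emoticon style: {style}"

print(emoji_rule("Slack", True))
print(emoji_rule("Email", True))
```

Adding a new platform then means adding one dict entry, not editing prompt prose.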

🧠 Multi-Option Wish Generation

Instead of returning one answer, the backend generates:

  • Option 1 – concise
  • Option 2 – alternate wording
  • Option 3 – expressive

Why?

Because creative output is subjective.

Multiple options increase:

  • User satisfaction
  • Perceived intelligence
  • Product quality

🛡️ Guardrails Matter

Without rules, LLMs will:

  • Overuse emojis
  • Sound robotic
  • Break formatting
  • Produce unsafe condolence text

Guardrails enforce:

  • Emotional sensitivity
  • Platform etiquette
  • Professional tone

This is critical in production AI.
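Guardrails can also be enforced after inference, not only in the prompt. A sketch of a post-generation contract check (the emoji regex range and the rules here are assumptions, not from the article):

```python
import re

# Rough Unicode emoji range; good enough for a contract check, not exhaustive.
UNICODE_EMOJI = re.compile(r"[\U0001F300-\U0001FAFF]")

def violates_contract(wishes: list[str], platform: str) -> bool:
    """Return True if the model broke the output rules."""
    if len(wishes) != 3:
        return True  # the prompt demands exactly 3 wishes
    if platform == "Slack":
        # Slack wishes must use :shortcode: style, never raw Unicode emoji
        return any(UNICODE_EMOJI.search(w) for w in wishes)
    return False

print(violates_contract(["a", "b"], "Slack"))                     # too few wishes
print(violates_contract([":tada: hi", "two", "three"], "Slack"))  # passes
```

If the check fails, the backend can retry the generation instead of shipping a broken wish to the user.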


🔄 End-to-End Request Flow

UI Input
   ↓
FastAPI schema validation
   ↓
Prompt compilation
   ↓
Local LLaMA inference
   ↓
Post-processing
   ↓
JSON response

Each layer has a single responsibility.

This architecture prevents fragile AI pipelines and makes debugging easy.


🧠 Why This Backend Design Scales

The same structure supports:

  • RAG systems
  • AI agents
  • Workflow automation
  • Chatbots
  • Copilots

Only the prompt logic changes.


🔜 What’s Coming in Part 3

In the next post we’ll build:

  • A modern Next.js SPA
  • Device‑agnostic UI
  • Elegant input controls
  • Copy‑to‑clipboard UX

👉 Part 3 – Building the Frontend with Next.js


✨ Final Thoughts

The backend is the brain of any AI application.

Not the model.

Not the UI.

But the system that controls:

  • how the model thinks
  • what it can say
  • how safe it behaves

Master this layer — and you master AI engineering.