⚙️ Building AI Backends with FastAPI & Prompt Engineering (Part 2)

Wed Jan 14 2026

FastAPI, Local LLaMA Integration & Prompt Engineering for Reliable AI Output

🧠 Blog Series Context

This is Part 2 of our hands-on AI engineering series where we build a local LLM–powered AI Wish Generator from scratch.

If you haven’t read Part 1 yet, start here:

👉 Part 1 – Architecture, Local LLMs & AI Engineering Fundamentals

📘 Series Breakdown


| Part       | Topic                                   
| ---------- | --------------------------------------- 
| Part 1     | Architecture, local LLMs & fundamentals 
| Part 2     | Backend & prompt engineering            
| Part 3     | Frontend with Next.js                   
| Part 4     | Dockerization & cloud shipping          
| Part 5     | AI Engineering Lessons from Building a Local LLM App

🎯 Objective of Part 2

In this post we will focus on the core intelligence layer of the application:

Designing a clean backend API
Integrating local LLaMA via Ollama
Engineering prompts for predictable output
Enforcing emotional and platform guardrails
Generating multiple high‑quality wish options

This is where AI engineering actually happens.

🧱 Backend Architecture Overview

Client (UI)
   ↓ JSON
FastAPI Backend
   ↓ Prompt Builder
Ollama HTTP API
   ↓
Local LLaMA Model

The backend acts as:

Input validator
Prompt compiler
AI behavior controller
Output normalizer

The LLM never talks directly to the frontend.

🐍 Why Python + FastAPI?

FastAPI is ideal for AI systems because:

Extremely fast (built on Starlette)
Async-first architecture
Automatic Swagger documentation
Strong typing with Pydantic
Clean separation of concerns

Most production AI platforms use Python-based services because LLM tooling, embeddings, and orchestration frameworks are Python-native.

🧩 Creating the FastAPI Application

Let’s start with a minimal but production-friendly FastAPI setup.

`main.py`

from fastapi import FastAPI
from app.schemas import WishRequest
from app.prompt import build_prompt
from app.llm import generate_wishes

app = FastAPI(
    title="WishCraft AI Backend",
    version="1.0.0"
)


@app.post("/generate-wish")
def generate_wish(request: WishRequest):
    prompt = build_prompt(
        category=request.category,
        tone=request.tone,
        platform=request.platform,
        with_emojis=request.with_emojis,
    )

    wishes = generate_wishes(prompt)

    return {
        "suggestions": wishes
    }

🔍 What’s happening here?

FastAPI exposes a clean REST endpoint
User intent is accepted as structured JSON
Prompt creation is isolated from API logic
LLM invocation is abstracted into a separate layer

This keeps the backend testable and scalable.

📦 Request Validation with Pydantic

Instead of accepting raw text, we define a strict schema.

`schemas.py`

from pydantic import BaseModel


class WishRequest(BaseModel):
    category: str
    tone: str
    platform: str
    with_emojis: bool = True

✅ Benefits

Prevents malformed input
Eliminates prompt injection risk
Enables frontend auto-completion
Documents API automatically

Strong input typing is essential in AI systems.

📁 Backend Folder Structure

backend/
│
├── app/
│   ├── main.py        # API routes
│   ├── schemas.py     # Request validation
│   ├── prompt.py      # Prompt engineering logic
│   └── llm.py         # Ollama integration
│
└── requirements.txt

Each responsibility is isolated — a key engineering principle.

🧩 Designing the API Contract

The frontend sends structured intent — not raw text.

Example request

{
  "category": "Birthday",
  "tone": "Soft",
  "platform": "WhatsApp",
  "with_emojis": true
}

This structure allows:

Better validation
Controlled prompt injection
Predictable output

📦 Request Schema (Pydantic)

from pydantic import BaseModel

class WishRequest(BaseModel):
    category: str
    tone: str
    platform: str
    with_emojis: bool = True

Strong typing is critical when building AI pipelines.

🦙 Integrating Local LLaMA via Ollama

Ollama exposes local LLMs through a simple HTTP interface.

POST http://localhost:11434/api/generate

Example payload

{
  "model": "llama3",
  "prompt": "Your compiled prompt here",
  "stream": false
}

This makes LLMs behave like any internal microservice.

🔌 LLM Integration Layer

`llm.py`

import requests

OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "llama3"


def generate_wishes(prompt: str) -> list[str]:
    response = requests.post(
        OLLAMA_URL,
        json={
            "model": MODEL,
            "prompt": prompt,
            "stream": False
        },
        timeout=120
    )

    output = response.json()["response"]

    wishes = [
        line.strip()
        for line in output.split("~")
        if line.strip()
    ]

    return wishes[:3]

🔍 Explanation

Ollama runs fully locally
No SDK or cloud dependency
Simple HTTP-based inference
Output is normalized before returning

This abstraction allows you to swap models easily later.

🧠 Prompt Engineering Is Not a String

Most beginner AI apps fail here.

❌ Bad prompt engineering:

Write a birthday wish.

This produces:

Inconsistent output
Uncontrolled tone
Emoji overload
Formatting breaks

✅ Structured Prompt Engineering

A production-grade prompt must behave like configuration.

`prompt.py`

def build_prompt(category, tone, platform, with_emojis):

    emoji_rule = (
        "Use platform-specific emoticons."
        if with_emojis
        else "Do not use emoticons."
    )

    return f"""
You are a professional human message writer.

Generate exactly 3 multi-line wishes.

Each wish must:
- Follow the selected tone
- Respect the platform writing style
- Use correct emoticon syntax

Platform rules:
- Slack → :tada: :rocket:
- WhatsApp → 😊🎉
- Instagram → ✨🌸

Occasion: {category}
Tone: {tone}
Platform: {platform}
Emoji rule: {emoji_rule}

Return only the wishes separated by ~ lines.
"""

🧠 Why this works

The prompt explicitly defines:

Output count
Formatting rules
Emoji governance
Platform context

This transforms the LLM from a chatbot into a deterministic generator.

🧩 Prompt as a Control System

Our prompt contains:

Output contract
Line limits
Tone rules
Platform emoji rules
Safety guardrails

Example conceptually:

Generate exactly 3 wishes
Each must be multiline
Use Slack emoticons only
Never mix emoji formats
Max one emoji per line

The LLM becomes deterministic.

🎭 Platform-Specific Emoticon Control

One of the most powerful improvements:

| Platform  | Emoticon Style  |
| --------- | --------------- |
| WhatsApp  | Unicode 😊🎉    |
| Instagram | Aesthetic ✨🌸   |
| Facebook  | Simple 🙂🎈     |
| Slack     | :tada: :rocket: |
| Teams     | (y) (clap)      |

This dramatically improves output realism.

🧠 Multi-Option Wish Generation

Instead of returning one answer, the backend generates:

Option 1 – concise
Option 2 – alternate wording
Option 3 – expressive

Why?

Because creative output is subjective.

Multiple options increase:

User satisfaction
Perceived intelligence
Product quality

🛡️ Guardrails Matter

Without rules, LLMs will:

Overuse emojis
Sound robotic
Break formatting
Produce unsafe condolence text

Guardrails enforce:

Emotional sensitivity
Platform etiquette
Professional tone

This is critical in production AI.

🔄 End-to-End Request Flow

UI Input
   ↓
FastAPI schema validation
   ↓
Prompt compilation
   ↓
Local LLaMA inference
   ↓
Post-processing
   ↓
JSON response

Each layer has a single responsibility.

This architecture prevents fragile AI pipelines and makes debugging easy.

🧠 Why This Backend Design Scales

The same structure supports:

RAG systems
AI agents
Workflow automation
Chatbots
Copilots

Only the prompt logic changes.

🔜 What’s Coming in Part 3

In the next post we’ll build:

A modern Next.js SPA
Device‑agnostic UI
Elegant input controls
Copy‑to‑clipboard UX

👉 Part 3 – Building the Frontend with Next.j

✨ Final Thoughts

The backend is the brain of any AI application.

Not the model.

Not the UI.

But the system that controls:

how the model thinks
what it can say
how safe it behaves

Master this layer — and you master AI engineering.