The Problem

Every team building with LLMs ends up in the same place.

1. Prompts scattered everywhere

Without a dedicated format, prompts end up as f-strings in Python, template literals in JS, or markdown files scattered across the repo. There's no single source of truth.

A typical project structure
project/
├── app/
│   ├── services/
│   │   ├── teacher.py      # f-strings scattered
│   │   ├── grader.py
│   │   └── chatbot.py
│   └── utils/
│       └── prompt_builder.py
├── prompts/
│   ├── teacher_v1.txt      # Version confusion
│   ├── teacher_v2.txt
│   ├── teacher_final.txt
│   └── teacher_final_v2.txt
└── config/
    └── prompts.yaml        # Yet another format
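
With prompts spread across modules like this, drift is inevitable. A minimal sketch of how it happens (function names and prompt text below are hypothetical, loosely matching the tree above): two modules each assemble "the teacher prompt" their own way, and neither is the source of truth.

```python
# Hypothetical reconstruction of the drift above: two modules each
# build "the teacher prompt" independently, so an edit to one never
# reaches the other.

def teacher_prompt_service(role: str) -> str:
    # app/services/teacher.py style: inline f-string
    return f"You are a {role}. Be encouraging and concise."

def teacher_prompt_builder(role: str) -> str:
    # app/utils/prompt_builder.py style: string concatenation
    return "You are a " + role + ". Be encouraging."

# The two "same" prompts have already diverged:
print(teacher_prompt_service("math tutor") == teacher_prompt_builder("math tutor"))  # False
```

Nothing flags the divergence; each module keeps working on its own copy until the outputs visibly disagree.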

2. Branching tangled in code

Conditional logic lives in your application code, not your prompt. Change a prompt, and you need a developer to update the code. Non-technical team members can't iterate.

prompts.py
# What most teams end up with
prompt = f"""
You are a {role}.
{"Answer the question directly." if is_question else "Continue the lesson."}
{"Give a short answer." if depth == "shallow" else "Give a detailed answer."}
Here is the context: {context}
The user said: {user_message}
"""

3. No versioning

When you update a prompt, there's no way to know what changed. No diff. No history. No way to roll back. And if you have multiple consumers, they all get the new version—whether they're ready or not.
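
A sketch of why rollback is impossible when the prompt is a single file on disk (the path and prompt text are hypothetical): an "update" overwrites the old text in place, and every consumer reading that path gets the new version on its next read.

```python
import pathlib
import tempfile

# Hypothetical single prompt file shared by every consumer.
prompt_file = pathlib.Path(tempfile.mkdtemp()) / "teacher.txt"
prompt_file.write_text("You are a strict teacher.")

consumer_a = prompt_file.read_text()                   # read before the update
prompt_file.write_text("You are a friendly teacher.")  # "update" = overwrite
consumer_b = prompt_file.read_text()                   # read after the update

# The old text is gone: no diff, no history, nothing to roll back to.
print(consumer_a)  # You are a strict teacher.
print(consumer_b)  # You are a friendly teacher.
```

The file system keeps only the latest bytes; without explicit versioning there is no record that the strict teacher ever existed.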

4. No contracts

What parameters does this prompt expect? What shape should the response be? There's no schema. You find out at runtime—usually in production—when the LLM returns something unexpected.
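
A sketch of that failure mode (template text and parameter names are made up): a templated prompt silently requires keys nobody wrote down, and the gap only surfaces when a caller tries to render it.

```python
# Hypothetical prompt template with an implicit, undocumented contract:
# it needs role, context, AND user_message.
template = "You are a {role}. Context: {context}. User said: {user_message}"

# A caller who only knows about role and context finds out at render time.
try:
    template.format(role="teacher", context="fractions")
except KeyError as exc:
    missing = str(exc)
    print(f"missing parameter: {missing}")  # missing parameter: 'user_message'
```

Nothing in the template declares its parameters, so the error is only catchable where the prompt is rendered, which in practice is often production.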

5. Invisible token waste

When every conditional instruction is inlined into one catch-all prompt, every branch gets sent to the LLM even though only one is relevant to the request. You're paying for tokens that never affect the output, and there's no way to measure or optimize the waste.
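
A rough way to see the cost, assuming the common heuristic of roughly 4 characters per token (the prompt text below is illustrative, and a real tokenizer would give different counts): compare what you send against what actually applies to a given request.

```python
# A catch-all prompt that inlines every branch (illustrative text).
full_prompt = (
    "You are a teacher.\n"
    "If the user asks a question, answer it directly.\n"
    "If the user is mid-lesson, continue the lesson instead.\n"
    "If depth is shallow, give a short answer.\n"
    "If depth is deep, give a detailed answer with examples.\n"
)

# For a question at shallow depth, only this much actually applies.
relevant = (
    "You are a teacher.\n"
    "Answer the question directly, and keep it short.\n"
)

def rough_tokens(text: str) -> int:
    # Crude heuristic: ~4 characters per token. Real tokenizers differ.
    return len(text) // 4

wasted = rough_tokens(full_prompt) - rough_tokens(relevant)
print(f"~{wasted} tokens per request never affect the output")
```

Multiply that per-request waste by request volume and it becomes a real line item, yet without instrumentation like this it never shows up anywhere.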

This works until it doesn't. Then it's very hard to fix.