The Problem
Every team building with LLMs ends up in the same place.
1. Prompts scattered everywhere
Without a dedicated format, prompts end up as f-strings in Python, template literals in JS, or markdown files scattered across the repo. There's no single source of truth.
project/
├── app/
│   ├── services/
│   │   ├── teacher.py            # f-strings scattered
│   │   ├── grader.py
│   │   └── chatbot.py
│   └── utils/
│       └── prompt_builder.py
├── prompts/
│   ├── teacher_v1.txt            # Version confusion
│   ├── teacher_v2.txt
│   ├── teacher_final.txt
│   └── teacher_final_v2.txt
└── config/
    └── prompts.yaml              # Yet another format
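And the copies drift apart over time. A hypothetical pair of modules (the names are illustrative) showing how the same prompt mutates in two places:

# services/teacher.py: the prompt inlined as an f-string
def teacher_system(max_words: int) -> str:
    return f"You are a patient teacher. Keep answers under {max_words} words."

# services/grader.py: a second copy that has already drifted
GRADER_SYSTEM = "You are a patient teacher; keep answers short."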
2. Branching tangled in code
Conditional logic lives in your application code, not your prompt. Change a prompt, and you need a developer to update the code. Non-technical team members can't iterate.
# What most teams end up with: prompt text interleaved with Python branching
def build_prompt(role: str, is_question: bool, depth: str,
                 context: str, user_message: str) -> str:
    return f"""
You are a {role}.
{"Answer the question directly." if is_question else "Continue the lesson."}
{"Give a short answer." if depth == "shallow" else "Give a detailed answer."}
Here is the context: {context}
The user said: {user_message}
"""
3. No versioning
When you update a prompt, there's no way to know what changed. No diff. No history. No way to roll back. And if you have multiple consumers, they all get the new version—whether they're ready or not.
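The usual setup hides this behind an innocent-looking loader. A minimal sketch, assuming a hypothetical load_prompt helper shared by every service:

from pathlib import Path

def load_prompt(name: str) -> str:
    # "Latest" is whatever is on disk right now: no version pin,
    # no diff, no rollback short of digging through git history.
    return (Path("prompts") / f"{name}.txt").read_text()

# Which file is current? Every consumer hard-codes a guess,
# and editing that file changes behavior for all of them at once.
system_prompt = load_prompt("teacher_final_v2")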
4. No contracts
What parameters does this prompt expect? What shape should the response be? There's no schema. You find out at runtime—usually in production—when the LLM returns something unexpected.
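A minimal sketch of both failure modes, using plain str.format and json; the template and field names are made up:

import json

template = "Grade this essay on {topic} for a {grade_level} student."

# Nothing declares which parameters the template expects,
# so a typo only surfaces when .format() actually runs:
try:
    template.format(topic="photosynthesis", grade_leval="5th")
except KeyError as err:
    print(f"Missing parameter: {err}")  # found at runtime, not in review

# Nothing declares the response shape either, so consumers guess:
reply = '{"score": "A-"}'  # the model returned a letter, not a number
data = json.loads(reply)
if not isinstance(data.get("score"), (int, float)):
    print("Unexpected response shape:", data)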
5. Token waste invisible
Teams that pull branching out of application code usually push it into the prompt itself: "If the user asks a question, answer it directly; otherwise continue the lesson." Now every conditional block gets sent to the LLM on every request, even though only one branch is relevant. You're paying for tokens that never affect the output, and there's no way to measure the waste, let alone optimize it.
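You can put a rough number on it. A back-of-the-envelope sketch using a crude four-characters-per-token heuristic rather than a real tokenizer, so the counts are indicative only:

# The prompt ships every branch on every request...
full_prompt = """You are a teacher.
If the user asks a question, answer it directly; otherwise continue the lesson.
If depth is shallow, give a short answer; if deep, give a detailed answer."""

# ...but any single request only needs one branch of each pair.
relevant_prompt = """You are a teacher.
Answer the question directly.
Give a short answer."""

def approx_tokens(text: str) -> int:
    return max(1, len(text) // 4)  # crude heuristic, not a real tokenizer

wasted = approx_tokens(full_prompt) - approx_tokens(relevant_prompt)
print(f"~{wasted} tokens per request never affect the output")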
This works until it doesn't. Then it's very hard to fix.