The promise of NOT prompting but programming LMs
With the democratization of sufficiently intelligent systems like ChatGPT/Claude/Gemini behind closed APIs, the fear of being reduced to prompting away one's problems naturally runs high. I feel this fear runs deep, because previous workflows provided a (greater) sense of control. At the same time, many who take pride in raw code feel offended when their craft is reduced to 'mere' prompting. So when a solution comes along that addresses the ego or the fear, it will be well received by these groups first and adopted by others later.
DSPy promises to let us write programs the way we used to, without dealing with prompt strings in code. And that is just the beginning: taking inspiration from PyTorch, it goes well beyond that. Introducing a structured, ML101-style approach to the LLM world is a hard task, and I feel DSPy is slowly achieving it over time. Instead of manually crafting and tweaking prompts, DSPy lets you declare what you want and automatically optimizes how to achieve it.
For data people, think of DSPy as the SQL of language models: just as SQL lets you declare what data you want without specifying how to retrieve it, DSPy lets you declare your LM pipeline logic without manually engineering prompts. On top of that, it is designed to be composable at every level, which makes it elegant for many tasks. For people who like to over-engineer things, prompt engineering will become an endless cycle with no clear end in sight. When I think about the popularity of React (the frontend framework) and the long-lasting success of SQL, I can only foresee success for declarative frameworks like DSPy.
Unlike LangChain/LlamaIndex, DSPy forces us back to the whiteboard to think from first principles for any ML project. It asks us to think in terms of evaluation, something that has been questioned in the vibe era. Defining metrics, curating datasets, experimenting with a hypothesis, and tracking experiments to systematically improve the end-to-end system is back in style with DSPy.
The Problem with Traditional Prompting
Traditional LLM app development is brittle:
- Manual prompt engineering for every task or model
- Hard to optimize - requires extensive trial and error
- Difficult to compose complex pipelines
- May break when you change LM providers
- No systematic way to improve performance
The DSPy way: Declarative Programming
DSPy introduces a declarative, modular approach:
```python
import dspy

# Configure your LM
lm = dspy.LM('openai/gpt-4o')
dspy.configure(lm=lm)

# Define the task
class Translate(dspy.Signature):
    """Translate English to German."""
    english = dspy.InputField()
    instructions = dspy.InputField(desc="Instructions for the translation")
    german = dspy.OutputField()

# Use it
translate = dspy.Predict(Translate)
result = translate(english="Around the World in 80 Days",
                   instructions="Keep it short and concise like movie titles.")
print(result.german)  # "Um die Welt in 80 Tagen"
```
Core Concepts
1. Signatures - Declare Your Task
Signatures are like type signatures for LM operations. They specify inputs, outputs, and the task description:
```python
class Summarize(dspy.Signature):
    """Summarize a long document into key points."""
    document = dspy.InputField()
    key_points = dspy.OutputField(desc="bullet list of main ideas")
```
Variable names matter in DSPy: the field names themselves are sent to the LLM as part of the prompt, so name them descriptively for your task.
2. Modules - Build Composable Pipelines
Modules are PyTorch-style components that can be:
- Composed together
- Optimized end-to-end
- Reused across projects
```python
class MultiHopQA(dspy.Module):
    def __init__(self):
        super().__init__()
        self.retrieve = dspy.Retrieve()
        self.hop1 = dspy.ChainOfThought("context, question -> search_query")
        self.hop2 = dspy.ChainOfThought("context, question -> answer")

    def forward(self, question):
        ctx1 = self.retrieve(question).passages
        query = self.hop1(context=ctx1, question=question)
        ctx2 = self.retrieve(query.search_query).passages
        return self.hop2(context=ctx2, question=question)
```
3. Optimizers - Automatic Prompt and Weight Tuning
This is where DSPy shines. Optimizers automatically tune your prompts (and, for some optimizers, model weights):
- BootstrapFewShot: Generates effective few-shot examples
- MIPRO: Optimizes instructions and demonstrations jointly
- BootstrapFewShotWithRandomSearch: Explores prompt variations
- BayesianSignatureOptimizer: Uses Bayesian optimization
```python
# Before compilation: generic prompts
# After compilation: optimized prompts with examples!
optimizer = dspy.BootstrapFewShot(max_bootstrapped_demos=4)
compiled = optimizer.compile(student=my_module, trainset=examples)
```
There are other primitives such as Adapters, Tools, and Metrics, which I will not cover in detail here; instead, here are one-liners for completeness.
- Adapters are the bridge between signatures and the actual LLM calls. You can easily swap JSON, BAML, XML adapters.
- Tools can be standard functions or tools defined in MCP servers that can be attached to make the LLM agentic.
- Metrics are the interesting part: combined with optimizers, they define what the program gets optimized for.
Why Declarative?
| Imperative (Old) | Declarative (DSPy) |
|---|---|
| Manually craft prompts | Declare signatures |
| Hard-code examples | Auto-generate examples |
| Trial and error tuning | Systematic optimization |
| Brittle prompt strings | Composable modules |
| One-off solutions | Reusable components |
Real-World Use Cases
- RAG Systems: Retrieval + reasoning pipelines for complex question answering with multiple reasoning steps
- Data Extraction: Extract structured data from unstructured text
- Agentic Systems: Build reliable LLM-based agents that can perform tasks
- Classification: Few-shot classification that improves over time
- LLM-as-a-Judge: Use LLMs to judge the quality of other LLMs' responses and align them with human experts
DSPy Architecture
When to/not to use DSPy
Use DSPy when:
- Building complex LM pipelines
- Need systematic optimization
- Want composable, maintainable code
- Switching between LM providers
- Have training data to optimize with
Skip DSPy when:
- Simple one-off prompts
- No training/validation data
- Extremely domain-specific where manual control is critical
Closing Thoughts
DSPy isn't just another prompting library - it's a mindset shift in how we program with language models. By embracing declarative programming, we can build more reliable, maintainable, and performant LM applications.