The promise of NOT prompting but programming LMs
With the democratization of sufficiently intelligent systems like ChatGPT/Claude/Gemini behind closed APIs, the fear of being reduced to prompting away one's problems naturally runs high. I feel this fear runs deep, because previous workflows provided a (greater) sense of control. At the same time, many who take pride in raw code feel offended when their craft is reduced to 'mere' prompting. So when a solution comes along that addresses the ego or the fear, it will be well received by these groups first and adopted by others later.
DSPy promises to let us write programs the way we used to, without dealing with prompt strings in code. And that is just the beginning: taking inspiration from PyTorch, it goes well beyond that. Introducing a structured, ML101-style approach to the LLM world is a hard task, and I feel DSPy is slowly achieving it over time. Instead of manually crafting and tweaking prompts, DSPy lets you declare what you want and automatically optimizes how to achieve it.
For data people, think of DSPy as the SQL of language models: just as SQL lets you declare what data you want without specifying how to retrieve it, DSPy lets you declare your LM pipeline logic without manually engineering prompts. On top of that, it is designed to be composable at every level, which makes it elegant for many tasks. For people who like to over-engineer things, prompt engineering will become an endless cycle with no clear end in sight. When I think about the popularity of React (the frontend framework) and the long-lasting success of SQL, I can only foresee success for declarative frameworks like DSPy.
Unlike LangChain/LlamaIndex, DSPy forces us back to the whiteboard to think from first principles for any ML project. It asks us to think in terms of evaluation, something that has been questioned in the vibe era. Defining metrics, curating datasets, experimenting with a hypothesis, and tracking experiments to systematically improve the end-to-end system is back in style with DSPy.
The Problem with Traditional Prompting
Traditional LLM app development is brittle:
- Manual prompt engineering for every task or model
- Hard to optimize - requires extensive trial and error
- Difficult to compose complex pipelines
- May break when you change LM providers
- No systematic way to improve performance
The DSPy way: Declarative Programming
DSPy introduces a declarative, modular approach:
```python
import dspy

# Configure your LM
lm = dspy.LM('openai/gpt-4o')
dspy.configure(lm=lm)

# Define the task
class Translate(dspy.Signature):
    """Translate English to German."""
    english = dspy.InputField()
    instructions = dspy.InputField(desc="Instructions for the translation")
    german = dspy.OutputField()

# Use it
translate = dspy.Predict(Translate)
result = translate(english="Around the World in 80 Days",
                   instructions="Keep it short and concise like movie titles.")
print(result.german)  # "Um die Welt in 80 Tagen"
```
Core Concepts
1. Signatures - Declare Your Task
Signatures are like type signatures for LM operations. They specify inputs, outputs, and the task description:
```python
class Summarize(dspy.Signature):
    """Summarize a long document into key points."""
    document = dspy.InputField()
    key_points = dspy.OutputField(desc="bullet list of main ideas")
```
Variable names matter in DSPy: the field names themselves are sent to the LLM as part of the prompt, so name them descriptively for your task.
2. Modules - Build Composable Pipelines
Modules are PyTorch-style components that can be:
- Composed together
- Optimized end-to-end
- Reused across projects
```python
class MultiHopQA(dspy.Module):
    def __init__(self):
        super().__init__()
        self.retrieve = dspy.Retrieve()
        self.hop1 = dspy.ChainOfThought("context, question -> search_query")
        self.hop2 = dspy.ChainOfThought("context, question -> answer")

    def forward(self, question):
        ctx1 = self.retrieve(question).passages
        query = self.hop1(context=ctx1, question=question)
        ctx2 = self.retrieve(query.search_query).passages
        return self.hop2(context=ctx2, question=question)
```
3. Optimizers - Automatic Prompt and Weight Tuning
This is where DSPy shines. Optimizers automatically tune your prompts (and, for some optimizers, model weights):
- BootstrapFewShot: Generates effective few-shot examples
- MIPRO: Optimizes instructions and demonstrations jointly
- BootstrapFewShotWithRandomSearch: Explores prompt variations
- BayesianSignatureOptimizer: Uses Bayesian optimization
```python
# Before compilation: generic prompts
# After compilation: optimized prompts with examples!
optimizer = dspy.BootstrapFewShot(max_bootstrapped_demos=4)
compiled = optimizer.compile(student=my_module, trainset=examples)
```
There are other primitives such as Adapters, Tools, and Metrics, which I will not cover in detail here; instead, here are one-liners for completeness.
- Adapters are the bridge between signatures and the actual LLM calls. You can easily swap JSON, BAML, XML adapters.
- Tools can be standard functions or tools defined in MCP servers that can be attached to make the LLM agentic.
- Metrics are the interesting part: combined with optimizers, they define what the program gets optimized for.
Why Declarative?
| Imperative (Old) | Declarative (DSPy) |
|---|---|
| Manually craft prompts | Declare signatures |
| Hard-code examples | Auto-generate examples |
| Trial and error tuning | Systematic optimization |
| Brittle prompt strings | Composable modules |
| One-off solutions | Reusable components |
Real-World Use Cases
- RAG Systems: Retrieval + reasoning pipelines for complex question answering with multiple reasoning steps
- Data Extraction: Extract structured data from unstructured text
- Agentic Systems: Build reliable LLM-based agents that can perform tasks
- Classification: Few-shot classification that improves over time
- LLM-as-a-Judge: Use LLMs to judge the quality of other LLMs' responses and align them with human experts
DSPy Architecture
When to/not to use DSPy
Use DSPy when:
- Building complex LM pipelines
- Need systematic optimization
- Want composable, maintainable code
- Switching between LM providers
- Have training data to optimize with
Skip DSPy when:
- Simple one-off prompts
- No training/validation data
- Extremely domain-specific where manual control is critical
Closing Thoughts
DSPy isn't just another prompting library - it's a mindset shift in how we program with language models. By embracing declarative programming, we can build more reliable, maintainable, and performant LM applications.