The promise of NOT prompting but programming LMs

· 6 min read
Jagan Shanmugam
Machine Learning Engineer & Data Scientist

After the democratization of sufficiently intelligent systems like ChatGPT/Claude/Gemini behind closed APIs, the fear of being reduced to prompting away our problems is naturally quite high. I feel this fear runs deep, as previous workflows provided a sense of (more) control. At the same time, many who take pride in raw code feel offended when their craft is reduced to 'mere' prompting. So when a solution comes along that addresses the ego or the fear, it will be well received among these groups first and only later adopted by others.

DSPy promises to let us write programs the way we used to, rather than dealing with prompts in code. And that is just the beginning: it goes beyond that, taking inspiration from PyTorch. Introducing a structured approach to the LLM world with ML101 concepts is a hard task, and I feel DSPy is slowly achieving it over time. Instead of manually crafting and tweaking prompts, DSPy lets you declare what you want and automatically optimizes how to achieve it.

For data people, think of DSPy as the SQL of language models: just as SQL lets you declare what data you want without specifying how to retrieve it, DSPy lets you declare your LM pipeline logic without manually engineering prompts. On top of that, it is designed to be composable at all levels, which makes it elegant for many tasks. For people who like to over-engineer things, I think prompt engineering will become an endless cycle without any clear end in sight. When I think about the popularity of React (the frontend framework) and the long-lasting success of SQL, I can only foresee success for declarative frameworks like DSPy.

Unlike LangChain/LlamaIndex, DSPy forces us back to the whiteboard to think from first principles, as with any ML project. It asks us to think in terms of evaluation, something that has been called into question in the vibe era. Defining metrics, curating datasets, experimenting with hypotheses, and tracking experiments to systematically improve the end-to-end system is back in style with DSPy.

The Problem with Traditional Prompting

Traditional LLM app development is brittle:

  • Manual prompt engineering for every task or model
  • Hard to optimize - requires extensive trial and error
  • Difficult to compose complex pipelines
  • May break when you change LM providers
  • No systematic way to improve performance

The DSPy way: Declarative Programming

DSPy introduces a declarative, modular approach:

import dspy

# Configure your LM
lm = dspy.LM('openai/gpt-4o')
dspy.configure(lm=lm)

# Define the task
class Translate(dspy.Signature):
    """Translate English to German."""
    english = dspy.InputField()
    instructions = dspy.InputField(desc="Instructions for the translation")
    german = dspy.OutputField()

# Use it
translate = dspy.Predict(Translate)
result = translate(english="Around the World in 80 Days",
                   instructions="Keep it short and concise like movie titles.")
print(result.german)  # "Um die Welt in 80 Tagen"

Core Concepts

1. Signatures - Declare Your Task

Signatures are like type signatures for LM operations. They specify inputs, outputs, and the task description:

class Summarize(dspy.Signature):
    """Summarize a long document into key points."""
    document = dspy.InputField()
    key_points = dspy.OutputField(desc="bullet list of main ideas")

Variable names matter in DSPy, as they are included in the prompt sent to the LM for your task.
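To see why names matter, here is a simplified, hypothetical sketch of how a signature's field names could be rendered into a prompt. This is an illustration of the idea, not DSPy's actual adapter code; the `render_prompt` helper and its format are made up for this example:

```python
# Simplified illustration: the names you choose ("document",
# "key_points") become visible text in the prompt the LM sees.
def render_prompt(task_description, inputs, output_names):
    lines = [task_description, ""]
    for name, value in inputs.items():
        lines.append(f"{name.replace('_', ' ').title()}: {value}")
    for name in output_names:
        lines.append(f"{name.replace('_', ' ').title()}:")
    return "\n".join(lines)

prompt = render_prompt(
    "Summarize a long document into key points.",
    inputs={"document": "DSPy is a framework for programming LMs..."},
    output_names=["key_points"],
)
print(prompt)
```

A field named `key_points` shows up as "Key Points:" in the prompt, while a vague name like `out1` would give the LM far less to go on.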

2. Modules - Build Composable Pipelines

Modules are PyTorch-style components that can be:

  • Composed together
  • Optimized end-to-end
  • Reused across projects

class MultiHopQA(dspy.Module):
    def __init__(self):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=3)
        self.hop1 = dspy.ChainOfThought("context, question -> search_query")
        self.hop2 = dspy.ChainOfThought("context, question -> answer")

    def forward(self, question):
        ctx1 = self.retrieve(question).passages
        query = self.hop1(context=ctx1, question=question)
        ctx2 = self.retrieve(query.search_query).passages
        return self.hop2(context=ctx2, question=question)

3. Optimizers - Automatic Prompt and Weight Tuning

This is where DSPy shines. Optimizers automatically tune your prompts and few-shot examples against a metric:

  • BootstrapFewShot: Generates effective few-shot examples
  • MIPRO: Optimizes instructions and demonstrations jointly
  • BootstrapFewShotWithRandomSearch: Explores prompt variations
  • BayesianSignatureOptimizer: Uses Bayesian optimization

# Before compilation: generic prompts
# After compilation: optimized prompts with examples!
optimizer = dspy.BootstrapFewShot(max_bootstrapped_demos=4)
compiled = optimizer.compile(student=my_module, trainset=examples)

There are other primitives, such as Adapters, Tools, and Metrics, which I will not cover in detail here; instead, here are one-liners for completeness.

  • Adapters are the bridge between signatures and the actual LLM calls. You can easily swap JSON, BAML, XML adapters.
  • Tools can be standard functions or tools defined in MCP servers that can be attached to make the LLM agentic.
  • Metrics are the interesting part: combined with an optimizer, they define what "better" means and drive the optimization of your program.
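As a concrete sketch of that last point: a DSPy metric is just a Python function of an example, a prediction, and an optional trace. The exact-match metric below uses plain stand-in objects so it runs without an LM; in real DSPy the arguments would be a `dspy.Example` and a module's prediction, and the field name `answer` is illustrative:

```python
from types import SimpleNamespace

# A DSPy-style metric: a plain function of (example, prediction, trace).
# Returning a bool (or float) tells the optimizer whether a prediction
# is good enough, e.g. to keep as a bootstrapped demo.
def exact_match(example, pred, trace=None):
    return example.answer.strip().lower() == pred.answer.strip().lower()

# Stand-in objects for this sketch.
gold = SimpleNamespace(answer="Berlin")
pred = SimpleNamespace(answer=" berlin ")
print(exact_match(gold, pred))  # True
```

Because a metric is just a function, you can start with something crude like this and later swap in a stricter check (or even an LLM-as-a-judge) without touching the rest of the program.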

Why Declarative?

| Imperative (Old)       | Declarative (DSPy)      |
| ---------------------- | ----------------------- |
| Manually craft prompts | Declare signatures      |
| Hard-code examples     | Auto-generate examples  |
| Trial and error tuning | Systematic optimization |
| Brittle prompt strings | Composable modules      |
| One-off solutions      | Reusable components     |
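To make the left column concrete, here is a hypothetical "brittle prompt string" in plain Python. Everything, including wording, the few-shot example, and the output format, is frozen into the text, so any change of task or provider means editing the string by hand; the `make_summary_prompt` helper is made up for this illustration:

```python
# Imperative style: a hand-crafted, hard-coded prompt string.
# The example, wording, and output format are all baked into the text.
def make_summary_prompt(document):
    return (
        "You are a helpful assistant. Summarize the document below "
        "into a bullet list of main ideas.\n\n"
        "Example:\n"
        "Document: Cats sleep a lot.\n"
        "Key points:\n- Cats sleep for most of the day.\n\n"
        f"Document: {document}\n"
        "Key points:"
    )

prompt = make_summary_prompt("DSPy compiles declarative pipelines.")
```

The declarative equivalent is the `Summarize` signature from earlier plus `dspy.Predict(Summarize)`: the prompt text is generated for you, and an optimizer, not a human editor, is responsible for improving it.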

Real-World Use Cases

  1. RAG Systems: Retrieval + reasoning pipelines for complex question answering with multiple reasoning steps
  2. Data Extraction: Extract structured data from unstructured text
  3. Agentic Systems: Build reliable LLM-based agents that perform tasks
  4. Classification: Few-shot classification that improves over time
  5. LLM-as-a-Judge: Use LLMs to judge the quality of other LLMs' responses, aligned with human experts

When (not) to use DSPy

  • Use DSPy when:

    • Building complex LM pipelines
    • Need systematic optimization
    • Want composable, maintainable code
    • Switching between LM providers
    • Have training data to optimize with
  • Skip DSPy when:

    • Simple one-off prompts
    • No training/validation data
    • Extremely domain-specific where manual control is critical

Closing Thoughts

DSPy isn't just another prompting library; it's a mindset shift in how we program with language models. By embracing declarative programming, we can build more reliable, maintainable, and performant LM applications.