Your AI Agent Has 12 Hardcoded API Keys and You Call It "Production-Ready" — Dependency Injection Patterns for Agent Systems
Hardcoded LLM clients, inline API keys, baked-in model names — your agent works until you need to change anything. Here are 5 DI patterns that make agents testable, swappable, and production-ready.
Most agent code I review looks like this: the LLM client is instantiated inline, the vector store connection is hardcoded in the tool function, the prompt template is a raw string in the agent loop, and the embedding model is imported at the top of the file with a specific provider baked in.
Then someone asks: "Can we swap Claude for GPT-4o on this agent?" And the answer is a 3-day refactor.
This is the dependency injection problem applied to AI agents. It's well-solved in traditional software engineering — Spring, Guice, FastAPI's Depends() — but almost nobody applies it to agent architectures. The result is agent systems that work fine in demos but resist every production requirement: swapping models, switching providers, running tests with mock LLMs, A/B testing prompt variants, or deploying the same agent against staging vs. production databases.
Here are 5 patterns that fix this. All code is Python. All patterns are running in real multi-agent systems.
Why Agents Need DI More Than Regular Software
A typical web service has 3-5 dependencies: a database, a cache, maybe an external API. A production AI agent has at least twice that:
- LLM client (model, provider, API key, temperature, max tokens)
- Embedding model (provider, dimensions, batch size)
- Vector store (connection, collection name, search parameters)
- Tool registry (which tools are available, their configurations)
- Prompt templates (system prompts, few-shot examples)
- Memory backend (conversation history, knowledge graph)
- Guardrails (content filters, output validators)
- Observability (tracing, logging, metrics exporters)
Hardcoding any of these makes the agent brittle. Hardcoding all of them makes it unmaintainable.
Pattern 1: The Agent Configuration Object
The simplest pattern. Instead of scattering configuration across your agent code, collect all dependencies into a single typed configuration object that gets passed at construction time.
from dataclasses import dataclass, field
from typing import Protocol


class LLMClient(Protocol):
    """Protocol for any LLM provider."""

    def complete(self, messages: list[dict], **kwargs) -> str: ...

    def complete_structured(self, messages: list[dict], schema: dict, **kwargs) -> dict: ...


class EmbeddingClient(Protocol):
    """Protocol for any embedding provider."""

    def embed(self, texts: list[str]) -> list[list[float]]: ...


class VectorStore(Protocol):
    """Protocol for any vector store."""

    def search(self, query_embedding: list[float], top_k: int = 5) -> list[dict]: ...

    def upsert(self, documents: list[dict]) -> None: ...


@dataclass(frozen=True)
class AgentConfig:
    """All agent dependencies in one place."""

    llm: LLMClient
    embeddings: EmbeddingClient
    vector_store: VectorStore
    system_prompt: str
    max_iterations: int = 10
    temperature: float = 0.1
    timeout_seconds: float = 120.0
    tools: list[str] = field(default_factory=list)


class ResearchAgent:
    """Agent that receives all dependencies via config."""

    def __init__(self, config: AgentConfig) -> None:
        self.config = config
        self._iteration_count = 0

    def run(self, query: str) -> dict:
        self._iteration_count = 0
        messages = [
            {"role": "system", "content": self.config.system_prompt},
            {"role": "user", "content": query},
        ]
        while self._iteration_count < self.config.max_iterations:
            response = self.config.llm.complete(
                messages, temperature=self.config.temperature
            )
            self._iteration_count += 1
            if self._is_final_answer(response):
                return {"answer": response, "iterations": self._iteration_count}
            # Use injected embedding client and vector store
            query_embedding = self.config.embeddings.embed([query])[0]
            context = self.config.vector_store.search(query_embedding, top_k=3)
            messages.append({"role": "assistant", "content": response})
            messages.append({
                "role": "user",
                "content": f"Additional context: {context}",
            })
        return {"answer": "Max iterations reached", "iterations": self._iteration_count}

    def _is_final_answer(self, response: str) -> bool:
        return "FINAL ANSWER:" in response
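The key design choice: AgentConfig is a frozen dataclass. Once constructed, its fields can't be rebound, so the agent's dependencies don't change mid-execution. This prevents a class of bugs where a dependency gets swapped while the agent is running. Note that frozen=True only blocks reassignment: the tools list and the client objects themselves are still mutable, so treat them as read-only by convention.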
Now creating the agent for different environments is trivial:
import os

# AnthropicClient, OpenAIClient, OpenAIEmbeddings, PineconeStore and the mocks
# are your own classes that satisfy the protocols above.

# Production
prod_agent = ResearchAgent(AgentConfig(
    llm=AnthropicClient(model="claude-sonnet-4-20250514", api_key=os.environ["ANTHROPIC_API_KEY"]),
    embeddings=OpenAIEmbeddings(model="text-embedding-3-small"),
    vector_store=PineconeStore(index="prod-knowledge"),
    system_prompt=load_prompt("research_agent_v3"),
    max_iterations=15,
))

# Testing: no API calls, deterministic
test_agent = ResearchAgent(AgentConfig(
    llm=MockLLM(responses=["FINAL ANSWER: test result"]),
    embeddings=MockEmbeddings(dimension=256),
    vector_store=InMemoryVectorStore(),
    system_prompt="You are a test agent.",
    max_iterations=3,
))

# A/B test: same agent, different model
variant_b = ResearchAgent(AgentConfig(
    llm=OpenAIClient(model="gpt-4o", api_key=os.environ["OPENAI_API_KEY"]),
    embeddings=OpenAIEmbeddings(model="text-embedding-3-small"),
    vector_store=PineconeStore(index="prod-knowledge"),
    system_prompt=load_prompt("research_agent_v3"),
    max_iterations=15,
))
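The test configuration above relies on MockLLM, MockEmbeddings, and InMemoryVectorStore, which the article does not define. A minimal sketch of what they could look like follows. The scripted-response behavior, the length-based fake embedding, and the "return the first top_k documents" search are all assumptions chosen to keep tests deterministic, not a prescribed implementation.

```python
from dataclasses import dataclass, field


@dataclass
class MockLLM:
    """Returns scripted responses in order; repeats the last one when exhausted."""

    responses: list[str]
    calls: list[list[dict]] = field(default_factory=list)

    def complete(self, messages: list[dict], **kwargs) -> str:
        # Record a copy of the messages so tests can assert on what the agent sent.
        self.calls.append(list(messages))
        index = min(len(self.calls) - 1, len(self.responses) - 1)
        return self.responses[index]

    def complete_structured(self, messages: list[dict], schema: dict, **kwargs) -> dict:
        # Assumption: wrap the scripted text in a dict; real schemas would differ.
        return {"text": self.complete(messages, **kwargs)}


@dataclass
class MockEmbeddings:
    """Deterministic fake embeddings: one fixed-length vector per input text."""

    dimension: int = 256

    def embed(self, texts: list[str]) -> list[list[float]]:
        # Every component is the text length, which is stable across runs.
        return [[float(len(text))] * self.dimension for text in texts]


class InMemoryVectorStore:
    """Keeps documents in a list and returns the first top_k, ignoring similarity."""

    def __init__(self) -> None:
        self._documents: list[dict] = []

    def upsert(self, documents: list[dict]) -> None:
        self._documents.extend(documents)

    def search(self, query_embedding: list[float], top_k: int = 5) -> list[dict]:
        return self._documents[:top_k]
```

Because MockLLM records every call, a test can check both the agent's answer and the exact messages it sent, without patching anything at the HTTP layer.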
Pattern 2: Protocol-Based Tool Registry
Tools are the most common hardcoded dependency in agent systems. Most implementations define tools as decorated functions that directly import their dependencies:
# This is the problem: hardcoded imports everywhere
import requests
from my_database import get_connection


@tool
def search_web(query: str) -> str:
    return requests.get(f"https://api.search.com?q={query}").text


@tool
def query_database(sql: str) -> list:
    conn = get_connection()  # Hardcoded connection
    return conn.execute(sql).fetchall()
The fix is a tool registry that accepts tool implementations at construction time:
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class ToolSpec:
    """Specification for an injectable tool."""

    name: str
    description: str
    handler: Callable[..., Any]
    parameters: dict  # JSON Schema for the tool's parameters


class ToolRegistry:
    """Registry that holds tool implementations injected at construction."""

    def __init__(self) -> None:
        self._tools: dict[str, ToolSpec] = {}

    def register(self, spec: ToolSpec) -> None:
        self._tools[spec.name] = spec

    def get(self, name: str) -> ToolSpec | None:
        return self._tools.get(name)

    def list_tools(self) -> list[dict]:
        """Return tool descriptions in the format LLMs expect."""
        return [
            {
                "type": "function",
                "function": {
                    "name": spec.name,
                    "description": spec.description,
                    "parameters": spec.parameters,
                },
            }
            for spec in self._tools.values()
        ]

    def execute(self, name: str, arguments: dict) -> Any:
        spec = self._tools.get(name)
        if spec is None:
            raise ValueError(f"Unknown tool: {name}")
        return spec.handler(**arguments)
# --- Build different registries for different environments ---

def build_production_registry(db_url: str, search_api_key: str) -> ToolRegistry:
    """Production tools with real connections."""
    # Imported here so test environments don't need these packages installed.
    import httpx
    import psycopg2

    conn = psycopg2.connect(db_url)
    # The tool is advertised as read-only, so enforce it on the session:
    # the model decides what SQL gets sent.
    conn.set_session(readonly=True)

    def search_web(query: str) -> str:
        resp = httpx.get(
            "https://api.search.com/search",
            params={"q": query},
            headers={"Authorization": search_api_key},
        )
        resp.raise_for_status()
        return resp.text

    def query_db(sql: str) -> list[dict]:
        with conn.cursor() as cur:
            cur.execute(sql)
            columns = [desc[0] for desc in cur.description]
            return [dict(zip(columns, row)) for row in cur.fetchall()]

    registry = ToolRegistry()
    registry.register(ToolSpec(
        name="search_web",
        description="Search the web for current information",
        handler=search_web,
        parameters={"type": "object", "properties": {"query": {"type": "string"}}},
    ))
    registry.register(ToolSpec(
        name="query_database",
        description="Run a read-only SQL query",
        handler=query_db,
        parameters={"type": "object", "properties": {"sql": {"type": "string"}}},
    ))
    return registry


def build_test_registry() -> ToolRegistry:
    """Test tools that return canned responses."""
    registry = ToolRegistry()
    registry.register(ToolSpec(
        name="search_web",
        description="Search the web for current information",
        handler=lambda query: '{"results": [{"title": "Test", "snippet": "Mock result"}]}',
        parameters={"type": "object", "properties": {"query": {"type": "string"}}},
    ))
    registry.register(ToolSpec(
        name="query_database",
        description="Run a read-only SQL query",
        handler=lambda sql: [{"id": 1, "name": "test_row"}],
        parameters={"type": "object", "properties": {"sql": {"type": "string"}}},
    ))
    return registry
The agent code never imports psycopg2 or httpx. It calls registry.execute("search_web", {"query": "..."}) and doesn't know — or care — whether the tool hits a real API or returns a mock string.
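To make the agent side concrete, here is a rough sketch of how an agent loop might hand a model's tool call to the registry. The shape of tool_call (an id, a name, and a JSON string of arguments) and the name dispatch_tool_call are assumptions for illustration; providers format tool calls differently, so adapt the parsing to yours.

```python
import json
from typing import Any, Protocol


class ToolExecutor(Protocol):
    """Minimal view of the registry that the agent loop depends on."""

    def execute(self, name: str, arguments: dict) -> Any: ...


def dispatch_tool_call(registry: ToolExecutor, tool_call: dict) -> dict:
    """Run one tool call and wrap the outcome as a tool message.

    Assumes tool_call looks like:
    {"id": "...", "name": "...", "arguments": "<JSON string>"}
    """
    try:
        arguments = json.loads(tool_call["arguments"])
        result = registry.execute(tool_call["name"], arguments)
        # Strings pass through; everything else is serialized for the model.
        content = result if isinstance(result, str) else json.dumps(result)
    except (ValueError, TypeError) as exc:
        # Bad JSON, unknown tool, or wrong arguments: report the failure to
        # the model instead of crashing the agent loop.
        content = f"Tool error: {exc}"
    return {"role": "tool", "tool_call_id": tool_call["id"], "content": content}
```

The function depends only on the execute method, so it works with the production registry, the test registry, or any other object with the same method.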
Pattern 3: Prompt Templates as Injectable Dependencies
Prompts are dependencies too. When a system prompt is a hardcoded string in your agent class, you can't:
- A/B test prompt variants without code changes
- Version prompts independently from code
- Run the same agent logic with domain-specific instructions
The pattern is to treat prompts as loadable, versionable resources:
import json
import re
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class PromptTemplate:
    """A versioned, parameterized prompt template."""

    name: str
    version: str
    template: str
    variables: list[str]

    def render(self, **kwargs) -> str:
        """Render the template with provided variables."""
        missing = [v for v in self.variables if v not in kwargs]
        if missing:
            raise ValueError(f"Missing template variables: {missing}")
        result = self.template
        for key, value in kwargs.items():
            # Placeholders use double braces, e.g. {{domain}}
            result = result.replace(f"{{{{{key}}}}}", str(value))
        return result


def _version_number(path: Path) -> int:
    """Extract the numeric part of a file name such as 'v12.json'."""
    match = re.search(r"\d+", path.stem)
    return int(match.group()) if match else -1


class PromptStore:
    """Load prompts from versioned files on disk."""

    def __init__(self, prompts_dir: Path) -> None:
        self._dir = prompts_dir

    def load(self, name: str, version: str | None = None) -> PromptTemplate:
        """Load a prompt by name, optionally pinning a version."""
        if version:
            path = self._dir / name / f"{version}.json"
        else:
            # Load latest version. Sort numerically: a plain string sort
            # would rank v10.json before v2.json.
            prompt_dir = self._dir / name
            versions = sorted(prompt_dir.glob("*.json"), key=_version_number)
            if not versions:
                raise FileNotFoundError(f"No prompts found for '{name}'")
            path = versions[-1]
        with open(path) as f:
            data = json.load(f)
        return PromptTemplate(
            name=data["name"],
            version=data["version"],
            template=data["template"],
            variables=data.get("variables", []),
        )
# Usage: inject different prompts into the same agent
store = PromptStore(Path("prompts"))

# Version 2: conservative, formal tone
v2_prompt = store.load("research_agent", version="v2")

# Version 3: concise, technical tone
v3_prompt = store.load("research_agent", version="v3")

# Same agent class, different behavior
agent_v2 = ResearchAgent(AgentConfig(
    llm=llm_client,
    embeddings=embedding_client,
    vector_store=vs,
    system_prompt=v2_prompt.render(domain="cybersecurity", max_sources=5),
))
agent_v3 = ResearchAgent(AgentConfig(
    llm=llm_client,
    embeddings=embedding_client,
    vector_store=vs,
    system_prompt=v3_prompt.render(domain="cybersecurity", max_sources=5),
))
A prompt file at prompts/research_agent/v3.json looks like:
{
    "name": "research_agent",
    "version": "v3",
    "template": "You are a research agent specializing in {{domain}}. Find up to {{max_sources}} sources. Be concise and technical. Cite every claim.",
    "variables": ["domain", "max_sources"]
}
The benefit compounds over time. After 20 prompt iterations, you have a version history. You can measure which version performs better in evals and roll back if a new version regresses.
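Comparing versions in an eval or a live A/B test needs a stable way to decide which version each user gets. One possible approach, sketched here under assumptions, is to hash the user ID into a bucket so the same user always sees the same prompt version. The function name assign_prompt_version and the salt parameter are hypothetical, not part of the code above.

```python
import hashlib


def assign_prompt_version(
    user_id: str, versions: list[str], salt: str = "research_agent"
) -> str:
    """Deterministically map a user to one of the candidate prompt versions."""
    if not versions:
        raise ValueError("versions must not be empty")
    # sha256 is stable across processes, unlike Python's built-in hash().
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(versions)
    return versions[bucket]
```

The result can be passed straight to the store, for example store.load("research_agent", version=assign_prompt_version(user_id, ["v2", "v3"])). Changing the salt reshuffles users for the next experiment.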
Pattern 4: Factory Functions for Environment-Specific Assembly
With all dependencies injectable, you need a clean way to assemble the full agent for each environment. Factory functions handle this:
import os
from enum import Enum
from pathlib import Path


class Environment(Enum):
    PRODUCTION = "production"
    STAGING = "staging"
    TESTING = "testing"
    LOCAL = "local"


def create_research_agent(env: Environment) -> ResearchAgent:
    """Factory that assembles the agent with environment-appropriate dependencies."""
    if env == Environment.PRODUCTION:
        llm = AnthropicClient(
            model="claude-sonnet-4-20250514",
            api_key=os.environ["ANTHROPIC_API_KEY"],
        )
        embeddings = OpenAIEmbeddings(
            model="text-embedding-3-small",
            api_key=os.environ["OPENAI_API_KEY"],
        )
        vector_store = PineconeStore(
            api_key=os.environ["PINECONE_API_KEY"],
            index="prod-knowledge",
        )
        prompt = PromptStore(Path("prompts")).load("research_agent", version="v3")
        max_iter = 15
        temp = 0.1
    elif env == Environment.STAGING:
        llm = AnthropicClient(
            model="claude-3-5-haiku-20241022",  # Cheaper model for staging
            api_key=os.environ["ANTHROPIC_API_KEY"],
        )
        embeddings = OpenAIEmbeddings(
            model="text-embedding-3-small",
            api_key=os.environ["OPENAI_API_KEY"],
        )
        vector_store = PineconeStore(
            api_key=os.environ["PINECONE_API_KEY"],
            index="staging-knowledge",
        )
        prompt = PromptStore(Path("prompts")).load("research_agent")  # latest
        max_iter = 10
        temp = 0.2
    elif env == Environment.TESTING:
        llm = MockLLM(responses=["FINAL ANSWER: deterministic test output"])
        embeddings = MockEmbeddings(dimension=256)
        vector_store = InMemoryVectorStore()
        prompt = PromptTemplate(
            name="test", version="test",
            template="You are a test agent. Respond with FINAL ANSWER: followed by your answer.",
            variables=[],
        )
        max_iter = 3
        temp = 0.0
    else:  # LOCAL
        llm = OllamaClient(model="llama3.1:8b", base_url="http://localhost:11434")
        embeddings = LocalEmbeddings(model="all-MiniLM-L6-v2")
        vector_store = ChromaStore(persist_dir="./local_chroma")
        prompt = PromptStore(Path("prompts")).load("research_agent")
        max_iter = 5
        temp = 0.3

    # Every branch yields a PromptTemplate, so render() is always available.
    return ResearchAgent(AgentConfig(
        llm=llm,
        embeddings=embeddings,
        vector_store=vector_store,
        system_prompt=prompt.render(domain="general", max_sources=5),
        max_iterations=max_iter,
        temperature=temp,
    ))


# In your application entrypoint:
env = Environment(os.getenv("APP_ENV", "local"))
agent = create_research_agent(env)
result = agent.run("What are the latest developments in quantum computing?")
The factory function is the only place that knows about concrete implementations. The agent class is pure logic. The test suite never touches a real API. The staging environment uses cheaper models. The local environment runs entirely offline with Ollama.
Pattern 5: Runtime Dependency Swapping with Context Managers
Sometimes you need to swap a dependency for a single operation without rebuilding the entire agent. This comes up in A/B testing, fallback scenarios, and debugging. Context managers make this clean:
from contextlib import contextmanager
from dataclasses import replace
from typing import Generator


class ConfigurableAgent:
    """Agent that supports scoped dependency overrides."""

    def __init__(self, config: AgentConfig) -> None:
        self._base_config = config
        self._active_config = config

    @property
    def config(self) -> AgentConfig:
        return self._active_config

    @contextmanager
    def override(self, **kwargs) -> Generator[None, None, None]:
        """Temporarily replace dependencies for a scoped operation.

        Usage:
            with agent.override(llm=fallback_llm, temperature=0.5):
                result = agent.run(query)
        """
        previous = self._active_config
        # Build on the currently active config rather than the base config,
        # so a nested override keeps the fields set by the outer one.
        self._active_config = replace(previous, **kwargs)
        try:
            yield
        finally:
            self._active_config = previous

    def run(self, query: str) -> dict:
        # Uses self.config which respects overrides
        messages = [
            {"role": "system", "content": self.config.system_prompt},
            {"role": "user", "content": query},
        ]
        response = self.config.llm.complete(
            messages, temperature=self.config.temperature,
        )
        return {"answer": response}
import os

# Usage: A/B testing at the request level
agent = ConfigurableAgent(base_config)

# Normal request: uses default model
result_a = agent.run("Explain quantum entanglement")

# Override for a single request: different model, different temperature
with agent.override(
    llm=OpenAIClient(model="gpt-4o", api_key=os.environ["OPENAI_API_KEY"]),
    temperature=0.7,
):
    result_b = agent.run("Explain quantum entanglement")

# Back to original config automatically
result_c = agent.run("Explain quantum entanglement")

# Usage: fallback on failure
try:
    result = agent.run("Complex research query")
except TimeoutError:
    # Primary model timed out, so fall back to a faster model
    with agent.override(
        llm=AnthropicClient(
            model="claude-3-5-haiku-20241022",
            api_key=os.environ["ANTHROPIC_API_KEY"],
        ),
        max_iterations=5,
    ):
        result = agent.run("Complex research query")
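The replace() function from dataclasses creates a new frozen config with only the specified fields changed. The context manager guarantees the original config is restored even if the overridden operation raises an exception. This is safe for single-threaded agent execution, which covers most agent use cases. It is not safe when one agent instance is shared across threads or asyncio tasks: the override is stored on the instance, so concurrent requests would see each other's overrides.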
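If one agent instance does need to serve concurrent requests, a possible variation is to keep the active override in a contextvars.ContextVar instead of on the instance. The sketch below is not the article's implementation: Settings is a two-field stand-in for AgentConfig, and ContextScopedAgent is a hypothetical name. It relies on the standard behavior that a value set on a ContextVar is only visible in the context where it was set.

```python
from contextlib import contextmanager
from contextvars import ContextVar
from dataclasses import dataclass, replace
from typing import Iterator


@dataclass(frozen=True)
class Settings:
    """Stand-in for AgentConfig, reduced to two fields for the sketch."""

    model: str
    temperature: float = 0.1


class ContextScopedAgent:
    """Agent whose overrides are scoped to the current execution context."""

    def __init__(self, settings: Settings) -> None:
        self._base = settings
        # Default None means "no override active in this context".
        # One ContextVar per agent; intended for long-lived agent objects.
        self._override = ContextVar("agent_override", default=None)

    @property
    def settings(self) -> Settings:
        return self._override.get() or self._base

    @contextmanager
    def override(self, **changes) -> Iterator[None]:
        # Build on whatever is active here so nested overrides compose.
        token = self._override.set(replace(self.settings, **changes))
        try:
            yield
        finally:
            # reset() restores the value that was active before set().
            self._override.reset(token)
```

Code running in a different context keeps reading the base settings while an override is active elsewhere, and reset() restores the previous value even when the body raises.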
What Changes When You Apply These Patterns
Before DI, adding a new LLM provider means editing every file that instantiates a model. After DI, it means writing one new class that satisfies the LLMClient protocol and updating the factory function.
Before DI, testing an agent requires mocking at the HTTP level — intercepting API calls with responses or httpretty. After DI, you pass a MockLLM at construction time. Your tests run in milliseconds with zero network calls.
Before DI, A/B testing prompt variants requires feature flags deep inside agent logic. After DI, you construct two agents with different PromptTemplate versions and route traffic between them.
Before DI, switching from Pinecone to Weaviate means refactoring every function that calls pinecone.query(). After DI, you write a WeaviateStore class that satisfies the VectorStore protocol and change one line in the factory.
The compound effect: teams using these patterns report 40-60% faster iteration cycles on agent improvements because changing any single component — model, prompt, tool, store — is a one-file change instead of a cross-codebase refactor.
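To illustrate the "one new class" step, here is a rough sketch of an adapter that wraps a vendor SDK so it satisfies the LLMClient protocol. AcmeSDK and its generate method are invented for the example and do not correspond to any real library. The protocol is repeated so the block stands alone, with @runtime_checkable added so the factory can sanity-check a candidate at startup.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class LLMClient(Protocol):
    def complete(self, messages: list[dict], **kwargs) -> str: ...

    def complete_structured(self, messages: list[dict], schema: dict, **kwargs) -> dict: ...


class AcmeSDK:
    """Hypothetical vendor SDK with its own method names; not a real library."""

    def generate(self, prompt: str, temperature: float = 0.0) -> dict:
        return {"output": f"echo: {prompt}"}


class AcmeClient:
    """Adapter that exposes the hypothetical SDK through the LLMClient interface."""

    def __init__(self, sdk: AcmeSDK) -> None:
        self._sdk = sdk

    def complete(self, messages: list[dict], **kwargs) -> str:
        # Flatten chat messages into the single prompt string the SDK expects.
        prompt = "\n".join(f"{m['role']}: {m['content']}" for m in messages)
        reply = self._sdk.generate(prompt, temperature=kwargs.get("temperature", 0.0))
        return reply["output"]

    def complete_structured(self, messages: list[dict], schema: dict, **kwargs) -> dict:
        # Assumption: no native structured output, so wrap the plain text.
        return {"text": self.complete(messages, **kwargs)}


def ensure_llm_client(candidate: object) -> LLMClient:
    """Fail fast if an object is missing the protocol's methods."""
    # isinstance on a runtime_checkable Protocol checks that the methods
    # exist; it does not verify their signatures or return types.
    if not isinstance(candidate, LLMClient):
        raise TypeError(f"{type(candidate).__name__} does not implement LLMClient")
    return candidate
```

The agent never sees AcmeSDK. Only the adapter and the factory line that constructs it change when the provider does.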
Common Objections
"This is over-engineering for a prototype." Maybe. But prototypes that work become production systems. The cost of adding DI at the start is 30 minutes. The cost of adding it after 6 months of hardcoded dependencies is a week.
"Python doesn't need DI — just use module-level globals." Module globals are the opposite of DI. They're invisible dependencies that make testing painful and configuration opaque. Protocol classes + config objects make dependencies explicit and swappable.
"My agent only uses one model." Today. Tomorrow you'll want to swap models for cost optimization, test with mocks, or run evals against multiple providers. DI makes that transition trivial.
Implementation Checklist
If you're refactoring an existing agent:
- Identify all dependencies — grep for import statements, API keys, connection strings, hardcoded model names. These are your injection targets.
- Define protocols — write Protocol classes for each dependency category (LLM, embeddings, vector store, tools).
- Create a config object — frozen dataclass that holds all dependencies.
- Build factory functions — one per environment (production, staging, testing, local).
- Update tests — replace HTTP mocking with injected mock implementations.
- Add the override pattern — if you need request-level dependency swapping for A/B testing or fallbacks.
Start with the LLM client — it's always the most frequently hardcoded dependency. Then tools. Then everything else.
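For the first checklist step, a short script can stand in for manual grepping. The sketch below flags lines that look like inline secrets, hardcoded model names, or connection strings. The regular expressions are illustrative assumptions and will produce both false positives and misses, so treat the output as a starting list rather than an audit.

```python
import re
from pathlib import Path

# Illustrative patterns only; tune them for your providers and naming conventions.
PATTERNS = {
    "inline secret": re.compile(
        r"""(api_key|token|password)\s*=\s*["'][^"']+["']""", re.IGNORECASE
    ),
    "model name": re.compile(r"""["'](gpt-|claude-|text-embedding-)[^"']*["']"""),
    "connection string": re.compile(
        r"""["'](postgres(ql)?|mongodb|redis)://[^"']+["']"""
    ),
}


def scan_source(text: str) -> list[tuple[int, str]]:
    """Return (line_number, label) for each suspicious line in a source string."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((line_number, label))
    return findings


def scan_tree(root: Path) -> dict[str, list[tuple[int, str]]]:
    """Scan every .py file under root and collect the files with findings."""
    report = {}
    for path in sorted(root.rglob("*.py")):
        findings = scan_source(path.read_text(encoding="utf-8", errors="ignore"))
        if findings:
            report[str(path)] = findings
    return report
```

Running scan_tree(Path("src")) on a hypothetical src directory yields a per-file list of injection candidates to work through, starting with the LLM client.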
The agents that survive production aren't the ones with the best prompts. They're the ones whose dependencies can be changed without rewriting the agent.
Follow @klement_gunndu for more AI agent engineering content. We're building in public.
