Your AI Agent Has 12 Hardcoded API Keys and You Call It "Production-Ready" — Dependency Injection Patterns for Agent Systems

Most agent code I review looks like this: the LLM client is instantiated inline, the vector store connection is hardcoded in the tool function, the prompt template is a raw string in the agent loop, and the embedding model is imported at the top of the file with a specific provider baked in.

Then someone asks: "Can we swap Claude for GPT-4o on this agent?" And the answer is a 3-day refactor.

This is the dependency injection problem applied to AI agents. It's well-solved in traditional software engineering — Spring, Guice, FastAPI's Depends() — but almost nobody applies it to agent architectures. The result is agent systems that work fine in demos but resist every production requirement: swapping models, switching providers, running tests with mock LLMs, A/B testing prompt variants, or deploying the same agent against staging vs. production databases.

Here are 5 patterns that fix this. All code is Python. All patterns are running in real multi-agent systems.

Why Agents Need DI More Than Regular Software

A typical web service has 3-5 dependencies: a database, a cache, maybe an external API. A production AI agent has at least twice that:

LLM client (model, provider, API key, temperature, max tokens)
Embedding model (provider, dimensions, batch size)
Vector store (connection, collection name, search parameters)
Tool registry (which tools are available, their configurations)
Prompt templates (system prompts, few-shot examples)
Memory backend (conversation history, knowledge graph)
Guardrails (content filters, output validators)
Observability (tracing, logging, metrics exporters)

Hardcoding any of these makes the agent brittle. Hardcoding all of them makes it unmaintainable.

Pattern 1: The Agent Configuration Object

The simplest pattern. Instead of scattering configuration across your agent code, collect all dependencies into a single typed configuration object that gets passed at construction time.

from dataclasses import dataclass, field
from typing import Protocol


class LLMClient(Protocol):
    """Protocol for any LLM provider."""
    def complete(self, messages: list[dict], **kwargs) -> str: ...
    def complete_structured(self, messages: list[dict], schema: dict, **kwargs) -> dict: ...


class EmbeddingClient(Protocol):
    """Protocol for any embedding provider."""
    def embed(self, texts: list[str]) -> list[list[float]]: ...


class VectorStore(Protocol):
    """Protocol for any vector store."""
    def search(self, query_embedding: list[float], top_k: int = 5) -> list[dict]: ...
    def upsert(self, documents: list[dict]) -> None: ...


@dataclass(frozen=True)
class AgentConfig:
    """All agent dependencies in one place."""
    llm: LLMClient
    embeddings: EmbeddingClient
    vector_store: VectorStore
    system_prompt: str
    max_iterations: int = 10
    temperature: float = 0.1
    timeout_seconds: float = 120.0
    tools: list[str] = field(default_factory=list)


class ResearchAgent:
    """Agent that receives all dependencies via config."""

    def __init__(self, config: AgentConfig) -> None:
        self.config = config
        self._iteration_count = 0

    def run(self, query: str) -> dict:
        self._iteration_count = 0
        messages = [
            {"role": "system", "content": self.config.system_prompt},
            {"role": "user", "content": query},
        ]

        while self._iteration_count < self.config.max_iterations:
            response = self.config.llm.complete(
                messages, temperature=self.config.temperature
            )
            self._iteration_count += 1

            if self._is_final_answer(response):
                return {"answer": response, "iterations": self._iteration_count}

            # Use injected embedding client and vector store
            query_embedding = self.config.embeddings.embed([query])[0]
            context = self.config.vector_store.search(query_embedding, top_k=3)
            messages.append({"role": "assistant", "content": response})
            messages.append({
                "role": "user",
                "content": f"Additional context: {context}",
            })

        return {"answer": "Max iterations reached", "iterations": self._iteration_count}

    def _is_final_answer(self, response: str) -> bool:
        return "FINAL ANSWER:" in response

The key design choice: AgentConfig is a frozen dataclass. Once it is constructed, its fields cannot be reassigned, which prevents a class of bugs where a dependency gets swapped while the agent is running. Freezing is shallow, though: it stops you from rebinding config.llm, not from mutating the client object itself.
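A minimal sketch of what freezing does and does not guarantee. `DemoConfig` is a hypothetical stand-in for AgentConfig with plain fields, so the snippet runs on its own.

```python
from dataclasses import FrozenInstanceError, dataclass, field, replace


@dataclass(frozen=True)
class DemoConfig:
    """Hypothetical stand-in for AgentConfig (not the article's class)."""
    system_prompt: str
    max_iterations: int = 10
    tools: list[str] = field(default_factory=list)


config = DemoConfig(system_prompt="You are a research agent.")

# Rebinding a field on a frozen dataclass raises FrozenInstanceError.
try:
    config.max_iterations = 50
    rebind_blocked = False
except FrozenInstanceError:
    rebind_blocked = True

# replace() builds a new instance; the original keeps its values.
longer = replace(config, max_iterations=50)

# Freezing is shallow: the list object is still mutable, and replace()
# copied the reference, so both configs see the appended tool.
config.tools.append("search_web")
```

If the tool list should be immutable as well, declaring the field as a tuple is one option.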

Now creating the agent for different environments is trivial:

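import os

# AnthropicClient, OpenAIEmbeddings, PineconeStore, etc. are assumed to be
# your own thin wrappers that satisfy the protocols above.

# Production
prod_agent = ResearchAgent(AgentConfig(
    llm=AnthropicClient(model="claude-sonnet-4-20250514", api_key=os.environ["ANTHROPIC_API_KEY"]),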
    embeddings=OpenAIEmbeddings(model="text-embedding-3-small"),
    vector_store=PineconeStore(index="prod-knowledge"),
    system_prompt=load_prompt("research_agent_v3"),
    max_iterations=15,
))

# Testing — no API calls, deterministic
test_agent = ResearchAgent(AgentConfig(
    llm=MockLLM(responses=["FINAL ANSWER: test result"]),
    embeddings=MockEmbeddings(dimension=256),
    vector_store=InMemoryVectorStore(),
    system_prompt="You are a test agent.",
    max_iterations=3,
))

# A/B test — same agent, different model
variant_b = ResearchAgent(AgentConfig(
    llm=OpenAIClient(model="gpt-4o", api_key=os.environ["OPENAI_API_KEY"]),
    embeddings=OpenAIEmbeddings(model="text-embedding-3-small"),
    vector_store=PineconeStore(index="prod-knowledge"),
    system_prompt=load_prompt("research_agent_v3"),
    max_iterations=15,
))
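The test configuration above references MockLLM, MockEmbeddings, and InMemoryVectorStore without defining them. Here is one possible sketch of all three. The behavior is an assumption on my part (scripted responses, hash-based vectors, brute-force cosine search), as is the document shape with "id" and "embedding" keys.

```python
import hashlib
import math


class MockLLM:
    """Returns scripted responses in order; repeats the last one when exhausted."""

    def __init__(self, responses: list[str]) -> None:
        if not responses:
            raise ValueError("MockLLM needs at least one response")
        self._responses = list(responses)
        self.calls: list[list[dict]] = []  # recorded so tests can assert on them

    def complete(self, messages: list[dict], **kwargs) -> str:
        self.calls.append(list(messages))
        index = min(len(self.calls) - 1, len(self._responses) - 1)
        return self._responses[index]

    def complete_structured(self, messages: list[dict], schema: dict, **kwargs) -> dict:
        return {"text": self.complete(messages, **kwargs)}


class MockEmbeddings:
    """Deterministic pseudo-embeddings derived from a hash of the text."""

    def __init__(self, dimension: int = 256) -> None:
        self._dimension = dimension

    def embed(self, texts: list[str]) -> list[list[float]]:
        return [self._embed_one(text) for text in texts]

    def _embed_one(self, text: str) -> list[float]:
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        # Cycle through the 32 digest bytes to fill the requested dimension.
        return [digest[i % len(digest)] / 255.0 for i in range(self._dimension)]


def _cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Brute-force cosine-similarity search over a list of documents."""

    def __init__(self) -> None:
        self._documents: list[dict] = []

    def upsert(self, documents: list[dict]) -> None:
        # Assumes each document is a dict with "id" and "embedding" keys.
        by_id = {doc["id"]: doc for doc in self._documents}
        by_id.update({doc["id"]: doc for doc in documents})
        self._documents = list(by_id.values())

    def search(self, query_embedding: list[float], top_k: int = 5) -> list[dict]:
        ranked = sorted(
            self._documents,
            key=lambda doc: _cosine(query_embedding, doc["embedding"]),
            reverse=True,
        )
        return ranked[:top_k]
```

None of these classes inherit from the protocols. They satisfy them structurally, which is the point of using Protocol here.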

Pattern 2: Protocol-Based Tool Registry

Tools are the most common hardcoded dependency in agent systems. Most implementations define tools as decorated functions that directly import their dependencies:

# This is the problem — hardcoded imports everywhere
import requests
from my_database import get_connection

@tool
def search_web(query: str) -> str:
    return requests.get(f"https://api.search.com?q={query}").text

@tool
def query_database(sql: str) -> list:
    conn = get_connection()  # Hardcoded connection
    return conn.execute(sql).fetchall()

The fix is a tool registry that accepts tool implementations at construction time:

from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class ToolSpec:
    """Specification for an injectable tool."""
    name: str
    description: str
    handler: Callable[..., Any]
    parameters: dict  # JSON Schema for the tool's parameters


class ToolRegistry:
    """Registry that holds tool implementations injected at construction."""

    def __init__(self) -> None:
        self._tools: dict[str, ToolSpec] = {}

    def register(self, spec: ToolSpec) -> None:
        self._tools[spec.name] = spec

    def get(self, name: str) -> ToolSpec | None:
        return self._tools.get(name)

    def list_tools(self) -> list[dict]:
        """Return tool descriptions in the format LLMs expect."""
        return [
            {
                "type": "function",
                "function": {
                    "name": spec.name,
                    "description": spec.description,
                    "parameters": spec.parameters,
                },
            }
            for spec in self._tools.values()
        ]

    def execute(self, name: str, arguments: dict) -> Any:
        spec = self._tools.get(name)
        if spec is None:
            raise ValueError(f"Unknown tool: {name}")
        return spec.handler(**arguments)


# --- Build different registries for different environments ---

def build_production_registry(db_url: str, search_api_key: str) -> ToolRegistry:
    """Production tools with real connections."""
    import httpx
    import psycopg2

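    conn = psycopg2.connect(db_url)
    # The model writes the SQL, so enforce the "read-only" promise in the
    # session itself. Autocommit keeps one failed query from leaving the
    # connection stuck in an aborted transaction.
    conn.set_session(readonly=True, autocommit=True)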

    def search_web(query: str) -> str:
        resp = httpx.get(
            "https://api.search.com/search",
            params={"q": query},
            headers={"Authorization": search_api_key},
        )
        resp.raise_for_status()
        return resp.text

    def query_db(sql: str) -> list[dict]:
        with conn.cursor() as cur:
            cur.execute(sql)
            columns = [desc[0] for desc in cur.description]
            return [dict(zip(columns, row)) for row in cur.fetchall()]

    registry = ToolRegistry()
    registry.register(ToolSpec(
        name="search_web",
        description="Search the web for current information",
        handler=search_web,
        parameters={"type": "object", "properties": {"query": {"type": "string"}}},
    ))
    registry.register(ToolSpec(
        name="query_database",
        description="Run a read-only SQL query",
        handler=query_db,
        parameters={"type": "object", "properties": {"sql": {"type": "string"}}},
    ))
    return registry


def build_test_registry() -> ToolRegistry:
    """Test tools that return canned responses."""
    registry = ToolRegistry()
    registry.register(ToolSpec(
        name="search_web",
        description="Search the web for current information",
        handler=lambda query: '{"results": [{"title": "Test", "snippet": "Mock result"}]}',
        parameters={"type": "object", "properties": {"query": {"type": "string"}}},
    ))
    registry.register(ToolSpec(
        name="query_database",
        description="Run a read-only SQL query",
        handler=lambda sql: [{"id": 1, "name": "test_row"}],
        parameters={"type": "object", "properties": {"sql": {"type": "string"}}},
    ))
    return registry

The agent code never imports psycopg2 or httpx. It calls registry.execute("search_web", {"query": "..."}) and doesn't know — or care — whether the tool hits a real API or returns a mock string.
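On the agent side, the dispatch step can be sketched roughly as follows. The tool-call shape (a dict with "name" and a JSON-string "arguments") is an assumption, and `DictRegistry` is a hypothetical stand-in for ToolRegistry so the snippet runs on its own. The agent depends only on an `execute` method, not on any concrete tool.

```python
import json
from typing import Any, Callable, Protocol


class ToolExecutor(Protocol):
    """The only registry capability the agent loop needs."""
    def execute(self, name: str, arguments: dict) -> Any: ...


class DictRegistry:
    """Minimal stand-in for ToolRegistry (hypothetical, for illustration)."""

    def __init__(self, handlers: dict[str, Callable[..., Any]]) -> None:
        self._handlers = handlers

    def execute(self, name: str, arguments: dict) -> Any:
        if name not in self._handlers:
            raise ValueError(f"Unknown tool: {name}")
        return self._handlers[name](**arguments)


def dispatch_tool_call(registry: ToolExecutor, tool_call: dict) -> str:
    """Run one tool call and return text to append to the conversation.

    Assumes tool_call looks like {"name": "...", "arguments": "<JSON object>"}.
    Errors come back as text so the model can see them and retry.
    """
    try:
        arguments = json.loads(tool_call["arguments"])
        result = registry.execute(tool_call["name"], arguments)
    except (KeyError, ValueError, TypeError) as exc:
        # JSONDecodeError is a ValueError; TypeError covers bad argument names.
        return f"Tool error: {exc}"
    return json.dumps(result, default=str)


registry = DictRegistry({"search_web": lambda query: {"results": [query.upper()]}})
ok = dispatch_tool_call(registry, {"name": "search_web", "arguments": '{"query": "di"}'})
unknown = dispatch_tool_call(registry, {"name": "delete_all", "arguments": "{}"})
```

Returning errors as strings is a design choice, not a requirement. Raising instead may suit agents that should abort on the first bad tool call.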

Pattern 3: Prompt Templates as Injectable Dependencies

Prompts are dependencies too. When a system prompt is a hardcoded string in your agent class, you can't:

A/B test prompt variants without code changes
Version prompts independently from code
Run the same agent logic with domain-specific instructions

The pattern is to treat prompts as loadable, versionable resources:

import json
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class PromptTemplate:
    """A versioned, parameterized prompt template."""
    name: str
    version: str
    template: str
    variables: list[str]

    def render(self, **kwargs) -> str:
        """Render the template with provided variables."""
        missing = [v for v in self.variables if v not in kwargs]
        if missing:
            raise ValueError(f"Missing template variables: {missing}")
        result = self.template
        for key, value in kwargs.items():
            result = result.replace(f"{{{{{key}}}}}", str(value))
        return result


class PromptStore:
    """Load prompts from versioned files on disk."""

    def __init__(self, prompts_dir: Path) -> None:
        self._dir = prompts_dir

    def load(self, name: str, version: str | None = None) -> PromptTemplate:
        """Load a prompt by name, optionally pinning a version."""
        if version:
            path = self._dir / name / f"{version}.json"
        else:
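            # Load latest version. Assumes files are named v<N>.json and sorts
            # numerically: a plain string sort would put v10.json before v2.json.
            prompt_dir = self._dir / name
            versions = sorted(
                prompt_dir.glob("*.json"),
                key=lambda p: int(p.stem.lstrip("v")),
            )
            if not versions:
                raise FileNotFoundError(f"No prompts found for '{name}'")
            path = versions[-1]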

        with open(path) as f:
            data = json.load(f)

        return PromptTemplate(
            name=data["name"],
            version=data["version"],
            template=data["template"],
            variables=data.get("variables", []),
        )


# Usage: inject different prompts into the same agent

store = PromptStore(Path("prompts"))

# Version 2 — conservative, formal tone
v2_prompt = store.load("research_agent", version="v2")

# Version 3 — concise, technical tone
v3_prompt = store.load("research_agent", version="v3")

# Same agent class, different behavior
agent_v2 = ResearchAgent(AgentConfig(
    llm=llm_client,
    embeddings=embedding_client,
    vector_store=vs,
    system_prompt=v2_prompt.render(domain="cybersecurity", max_sources=5),
))

agent_v3 = ResearchAgent(AgentConfig(
    llm=llm_client,
    embeddings=embedding_client,
    vector_store=vs,
    system_prompt=v3_prompt.render(domain="cybersecurity", max_sources=5),
))

A prompt file at prompts/research_agent/v3.json looks like:

{
  "name": "research_agent",
  "version": "v3",
  "template": "You are a research agent specializing in {{domain}}. Find up to {{max_sources}} sources. Be concise and technical. Cite every claim.",
  "variables": ["domain", "max_sources"]
}

The benefit compounds over time. After 20 prompt iterations, you have a version history. You can measure which version performs better in evals and roll back if a new version regresses.

Pattern 4: Factory Functions for Environment-Specific Assembly

With all dependencies injectable, you need a clean way to assemble the full agent for each environment. Factory functions handle this:

import os
from enum import Enum


class Environment(Enum):
    PRODUCTION = "production"
    STAGING = "staging"
    TESTING = "testing"
    LOCAL = "local"


def create_research_agent(env: Environment) -> ResearchAgent:
    """Factory that assembles the agent with environment-appropriate dependencies."""

    if env == Environment.PRODUCTION:
        llm = AnthropicClient(
            model="claude-sonnet-4-20250514",
            api_key=os.environ["ANTHROPIC_API_KEY"],
        )
        embeddings = OpenAIEmbeddings(
            model="text-embedding-3-small",
            api_key=os.environ["OPENAI_API_KEY"],
        )
        vector_store = PineconeStore(
            api_key=os.environ["PINECONE_API_KEY"],
            index="prod-knowledge",
        )
        prompt = PromptStore(Path("prompts")).load("research_agent", version="v3")
        max_iter = 15
        temp = 0.1

    elif env == Environment.STAGING:
        llm = AnthropicClient(
            model="claude-3-5-haiku-20241022",  # Cheaper model for staging
            api_key=os.environ["ANTHROPIC_API_KEY"],
        )
        embeddings = OpenAIEmbeddings(
            model="text-embedding-3-small",
            api_key=os.environ["OPENAI_API_KEY"],
        )
        vector_store = PineconeStore(
            api_key=os.environ["PINECONE_API_KEY"],
            index="staging-knowledge",
        )
        prompt = PromptStore(Path("prompts")).load("research_agent")  # latest
        max_iter = 10
        temp = 0.2

    elif env == Environment.TESTING:
        llm = MockLLM(responses=["FINAL ANSWER: deterministic test output"])
        embeddings = MockEmbeddings(dimension=256)
        vector_store = InMemoryVectorStore()
        prompt = PromptTemplate(
            name="test", version="test",
            template="You are a test agent. Respond with FINAL ANSWER: followed by your answer.",
            variables=[],
        )
        max_iter = 3
        temp = 0.0

    else:  # LOCAL
        llm = OllamaClient(model="llama3.1:8b", base_url="http://localhost:11434")
        embeddings = LocalEmbeddings(model="all-MiniLM-L6-v2")
        vector_store = ChromaStore(persist_dir="./local_chroma")
        prompt = PromptStore(Path("prompts")).load("research_agent")
        max_iter = 5
        temp = 0.3

    return ResearchAgent(AgentConfig(
        llm=llm,
        embeddings=embeddings,
        vector_store=vector_store,
        # Every branch above produces a PromptTemplate, and render() ignores
        # variables the template does not use, so no fallback is needed.
        system_prompt=prompt.render(domain="general", max_sources=5),
        max_iterations=max_iter,
        temperature=temp,
    ))


# In your application entrypoint:
env = Environment(os.getenv("APP_ENV", "local"))
agent = create_research_agent(env)
result = agent.run("What are the latest developments in quantum computing?")

The factory function is the only place that knows about concrete implementations. The agent class is pure logic. The test suite never touches a real API. The staging environment uses cheaper models. The local environment runs entirely offline with Ollama.
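One practical wrinkle: `os.environ["..."]` raises KeyError on the first missing variable, so a misconfigured deployment reveals its gaps one restart at a time. A possible helper to call at the top of the factory is sketched below. The `REQUIRED_ENV` table and `require_env` name are hypothetical. The variable names are the ones the factory above reads.

```python
import os
from collections.abc import Mapping

# Which variables each environment needs (keys match the Environment values).
REQUIRED_ENV: dict[str, tuple[str, ...]] = {
    "production": ("ANTHROPIC_API_KEY", "OPENAI_API_KEY", "PINECONE_API_KEY"),
    "staging": ("ANTHROPIC_API_KEY", "OPENAI_API_KEY", "PINECONE_API_KEY"),
    "testing": (),
    "local": (),
}


def require_env(env_name: str, environ: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Return the variables env_name needs, or raise listing every missing one."""
    names = REQUIRED_ENV[env_name]
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise RuntimeError(
            f"Missing environment variables for '{env_name}': {', '.join(missing)}"
        )
    return {name: environ[name] for name in names}
```

The `environ` parameter is itself injected, so the helper can be tested with a plain dict instead of patching the process environment.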

Pattern 5: Runtime Dependency Swapping with Context Managers

Sometimes you need to swap a dependency for a single operation without rebuilding the entire agent. This comes up in A/B testing, fallback scenarios, and debugging. Context managers make this clean:

from contextlib import contextmanager
from dataclasses import replace
from typing import Generator


class ConfigurableAgent:
    """Agent that supports scoped dependency overrides."""

    def __init__(self, config: AgentConfig) -> None:
        self._base_config = config
        self._active_config = config

    @property
    def config(self) -> AgentConfig:
        return self._active_config

    @contextmanager
    def override(self, **kwargs) -> Generator[None, None, None]:
        """Temporarily replace dependencies for a scoped operation.

        Usage:
            with agent.override(llm=fallback_llm, temperature=0.5):
                result = agent.run(query)
        """
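        previous = self._active_config
        # Build on the currently active config so nested overrides compose
        # instead of silently dropping the outer override's fields.
        self._active_config = replace(previous, **kwargs)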
        try:
            yield
        finally:
            self._active_config = previous

    def run(self, query: str) -> dict:
        # Uses self.config which respects overrides
        messages = [
            {"role": "system", "content": self.config.system_prompt},
            {"role": "user", "content": query},
        ]
        response = self.config.llm.complete(
            messages, temperature=self.config.temperature,
        )
        return {"answer": response}


# Usage: A/B testing at the request level
agent = ConfigurableAgent(base_config)

# Normal request — uses default model
result_a = agent.run("Explain quantum entanglement")

# Override for a single request — different model, different temperature
with agent.override(
    llm=OpenAIClient(model="gpt-4o", api_key=os.environ["OPENAI_API_KEY"]),
    temperature=0.7,
):
    result_b = agent.run("Explain quantum entanglement")

# Back to original config automatically
result_c = agent.run("Explain quantum entanglement")


# Usage: fallback on failure
try:
    result = agent.run("Complex research query")
except TimeoutError:
    # Primary model timed out — fall back to faster model
    with agent.override(
        llm=AnthropicClient(model="claude-3-5-haiku-20241022", api_key=os.environ["ANTHROPIC_API_KEY"]),
        max_iterations=5,
    ):
        result = agent.run("Complex research query")

The replace() function from dataclasses creates a new frozen config with only the specified fields changed. The context manager guarantees the previous config is restored even if the overridden operation raises an exception. This is safe as long as one agent instance handles one request at a time. It is not safe when an instance is shared across threads or asyncio tasks, because the override is stored on the instance and every concurrent request sees it.
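When one agent instance does serve concurrent requests, the override needs to be scoped per request rather than stored on the instance. One possible approach is the standard library's contextvars module, sketched here with a hypothetical `DemoConfig` and `ContextScopedAgent`. The variable lives at module level, keyed by agent identity, because the Python docs advise against creating ContextVar objects inside closures or per instance.

```python
import asyncio
from collections.abc import Iterator
from contextlib import contextmanager
from contextvars import ContextVar
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DemoConfig:
    """Hypothetical stand-in for AgentConfig."""
    model: str
    temperature: float = 0.1


# Maps id(agent) -> active override. The dict is never mutated in place;
# each override sets a fresh dict, so asyncio tasks cannot see each other's.
_OVERRIDES: ContextVar[dict[int, DemoConfig]] = ContextVar("agent_overrides", default={})


class ContextScopedAgent:
    """Keeps overrides in a ContextVar instead of on the instance."""

    def __init__(self, config: DemoConfig) -> None:
        self._base_config = config

    @property
    def config(self) -> DemoConfig:
        return _OVERRIDES.get().get(id(self), self._base_config)

    @contextmanager
    def override(self, **kwargs) -> Iterator[None]:
        updated = {**_OVERRIDES.get(), id(self): replace(self.config, **kwargs)}
        token = _OVERRIDES.set(updated)
        try:
            yield
        finally:
            _OVERRIDES.reset(token)


async def demo() -> dict[str, str]:
    agent = ContextScopedAgent(DemoConfig(model="primary"))
    entered, checked = asyncio.Event(), asyncio.Event()
    seen: dict[str, str] = {}

    async def overridden_request() -> None:
        with agent.override(model="fallback"):
            entered.set()
            await checked.wait()  # hold the override open
            seen["overridden"] = agent.config.model

    async def plain_request() -> None:
        await entered.wait()  # the other task's override is active now
        seen["plain"] = agent.config.model
        checked.set()

    # gather() wraps each coroutine in a task with its own copy of the context.
    await asyncio.gather(overridden_request(), plain_request())
    return seen


seen = asyncio.run(demo())
```

The simpler alternative is to build a fresh agent per request. Configs are cheap to construct once the clients themselves are shared.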

What Changes When You Apply These Patterns

Before DI, adding a new LLM provider means editing every file that instantiates a model. After DI, it means writing one new class that satisfies the LLMClient protocol and updating the factory function.

Before DI, testing an agent requires mocking at the HTTP level — intercepting API calls with responses or httpretty. After DI, you pass a MockLLM at construction time. Your tests run in milliseconds with zero network calls.

Before DI, A/B testing prompt variants requires feature flags deep inside agent logic. After DI, you construct two agents with different PromptTemplate versions and route traffic between them.
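The routing half of that can be as small as a stable hash. This is a sketch under assumptions: `assign_variant` and the experiment name are hypothetical, and the version strings stand in for the two constructed agents.

```python
import hashlib


def assign_variant(user_id: str, variants: list[str], experiment: str = "prompt_ab") -> str:
    """Map a user to one variant, stably across processes and restarts."""
    if not variants:
        raise ValueError("variants must not be empty")
    # hashlib rather than hash(): Python salts str hashes per process,
    # which would reshuffle users on every restart.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]


# In a real service the values would be the two agents built earlier;
# strings are used here so the snippet runs on its own.
agents = {"v2": "agent built with prompt v2", "v3": "agent built with prompt v3"}
chosen = agents[assign_variant("user-1234", sorted(agents))]
```

Including the experiment name in the hash keeps assignments independent between experiments, so the same users do not always land in the first bucket.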

Before DI, switching from Pinecone to Weaviate means refactoring every function that calls pinecone.query(). After DI, you write a WeaviateStore class that satisfies the VectorStore protocol and change one line in the factory.

The compound effect: teams using these patterns report 40-60% faster iteration cycles on agent improvements because changing any single component — model, prompt, tool, store — is a one-file change instead of a cross-codebase refactor.

Common Objections

"This is over-engineering for a prototype." Maybe. But prototypes that work become production systems. The cost of adding DI at the start is 30 minutes. The cost of adding it after 6 months of hardcoded dependencies is a week.

"Python doesn't need DI — just use module-level globals." Module globals are the opposite of DI. They're invisible dependencies that make testing painful and configuration opaque. Protocol classes + config objects make dependencies explicit and swappable.

"My agent only uses one model." Today. Tomorrow you'll want to swap models for cost optimization, test with mocks, or run evals against multiple providers. DI makes that transition trivial.

Implementation Checklist

If you're refactoring an existing agent:

Identify all dependencies — grep for import statements, API keys, connection strings, hardcoded model names. These are your injection targets.
Define protocols — write Protocol classes for each dependency category (LLM, embeddings, vector store, tools).
Create a config object — frozen dataclass that holds all dependencies.
Build factory functions — one per environment (production, staging, testing, local).
Update tests — replace HTTP mocking with injected mock implementations.
Add the override pattern — if you need request-level dependency swapping for A/B testing or fallbacks.
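For the "define protocols" step, a protocol can optionally be marked runtime-checkable so that the factory fails at assembly time rather than on the first call inside the agent loop. A minimal sketch follows. `EchoLLM`, `NotAnLLM`, and `check_llm` are hypothetical names, and the protocol is trimmed to one method to keep it short.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class LLMClient(Protocol):
    """Trimmed version of the article's protocol, for illustration."""
    def complete(self, messages: list[dict], **kwargs) -> str: ...


class EchoLLM:
    """Satisfies the protocol structurally; no inheritance needed."""
    def complete(self, messages: list[dict], **kwargs) -> str:
        return messages[-1]["content"]


class NotAnLLM:
    """Has a different method name, so it does not satisfy the protocol."""
    def generate(self, prompt: str) -> str:
        return prompt


def check_llm(candidate: object) -> LLMClient:
    """Raise early if the object lacks the methods the agent will call."""
    if not isinstance(candidate, LLMClient):
        raise TypeError(f"{type(candidate).__name__} does not implement LLMClient")
    return candidate


client = check_llm(EchoLLM())
```

The isinstance check only verifies that the methods exist. It does not compare signatures or return types, so a static type checker is still worth running.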

Start with the LLM client, which is usually the most frequently hardcoded dependency. Then tools. Then everything else.

The agents that survive production aren't the ones with the best prompts. They're the ones whose dependencies can be changed without rewriting the agent.

Follow @klement_gunndu for more AI agent engineering content. We're building in public.
