2025.02.01 Security

Agentic Security

As AI systems evolve from passive tools into autonomous agents capable of executing multi-step tasks, the security landscape shifts dramatically. Agentic AI doesn't just respond to prompts — it plans, acts, and iterates. This introduces attack surfaces that traditional application security frameworks weren't designed to handle.

The core challenge is straightforward: when you grant an AI system the ability to take actions in the real world — browsing the web, writing files, calling APIs — you're extending your trust boundary to include a system that can be manipulated through its inputs.

The Threat Model

Agentic systems face a unique class of vulnerabilities centered around prompt injection and tool misuse. An attacker doesn't need to compromise the model weights or the hosting infrastructure. They just need to place malicious instructions somewhere the agent will encounter them — in a webpage, an email, a document, or a database record.

Consider an agent tasked with processing emails and updating a CRM. A carefully crafted email could instruct the agent to exfiltrate data, modify records, or forward sensitive information to an external address. The agent follows instructions by design — the trick is making it distinguish between legitimate instructions from the user and injected instructions from adversarial content.

Defense Patterns

Privilege separation — agents should operate with the minimum permissions required for each specific task, not blanket access
Input provenance tracking — maintain clear boundaries between trusted (user) and untrusted (web, email, documents) instruction sources
Action confirmation gates — high-impact actions (send email, delete data, make purchases) should require explicit user approval
Output sanitization — treat all agent-generated content destined for execution as potentially tainted
Behavioral monitoring — detect anomalous action patterns that deviate from the stated task

Implementation requires layered defense. No single technique is sufficient because the attack surface is the agent's entire input context.

A Practical Example

Here's a simplified permission boundary check for an agent action pipeline:

# Pseudocode: action permission gate
def execute_action(action, context):
    if action.risk_level > THRESHOLD:
        approval = request_user_confirmation(action)
        if not approval:
            return ActionResult.DENIED

    if action.source != InstructionSource.USER:
        log_warning(f"Non-user instruction: {action}")
        return ActionResult.BLOCKED

    return action.execute(context)

The key insight is that risk classification must happen before execution, not after. By the time you detect a malicious action in logs, the damage is done.

Agentic security is still an emerging field. The frameworks and best practices are being written in real time as organizations deploy these systems. What's clear is that treating agents as trusted components is a mistake — they need the same scrutiny we apply to any untrusted input handler.