LLM Attack Surface

Why It Matters and What Guard Does

The AI Explosion and Its Security Blind Spot

Every enterprise is racing to deploy AI. Gartner projects that over 80% of enterprises will have deployed generative AI applications by 2026, up from just 5% in 2023. Worldwide spending on generative AI is forecast to reach $644 billion in 2025 alone — a 76% jump from the prior year.

But security hasn't kept up. Organizations are deploying LLMs faster than they can secure them, and in many cases, faster than they even know about. The result is a rapidly expanding attack surface that traditional security tools were never designed to detect, let alone protect.

This isn't a theoretical concern. It's happening now, at scale, with real consequences.


Why LLMs Are a New Class of Attack Surface

The architecture is fundamentally different

Traditional web applications have a clear separation between code and data — your application logic is in source files, your data is in a database, and your inputs flow through well-defined parameters. Security tools built over the last two decades exploit this separation: WAFs inspect HTTP parameters, SAST tools analyze source code, and DAST scanners fuzz input fields.

LLMs break this model entirely. In an LLM application, every input is both code and data simultaneously. A user prompt can contain instructions that the model interprets and follows. There is no clear boundary between "what the user said" and "what the system should do." This is why prompt injection — instructing the model to ignore its system prompt and follow attacker instructions instead — is fundamentally different from SQL injection or XSS. It exploits the architecture, not a bug.

WAFs can't parse semantic attacks. SAST tools don't understand prompt manipulation. DAST scanners miss indirect injection entirely. The entire traditional security toolchain has a blind spot exactly where organizations are deploying their newest and most sensitive applications.

Shadow AI is already inside your network

98% of organizations report unsanctioned AI use. Three out of four CISOs have discovered unauthorized generative AI tools already running in their environments. When security teams look, they find employees running local Ollama instances, teams deploying open-source chat interfaces, and business units standing up RAG pipelines — all outside any governance framework.

This isn't negligence. It's the natural result of AI tools being extraordinarily easy to deploy. A developer can have a fully functional LLM service running on a corporate laptop in under five minutes. An engineering team can deploy an internal chatbot to a Kubernetes cluster in an afternoon. None of these deployments require security team involvement, and few organizations have policies that require it.

The result: 86% of organizations have no inventory or visibility into where their AI services are connected or what data is exposed.

The vulnerability landscape is severe and growing

The OWASP Top 10 for LLM Applications identifies ten critical vulnerability categories that every LLM deployment faces:

Category

Risk

Prompt Injection

Attackers manipulate inputs to override system instructions, extract data, or hijack model behavior

Sensitive Information Disclosure

Models inadvertently expose confidential data from training sets, system prompts, or connected data sources

Supply Chain Vulnerabilities

Compromised models, plugins, or third-party components introduce backdoors or malicious behavior

Data and Model Poisoning

Corrupted training data leads to compromised or biased model behavior

Improper Output Handling

Inadequate validation of LLM-generated content enables downstream injection attacks

Excessive Agency

LLMs granted unchecked autonomy to execute actions, call APIs, or modify systems

System Prompt Leakage

Exposure of internal system prompts containing sensitive business logic or credentials

Vector and Embedding Weaknesses

Vulnerabilities in RAG systems and vector databases that enable data poisoning or extraction

Misinformation

False or misleading outputs with real-world business or legal consequences

Unbounded Consumption

Uncontrolled resource usage causing financial runaway or denial of service

High and critical-severity AI CVEs have grown from 20 in 2020 to over 640 in 2025. And unlike traditional vulnerabilities, 79% of critical AI findings go unresolved — organizations don't yet have the remediation playbooks.

Real-world failures are mounting

This isn't theoretical. Production LLM deployments have been compromised repeatedly:

  • A dealership chatbot was jailbroken into agreeing to sell a car for $1 and writing arbitrary code

  • A logistics company's chatbot was manipulated into producing profanity, forcing a public takedown

  • An airline chatbot provided false policy information that a court ruled the airline was liable for

  • 77% of enterprise employees who use AI have pasted company data into chatbot queries; 22% of those instances included confidential personal or financial data

  • AI coding assistants have been exploited via prompt injection to achieve remote code execution on developer machines

Regulation is coming fast

The EU AI Act is already in enforcement, with general-purpose AI obligations taking effect in August 2025 and full high-risk requirements by August 2026. Penalties reach up to EUR 35 million or 7% of worldwide annual turnover. NIST's AI Risk Management Framework is increasingly used as procurement criteria. Organizations deploying LLMs without understanding their attack surface face not just security risk, but regulatory exposure.


What Guard Does About It

Guard addresses the LLM attack surface with a two-stage pipeline: discover what's running, then test it for vulnerabilities.

Stage 1: LLM Service Discovery

The first challenge is visibility. You can't secure LLM services you don't know exist.

Guard continuously scans your external attack surface for LLM services running on HTTP/HTTPS endpoints. When it finds one, it identifies exactly what's running — not just "there's a web service here" but "this is an Ollama instance serving Llama 3 and Mistral models."

32 LLM platforms detected, organized by category:

Self-Hosted LLM Servers (15) Ollama, vLLM, LocalAI, llama.cpp, Hugging Face TGI, LM Studio, Aphrodite Engine, FastChat, GPT4All, Gradio, Jan, KoboldCpp, NVIDIA NIM, TabbyAPI, Text Generation WebUI

Gateway and Proxy Services (3) LiteLLM, Kong AI Gateway, Envoy AI Gateway

RAG and Orchestration Platforms (13) AnythingLLM, AstrBot, BetterChatGPT, Dify, Flowise, Hugging Face Chat UI, LibreChat, LobeHub, NextChat, Onyx, Open WebUI, SillyTavern

Cloud-Managed Services (1) Salesforce Einstein

Plus a generic OpenAI-compatible API fallback that catches any service implementing the OpenAI API standard.

How detection works:

Guard doesn't rely on simple banner matching. Each platform has a purpose-built detection probe that sends protocol-specific HTTP requests to service-characteristic endpoints and analyzes the responses against multi-factor signature rules — status codes, response body content, HTTP headers, and content types. A confidence scoring system (1-100) ensures high-specificity matches rank above generic detections.

When a service is identified, Guard also extracts the available models — querying the service's model listing endpoint to discover exactly which models are deployed (e.g., "llama3.1:70b", "gpt-4-turbo", "mistral-large").

Every discovered LLM service is registered as an asset in Guard with its service type, available models, and the API interaction parameters needed for the next stage.

Stage 2: LLM Vulnerability Scanning

Once Guard knows what LLM services are running, it tests them for real security weaknesses.

Guard's LLM vulnerability scanner executes 210+ adversarial attack probes across 47 categories against discovered services. These aren't theoretical checks — they're the actual attack techniques that red teams and threat actors use to compromise LLM deployments.

Vulnerability categories tested:

Category

What's Tested

Jailbreak Attacks

DAN, AIM, Grandma, ArtPrompts, and other techniques that bypass safety alignment to make the model produce prohibited content

Prompt Injection

Base64/ROT13/Morse encoding exploits, tag smuggling, prefix/suffix injection, payload splitting — techniques that slip malicious instructions past input filters

System Prompt Extraction

Attempts to make the model reveal its system prompt, which often contains sensitive business logic, API keys, or access credentials

Data Exfiltration

Link-based exfiltration, markdown injection, and techniques that trick the model into sending sensitive data to attacker-controlled endpoints

Context Manipulation

RAG poisoning, context overflow, and continuation attacks that exploit how models handle retrieved documents and conversation history

Multi-Turn Attacks

Crescendo (gradual escalation across conversation turns) and adaptive techniques that evade single-turn safety filters

Format Exploits

Markdown injection, YAML/JSON parsing attacks, ANSI escape sequences, and web injection (XSS via LLM output)

Safety Benchmark Evasion

Tests against established safety benchmarks to identify gaps in alignment and content filtering

How scanning works:

The scanner authenticates with each discovered LLM using the API parameters collected during discovery. It submits adversarial prompts and analyzes the model's responses using multiple detection strategies — pattern matching, classifier-based scoring, and LLM-as-judge evaluation. Each finding includes the specific prompt that triggered the vulnerability, the model's response, and a confidence score.

Findings are categorized by type (jailbreak, prompt injection, data exfiltration, etc.) and surfaced as risks in Guard with full proof data — so your security team can see exactly what worked, why it matters, and what to fix.


The Discovery-to-Testing Pipeline

External Attack Surface Scanning
  │
  ├─→ Port Scanning ─→ HTTP/HTTPS services discovered
  │     │
  │     └─→ LLM Service Discovery ─→ Service identified (e.g., Ollama on port 11434)
  │           ├─→ Platform detection (32 services, confidence scored)
  │           ├─→ Model enumeration (which models are deployed)
  │           └─→ API interaction parameters extracted
  │                 │
  │                 └─→ LLM Vulnerability Scanning
  │                       ├─→ Jailbreak testing ─→ Safety bypass found
  │                       ├─→ Prompt injection ─→ Instruction override found
  │                       ├─→ Data exfiltration ─→ Leakage path found
  │                       ├─→ System prompt extraction ─→ Credentials exposed
  │                       └─→ Format exploits ─→ XSS via LLM output found
  │
  └─→ Risk created per vulnerability category
        └─→ Proof includes: attack prompt, model response, confidence score

This pipeline runs automatically as part of Guard's external attack surface scanning. When a new LLM service appears on your perimeter, Guard discovers it, identifies it, and tests it — without manual intervention.


What Users See in the Platform

LLM Service Inventory

Every discovered LLM service appears as an asset with:

  • Service type (Ollama, vLLM, Dify, OpenAI-compatible, etc.)

  • Available models and their names

  • Confidence score for service identification

  • Associated endpoint and port information

LLM Vulnerability Findings

Each vulnerability finding includes:

  • Category — Jailbreak, prompt injection, data exfiltration, etc.

  • Severity — Based on the type and impact of the vulnerability

  • Proof — The exact prompt that triggered the vulnerability and the model's response

  • Confidence — How certain the detection is (0-1 score)

  • Remediation context — What the finding means and how to address it

Findings follow the same lifecycle as all Guard risks: Triage → Open → Remediated/Accepted.


Why This Matters Now

The window between "AI is being deployed everywhere" and "AI security is mature" is exactly when the most damage occurs. Organizations are in that window right now.

Every week, new LLM services appear on enterprise networks — some sanctioned, many not. Each one is a potential entry point for data exfiltration, a source of sensitive information disclosure, or a vector for downstream injection attacks. And unlike traditional web applications, these services are often deployed by teams with no security training, using frameworks with no security defaults, connected to data sources with no access controls.

Guard gives security teams the visibility and testing they need to close this gap:

  • Discovery — Find every LLM service running on your attack surface, including the ones nobody told you about

  • Identification — Know exactly what's running: the platform, the models, the API surface

  • Testing — Validate that each service can resist the actual attack techniques being used in the wild

  • Continuous monitoring — Detect new LLM deployments as they appear, not months later during an audit

The organizations that treat LLM security as an afterthought will learn the same lesson that every generation of technology adoption teaches: the attack surface you ignore is the one that gets exploited.