Julius: LLM Service Fingerprinting

Introducing Julius: Open Source LLM Service Fingerprinting

The Growing Shadow AI Problem

Over 14,000 Ollama server instances are publicly accessible on the internet right now. A recent Cisco analysis found that 20% of these actively host models susceptible to unauthorized access. Separately, BankInfoSecurity reported discovering more than 10,000 Ollama servers with no authentication layer—the result of hurried AI deployments by developers under pressure.

This is the new shadow IT: developers spinning up local LLM servers for productivity, unaware they’ve exposed sensitive infrastructure to the internet. And Ollama is just one of dozens of AI serving platforms proliferating across enterprise networks.

The security question is no longer “are we running AI?” but “where is AI running that we don’t know about?”

What is LLM Service Fingerprinting?

LLM service fingerprinting identifies what server software is running on a network endpoint—not which AI model generated text, but which infrastructure is serving it.

Julius answers one question: is this HTTP service serving an LLM? During a penetration test or attack surface assessment, you’ve found an open port. Is it Ollama? vLLM? A Hugging Face deployment? An enterprise AI gateway? Julius tells you in seconds.

Julius follows the Unix philosophy: do one thing and do it well. It doesn’t port scan. It doesn’t vulnerability scan. It identifies LLM services—nothing more, nothing less.

This design enables Julius to slot into existing security toolchains rather than replace them.
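Julius’s actual probe definitions live in the repository; the sketch below only illustrates the general shape of an active HTTP fingerprint probe, using Ollama’s plain-text root banner ("Ollama is running") as the example. The matching step is split into a pure function so it can be exercised without a live server.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
	"time"
)

// matchOllamaBanner is the pure matching step: does the response body
// contain the banner Ollama serves at its root path?
func matchOllamaBanner(body string) bool {
	return strings.Contains(body, "Ollama is running")
}

// probeOllama sends a single GET to the target and checks for the
// banner. One request, one cheap match -- the general shape of an
// active fingerprint probe.
func probeOllama(baseURL string) (bool, error) {
	client := &http.Client{Timeout: 3 * time.Second}
	resp, err := client.Get(baseURL + "/")
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	// Read at most 4 KiB; a fingerprint never needs the whole body.
	body, err := io.ReadAll(io.LimitReader(resp.Body, 4096))
	if err != nil {
		return false, err
	}
	return matchOllamaBanner(string(body)), nil
}

func main() {
	ok, err := probeOllama("http://127.0.0.1:11434")
	if err != nil {
		fmt.Println("probe failed:", err)
		return
	}
	fmt.Println("ollama detected:", ok)
}
```

Because the probe sends only one small request and matches on a static string, it stays cheap enough to run against thousands of hosts.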

Introducing Julius

Julius is an open-source LLM service fingerprinting tool that detects 60+ AI platforms through active HTTP probing. Built in Go, it compiles to a single binary with no external dependencies.

Probes Included in Initial Release:

Self-Hosted LLM Servers

| Service | Default Port | Description |
|---|---|---|
| Ollama | 11434 | Popular local LLM server with easy model management |
| vLLM | 8000 | High-throughput LLM inference engine |
| SGLang | 30000 | High-performance LLM serving engine |
| LocalAI | 8080 | OpenAI-compatible local AI server |
| llama.cpp | 8080 | CPU-optimized LLM inference |
| Hugging Face TGI | 3000 | Text Generation Inference server |
| NVIDIA NIM | 8000 | NVIDIA's enterprise inference microservices |
| NVIDIA TensorRT-LLM | 8000 | NVIDIA TensorRT-LLM inference server |
| NVIDIA Triton | 8000 | NVIDIA Triton Inference Server (KServe v2) |
| BentoML | 3000 | AI application framework for serving models |
| Ray Serve | 8265 | Scalable model serving on Ray clusters |
| Aphrodite Engine | 2242 | Large-scale LLM inference engine |
| Baseten Truss | 8080 | Open-source ML model serving framework |
| DeepSpeed-MII | 28080 | High-throughput inference powered by DeepSpeed |
| FastChat | 21001 | Open platform for LLM chatbots |
| GPT4All | 4891 | Run local models on any device |
| Gradio | 7860 | ML model demo interfaces |
| Jan | 1337 | Local OpenAI-compatible API server |
| KoboldCpp | 5001 | AI text generation for GGML/GGUF models |
| LM Studio | 1234 | Desktop LLM application with API server |
| MLC LLM | 8000 | Universal deployment engine with ML compilation |
| Petals | 5000 | Decentralized BitTorrent-style LLM inference |
| PowerInfer | 8080 | CPU/GPU hybrid inference engine |
| TabbyAPI | 5000 | FastAPI-based server for ExLlama |
| Text Generation WebUI | 5000 | Local LLM interface with API |

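Note how several of these servers share a default port (8000 and 8080 each appear several times), so the port alone cannot identify the service; each probe needs a service-specific request. One natural way to structure this is a declarative probe table. The sketch below is illustrative rather than Julius's actual design: the Ollama entry uses its real /api/tags endpoint, while the llama.cpp entry and the match strings are simplified assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// probe describes one fingerprint check: which path to request and
// how to recognize the service in the response.
type probe struct {
	service string
	path    string
	match   func(status int, body string) bool
}

// probes is a tiny illustrative table (not Julius's real definitions).
var probes = []probe{
	{"ollama", "/api/tags", func(s int, b string) bool {
		return s == 200 && strings.Contains(b, `"models"`)
	}},
	{"llamacpp", "/health", func(s int, b string) bool {
		return s == 200 && strings.Contains(b, `"status"`)
	}},
}

// fetcher abstracts the HTTP GET so the matching logic can be tested
// without a live server.
type fetcher func(path string) (int, string, error)

// identify runs each probe against the target and returns the first
// service whose matcher fires, or "" if nothing matched.
func identify(get fetcher) string {
	for _, p := range probes {
		status, body, err := get(p.path)
		if err != nil {
			continue
		}
		if p.match(status, body) {
			return p.service
		}
	}
	return ""
}

func main() {
	// Simulated responses standing in for a host on port 11434.
	fake := func(path string) (int, string, error) {
		if path == "/api/tags" {
			return 200, `{"models":[{"name":"llama3:latest"}]}`, nil
		}
		return 404, "", nil
	}
	fmt.Println(identify(fake)) // prints "ollama"
}
```

Keeping probes as data rather than code is what makes adding a new service a small, reviewable contribution.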

Proxy & Gateway Services

| Service | Default Port | Description |
|---|---|---|
| LiteLLM | 4000 | Unified proxy for 100+ LLM providers |
| Bifrost | 8080 | High-performance unified LLM gateway |
| Envoy AI Gateway | 80 | Unified access to generative AI services |
| Helicone | 8585 | Open-source LLM observability platform and gateway |
| Kong AI Gateway | 8001 | Enterprise API gateway with AI plugins |
| OmniRoute | 20128 | AI gateway with smart routing and caching |
| Portkey AI Gateway | 8787 | Unified gateway for 200+ LLM providers |
| TensorZero | 3000 | Rust-based LLM gateway with observability |


Cloud-Managed Services

| Service | Default Port | Description |
|---|---|---|
| AWS Bedrock | 443 | Foundation model hosting and inference |
| Azure OpenAI | 443 | Microsoft Azure OpenAI Service |
| Cloudflare AI Gateway | 443 | AI proxy with caching and observability |
| Databricks Model Serving | 443 | Real-time ML inference endpoints |
| Fireworks AI | 443 | Cloud inference platform for LLMs |
| Google Vertex AI | 443 | ML training and generative AI platform |
| Groq | 443 | LPU-accelerated cloud inference |
| Modal | 443 | Serverless AI compute platform |
| Replicate | 443 | Cloud ML platform with prediction API |
| Salesforce Einstein | 443 | Salesforce AI platform |
| Together AI | 443 | Cloud inference for open-source models |


RAG & Orchestration Platforms

| Service | Default Port | Description |
|---|---|---|
| AnythingLLM | 3001 | All-in-one AI application with RAG and agents |
| AstrBot | 6185 | Multi-platform LLM chatbot framework |
| BetterChatGPT | 3000 | Enhanced ChatGPT interface |
| Dify | 80 | LLM app development platform with workflow orchestration |
| Flowise | 3000 | Low-code platform for AI agents and workflows |
| h2oGPT | 7860 | Private local GPT with document Q&A |
| HuggingFace Chat UI | 3000 | Open source ChatGPT-style interface |
| Langflow | 7860 | Low-code platform for AI agents and RAG |
| LibreChat | 3080 | Multi-provider chat interface with RAG |
| LobeHub | 3210 | Multi-agent AI collaboration platform |
| NextChat | 3000 | Self-hosted ChatGPT-style interface |
| Onyx | 3000 | Enterprise search and chat with RAG |
| OpenClaw | 18789 | AI agent gateway and control plane |
| Open WebUI | 3000 | ChatGPT-style interface for local LLMs |
| PrivateGPT | 8001 | Private document Q&A with LLMs |
| Quivr | 5050 | RAG platform for AI assistants |
| RAGFlow | 80 | RAG engine with deep document understanding |
| SillyTavern | 8000 | Character-based chat application |


Generic Detection

| Service | Description |
|---|---|
| OpenAI-compatible | Any server implementing OpenAI's API specification |
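The generic check works because the OpenAI API specification fixes the shape of the GET /v1/models response: a JSON object with `"object": "list"` and a `"data"` array. A sketch of that fallback check, assuming only the published response shape (the function name is my own, not Julius's):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// modelList mirrors the response shape OpenAI's API specification
// defines for GET /v1/models: {"object": "list", "data": [...]}.
type modelList struct {
	Object string            `json:"object"`
	Data   []json.RawMessage `json:"data"`
}

// looksOpenAICompatible reports whether a /v1/models response body
// matches the OpenAI list shape -- the generic fallback when no
// service-specific probe fires.
func looksOpenAICompatible(body []byte) bool {
	var ml modelList
	if err := json.Unmarshal(body, &ml); err != nil {
		return false
	}
	return ml.Object == "list" && ml.Data != nil
}

func main() {
	sample := []byte(`{"object":"list","data":[{"id":"gpt-4o","object":"model"}]}`)
	fmt.Println(looksOpenAICompatible(sample))           // true
	fmt.Println(looksOpenAICompatible([]byte(`<html>`))) // false
}
```

Parsing the structure rather than grepping for substrings avoids false positives on pages that merely mention the API.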

What's Next

Julius is the first release in “The 12 Caesars,” our open source campaign: one new tool per week for the next 12 weeks. Julius focuses on HTTP-based fingerprinting of known LLM services, and we’re already working on expanding its capabilities while keeping the lightweight, fast execution that makes it practical for large-scale reconnaissance.

On our roadmap: additional probes for cloud-hosted LLM services, smarter detection of custom integrations, and the ability to analyze HTTP traffic patterns to identify LLM usage that doesn’t follow standard API conventions. We’re also exploring how Julius can work alongside AI agents to autonomously discover LLM infrastructure across complex environments.

Contributing & Community

Julius is available now under the Apache 2.0 license at https://github.com/praetorian-inc/julius.

We welcome contributions from the community. Whether you’re adding probes for services we haven’t covered, reporting bugs, or suggesting new features, check the repository’s CONTRIBUTING.md for guidance on probe definitions and development workflow.

Ready to start? Clone the repository, experiment with Julius in your environment, and join the discussion on GitHub. We’re excited to see how the security community uses this tool in real-world reconnaissance workflows. Star the project if you find it useful, and let us know what LLM services you’d like to see supported next.

For more information on Julius, check out the Praetorian Blog!