Julius: LLM Service Fingerprinting

Introducing Julius: Open Source LLM Service Fingerprinting

The Growing Shadow AI Problem

Over 14,000 Ollama server instances are publicly accessible on the internet right now. A recent Cisco analysis found that 20% of these actively host models susceptible to unauthorized access. Separately, BankInfoSecurity reported discovering more than 10,000 Ollama servers with no authentication layer—the result of hurried AI deployments by developers under pressure.

This is the new shadow IT: developers spinning up local LLM servers for productivity, unaware they’ve exposed sensitive infrastructure to the internet. And Ollama is just one of dozens of AI serving platforms proliferating across enterprise networks.

The security question is no longer “are we running AI?” but “where is AI running that we don’t know about?”

What is LLM Service Fingerprinting?

LLM service fingerprinting identifies what server software is running on a network endpoint—not which AI model generated text, but which infrastructure is serving it.

Julius answers one question: is this HTTP service serving an LLM? During a penetration test or attack surface assessment, you’ve found an open port. Is it Ollama? vLLM? A Hugging Face deployment? An enterprise AI gateway? Julius tells you in seconds.

Julius follows the Unix philosophy: do one thing and do it well. It doesn’t port scan. It doesn’t vulnerability scan. It identifies LLM services—nothing more, nothing less.

This design enables Julius to slot into existing security toolchains rather than replace them.
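Julius’s actual probe definitions live in the repository; the sketch below only illustrates the general shape of an active HTTP fingerprint probe, using Ollama’s plain-text root banner ("Ollama is running") as the example. The matching step is split into a pure function so it can be exercised without a live server.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
	"time"
)

// matchOllamaBanner is the pure matching step: does the response body
// contain the banner Ollama serves at its root path?
func matchOllamaBanner(body string) bool {
	return strings.Contains(body, "Ollama is running")
}

// probeOllama sends a single GET to the target and checks for the
// banner. One request, one cheap match -- the general shape of an
// active fingerprint probe.
func probeOllama(baseURL string) (bool, error) {
	client := &http.Client{Timeout: 3 * time.Second}
	resp, err := client.Get(baseURL + "/")
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	// Read at most 4 KiB; a fingerprint never needs the whole body.
	body, err := io.ReadAll(io.LimitReader(resp.Body, 4096))
	if err != nil {
		return false, err
	}
	return matchOllamaBanner(string(body)), nil
}

func main() {
	ok, err := probeOllama("http://127.0.0.1:11434")
	if err != nil {
		fmt.Println("probe failed:", err)
		return
	}
	fmt.Println("ollama detected:", ok)
}
```

Because the probe sends only one small request and matches on a static string, it stays cheap enough to run against thousands of hosts.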

Introducing Julius

Julius is an open-source LLM service fingerprinting tool that detects 60+ AI platforms through active HTTP probing. Built in Go, it compiles to a single binary with no external dependencies.

Probes Included in Initial Release:

Self-Hosted LLM Servers

| Service | Default Port | Description |
|---|---|---|
| Ollama | 11434 | Popular local LLM server with easy model management |
| vLLM | 8000 | High-throughput LLM inference engine |
| SGLang | 30000 | High-performance LLM serving engine |
| LocalAI | 8080 | OpenAI-compatible local AI server |
| llama.cpp | 8080 | CPU-optimized LLM inference |
| Hugging Face TGI | 3000 | Text Generation Inference server |
| NVIDIA NIM | 8000 | NVIDIA's enterprise inference microservices |
| NVIDIA TensorRT-LLM | 8000 | NVIDIA TensorRT-LLM inference server |
| NVIDIA Triton | 8000 | NVIDIA Triton Inference Server (KServe v2) |
| BentoML | 3000 | AI application framework for serving models |
| Ray Serve | 8265 | Scalable model serving on Ray clusters |
| Aphrodite Engine | 2242 | Large-scale LLM inference engine |
| Baseten Truss | 8080 | Open-source ML model serving framework |
| DeepSpeed-MII | 28080 | High-throughput inference powered by DeepSpeed |
| FastChat | 21001 | Open platform for LLM chatbots |
| GPT4All | 4891 | Run local models on any device |
| Gradio | 7860 | ML model demo interfaces |
| Jan | 1337 | Local OpenAI-compatible API server |
| KoboldCpp | 5001 | AI text generation for GGML/GGUF models |
| LM Studio | 1234 | Desktop LLM application with API server |
| MLC LLM | 8000 | Universal deployment engine with ML compilation |
| Petals | 5000 | Decentralized BitTorrent-style LLM inference |
| PowerInfer | 8080 | CPU/GPU hybrid inference engine |
| TabbyAPI | 5000 | FastAPI-based server for ExLlama |
| Text Generation WebUI | 5000 | Local LLM interface with API |

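Note how several of these servers share a default port (8000 and 8080 each appear several times), so the port alone cannot identify the service; each probe needs a service-specific request. One natural way to structure this is a declarative probe table. The sketch below is illustrative rather than Julius's actual design: the Ollama entry uses its real /api/tags endpoint, while the llama.cpp entry and the match strings are simplified assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// probe describes one fingerprint check: which path to request and
// how to recognize the service in the response.
type probe struct {
	service string
	path    string
	match   func(status int, body string) bool
}

// probes is a tiny illustrative table (not Julius's real definitions).
var probes = []probe{
	{"ollama", "/api/tags", func(s int, b string) bool {
		return s == 200 && strings.Contains(b, `"models"`)
	}},
	{"llamacpp", "/health", func(s int, b string) bool {
		return s == 200 && strings.Contains(b, `"status"`)
	}},
}

// fetcher abstracts the HTTP GET so the matching logic can be tested
// without a live server.
type fetcher func(path string) (int, string, error)

// identify runs each probe against the target and returns the first
// service whose matcher fires, or "" if nothing matched.
func identify(get fetcher) string {
	for _, p := range probes {
		status, body, err := get(p.path)
		if err != nil {
			continue
		}
		if p.match(status, body) {
			return p.service
		}
	}
	return ""
}

func main() {
	// Simulated responses standing in for a host on port 11434.
	fake := func(path string) (int, string, error) {
		if path == "/api/tags" {
			return 200, `{"models":[{"name":"llama3:latest"}]}`, nil
		}
		return 404, "", nil
	}
	fmt.Println(identify(fake)) // prints "ollama"
}
```

Keeping probes as data rather than code is what makes adding a new service a small, reviewable contribution.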

Proxy & Gateway Services

| Service | Default Port | Description |
|---|---|---|
| LiteLLM | 4000 | Unified proxy for 100+ LLM providers |
| Bifrost | 8080 | High-performance unified LLM gateway |
| Envoy AI Gateway | 80 | Unified access to generative AI services |
| Helicone | 8585 | Open-source LLM observability platform and gateway |
| Kong AI Gateway | 8001 | Enterprise API gateway with AI plugins |
| OmniRoute | 20128 | AI gateway with smart routing and caching |
| Portkey AI Gateway | 8787 | Unified gateway for 200+ LLM providers |
| TensorZero | 3000 | Rust-based LLM gateway with observability |


Cloud-Managed Services

| Service | Default Port | Description |
|---|---|---|
| AWS Bedrock | 443 | Foundation model hosting and inference |
| Azure OpenAI | 443 | Microsoft Azure OpenAI Service |
| Cloudflare AI Gateway | 443 | AI proxy with caching and observability |
| Databricks Model Serving | 443 | Real-time ML inference endpoints |
| Fireworks AI | 443 | Cloud inference platform for LLMs |
| Google Vertex AI | 443 | ML training and generative AI platform |
| Groq | 443 | LPU-accelerated cloud inference |
| Modal | 443 | Serverless AI compute platform |
| Replicate | 443 | Cloud ML platform with prediction API |
| Salesforce Einstein | 443 | Salesforce AI platform |
| Together AI | 443 | Cloud inference for open-source models |


RAG & Orchestration Platforms

| Service | Default Port | Description |
|---|---|---|
| AnythingLLM | 3001 | All-in-one AI application with RAG and agents |
| AstrBot | 6185 | Multi-platform LLM chatbot framework |
| BetterChatGPT | 3000 | Enhanced ChatGPT interface |
| Dify | 80 | LLM app development platform with workflow orchestration |
| Flowise | 3000 | Low-code platform for AI agents and workflows |
| h2oGPT | 7860 | Private local GPT with document Q&A |
| HuggingFace Chat UI | 3000 | Open source ChatGPT-style interface |
| Langflow | 7860 | Low-code platform for AI agents and RAG |
| LibreChat | 3080 | Multi-provider chat interface with RAG |
| LobeHub | 3210 | Multi-agent AI collaboration platform |
| NextChat | 3000 | Self-hosted ChatGPT-style interface |
| Onyx | 3000 | Enterprise search and chat with RAG |
| OpenClaw | 18789 | AI agent gateway and control plane |
| Open WebUI | 3000 | ChatGPT-style interface for local LLMs |
| PrivateGPT | 8001 | Private document Q&A with LLMs |
| Quivr | 5050 | RAG platform for AI assistants |
| RAGFlow | 80 | RAG engine with deep document understanding |
| SillyTavern | 8000 | Character-based chat application |


Generic Detection

| Service | Description |
|---|---|
| OpenAI-compatible | Any server implementing OpenAI's API specification |
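The generic check works because the OpenAI API specification fixes the shape of the GET /v1/models response: a JSON object with `"object": "list"` and a `"data"` array. A sketch of that fallback check, assuming only the published response shape (the function name is my own, not Julius's):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// modelList mirrors the response shape OpenAI's API specification
// defines for GET /v1/models: {"object": "list", "data": [...]}.
type modelList struct {
	Object string            `json:"object"`
	Data   []json.RawMessage `json:"data"`
}

// looksOpenAICompatible reports whether a /v1/models response body
// matches the OpenAI list shape -- the generic fallback when no
// service-specific probe fires.
func looksOpenAICompatible(body []byte) bool {
	var ml modelList
	if err := json.Unmarshal(body, &ml); err != nil {
		return false
	}
	return ml.Object == "list" && ml.Data != nil
}

func main() {
	sample := []byte(`{"object":"list","data":[{"id":"gpt-4o","object":"model"}]}`)
	fmt.Println(looksOpenAICompatible(sample))           // true
	fmt.Println(looksOpenAICompatible([]byte(`<html>`))) // false
}
```

Parsing the structure rather than grepping for substrings avoids false positives on pages that merely mention the API.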

What's Next

Julius is the first release in “The 12 Caesars,” our open source campaign: one new tool per week for the next 12 weeks. Julius focuses on HTTP-based fingerprinting of known LLM services, and we’re already working on expanding its capabilities while keeping the lightweight, fast execution that makes it practical for large-scale reconnaissance.

On our roadmap: additional probes for cloud-hosted LLM services, smarter detection of custom integrations, and the ability to analyze HTTP traffic patterns to identify LLM usage that doesn’t follow standard API conventions. We’re also exploring how Julius can work alongside AI agents to autonomously discover LLM infrastructure across complex environments.

Contributing & Community

Julius is available now under the Apache 2.0 license at https://github.com/praetorian-inc/julius.

We welcome contributions from the community. Whether you’re adding probes for services we haven’t covered, reporting bugs, or suggesting new features, check the repository’s CONTRIBUTING.md for guidance on probe definitions and development workflow.

Ready to start? Clone the repository, experiment with Julius in your environment, and join the discussion on GitHub. We’re excited to see how the security community uses this tool in real-world reconnaissance workflows. Star the project if you find it useful, and let us know what LLM services you’d like to see supported next.

For more information on Julius, check out the Praetorian Blog!