Complete AI Infrastructure — From laptop to planet-scale deployment
Build, deploy, and orchestrate AI infrastructure at any scale. Controllers for orchestration, routers for load balancing, workers for inference, and edge gateways for external access.
What Can You Build?
Small Team / Startup
1 controller + 1 router + 2 workers. Internal API access. Cost: Your existing hardware.
Growing Company
Add an edge gateway for OpenWebUI access. Scale out workers. Enable AI-powered routing for optimization.
Enterprise
Multi-region with secondary controllers. Geo-aware routing, automatic failover, compliance.
Secure external access with TLS, API key authentication, and rate limiting.
Deployment Scenarios
1. Quick Start (Single Machine)
Development setup on localhost
2. Team Setup (3 Servers)
Small team with dedicated GPUs
3. Production with Edge Gateway
HTTPS, API keys, rate limiting
4. AI-Powered Smart Routing
LLM-powered worker selection
5. Config Sync Across Cluster
Automatic configuration distribution
6. Multi-Region Deployment
Global HA with region-aware routing
Routing Strategies
Round Robin
Simple rotation
Even distribution
No overhead
Best for uniform load
Least Connections
Fewest active requests
Dynamic balancing
Prevents queue buildup
Good for varied tasks
Load Based
Real-time metrics
CPU/memory aware
Prevents hotspots
Default strategy
Latency Based
Fastest response time
Adaptive routing
Best for real-time
Tracks P95 latency
Random
Simple selection
No state needed
Good for testing
Minimal overhead
AI Routing
LLM-powered decisions
Context-aware selection
Model matching
Self-optimizing
# Configure routing strategy on router
./eldric-router --strategy load_based --controller http://controller:8880
# Available strategies: round_robin, least_connections, load_based, latency_based, random, ai_routing
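The load_based strategy (the default) can be sketched as follows. This is an illustrative model only: the `Worker` record and the equal CPU/memory weighting are assumptions for the sketch, not eldric's actual implementation, which may use richer metrics such as queue depth or VRAM.

```python
from dataclasses import dataclass

@dataclass
class Worker:
    name: str
    cpu: float      # CPU utilization, 0.0-1.0
    memory: float   # memory utilization, 0.0-1.0

def pick_load_based(workers):
    """Select the worker with the lowest combined load score.

    Weighting CPU and memory equally is an assumption for this sketch.
    """
    return min(workers, key=lambda w: 0.5 * w.cpu + 0.5 * w.memory)

workers = [
    Worker("gpu-1", cpu=0.80, memory=0.60),
    Worker("gpu-2", cpu=0.20, memory=0.30),
    Worker("gpu-3", cpu=0.50, memory=0.90),
]
print(pick_load_based(workers).name)  # gpu-2 has the lowest combined load
```

The other strategies swap out the key function: least_connections ranks by active request count, latency_based by tracked P95 response time.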
Worker Backend Connectivity
Each worker connects to one or more AI inference backends. Workers automatically discover available models and report capabilities to the controller. Mix different backends across your cluster for optimal resource utilization.
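Model discovery against the default Ollama backend can be sketched like this: Ollama's `GET /api/tags` endpoint lists locally available models, and a worker can parse that response to report capabilities. The payload below is a hand-written sample, not live output.

```python
import json

# Hand-written sample of an Ollama GET /api/tags response body.
sample_tags_response = json.dumps({
    "models": [
        {"name": "llama3.2:3b", "size": 2019393189},
        {"name": "mixtral:8x7b", "size": 26443046018},
    ]
})

def discover_models(tags_json):
    """Extract model names from an Ollama /api/tags response body."""
    return [m["name"] for m in json.loads(tags_json)["models"]]

print(discover_models(sample_tags_response))  # ['llama3.2:3b', 'mixtral:8x7b']
```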
Local & Self-Hosted Backends
Ollama
Port: 11434
REST API
Auto model discovery
Default backend
vLLM
Port: 8000
OpenAI-compatible
PagedAttention
High throughput
llama.cpp
Port: 8080
REST + WebSocket
GGUF models
CPU + GPU
HuggingFace TGI
Port: 8080
REST + gRPC
Tensor parallelism
Continuous batching
LocalAI
Port: 8080
OpenAI-compatible
Multiple formats
CPU optimized
ExLlamaV2
Port: 5000
REST API
GPTQ/EXL2 quants
Fast inference
LMDeploy
Port: 23333
OpenAI-compatible
TurboMind engine
Quantization
MLC LLM
Port: 8080
REST API
Universal deploy
WebGPU support
Enterprise & ML Platforms
NVIDIA Triton
Ports: 8000 (HTTP), 8001 (gRPC), 8002 (metrics)
REST + gRPC
TensorRT optimization
Multi-framework
NVIDIA NIM
Port: 8000
OpenAI-compatible
Optimized containers
Enterprise ready
TensorFlow Serving
Ports: 8501 (REST), 8500 (gRPC)
REST + gRPC
Model versioning
Batch prediction
TorchServe
Ports: 8080 (inference), 8081 (management)
REST + gRPC
PyTorch native
Model archive
ONNX Runtime
Port: 8001
REST + gRPC
Cross-platform
Hardware agnostic
DeepSpeed-MII
Port: 28080
REST API
ZeRO-Inference
Low latency
BentoML
Port: 3000
REST + gRPC
Model packaging
Adaptive batching
Ray Serve
Port: 8000
REST API
Auto-scaling
Distributed
Cloud AI Services
AWS SageMaker
HTTPS endpoint
REST API
Auto-scaling
Multi-model
Azure ML
HTTPS endpoint
REST + SDK
Managed compute
MLflow integration
Google Vertex AI
HTTPS endpoint
REST + gRPC
TPU support
Model Garden
Groq
HTTPS API
OpenAI-compatible
LPU inference
Ultra-fast
Together AI
HTTPS API
OpenAI-compatible
Open models
Fine-tuning
Fireworks AI
HTTPS API
OpenAI-compatible
Fast inference
Function calling
Anyscale
HTTPS API
OpenAI-compatible
Ray-based
Scalable
Replicate
HTTPS API
REST API
Model hosting
Pay-per-use
Specialized & Platform-Specific
MLX (Apple Silicon)
Port: 8080
REST API
Metal acceleration
Unified memory
Mistral AI
HTTPS API
OpenAI-compatible
Mistral models
Function calling
Cohere
HTTPS API
REST API
Embeddings
Rerank
Anthropic
HTTPS API
REST API
Claude models
Tool use
OpenAI
HTTPS API
REST API
GPT models
Assistants
KServe
Port: 8080
REST + gRPC
Kubernetes native
Serverless
Seldon Core
Port: 9000
REST + gRPC
ML deployment
A/B testing
OpenAI-Compatible
Any port
Custom endpoints
API key auth
Drop-in support
Worker Configuration Examples
# Connect worker to local Ollama
./eldric-workerd --controller http://ctrl:8880 --backend ollama \
--backend-url http://localhost:11434
# Connect worker to vLLM server
./eldric-workerd --controller http://ctrl:8880 --backend vllm \
--backend-url http://gpu-server:8000 --models mixtral:8x7b
# Connect worker to NVIDIA Triton
./eldric-workerd --controller http://ctrl:8880 --backend triton \
--backend-url http://triton-server:8000 --models llama3.1:70b
# Connect worker to TensorFlow Serving
./eldric-workerd --controller http://ctrl:8880 --backend tensorflow \
--backend-url http://tf-server:8501 --models gpt-neo
# Connect worker to HuggingFace TGI
./eldric-workerd --controller http://ctrl:8880 --backend tgi \
--backend-url http://tgi-server:8080 --models codellama:34b
# Connect worker to MLX on Mac
./eldric-workerd --controller http://ctrl:8880 --backend mlx \
--backend-url http://localhost:8080 --models llama3.2:3b
# Connect to any OpenAI-compatible endpoint
./eldric-workerd --controller http://ctrl:8880 --backend openai \
--backend-url https://api.example.com/v1 --api-key $API_KEY
OpenAI-Compatible API
Drop-in replacement for the OpenAI API. Works with any compatible tool.
Tools
OpenWebUI
LangChain
LlamaIndex
Cursor
Endpoints
/v1/chat/completions
/v1/completions
/v1/models
/v1/embeddings
Features
Streaming
Tool calling
JSON mode
Vision
Admin
/api/v1/workers
/api/v1/routers
/api/v1/metrics
/api/v1/config
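A client request against the gateway has the standard OpenAI shape, so any compatible tool only needs the gateway's base URL and an API key. The sketch below builds such a request without sending it; the gateway address is an assumption for the example.

```python
import json

GATEWAY = "https://gateway.example.com"  # assumed edge-gateway address

def chat_completion_request(model, messages, api_key):
    """Build the URL, headers, and JSON body for a /v1/chat/completions call."""
    url = f"{GATEWAY}/v1/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({"model": model, "messages": messages, "stream": False})
    return url, headers, body

url, headers, body = chat_completion_request(
    "llama3.1:70b",
    [{"role": "user", "content": "Hello"}],
    api_key="sk-example",
)
print(url)  # https://gateway.example.com/v1/chat/completions
```

Setting `"stream": True` instead requests server-sent-event chunks, which is how OpenWebUI and similar tools render tokens incrementally.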
Data Worker Integration
AI workers can query databases directly through Data Workers for real-time data access.
Direct Database Access
AI Workers query Data Workers via REST API
Connection pooling for high throughput
Schema discovery for context-aware queries
Parameterized queries prevent SQL injection
Supported Databases
SQLite - Built-in, local databases
PostgreSQL - Enterprise analytics
MySQL/MariaDB - Web applications
IBM DB2 - Mainframe and z/OS
# AI Worker requests data from database
curl -X POST http://data-worker:8895/api/v1/data/query \
-H "Content-Type: application/json" \
-d '{"source_id":"warehouse","sql":"SELECT * FROM sales WHERE region=$1","params":["EMEA"]}'
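The "parameterized queries prevent SQL injection" point can be illustrated with the built-in SQLite backend; the table and data here are invented for the example. SQLite uses `?` placeholders where the PostgreSQL-style request above uses `$1`, but in both cases the parameter values travel separately from the SQL text and are never spliced into the statement.

```python
import sqlite3

# Illustrative only: an in-memory SQLite database standing in for a
# Data Worker source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("EMEA", 1200.0), ("APAC", 800.0), ("EMEA", 450.0)],
)

# The region value is bound as a parameter, not interpolated into the SQL
# string, so input like "EMEA'; DROP TABLE sales;--" stays inert data.
rows = conn.execute(
    "SELECT region, amount FROM sales WHERE region = ?", ("EMEA",)
).fetchall()
print(rows)
```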