Build, deploy, and orchestrate AI infrastructure at any scale. Controllers for orchestration, routers for load balancing, workers for inference, and edge gateways for external access.

What Can You Build?

Small Team / Startup

1 controller + 1 router + 2 workers. Internal API access. Cost: Your existing hardware.

Growing Company

Add an edge gateway for OpenWebUI access, scale out workers, and enable AI-powered routing for optimization.

Enterprise

Multi-region with secondary controllers. Geo-aware routing, automatic failover, compliance.

Global Scale

A CDN for AI: a primary controller orchestrating deployments worldwide, with 50+ edge locations and 100+ workers.

Architecture

External clients enter through the edge farm; internal clients talk to the routers directly.

  • Edge Farm (port 443): TLS, authentication, rate limiting
  • Controller (port 8880): orchestration, licensing, config sync, metrics
  • Routers (port 8881): load balancing, AI routing, failover
  • Workers (port 8890): inference backends such as vLLM, Ollama, TGI, Triton, llama.cpp, and custom engines
  • Data Workers (port 8895): connection pooling and schema discovery for SQLite (local/embedded), PostgreSQL (enterprise), MySQL/MariaDB, and IBM DB2 on z/OS (DRDA, ODBC/CLI mainframe integration)

Ports

  • 443: Edge Gateway
  • 8880: Controller
  • 8881: Router
  • 8890: Worker
  • 8895: Data Worker

Edge Gateway Features

Secure external access with TLS, API key authentication, and rate limiting.

🔒 TLS Termination: HTTPS on port 443

  • Let's Encrypt auto-renewal
  • Custom certificates (--cert cert.pem)
  • HTTP → HTTPS redirect
  • mTLS for workers

⚡ High Availability: farm mode clustering (--mode farm --peers)

  • Active-active edge nodes
  • Automatic failover
  • Health check peers
  • Session persistence

🌐 OpenWebUI Ready: OpenAI-compatible API (/v1/chat/completions)

  • API key authentication
  • Rate limiting (RPM)
  • Model allowlisting
  • Usage tracking
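
A minimal launch sketch tying these flags together. The binary name eldric-edge and the --controller flag are assumptions (only the router and worker binaries are named on this page); --cert, --mode farm, and --peers come from the feature list above, and the peer hostnames are placeholders.

# Sketch: edge gateway with TLS termination and farm-mode clustering
# (binary name and --controller flag are assumptions; other flags from the list above)
./eldric-edge --controller http://controller:8880 \
  --cert cert.pem \
  --mode farm --peers edge-2.example.com,edge-3.example.com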

Deployment Scenarios

1. Quick Start (Single Machine)

Controller (:8880) → Router (:8881) → Worker running Ollama (:8890)

Development setup on localhost
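
A minimal sketch of this layout, reusing the router and worker invocations documented later on this page; the controller is assumed to already be listening on localhost:8880 and Ollama on its default port 11434.

# Router on the same machine, pointing at the local controller
./eldric-router --strategy load_based --controller http://localhost:8880

# Worker exposing the local Ollama instance to the cluster
./eldric-workerd --controller http://localhost:8880 --backend ollama \
  --backend-url http://localhost:11434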

2. Team Setup (3 Servers)

Controller + Router on a management server • GPU Worker 1: RTX 4090, llama3.1:70b • GPU Worker 2: A100, vLLM, mixtral:8x7b (2 workers • 128 GB GPU)

Small team with dedicated GPUs
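
A sketch of the two worker commands for this topology, adapted from the configuration examples further down; hostnames are placeholders, and Ollama is an assumption for Worker 1 (the layout above only names the model).

# GPU Worker 1 (RTX 4090): llama3.1:70b, served here via Ollama (assumed backend)
./eldric-workerd --controller http://mgmt-server:8880 --backend ollama \
  --backend-url http://localhost:11434 --models llama3.1:70b

# GPU Worker 2 (A100): mixtral:8x7b served by vLLM
./eldric-workerd --controller http://mgmt-server:8880 --backend vllm \
  --backend-url http://localhost:8000 --models mixtral:8x7b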

3. Production with Edge Gateway

Internet / OpenWebUI → 🔒 Edge Gateway (:443) → Controller → Router 1 / Router 2 → Workers (HA pool)

HTTPS, API keys, rate limiting
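
A quick smoke test against this setup, assuming the standard Bearer-token header used by OpenAI-compatible APIs; the hostname and key are placeholders.

# List models exposed through the edge gateway (hostname and key are placeholders)
curl https://edge.example.com/v1/models \
  -H "Authorization: Bearer $ELDRIC_API_KEY"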

4. AI-Powered Smart Routing

A request for llama3.1:70b reaches the 🧠 AI Router, where a llama3.2:3b decision engine scores the candidates: RTX 4090 (45 ms latency), A100 ✓ (32 ms latency), H100 (busy, 8/10). Decision: "A100 worker: lowest latency 32ms, optimal for 70B model."

LLM-powered worker selection
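
Selecting this strategy on a router uses the same flag as the other strategies in the routing section below:

# Enable LLM-powered worker selection on a router
./eldric-router --strategy ai_routing --controller http://controller:8880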

5. Config Sync Across Cluster

The controller's config store holds the latest version (v3-7a8b) and syncs it to workers on each heartbeat: Worker 1 ✓ at v3-7a8b, Worker 2 ⟳ updating. Hash-based versioning, auto-sync on heartbeat (30s).

Automatic configuration distribution
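
One way to check what the cluster is currently distributing is the controller's config endpoint from the admin API listed later on this page; a sketch, assuming it answers plain GET requests:

# Inspect the current cluster configuration on the controller
curl http://controller:8880/api/v1/config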

6. Multi-Region Deployment

🇺🇸 US-West (primary, 4 workers) • 🇪🇺 EU-West (secondary, 3 workers) • 🌏 APAC (secondary, 2 workers). Real-time sync, geo-routing, and automatic failover across 9 workers in 3 regions.

Global HA with region-aware routing

Routing Strategies

Round Robin

  • Simple rotation
  • Even distribution
  • No overhead
  • Best for uniform load

Least Connections

  • Fewest active requests
  • Dynamic balancing
  • Prevents queue buildup
  • Good for varied tasks

Load Based

  • Real-time metrics
  • CPU/memory aware
  • Prevents hotspots
  • Default strategy

Latency Based

  • Fastest response time
  • Adaptive routing
  • Best for real-time
  • Tracks P95 latency

Random

  • Simple selection
  • No state needed
  • Good for testing
  • Minimal overhead

AI Routing

  • LLM-powered decisions
  • Context-aware selection
  • Model matching
  • Self-optimizing
# Configure routing strategy on router
./eldric-router --strategy load_based --controller http://controller:8880

# Available strategies: round_robin, least_connections, load_based,
# latency_based, random, ai_routing

Worker Backend Connectivity

Each worker connects to one or more AI inference backends. Workers automatically discover available models and report capabilities to the controller. Mix different backends across your cluster for optimal resource utilization.
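
To confirm which workers have registered and what they reported, the controller's admin API (listed in the API section below) can be queried; a sketch, assuming a plain GET is accepted:

# List workers registered with the controller
curl http://controller:8880/api/v1/workers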

Local & Self-Hosted Backends

Ollama

  • Port: 11434
  • REST API
  • Auto model discovery
  • Default backend

vLLM

  • Port: 8000
  • OpenAI-compatible
  • PagedAttention
  • High throughput

llama.cpp

  • Port: 8080
  • REST + WebSocket
  • GGUF models
  • CPU + GPU

HuggingFace TGI

  • Port: 8080
  • REST + gRPC
  • Tensor parallelism
  • Continuous batching

LocalAI

  • Port: 8080
  • OpenAI-compatible
  • Multiple formats
  • CPU optimized

ExLlamaV2

  • Port: 5000
  • REST API
  • GPTQ/EXL2 quants
  • Fast inference

LMDeploy

  • Port: 23333
  • OpenAI-compatible
  • TurboMind engine
  • Quantization

MLC LLM

  • Port: 8080
  • REST API
  • Universal deploy
  • WebGPU support

Enterprise & ML Platforms

NVIDIA Triton

  • Port: 8000-8002
  • REST + gRPC
  • TensorRT optimization
  • Multi-framework

NVIDIA NIM

  • Port: 8000
  • OpenAI-compatible
  • Optimized containers
  • Enterprise ready

TensorFlow Serving

  • Port: 8501/8500
  • REST + gRPC
  • Model versioning
  • Batch prediction

TorchServe

  • Port: 8080/8081
  • REST + gRPC
  • PyTorch native
  • Model archive

ONNX Runtime

  • Port: 8001
  • REST + gRPC
  • Cross-platform
  • Hardware agnostic

DeepSpeed-MII

  • Port: 28080
  • REST API
  • ZeRO-Inference
  • Low latency

BentoML

  • Port: 3000
  • REST + gRPC
  • Model packaging
  • Adaptive batching

Ray Serve

  • Port: 8000
  • REST API
  • Auto-scaling
  • Distributed

Cloud AI Services

AWS SageMaker

  • HTTPS endpoint
  • REST API
  • Auto-scaling
  • Multi-model

Azure ML

  • HTTPS endpoint
  • REST + SDK
  • Managed compute
  • MLflow integration

Google Vertex AI

  • HTTPS endpoint
  • REST + gRPC
  • TPU support
  • Model Garden

Groq

  • HTTPS API
  • OpenAI-compatible
  • LPU inference
  • Ultra-fast

Together AI

  • HTTPS API
  • OpenAI-compatible
  • Open models
  • Fine-tuning

Fireworks AI

  • HTTPS API
  • OpenAI-compatible
  • Fast inference
  • Function calling

Anyscale

  • HTTPS API
  • OpenAI-compatible
  • Ray-based
  • Scalable

Replicate

  • HTTPS API
  • REST API
  • Model hosting
  • Pay-per-use

Specialized & Platform-Specific

MLX (Apple Silicon)

  • Port: 8080
  • REST API
  • Metal acceleration
  • Unified memory

Mistral AI

  • HTTPS API
  • OpenAI-compatible
  • Mistral models
  • Function calling

Cohere

  • HTTPS API
  • REST API
  • Embeddings
  • Rerank

Anthropic

  • HTTPS API
  • REST API
  • Claude models
  • Tool use

OpenAI

  • HTTPS API
  • REST API
  • GPT models
  • Assistants

KServe

  • Port: 8080
  • REST + gRPC
  • Kubernetes native
  • Serverless

Seldon Core

  • Port: 9000
  • REST + gRPC
  • ML deployment
  • A/B testing

OpenAI-Compatible

  • Any port
  • Custom endpoints
  • API key auth
  • Drop-in support

Worker Configuration Examples

# Connect worker to local Ollama
./eldric-workerd --controller http://ctrl:8880 --backend ollama \
  --backend-url http://localhost:11434

# Connect worker to vLLM server
./eldric-workerd --controller http://ctrl:8880 --backend vllm \
  --backend-url http://gpu-server:8000 --models mixtral:8x7b

# Connect worker to NVIDIA Triton
./eldric-workerd --controller http://ctrl:8880 --backend triton \
  --backend-url http://triton-server:8000 --models llama3.1:70b

# Connect worker to TensorFlow Serving
./eldric-workerd --controller http://ctrl:8880 --backend tensorflow \
  --backend-url http://tf-server:8501 --models gpt-neo

# Connect worker to HuggingFace TGI
./eldric-workerd --controller http://ctrl:8880 --backend tgi \
  --backend-url http://tgi-server:8080 --models codellama:34b

# Connect worker to MLX on Mac
./eldric-workerd --controller http://ctrl:8880 --backend mlx \
  --backend-url http://localhost:8080 --models llama3.2:3b

# Connect to any OpenAI-compatible endpoint
./eldric-workerd --controller http://ctrl:8880 --backend openai \
  --backend-url https://api.example.com/v1 --api-key $API_KEY

OpenAI-Compatible API

A drop-in replacement for the OpenAI API that works with any compatible tool.

Tools

  • OpenWebUI
  • LangChain
  • LlamaIndex
  • Cursor

Endpoints

  • /v1/chat/completions
  • /v1/completions
  • /v1/models
  • /v1/embeddings

Features

  • Streaming
  • Tool calling
  • JSON mode
  • Vision

Admin

  • /api/v1/workers
  • /api/v1/routers
  • /api/v1/metrics
  • /api/v1/config
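
A sketch of a streaming chat completion against the first endpoint above; the gateway hostname, API key, and model name are placeholders, and the request body follows the standard OpenAI chat format.

# Streaming chat completion through the cluster (hostname, key, and model are placeholders)
curl https://edge.example.com/v1/chat/completions \
  -H "Authorization: Bearer $ELDRIC_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.1:70b", "messages": [{"role": "user", "content": "Hello"}], "stream": true}'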

Data Worker Integration

AI backends can directly query databases through Data Workers for real-time data access.

Direct Database Access

  • AI Workers query Data Workers via REST API
  • Connection pooling for high throughput
  • Schema discovery for context-aware queries
  • Parameterized queries prevent SQL injection

Supported Databases

  • SQLite - Built-in, local databases
  • PostgreSQL - Enterprise analytics
  • MySQL/MariaDB - Web applications
  • IBM DB2 - Mainframe and z/OS
# AI Worker requests data from database
curl -X POST http://data-worker:8895/api/v1/data/query \
  -H "Content-Type: application/json" \
  -d '{"source_id":"warehouse","sql":"SELECT * FROM sales WHERE region=$1","params":["EMEA"]}'
See the Data Worker documentation for full details.

Ready to Build Your AI Infrastructure?

Contact our team for a custom deployment plan.

Contact Sales