Both Eldric Client and Eldric Multi-API support multiple inference backends, so you can mix local inference with cloud APIs across your infrastructure.
Local & Self-Hosted
Ollama
- Port: 11434
- REST API
- Auto model discovery
- Default backend
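Ollama's native REST API is simple enough to hit directly. A minimal sketch, assuming a local instance on the default port and an already-pulled model (the model name is a placeholder):

```python
import requests

# Model discovery: list every model the local Ollama instance has pulled.
tags = requests.get("http://localhost:11434/api/tags").json()
print([m["name"] for m in tags.get("models", [])])

# Single non-streaming generation. "llama3" is a placeholder for
# any model you have pulled locally.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3", "prompt": "Say hello.", "stream": False},
)
print(resp.json()["response"])
```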
vLLM
- Port: 8000
- OpenAI-compatible
- PagedAttention
- High throughput
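Since vLLM's server is OpenAI-compatible, the stock `openai` Python client works by pointing `base_url` at it. The same pattern applies to every backend listed on this page as OpenAI-compatible; a sketch, where the model name must match whatever the server was launched with:

```python
from openai import OpenAI

# vLLM does not enforce API keys by default, but the client requires one.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

completion = client.chat.completions.create(
    # Placeholder: must match the model the vLLM server was started with.
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    messages=[{"role": "user", "content": "Say hello."}],
)
print(completion.choices[0].message.content)
```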
llama.cpp
- Port: 8080
- REST + WebSocket
- GGUF models
- CPU + GPU
HuggingFace TGI
- Port: 8080
- REST + gRPC
- Tensor parallelism
- Continuous batching
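A sketch against TGI's native `/generate` route (recent TGI versions also expose an OpenAI-compatible layer); the host and parameters here are assumptions:

```python
import requests

# The served model is fixed at launch, so the request carries only
# the prompt and generation parameters.
resp = requests.post(
    "http://localhost:8080/generate",
    json={"inputs": "Say hello.", "parameters": {"max_new_tokens": 50}},
)
print(resp.json()["generated_text"])
```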
LocalAI
- Port: 8080
- OpenAI-compatible
- Multiple formats
- CPU optimized
ExLlamaV2
- Port: 5000
- REST API
- GPTQ/EXL2 quants
- Fast inference
LMDeploy
- Port: 23333
- OpenAI-compatible
- TurboMind engine
- Quantization
MLC LLM
- Port: 8080
- REST API
- Universal deploy
- WebGPU support
Enterprise & ML Platforms
NVIDIA Triton
- Port: 8000 (HTTP), 8001 (gRPC), 8002 (metrics)
- REST + gRPC
- TensorRT optimization
- Multi-framework
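Triton's HTTP endpoint speaks the KServe v2 inference protocol. A sketch with a hypothetical model; the tensor names, shapes, and datatypes must match the model's configuration on the server:

```python
import requests

# KServe v2 inference protocol over Triton's HTTP port.
# "my_model" and the tensor metadata are hypothetical.
payload = {
    "inputs": [
        {
            "name": "INPUT0",
            "shape": [1, 4],
            "datatype": "FP32",
            "data": [1.0, 2.0, 3.0, 4.0],
        }
    ]
}
resp = requests.post("http://localhost:8000/v2/models/my_model/infer", json=payload)
print(resp.json()["outputs"])
```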
NVIDIA NIM
- Port: 8000
- OpenAI-compatible
- Optimized containers
- Enterprise ready
TensorFlow Serving
- Port: 8501 (REST) / 8500 (gRPC)
- REST + gRPC
- Model versioning
- Batch prediction
TorchServe
- Port: 8080 (inference) / 8081 (management)
- REST + gRPC
- PyTorch native
- Model archive
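A sketch against TorchServe's two default ports; the model name is a placeholder, and the request payload shape depends on the model's handler:

```python
import requests

# Inference API (port 8080): "my_model" must match a registered
# model archive (.mar); payload format is handler-specific.
resp = requests.post(
    "http://localhost:8080/predictions/my_model",
    json={"data": "Say hello."},
)
print(resp.json())

# Management API (port 8081): list registered models.
print(requests.get("http://localhost:8081/models").json())
```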
ONNX Runtime
- Port: 8001
- REST + gRPC
- Cross-platform
- Hardware agnostic
DeepSpeed-MII
- Port: 28080
- REST API
- ZeRO-Inference
- Low latency
BentoML
- Port: 3000
- REST + gRPC
- Model packaging
- Adaptive batching
Ray Serve
- Port: 8000
- REST API
- Auto-scaling
- Distributed
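A minimal Ray Serve deployment sketch; `num_replicas` is an illustrative setting, and `serve.run` exposes the app over HTTP on port 8000 by default:

```python
from starlette.requests import Request
from ray import serve

# Each replica is a separate actor; Serve load-balances across them
# and can scale them out over a multi-node Ray cluster.
@serve.deployment(num_replicas=2)
class Echo:
    async def __call__(self, request: Request) -> dict:
        return {"echo": await request.json()}

# Starts Serve if needed and serves the app over HTTP on port 8000.
serve.run(Echo.bind())
```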
Cloud AI Services
AWS SageMaker
- HTTPS endpoint
- REST API
- Auto-scaling
- Multi-model
AWS Bedrock
- HTTPS endpoint
- REST API
- Foundation models
- Managed service
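Bedrock is invoked through the AWS SDK rather than a fixed local port. A sketch using `boto3`; the region and model ID are placeholders, and the request body schema varies by model family (this one uses the Anthropic messages format):

```python
import json
import boto3

# Credentials come from the usual AWS credential chain; the region
# and model ID below are placeholders.
client = boto3.client("bedrock-runtime", region_name="us-east-1")

resp = client.invoke_model(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 256,
        "messages": [{"role": "user", "content": "Say hello."}],
    }),
)
print(json.loads(resp["body"].read())["content"][0]["text"])
```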
Azure ML
- HTTPS endpoint
- REST + SDK
- Managed compute
- MLflow integration
Azure OpenAI
- HTTPS endpoint
- OpenAI-compatible
- Enterprise security
- Regional deploy
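A sketch using the `openai` package's `AzureOpenAI` client; the endpoint, deployment name, API version, and environment variable are placeholders from a typical setup:

```python
import os
from openai import AzureOpenAI

# Endpoint, API version, and deployment name come from your Azure
# resource; all values here are placeholders.
client = AzureOpenAI(
    azure_endpoint="https://my-resource.openai.azure.com",
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",
)

resp = client.chat.completions.create(
    model="my-gpt4o-deployment",  # the *deployment* name, not the model family
    messages=[{"role": "user", "content": "Say hello."}],
)
print(resp.choices[0].message.content)
```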
Google Vertex AI
- HTTPS endpoint
- REST + gRPC
- TPU support
- Model Garden
Groq
- HTTPS API
- OpenAI-compatible
- LPU inference
- Ultra-fast
Together AI
- HTTPS API
- OpenAI-compatible
- Open models
- Fine-tuning
Fireworks AI
- HTTPS API
- OpenAI-compatible
- Fast inference
- Function calling
Anyscale
- HTTPS API
- OpenAI-compatible
- Ray-based
- Scalable
Replicate
- HTTPS API
- REST API
- Model hosting
- Pay-per-use
Model Provider APIs
OpenAI
- HTTPS API
- REST API
- GPT-4, GPT-4o
- Assistants API
Anthropic
- HTTPS API
- REST API
- Claude models
- Tool use
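A sketch using Anthropic's official Python SDK; the model ID is a placeholder for any current Claude model:

```python
import anthropic

# Reads ANTHROPIC_API_KEY from the environment by default.
client = anthropic.Anthropic()

msg = client.messages.create(
    model="claude-3-5-sonnet-latest",  # placeholder for any current Claude model
    max_tokens=256,
    messages=[{"role": "user", "content": "Say hello."}],
)
print(msg.content[0].text)
```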
Google Gemini
- HTTPS API
- REST API
- Gemini Pro/Ultra
- Multimodal
Mistral AI
- HTTPS API
- OpenAI-compatible
- Mistral/Mixtral
- Function calling
Cohere
- HTTPS API
- REST API
- Command models
- Embeddings + Rerank
AI21 Labs
- HTTPS API
- REST API
- Jurassic models
- Specialized tasks
Specialized & Platform-Specific
MLX (Apple Silicon)
- Port: 8080
- REST API
- Metal acceleration
- Unified memory
KServe
- Port: 8080
- REST + gRPC
- Kubernetes native
- Serverless
Seldon Core
- Port: 9000
- REST + gRPC
- ML deployment
- A/B testing
OpenAI-Compatible
- Any port
- Custom endpoints
- API key auth
- Drop-in support
Backend by Use Case
| Use Case | Recommended Backends | Why |
|----------|----------------------|-----|
| Development | Ollama, LocalAI, LMDeploy | Easy setup, free, local |
| Production API | vLLM, TGI, Triton, NIM | High throughput, batching, enterprise |
| Edge / IoT | llama.cpp, MLC LLM, ExLlamaV2 | CPU inference, small footprint, quantized |
| Apple Silicon | MLX, Ollama, MLC LLM | Metal acceleration, unified memory |
| Low Latency | Groq, Fireworks, DeepSpeed-MII | Optimized hardware, fast inference |
| Enterprise Cloud | Azure OpenAI, Bedrock, Vertex AI | Compliance, SLA, managed |
| Open Models | Together AI, Anyscale, Replicate | Llama, Mistral, open weights |
| Kubernetes | KServe, Seldon, Ray Serve | Cloud-native, auto-scaling |
Availability
Eldric Client (CLI + GUI): Ollama, vLLM, llama.cpp, TGI, MLX, OpenAI-compatible endpoints
Eldric Multi-API: All 32+ backends with unified API, load balancing, and failover
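To illustrate what load balancing and failover across OpenAI-compatible backends look like conceptually (a sketch of the idea only, not Eldric Multi-API's actual implementation; all endpoints are hypothetical):

```python
import itertools
import requests

# Hypothetical pool of OpenAI-compatible endpoints; in a real gateway
# this would come from configuration, with health checks and weights.
ENDPOINTS = [
    "http://vllm-1:8000/v1",
    "http://vllm-2:8000/v1",
    "https://api.example-cloud.invalid/v1",  # hypothetical cloud fallback
]
_pool = itertools.cycle(ENDPOINTS)  # round-robin load balancing

def chat(model: str, messages: list[dict]) -> dict:
    """Try each backend once, starting from the round-robin cursor."""
    last_err: Exception | None = None
    for _ in range(len(ENDPOINTS)):
        base = next(_pool)
        try:
            resp = requests.post(
                f"{base}/chat/completions",
                json={"model": model, "messages": messages},
                timeout=30,
            )
            resp.raise_for_status()
            return resp.json()
        except requests.RequestException as err:
            last_err = err  # fail over to the next backend in the pool
    raise RuntimeError("all backends failed") from last_err
```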