Provider Selection Guide¶
This guide helps you choose the best LLM provider based on your requirements.
Decision Flowchart¶
```mermaid
flowchart TD
    A[Start] --> B{Privacy Required?}
    B -->|Yes| C[Local Providers]
    B -->|No| D{Budget?}
    C --> C1{Hardware Available?}
    C1 -->|GPU 8GB+| C2[Ollama / vLLM]
    C1 -->|CPU Only| C3[Ollama with small models]
    D -->|Free| E[Groq / Gemini Free]
    D -->|Low $10-50/mo| F[Together / DeepSeek]
    D -->|Production| G{Priority?}
    G -->|Quality| H[OpenAI GPT-4o / Claude]
    G -->|Speed| I[Groq / Cerebras]
    G -->|Cost| J[GPT-4o-mini / Together]
    G -->|Compliance| K[Azure / Bedrock]
```
Quick Recommendations¶
By Primary Requirement¶
| Requirement | Provider | Model | Why |
|---|---|---|---|
| Getting Started | Groq | llama-3.3-70b-versatile | Free, fast, easy setup |
| Best Quality | Anthropic | claude-3.5-sonnet | Best reasoning, safe |
| Best Value | OpenAI | gpt-4o-mini | Quality/cost balance |
| Lowest Cost | DeepSeek | deepseek-v3 | $0.07/1M tokens |
| Fastest | Groq | llama-3.1-8b-instant | 500+ tokens/sec |
| Privacy | Ollama | llama3.2 | Fully local |
| Enterprise | Azure OpenAI | gpt-4o | SLAs, compliance |
| Long Context | Google | gemini-1.5-pro | 2M token context |
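Switching to any of these recommendations is a small config change. A minimal sketch, assuming only the `SentimatrixConfig`/`LLMConfig` fields shown in the fallback example later in this guide:

```python
from sentimatrix.config import SentimatrixConfig, LLMConfig

# Groq's free tier is the suggested starting point
config = SentimatrixConfig(
    llm=LLMConfig(
        provider="groq",
        model="llama-3.3-70b-versatile",
    )
)
```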
By Use Case¶

**Recommended: GPT-4o-mini or Groq**

- High volume review analysis
- Cost-effective
- Fast turnaround

**Recommended: Azure OpenAI or Claude**

- Compliance requirements
- High accuracy needed
- Audit trails

**Recommended: Local (Ollama/vLLM) or Azure**

- HIPAA compliance
- Data privacy
- No external API calls

**Recommended: Groq or Together**

- High volume
- Fast responses
- Cost-effective
Cost Optimization¶
Free Options¶
1. **Groq Free Tier**
    - 30 requests/min
    - 100K tokens/day
    - Great for development
2. **Google Gemini Free**
    - 15 requests/min
    - Good for low volume
3. **Ollama (Local)**
    - Unlimited (hardware cost only)
    - Best for privacy
Budget Tiers¶
| Monthly Budget | Recommended Setup |
|---|---|
| $0 | Groq free + Ollama fallback |
| $10-25 | DeepSeek + Groq |
| $25-50 | Together AI + DeepSeek |
| $50-100 | GPT-4o-mini primary |
| $100-500 | GPT-4o + Claude mix |
| $500+ | Enterprise (Azure/Bedrock) |
Cost Comparison Table¶
| Provider | Model | 1M Input | 1M Output | 10M Reviews* |
|---|---|---|---|---|
| Groq | LLaMA 3.3 70B | Free | Free | $0 |
| DeepSeek | V3 | $0.07 | $0.27 | ~$5 |
| Together | LLaMA 70B | $0.88 | $0.88 | ~$15 |
| OpenAI | GPT-4o-mini | $0.15 | $0.60 | ~$10 |
| OpenAI | GPT-4o | $2.50 | $10.00 | ~$175 |
| Anthropic | Claude 3.5 | $3.00 | $15.00 | ~$250 |
*Estimated for typical sentiment analysis workload
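Because pricing is linear in token counts, you can estimate your own workload rather than relying on the table's assumptions. A hypothetical sketch; the per-review token counts below are illustrative guesses, not measurements:

```python
def batch_cost(reviews: int, in_tok: int, out_tok: int,
               in_price: float, out_price: float) -> float:
    """Estimate USD cost for a batch of reviews.

    in_price / out_price are USD per 1M tokens, as in the table above.
    in_tok / out_tok are average tokens per review (your workload's numbers).
    """
    total_in_millions = reviews * in_tok / 1_000_000
    total_out_millions = reviews * out_tok / 1_000_000
    return total_in_millions * in_price + total_out_millions * out_price

# Hypothetical workload: 60 input and 5 output tokens per review on GPT-4o-mini
estimate = batch_cost(10_000_000, 60, 5, in_price=0.15, out_price=0.60)
```

Re-run the estimate whenever prompt length or batching changes; prompt overhead usually dominates input cost.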
Quality Comparison¶
Sentiment Analysis Accuracy¶
Based on internal benchmarks:
| Provider | Model | Accuracy | F1 Score |
|---|---|---|---|
| Anthropic | claude-3.5-sonnet | 94.2% | 0.941 |
| OpenAI | gpt-4o | 93.8% | 0.936 |
| OpenAI | gpt-4o-mini | 92.1% | 0.918 |
| Groq | llama-3.3-70b | 91.5% | 0.912 |
| DeepSeek | v3 | 90.8% | 0.904 |
| Together | llama-70b | 90.5% | 0.901 |
| Ollama | llama3.2 | 88.2% | 0.878 |
Summarization Quality¶
| Provider | Coherence | Accuracy | Conciseness |
|---|---|---|---|
| Claude 3.5 Sonnet | Excellent | Excellent | Excellent |
| GPT-4o | Excellent | Excellent | Good |
| GPT-4o-mini | Good | Good | Good |
| LLaMA 3.3 70B | Good | Good | Good |
| LLaMA 3.2 | Fair | Fair | Good |
Latency Comparison¶
Average response time for typical requests:
| Provider | Model | First Token | Full Response |
|---|---|---|---|
| Groq | LLaMA 8B | 50ms | 200ms |
| Groq | LLaMA 70B | 100ms | 400ms |
| Cerebras | LLaMA 70B | 80ms | 300ms |
| OpenAI | GPT-4o-mini | 200ms | 800ms |
| OpenAI | GPT-4o | 300ms | 1.2s |
| Anthropic | Claude 3.5 | 250ms | 1.0s |
| Ollama (GPU) | LLaMA 8B | 150ms | 500ms |
| Ollama (CPU) | LLaMA 8B | 500ms | 3s |
Feature Requirements¶
Vision Support¶
For analyzing images (product photos, screenshots):
- OpenAI: GPT-4o, GPT-4-turbo
- Anthropic: Claude 3.5 Sonnet, Claude 3
- Google: Gemini 1.5 Pro, Gemini 2.0
- Together: LLaVA models
- Ollama: LLaVA, BakLLaVA
Long Context¶
For analyzing many reviews at once:
| Provider | Model | Context |
|---|---|---|
| Google | gemini-1.5-pro | 2M tokens |
| Anthropic | claude-3.5 | 200K tokens |
| OpenAI | gpt-4o | 128K tokens |
| Groq | llama-3.3-70b | 128K tokens |
Structured Output¶
For reliable JSON responses:
- Best: OpenAI (JSON mode), Anthropic
- Good: Groq, Together, Fireworks
- Variable: Local models (depends on prompting)
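Even with providers that offer a JSON mode, it is worth parsing responses defensively: local models in particular may wrap the JSON in prose or code fences. A generic helper sketch (not part of the Sentimatrix API):

```python
import json
import re

def extract_json(text: str):
    """Best-effort extraction of a JSON object from an LLM response."""
    # Fast path: the whole response is already valid JSON
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    # Fallback: grab the outermost {...} span (handles prose or ``` fences)
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match:
        return json.loads(match.group(0))
    raise ValueError("no JSON object found in response")

result = extract_json('Sure! Here is the result:\n{"sentiment": "positive"}')
```

The greedy regex is deliberately simple; for deeply nested or multiple objects you would want a real streaming parser.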
Fallback Strategies¶
Recommended Fallback Chain¶
```python
from sentimatrix.config import SentimatrixConfig, LLMConfig

config = SentimatrixConfig(
    llm=LLMConfig(
        provider="groq",
        model="llama-3.3-70b-versatile",
        fallback=[
            # Fallback 1: Another fast provider
            {"provider": "together", "model": "meta-llama/Llama-3-70b-chat-hf"},
            # Fallback 2: Premium provider
            {"provider": "openai", "model": "gpt-4o-mini"},
            # Fallback 3: Local (always available)
            {"provider": "ollama", "model": "llama3.2"},
        ]
    )
)
```
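The behaviour of a chain like this can be illustrated with a provider-agnostic sketch (a hypothetical helper, not the Sentimatrix implementation): try each provider in order and return the first successful response.

```python
from typing import Callable, Sequence

def call_with_fallback(providers: Sequence[Callable[[str], str]], prompt: str) -> str:
    """Try each provider callable in order; return the first success."""
    errors: list[Exception] = []
    for provider in providers:
        try:
            return provider(prompt)
        except Exception as exc:  # rate limit, timeout, outage...
            errors.append(exc)
    raise RuntimeError(f"all {len(providers)} providers failed: {errors}")

# Stub providers standing in for real API clients
def flaky(prompt: str) -> str:
    raise TimeoutError("primary provider unavailable")

def local(prompt: str) -> str:
    return "positive"

print(call_with_fallback([flaky, local], "Classify: great product!"))  # prints "positive"
```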
Summary Recommendations¶
For Most Users¶
Start with Groq - Free tier, fast, good quality:
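A minimal sketch, assuming the `LLMConfig` fields shown in the fallback example above:

```python
from sentimatrix.config import SentimatrixConfig, LLMConfig

config = SentimatrixConfig(
    llm=LLMConfig(provider="groq", model="llama-3.3-70b-versatile")
)
```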
For Production¶
Use GPT-4o-mini - Best balance of quality, cost, reliability:
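Sketch, same assumptions as above:

```python
from sentimatrix.config import SentimatrixConfig, LLMConfig

config = SentimatrixConfig(
    llm=LLMConfig(provider="openai", model="gpt-4o-mini")
)
```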
For Best Quality¶
Use Claude 3.5 Sonnet - Best reasoning and safety:
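Sketch; the exact model ID string may differ from the short name used in the tables above:

```python
from sentimatrix.config import SentimatrixConfig, LLMConfig

config = SentimatrixConfig(
    llm=LLMConfig(provider="anthropic", model="claude-3.5-sonnet")
)
```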
For Privacy¶
Use Ollama - Fully local, no data leaves your machine:
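Sketch, same assumptions as above; requires a local Ollama server with the model pulled:

```python
from sentimatrix.config import SentimatrixConfig, LLMConfig

config = SentimatrixConfig(
    llm=LLMConfig(provider="ollama", model="llama3.2")
)
```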