Groq¶
Groq provides ultra-fast LLM inference using custom hardware (LPUs). It offers a generous free tier, making it perfect for getting started with Sentimatrix.
**Status:** Stable
Quick Facts¶
| Property | Value |
|---|---|
| Speed | Ultra-fast (100+ tokens/s) |
| Free Tier | Yes (generous limits) |
| Models | LLaMA 3.3, Mixtral, Gemma |
| Streaming | Supported |
| Functions | Supported |
| Vision | Not supported |
Setup¶
Get API Key¶
- Go to console.groq.com
- Create an account (free)
- Navigate to API Keys
- Create a new API key
Configure¶
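A minimal setup sketch, assuming Sentimatrix (like the official Groq SDK) reads the key from the `GROQ_API_KEY` environment variable — check your installation's docs to confirm the variable name:

```python
import os

# Assumption: Sentimatrix picks up the Groq API key from the
# GROQ_API_KEY environment variable. Set it before creating a client.
os.environ["GROQ_API_KEY"] = "gsk_your_key_here"  # placeholder key
```

In production, prefer exporting the variable in your shell or secrets manager rather than hard-coding it in source.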
Available Models¶
| Model | Context | Speed | Best For |
|---|---|---|---|
| llama-3.3-70b-versatile | 128K | Fast | General use, best quality |
| llama-3.1-70b-versatile | 128K | Fast | General use |
| llama-3.1-8b-instant | 128K | Ultra-fast | Quick tasks |
| mixtral-8x7b-32768 | 32K | Fast | Code, reasoning |
| gemma2-9b-it | 8K | Ultra-fast | Lightweight tasks |
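The table above can be encoded as a small lookup that selects a model id by what you optimize for. `pick_model` and the priority labels are hypothetical helpers for illustration, not part of Sentimatrix:

```python
# Hypothetical mapping from a priority label to a Groq model id.
# The model ids are Groq's; the labels mirror the "Best For" column above.
GROQ_MODELS = {
    "quality": "llama-3.3-70b-versatile",  # 128K context, best quality
    "speed": "llama-3.1-8b-instant",       # 128K context, ultra-fast
    "code": "mixtral-8x7b-32768",          # 32K context, code/reasoning
    "light": "gemma2-9b-it",               # 8K context, lightweight tasks
}

def pick_model(priority: str) -> str:
    """Return a Groq model id for a priority label, defaulting to quality."""
    return GROQ_MODELS.get(priority, GROQ_MODELS["quality"])
```

Defaulting to the 70B model keeps unknown labels on the highest-quality path.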
Usage Examples¶
Basic Summarization¶
```python
import asyncio

from sentimatrix import Sentimatrix
from sentimatrix.config import SentimatrixConfig, LLMConfig

config = SentimatrixConfig(
    llm=LLMConfig(
        provider="groq",
        model="llama-3.3-70b-versatile",
    )
)

async def main():
    async with Sentimatrix(config) as sm:
        reviews = [
            {"text": "Amazing product, works perfectly!"},
            {"text": "Good value for the price."},
            {"text": "Shipping was slow but product is fine."},
        ]
        summary = await sm.summarize_reviews(reviews)
        print(summary)

asyncio.run(main())
```
Generate Insights¶
```python
async with Sentimatrix(config) as sm:
    insights = await sm.generate_insights(reviews)

    print("Pros:")
    for pro in insights.pros:
        print(f"  + {pro}")

    print("\nCons:")
    for con in insights.cons:
        print(f"  - {con}")
```
Streaming Responses¶
```python
async with Sentimatrix(config) as sm:
    async for chunk in sm.stream_summary(reviews):
        print(chunk, end="", flush=True)
```
Rate Limits¶
Groq's free tier has generous limits:
| Model | Requests/min | Tokens/min | Tokens/day |
|---|---|---|---|
| LLaMA 3.3 70B | 30 | 6,000 | 100,000 |
| LLaMA 3.1 8B | 30 | 20,000 | 500,000 |
| Mixtral 8x7B | 30 | 5,000 | 100,000 |
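To translate a daily token budget from the table into a rough request budget, divide it by your average request size. This is simple arithmetic, not a Sentimatrix API:

```python
def requests_per_day(tokens_per_day: int, avg_tokens_per_request: int) -> int:
    """Rough upper bound on daily requests under a tokens-per-day cap."""
    return tokens_per_day // avg_tokens_per_request

# Example: LLaMA 3.3 70B's 100,000 tokens/day with ~2,000-token requests
# leaves room for roughly 50 requests per day.
print(requests_per_day(100_000, 2_000))  # → 50
```

Remember that the per-minute caps (requests/min and tokens/min) bind first for bursty workloads.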
Handling Rate Limits¶
Sentimatrix handles rate limits automatically:
```python
config = SentimatrixConfig(
    llm=LLMConfig(
        provider="groq",
        model="llama-3.3-70b-versatile",
        rate_limit={
            "requests_per_minute": 25,  # stay under the 30/min limit
            "retry_on_rate_limit": True,
        },
    )
)
```
Configuration Options¶
```python
LLMConfig(
    provider="groq",
    model="llama-3.3-70b-versatile",

    # Generation settings
    temperature=0.7,   # creativity (0-2)
    max_tokens=4096,   # max response length
    top_p=0.9,         # nucleus sampling

    # Reliability
    timeout=30,        # request timeout
    max_retries=3,     # retry count
    retry_delay=1.0,   # initial retry delay

    # Rate limiting
    rate_limit={
        "requests_per_minute": 25,
        "tokens_per_minute": 5000,
    },
)
```
Best Practices¶
- **Use the Right Model**
    - llama-3.3-70b-versatile for best quality
    - llama-3.1-8b-instant for speed-critical tasks
- **Handle Rate Limits**
    - Implement exponential backoff
    - Cache responses when possible
- **Optimize Prompts**
    - Keep prompts concise
    - Use structured output formats
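The exponential-backoff advice above can be sketched as a generic retry wrapper. This is illustrative only; with `retry_on_rate_limit` enabled, Sentimatrix retries for you:

```python
import random
import time

def with_backoff(fn, max_retries=3, base_delay=1.0):
    # Retry fn() with exponential backoff plus a little jitter.
    # Generic sketch; not part of the Sentimatrix API.
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_retries:
                raise
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))
```

Doubling the delay on each attempt (1s, 2s, 4s, ...) spaces retries out past the per-minute window; the jitter avoids synchronized retries from concurrent workers.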
Troubleshooting¶
**Rate limit exceeded**

Reduce request frequency (e.g. lower `requests_per_minute` in the rate-limit config above) or upgrade to a paid plan.
**Model not found**

Check the currently available models at console.groq.com/docs/models; Groq deprecates model ids over time.
Related¶
- Provider Overview
- OpenAI - Alternative cloud provider
- Ollama - Local alternative