Groq¶
Groq provides ultra-fast LLM inference using custom hardware (LPUs). It offers a generous free tier, making it perfect for getting started with Sentimatrix.
**Status:** Stable
Quick Facts¶
| Property | Value |
|---|---|
| Speed | Ultra-fast (100+ tokens/s) |
| Free Tier | Yes (generous limits) |
| Models | LLaMA 3.3, Mixtral, Gemma |
| Streaming | Supported |
| Functions | Supported |
| Vision | Not supported |
Setup¶
Get API Key¶
- Go to console.groq.com
- Create an account (free)
- Navigate to API Keys
- Create a new API key
Configure¶
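A minimal setup sketch, assuming Sentimatrix (like the official Groq SDK) reads the key from the `GROQ_API_KEY` environment variable — check your installation's docs to confirm the variable name:

```python
import os

# Assumption: Sentimatrix picks up the Groq API key from the
# GROQ_API_KEY environment variable. Set it before creating a client.
os.environ["GROQ_API_KEY"] = "gsk_your_key_here"  # placeholder key
```

In production, prefer exporting the variable in your shell or secrets manager rather than hard-coding it in source.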
Available Models¶
| Model | Context | Speed | Best For |
|---|---|---|---|
| llama-3.3-70b-versatile | 128K | Fast | General use, best quality |
| llama-3.1-70b-versatile | 128K | Fast | General use |
| llama-3.1-8b-instant | 128K | Ultra-fast | Quick tasks |
| mixtral-8x7b-32768 | 32K | Fast | Code, reasoning |
| gemma2-9b-it | 8K | Ultra-fast | Lightweight tasks |
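The table above can be encoded as a small lookup that selects a model id by what you optimize for. `pick_model` and the priority labels are hypothetical helpers for illustration, not part of Sentimatrix:

```python
# Hypothetical mapping from a priority label to a Groq model id.
# The model ids are Groq's; the labels mirror the "Best For" column above.
GROQ_MODELS = {
    "quality": "llama-3.3-70b-versatile",  # 128K context, best quality
    "speed": "llama-3.1-8b-instant",       # 128K context, ultra-fast
    "code": "mixtral-8x7b-32768",          # 32K context, code/reasoning
    "light": "gemma2-9b-it",               # 8K context, lightweight tasks
}

def pick_model(priority: str) -> str:
    """Return a Groq model id for a priority label, defaulting to quality."""
    return GROQ_MODELS.get(priority, GROQ_MODELS["quality"])
```

Defaulting to the 70B model keeps unknown labels on the highest-quality path.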
Usage Examples¶
Basic Summarization¶
```python
import asyncio

from sentimatrix import Sentimatrix
from sentimatrix.config import SentimatrixConfig, LLMConfig

config = SentimatrixConfig(
    llm=LLMConfig(
        provider="groq",
        model="llama-3.3-70b-versatile",
    )
)

async def main():
    async with Sentimatrix(config) as sm:
        reviews = [
            {"text": "Amazing product, works perfectly!"},
            {"text": "Good value for the price."},
            {"text": "Shipping was slow but product is fine."},
        ]
        summary = await sm.summarize_reviews(reviews)
        print(summary)

asyncio.run(main())
```
Generate Insights¶
```python
async with Sentimatrix(config) as sm:
    insights = await sm.generate_insights(reviews)

    print("Pros:")
    for pro in insights.pros:
        print(f"  + {pro}")

    print("\nCons:")
    for con in insights.cons:
        print(f"  - {con}")
```
Streaming Responses¶
```python
async with Sentimatrix(config) as sm:
    async for chunk in sm.stream_summary(reviews):
        print(chunk, end="", flush=True)
```
Rate Limits¶
Groq's free tier has generous limits:
| Model | Requests/min | Tokens/min | Tokens/day |
|---|---|---|---|
| LLaMA 3.3 70B | 30 | 6,000 | 100,000 |
| LLaMA 3.1 8B | 30 | 20,000 | 500,000 |
| Mixtral 8x7B | 30 | 5,000 | 100,000 |
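To translate a daily token budget from the table into a rough request budget, divide it by your average request size. This is simple arithmetic, not a Sentimatrix API:

```python
def requests_per_day(tokens_per_day: int, avg_tokens_per_request: int) -> int:
    """Rough upper bound on daily requests under a tokens-per-day cap."""
    return tokens_per_day // avg_tokens_per_request

# Example: LLaMA 3.3 70B's 100,000 tokens/day with ~2,000-token requests
# leaves room for roughly 50 requests per day.
print(requests_per_day(100_000, 2_000))  # → 50
```

Remember that the per-minute caps (requests/min and tokens/min) bind first for bursty workloads.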
Handling Rate Limits¶
Sentimatrix handles rate limits automatically:
```python
config = SentimatrixConfig(
    llm=LLMConfig(
        provider="groq",
        model="llama-3.3-70b-versatile",
        rate_limit={
            "requests_per_minute": 25,  # stay under the 30/min limit
            "retry_on_rate_limit": True,
        },
    )
)
```
Configuration Options¶
```python
LLMConfig(
    provider="groq",
    model="llama-3.3-70b-versatile",

    # Generation settings
    temperature=0.7,   # creativity (0-2)
    max_tokens=4096,   # max response length
    top_p=0.9,         # nucleus sampling

    # Reliability
    timeout=30,        # request timeout
    max_retries=3,     # retry count
    retry_delay=1.0,   # initial retry delay

    # Rate limiting
    rate_limit={
        "requests_per_minute": 25,
        "tokens_per_minute": 5000,
    },
)
```
Best Practices¶
- **Use the Right Model**
    - llama-3.3-70b-versatile for best quality
    - llama-3.1-8b-instant for speed-critical tasks
- **Handle Rate Limits**
    - Implement exponential backoff
    - Cache responses when possible
- **Optimize Prompts**
    - Keep prompts concise
    - Use structured output formats
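The exponential-backoff advice above can be sketched as a generic retry wrapper. This is illustrative only; with `retry_on_rate_limit` enabled, Sentimatrix retries for you:

```python
import random
import time

def with_backoff(fn, max_retries=3, base_delay=1.0):
    # Retry fn() with exponential backoff plus a little jitter.
    # Generic sketch; not part of the Sentimatrix API.
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_retries:
                raise
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))
```

Doubling the delay on each attempt (1s, 2s, 4s, ...) spaces retries out past the per-minute window; the jitter avoids synchronized retries from concurrent workers.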
Troubleshooting¶
**Rate limit exceeded**

Reduce request frequency (e.g. lower `requests_per_minute` in the rate-limit config above) or upgrade to a paid plan.
**Model not found**

Check the currently available models at console.groq.com/docs/models; Groq deprecates model ids over time.
Related¶
- Provider Overview
- OpenAI - Alternative cloud provider
- Ollama - Local alternative