
Cerebras

Cerebras provides some of the fastest LLM inference available, powered by its custom Wafer-Scale Engine chips.

Quick Start

from sentimatrix import Sentimatrix
from sentimatrix.config import SentimatrixConfig, LLMConfig

config = SentimatrixConfig(
    llm=LLMConfig(
        provider="cerebras",
        model="llama3.1-70b",
        api_key="your-cerebras-key"  # Or set CEREBRAS_API_KEY
    )
)

async with Sentimatrix(config) as sm:
    summary = await sm.summarize_reviews(reviews)  # reviews: list of review texts

Available Models

Model          Context   Speed
llama3.1-70b   128K      2000+ tokens/sec
llama3.1-8b    128K      4000+ tokens/sec

Configuration

LLMConfig(
    provider="cerebras",
    model="llama3.1-70b",
    api_key="your-key",           # Or CEREBRAS_API_KEY env var
    temperature=0.7,              # sampling temperature
    max_tokens=4096,              # max tokens in the response
    timeout=30,                   # request timeout in seconds
)

Environment Variables

export CEREBRAS_API_KEY="your-cerebras-api-key"
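When the key is set in the environment, the `api_key` field can be omitted from the config. A minimal sketch of the usual lookup pattern (this shows standard `os.environ` semantics, not Sentimatrix internals — `resolve_api_key` is a hypothetical helper):

```python
import os

def resolve_api_key(explicit_key=None, env_var="CEREBRAS_API_KEY"):
    """Return an explicitly passed key, else fall back to the environment."""
    key = explicit_key or os.environ.get(env_var)
    if key is None:
        raise RuntimeError(f"No API key: pass api_key or set {env_var}")
    return key

os.environ["CEREBRAS_API_KEY"] = "demo-key"
print(resolve_api_key())  # falls back to the env var → demo-key
```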

Features

  • Ultra-Fast: 10-20x faster than GPU inference
  • Low Latency: Sub-100ms time-to-first-token
  • High Throughput: 2000+ tokens/second
  • OpenAI Compatible: Standard API format

Performance

Metric                Cerebras   GPU Cloud
Tokens/sec            2000+      100-200
Time to first token   <100ms     500ms+
Latency (70B)         ~2s        20-30s
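The throughput figures above are consistent with the "10-20x faster" claim; a quick sanity check of the arithmetic:

```python
cerebras_tps = 2000                          # tokens/sec from the table
gpu_tps_low, gpu_tps_high = 100, 200         # GPU cloud range from the table

speedup_low = cerebras_tps / gpu_tps_high    # vs. the fastest GPU figure
speedup_high = cerebras_tps / gpu_tps_low    # vs. the slowest GPU figure
print(speedup_low, speedup_high)  # → 10.0 20.0
```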

Use Cases

  • Real-time Applications: Chat, live analysis
  • High-Volume Processing: Batch analysis at scale
  • Interactive UX: Instant responses
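For high-volume processing, issuing requests in bounded chunks keeps concurrency under control while still exploiting the backend's throughput. A sketch of the chunking pattern — `analyze` here is a hypothetical stand-in for a per-review Sentimatrix call, not a real API:

```python
import asyncio

async def analyze(review: str) -> str:
    # Hypothetical stand-in for a per-review LLM call.
    await asyncio.sleep(0)
    return review.upper()

async def analyze_in_chunks(reviews, chunk_size=10):
    """Run at most chunk_size concurrent requests, preserving input order."""
    results = []
    for i in range(0, len(reviews), chunk_size):
        chunk = reviews[i : i + chunk_size]
        results.extend(await asyncio.gather(*(analyze(r) for r in chunk)))
    return results

print(asyncio.run(analyze_in_chunks(["great", "bad", "ok"], chunk_size=2)))
# → ['GREAT', 'BAD', 'OK']
```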

Example: Speed Test

import time

async with Sentimatrix(config) as sm:
    start = time.perf_counter()  # monotonic clock, preferred for timing

    # Process the first 100 reviews
    batch = reviews[:100]
    results = await sm.analyze_batch(batch)

    elapsed = time.perf_counter() - start
    print(f"Processed {len(batch)} reviews in {elapsed:.2f}s")
    print(f"Rate: {len(batch)/elapsed:.1f} reviews/sec")