Configuration

Sentimatrix uses Pydantic V2 for centralized configuration management with full validation, type safety, and multiple loading methods.

Configuration Methods

YAML/JSON Files

Best for: Production deployments, version control

Environment Variables

Best for: Docker, CI/CD, secrets

Python Objects

Best for: Dynamic configuration, testing

Quick Start

Minimal Configuration

# No config needed for basic usage
async with Sentimatrix() as sm:
    result = await sm.analyze("Hello world!")

With LLM Provider

from sentimatrix import Sentimatrix
from sentimatrix.config import SentimatrixConfig, LLMConfig

config = SentimatrixConfig(
    llm=LLMConfig(
        provider="groq",
        model="llama-3.3-70b-versatile"
    )
)

async with Sentimatrix(config) as sm:
    summary = await sm.summarize_reviews(reviews)
Or from a YAML file:

sentimatrix.yaml
llm:
  provider: groq
  model: llama-3.3-70b-versatile

# Load from specific file
config = SentimatrixConfig.from_file("sentimatrix.yaml")
async with Sentimatrix(config) as sm:
    summary = await sm.summarize_reviews(reviews)

Or from environment variables:

export GROQ_API_KEY="gsk_..."
export SENTIMATRIX_LLM__PROVIDER="groq"
export SENTIMATRIX_LLM__MODEL="llama-3.3-70b-versatile"

# Automatically loads from environment
config = SentimatrixConfig.from_env()

Configuration Classes

Sentimatrix uses a hierarchy of Pydantic configuration classes:

SentimatrixConfig (main)
├── LLMConfig
│   ├── RetryConfig
│   └── RateLimitConfig
├── ScraperConfig
│   ├── ProxyConfig
│   ├── RetryConfig
│   └── RateLimitConfig
├── ModelConfig
├── CacheConfig
├── LogConfig
├── OutputConfig
└── FallbackConfig

LLMConfig

Configure LLM providers for summarization and insights:

class LLMConfig(BaseModel):
    provider: LLMProvider = "openai"      # LLM provider to use
    model: str = "gpt-4o-mini"            # Model name/identifier
    api_key: Optional[str] = None         # API key (can use env var)
    api_base: Optional[str] = None        # Custom API base URL
    organization: Optional[str] = None    # Organization ID
    timeout: int = 30                     # Request timeout (5-300 seconds)
    max_tokens: int = 1024                # Max tokens to generate (1-128000)
    temperature: float = 0.7              # Sampling temperature (0.0-2.0)
    top_p: float = 1.0                    # Top-p sampling (0.0-1.0)
    retry: RetryConfig                    # Retry configuration
    rate_limit: RateLimitConfig           # Rate limit configuration

Supported Providers:

Core: openai, anthropic, gemini
Cloud Enterprise: azure_openai, bedrock
Fast Inference: groq, cerebras, fireworks, together
Router/Gateway: openrouter
Specialized: mistral, cohere, deepseek
Local Inference: ollama, lmstudio, vllm, llamacpp, textgen, exllamav2
Legacy: huggingface

# YAML example
llm:
  provider: groq
  model: llama-3.3-70b-versatile
  temperature: 0.7
  max_tokens: 4096
  timeout: 30
  retry:
    max_retries: 3
    initial_delay: 1.0
  rate_limit:
    requests_per_second: 1.0
    concurrent_requests: 5

ScraperConfig

Configure web scraping behavior:

class ScraperConfig(BaseModel):
    provider: ScraperProvider = "playwright"   # Scraper provider
    headless: bool = True                      # Run browser headless
    timeout: int = 30                          # Page load timeout (5-120s)
    wait_for_selector: Optional[str] = None    # CSS selector to wait for
    user_agent: Optional[str] = None           # Custom user agent
    viewport_width: int = 1920                 # Browser width (320-3840)
    viewport_height: int = 1080                # Browser height (240-2160)
    proxy: ProxyConfig                         # Proxy configuration
    rate_limit: RateLimitConfig                # Rate limit configuration
    retry: RetryConfig                         # Retry configuration
    screenshots: bool = False                  # Capture screenshots
    screenshot_dir: Optional[str] = None       # Screenshot directory

Supported Scraper Providers:

Browser: playwright, selenium
HTTP: httpx, requests
Commercial API: scraperapi, brightdata, oxylabs, apify, zyte, firecrawl

# YAML example
scrapers:
  provider: playwright
  headless: true
  timeout: 30
  viewport_width: 1920
  viewport_height: 1080
  proxy:
    enabled: false
  rate_limit:
    requests_per_second: 2.0
    concurrent_requests: 5
  retry:
    max_retries: 3
    exponential_base: 2.0

ModelConfig

Configure ML models for analysis:

class ModelConfig(BaseModel):
    sentiment_model: str = "cardiffnlp/twitter-roberta-base-sentiment-latest"
    emotion_model: str = "SamLowe/roberta-base-go_emotions"
    device: Literal["auto", "cpu", "cuda", "mps"] = "auto"
    batch_size: int = 32                  # Batch size (1-512)
    max_length: int = 512                 # Max sequence length (32-4096)
    use_quantization: bool = False        # Enable model quantization
    cache_models: bool = True             # Cache loaded models

# YAML example
models:
  sentiment_model: cardiffnlp/twitter-roberta-base-sentiment-latest
  emotion_model: SamLowe/roberta-base-go_emotions
  device: auto
  batch_size: 32
  max_length: 512
  use_quantization: false
  cache_models: true

CacheConfig

Configure caching for performance:

class CacheConfig(BaseModel):
    enabled: bool = True                  # Enable caching
    backend: CacheBackend = "memory"      # memory, redis, sqlite
    ttl: int = 3600                       # TTL in seconds (0-86400)
    max_size: int = 1000                  # Max cache entries (10-100000)
    namespace: str = "sentimatrix"        # Cache key namespace
    redis_url: Optional[str] = None       # Required for redis backend
    sqlite_path: Optional[str] = None     # Required for sqlite backend
    compression: bool = False             # Enable value compression

# YAML example
cache:
  enabled: true
  backend: memory
  ttl: 3600
  max_size: 1000
  namespace: sentimatrix
  compression: false

# Redis backend
cache:
  enabled: true
  backend: redis
  redis_url: redis://localhost:6379
  ttl: 3600

# SQLite backend
cache:
  enabled: true
  backend: sqlite
  sqlite_path: .cache/sentimatrix.db

LogConfig

Configure logging output:

class LogConfig(BaseModel):
    level: LogLevel = "INFO"              # DEBUG, INFO, WARNING, ERROR, CRITICAL
    format: Literal["json", "text"] = "json"
    file_path: Optional[str] = None       # Log file path
    max_file_size_mb: int = 10            # Max file size (1-100 MB)
    backup_count: int = 5                 # Backup files (0-20)
    include_timestamp: bool = True
    include_caller: bool = True
    console_output: bool = True
    colorize: bool = True
# YAML example
logging:
  level: INFO
  format: json
  file_path: logs/sentimatrix.log
  max_file_size_mb: 10
  backup_count: 5
  console_output: true
  colorize: true

RetryConfig

Configure retry behavior (used by LLM and Scraper):

class RetryConfig(BaseModel):
    max_retries: int = 3                  # Max attempts (0-10)
    initial_delay: float = 1.0            # Initial delay (0.1-60s)
    max_delay: float = 60.0               # Max delay (1-300s)
    exponential_base: float = 2.0         # Backoff base (1-5)
    jitter: bool = True                   # Add random jitter
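
As an illustration, the delay schedule these defaults imply can be sketched as follows. The exact formula Sentimatrix applies internally is an assumption here, and `backoff_delays` is a hypothetical helper, not part of the public API:

```python
import random

def backoff_delays(max_retries=3, initial_delay=1.0, max_delay=60.0,
                   exponential_base=2.0, jitter=False):
    """Compute the wait before each retry: initial_delay grows by a factor
    of exponential_base per attempt, capped at max_delay."""
    delays = []
    for attempt in range(max_retries):
        delay = min(max_delay, initial_delay * exponential_base ** attempt)
        if jitter:
            # Random jitter spreads concurrent clients' retries apart
            delay *= random.uniform(0.5, 1.5)
        delays.append(delay)
    return delays

backoff_delays()  # → [1.0, 2.0, 4.0]
```

With jitter enabled (the default), each delay is additionally randomized so that many clients retrying at once do not hit the provider in lockstep.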

RateLimitConfig

Configure rate limiting (used by LLM and Scraper):

class RateLimitConfig(BaseModel):
    requests_per_second: float = 1.0      # RPS (0.1-100)
    requests_per_minute: int = 60         # RPM (1-6000)
    concurrent_requests: int = 5          # Max concurrent (1-100)
    backoff_factor: float = 2.0           # Backoff multiplier (1-10)
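
Conceptually, concurrent_requests maps to a semaphore and requests_per_second to pacing between task starts. This is a simplified sketch of the idea, not Sentimatrix's actual limiter:

```python
import asyncio

async def run_rate_limited(coros, requests_per_second=1.0, concurrent_requests=5):
    """Start one coroutine every 1/requests_per_second seconds, and let at
    most concurrent_requests of them be in flight at the same time."""
    sem = asyncio.Semaphore(concurrent_requests)

    async def paced(coro, delay):
        await asyncio.sleep(delay)   # pace the start times
        async with sem:              # cap in-flight work
            return await coro

    return await asyncio.gather(
        *(paced(c, i / requests_per_second) for i, c in enumerate(coros))
    )
```

For example, `asyncio.run(run_rate_limited(tasks, 2.0, 5))` would start a task every 0.5 s with at most five running concurrently.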

ProxyConfig

Configure proxy settings for scraping:

class ProxyConfig(BaseModel):
    enabled: bool = False
    provider: Optional[str] = None        # brightdata, oxylabs, custom
    url: Optional[str] = None             # Proxy URL
    username: Optional[str] = None
    password: Optional[str] = None
    rotation: bool = True                 # Enable rotation
    country: Optional[str] = None         # Target country code
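
For completeness, a YAML fragment mirroring the other examples on this page (the nesting under scrapers follows the earlier ScraperConfig examples; all values are placeholders):

```yaml
# YAML example
scrapers:
  proxy:
    enabled: true
    provider: brightdata
    url: http://proxy.example.com:8080
    username: my-user
    password: my-password
    rotation: true
    country: US
```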

FallbackConfig

Configure provider fallback chain:

class FallbackConfig(BaseModel):
    enabled: bool = True
    providers: List[LLMProvider] = ["openai", "anthropic", "groq"]
    max_attempts: int = 3                 # Max attempts (1-10)
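
The fallback chain behaves roughly like this sketch, under the assumption that each provider is tried in order until one succeeds (`call_with_fallback` is a hypothetical helper, not part of the public API):

```python
def call_with_fallback(providers, call, max_attempts=3):
    """Try each provider in order; return the first successful result,
    or raise once max_attempts providers have failed."""
    errors = []
    for provider in providers[:max_attempts]:
        try:
            return call(provider)
        except Exception as exc:
            errors.append((provider, exc))
    raise RuntimeError(f"All providers failed: {errors}")
```

With the default chain ["openai", "anthropic", "groq"], an openai outage would fall through to anthropic transparently.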

OutputConfig

Configure output handling:

class OutputConfig(BaseModel):
    default_format: Literal["json", "csv", "xlsx"] = "json"
    include_metadata: bool = True
    include_raw_data: bool = False
    pretty_print: bool = True
    datetime_format: str = "%Y-%m-%dT%H:%M:%SZ"
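
The default datetime_format is a standard strftime pattern; note that the trailing Z is a literal, so it assumes timestamps are in UTC:

```python
from datetime import datetime, timezone

fmt = "%Y-%m-%dT%H:%M:%SZ"  # OutputConfig.datetime_format default
datetime(2025, 1, 15, 9, 30, tzinfo=timezone.utc).strftime(fmt)
# → '2025-01-15T09:30:00Z'
```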

Full Configuration Example

sentimatrix.yaml
# LLM Provider Configuration
llm:
  provider: groq
  model: llama-3.3-70b-versatile
  temperature: 0.7
  max_tokens: 4096
  timeout: 30
  retry:
    max_retries: 3
    initial_delay: 1.0
    max_delay: 60.0
    exponential_base: 2.0
    jitter: true
  rate_limit:
    requests_per_second: 1.0
    requests_per_minute: 60
    concurrent_requests: 5

# Provider Fallback Chain
fallback:
  enabled: true
  providers:
    - openai
    - anthropic
    - groq
  max_attempts: 3

# Web Scraping Configuration
scrapers:
  provider: playwright
  headless: true
  timeout: 30
  viewport_width: 1920
  viewport_height: 1080
  proxy:
    enabled: false
    rotation: true
  rate_limit:
    requests_per_second: 2.0
    concurrent_requests: 5
  retry:
    max_retries: 3
    initial_delay: 1.0
    exponential_base: 2.0

# ML Model Configuration
models:
  sentiment_model: cardiffnlp/twitter-roberta-base-sentiment-latest
  emotion_model: SamLowe/roberta-base-go_emotions
  device: auto
  batch_size: 32
  max_length: 512
  use_quantization: false
  cache_models: true

# Caching Configuration
cache:
  enabled: true
  backend: memory
  ttl: 3600
  max_size: 1000
  namespace: sentimatrix
  compression: false

# Logging Configuration
logging:
  level: INFO
  format: json
  console_output: true
  colorize: true

# Output Configuration
output:
  default_format: json
  include_metadata: true
  pretty_print: true

# Global Settings
debug: false
dry_run: false

Loading Configuration

From File

Load from YAML or JSON files:

from sentimatrix.config import SentimatrixConfig

# Load from YAML
config = SentimatrixConfig.from_file("config.yaml")

# Load from JSON
config = SentimatrixConfig.from_file("config.json")

# With runtime overrides
config = SentimatrixConfig.from_file(
    "config.yaml",
    debug=True,
    llm={"model": "gpt-4o"}
)

From Environment Variables

Environment variables use the SENTIMATRIX_ prefix; nested fields are addressed with a double underscore (__):

from sentimatrix.config import SentimatrixConfig

# Automatically reads SENTIMATRIX_* environment variables
config = SentimatrixConfig.from_env()

# With overrides
config = SentimatrixConfig.from_env(debug=True)

Combining Methods

Configuration is merged in order of precedence:

  1. Runtime overrides (highest)
  2. Environment variables
  3. File configuration (lowest)

from sentimatrix.config import SentimatrixConfig, LLMConfig

# File provides base, env vars override, runtime overrides take precedence
config = SentimatrixConfig.from_file("config.yaml", debug=True)

# Create modified copy
new_config = config.with_overrides(
    llm={"provider": "anthropic", "model": "claude-3-5-sonnet"}
)

Saving Configuration

config = SentimatrixConfig(
    llm=LLMConfig(provider="groq", model="llama-3.3-70b-versatile")
)

# Save to YAML
config.save("my-config.yaml")

# Save to JSON
config.save("my-config.json")

# Convert to dict/YAML string
config_dict = config.to_dict()
yaml_string = config.to_yaml()

Environment Variables

All configuration can be set via environment variables with the SENTIMATRIX_ prefix. Nested values use a double underscore (__) as the delimiter:

# LLM Configuration
export SENTIMATRIX_LLM__PROVIDER="groq"
export SENTIMATRIX_LLM__MODEL="llama-3.3-70b-versatile"
export SENTIMATRIX_LLM__TEMPERATURE="0.7"
export SENTIMATRIX_LLM__MAX_TOKENS="4096"
export SENTIMATRIX_LLM__TIMEOUT="30"

# API Keys (standard provider pattern)
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export GROQ_API_KEY="gsk_..."
export GOOGLE_API_KEY="..."
export MISTRAL_API_KEY="..."

# Or via config with env: prefix
export MY_CUSTOM_KEY="sk-..."
# Then in config: api_key: "env:MY_CUSTOM_KEY"

# Scraper Configuration
export SENTIMATRIX_SCRAPERS__PROVIDER="playwright"
export SENTIMATRIX_SCRAPERS__HEADLESS="true"
export SENTIMATRIX_SCRAPERS__TIMEOUT="30"

# Rate Limiting
export SENTIMATRIX_SCRAPERS__RATE_LIMIT__REQUESTS_PER_SECOND="2.0"
export SENTIMATRIX_SCRAPERS__RATE_LIMIT__CONCURRENT_REQUESTS="5"

# Model Configuration
export SENTIMATRIX_MODELS__DEVICE="cuda"
export SENTIMATRIX_MODELS__BATCH_SIZE="64"

# Cache Configuration
export SENTIMATRIX_CACHE__ENABLED="true"
export SENTIMATRIX_CACHE__BACKEND="redis"
export SENTIMATRIX_CACHE__REDIS_URL="redis://localhost:6379"
export SENTIMATRIX_CACHE__TTL="3600"

# Logging
export SENTIMATRIX_LOGGING__LEVEL="DEBUG"
export SENTIMATRIX_LOGGING__FORMAT="json"

# Global
export SENTIMATRIX_DEBUG="true"
export SENTIMATRIX_DRY_RUN="false"
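
To illustrate how the __ delimiter maps prefixed variables onto nested fields, here is a simplified stand-in for what the settings loader does internally (a sketch only; real parsing and type coercion are handled by Pydantic):

```python
def parse_env(environ, prefix="SENTIMATRIX_", delim="__"):
    """Map prefixed environment variables onto a nested dict."""
    cfg = {}
    for key, value in environ.items():
        if not key.startswith(prefix):
            continue  # ignore unrelated variables
        path = key[len(prefix):].lower().split(delim)
        node = cfg
        for part in path[:-1]:
            node = node.setdefault(part, {})
        node[path[-1]] = value
    return cfg

env = {
    "SENTIMATRIX_LLM__PROVIDER": "groq",
    "SENTIMATRIX_LLM__MODEL": "llama-3.3-70b-versatile",
    "SENTIMATRIX_DEBUG": "true",
}
parse_env(env)
# → {'llm': {'provider': 'groq', 'model': 'llama-3.3-70b-versatile'}, 'debug': 'true'}
```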

Configuration Validation

Sentimatrix validates all configuration using Pydantic V2:

from sentimatrix.config import SentimatrixConfig, LLMConfig, CacheConfig
from pydantic import ValidationError

# Invalid provider
try:
    config = SentimatrixConfig(
        llm=LLMConfig(provider="invalid_provider")
    )
except ValidationError as e:
    print(f"Validation error: {e}")

# Invalid range
try:
    config = SentimatrixConfig(
        llm=LLMConfig(temperature=3.0)  # Max is 2.0
    )
except ValidationError as e:
    print(f"Temperature out of range: {e}")

# Missing required backend config
try:
    config = SentimatrixConfig(
        cache=CacheConfig(backend="redis")  # Missing redis_url
    )
except ValidationError as e:
    print(f"Redis URL required: {e}")

Validation Rules

LLMConfig.timeout: 5-300 seconds
LLMConfig.max_tokens: 1-128000
LLMConfig.temperature: 0.0-2.0
LLMConfig.top_p: 0.0-1.0
ScraperConfig.timeout: 5-120 seconds
ScraperConfig.viewport_width: 320-3840
ModelConfig.batch_size: 1-512
ModelConfig.max_length: 32-4096
CacheConfig.ttl: 0-86400 seconds
CacheConfig.max_size: 10-100000
RetryConfig.max_retries: 0-10
RateLimitConfig.requests_per_second: 0.1-100

Main Configuration Class

The SentimatrixConfig class is the main entry point:

class SentimatrixConfig(BaseSettings):
    """
    Main configuration class for Sentimatrix.

    Supports loading from:
    - YAML/JSON files
    - Environment variables (prefixed with SENTIMATRIX_)
    - Runtime overrides
    """

    # Sub-configurations
    llm: LLMConfig                    # LLM provider settings
    scrapers: ScraperConfig           # Web scraping settings
    models: ModelConfig               # ML model settings
    cache: CacheConfig                # Cache backend settings
    logging: LogConfig                # Logging settings
    output: OutputConfig              # Output format settings
    fallback: FallbackConfig          # Provider fallback chain

    # Global settings
    debug: bool = False               # Enable debug mode
    dry_run: bool = False             # No API calls mode

    # Class methods
    @classmethod
    def from_file(cls, path, **overrides) -> "SentimatrixConfig": ...

    @classmethod
    def from_env(cls, **overrides) -> "SentimatrixConfig": ...

    def to_dict(self) -> dict: ...
    def to_yaml(self) -> str: ...
    def save(self, path) -> None: ...
    def with_overrides(self, **overrides) -> "SentimatrixConfig": ...

Convenience Function

The get_config() helper wraps from_file() and from_env() in a single call:

from sentimatrix.config import get_config

# From file
config = get_config("config.yaml")

# From environment
config = get_config()

# With overrides
config = get_config("config.yaml", debug=True)