
Reddit Scraper

Scrape posts, comments, and discussions from Reddit subreddits and threads.

Quick Start

import asyncio

from sentimatrix import Sentimatrix


async def main():
    async with Sentimatrix() as sm:
        comments = await sm.scrape_reviews(
            "https://reddit.com/r/gaming/comments/...",
            platform="reddit",
            max_reviews=100,
        )

        for comment in comments:
            print(f"u/{comment.author}: {comment.text[:100]}...")
            print(f"  Score: {comment.helpful_count}")


# `async with` and `await` only work inside a coroutine, so run one explicitly.
asyncio.run(main())

Configuration

reviews = await sm.scrape_reviews(
    url="https://reddit.com/r/subreddit/comments/...",
    platform="reddit",
    max_reviews=200,           # Maximum number of comments to return
    include_replies=True,      # Include nested replies
    min_score=1,               # Minimum upvote score a comment must have
    sort_by="best",            # One of "best", "top", "new", "controversial"
)
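
To illustrate how these options might interact, here is a minimal sketch that applies the same three filters to an in-memory list. The `Comment` class and the `apply_options` function are hypothetical stand-ins, not part of Sentimatrix, and the sketch assumes that top-level comments carry `depth` 0 in their metadata; the library's actual filtering may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Comment:
    """Hypothetical stand-in for a scraped comment (not the library's Review)."""
    text: str
    helpful_count: int
    metadata: dict = field(default_factory=dict)


def apply_options(comments, max_reviews=200, include_replies=True, min_score=1):
    """Filter comments the way the options above are assumed to behave."""
    kept = []
    for comment in comments:
        # Drop comments below the minimum upvote score.
        if comment.helpful_count < min_score:
            continue
        # Assumption: depth 0 marks a top-level comment, anything deeper is a reply.
        if not include_replies and comment.metadata.get("depth", 0) > 0:
            continue
        kept.append(comment)
    # Cap the result at max_reviews items.
    return kept[:max_reviews]
```

With `include_replies=False`, only comments whose assumed depth is 0 survive; `min_score` is applied before the `max_reviews` cap in this sketch.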

Supported URL Formats

# Post URL
"https://www.reddit.com/r/gaming/comments/abc123/post_title/"

# Old Reddit
"https://old.reddit.com/r/gaming/comments/abc123/"

# Subreddit (scrapes top posts)
"https://reddit.com/r/gaming"

# Search results
"https://reddit.com/r/gaming/search?q=review"

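The URL shapes listed above can be told apart with the standard library alone. The sketch below is an illustration rather than the scraper's own routing logic; the function name `classify_reddit_url` and the returned dictionary keys are assumptions made for this example.

```python
from urllib.parse import parse_qs, urlparse


def classify_reddit_url(url):
    """Classify a Reddit URL as 'post', 'search', 'subreddit', or 'unknown'."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    # Accept reddit.com and subdomains such as www.reddit.com or old.reddit.com.
    if host != "reddit.com" and not host.endswith(".reddit.com"):
        return {"kind": "unknown"}

    parts = [p for p in parsed.path.split("/") if p]
    # Every supported format starts with /r/<subreddit>.
    if len(parts) < 2 or parts[0] != "r":
        return {"kind": "unknown"}

    info = {"subreddit": parts[1]}
    if len(parts) >= 4 and parts[2] == "comments":
        # /r/<subreddit>/comments/<post_id>/[<slug>/]
        info["kind"] = "post"
        info["post_id"] = parts[3]
    elif len(parts) == 3 and parts[2] == "search":
        # /r/<subreddit>/search?q=<query>
        info["kind"] = "search"
        info["query"] = parse_qs(parsed.query).get("q", [""])[0]
    elif len(parts) == 2:
        info["kind"] = "subreddit"
    else:
        info["kind"] = "unknown"
    return info
```

For example, `classify_reddit_url("https://old.reddit.com/r/gaming/comments/abc123/")` yields a `post` entry with `post_id` set to `abc123`.
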
Review Object

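from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Review:
    text: str                    # Comment or post body
    author: str                  # u/username
    posted_date: datetime        # Date the comment or post was submitted
    helpful_count: int           # Upvote score
    rating: None                 # Not applicable on Reddit
    platform: str = "reddit"
    # Fields after a defaulted field need a default too, hence the factory.
    metadata: dict = field(default_factory=dict)  # Extra data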

Metadata Fields

comment.metadata = {
    "subreddit": "gaming",
    "post_id": "abc123",
    "comment_id": "xyz789",
    "is_post": False,
    "parent_id": "t1_abc",
    "depth": 2,
    "permalink": "/r/gaming/comments/..."
}
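
The `parent_id` and `comment_id` fields are enough to rebuild the reply tree. The following sketch assumes that `parent_id` follows Reddit's fullname convention (`t1_` prefix for a comment, `t3_` prefix for a post) and that `comment_id` is stored without a prefix, as in the sample above. The helper names `build_thread` and `walk` are hypothetical, and the depth they compute is zero-based, which may not match the `depth` value the scraper reports.

```python
from collections import defaultdict


def build_thread(metas):
    """Map each parent fullname to the IDs of the comments replying to it."""
    children = defaultdict(list)
    for meta in metas:
        children[meta["parent_id"]].append(meta["comment_id"])
    return children


def walk(children, parent, depth=0):
    """Yield (depth, comment_id) pairs in thread order, starting below `parent`."""
    for comment_id in children.get(parent, []):
        yield depth, comment_id
        # Replies to a comment reference it as t1_<comment_id>.
        yield from walk(children, f"t1_{comment_id}", depth + 1)


metas = [
    {"comment_id": "a1", "parent_id": "t3_abc123"},  # reply to the post
    {"comment_id": "b2", "parent_id": "t1_a1"},      # reply to a1
    {"comment_id": "c3", "parent_id": "t1_b2"},      # reply to b2
    {"comment_id": "d4", "parent_id": "t3_abc123"},  # second top-level comment
]

for depth, comment_id in walk(build_thread(metas), "t3_abc123"):
    print("  " * depth + comment_id)
```

The sample data is made up for the illustration; in practice the list would come from `[c.metadata for c in comments]`.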

Example: Subreddit Sentiment

# `config` is assumed to be a Sentimatrix configuration object created beforehand.
async with Sentimatrix(config) as sm:
    # Scrape a product discussion subreddit
    posts = await sm.scrape_reviews(
        "https://reddit.com/r/ProductName",
        platform="reddit",
        max_reviews=500
    )

    # Analyze sentiment
    results = await sm.analyze_batch([p.text for p in posts])

    # Generate insights
    insights = await sm.generate_insights(posts)
    print(insights.summary)

Authentication (Optional)

For higher rate limits, supply Reddit API credentials through environment variables:

export REDDIT_CLIENT_ID="your-client-id"
export REDDIT_CLIENT_SECRET="your-client-secret"
export REDDIT_USER_AGENT="sentimatrix/1.0"
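
Before starting a long scrape, it can help to confirm that all three variables are actually set. This is a small sketch using only the standard library; the function name `missing_reddit_credentials` is hypothetical, and how Sentimatrix itself reads these variables is not shown here.

```python
import os

REQUIRED_VARS = ("REDDIT_CLIENT_ID", "REDDIT_CLIENT_SECRET", "REDDIT_USER_AGENT")


def missing_reddit_credentials(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_reddit_credentials()
if missing:
    print("Unauthenticated mode; missing:", ", ".join(missing))
else:
    print("Reddit API credentials found.")
```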

Rate Limits

Method          Rate Limit
Without Auth    10 req/min
With OAuth      60 req/min
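
One way to stay under these limits in your own request loop is to space calls evenly: 10 req/min works out to one request every 6 seconds, and 60 req/min to one per second. The `Throttle` class below is a hypothetical sketch of that idea, not something Sentimatrix is documented to expose. It is synchronous for simplicity; async code would await `asyncio.sleep` instead of calling `time.sleep`.

```python
import time


class Throttle:
    """Space calls evenly so they never exceed a per-minute limit."""

    def __init__(self, requests_per_minute, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / requests_per_minute  # seconds between requests
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # earliest time the next request may start

    def wait(self):
        """Block until the next request is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._next_allowed
        self._next_allowed = now + self.interval
        return delay
```

Calling `Throttle(10).wait()` before each unauthenticated request returns immediately the first time and then pauses as needed to keep requests at least 6 seconds apart.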