🔍 EA Forum Scraper API

Scrape EA Forum posts and comments into structured JSON or human-readable text

v1.0.0 REST API

🚀 Quick Start

Get started in 30 seconds:

cURL

# Scrape a post - returns JSON with structured data AND readable text
curl -X POST https://eafapi-production.up.railway.app/scrape \
  -H "Content-Type: application/json" \
  -d '{"url": "https://forum.effectivealtruism.org/posts/..."}'

Python

import requests

response = requests.post(
    "https://eafapi-production.up.railway.app/scrape",
    json={"url": "https://forum.effectivealtruism.org/posts/..."}
)

data = response.json()

# Access structured data
post = data['post']
comments = data['comments']

# Access human-readable text
readable_text = data['text']

JavaScript

const response = await fetch('https://eafapi-production.up.railway.app/scrape', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
        url: 'https://forum.effectivealtruism.org/posts/...'
    })
});

const data = await response.json();

// Access structured data
const post = data.post;
const comments = data.comments;

// Access human-readable text
const readableText = data.text;

✨ Features

⚡ Fast Caching

1-hour TTL cache reduces load and speeds up repeated requests

🛡️ Rate Limiting

5-second cooldown per IP to protect EA Forum

🌐 CORS Enabled

Call from any browser or client

📝 Markdown Output

Clean markdown conversion of HTML content

💬 Full Comments

Extracts all comments with metadata and reply threading

📄 Dual Formats

Returns both structured JSON and human-readable text in one response
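
Because of the 5-second per-IP cooldown, scraping several posts in a row needs some client-side pacing. The sketch below shows one possible approach. The `Throttle` class and `scrape_all` helper are hypothetical names, not part of the API, and the sketch assumes the cooldown applies to every request, including ones answered from the cache.

```python
import time


class Throttle:
    """Blocks so that successive wait() calls are at least min_interval seconds apart."""

    def __init__(self, min_interval=5.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock      # injectable for testing
        self._sleep = sleep      # injectable for testing
        self._last = None        # time of the previous call; None before the first

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def scrape_all(urls, fetch, throttle=None):
    """Call fetch(url) for each URL in order, pausing between calls.

    fetch is any callable that takes a post URL and returns the parsed response.
    """
    if throttle is None:
        throttle = Throttle()  # assumed 5-second spacing, matching the documented cooldown
    results = {}
    for url in urls:
        throttle.wait()
        results[url] = fetch(url)
    return results
```

With `requests`, `fetch` could be `lambda u: requests.post("https://eafapi-production.up.railway.app/scrape", json={"url": u}).json()`.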

📡 API Endpoints

GET /

API information and usage instructions

GET /health

Health check endpoint (used for monitoring)

GET /docs

Interactive Swagger UI documentation

POST /scrape

Scrape an EA Forum post and return structured data with human-readable text

Request Body

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| url | string (URL) | ✅ Yes | EA Forum post URL to scrape |
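
A client can catch obviously wrong URLs before spending a rate-limited request on them. The check below is only a sketch: the server's actual validation rules are not documented here, so the host and `/posts/` path test are assumptions drawn from the example URLs on this page, and `looks_like_post_url` is a hypothetical helper name.

```python
from urllib.parse import urlparse

# Assumption: posts live on this host under /posts/, as in the examples on this page.
EA_FORUM_HOST = "forum.effectivealtruism.org"
POST_PREFIX = "/posts/"


def looks_like_post_url(url: str) -> bool:
    """Return True if url has the general shape of an EA Forum post URL."""
    parts = urlparse(url)
    return (
        parts.scheme in ("http", "https")
        and parts.netloc == EA_FORUM_HOST
        and parts.path.startswith(POST_PREFIX)
        and len(parts.path) > len(POST_PREFIX)  # something must follow /posts/
    )
```

A URL that passes this check can still be rejected by the API, so the server's response remains the authority.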

Response

| Field | Type | Description |
|-------|------|-------------|
| post | object | Post metadata and content (title, author, body, tags, score, etc.) |
| comments | array | All comments with metadata (author, timestamp, score, body, is_reply) |
| text | string | Human-readable formatted version with post and all comments |
| url | string | Original URL that was scraped |
| fetched_at | string | ISO timestamp of when data was fetched |
| cached | boolean | Whether response came from cache |
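
As a rough illustration of how these fields fit together, the sketch below builds a one-line summary from a parsed response. The top-level keys come from the table above; the nested key names (`title`, `author`, `is_reply`) are assumed to match the metadata names listed there, so the code uses `.get()` throughout. The `summarize` helper and the sample dictionary are made up for illustration.

```python
def summarize(data: dict) -> str:
    """Build a one-line summary from a parsed /scrape response."""
    post = data.get("post", {})
    comments = data.get("comments", [])

    # Assumption: each comment carries a boolean is_reply flag.
    replies = sum(1 for c in comments if c.get("is_reply"))
    top_level = len(comments) - replies

    source = "cache" if data.get("cached") else "live fetch"
    return (
        f"{post.get('title', '(untitled)')} by {post.get('author', '(unknown)')}: "
        f"{top_level} top-level comments, {replies} replies "
        f"({source}, fetched at {data.get('fetched_at', '?')})"
    )


# Made-up sample shaped like the response table, not real API output.
sample = {
    "post": {"title": "Example", "author": "alice"},
    "comments": [
        {"author": "bob", "is_reply": False},
        {"author": "carol", "is_reply": True},
    ],
    "cached": True,
    "fetched_at": "2024-01-01T00:00:00",
}

print(summarize(sample))
```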

Text Format Details

The text field contains a formatted, human-readable document with the post and all of its comments.

💡 Code Examples

Full Python Example

import requests
import json

response = requests.post(
    "https://eafapi-production.up.railway.app/scrape",
    json={
        "url": "https://forum.effectivealtruism.org/posts/je5TiYESSv53tWHC9/utilitarians-should-accept-that-some-suffering-cannot-be-1"
    }
)

data = response.json()

# Access structured data
print(f"Title: {data['post']['title']}")
print(f"Author: {data['post']['author']}")
print(f"Score: {data['post']['vote_score']}")
print(f"Number of comments: {len(data['comments'])}")

# Save full JSON response
with open("post.json", "w") as f:
    json.dump(data, f, indent=2)

To save only the human-readable text:

import requests

response = requests.post(
    "https://eafapi-production.up.railway.app/scrape",
    json={
        "url": "https://forum.effectivealtruism.org/posts/je5TiYESSv53tWHC9/utilitarians-should-accept-that-some-suffering-cannot-be-1"
    }
)

data = response.json()

# Get human-readable text from response
readable_text = data['text']

# Save to file
with open("post.txt", "w", encoding="utf-8") as f:
    f.write(readable_text)

# Or print it
print(readable_text)

JavaScript/Node.js Example

// Requires Node.js 18+ for the built-in fetch API
const fs = require('fs');

async function scrapePost() {
    const response = await fetch(
        'https://eafapi-production.up.railway.app/scrape',
        {
            method: 'POST',
            headers: { 'Content-Type': 'application/json' },
            body: JSON.stringify({
                url: 'https://forum.effectivealtruism.org/posts/...'
            })
        }
    );

    const data = await response.json();

    console.log(`Title: ${data.post.title}`);
    console.log(`Comments: ${data.comments.length}`);

    // Save JSON
    fs.writeFileSync('post.json', JSON.stringify(data, null, 2));

    // Or save readable text
    fs.writeFileSync('post.txt', data.text);
}

scrapePost().catch(console.error);

cURL Examples

# Save full JSON response
curl -X POST https://eafapi-production.up.railway.app/scrape \
  -H "Content-Type: application/json" \
  -d '{"url": "https://forum.effectivealtruism.org/posts/..."}' \
  -o post.json

# Extract just the text field with jq
curl -X POST https://eafapi-production.up.railway.app/scrape \
  -H "Content-Type: application/json" \
  -d '{"url": "https://forum.effectivealtruism.org/posts/..."}' \
  | jq -r '.text' > post.txt

⚠️ Error Responses

| Status Code | Error | Description |
|-------------|-------|-------------|
| 422 | Validation Error | Invalid URL format in request body |
| 429 | Rate Limited | Too many requests (wait 5 seconds between requests) |
| 502 | Bad Gateway | Failed to fetch EA Forum URL (network issue or invalid URL) |
| 500 | Internal Server Error | Scraping error (EA Forum HTML structure may have changed) |
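
One way a client might act on these status codes is to wait and retry on 429 and fail immediately on everything else. The sketch below assumes a successful scrape returns HTTP 200 and that waiting the documented 5 seconds is enough to clear the rate limit. `scrape_with_retry` and `ScrapeError` are hypothetical names, and `post` is any callable with the same shape as `requests.post`.

```python
import time

API_URL = "https://eafapi-production.up.railway.app/scrape"


class ScrapeError(Exception):
    """Raised when the API returns a status this client does not retry."""


def scrape_with_retry(url, post, max_attempts=3, cooldown=5.0, sleep=time.sleep):
    """POST url to the scraper, retrying only when rate limited (429)."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    for attempt in range(1, max_attempts + 1):
        response = post(API_URL, json={"url": url})

        if response.status_code == 200:  # assumed success status
            return response.json()

        # 429: wait out the cooldown and try again, unless attempts are used up.
        if response.status_code == 429 and attempt < max_attempts:
            sleep(cooldown)
            continue

        # 422, 500, 502, or a final 429: retrying the same request is unlikely to help.
        raise ScrapeError(f"HTTP {response.status_code} for {url}")
```

With `requests` installed, a call would look like `scrape_with_retry(post_url, post=requests.post)`.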

📚 Additional Resources