Scrape EA Forum posts and comments into structured JSON or human-readable text
v1.0.0 REST API
Get started in 30 seconds:
# Scrape a post - returns JSON with structured data AND readable text
curl -X POST https://eafapi-production.up.railway.app/scrape \
  -H "Content-Type: application/json" \
  -d '{"url": "https://forum.effectivealtruism.org/posts/..."}'
import requests
response = requests.post(
    "https://eafapi-production.up.railway.app/scrape",
    json={"url": "https://forum.effectivealtruism.org/posts/..."}
)
data = response.json()
# Access structured data
post = data['post']
comments = data['comments']
# Access human-readable text
readable_text = data['text']
const response = await fetch('https://eafapi-production.up.railway.app/scrape', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    url: 'https://forum.effectivealtruism.org/posts/...'
  })
});
const data = await response.json();
// Access structured data
const post = data.post;
const comments = data.comments;
// Access human-readable text
const readableText = data.text;
1-hour TTL cache reduces load and speeds up repeated requests
5-second cooldown per IP to protect EA Forum
Call from any browser or client
Clean markdown conversion of HTML content
Extracts all comments with metadata and reply threading
Returns both structured JSON and human-readable text in one response
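A client that scrapes several posts in a row will hit the 5-second per-IP cooldown described above. The following is a minimal client-side sketch for spacing calls out; the `Throttle` class and its parameters are hypothetical names for illustration and are not part of the API.

```python
import time


class Throttle:
    """Waits until at least `min_interval` seconds have passed since the last call."""

    def __init__(self, min_interval=5.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # 5.0 matches the documented cooldown
        self._clock = clock               # injectable so the logic can be tested
        self._sleep = sleep
        self._last_call = None

    def wait(self):
        """Sleep if needed, record this call, and return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last_call is not None:
            delay = max(0.0, self.min_interval - (now - self._last_call))
        if delay > 0:
            self._sleep(delay)
        self._last_call = now + delay
        return delay
```

Calling `throttle.wait()` immediately before each POST to `/scrape` keeps a single-process client inside the cooldown. It does not coordinate between several processes sharing one IP.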
API information and usage instructions
Health check endpoint (used for monitoring)
Scrape an EA Forum post and return structured data with human-readable text
Field | Type | Required | Description |
---|---|---|---|
url | string (URL) | ✅ Yes | EA Forum post URL to scrape |
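Because each request counts against the rate limit, it can help to reject obviously wrong URLs before sending them. The sketch below is a client-side guess only: the server's actual validation rules are not documented here, and `looks_like_post_url` is a hypothetical helper that assumes post URLs use the `forum.effectivealtruism.org` host and a `/posts/<id>` path, as in the examples on this page.

```python
from urllib.parse import urlparse

# Assumption: the host seen in this page's example URLs.
EA_FORUM_HOST = "forum.effectivealtruism.org"


def looks_like_post_url(url):
    """Return True if `url` is an http(s) EA Forum URL with a /posts/<id> path."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False
    if parsed.hostname != EA_FORUM_HOST:
        return False
    # Drop empty segments so a trailing slash does not count as an id.
    parts = [p for p in parsed.path.split("/") if p]
    return len(parts) >= 2 and parts[0] == "posts"
```

A URL that passes this check can still be rejected by the API, so the 422 case needs handling either way.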
Field | Type | Description |
---|---|---|
post | object | Post metadata and content (title, author, body, tags, score, etc.) |
comments | array | All comments with metadata (author, timestamp, score, body, is_reply) |
text | string | Human-readable formatted version with post and all comments |
url | string | Original URL that was scraped |
fetched_at | string | ISO timestamp of when data was fetched |
cached | boolean | Whether response came from cache |
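The fields in the table above can be reduced to a short summary without touching the network. This is a sketch under stated assumptions: `summarize` is a hypothetical helper, `is_reply` is assumed to be a boolean on each comment, and `fetched_at` is assumed to be an ISO 8601 string that `datetime.fromisoformat` can parse once a trailing `Z` is normalised.

```python
from datetime import datetime


def summarize(data):
    """Build a small summary dict from a parsed /scrape response body."""
    comments = data.get("comments", [])
    # Assumption: each comment carries a boolean is_reply flag.
    replies = sum(1 for c in comments if c.get("is_reply"))

    fetched_at = None
    fetched_raw = data.get("fetched_at")
    if fetched_raw:
        # Older Python versions reject a trailing "Z", so map it to an offset.
        fetched_at = datetime.fromisoformat(fetched_raw.replace("Z", "+00:00"))

    return {
        "title": data.get("post", {}).get("title"),
        "top_level_comments": len(comments) - replies,
        "replies": replies,
        "fetched_at": fetched_at,
        "cached": bool(data.get("cached", False)),
    }
```

Checking `cached` together with `fetched_at` shows how old a response is, which matters because of the 1-hour cache.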
The text field contains a formatted document with the post and all of its comments.
import json
import requests
response = requests.post(
    "https://eafapi-production.up.railway.app/scrape",
    json={
        "url": "https://forum.effectivealtruism.org/posts/je5TiYESSv53tWHC9/utilitarians-should-accept-that-some-suffering-cannot-be-1"
    }
)
# Raise on 4xx/5xx responses instead of parsing an error body
response.raise_for_status()
data = response.json()
# Access structured data
print(f"Title: {data['post']['title']}")
print(f"Author: {data['post']['author']}")
print(f"Score: {data['post']['vote_score']}")
print(f"Number of comments: {len(data['comments'])}")
# Save full JSON response
with open("post.json", "w", encoding="utf-8") as f:
    json.dump(data, f, indent=2)
import requests
response = requests.post(
    "https://eafapi-production.up.railway.app/scrape",
    json={
        "url": "https://forum.effectivealtruism.org/posts/je5TiYESSv53tWHC9/utilitarians-should-accept-that-some-suffering-cannot-be-1"
    }
)
# Raise on 4xx/5xx responses instead of parsing an error body
response.raise_for_status()
data = response.json()
# Get human-readable text from response
readable_text = data['text']
# Save to file (explicit UTF-8 so non-ASCII characters are written on any platform)
with open("post.txt", "w", encoding="utf-8") as f:
    f.write(readable_text)
# Or print it
print(readable_text)
// Uses the built-in fetch, available as a global in Node.js 18 and later
const fs = require('fs');
async function scrapePost() {
  const response = await fetch(
    'https://eafapi-production.up.railway.app/scrape',
    {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        url: 'https://forum.effectivealtruism.org/posts/...'
      })
    }
  );
  const data = await response.json();
  console.log(`Title: ${data.post.title}`);
  console.log(`Comments: ${data.comments.length}`);
  // Save JSON
  fs.writeFileSync('post.json', JSON.stringify(data, null, 2));
  // Or save readable text
  fs.writeFileSync('post.txt', data.text);
}
scrapePost();
# Save full JSON response
curl -X POST https://eafapi-production.up.railway.app/scrape \
  -H "Content-Type: application/json" \
  -d '{"url": "https://forum.effectivealtruism.org/posts/..."}' \
  -o post.json
# Extract just the text field with jq
curl -X POST https://eafapi-production.up.railway.app/scrape \
  -H "Content-Type: application/json" \
  -d '{"url": "https://forum.effectivealtruism.org/posts/..."}' \
  | jq -r '.text' > post.txt
Status Code | Error | Description |
---|---|---|
422 | Validation Error | Invalid URL format in request body |
429 | Rate Limited | Too many requests (wait 5 seconds between requests) |
502 | Bad Gateway | Failed to fetch EA Forum URL (network issue or invalid URL) |
500 | Internal Server Error | Scraping error (EA Forum HTML structure may have changed) |
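One way to act on the status codes above is to retry only the 429 case after the documented 5-second wait and surface everything else as an error. This is a sketch, not the API's own client: `ScrapeError`, `scrape_with_retry`, and `send_with_requests` are hypothetical names, and a 2xx status is assumed to mean success since the success code is not stated on this page.

```python
import time

API_URL = "https://eafapi-production.up.railway.app/scrape"

# Descriptions paraphrased from the status code table above.
ERROR_HINTS = {
    422: "invalid URL format in request body",
    429: "rate limited; wait 5 seconds between requests",
    502: "failed to fetch the EA Forum URL",
    500: "scraping error; the EA Forum HTML structure may have changed",
}


class ScrapeError(Exception):
    """Raised when the API answers with a status that is not retried."""

    def __init__(self, status):
        self.status = status
        super().__init__(f"{status}: {ERROR_HINTS.get(status, 'unexpected status')}")


def scrape_with_retry(send, url, max_attempts=3, cooldown=5.0, sleep=time.sleep):
    """Call send(url) -> (status, body); retry on 429 after `cooldown` seconds."""
    for attempt in range(1, max_attempts + 1):
        status, body = send(url)
        if 200 <= status < 300:  # assumption: any 2xx means success
            return body
        if status == 429 and attempt < max_attempts:
            sleep(cooldown)
            continue
        raise ScrapeError(status)
    raise ValueError("max_attempts must be at least 1")


def send_with_requests(url):
    """One possible `send`, built on the third-party `requests` package."""
    import requests  # imported lazily so the retry logic has no hard dependency

    r = requests.post(API_URL, json={"url": url}, timeout=30)
    return r.status_code, (r.json() if r.ok else None)
```

Usage would be `data = scrape_with_retry(send_with_requests, post_url)`. The 422, 500, and 502 cases are not retried here because a repeat request with the same URL is unlikely to help with the first two, and retrying 502 is a policy choice left to the caller.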