Web Scraper API Documentation
A powerful REST API for web scraping with intelligent content extraction, validation, and cleanup.
Base URL
https://flying-extract.in
Overview
The FlyingExtract API provides intelligent web scraping capabilities with advanced AI processing. It can extract structured data from any webpage with automatic validation, contamination detection, and cleanup to ensure high-quality results.
Key Features
- AI-powered content extraction
- Automatic validation & cleanup
- Contamination detection
- Proxy network support
- Deletion detection
Supported Content
- News articles
- Blog posts
- Academic papers
- Product descriptions
- Any structured content
Authentication
All endpoints require API key authentication via query parameter:
?apiKey=YOUR_API_KEY
Note: api_key is also accepted as an alternative parameter name.
Note: Keep your API key secure and never expose it in client-side code. Contact us at hello@flyingstars.co to get your API key.
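One way to keep the key out of source code is to read it from the environment and attach it as the `apiKey` query parameter. The sketch below assumes an environment variable named `FLYING_EXTRACT_API_KEY` and a helper called `auth_params`; both names are made up for illustration and are not part of the API.

```python
import os


def auth_params(env=None):
    """Return the query parameters that carry the API key.

    FLYING_EXTRACT_API_KEY is a hypothetical variable name chosen for this sketch.
    """
    env = os.environ if env is None else env
    api_key = env.get("FLYING_EXTRACT_API_KEY")
    if not api_key:
        # Fail early instead of sending an unauthenticated request.
        raise RuntimeError("Set FLYING_EXTRACT_API_KEY before calling the API")
    return {"apiKey": api_key}
```

The returned dictionary can be merged into the `params` argument of `requests.get`, so the key never appears in the code itself.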
Endpoints
1. Health Check
/
Check if the API is running and view available endpoints.
Parameters: None
2. Scrape Webpage
/scrape
Extract article content from a webpage.
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| apiKey | string | Yes | - | Your API key |
| url | string | Yes | - | The webpage URL to scrape |
| ai | 0, 1, 2 | No | 0 | AI extraction mode (see below) |
| proxy | 0, 1 | No | 0 | Enable proxy for browser requests |
AI Extraction Modes
| Mode | Name | Description |
|---|---|---|
| ai=0 | Traditional Only | Fast extraction using newspaperjs + Readability. No AI fallback. |
| ai=1 | Traditional + AI Fallback | Traditional first, falls back to AI if validation fails. |
| ai=2 | Forced AI + Union | Always runs AI extraction and unions with traditional results. |
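The parameter and mode tables above can be turned into a small URL builder that rejects values outside the documented ranges before any request is sent. This is a sketch, not an official client: `build_scrape_url` and `BASE_URL` are names introduced here, and omitting optional parameters that equal their default is a choice made for this example.

```python
from urllib.parse import urlencode

BASE_URL = "https://flying-extract.in"


def build_scrape_url(api_key, url, ai=0, proxy=0):
    """Build a /scrape request URL from the documented parameters."""
    if ai not in (0, 1, 2):
        raise ValueError("ai must be 0, 1, or 2")
    if proxy not in (0, 1):
        raise ValueError("proxy must be 0 or 1")
    params = {"apiKey": api_key, "url": url}
    # Optional parameters are sent only when they differ from the default of 0.
    if ai:
        params["ai"] = ai
    if proxy:
        params["proxy"] = proxy
    # urlencode percent-encodes the target URL along with the other values.
    return f"{BASE_URL}/scrape?{urlencode(params)}"
```

For example, `build_scrape_url("YOUR_API_KEY", "https://example.com/article", ai=1)` produces a request equivalent to the AI fallback mode in the table.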
Example Requests
# Basic request
GET https://flying-extract.in/scrape?apiKey=YOUR_API_KEY&url=https://example.com/article
# With AI fallback enabled
GET https://flying-extract.in/scrape?apiKey=YOUR_API_KEY&url=https://example.com/article&ai=1
Response Format
Success Response
{
  "success": true,
  "url": "https://example.com/article",
  "data": {
    "title": "Article Title",
    "subheading": "Article subtitle or null",
    "body": "The complete article body text...",
    "classification": "News Article",
    "author": "Author Name",
    "images": [
      "https://example.com/image1.jpg",
      "https://example.com/image2.jpg"
    ],
    "keywords": "keyword1, keyword2",
    "publishedDate": "2024-01-15"
  },
  "wordCount": 850,
  "validation": {
    "isValid": true,
    "result": "VALID",
    "pageType": "news_article",
    "contentType": "clean_content_fully_extracted",
    "contaminationDetected": []
  },
  "extractionMethod": "traditional"
}
Response Fields
| Field | Type | Description |
|---|---|---|
| success | boolean | Whether the extraction succeeded |
| url | string | The scraped URL |
| data.title | string | Article title |
| data.subheading | string/null | Article subtitle if present |
| data.body | string | Full article body text |
| data.classification | string | Page type (e.g., "News Article") |
| data.author | string/null | Author name if found |
| data.images | array | Array of image URLs from the article |
| data.keywords | string/null | Keywords if available |
| data.publishedDate | string/null | Publication date if found |
| wordCount | number | Word count of the article body |
| validation.isValid | boolean | Whether valid article content was found |
| validation.result | string | "VALID" or "INVALID" |
| validation.pageType | string | Detected page type (e.g., news_article, blog_post, video_page) |
| validation.contentType | string | Content classification |
| validation.contaminationDetected | array | List of contamination types found and removed |
| extractionMethod | string | "traditional", "ai_fallback", or "hybrid_union" |
| cleaned | boolean | Present and true if contamination was removed from body |
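Because several fields can be null and `cleaned` is only present in some responses, it may help to normalize the payload into a typed object before using it. The `Article` class and `parse_article` function below are illustrative names, and the sketch covers only a subset of the documented fields.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Article:
    title: str
    body: str
    author: Optional[str] = None
    published_date: Optional[str] = None
    images: List[str] = field(default_factory=list)
    cleaned: bool = False


def parse_article(payload: dict) -> Article:
    """Map a successful /scrape response onto an Article."""
    data = payload.get("data") or {}
    return Article(
        title=data.get("title", ""),
        body=data.get("body", ""),
        # Nullable fields stay None when the API could not find them.
        author=data.get("author"),
        published_date=data.get("publishedDate"),
        images=list(data.get("images") or []),
        # "cleaned" is documented as present only when contamination was removed.
        cleaned=bool(payload.get("cleaned", False)),
    )
```

Reading optional keys with `.get` avoids a `KeyError` when a field such as `cleaned` is absent.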
Error Responses
Missing URL (400)
{
  "error": "Missing required parameter: url",
  "example": "/scrape?url=https://example.com"
}
Invalid API Key (401/403)
{
  "error": "API key required",
  "message": "Please provide an API key as a URL parameter: ?apiKey=YOUR_KEY"
}
Article Not Found (404)
{
  "success": false,
  "deleted": true,
  "statusCode": 404,
  "message": "Article not found (HTTP 404)",
  "url": "https://example.com/article"
}
Article Permanently Deleted (410)
{
  "success": false,
  "deleted": true,
  "statusCode": 410,
  "message": "Article permanently deleted (HTTP 410 Gone)",
  "url": "https://example.com/article"
}
Request Timeout (408)
{
  "success": false,
  "error": "Request timeout",
  "message": "Browser job timeout after 120000ms",
  "url": "https://example.com/article",
  "timeoutSeconds": 120
}
Server Error (500)
{
  "success": false,
  "error": "Failed to scrape the webpage",
  "message": "Error details",
  "url": "https://example.com/article"
}
Content Validation
The API classifies extracted content into these categories to help you determine data quality:
Valid Article Content
- clean_article_body_fully_extracted: Complete article with no contamination
- clean_article_body_partially_extracted: Article content that may be incomplete
- article_extracted_with_possible_contamination: Article present with minor contamination
Invalid Content
- cookie_consent_text_only
- privacy_policy_text_only
- navigation_text_only
- advertisement_text_only
- error_message_text_only
- paywall_text_only
Contamination Detection
The API automatically detects and removes these contamination types:
Navigation Elements
- Navigation menus and breadcrumbs
- Header and footer content
- Sidebar elements and widgets
Content Pollution
- Related articles sections
- Comments and social share buttons
- Advertisement blocks
Technical Elements
- JavaScript/CSS code fragments
- Multiple article snippets mixed together
- Cookie consent banners
Legal Content
- Copyright notices
- Privacy policy text
- Terms of service content
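The validation categories and contamination list described above can drive a simple acceptance check on the client side. The sketch below treats `validation.isValid` as the deciding signal and uses `contentType` only to add notes. That policy, the `assess` helper, and the assumption that `contaminationDetected` entries can be rendered as strings are all choices made for this example, not documented behavior.

```python
def assess(payload):
    """Return (usable, notes) for a /scrape response dictionary."""
    validation = payload.get("validation") or {}
    content_type = validation.get("contentType", "unknown")
    if not validation.get("isValid", False):
        # Covers categories such as paywall_text_only or cookie_consent_text_only.
        return False, [f"invalid content: {content_type}"]

    notes = []
    if content_type == "clean_article_body_partially_extracted":
        notes.append("body may be incomplete")
    elif content_type == "article_extracted_with_possible_contamination":
        notes.append("body may contain leftover contamination")

    removed = validation.get("contaminationDetected") or []
    if removed:
        # Entries are converted with str() because their exact type is not documented.
        notes.append("removed: " + ", ".join(str(item) for item in removed))
    return True, notes
```

A caller might store articles when `usable` is true and log the notes, so that partially extracted bodies can be reviewed or retried with a different `ai` mode.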
Error Codes
| HTTP Status | Error Type | Description |
|---|---|---|
| 400 | Bad Request | Missing or invalid URL parameter |
| 401 | Unauthorized | Missing API key |
| 403 | Forbidden | Invalid API key |
| 404 | Not Found | Article deleted/not found |
| 408 | Request Timeout | Browser job timeout (120s default) |
| 410 | Gone | Article permanently deleted |
| 500 | Internal Server Error | Scraping or processing failure |
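A client can map the status codes in the table above to a small set of actions. The grouping below is a suggested policy rather than API behavior: it assumes successful responses arrive with HTTP 200, treats 408 and 500 as worth retrying, and uses the hypothetical helper name `classify_status`.

```python
RETRYABLE = {408, 500}   # timeouts and processing failures may succeed on retry
DELETED = {404, 410}     # the article no longer exists at the source
AUTH = {401, 403}        # missing or invalid API key


def classify_status(status_code, payload=None):
    """Translate an HTTP status and response body into a client-side action."""
    payload = payload or {}
    if status_code == 200 and payload.get("success"):
        return "ok"
    # Deletion responses also carry "deleted": true in the body.
    if status_code in DELETED or payload.get("deleted"):
        return "deleted"
    if status_code in AUTH:
        return "auth"
    if status_code == 400:
        return "bad_request"
    if status_code in RETRYABLE:
        return "retry"
    return "unknown"
```

With `requests`, this could be called as `classify_status(response.status_code, response.json())`, retrying only when the result is `"retry"`.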
Code Examples
JavaScript
const apiKey = 'YOUR_API_KEY';
const articleUrl = 'https://example.com/article';

// encodeURIComponent keeps the target URL intact inside the query string.
const response = await fetch(
  `https://flying-extract.in/scrape?apiKey=${apiKey}&url=${encodeURIComponent(articleUrl)}`
);
const data = await response.json();

if (data.success) {
  console.log('Title:', data.data.title);
  console.log('Body:', data.data.body);
  console.log('Images:', data.data.images);
  console.log('Method:', data.extractionMethod);
} else {
  // Error responses carry "error", "message", or both.
  console.error('Error:', data.error || data.message);
}
Python
import requests

api_key = 'YOUR_API_KEY'
article_url = 'https://example.com/article'

# requests URL-encodes the values passed in params.
response = requests.get('https://flying-extract.in/scrape', params={
    'apiKey': api_key,
    'url': article_url,
    'ai': '1',  # Enable AI fallback
})
data = response.json()

# 400 and 401/403 responses have no "success" key, so read it with .get().
if data.get('success'):
    print(f"Title: {data['data']['title']}")
    print(f"Body: {data['data']['body'][:200]}...")
    print(f"Images: {len(data['data']['images'])} found")
    print(f"Method: {data['extractionMethod']}")
else:
    print(f"Error: {data.get('error') or data.get('message')}")
cURL
# Basic request
curl "https://flying-extract.in/scrape?apiKey=YOUR_API_KEY&url=https://example.com/article"
# With AI fallback
curl "https://flying-extract.in/scrape?apiKey=YOUR_API_KEY&url=https://example.com/article&ai=1"
Support
Need Help?
For issues or questions, please check the following:
- Check the validation results in the response
- Review contamination detection results
- Verify your API key and parameters
- Check for proper URL encoding
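The last item is a common source of surprises when the target URL has its own query string. The sketch below compares an unencoded and an encoded request and shows which `url` value a standard query-string parser recovers from each. The target URL and the `url_param` helper are made up for illustration, and the actual server may parse requests differently.

```python
from urllib.parse import parse_qs, quote, urlsplit

# Hypothetical target URL that carries its own query parameters.
target = "https://example.com/article?id=7&ref=home"

raw = f"https://flying-extract.in/scrape?apiKey=YOUR_API_KEY&url={target}"
encoded = (
    "https://flying-extract.in/scrape?apiKey=YOUR_API_KEY&url="
    + quote(target, safe="")
)


def url_param(request_url):
    """Return the `url` query parameter as a standard parser would read it."""
    return parse_qs(urlsplit(request_url).query)["url"][0]


# Unencoded: "&ref=home" is split off and read as a separate parameter.
print(url_param(raw))
# Encoded: the full target URL survives the round trip.
print(url_param(encoded))
```

Passing the target through `params=` in `requests` or `encodeURIComponent` in JavaScript applies the same encoding automatically.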