Web Scraper API Documentation
A powerful REST API for web scraping with intelligent content extraction, validation, and cleanup.
Base URL
http://api2.flying-extract.in
Overview
The FlyingExtract API provides intelligent web scraping capabilities with advanced AI processing. It can extract structured data from any webpage with automatic validation, contamination detection, and cleanup to ensure high-quality results.
Key Features
- AI-powered content extraction
- Automatic validation & cleanup
- Contamination detection
- Proxy network support
- Deletion detection
Supported Content
- News articles
- Blog posts
- Academic papers
- Product descriptions
- Any structured content
Authentication
All endpoints require API key authentication via query parameter:
?apiKey=YOUR_API_KEY
Note: Keep your API key secure and never expose it in client-side code. Contact us at hello@flyingstars.co to get your API key.
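One way to follow the note above is to load the key from the server's environment and mask it before a request URL reaches any log. The sketch below is illustrative only: the environment variable name `FLYING_EXTRACT_API_KEY` and both helper functions are assumptions, not part of the API.

```python
import os
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def load_api_key(env_var: str = "FLYING_EXTRACT_API_KEY") -> str:
    """Read the API key from the environment (the variable name is hypothetical)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set {env_var} before calling the API")
    return key


def redact_api_key(request_url: str) -> str:
    """Return the URL with the apiKey value masked so it is safe to log."""
    parts = urlsplit(request_url)
    query = [
        (name, "REDACTED" if name == "apiKey" else value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Because the key travels in the query string, it also tends to appear in access logs and browser history, which is why masking it before logging is worth the extra step.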
Endpoints
1. Health Check
/
Check if the API is running and view available endpoints.
Parameters: None
2. Scrape Webpage
/scrape
Extract structured data from any webpage with intelligent content validation and cleanup.
Required Parameters
apiKey (string): Valid API key for authentication
url (string): Target webpage URL to scrape
Optional Parameters
ai (string): Set to "1" for enhanced AI-powered extraction. Set to "0" or omit it for standard extraction (the default).
proxy (string): Set to "1" to enable proxy routing
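The required and optional parameters above can be assembled into a request URL as in the following sketch. The function name `build_scrape_url` and its boolean flags are illustrative choices, and the docs do not say whether `ai` and `proxy` may be combined, so treat that combination as an assumption.

```python
from urllib.parse import urlencode

BASE_URL = "http://api2.flying-extract.in"


def build_scrape_url(api_key: str, target_url: str,
                     ai: bool = False, proxy: bool = False) -> str:
    """Build a /scrape request URL with every parameter value URL-encoded."""
    if not api_key or not target_url:
        raise ValueError("apiKey and url are both required")
    params = {"apiKey": api_key, "url": target_url}
    if ai:
        params["ai"] = "1"      # omitted entirely for standard extraction
    if proxy:
        params["proxy"] = "1"   # omitted entirely when proxy routing is off
    return f"{BASE_URL}/scrape?{urlencode(params)}"


print(build_scrape_url("abc123", "https://example.com/a?b=1&c=2", ai=True))
# prints http://api2.flying-extract.in/scrape?apiKey=abc123&url=https%3A%2F%2Fexample.com%2Fa%3Fb%3D1%26c%3D2&ai=1
```

Encoding matters most when the target URL has its own query string: without it, `&c=2` would be read as a parameter of the API call rather than as part of the page address.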
Example Requests
Basic extraction:
curl "http://api2.flying-extract.in/scrape?apiKey=abc123&url=https://example.com"
AI-powered extraction:
curl "http://api2.flying-extract.in/scrape?apiKey=abc123&url=https://example.com&ai=1"
With proxy:
curl "http://api2.flying-extract.in/scrape?apiKey=abc123&url=https://example.com&proxy=1"
Response Format
Success Response (Basic)
{
"success": true,
"url": "https://example.com",
"data": {
"title": "Article Title",
"subheading": "Article subtitle or null",
"body": "Complete article body text...",
"classification": "News Article",
"author": "Author Name",
"topImage": "https://example.com/image.jpg",
"description": "Article description",
"keywords": "keyword1, keyword2",
"publishedDate": "2024-01-15"
},
"extractionMethod": "standard",
"validation": {
"isValid": true,
"result": "VALID",
"contentType": "clean_article_body_fully_extracted",
"contaminationDetected": []
},
"cleaned": false
}
Success Response (AI Mode)
{
"success": true,
"url": "https://example.com",
"data": {
"title": "Article Title",
"subheading": "Article subtitle",
"body": "Complete article body text...",
"classification": "News Article",
"author": "Author Name",
"topImage": "https://example.com/image.jpg",
"description": "Article description",
"keywords": "keyword1, keyword2",
"publishedDate": "2024-01-15"
},
"extractionMethod": "enhanced_ai",
"validation": {
"isValid": true,
"result": "VALID",
"contentType": "clean_article_body_fully_extracted",
"contaminationDetected": []
},
"cleaned": true
}
Deleted Content Response
{
"success": false,
"deleted": true,
"statusCode": 404,
"message": "Article not found (HTTP 404)",
"url": "https://example.com"
}
Error Response
{
"success": false,
"error": "Failed to scrape the webpage",
"message": "Detailed error description",
"url": "https://example.com"
}
Content Validation
The API classifies extracted content into these categories to help you determine data quality:
Valid Article Content
clean_article_body_fully_extracted: Complete article with no contamination
clean_article_body_partially_extracted: Article content that may be incomplete
article_extracted_with_possible_contamination: Article present with minor contamination
Invalid Content
- cookie_consent_text_only
- privacy_policy_text_only
- navigation_text_only
- advertisement_text_only
- error_message_text_only
- paywall_text_only
Contamination Detection
The API automatically detects and removes these contamination types:
Navigation Elements
- Navigation menus and breadcrumbs
- Header and footer content
- Sidebar elements and widgets
Content Pollution
- Related articles sections
- Comments and social share buttons
- Advertisement blocks
Technical Elements
- JavaScript/CSS code fragments
- Multiple article snippets mixed together
- Cookie consent banners
Legal Content
- Copyright notices
- Privacy policy text
- Terms of service content
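The `validation` object in each response can drive a simple keep-or-discard decision on the client side. The sketch below is one possible policy, not something the API prescribes: the three outcomes (`accept`, `review`, `reject`) and the function name `assess_result` are assumptions made for illustration.

```python
# Content types the documentation lists under "Valid Article Content".
VALID_TYPES = {
    "clean_article_body_fully_extracted",
    "clean_article_body_partially_extracted",
    "article_extracted_with_possible_contamination",
}


def assess_result(response: dict) -> str:
    """Return 'accept', 'review', or 'reject' for a parsed /scrape response."""
    validation = response.get("validation") or {}
    content_type = validation.get("contentType")
    contamination = validation.get("contaminationDetected") or []

    if not response.get("success") or not validation.get("isValid"):
        return "reject"
    if content_type not in VALID_TYPES:
        return "reject"  # e.g. paywall_text_only, cookie_consent_text_only
    if content_type == "clean_article_body_fully_extracted" and not contamination:
        return "accept"
    # Partial extraction or reported contamination: keep, but flag for a human.
    return "review"
```

A stricter pipeline might reject anything other than a fully extracted article, while a more permissive one might accept partial extractions outright.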
Error Codes
| HTTP Status | Error Type | Description |
|---|---|---|
| 400 | Bad Request | Missing or invalid URL parameter |
| 401 | Unauthorized | Missing API key |
| 403 | Forbidden | Invalid API key |
| 404 | Not Found | Article deleted/not found |
| 410 | Gone | Article permanently deleted |
| 500 | Internal Server Error | Scraping or processing failure |
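The status codes in the table can be mapped to a small set of outcomes that calling code branches on. This is a minimal sketch under stated assumptions: the `Outcome` enum and `classify_response` function are hypothetical, and because the docs do not say whether a deleted article is signalled by the HTTP status itself or only by the `deleted` and `statusCode` fields in the body, the function checks all three.

```python
from enum import Enum


class Outcome(Enum):
    OK = "ok"
    DELETED = "deleted"
    AUTH_ERROR = "auth_error"
    BAD_REQUEST = "bad_request"
    SERVER_ERROR = "server_error"
    UNKNOWN = "unknown"


def classify_response(status_code: int, body: dict) -> Outcome:
    """Map an HTTP status and parsed JSON body to a coarse outcome."""
    gone_codes = (404, 410)
    if body.get("deleted") or status_code in gone_codes or body.get("statusCode") in gone_codes:
        return Outcome.DELETED
    if status_code in (401, 403):
        return Outcome.AUTH_ERROR      # missing or invalid API key
    if status_code == 400:
        return Outcome.BAD_REQUEST     # missing or invalid url parameter
    if status_code >= 500:
        return Outcome.SERVER_ERROR    # scraping or processing failure
    if body.get("success"):
        return Outcome.OK
    return Outcome.UNKNOWN
```

Of these outcomes, only `SERVER_ERROR` looks like a plausible candidate for a retry. The others point to a problem with the request or to content that no longer exists.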
Code Examples
JavaScript/Node.js
const apiKey = 'your-api-key';
const url = 'https://example.com/article';
// Top-level await requires an ES module; otherwise wrap this in an async function.
const response = await fetch(`http://api2.flying-extract.in/scrape?apiKey=${apiKey}&url=${encodeURIComponent(url)}&ai=1`);
const data = await response.json();
if (data.success) {
  console.log('Title:', data.data.title);
  console.log('Body:', data.data.body);
  console.log('Validation:', data.validation.contentType);
} else {
  // Deleted-content responses carry only `message`, so fall back to it.
  console.error('Error:', data.error ?? data.message);
}
Python
import requests
api_key = 'your-api-key'
url = 'https://example.com/article'
# requests URL-encodes the values passed in params, so no manual quoting is needed.
response = requests.get('http://api2.flying-extract.in/scrape', params={
    'apiKey': api_key,
    'url': url,
    'ai': '1'
})
data = response.json()
if data['success']:
    print(f"Title: {data['data']['title']}")
    print(f"Body: {data['data']['body'][:200]}...")
    print(f"Validation: {data['validation']['contentType']}")
else:
    # Deleted-content responses carry only 'message', so fall back to it.
    print(f"Error: {data.get('error') or data.get('message')}")
cURL
# Basic extraction
curl "http://api2.flying-extract.in/scrape?apiKey=abc123&url=https://example.com"
# AI extraction
curl "http://api2.flying-extract.in/scrape?apiKey=abc123&url=https://example.com&ai=1"
# Save to file
curl "http://api2.flying-extract.in/scrape?apiKey=abc123&url=https://example.com" > result.json
Support
Need Help?
For issues or questions, please check the following:
- Check the validation results in the response
- Review contamination detection results
- Verify your API key and parameters
- Check for proper URL encoding