Web Scraper API Documentation
A powerful REST API for web scraping with intelligent content extraction, validation, and cleanup.
Base URL
https://api2.flying-extract.in
Overview
The FlyingExtract API provides intelligent web scraping capabilities with advanced AI processing. It can extract structured data from any webpage with automatic validation, contamination detection, and cleanup to ensure high-quality results.
Key Features
- AI-powered content extraction
- Automatic validation & cleanup
- Contamination detection
- Proxy network support
- Deletion detection
Supported Content
- News pages
- Blog posts
- Academic papers
- Product descriptions
- Any structured content
Authentication
All endpoints require API key authentication via query parameter:
?apiKey=YOUR_API_KEY
Note: api_key is also accepted as an alternative parameter name.
Note: Keep your API key secure and never expose it in client-side code. Contact us at hello@flyingstars.co to get your API key.
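As a sketch of the authentication scheme above, a request URL can be assembled with Python's standard library; the base URL and parameter names come from this documentation, while the helper name itself is illustrative:

```python
from urllib.parse import urlencode

BASE_URL = "https://api2.flying-extract.in"

def build_scrape_url(api_key: str, page_url: str, **params) -> str:
    """Build an authenticated /scrape request URL.

    urlencode() percent-encodes the target URL so that its own
    query string does not collide with the API's parameters.
    """
    query = {"apiKey": api_key, "url": page_url, **params}
    return f"{BASE_URL}/scrape?{urlencode(query)}"

url = build_scrape_url("YOUR_API_KEY", "https://example.com/page?id=1", ai=1)
```

Because the key travels in the query string, build URLs like this only on a server you control, never in client-side code.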
Endpoints
1. Health Check
/
Check if the API is running and view available endpoints.
Parameters: None
2. Scrape Webpage
/scrape
Extract content from a webpage.
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| apiKey | string | Yes | - | Your API key |
| url | string | Yes | - | The webpage URL to scrape |
| ai | 0, 1, 2 | No | 0 | AI extraction mode (see below) |
| proxy | 0, 1 | No | 0 | Proxy routing mode (see Proxy Modes below) |
AI Extraction Modes
| Mode | Name | Description |
|---|---|---|
| ai=0 | No AI | Fast extraction without AI. Works for ~80% of websites. |
| ai=1 | AI + Validation | Intelligent AI-assisted extraction with validation. Works for 99.9% of websites. |
| ai=2 | Full AI | Fully AI-driven extraction for the hardest 0.1% of websites that resist standard scraping. |
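Given the mode table above, one common calling pattern (not part of the API itself) is to start with the cheap ai=0 mode and escalate only when a request fails or validation rejects the content. A minimal sketch, with illustrative helper names:

```python
from typing import Optional

def next_ai_mode(current: int) -> Optional[int]:
    """Return the next AI extraction mode to try, or None when exhausted.

    Escalation order follows the mode table:
    0 (no AI) -> 1 (AI + validation) -> 2 (full AI).
    """
    return current + 1 if current < 2 else None

def should_escalate(response: dict) -> bool:
    """Escalate when the call failed or validation marked the content invalid."""
    if not response.get("success"):
        return True
    return not response.get("validation", {}).get("isValid", False)
```

This keeps most requests on the fast path while still reaching ai=2 for the hardest pages.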
Proxy Modes
| Mode | Name | Description |
|---|---|---|
| proxy=0 | No Proxy | Direct connection to the target website. |
| proxy=1 | Intelligent Proxy | Automatic proxy that routes requests through the best geographic route for the target. |
Example Requests
# Basic request
GET https://api2.flying-extract.in/scrape?apiKey=YOUR_API_KEY&url=https://example.com/page

# With AI and proxy enabled
GET https://api2.flying-extract.in/scrape?apiKey=YOUR_API_KEY&url=https://example.com/page&ai=1&proxy=1
Response Format
Success Response
{
  "success": true,
  "url": "https://example.com/page",
  "data": {
    "title": "Page Title",
    "subheading": "Subtitle or null",
    "body": "The complete body text...",
    "classification": "News Article",
    "author": "Author Name",
    "images": [
      "https://example.com/image1.jpg",
      "https://example.com/image2.jpg"
    ],
    "social-media-share-image": "https://example.com/image1.jpg",
    "keywords": "keyword1, keyword2",
    "publishedDate": "2026-01-15",
    "duplicate_images": []
  },
  "wordCount": 850,
  "validation": {
    "isValid": true,
    "result": "VALID",
    "pageType": "news_article",
    "contentType": "clean_content_fully_extracted",
    "contaminationDetected": [],
    "validationSkipped": false
  },
  "extractionMethod": "traditional"
}
Response Fields
| Field | Type | Description |
|---|---|---|
| success | boolean | Whether the extraction succeeded |
| url | string | The scraped URL |
| data.title | string | Page title |
| data.subheading | string/null | Subtitle if present |
| data.body | string | Full body text |
| data.classification | string | Page type (e.g., "News Article") |
| data.author | string/null | Author name if found |
| data.images | array | Array of unique image URLs from the page |
| data.social-media-share-image | string/null | Primary share image for the page |
| data.keywords | string/null | Keywords if available |
| data.publishedDate | string/null | Publication date if found |
| data.duplicate_images | array | Duplicate image URLs that were deduplicated from the images array |
| wordCount | number | Word count of the body |
| validation.isValid | boolean | Whether valid content was found |
| validation.result | string | "VALID" or "INVALID" |
| validation.pageType | string | Detected page type (e.g., news_article, blog_post, video_page) |
| validation.contentType | string | Content classification |
| validation.contaminationDetected | array | List of contamination types found and removed |
| validation.validationSkipped | boolean | Whether validation was skipped |
| extractionMethod | string | "traditional", "ai_fallback", or "hybrid_union" |
| cleaned | boolean | Present and true if contamination was removed from body |
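Putting these fields together, a response can be reduced to a small summary record before storage or further processing. The helper below is illustrative, not part of the API; it assumes the success-response shape documented above:

```python
def summarize_response(resp: dict) -> dict:
    """Condense a /scrape success response into the fields most callers need."""
    data = resp.get("data", {})
    validation = resp.get("validation", {})
    return {
        "title": data.get("title"),
        "word_count": resp.get("wordCount", 0),
        "image_count": len(data.get("images", [])),
        "is_valid": validation.get("isValid", False),
        # "cleaned" is only present when contamination was removed
        "was_cleaned": resp.get("cleaned", False),
        "method": resp.get("extractionMethod"),
    }

sample = {
    "success": True,
    "data": {"title": "Page Title", "images": ["https://example.com/image1.jpg"]},
    "wordCount": 850,
    "validation": {"isValid": True},
    "extractionMethod": "traditional",
}
summary = summarize_response(sample)
```

Using .get() with defaults keeps the helper robust against optional fields such as cleaned, which is absent when no cleanup occurred.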
Error Responses
Missing URL (400)
{
  "error": "Missing required parameter: url",
  "example": "/scrape?url=https://example.com"
}
Invalid API Key (401/403)
{
  "error": "API key required",
  "message": "Please provide an API key as a URL parameter: ?apiKey=YOUR_KEY"
}
Content Not Found (404)
{
  "success": false,
  "deleted": true,
  "statusCode": 404,
  "message": "Content not found (HTTP 404)",
  "url": "https://example.com/page"
}
Content Permanently Deleted (410)
{
  "success": false,
  "deleted": true,
  "statusCode": 410,
  "message": "Content permanently deleted (HTTP 410 Gone)",
  "url": "https://example.com/page"
}
Request Timeout (408)
{
  "success": false,
  "error": "Request timeout",
  "message": "Browser job timeout after 120000ms",
  "url": "https://example.com/page",
  "timeoutSeconds": 120
}
Server Error (500)
{
  "success": false,
  "error": "Failed to scrape the webpage",
  "message": "Error details",
  "url": "https://example.com/page"
}
Content Validation
The API classifies extracted content into these categories to help you determine data quality:
Valid Content
- clean_content_body_fully_extracted: Complete content with no contamination
- clean_content_body_partially_extracted: Content extracted but may be incomplete
- content_extracted_with_possible_contamination: Content present with minor contamination
Invalid Content
- cookie_consent_text_only
- privacy_policy_text_only
- navigation_text_only
- advertisement_text_only
- error_message_text_only
- paywall_text_only
Contamination Detection
The API automatically detects and removes these contamination types:
Navigation Elements
- Navigation menus and breadcrumbs
- Header and footer content
- Sidebar elements and widgets
Content Pollution
- Related content sections
- Comments and social share buttons
- Advertisement blocks
Technical Elements
- JavaScript/CSS code fragments
- Multiple content snippets mixed together
- Cookie consent banners
Legal Content
- Copyright notices
- Privacy policy text
- Terms of service content
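Based on the content categories above, a caller can gate downstream processing on validation.contentType. The grouping below mirrors the Valid Content list; the helper itself is an assumption about how you might consume the field, not API behaviour:

```python
# Categories from the Valid Content list above
VALID_CONTENT_TYPES = {
    "clean_content_body_fully_extracted",
    "clean_content_body_partially_extracted",
    "content_extracted_with_possible_contamination",
}

def is_usable(validation: dict) -> bool:
    """True when validation reports one of the valid content categories."""
    return validation.get("contentType") in VALID_CONTENT_TYPES
```

Anything outside the set (cookie banners, paywalls, navigation text, and so on) is rejected rather than stored.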
Error Codes
| HTTP Status | Error Type | Description |
|---|---|---|
| 400 | Bad Request | Missing or invalid URL parameter |
| 401 | Unauthorized | Missing API key |
| 403 | Forbidden | Invalid API key |
| 404 | Not Found | Content deleted/not found |
| 408 | Request Timeout | Browser job timeout (120s default) |
| 410 | Gone | Content permanently deleted |
| 500 | Internal Server Error | Scraping or processing failure |
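The status codes above suggest a simple client-side handling policy: timeouts and server errors are worth retrying, 404/410 mean the content is gone for good, and 400/401/403 call for fixing the request or key rather than retrying. The function below is a sketch of that policy, not something the API itself provides:

```python
def classify_error(status: int) -> str:
    """Map an HTTP status from the error-code table to a handling strategy."""
    if status in (408, 500):
        return "retry"        # transient: timeout or server failure
    if status in (404, 410):
        return "gone"         # content deleted; do not retry
    if status == 400:
        return "fix_request"  # missing or invalid url parameter
    if status in (401, 403):
        return "fix_auth"     # missing or invalid API key
    return "unknown"
```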
Code Examples
JavaScript
const apiKey = 'YOUR_API_KEY';
const pageUrl = 'https://example.com/page';
const response = await fetch(
`https://api2.flying-extract.in/scrape?apiKey=${apiKey}&url=${encodeURIComponent(pageUrl)}&ai=1&proxy=1`
);
const data = await response.json();
if (data.success) {
  console.log('Title:', data.data.title);
  console.log('Body:', data.data.body);
  console.log('Images:', data.data.images);
  console.log('Method:', data.extractionMethod);
} else {
  console.error('Error:', data.error || data.message);
}
Python
import requests
api_key = 'YOUR_API_KEY'
page_url = 'https://example.com/page'
response = requests.get('https://api2.flying-extract.in/scrape', params={
    'apiKey': api_key,
    'url': page_url,
    'ai': '1',
    'proxy': '1'
})
data = response.json()
if data['success']:
    print(f"Title: {data['data']['title']}")
    print(f"Body: {data['data']['body'][:200]}...")
    print(f"Images: {len(data['data']['images'])} found")
    print(f"Method: {data['extractionMethod']}")
else:
    print(f"Error: {data.get('error') or data.get('message')}")
cURL
# Basic request
curl "https://api2.flying-extract.in/scrape?apiKey=YOUR_API_KEY&url=https://example.com/page"
# With AI and proxy enabled
curl "https://api2.flying-extract.in/scrape?apiKey=YOUR_API_KEY&url=https://example.com/page&ai=1&proxy=1"
Support
Need Help?
For issues or questions, please check the following:
- Check the validation results in the response
- Review contamination detection results
- Verify your API key and parameters
- Check for proper URL encoding
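On the last point: improper URL encoding is the most common cause of malformed requests. If the target URL carries its own query string, percent-encode it so its ? and & characters do not split the API's parameters. A quick check with the standard library (the helper name is illustrative):

```python
from urllib.parse import quote

def encode_target_url(page_url: str) -> str:
    """Percent-encode a target URL for safe use as the url query parameter."""
    # safe="" also encodes '/' and ':' so nothing leaks into the outer query
    return quote(page_url, safe="")

encoded = encode_target_url("https://example.com/page?a=1&b=2")
```

Note that requests and fetch with URLSearchParams perform this encoding automatically; manual encoding matters mainly when you concatenate the request URL by hand, as in the cURL examples.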