Web Scraper API Documentation

A powerful REST API for web scraping with intelligent content extraction, validation, and cleanup.

Base URL

https://api2.flying-extract.in

Overview

The FlyingExtract API provides intelligent web scraping capabilities with advanced AI processing. It can extract structured data from any webpage with automatic validation, contamination detection, and cleanup to ensure high-quality results.

Key Features

  • AI-powered content extraction
  • Automatic validation & cleanup
  • Contamination detection
  • Proxy network support
  • Deletion detection

Supported Content

  • News pages
  • Blog posts
  • Academic papers
  • Product descriptions
  • Any structured content

Authentication

All endpoints require API key authentication via a query parameter:

?apiKey=YOUR_API_KEY

Note: api_key is also accepted as an alternative parameter name.

Note: Keep your API key secure and never expose it in client-side code. Contact us at hello@flyingstars.co to get your API key.
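Because the key travels in the query string, the target URL must be percent-encoded so its own `?` and `&` characters do not break the request. A minimal sketch using only the Python standard library (the helper name is illustrative, not part of any official SDK):

```python
from urllib.parse import urlencode

BASE = "https://api2.flying-extract.in"

def build_scrape_url(api_key, page_url, base=BASE):
    """Build an authenticated /scrape URL with a safely encoded target URL."""
    query = urlencode({"apiKey": api_key, "url": page_url})
    return f"{base}/scrape?{query}"
```

`urlencode` percent-encodes the target URL automatically, which is the main thing hand-built strings tend to get wrong.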

Endpoints

1. Health Check

GET /

Check if the API is running and view available endpoints.

Parameters: None

2. Scrape Webpage

GET /scrape

Extract content from a webpage.

Parameters

| Parameter | Type    | Required | Default | Description                                |
|-----------|---------|----------|---------|--------------------------------------------|
| apiKey    | string  | Yes      | -       | Your API key                               |
| url       | string  | Yes      | -       | The webpage URL to scrape                  |
| ai        | 0, 1, 2 | No       | 0       | AI extraction mode (see below)             |
| proxy     | 0, 1    | No       | 0       | Proxy routing mode (see Proxy Modes below) |

AI Extraction Modes

| Mode | Name            | Description                                                                                |
|------|-----------------|--------------------------------------------------------------------------------------------|
| ai=0 | No AI           | Fast extraction without AI. Works for ~80% of websites.                                    |
| ai=1 | AI + Validation | Intelligent AI-assisted extraction with validation. Works for 99.9% of websites.           |
| ai=2 | Full AI         | Fully AI-driven extraction for the hardest 0.1% of websites that resist standard scraping. |

Proxy Modes

| Mode    | Name              | Description                                                                           |
|---------|-------------------|---------------------------------------------------------------------------------------|
| proxy=0 | No Proxy          | Direct connection to the target website.                                              |
| proxy=1 | Intelligent Proxy | Automatic proxy that routes requests through the best geographic route for the target. |
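Since ai=0 covers most sites at the lowest cost, one practical pattern is to escalate modes only when a cheaper mode fails. A minimal sketch of such a client-side strategy (this is assumed usage, not an official SDK; `fetch` is injected so the logic is testable without network access, and `aiModeUsed` is a hypothetical bookkeeping field):

```python
def scrape_with_escalation(fetch, url, api_key, proxy="0"):
    """fetch(params) -> parsed JSON dict from /scrape; injected for testability."""
    result = {}
    for ai_mode in ("0", "1", "2"):
        params = {"apiKey": api_key, "url": url, "ai": ai_mode, "proxy": proxy}
        result = fetch(params)
        # Keep the first response that succeeded and passed validation.
        if result.get("success") and result.get("validation", {}).get("isValid"):
            result["aiModeUsed"] = ai_mode  # hypothetical bookkeeping field
            return result
    return result  # last attempt, even if it failed
```

In production, `fetch` would wrap something like `requests.get(BASE + "/scrape", params=params).json()`.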

Example Requests

# Basic request

GET https://api2.flying-extract.in/scrape?apiKey=YOUR_API_KEY&url=https://example.com/page

# With AI and proxy enabled

GET https://api2.flying-extract.in/scrape?apiKey=YOUR_API_KEY&url=https://example.com/page&ai=1&proxy=1

Response Format

Success Response

{
  "success": true,
  "url": "https://example.com/page",
  "data": {
    "title": "Page Title",
    "subheading": "Subtitle or null",
    "body": "The complete body text...",
    "classification": "News Article",
    "author": "Author Name",
    "images": [
      "https://example.com/image1.jpg",
      "https://example.com/image2.jpg"
    ],
    "social-media-share-image": "https://example.com/image1.jpg",
    "keywords": "keyword1, keyword2",
    "publishedDate": "2026-01-15",
    "duplicate_images": []
  },
  "wordCount": 850,
  "validation": {
    "isValid": true,
    "result": "VALID",
    "pageType": "news_article",
    "contentType": "clean_content_fully_extracted",
    "contaminationDetected": [],
    "validationSkipped": false
  },
  "extractionMethod": "traditional"
}

Response Fields

| Field                             | Type        | Description                                                      |
|-----------------------------------|-------------|------------------------------------------------------------------|
| success                           | boolean     | Whether the extraction succeeded                                 |
| url                               | string      | The scraped URL                                                  |
| data.title                        | string      | Page title                                                       |
| data.subheading                   | string/null | Subtitle if present                                              |
| data.body                         | string      | Full body text                                                   |
| data.classification               | string      | Page type (e.g., "News Article")                                 |
| data.author                       | string/null | Author name if found                                             |
| data.images                       | array       | Array of unique image URLs from the page                         |
| data.social-media-share-image     | string/null | Primary share image for the page                                 |
| data.keywords                     | string/null | Keywords if available                                            |
| data.publishedDate                | string/null | Publication date if found                                        |
| data.duplicate_images             | array       | Duplicate image URLs that were deduplicated from the images array |
| wordCount                         | number      | Word count of the body                                           |
| validation.isValid                | boolean     | Whether valid content was found                                  |
| validation.result                 | string      | "VALID" or "INVALID"                                             |
| validation.pageType               | string      | Detected page type (e.g., news_article, blog_post, video_page)   |
| validation.contentType            | string      | Content classification                                           |
| validation.contaminationDetected  | array       | List of contamination types found and removed                    |
| validation.validationSkipped      | boolean     | Whether validation was skipped                                   |
| extractionMethod                  | string      | "traditional", "ai_fallback", or "hybrid_union"                  |
| cleaned                           | boolean     | Present and true if contamination was removed from the body      |
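Several data fields are nullable (subheading, author, keywords, publishedDate), so callers should read them defensively rather than index directly. A small helper that does this (the summary shape and fallback strings are assumptions for illustration):

```python
def summarize(payload):
    """Condense a /scrape response dict, tolerating null or missing fields."""
    data = payload.get("data", {})
    return {
        "title": data.get("title", ""),
        "author": data.get("author") or "unknown",
        "published": data.get("publishedDate") or "undated",
        "images": len(data.get("images", [])),
        "words": payload.get("wordCount", 0),
    }
```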

Error Responses

Missing URL (400)

{
  "error": "Missing required parameter: url",
  "example": "/scrape?url=https://example.com"
}

Invalid API Key (401/403)

{
  "error": "API key required",
  "message": "Please provide an API key as a URL parameter: ?apiKey=YOUR_KEY"
}

Content Not Found (404)

{
  "success": false,
  "deleted": true,
  "statusCode": 404,
  "message": "Content not found (HTTP 404)",
  "url": "https://example.com/page"
}

Content Permanently Deleted (410)

{
  "success": false,
  "deleted": true,
  "statusCode": 410,
  "message": "Content permanently deleted (HTTP 410 Gone)",
  "url": "https://example.com/page"
}

Request Timeout (408)

{
  "success": false,
  "error": "Request timeout",
  "message": "Browser job timeout after 120000ms",
  "url": "https://example.com/page",
  "timeoutSeconds": 120
}

Server Error (500)

{
  "success": false,
  "error": "Failed to scrape the webpage",
  "message": "Error details",
  "url": "https://example.com/page"
}
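Note that the deletion responses (404/410) set "deleted": true in the body, which lets a caller distinguish permanently gone pages from transient failures. A sketch of one way to classify these bodies (the category names are illustrative, not part of the API):

```python
def classify_failure(body):
    """Map a /scrape response body to a coarse outcome category."""
    if body.get("success", True):
        return "ok"
    if body.get("deleted"):
        # 410 Gone means the content will not come back; 404 might.
        return "gone" if body.get("statusCode") == 410 else "missing"
    if body.get("error") == "Request timeout":
        return "retry"
    return "error"
```

A caller might drop "gone" URLs from its queue permanently while re-queueing "retry" and "missing" ones.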

Content Validation

The API classifies extracted content into these categories to help you determine data quality:

Valid Content

  • clean_content_body_fully_extracted: Complete content with no contamination
  • clean_content_body_partially_extracted: Content extracted but may be incomplete
  • content_extracted_with_possible_contamination: Content present with minor contamination

Invalid Content

  • cookie_consent_text_only
  • privacy_policy_text_only
  • navigation_text_only
  • advertisement_text_only
  • error_message_text_only
  • paywall_text_only
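Because the invalid categories all mean "no usable article text", a client can gate downstream processing on a simple membership check against the documented valid values (the helper name is illustrative):

```python
# The three contentType values the docs list as valid.
VALID_CONTENT_TYPES = {
    "clean_content_body_fully_extracted",
    "clean_content_body_partially_extracted",
    "content_extracted_with_possible_contamination",
}

def is_usable(content_type):
    """True if the validation.contentType value indicates usable content."""
    return content_type in VALID_CONTENT_TYPES
```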

Contamination Detection

The API automatically detects and removes these contamination types:

Navigation Elements

  • Navigation menus and breadcrumbs
  • Header and footer content
  • Sidebar elements and widgets

Content Pollution

  • Related content sections
  • Comments and social share buttons
  • Advertisement blocks

Technical Elements

  • JavaScript/CSS code fragments
  • Multiple content snippets mixed together
  • Cookie consent banners

Legal Content

  • Copyright notices
  • Privacy policy text
  • Terms of service content

Error Codes

| HTTP Status | Error Type            | Description                       |
|-------------|-----------------------|-----------------------------------|
| 400         | Bad Request           | Missing or invalid url parameter  |
| 401         | Unauthorized          | Missing API key                   |
| 403         | Forbidden             | Invalid API key                   |
| 404         | Not Found             | Content deleted/not found         |
| 408         | Request Timeout       | Browser job timeout (120s default) |
| 410         | Gone                  | Content permanently deleted       |
| 500         | Internal Server Error | Scraping or processing failure    |
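These codes split naturally into retryable (timeouts, server errors) and non-retryable (bad input, auth failures, deleted content) groups. A minimal retry policy built on that split (the grouping is our assumption, not an official recommendation, and `should_retry` is an illustrative name):

```python
# 408 and 500 reflect transient conditions; everything else points at the
# request itself (bad URL, bad key) or permanently missing content.
RETRYABLE_STATUSES = {408, 500}

def should_retry(status_code, attempt, max_attempts=3):
    """Decide whether to retry a failed /scrape call."""
    return status_code in RETRYABLE_STATUSES and attempt < max_attempts
```

A caller would typically pair this with exponential backoff between attempts.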

Code Examples

JavaScript

const apiKey = 'YOUR_API_KEY';
const pageUrl = 'https://example.com/page';

const response = await fetch(
  `https://api2.flying-extract.in/scrape?apiKey=${apiKey}&url=${encodeURIComponent(pageUrl)}&ai=1&proxy=1`
);
const data = await response.json();

if (data.success) {
  console.log('Title:', data.data.title);
  console.log('Body:', data.data.body);
  console.log('Images:', data.data.images);
  console.log('Method:', data.extractionMethod);
} else {
  console.error('Error:', data.error || data.message);
}

Python

import requests

api_key = 'YOUR_API_KEY'
page_url = 'https://example.com/page'

response = requests.get('https://api2.flying-extract.in/scrape', params={
    'apiKey': api_key,
    'url': page_url,
    'ai': '1',
    'proxy': '1'
})

data = response.json()

if data['success']:
    print(f"Title: {data['data']['title']}")
    print(f"Body: {data['data']['body'][:200]}...")
    print(f"Images: {len(data['data']['images'])} found")
    print(f"Method: {data['extractionMethod']}")
else:
    print(f"Error: {data.get('error') or data.get('message')}")

cURL

# Basic request
curl "https://api2.flying-extract.in/scrape?apiKey=YOUR_API_KEY&url=https://example.com/page"

# With AI and proxy enabled
curl "https://api2.flying-extract.in/scrape?apiKey=YOUR_API_KEY&url=https://example.com/page&ai=1&proxy=1"

Support

Need Help?

For issues or questions, please check the following:

  • Check the validation results in the response
  • Review the contamination detection results
  • Verify your API key and parameters
  • Check for proper URL encoding