Web Scraper API Documentation

A powerful REST API for web scraping with intelligent content extraction, validation, and cleanup.

Base URL

http://api2.flying-extract.in

Overview

The FlyingExtract API extracts structured data from webpages, with an optional AI-enhanced extraction mode. Results go through automatic validation, contamination detection, and cleanup to improve their quality.

Key Features

  • AI-powered content extraction
  • Automatic validation & cleanup
  • Contamination detection
  • Proxy network support
  • Deletion detection

Supported Content

  • News articles
  • Blog posts
  • Academic papers
  • Product descriptions
  • Any structured content

Authentication

All endpoints require API key authentication via the apiKey query parameter:

?apiKey=YOUR_API_KEY

Note: Keep your API key secure and never expose it in client-side code. Contact us at hello@flyingstars.co to get your API key.
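One way to follow this advice in a server-side script is to read the key from an environment variable instead of hard-coding it. The sketch below assumes a variable named FLYINGEXTRACT_API_KEY. That name is our own choice for illustration and is not something the API requires.

```python
import os


def get_api_key(env_var: str = "FLYINGEXTRACT_API_KEY") -> str:
    """Return the API key from the environment, failing loudly if it is unset.

    FLYINGEXTRACT_API_KEY is a hypothetical variable name chosen for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return key
```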

Endpoints

1. Health Check

GET /

Check if the API is running and view available endpoints.

Parameters: None
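A minimal availability probe follows. It assumes that any HTTP 200 response from GET / means the service is up. The response body's format is not documented here, so the sketch ignores it. It uses only the standard library, and the is_api_up name is hypothetical.

```python
import http.client
import urllib.request

BASE_URL = "http://api2.flying-extract.in"


def is_api_up(base_url: str = BASE_URL, timeout: float = 5.0) -> bool:
    """Return True if GET <base_url>/ answers with HTTP 200 within `timeout` seconds."""
    try:
        with urllib.request.urlopen(base_url.rstrip("/") + "/", timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError (non-2xx status) and socket timeouts are all OSError subclasses
        return False
```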

2. Scrape Webpage

GET /scrape

Extract structured data from any webpage with intelligent content validation and cleanup.

Required Parameters

apiKey (string)

Valid API key for authentication

url (string)

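Target webpage URL to scrape. URL-encode it when building the query string by hand, especially if it contains characters such as ?, & or #.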

Optional Parameters

ai (string)

Set to "1" for enhanced AI-powered extraction; set to "0" or omit it for standard extraction (the default).

proxy (string)

Set to "1" to enable proxy routing

Example Requests

Basic extraction:

curl "http://api2.flying-extract.in/scrape?apiKey=abc123&url=https://example.com"

AI-powered extraction:

curl "http://api2.flying-extract.in/scrape?apiKey=abc123&url=https://example.com&ai=1"

With proxy:

curl "http://api2.flying-extract.in/scrape?apiKey=abc123&url=https://example.com&proxy=1"
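The examples above pass https://example.com unencoded. That works only because the URL contains no ?, & or # characters. A safer approach is to percent-encode every query value. The helper below is a hypothetical sketch that does this with the standard library. It adds ai and proxy only when they are requested.

```python
from urllib.parse import urlencode

BASE_URL = "http://api2.flying-extract.in"


def build_scrape_url(api_key: str, target_url: str, ai: bool = False, proxy: bool = False) -> str:
    """Build a GET /scrape URL with every query value percent-encoded."""
    params = {"apiKey": api_key, "url": target_url}
    if ai:
        params["ai"] = "1"     # enhanced AI-powered extraction
    if proxy:
        params["proxy"] = "1"  # enable proxy routing
    return f"{BASE_URL}/scrape?{urlencode(params)}"


print(build_scrape_url("abc123", "https://example.com/news?id=7&page=2", ai=True))
# -> http://api2.flying-extract.in/scrape?apiKey=abc123&url=https%3A%2F%2Fexample.com%2Fnews%3Fid%3D7%26page%3D2&ai=1
```

Without the encoding, the &page=2 inside the target URL would be read as a separate query parameter of the API request itself.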

Response Format

Success Response (Basic)

{
  "success": true,
  "url": "https://example.com",
  "data": {
    "title": "Article Title",
    "subheading": "Article subtitle or null",
    "body": "Complete article body text...",
    "classification": "News Article",
    "author": "Author Name",
    "topImage": "https://example.com/image.jpg",
    "description": "Article description",
    "keywords": "keyword1, keyword2",
    "publishedDate": "2024-01-15"
  },
  "extractionMethod": "standard",
  "validation": {
    "isValid": true,
    "result": "VALID",
    "contentType": "clean_article_body_fully_extracted",
    "contaminationDetected": []
  },
  "cleaned": false
}

Success Response (AI Mode)

{
  "success": true,
  "url": "https://example.com",
  "data": {
    "title": "Article Title",
    "subheading": "Article subtitle",
    "body": "Complete article body text...",
    "classification": "News Article",
    "author": "Author Name",
    "topImage": "https://example.com/image.jpg",
    "description": "Article description",
    "keywords": "keyword1, keyword2",
    "publishedDate": "2024-01-15"
  },
  "extractionMethod": "enhanced_ai",
  "validation": {
    "isValid": true,
    "result": "VALID",
    "contentType": "clean_article_body_fully_extracted",
    "contaminationDetected": []
  },
  "cleaned": true
}

Deleted Content Response

{
  "success": false,
  "deleted": true,
  "statusCode": 404,
  "message": "Article not found (HTTP 404)",
  "url": "https://example.com"
}

Error Response

{
  "success": false,
  "error": "Failed to scrape the webpage",
  "message": "Detailed error description",
  "url": "https://example.com"
}
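The success, deleted-content and error responses share only the success and url fields. Client code therefore usually branches on success first and then on deleted. The function below is a hypothetical sketch of that dispatch over the response shapes shown above. The ScrapeOutcome type is our own and is not part of the API.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ScrapeOutcome:
    status: str                    # "ok", "deleted" or "error"
    url: Optional[str]
    data: Optional[dict] = None    # the "data" object on success
    message: Optional[str] = None  # human-readable reason on failure


def parse_response(payload: dict[str, Any]) -> ScrapeOutcome:
    """Map a /scrape JSON body onto one of the documented response shapes."""
    url = payload.get("url")
    if payload.get("success"):
        return ScrapeOutcome("ok", url, data=payload.get("data"))
    if payload.get("deleted"):
        # Deleted content carries statusCode and message, but no error field
        return ScrapeOutcome("deleted", url, message=payload.get("message"))
    error = payload.get("error", "Unknown error")
    detail = payload.get("message")
    return ScrapeOutcome("error", url, message=f"{error}: {detail}" if detail else error)
```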

Content Validation

The validation.contentType field classifies extracted content into one of these categories to help you judge data quality:

Valid Article Content

clean_article_body_fully_extracted

Complete article with no contamination

clean_article_body_partially_extracted

Article content but may be incomplete

article_extracted_with_possible_contamination

Article present with minor contamination

Invalid Content

cookie_consent_text_only
privacy_policy_text_only
navigation_text_only
advertisement_text_only
error_message_text_only
paywall_text_only
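The contentType value can serve as a quality gate. The sketch below accepts fully extracted articles and, depending on its flags, partial or possibly contaminated ones. It rejects every *_text_only category and any category it does not recognise. The policy flags are illustrative assumptions. Requiring isValid to be true as well is a conservative choice on our part.

```python
FULL = "clean_article_body_fully_extracted"
PARTIAL = "clean_article_body_partially_extracted"
CONTAMINATED = "article_extracted_with_possible_contamination"

INVALID = {
    "cookie_consent_text_only",
    "privacy_policy_text_only",
    "navigation_text_only",
    "advertisement_text_only",
    "error_message_text_only",
    "paywall_text_only",
}


def is_usable(validation: dict, allow_partial: bool = True, allow_contaminated: bool = False) -> bool:
    """Decide whether a response's "validation" object describes usable article text."""
    content_type = validation.get("contentType")
    if not validation.get("isValid") or content_type in INVALID:
        return False
    if content_type == FULL:
        return True
    if content_type == PARTIAL:
        return allow_partial
    if content_type == CONTAMINATED:
        return allow_contaminated
    return False  # unknown category: reject rather than guess
```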

Contamination Detection

The API automatically detects and removes these contamination types:

Navigation Elements

  • Navigation menus and breadcrumbs
  • Header and footer content
  • Sidebar elements and widgets

Content Pollution

  • Related articles sections
  • Comments and social share buttons
  • Advertisement blocks

Technical Elements

  • JavaScript/CSS code fragments
  • Multiple article snippets mixed together
  • Cookie consent banners

Legal Content

  • Copyright notices
  • Privacy policy text
  • Terms of service content
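The validation.contaminationDetected list and the top-level cleaned flag report what was found and whether cleanup ran. In the sample responses, standard extraction returns cleaned: false and AI mode returns cleaned: true. One plausible client-side policy is therefore to re-request with ai=1 when an uncleaned result reports contamination. This policy is our assumption, not documented behaviour. The entries of contaminationDetected are not specified in this documentation, so the sketch only checks whether the list is non-empty.

```python
def should_retry_with_ai(payload: dict) -> bool:
    """Hypothetical policy: retry with ai=1 if contamination was found but not cleaned.

    Assumes the success response shape documented above.
    """
    if not payload.get("success"):
        return False
    validation = payload.get("validation") or {}
    contaminated = bool(validation.get("contaminationDetected"))
    already_ai = payload.get("extractionMethod") == "enhanced_ai"
    return contaminated and not payload.get("cleaned", False) and not already_ai
```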

Error Codes

HTTP Status   Error Type              Description
400           Bad Request             Missing or invalid URL parameter
401           Unauthorized            Missing API key
403           Forbidden               Invalid API key
404           Not Found               Article deleted or not found
410           Gone                    Article permanently deleted
500           Internal Server Error   Scraping or processing failure
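A client can group these codes by how it should react. The mapping below is a sketch built on our own assumptions:

  • Any 2xx status means success.
  • 404 and 410 mean the content was deleted.
  • 400, 401 and 403 point to a problem with the request itself, which retrying will not fix.
  • 500 (and, by extension, other 5xx codes) is treated as a possibly transient failure worth retrying.

```python
from enum import Enum


class Action(Enum):
    OK = "ok"
    RETRY = "retry"            # scraping/processing failure, possibly transient
    MARK_DELETED = "deleted"   # article gone; stop requesting it
    FIX_REQUEST = "fix"        # bad URL or API key; retrying will not help


def action_for_status(status: int) -> Action:
    """Hypothetical mapping from the documented HTTP status codes to a client action."""
    if 200 <= status < 300:
        return Action.OK
    if status in (404, 410):
        return Action.MARK_DELETED
    if status in (400, 401, 403):
        return Action.FIX_REQUEST
    if status >= 500:
        return Action.RETRY  # 500 is documented; other 5xx handled the same by assumption
    return Action.FIX_REQUEST  # undocumented codes: treat as non-retryable
```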

Code Examples

JavaScript/Node.js

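// Requires Node.js 18+ (global fetch). Top-level await needs an ES module (e.g. a .mjs file).
const apiKey = 'your-api-key';
const url = 'https://example.com/article';

// URLSearchParams percent-encodes every value, including the target URL
const params = new URLSearchParams({ apiKey, url, ai: '1' });
const response = await fetch(`http://api2.flying-extract.in/scrape?${params}`);
const data = await response.json();

if (data.success) {
  console.log('Title:', data.data.title);
  console.log('Body:', data.data.body);
  console.log('Validation:', data.validation.contentType);
} else if (data.deleted) {
  // Deleted-content responses have a message but no error field
  console.warn('Deleted:', data.message);
} else {
  console.error('Error:', data.error, data.message);
}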

Python

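import requests

api_key = 'your-api-key'
url = 'https://example.com/article'

# requests URL-encodes the params dict, so the target URL needs no manual quoting
response = requests.get('http://api2.flying-extract.in/scrape', params={
    'apiKey': api_key,
    'url': url,
    'ai': '1'
}, timeout=60)  # seconds; adjust to your workload

data = response.json()

if data['success']:
    print(f"Title: {data['data']['title']}")
    print(f"Body: {data['data']['body'][:200]}...")
    print(f"Validation: {data['validation']['contentType']}")
elif data.get('deleted'):
    # Deleted-content responses have a message but no error field
    print(f"Deleted: {data['message']}")
else:
    print(f"Error: {data.get('error')} ({data.get('message')})")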
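Batch jobs may benefit from retrying HTTP 500 responses (listed above as scraping or processing failures) with exponential backoff. The sketch below is a hypothetical wrapper, not part of any client library. The fetch callable is injected: in production it can wrap requests.get and return the status code and parsed JSON, and in tests it can be a stub. Any status below 500 is returned immediately without retrying.

```python
import time
from typing import Callable, Tuple

Fetch = Callable[[str], Tuple[int, dict]]  # target URL -> (HTTP status, JSON body)


def scrape_with_retry(fetch: Fetch, target_url: str, attempts: int = 3,
                      base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call fetch(target_url), retrying 5xx responses with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts - 1):
        status, body = fetch(target_url)
        if status < 500:
            return body  # success or a non-retryable 4xx
        sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ... with the defaults
    return fetch(target_url)[1]  # final attempt: return whatever comes back

# Example production fetch (requires requests; api_key as in the example above):
# def fetch(u):
#     r = requests.get("http://api2.flying-extract.in/scrape",
#                      params={"apiKey": api_key, "url": u}, timeout=60)
#     return r.status_code, r.json()
```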

cURL

# Basic extraction
curl "http://api2.flying-extract.in/scrape?apiKey=abc123&url=https://example.com"

# AI extraction
curl "http://api2.flying-extract.in/scrape?apiKey=abc123&url=https://example.com&ai=1"

# Save to file
curl "http://api2.flying-extract.in/scrape?apiKey=abc123&url=https://example.com" > result.json
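A response saved with the last command can be inspected later without calling the API again. The sketch below loads such a file and produces a one-line summary. The summarize_result name and its output format are illustrative only.

```python
import json
from pathlib import Path


def summarize_result(path: str) -> str:
    """Return a one-line summary of a /scrape response saved to disk."""
    payload = json.loads(Path(path).read_text(encoding="utf-8"))
    if payload.get("success"):
        data = payload.get("data") or {}
        content_type = (payload.get("validation") or {}).get("contentType")
        return f"OK: {data.get('title')!r} ({content_type})"
    if payload.get("deleted"):
        return f"DELETED: {payload.get('message')}"
    return f"ERROR: {payload.get('error')}"
```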

Support

Need Help?

If something goes wrong, check the following before asking for help:

  • The validation results in the response (validation.isValid and validation.contentType)
  • The contamination detection results (validation.contaminationDetected)
  • Your API key and request parameters
  • That the target URL is properly URL-encoded
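Some of these checks can be automated before a request is sent. The helper below is a hypothetical sketch that flags:

  • an empty API key
  • a target URL without an http(s) scheme or host
  • ai or proxy values other than "0" or "1"

Treating "0" as the off value for proxy is an assumption, since only "1" is documented. The helper cannot tell whether a key is actually valid; only the API's 401/403 responses reveal that.

```python
from urllib.parse import urlparse


def preflight_problems(api_key: str, target_url: str, ai: str = "0", proxy: str = "0") -> list[str]:
    """Return a list of problems found in the request parameters (empty if none)."""
    problems = []
    if not api_key.strip():
        problems.append("apiKey is empty")
    parsed = urlparse(target_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("url must be an absolute http(s) URL")
    for name, value in (("ai", ai), ("proxy", proxy)):
        if value not in ("0", "1"):  # "0" for proxy is an assumed off value
            problems.append(f'{name} must be "0" or "1"')
    return problems
```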