Web Scraper API Documentation

A powerful REST API for web scraping with intelligent content extraction, validation, and cleanup.

Base URL

https://flying-extract.in

Overview

The FlyingExtract API extracts structured article data (title, body, author, images, keywords, and publication date) from a webpage, with optional AI-assisted extraction. Each result is validated, checked for contamination such as navigation or advertisement text, and cleaned up before it is returned.

Key Features

  • AI-powered content extraction
  • Automatic validation & cleanup
  • Contamination detection
  • Proxy network support
  • Deletion detection

Supported Content

  • News articles
  • Blog posts
  • Academic papers
  • Product descriptions
  • Any structured content

Authentication

All endpoints require API key authentication via query parameter:

?apiKey=YOUR_API_KEY

Note: api_key is also accepted as an alternative parameter name.

Note: Keep your API key secure and never expose it in client-side code. Contact us at hello@flyingstars.co to get your API key.
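
One way to keep the key out of source code is to read it from an environment variable when the request is built. The sketch below assumes a variable named FLYINGEXTRACT_API_KEY; that name, and the helper names load_api_key and auth_params, are hypothetical and chosen only for illustration.

```python
import os


def load_api_key(env_var: str = "FLYINGEXTRACT_API_KEY") -> str:
    """Return the API key stored in an environment variable.

    The variable name is an assumption made for this example,
    not something the API defines.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set {env_var} before calling the API")
    return key


def auth_params(api_key: str) -> dict:
    """Query parameters that authenticate a request (apiKey form)."""
    return {"apiKey": api_key}
```

The returned dictionary can be merged into the query parameters of any request, so the key never appears as a literal in the code.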

Endpoints

1. Health Check

GET /

Check if the API is running and view available endpoints.

Parameters: None

2. Scrape Webpage

GET /scrape

Extract article content from a webpage.

Parameters

Parameter | Type    | Required | Default | Description
apiKey    | string  | Yes      | -       | Your API key
url       | string  | Yes      | -       | The webpage URL to scrape
ai        | 0, 1, 2 | No       | 0       | AI extraction mode (see below)
proxy     | 0, 1    | No       | 0       | Enable proxy for browser requests

AI Extraction Modes

Mode | Name                      | Description
ai=0 | Traditional Only          | Fast extraction using newspaperjs + Readability. No AI fallback.
ai=1 | Traditional + AI Fallback | Traditional first, falls back to AI if validation fails.
ai=2 | Forced AI + Union         | Always runs AI extraction and unions with traditional results.
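
Because the ai parameter is a bare number, named constants can make calling code easier to read. The following is a minimal sketch; the AiMode enum and the ai_param helper are hypothetical names introduced here, and only the values 0, 1, and 2 come from the modes listed above.

```python
from enum import IntEnum


class AiMode(IntEnum):
    """Named values for the `ai` query parameter (names are illustrative)."""

    TRADITIONAL_ONLY = 0               # ai=0: traditional extraction, no AI fallback
    TRADITIONAL_WITH_AI_FALLBACK = 1   # ai=1: AI only if validation fails
    FORCED_AI_UNION = 2                # ai=2: AI always runs, results are unioned


def ai_param(mode) -> dict:
    """Return the query parameter for a mode; raises ValueError if unknown."""
    return {"ai": int(AiMode(mode))}
```

Passing a value outside 0 to 2 raises ValueError before any request is sent.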

Example Requests

# Basic request

GET https://flying-extract.in/scrape?apiKey=YOUR_API_KEY&url=https://example.com/article

# With AI fallback enabled

GET https://flying-extract.in/scrape?apiKey=YOUR_API_KEY&url=https://example.com/article&ai=1
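
The target URLs in the requests above have no query string of their own. When the page URL does contain characters such as ? or &, it should be percent-encoded so that its parameters are not read as parameters of the API call. A minimal sketch using Python's standard library follows; build_scrape_url is a hypothetical helper name, and the example target URL is invented for illustration.

```python
from urllib.parse import urlencode

BASE_URL = "https://flying-extract.in"


def build_scrape_url(api_key: str, target_url: str, ai: int = 0, proxy: int = 0) -> str:
    """Build a /scrape request URL with every parameter percent-encoded."""
    params = {"apiKey": api_key, "url": target_url, "ai": ai, "proxy": proxy}
    return f"{BASE_URL}/scrape?{urlencode(params)}"


# The target URL carries its own query string, which must survive intact.
request_url = build_scrape_url(
    "YOUR_API_KEY", "https://example.com/article?id=7&lang=en", ai=1
)
print(request_url)
```

After encoding, the target's &lang=en is carried inside the url value instead of being parsed as a separate API parameter.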

Response Format

Success Response

{
  "success": true,
  "url": "https://example.com/article",
  "data": {
    "title": "Article Title",
    "subheading": "Article subtitle or null",
    "body": "The complete article body text...",
    "classification": "News Article",
    "author": "Author Name",
    "images": [
      "https://example.com/image1.jpg",
      "https://example.com/image2.jpg"
    ],
    "keywords": "keyword1, keyword2",
    "publishedDate": "2024-01-15"
  },
  "wordCount": 850,
  "validation": {
    "isValid": true,
    "result": "VALID",
    "pageType": "news_article",
    "contentType": "clean_content_fully_extracted",
    "contaminationDetected": []
  },
  "extractionMethod": "traditional"
}

Response Fields

Field                            | Type        | Description
success                          | boolean     | Whether the extraction succeeded
url                              | string      | The scraped URL
data.title                       | string      | Article title
data.subheading                  | string/null | Article subtitle if present
data.body                        | string      | Full article body text
data.classification              | string      | Page type (e.g., "News Article")
data.author                      | string/null | Author name if found
data.images                      | array       | Array of image URLs from the article
data.keywords                    | string/null | Keywords if available
data.publishedDate               | string/null | Publication date if found
wordCount                        | number      | Word count of the article body
validation.isValid               | boolean     | Whether valid article content was found
validation.result                | string      | "VALID" or "INVALID"
validation.pageType              | string      | Detected page type (e.g., news_article, blog_post, video_page)
validation.contentType           | string      | Content classification
validation.contaminationDetected | array       | List of contamination types found and removed
extractionMethod                 | string      | "traditional", "ai_fallback", or "hybrid_union"
cleaned                          | boolean     | Present and true if contamination was removed from body
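
Several of these fields can be null, and cleaned may be absent altogether, so client code benefits from reading them defensively. The sketch below flattens a success response into a small record; the Article class and parse_article function are hypothetical names, and the defaults chosen for missing values are assumptions of this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Article:
    title: str
    body: str
    subheading: Optional[str] = None
    author: Optional[str] = None
    keywords: Optional[str] = None
    published_date: Optional[str] = None
    images: List[str] = field(default_factory=list)
    word_count: int = 0
    extraction_method: str = ""
    is_valid: bool = False
    cleaned: bool = False


def parse_article(payload: dict) -> Article:
    """Flatten a success response into an Article, tolerating missing fields."""
    data = payload.get("data") or {}
    validation = payload.get("validation") or {}
    return Article(
        title=data.get("title") or "",
        body=data.get("body") or "",
        subheading=data.get("subheading"),
        author=data.get("author"),
        keywords=data.get("keywords"),
        published_date=data.get("publishedDate"),
        images=list(data.get("images") or []),
        word_count=payload.get("wordCount", 0),
        extraction_method=payload.get("extractionMethod", ""),
        is_valid=bool(validation.get("isValid")),
        # "cleaned" only appears when contamination was removed from the body
        cleaned=bool(payload.get("cleaned", False)),
    )
```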

Error Responses

Missing URL (400)

{
  "error": "Missing required parameter: url",
  "example": "/scrape?url=https://example.com"
}

Missing or Invalid API Key (401/403)

{
  "error": "API key required",
  "message": "Please provide an API key as a URL parameter: ?apiKey=YOUR_KEY"
}

Article Not Found (404)

{
  "success": false,
  "deleted": true,
  "statusCode": 404,
  "message": "Article not found (HTTP 404)",
  "url": "https://example.com/article"
}

Article Permanently Deleted (410)

{
  "success": false,
  "deleted": true,
  "statusCode": 410,
  "message": "Article permanently deleted (HTTP 410 Gone)",
  "url": "https://example.com/article"
}

Request Timeout (408)

{
  "success": false,
  "error": "Request timeout",
  "message": "Browser job timeout after 120000ms",
  "url": "https://example.com/article",
  "timeoutSeconds": 120
}

Server Error (500)

{
  "success": false,
  "error": "Failed to scrape the webpage",
  "message": "Error details",
  "url": "https://example.com/article"
}
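
The error bodies above do not share a single shape: the 400 and 401/403 responses carry no success field, and only the 404 and 410 responses set deleted. A minimal sketch of one way to normalize them into a single exception type follows; ScrapeError and raise_for_error are hypothetical names, not part of the API.

```python
class ScrapeError(Exception):
    """Raised for any response that is not an explicit success."""

    def __init__(self, status: int, message: str, deleted: bool = False):
        super().__init__(f"HTTP {status}: {message}")
        self.status = status
        self.deleted = deleted  # True for the 404 and 410 bodies shown above


def raise_for_error(status: int, payload: dict) -> dict:
    """Return the payload on success; raise ScrapeError otherwise."""
    # Only an explicit `"success": true` counts as success, because some
    # error bodies omit the field entirely.
    if payload.get("success") is True:
        return payload
    message = payload.get("message") or payload.get("error") or "Unknown error"
    raise ScrapeError(status, message, deleted=bool(payload.get("deleted")))
```

A caller can then catch ScrapeError once and inspect status and deleted, for example to drop a stored article when the source page is gone.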

Content Validation

The API classifies extracted content into these categories to help you determine data quality:

Valid Article Content

clean_article_body_fully_extracted: Complete article with no contamination

clean_article_body_partially_extracted: Article content that may be incomplete

article_extracted_with_possible_contamination: Article present with minor contamination

Invalid Content

cookie_consent_text_only
privacy_policy_text_only
navigation_text_only
advertisement_text_only
error_message_text_only
paywall_text_only
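
These categories can drive a simple accept, review, or reject decision on the client. The sketch below is one possible policy, not the API's own; the tier names ("complete", "review", "reject") and the quality_tier function are hypothetical. Content types it does not recognize fall back to the isValid flag and are routed to review.

```python
INVALID_CONTENT_TYPES = {
    "cookie_consent_text_only",
    "privacy_policy_text_only",
    "navigation_text_only",
    "advertisement_text_only",
    "error_message_text_only",
    "paywall_text_only",
}
FULLY_EXTRACTED = "clean_article_body_fully_extracted"


def quality_tier(validation: dict) -> str:
    """Map a `validation` object to 'complete', 'review', or 'reject'."""
    content_type = validation.get("contentType", "")
    if not validation.get("isValid") or content_type in INVALID_CONTENT_TYPES:
        return "reject"
    if content_type == FULLY_EXTRACTED:
        return "complete"
    # Partial, possibly contaminated, or unrecognized content types
    return "review"
```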

Contamination Detection

The API automatically detects and removes these contamination types:

Navigation Elements

  • Navigation menus and breadcrumbs
  • Header and footer content
  • Sidebar elements and widgets

Content Pollution

  • Related articles sections
  • Comments and social share buttons
  • Advertisement blocks

Technical Elements

  • JavaScript/CSS code fragments
  • Multiple article snippets mixed together
  • Cookie consent banners

Legal Content

  • Copyright notices
  • Privacy policy text
  • Terms of service content

Error Codes

HTTP Status | Error Type            | Description
400         | Bad Request           | Missing or invalid URL parameter
401         | Unauthorized          | Missing API key
403         | Forbidden             | Invalid API key
404         | Not Found             | Article deleted/not found
408         | Request Timeout       | Browser job timeout (120s default)
410         | Gone                  | Article permanently deleted
500         | Internal Server Error | Scraping or processing failure
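
Not every status in this table is worth retrying. The sketch below assumes that 408 and 500 may be transient while the other codes will fail again with the same request; that split is an assumption of this example, not a documented guarantee, and the function names and backoff values are hypothetical.

```python
# Assumption: timeouts and server errors may succeed on a later attempt;
# 400, 401, 403, 404, and 410 will not change without a different request.
RETRYABLE_STATUSES = {408, 500}


def should_retry(status: int, attempts_made: int, max_attempts: int = 3) -> bool:
    """Decide whether to send the same request again."""
    return status in RETRYABLE_STATUSES and attempts_made < max_attempts


def backoff_seconds(attempts_made: int, base: float = 2.0, cap: float = 60.0) -> float:
    """Exponential backoff: 2 s, 4 s, 8 s, ... limited to `cap` seconds."""
    return min(cap, base * 2 ** (attempts_made - 1))
```

A caller would sleep for backoff_seconds(n) after the n-th failed attempt whenever should_retry returns True.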

Code Examples

JavaScript

const apiKey = 'YOUR_API_KEY';
const articleUrl = 'https://example.com/article';

// URLSearchParams percent-encodes both the key and the target URL
const params = new URLSearchParams({ apiKey, url: articleUrl });

// Top-level await needs an ES module; otherwise wrap this in an async function
const response = await fetch(`https://flying-extract.in/scrape?${params}`);
const data = await response.json();

if (data.success) {
  console.log('Title:', data.data.title);
  console.log('Body:', data.data.body);
  console.log('Images:', data.data.images);
  console.log('Method:', data.extractionMethod);
} else {
  // Error bodies carry "error", "message", or both
  console.error('Error:', data.error || data.message);
}

Python

import requests

api_key = 'YOUR_API_KEY'
article_url = 'https://example.com/article'

# requests percent-encodes the query parameters, including the target URL
response = requests.get(
    'https://flying-extract.in/scrape',
    params={
        'apiKey': api_key,
        'url': article_url,
        'ai': '1',  # Enable AI fallback
    },
    timeout=130,  # Slightly longer than the API's 120 s browser job timeout
)

data = response.json()

# Some error bodies (e.g. 400, 401) have no "success" field, so use .get()
if data.get('success'):
    print(f"Title: {data['data']['title']}")
    print(f"Body: {data['data']['body'][:200]}...")
    print(f"Images: {len(data['data']['images'])} found")
    print(f"Method: {data['extractionMethod']}")
else:
    print(f"Error: {data.get('error') or data.get('message')}")

cURL

# Basic request
curl "https://flying-extract.in/scrape?apiKey=YOUR_API_KEY&url=https://example.com/article"

# With AI fallback
curl "https://flying-extract.in/scrape?apiKey=YOUR_API_KEY&url=https://example.com/article&ai=1"

Support

Need Help?

For issues or questions, please check the following:

  • Check the validation results in the response
  • Review contamination detection results
  • Verify your API key and parameters
  • Check for proper URL encoding
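
The parameter checks in this list can be automated before a request is sent. The sketch below is a hypothetical preflight helper (check_request is not part of the API); it only verifies what the parameter table states and does not contact the server.

```python
from urllib.parse import urlsplit


def check_request(params: dict) -> list:
    """Return a list of problems found in the query parameters (empty if none)."""
    problems = []
    # Both spellings of the key parameter are accepted by the API
    if not (params.get("apiKey") or params.get("api_key")):
        problems.append("missing apiKey")
    target = urlsplit(str(params.get("url", "")))
    if target.scheme not in ("http", "https") or not target.netloc:
        problems.append("url must be an absolute http(s) URL")
    if str(params.get("ai", 0)) not in ("0", "1", "2"):
        problems.append("ai must be 0, 1, or 2")
    if str(params.get("proxy", 0)) not in ("0", "1"):
        problems.append("proxy must be 0 or 1")
    return problems
```

An empty list means the parameters match the documented types; it says nothing about whether the key itself is valid.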