FireCrawl

🚀

Enhanced

Direct integration with Langfuse tracing

FireCrawl Node

FireCrawl Document Loader

FireCrawl is a powerful web crawling and scraping service that provides advanced capabilities for extracting content from websites. This module enables loading and processing web content through the FireCrawl API.

This module provides a sophisticated web crawler that can:

  • Scrape single web pages
  • Crawl entire websites
  • Extract structured data
  • Handle JavaScript-rendered content
  • Process content with text splitters
  • Customize metadata extraction
  • Support multiple operation modes

Inputs

Required Parameters

  • URL: The webpage or website URL to process
  • Connect Credential: FireCrawl API credentials
  • Mode: Choose between:
    • Scrape: Single page extraction
    • Crawl: Multi-page website crawling
    • Extract: Structured data extraction

Optional Parameters

  • Text Splitter: A text splitter to process the extracted content
  • Scrape Options:
    • Include Tags: HTML tags to include
    • Exclude Tags: HTML tags to exclude
    • Mobile: Use mobile user agent
    • Skip TLS Verification: Bypass SSL checks
    • Timeout: Request timeout
  • Additional Metadata: JSON object with additional metadata
  • Omit Metadata Keys: Comma-separated list of metadata keys to omit

Outputs

  • Document: Array of document objects containing metadata and pageContent
  • Text: Concatenated string from pageContent of documents

Features

  • Multiple operation modes
  • Advanced scraping options
  • Structured data extraction
  • JavaScript rendering
  • Mobile device emulation
  • Custom timeout settings
  • Error handling

Operation Modes

Scrape Mode

  • Single page processing
  • Main content extraction
  • Format selection
  • Custom tag filtering

Crawl Mode

  • Multi-page crawling
  • Subdomain handling
  • Sitemap processing
  • Link extraction

Extract Mode

  • Structured data extraction
  • Schema-based parsing
  • LLM-powered extraction
  • Custom extraction prompts

Document Structure

Each document contains:

  • pageContent: Extracted content in markdown format
  • metadata:
    • title: Page title
    • description: Meta description
    • language: Content language
    • sourceURL: Original URL
    • Additional custom metadata

Notes

  • Requires valid FireCrawl API key
  • Supports multiple content formats
  • Handles rate limiting
  • Job status monitoring
  • Error handling and retries
  • Customizable request options
  • Memory-efficient processing