GitBook

Enhanced: direct integration with Langfuse tracing

GitBook Node

GitBook Document Loader

GitBook is a modern documentation platform that helps teams share knowledge. This module provides functionality to load and process content from GitBook documentation sites.

The GitBook document loader can:

  • Load content from specific GitBook pages
  • Crawl entire GitBook documentation sites
  • Extract structured content
  • Process content with text splitters
  • Customize metadata extraction
  • Handle recursive page loading

Inputs

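Required Parameters

  • Web Path: URL of the GitBook page to load; to load all pages, provide only the root URL of the site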

Optional Parameters

  • Should Load All Paths: Whether to recursively load all pages from the root path
  • Text Splitter: A text splitter to process the extracted content
  • Additional Metadata: JSON object whose key-value pairs are added to the metadata of every document
  • Omit Metadata Keys: Comma-separated list of metadata keys to omit
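The two metadata options can be pictured as a post-processing step over the loaded documents. The sketch below is a minimal illustration, not the node's actual implementation: the helper name `apply_metadata_options` is hypothetical, documents are assumed to be plain dicts with `pageContent` and `metadata`, and it assumes key omission is applied after the additional metadata has been merged in.

```python
import json


def apply_metadata_options(docs, additional_metadata="", omit_metadata_keys=""):
    """Merge extra metadata into each document, then drop unwanted keys.

    Hypothetical helper: the string arguments mirror the optional
    Additional Metadata (JSON object) and Omit Metadata Keys (comma-separated) inputs.
    """
    extra = json.loads(additional_metadata) if additional_metadata.strip() else {}
    if not isinstance(extra, dict):
        raise ValueError("Additional Metadata must be a JSON object")
    # Split the comma-separated list, ignoring whitespace and empty entries
    omit = {key.strip() for key in omit_metadata_keys.split(",") if key.strip()}
    result = []
    for doc in docs:
        merged = {**doc["metadata"], **extra}  # extra keys win on conflict (assumption)
        kept = {k: v for k, v in merged.items() if k not in omit}
        # Build new dicts so the input documents are left unchanged
        result.append({"pageContent": doc["pageContent"], "metadata": kept})
    return result
```

For example, `apply_metadata_options(docs, '{"team": "docs"}', "url")` would tag every document with `team` and remove its `url` key.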

Outputs

  • Document: Array of document objects containing metadata and pageContent
  • Text: A single string formed by concatenating the pageContent of all documents
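The relationship between the two outputs is simple: the Text output is derived from the Document output. A minimal sketch, assuming documents are plain dicts and that the contents are joined with a newline (the separator is an assumption; only concatenation is described above):

```python
def documents_to_text(docs, separator="\n"):
    """Concatenate each document's pageContent into the single Text output.

    The newline separator is an assumption, not documented behavior.
    """
    return separator.join(doc["pageContent"] for doc in docs)
```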

Features

  • Single page loading
  • Recursive site crawling
  • Content extraction
  • Text splitting support
  • Metadata customization
  • Error handling
  • Path management
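Text splitting support means each loaded page can be broken into smaller chunks before indexing, with every chunk keeping the metadata of its source page. The sketch below is a simple fixed-size character splitter with overlap, swapped in for illustration only; the function name `split_document` and its defaults are hypothetical, and real text splitters usually also respect separators such as paragraphs or sentences.

```python
def split_document(doc, chunk_size=200, chunk_overlap=20):
    """Split one document dict into overlapping character chunks.

    Each chunk gets a copy of the source document's metadata.
    """
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    text = doc["pageContent"]
    if not text:
        return []
    step = chunk_size - chunk_overlap  # how far each new chunk advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append({
            "pageContent": text[start:start + chunk_size],
            "metadata": dict(doc["metadata"]),
        })
        if start + chunk_size >= len(text):  # last chunk reached the end of the text
            break
    return chunks
```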

Loading Modes

Single Page Mode

  • Loads content from a specific page
  • Extracts page content and metadata
  • Preserves page structure
  • Faster for single page access
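Single page mode boils down to fetching one page and turning its HTML into a document. The sketch below shows only the extraction step, using Python's standard `html.parser`, so it runs without network access. It assumes (this is not confirmed by the text above) that GitBook renders the article inside a `<main>` element and that the first `<h1>` there is the page title; the names `MainContentParser` and `load_single_page` are hypothetical.

```python
from html.parser import HTMLParser


class MainContentParser(HTMLParser):
    """Collect visible text inside <main>; use the first <h1> there as the title.

    Assumption: page content lives in <main>; real GitBook markup may differ.
    """

    def __init__(self):
        super().__init__()
        self.main_depth = 0      # > 0 while inside a <main> element
        self.skip_depth = 0      # > 0 while inside <script> or <style>
        self.in_title = False    # True while inside the first <h1> within <main>
        self.title = None
        self.title_parts = []
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "main":
            self.main_depth += 1
        elif tag in ("script", "style"):
            self.skip_depth += 1
        elif tag == "h1" and self.main_depth and self.title is None:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "main" and self.main_depth:
            self.main_depth -= 1
        elif tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1
        elif tag == "h1" and self.in_title:
            self.in_title = False
            self.title = " ".join(self.title_parts)

    def handle_data(self, data):
        text = data.strip()
        if not text or not self.main_depth or self.skip_depth:
            return
        self.parts.append(text)
        if self.in_title:
            self.title_parts.append(text)


def load_single_page(html, url):
    """Return one document dict with pageContent and title/url metadata."""
    parser = MainContentParser()
    parser.feed(html)
    parser.close()
    return {
        "pageContent": "\n".join(parser.parts),
        "metadata": {"title": parser.title or "", "url": url},
    }
```

In a real loader the `html` string would come from an HTTP request to the page URL; passing it in directly keeps the sketch testable offline.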

All Paths Mode

  • Recursively loads all pages from root
  • Maintains site hierarchy
  • Extracts all available content
  • Preserves navigation structure
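To load all paths, the loader first needs the list of pages under the root. One common way to discover them, assumed here rather than confirmed by the text above, is to read the site's sitemap (for example `<root>/sitemap.xml`) and keep only URLs on the same host. The function name `page_urls_from_sitemap` is hypothetical, and nested sitemap index files are not followed in this sketch.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def page_urls_from_sitemap(sitemap_xml, root_url):
    """List unique page URLs from a sitemap that share the root URL's host.

    Assumptions: a standard sitemap is available; sitemap index files
    (which point to further sitemaps) are not expanded here.
    """
    host = urlparse(root_url).netloc
    seen, urls = set(), []
    for element in ET.fromstring(sitemap_xml).iter():
        # Compare the local tag name so namespaced and plain <loc> both match
        if element.tag.rsplit("}", 1)[-1] != "loc":
            continue
        url = (element.text or "").strip()
        if url and urlparse(url).netloc == host and url not in seen:
            seen.add(url)
            urls.append(url)  # keep sitemap order for predictable loading
    return urls
```

Each returned URL would then be loaded the same way as in single page mode.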

Document Structure

Each document contains:

  • pageContent: Extracted content from the page
  • metadata:
    • title: Page title
    • url: Original page URL
    • Any custom metadata supplied through Additional Metadata
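The document shape above can be written down as a small data type. This is a sketch for illustration, not the library's own class: `Document` and `make_page_document` are hypothetical names, `page_content` is the Python-style spelling of `pageContent`, and custom keys are assumed to be added after `title` and `url` (overriding them on a name clash).

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Document:
    """Text of one page plus a metadata dictionary."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def make_page_document(page_content: str, title: str, url: str,
                       extra: Optional[Dict[str, Any]] = None) -> Document:
    """Build a document with title/url metadata and optional custom keys."""
    metadata: Dict[str, Any] = {"title": title, "url": url}
    if extra:
        metadata.update(extra)  # custom keys are merged last
    return Document(page_content=page_content, metadata=metadata)
```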

Notes

  • Supports both single page and full site loading
  • Handles GitBook’s dynamic content
  • Preserves document structure
  • Supports custom metadata addition
  • Error handling for invalid URLs
  • Memory-efficient processing
  • Flexible output formats
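Error handling for invalid URLs typically starts with rejecting input that cannot be fetched at all, before any request is made. A minimal sketch using the standard `urllib.parse` module; the function name `validate_gitbook_url` is hypothetical and the rule (an http or https scheme plus a host) is an assumption about what counts as valid.

```python
from urllib.parse import urlparse


def validate_gitbook_url(url):
    """Return the trimmed URL, or raise ValueError if it cannot be fetched.

    Assumed rule: the URL needs an http/https scheme and a host name.
    """
    parsed = urlparse(url.strip())
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Invalid GitBook URL: {url!r}")
    return parsed.geturl()
```

A bare host such as `docs.example.com/guide` is rejected because it has no scheme, which surfaces the mistake early instead of as a failed request.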