File

🚀

Enhanced

Direct integration with Langfuse tracing

The File Loader is a versatile document loader that supports multiple file formats including TXT, JSON, CSV, DOCX, PDF, Excel, PowerPoint, and more. This module provides a unified interface for loading and processing various file types.

This module provides a sophisticated file loader that can:

  • Load multiple file formats
  • Support both base64-encoded files and files from storage
  • Handle PDF-specific processing options
  • Process JSON and JSONL with pointer extraction
  • Support text splitting
  • Customize metadata extraction
  • Handle file storage integration

Inputs

Required Parameters

  • File: The file(s) to process (supports multiple formats)

Optional Parameters

  • Text Splitter: A text splitter to process the extracted content
  • PDF Usage: Choose between:
    • One document per page
    • One document per file
  • Use Legacy Build: Use legacy build for PDF compatibility issues
  • JSONL Pointer Extraction: Pointer name for JSONL files
  • Additional Metadata: JSON object with additional metadata
  • Omit Metadata Keys: Comma-separated list of metadata keys to omit

Outputs

  • Document: Array of document objects containing metadata and pageContent
  • Text: Concatenated string from pageContent of documents

Supported File Types

  • Text Files (.txt)
  • JSON Files (.json)
  • JSONL Files (.jsonl)
  • CSV Files (.csv)
  • PDF Files (.pdf)
  • Word Documents (.docx)
  • Excel Files (.xlsx, .xls)
  • PowerPoint Files (.pptx, .ppt)
  • And more…

Features

  • Multi-format support
  • Storage integration
  • PDF processing options
  • JSON pointer extraction
  • Text splitting support
  • Metadata customization
  • Error handling
  • MIME type detection

File Processing Options

PDF Processing

  • Per-page splitting
  • Single document mode
  • Legacy build support
  • OCR compatibility

JSON/JSONL Processing

  • Pointer-based extraction
  • Structured data handling
  • Array processing
  • Nested object support

Notes

  • Automatically detects file type
  • Handles multiple files simultaneously
  • Supports file storage integration
  • Preserves file metadata
  • Handles large files efficiently
  • Error handling for invalid files
  • Memory-efficient processing