Microsoft Excel Document Loader
Microsoft Excel is a spreadsheet program that features calculation tools, pivot tables, and a macro programming language. This module loads and processes Excel files using SheetJS.
The loader can:
- Load multiple Excel file formats
- Process multiple worksheets
- Convert rows to structured documents
- Handle various data types
- Preserve cell formatting
- Extract metadata per row
- Support type inference
Inputs
Required Parameters
- Excel File: The Excel file(s) to process (.xls, .xlsx, .xlsm, .xlsb)
Optional Parameters
- Text Splitter: A text splitter to process the extracted content
- Additional Metadata: JSON object with additional metadata
- Omit Metadata Keys: Comma-separated list of metadata keys to omit
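As a rough illustration of how the two metadata options could combine, the sketch below merges a row's default metadata with an Additional Metadata JSON string and then drops the keys listed in Omit Metadata Keys. The names `buildMetadata` and `parseOmitKeys` are hypothetical, and the rule that additional metadata wins on key collisions is an assumption, not something this page confirms.

```typescript
type Metadata = Record<string, unknown>;

// Split a comma-separated list such as "rowNum, internalId" into clean keys.
function parseOmitKeys(raw: string): string[] {
  return raw
    .split(",")
    .map((key) => key.trim())
    .filter((key) => key.length > 0);
}

// Merge default metadata with user-supplied JSON, then remove omitted keys.
// Assumption: on a key collision the additional metadata overrides the default.
function buildMetadata(base: Metadata, additionalJson: string, omitKeysRaw: string): Metadata {
  const additional: Metadata = additionalJson.trim() ? JSON.parse(additionalJson) : {};
  const omit = parseOmitKeys(omitKeysRaw);
  const merged: Metadata = { ...base, ...additional };
  const result: Metadata = {};
  for (const key of Object.keys(merged)) {
    if (omit.indexOf(key) === -1) {
      result[key] = merged[key];
    }
  }
  return result;
}
```

For example, calling `buildMetadata({ worksheet: "Sheet1", rowNum: 2 }, '{"source":"q1.xlsx"}', "rowNum")` would keep `worksheet` and `source` and drop `rowNum`.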
Outputs
- Document: Array of document objects containing metadata and pageContent
- Text: Concatenated string from pageContent of documents
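The relationship between the two outputs can be sketched as follows. The `ExcelRowDocument` interface and the `documentsToText` helper are hypothetical names, and the newline separator is an assumption; the page only states that the Text output is concatenated from each document's pageContent.

```typescript
interface ExcelRowDocument {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Build the Text output by joining the pageContent of every document.
// Assumption: documents are joined with a newline.
function documentsToText(docs: ExcelRowDocument[], separator: string = "\n"): string {
  return docs.map((doc) => doc.pageContent).join(separator);
}
```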
Features
- Multiple format support
- Multi-sheet processing
- Data type preservation
- Metadata extraction
- Type inference
- Error handling
- Memory-efficient processing
Supported Formats
- Excel Binary (.xls)
- Excel Workbook (.xlsx)
- Excel Macro-Enabled (.xlsm)
- Excel Binary Workbook (.xlsb)
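A caller might want to reject unsupported files before handing them to the loader. The check below is a minimal sketch based only on the extensions listed above; `isSupportedExcelFile` is a hypothetical helper, not part of the loader's API, and it inspects the file name rather than the file contents.

```typescript
const SUPPORTED_EXTENSIONS = [".xls", ".xlsx", ".xlsm", ".xlsb"];

// Return true when the file name ends in one of the supported extensions.
// The comparison is case-insensitive, so "Report.XLSX" is accepted.
function isSupportedExcelFile(fileName: string): boolean {
  const dot = fileName.lastIndexOf(".");
  if (dot < 0) {
    return false;
  }
  const extension = fileName.slice(dot).toLowerCase();
  return SUPPORTED_EXTENSIONS.indexOf(extension) !== -1;
}
```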
Data Type Handling
Supported Types
- Text (string)
- Numbers (number)
- Dates (date)
- Booleans (boolean)
- Formulas (calculated values)
- Empty cells (null)
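The type categories above, together with the column type inference mentioned elsewhere on this page, can be illustrated with a small classifier. This is a sketch under stated assumptions rather than the loader's actual logic: `cellTypeOf` and `inferColumnType` are hypothetical names, empty strings are treated as empty cells, formula cells are assumed to arrive as their calculated values, and a column with mixed types falls back to "string".

```typescript
type CellValue = string | number | boolean | Date | null | undefined;
type CellType = "string" | "number" | "date" | "boolean" | "null";

// Classify a single parsed cell value.
// Assumption: an empty string counts as an empty cell.
function cellTypeOf(value: CellValue): CellType {
  if (value === null || value === undefined || value === "") return "null";
  if (value instanceof Date) return "date";
  if (typeof value === "number") return "number";
  if (typeof value === "boolean") return "boolean";
  return "string";
}

// Infer one type for a whole column, ignoring empty cells.
// Assumption: columns with mixed types are reported as "string".
function inferColumnType(values: CellValue[]): CellType {
  let inferred: CellType = "null";
  for (const value of values) {
    const current = cellTypeOf(value);
    if (current === "null") continue;
    if (inferred === "null") {
      inferred = current;
    } else if (inferred !== current) {
      return "string";
    }
  }
  return inferred;
}
```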
Document Structure
Each document contains:
- pageContent: Formatted row content as key-value pairs
- metadata:
  - worksheet: Sheet name
  - rowNum: Row index
  - Original column values
  - Additional custom metadata
Row Processing
Each row is converted to a document with:
- Key-value pairs for each cell
- Preserved column headers
- Type information
- Row position
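The row-to-document conversion described above might look roughly like the sketch below. It assumes each row has already been parsed into an object keyed by column header (for instance, by a SheetJS row-to-JSON step). The `rowToDocument` and `formatCell` names, the "header: value" line format, and the ISO formatting of dates are all illustrative assumptions.

```typescript
type CellValue = string | number | boolean | Date | null;

interface ExcelRowDocument {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Render one cell as text. Assumption: dates are shown in ISO 8601 form
// and empty cells become an empty string.
function formatCell(value: CellValue): string {
  if (value === null) return "";
  if (value instanceof Date) return value.toISOString();
  return String(value);
}

// Convert one parsed row into a document with key-value pageContent and
// metadata that records the sheet name, row position, and column values.
function rowToDocument(
  worksheet: string,
  rowNum: number,
  row: Record<string, CellValue>
): ExcelRowDocument {
  const headers = Object.keys(row);
  const pageContent = headers
    .map((header) => header + ": " + formatCell(row[header]))
    .join("\n");
  const metadata: Record<string, unknown> = {};
  for (const header of headers) {
    metadata[header] = row[header];
  }
  // Set the default attributes last so a column named "worksheet" or
  // "rowNum" cannot overwrite them (an assumption of this sketch).
  metadata.worksheet = worksheet;
  metadata.rowNum = rowNum;
  return { pageContent, metadata };
}
```

Keeping the original typed values in metadata while rendering strings into pageContent lets downstream filters compare numbers and dates without re-parsing text.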
Metadata Attributes
Default attributes include:
- worksheet: Worksheet name (string)
- rowNum: Row index (number)
- Dynamic attributes based on column headers
Notes
- Uses SheetJS for parsing
- Preserves data types
- Handles multiple sheets
- Infers column types
- Memory-efficient processing
- Error handling for invalid files
- Flexible output formats