Document Store

🚀 Enhanced: Direct integration with Langfuse tracing

The Document Store loader loads data from document stores that are already configured in your database. It gives your workflows access to documents that were previously processed and stored, without reprocessing the original files.

Features

- Load documents from synchronized stores
- Automatic metadata handling
- Multiple output formats
- Asynchronous store selection
- Database integration
- Chunk-based document retrieval
- JSON metadata support

How It Works

Store Selection:
- Lists all available document stores that are in ‘SYNC’ status
- Provides store information including name and description
- Allows selection from synchronized stores only

Document Retrieval:
- Fetches document chunks from the selected store
- Reconstructs documents with original metadata
- Maintains document structure and relationships
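
The retrieval step can be sketched as a pure function that turns stored chunks into documents. This is a minimal illustration, not the loader's actual implementation: the `StoredChunk` and `LoadedDocument` types and their field names are assumptions, and falling back to an empty object when the stored metadata is not valid JSON is a choice made for this sketch.

```typescript
// Hypothetical shape of a stored chunk; field names are assumptions.
interface StoredChunk {
  pageContent: string;
  metadata: string; // metadata serialized as JSON text
}

// Hypothetical shape of a document returned to the workflow.
interface LoadedDocument {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Parse stored metadata; return an empty object if the text is not a JSON object.
function parseMetadata(raw: string): Record<string, unknown> {
  try {
    const parsed: unknown = JSON.parse(raw);
    return typeof parsed === "object" && parsed !== null && !Array.isArray(parsed)
      ? (parsed as Record<string, unknown>)
      : {};
  } catch {
    return {};
  }
}

// Turn each stored chunk into a document, preserving chunk order.
function chunksToDocuments(chunks: StoredChunk[]): LoadedDocument[] {
  return chunks.map((chunk) => ({
    pageContent: chunk.pageContent,
    metadata: parseMetadata(chunk.metadata),
  }));
}
```

Keeping the parsing in its own function makes it easy to swap the fallback for an error if malformed metadata should stop the workflow instead.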

Parameters

Required Parameters

Select Store: Choose from available synchronized document stores
- Displays store name and description
- Only shows stores in ‘SYNC’ status
- Dynamically updated based on database content
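
As a rough sketch of how the selection list could be built, the function below keeps only stores whose status is 'SYNC' and maps them to dropdown entries. The `StoreRecord` and `StoreOption` types and their field names are hypothetical and only illustrate the filtering rule described above.

```typescript
// Hypothetical store record; field names are assumptions.
interface StoreRecord {
  id: string;
  name: string;
  description: string;
  status: string;
}

// Hypothetical shape of one entry in the "Select Store" list.
interface StoreOption {
  label: string;       // store name shown to the user
  name: string;        // store id used as the selected value
  description: string; // store description shown to the user
}

// Keep only synchronized stores and expose their name and description.
function listSelectableStores(stores: StoreRecord[]): StoreOption[] {
  return stores
    .filter((store) => store.status === "SYNC")
    .map((store) => ({
      label: store.name,
      name: store.id,
      description: store.description,
    }));
}
```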

Outputs

The loader provides two output formats:

Document Output

Returns an array of document objects, each containing:

- pageContent: The text content of the document chunk
- metadata: The original document metadata, parsed from its stored JSON form

Text Output

Returns a single string in which:

- The content of all document chunks is concatenated
- Chunks are separated by newlines
- Special characters are escaped
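
A minimal sketch of the text output, assuming each chunk is followed by a newline (which matches the trailing `\n` in the text output example under Usage Examples). The exact escaping rules are not spelled out on this page, so `escape` is a hypothetical pluggable parameter that leaves the text unchanged by default.

```typescript
interface LoadedDocument {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Placeholder for the escaping step; the real rules are not specified here.
type EscapeFn = (text: string) => string;

// Concatenate every chunk's content, one chunk per line, then escape the result.
function documentsToText(
  docs: LoadedDocument[],
  escape: EscapeFn = (text) => text
): string {
  let text = "";
  for (const doc of docs) {
    text += `${doc.pageContent}\n`;
  }
  return escape(text);
}
```

Text output is convenient when the next step expects a plain string, such as a prompt variable. Note that metadata is not carried over into the string.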

Database Integration

The loader integrates with your database through:

- TypeORM data source connection
- Document store entity management
- Chunk-based storage and retrieval
- Metadata preservation
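
The chunk lookup might look roughly like the sketch below. It does not import TypeORM: `ChunkRepository` is a minimal stand-in modeled on the `find({ where })` call that TypeORM repositories provide, and the `ChunkRow` columns (`storeId`, `chunkNo`, and so on) are assumptions rather than the actual schema. An in-memory repository is included so the sketch runs without a database.

```typescript
// Hypothetical chunk row; column names are assumptions, not the real schema.
interface ChunkRow {
  storeId: string;
  chunkNo: number;
  pageContent: string;
  metadata: string; // JSON text
}

// Minimal stand-in for the part of a repository this sketch needs.
interface ChunkRepository {
  find(options: { where: { storeId: string } }): Promise<ChunkRow[]>;
}

// Fetch all chunks of one store and return them ordered by chunk number.
async function loadStoreChunks(
  repo: ChunkRepository,
  storeId: string
): Promise<ChunkRow[]> {
  const rows = await repo.find({ where: { storeId } });
  // Sort a copy so the repository's own array is left untouched.
  return [...rows].sort((a, b) => a.chunkNo - b.chunkNo);
}

// In-memory repository used only to exercise the sketch.
function inMemoryRepository(rows: ChunkRow[]): ChunkRepository {
  return {
    find: async ({ where }) =>
      rows.filter((row) => row.storeId === where.storeId),
  };
}
```

Depending on a narrow interface rather than a concrete repository class keeps the retrieval logic testable without a live data source.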

Document Structure

Each loaded document contains:

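// Shape of each loaded document (the type name is illustrative)
type LoadedDocument = {
  pageContent: string;   // The actual content
  metadata: {            // Parsed JSON metadata
    [key: string]: unknown;
    // Original document metadata
    // Store-specific information
    // Custom metadata fields
  };
};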

Usage Examples

Basic Store Selection

{
  "selectedStore": "store-id-123"
}

Accessing Document Content

// Document output format
[
  {
    "pageContent": "Document content here...",
    "metadata": {
      "source": "original-file.pdf",
      "page": 1,
      "category": "reports"
    }
  }
]
 
// Text output format
"Document content here...\nNext document content here...\n"

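Downstream steps often need only part of the document output. The helpers below are a small sketch of working with the metadata shown above; `filterByMetadata` and `listSources` are hypothetical names, and the `category` and `source` keys are taken from the sample output rather than guaranteed to exist in every store.

```typescript
interface LoadedDocument {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Keep only documents whose metadata field strictly equals the given value.
function filterByMetadata(
  docs: LoadedDocument[],
  key: string,
  value: unknown
): LoadedDocument[] {
  return docs.filter((doc) => doc.metadata[key] === value);
}

// Collect the distinct string `source` values, in first-seen order.
function listSources(docs: LoadedDocument[]): string[] {
  const sources = new Set<string>();
  for (const doc of docs) {
    const source = doc.metadata["source"];
    if (typeof source === "string") {
      sources.add(source);
    }
  }
  return Array.from(sources);
}
```

Checking the type of each metadata value before using it is a cautious default, since metadata fields can differ from one store to another.
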
Best Practices

- Ensure stores are synchronized before access
- Choose appropriate output format for your use case
- Handle metadata appropriately in your workflow
- Consider chunk size when processing large documents
- Monitor database performance with large stores

Notes

- Only synchronized stores are available for selection
- Metadata is automatically parsed from JSON
- Documents are reconstructed from chunks
- Supports both document and text output formats
- Integrates with TypeORM for database access
- Handles escape characters in text output
- Maintains original document structure

This section is a work in progress. We appreciate any help you can provide in completing this section. Please check our Contribution Guide to get started.
