Document Store
🚀
Enhanced
Direct integration with Langfuse tracing
 (1) (1) (1) (1) (1) (1) (1).png)
Document Store Node
The Document Store loader enables you to load data from pre-configured document stores in your database. This loader provides a convenient way to access and utilize previously processed and stored documents in your workflows.
Features
- Load documents from synchronized stores
- Automatic metadata handling
- Multiple output formats
- Asynchronous store selection
- Database integration
- Chunk-based document retrieval
- JSON metadata support
How It Works
-
Store Selection:
- Lists all available document stores that are in ‘SYNC’ status
- Provides store information including name and description
- Allows selection from synchronized stores only
-
Document Retrieval:
- Fetches document chunks from the selected store
- Reconstructs documents with original metadata
- Maintains document structure and relationships
Parameters
Required Parameters
- Select Store: Choose from available synchronized document stores
- Displays store name and description
- Only shows stores in ‘SYNC’ status
- Dynamically updated based on database content
Outputs
The loader provides two output formats:
Document Output
Returns an array of document objects, each containing:
- pageContent: The actual content of the document chunk
- metadata: Original document metadata in JSON format
Text Output
Returns a concatenated string containing:
- All document chunks’ content
- Separated by newlines
- Properly escaped characters
Database Integration
The loader integrates with your database through:
- TypeORM data source connection
- Document store entity management
- Chunk-based storage and retrieval
- Metadata preservation
Document Structure
Each loaded document contains:
{
pageContent: string, // The actual content
metadata: { // Parsed JSON metadata
// Original document metadata
// Store-specific information
// Custom metadata fields
}
}Usage Examples
Basic Store Selection
{
"selectedStore": "store-id-123"
}Accessing Document Content
// Document output format
[
{
"pageContent": "Document content here...",
"metadata": {
"source": "original-file.pdf",
"page": 1,
"category": "reports"
}
}
]
// Text output format
"Document content here...\nNext document content here...\n"Best Practices
- Ensure stores are synchronized before access
- Choose appropriate output format for your use case
- Handle metadata appropriately in your workflow
- Consider chunk size when processing large documents
- Monitor database performance with large stores
Notes
- Only synchronized stores are available for selection
- Metadata is automatically parsed from JSON
- Documents are reconstructed from chunks
- Supports both document and text output formats
- Integrates with TypeORM for database access
- Handles escape characters in text output
- Maintains original document structure
This section is a work in progress. We appreciate any help you can provide in completing this section. Please check our Contribution Guide to get started.