Microsoft SharePoint
Microsoft SharePoint is a web-based collaboration and document management platform. This module provides a SharePoint document loader that ingests documents from both SharePoint Online (Microsoft 365) and SharePoint Server (on-premises / IaaS-hosted) into PebbleAgent’s vector stores for AI-powered search and retrieval.
Key capabilities:
- SharePoint Online and SharePoint Server (2016+) support via the Environment selector
- Load documents from standard SharePoint document libraries (folders and files)
- Load items from SharePoint lists with rich metadata
- Navigate specific folders within libraries
- Filter by status, file type, date, size, and custom metadata fields
- Handle large document sets (thousands of files) with batch processing and memory management
- Five authentication methods covering cloud, hybrid, and on-premises deployments
Inputs
Required Parameters
| Parameter | Description |
|---|---|
| Connect Credential | A SharePoint credential. The available credential types depend on your environment — see Authentication. |
| Environment | Choose your SharePoint deployment type: SharePoint Online (Microsoft 365, uses Graph API) or SharePoint Server (2016+ on-premises or IaaS-hosted, uses REST API). Defaults to SharePoint Online. |
| Mode | Choose how to load documents: Document Library (files and folders) or SharePoint List (structured table with metadata columns). Controls which fields appear below. |
Mode-Dependent Parameters
| Parameter | Shown When | Description |
|---|---|---|
| Library Name | Document Library mode | Display name of the library (e.g., Documents). Leave empty for the site’s default library. |
| Folder Path | Document Library mode | Path to a specific folder (e.g., /Reports/2024). Leave empty to load from the library root. |
| List Name | SharePoint List mode | Name of the SharePoint list (e.g., Master, ControlDocument). |
Optional Parameters
| Parameter | Description |
|---|---|
| Filter | Semicolon-separated key=value pairs to filter documents (e.g., RFCStatus=Current; fileTypes=pdf,docx). See Filtering. |
| Max Documents | Maximum number of documents to load (default: 500, max: 10,000). |
| Text Splitter | A text splitter node to chunk the extracted content. |
| Additional Metadata | JSON object with extra metadata to add to all documents. |
| Omit Metadata Keys | Comma-separated list of metadata keys to exclude. Use * to omit all default metadata. |
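
The interaction between Additional Metadata and Omit Metadata Keys can be sketched as follows. This is a minimal illustration, not the loader's actual code: the function name and the sample metadata keys are hypothetical, and it assumes that `*` removes only the default metadata and that Additional Metadata is applied last.

```python
from typing import Any, Dict, Optional


def apply_metadata_options(
    default_metadata: Dict[str, Any],
    additional_metadata: Optional[Dict[str, Any]] = None,
    omit_keys: str = "",
) -> Dict[str, Any]:
    """Combine a document's default metadata with the two optional fields."""
    keys = [key.strip() for key in omit_keys.split(",") if key.strip()]
    if "*" in keys:
        # Assumption: "*" drops every default key but keeps Additional Metadata.
        result: Dict[str, Any] = {}
    else:
        result = {k: v for k, v in default_metadata.items() if k not in keys}
    # Assumption: Additional Metadata is merged last and wins on key clashes.
    result.update(additional_metadata or {})
    return result


# Hypothetical default metadata for one document.
defaults = {"source": "a.pdf", "author": "Jane", "modified": "2024-01-01"}
print(apply_metadata_options(defaults, {"team": "HR"}, "author, modified"))
```
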
Additional Parameters
| Parameter | Description |
|---|---|
| Batch Size | Documents processed per batch (default: 100, range: 10–500). Reduce for large files, increase for small files. |
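
The documented defaults and ranges for Max Documents and Batch Size can be expressed as a small helper. This is a sketch under one assumption: that out-of-range values are clamped. The loader may instead reject them. The function name is hypothetical.

```python
import math

MAX_DOCUMENTS_DEFAULT = 500
MAX_DOCUMENTS_LIMIT = 10_000
BATCH_SIZE_DEFAULT = 100
BATCH_SIZE_MIN, BATCH_SIZE_MAX = 10, 500


def resolve_limits(max_documents=None, batch_size=None):
    """Return (max_documents, batch_size, batch_count) within the documented ranges."""
    docs = MAX_DOCUMENTS_DEFAULT if max_documents is None else max_documents
    batch = BATCH_SIZE_DEFAULT if batch_size is None else batch_size
    # Assumption: values outside the range are clamped rather than rejected.
    docs = max(1, min(docs, MAX_DOCUMENTS_LIMIT))
    batch = max(BATCH_SIZE_MIN, min(batch, BATCH_SIZE_MAX))
    return docs, batch, math.ceil(docs / batch)


print(resolve_limits())          # defaults
print(resolve_limits(5000, 50))  # a large library processed in small batches
```
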
Outputs
| Output | Description |
|---|---|
| Document | Array of document objects containing metadata and pageContent |
| Text | Concatenated string from pageContent of documents |
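
The relationship between the two outputs can be illustrated with a short sketch. The newline separator is an assumption (the separator is not specified here), and the sample documents are made up.

```python
def documents_to_text(documents, separator="\n"):
    """Join the pageContent of each document into one string."""
    # Assumption: documents are joined with a newline; the real separator may differ.
    return separator.join(doc["pageContent"] for doc in documents)


docs = [
    {"pageContent": "Leave policy", "metadata": {"source": "leave.pdf"}},
    {"pageContent": "Travel policy", "metadata": {"source": "travel.docx"}},
]
print(documents_to_text(docs))
```
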
Supported File Types
| Format | Extensions |
|---|---|
| PDF | .pdf |
| Word | .docx, .doc |
| Excel | .xlsx, .xls |
| Text | .txt |
| CSV | .csv |
Authentication
PebbleAgent supports five authentication methods across SharePoint Online and SharePoint Server environments. The credential you choose determines which Environment setting to use.
| Environment | Credential | Best For |
|---|---|---|
| SharePoint Online | Device Code Flow | Quick setup, no admin needed |
| SharePoint Online | OAuth2 (Enterprise) | Centralised admin control |
| SharePoint Server | Windows NTLM | Simple domain auth, widest compatibility |
| SharePoint Server | Microsoft ADFS OAuth2 | Token-based with MFA support |
| SharePoint Server | Windows Kerberos | SSO with per-user permission preservation |
SharePoint Online Credentials
Device Code Flow
A simple, secure OAuth2 method that requires no Azure AD app registration and no admin approval.
How it works:
- PebbleAgent displays a code (e.g., A1B2-C3D4) and a link
- You open a browser and go to microsoft.com/devicelogin
- You enter the code and sign in with your Microsoft account
- PebbleAgent receives permission to access your SharePoint files
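
For readers who want to see the underlying protocol, the steps above correspond to the OAuth 2.0 device authorization grant on the Microsoft identity platform. The sketch below uses only the Python standard library. It illustrates the public protocol and is not PebbleAgent's internal code; the function names, tenant value, and client ID placeholder are assumptions for illustration.

```python
import json
import time
import urllib.error
import urllib.parse
import urllib.request

AUTHORITY = "https://login.microsoftonline.com"
DEVICE_GRANT = "urn:ietf:params:oauth:grant-type:device_code"


def build_device_code_request(tenant, client_id, scopes):
    """Request that asks Microsoft for a user code and verification link."""
    url = f"{AUTHORITY}/{tenant}/oauth2/v2.0/devicecode"
    body = urllib.parse.urlencode({"client_id": client_id, "scope": " ".join(scopes)})
    return urllib.request.Request(url, data=body.encode(), method="POST")


def build_token_poll_request(tenant, client_id, device_code):
    """Request that checks whether the user has finished signing in."""
    url = f"{AUTHORITY}/{tenant}/oauth2/v2.0/token"
    body = urllib.parse.urlencode(
        {"grant_type": DEVICE_GRANT, "client_id": client_id, "device_code": device_code}
    )
    return urllib.request.Request(url, data=body.encode(), method="POST")


def _post_json(request):
    try:
        with urllib.request.urlopen(request) as response:
            return json.load(response)
    except urllib.error.HTTPError as err:
        # Pending, declined, and expired states arrive as HTTP 400 with a JSON body.
        return json.load(err)


def run_device_code_flow(tenant, client_id, scopes):
    """Show the code to the user, then poll until sign-in completes or the code expires."""
    flow = _post_json(build_device_code_request(tenant, client_id, scopes))
    if "device_code" not in flow:
        raise RuntimeError(flow.get("error_description", "could not start device code flow"))
    print(flow["message"])  # contains the user code and the verification link
    deadline = time.monotonic() + flow["expires_in"]
    while time.monotonic() < deadline:
        time.sleep(flow.get("interval", 5))
        result = _post_json(build_token_poll_request(tenant, client_id, flow["device_code"]))
        if "access_token" in result:
            return result
        if result.get("error") != "authorization_pending":
            raise RuntimeError(result.get("error_description", "device code flow failed"))
    raise TimeoutError("device code expired before sign-in completed")
```

Calling `run_device_code_flow("organizations", "<client-id>", ["offline_access", "https://graph.microsoft.com/Files.Read.All"])` would print the sign-in message and block until the user completes or abandons the sign-in.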
Setting up the credential:
- Navigate to Credentials in PebbleAgent
- Click “+ Add Credential”
- Select “Microsoft SharePoint Device Code Authentication”
- Enter a descriptive name (e.g., “SharePoint – DocCentre”)
- Click “Save”
- Click the “Authenticate” button next to your credential
- PebbleAgent displays the device code and link — open the link, enter the code, and sign in
- Complete Multi-Factor Authentication if required by your organisation
- Click “Accept” on the permission request
The permission dialog shows “Azure CLI” as the app name. This is expected — PebbleAgent uses Microsoft’s public Azure CLI client ID to enable device code flow without requiring an Azure AD app registration.
OAuth2 (Enterprise)
Standard OAuth2 Authorization Code Flow for organisations that require a custom Azure AD app registration.
Prerequisites:
- An Azure AD admin must create an app registration with the following:
  - Redirect URI configured for your PebbleAgent instance
  - API permissions: Sites.Read.All, Files.Read.All, User.Read, openid, offline_access
  - A client secret generated for the app
Setting up the credential:
- Navigate to Credentials in PebbleAgent
- Click “+ Add Credential”
- Select “Microsoft SharePoint OAuth2”
- Enter the following from your Azure AD app registration:
  - Authorization URL: https://login.microsoftonline.com/<tenantId>/oauth2/v2.0/authorize
  - Access Token URL: https://login.microsoftonline.com/<tenantId>/oauth2/v2.0/token
  - Client ID: From your app registration
  - Client Secret: From your app registration
  - Site URL: Your SharePoint site URL (e.g., https://contoso.sharepoint.com/sites/MySite)
- Click “Save” and complete the OAuth2 authorisation flow
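
As an illustration of what the authorisation step involves, the sketch below builds the URL a browser would be sent to in a standard Authorization Code Flow. The function name, the redirect URI, and the placeholder IDs are hypothetical; the exact parameters PebbleAgent sends are not documented here, so treat this as a sketch of the generic protocol.

```python
import secrets
import urllib.parse


def build_authorize_url(tenant_id, client_id, redirect_uri, scopes, state=None):
    """Return (url, state) for the first leg of the Authorization Code Flow."""
    # The state value is echoed back by the identity provider to guard against CSRF.
    state = state or secrets.token_urlsafe(16)
    query = urllib.parse.urlencode(
        {
            "client_id": client_id,
            "response_type": "code",
            "redirect_uri": redirect_uri,
            "response_mode": "query",
            "scope": " ".join(scopes),
            "state": state,
        }
    )
    base = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/authorize"
    return f"{base}?{query}", state


url, state = build_authorize_url(
    "<tenantId>",
    "<clientId>",
    "https://pebbleagent.example.com/callback",  # hypothetical redirect URI
    ["Sites.Read.All", "Files.Read.All", "User.Read", "openid", "offline_access"],
)
print(url)
```
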
Online Permissions
Both SharePoint Online methods request the same read-only permissions:
| Permission | Microsoft Name | Description |
|---|---|---|
| Read files | Files.Read.All | Read any file you have permission to access in SharePoint |
| Read sites | Sites.Read.All | See which SharePoint sites you have access to |
| Maintain access | offline_access | Refresh the access token without re-authenticating |
PebbleAgent can only access files and sites that the authenticated user already has permission to access.
Online Token Lifecycle
- Access token expires every hour and is refreshed automatically
- Refresh token is valid for 90 days and renews each time it is used
- If the refresh token expires (90 days of inactivity), you will need to re-authenticate
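
The lifecycle above reduces to a simple decision. The sketch below assumes the one-hour and 90-day lifetimes stated in the list; the function name and return values are hypothetical.

```python
from datetime import datetime, timedelta, timezone

ACCESS_TOKEN_LIFETIME = timedelta(hours=1)
REFRESH_TOKEN_LIFETIME = timedelta(days=90)


def token_action(access_issued_at, refresh_last_used_at, now=None):
    """Return 'use', 'refresh', or 'reauthenticate' for a stored credential."""
    now = now or datetime.now(timezone.utc)
    if now - access_issued_at < ACCESS_TOKEN_LIFETIME:
        return "use"  # access token is still valid
    if now - refresh_last_used_at < REFRESH_TOKEN_LIFETIME:
        return "refresh"  # exchange the refresh token for a new access token
    return "reauthenticate"  # 90 days of inactivity: the user must sign in again


now = datetime.now(timezone.utc)
print(token_action(now - timedelta(hours=2), now - timedelta(days=10), now))
```
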
SharePoint Server Credentials
For on-premises or IaaS-hosted SharePoint Server 2016+, PebbleAgent connects via the SharePoint REST API instead of Microsoft Graph. Three authentication methods are available.
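
To give a sense of what a SharePoint REST call looks like, the sketch below builds the URL that lists the files in a folder using the `GetFolderByServerRelativeUrl` endpoint. It is an illustration, not the loader's code: the function name is hypothetical, and it takes the library's URL segment (for example Shared Documents) rather than the display name, on the assumption that the loader resolves the display name before making a call like this.

```python
import urllib.parse

# Header commonly sent to SharePoint Server REST endpoints to request JSON.
REST_HEADERS = {"Accept": "application/json;odata=verbose"}


def folder_files_url(site_url, library_url_name, folder_path=""):
    """Build the REST URL that lists files in a library folder."""
    site = urllib.parse.urlsplit(site_url.rstrip("/"))
    segments = [site.path.strip("/"), library_url_name.strip("/"), folder_path.strip("/")]
    server_relative = "/" + "/".join(s for s in segments if s)
    # Single quotes are doubled inside the OData string literal, then the path is URL-encoded.
    escaped = urllib.parse.quote(server_relative.replace("'", "''"), safe="/'")
    base = f"{site.scheme}://{site.netloc}{site.path}"
    return f"{base}/_api/web/GetFolderByServerRelativeUrl('{escaped}')/Files"


print(folder_files_url("https://sharepoint.contoso.local/sites/DocCentre", "Policies", "/Current"))
```
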
Windows NTLM
The simplest option for SharePoint Server. Uses Windows domain credentials directly.
Setting up the credential:
- Navigate to Credentials in PebbleAgent
- Click “+ Add Credential”
- Select “Windows NTLM Authentication”
- Enter:
  - Base URL: Your SharePoint Server site URL (e.g., https://sharepoint.contoso.local/sites/DocCentre)
  - Domain: Your Windows domain (e.g., CONTOSO)
  - Username: Your Windows username (without domain prefix)
  - Password: Your Windows domain password
- Click “Save”
NTLM does not support MFA. If your organisation requires multi-factor authentication, use ADFS OAuth2 or Kerberos instead.
Optional: Enable Allow Self-Signed Certificates (under Additional Parameters) if the server uses an internal CA or self-signed certificate.
Microsoft ADFS OAuth2
Token-based authentication via Active Directory Federation Services. Supports MFA and device code flow — no passwords stored in PebbleAgent.
Prerequisites:
- ADFS deployed with OAuth2/OpenID Connect endpoints
- Application registered as a relying party trust in ADFS (your ADFS admin can set this up)
- ADFS admin provides: the metadata URL and a client ID
Setting up the credential:
- Navigate to Credentials in PebbleAgent
- Click “+ Add Credential”
- Select “Microsoft ADFS OAuth2”
- Enter:
  - Base URL: Your SharePoint Server site URL (e.g., https://sharepoint.contoso.local/sites/DocCentre)
  - ADFS Metadata URL: The OpenID Connect discovery endpoint (e.g., https://adfs.contoso.com/adfs/.well-known/openid-configuration)
  - Client ID: From your ADFS relying party trust registration
- Click “Save” and complete the ADFS device code flow authentication
Windows Kerberos
Kerberos Constrained Delegation preserves the full SSO chain and per-user permissions. The most secure option but requires the most AD admin setup.
Prerequisites:
- AD admin creates a service principal (SPN) for PebbleAgent
- AD admin configures constrained delegation to the SharePoint service
- AD admin generates and provides a keytab file
- Keytab file placed on the PebbleAgent server at a known path
Setting up the credential:
- Navigate to Credentials in PebbleAgent
- Click “+ Add Credential”
- Select “Windows Kerberos Authentication”
- Enter:
  - Base URL: Your SharePoint Server site URL (e.g., https://sharepoint.contoso.local/sites/DocCentre)
  - Service Principal Name (SPN): The Kerberos SPN registered in AD (e.g., HTTP/pebbleagent.contoso.local)
  - Keytab File Path: Absolute path to the keytab file on the server (e.g., /etc/krb5/pebbleagent.keytab)
- Click “Save”
Server Authentication Comparison
| Feature | NTLM | ADFS OAuth2 | Kerberos |
|---|---|---|---|
| MFA support | No | Yes | Yes (via AD) |
| Passwords stored | Yes (encrypted) | No (token-based) | No (keytab-based) |
| AD admin setup | None | Moderate | Most |
| Per-user permissions | Yes | Yes | Yes (strongest) |
| Self-signed certs | Supported | Supported | Supported |
SharePoint Source Types
SharePoint has two ways to store documents. The Mode dropdown controls which fields appear and how the loader connects to SharePoint.
How to tell the difference: Go to your SharePoint site → Site Contents (gear icon → Site Contents). Each item is labelled as either “Document Library” or “List”.
Document Libraries
Standard file storage, similar to folders on your computer.
SharePoint Site
└── Document Library ("Documents")
    ├── Folder A
    │   ├── file1.pdf
    │   └── file2.docx
    └── Folder B
        └── file3.xlsx

Use when you have a standard document library with files organised in folders. The URL typically contains /Shared Documents/ or a similar path.
Use the display name (what you see in the SharePoint UI), not the URL name. For example, use Documents instead of Shared Documents. You can find the display name in Library settings under the gear icon, or in Site Contents.
SharePoint Lists
Database-like storage with rich metadata fields.
SharePoint Site
└── List ("ControlDocument")
    ├── Item 1 (Status: Current, Author: John)
    ├── Item 2 (Status: Archived, Author: Jane)
    └── Item 3 (Status: Draft, Author: Bob)

Use when your documents are stored in a list with custom metadata fields like status, author, or category. The URL typically contains /Lists/.
Folder Path
Navigate to a specific folder within a document library. Only available in Document Library mode.
| Folder Path | Description |
|---|---|
| /Reports | Load from “Reports” folder |
| /Reports/2024 | Load from “2024” subfolder |
| /General/Business Development/Marketing | Deep folder path |
| (empty) | Load from library root (all documents) |
- Use forward slashes (/)
- Match folder names exactly (case-sensitive)
- Don’t include the library name in the path
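
The rules above can be checked before a value is pasted into the Folder Path field. The helper below is a hypothetical pre-check, not part of the loader. It converts backslashes, trims stray slashes, and drops a leading segment that matches the library name. That last step is a heuristic: it would also drop a genuine top-level folder that happens to share the library's name.

```python
def normalise_folder_path(folder_path, library_name=""):
    """Return a Folder Path value that follows the documented rules."""
    path = folder_path.strip().replace("\\", "/")
    segments = [segment for segment in path.split("/") if segment]
    # Heuristic: a first segment equal to the library name is assumed to be a mistake.
    if library_name and segments and segments[0] == library_name:
        segments = segments[1:]
    # Folder names are left untouched because matching is case-sensitive.
    return "/" + "/".join(segments) if segments else ""


print(normalise_folder_path("Documents/General/Business Development/Marketing/", "Documents"))
```
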
Filtering
Filter documents using semicolon-separated key=value pairs in the Filter field.
Supported Filters
| Filter | Example | Description |
|---|---|---|
| RFCStatus | RFCStatus=Current | Filter by status field (SharePoint lists) |
| fileTypes | fileTypes=pdf,docx | Only load specific file types |
| maxSize | maxSize=50MB | Skip files larger than limit |
| modifiedAfter | modifiedAfter=2024-01-01 | Only files modified after date |
| modifiedBefore | modifiedBefore=2024-12-31 | Only files modified before date |
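
To make the filter syntax concrete, here is a minimal sketch of how such a string could be parsed into typed values. It is not the loader's parser. The function name is hypothetical, sizes are assumed to use binary units (1 MB = 1024 × 1024 bytes), and unrecognised keys such as RFCStatus are kept as plain strings on the assumption that they map to metadata fields.

```python
import re
from datetime import date

_SIZE_UNITS = {"KB": 1024, "MB": 1024**2, "GB": 1024**3}


def parse_filter(filter_text):
    """Parse 'key=value; key=value' into a dict with typed values."""
    parsed = {}
    for pair in filter_text.split(";"):
        pair = pair.strip()
        if not pair:
            continue
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {pair!r}")
        key, value = key.strip(), value.strip()
        if key == "fileTypes":
            parsed[key] = [ext.strip().lstrip(".").lower() for ext in value.split(",") if ext.strip()]
        elif key == "maxSize":
            match = re.fullmatch(r"(\d+(?:\.\d+)?)\s*(KB|MB|GB)", value, re.IGNORECASE)
            if not match:
                raise ValueError(f"unrecognised size: {value!r}")
            # Assumption: binary units.
            parsed[key] = int(float(match.group(1)) * _SIZE_UNITS[match.group(2).upper()])
        elif key in ("modifiedAfter", "modifiedBefore"):
            parsed[key] = date.fromisoformat(value)
        else:
            parsed[key] = value  # custom metadata field, kept as a string
    return parsed


print(parse_filter("fileTypes=pdf,docx; maxSize=50MB"))
```
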
Combined Example
RFCStatus=Current; fileTypes=pdf,docx; maxSize=25MB; modifiedAfter=2024-01-01
Examples
Load from a Document Library (Online)
Load all documents from the default “Documents” library in SharePoint Online.
| Field | Value |
|---|---|
| Environment | SharePoint Online |
| Mode | Document Library |
| Library Name | Documents |
Leave Library Name empty to use the site’s default library.
Load from a Specific Folder
Load marketing materials from a nested folder.
| Field | Value |
|---|---|
| Environment | SharePoint Online |
| Mode | Document Library |
| Library Name | Documents |
| Folder Path | /General/Business Development/Marketing |
Load from a SharePoint List with Status Filter
Load only “Current” documents from a document control system.
| Field | Value |
|---|---|
| Environment | SharePoint Online |
| Mode | SharePoint List |
| List Name | ControlDocument |
| Filter | RFCStatus=Current |
Load PDFs Modified This Year
| Field | Value |
|---|---|
| Environment | SharePoint Online |
| Mode | Document Library |
| Library Name | Documents |
| Folder Path | /Policies |
| Filter | fileTypes=pdf; modifiedAfter=2025-01-01 |
Large Document Set with Limits
Load up to 5,000 documents from a large library in small batches.
| Field | Value |
|---|---|
| Environment | SharePoint Online |
| Mode | Document Library |
| Library Name | Archive |
| Max Documents | 5000 |
| Batch Size | 50 |
| Filter | maxSize=25MB |
Load from SharePoint Server (On-Premises)
Load policies from an on-premises SharePoint Server using NTLM authentication.
| Field | Value |
|---|---|
| Credential | Windows NTLM Authentication |
| Environment | SharePoint Server |
| Mode | Document Library |
| Library Name | Policies |
| Folder Path | /Current |
| Filter | fileTypes=pdf,docx |
Large-Scale Ingestion (1,000+ Documents)
For SharePoint sites with thousands of documents, split the work across multiple loader instances. Instead of one loader with Max Documents set to 10,000, add several SharePoint loaders to the same Document Store, each scoped to a different subset.
Why Multiple Loaders?
- Each loader processes independently — one failing doesn’t affect others
- Each loader has its own status indicator in the Document Store table
- Individual loaders can be refreshed without re-processing everything
- Memory stays manageable (~500MB per 500-document loader)
Splitting Strategies
| Strategy | Filter Example | Best For |
|---|---|---|
| By folder | Folder Path: /HR/Policies | Libraries with folder structure |
| By status | Filter: RFCStatus=Current | Lists with status fields |
| By file type | Filter: fileTypes=pdf | Mixed-format libraries |
| By date range | Filter: modifiedAfter=2025-01-01 | Incremental ingestion |
| By Max Documents | Max Documents: 500 (multiple loaders) | Simple numerical partitioning |
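
For the date-range strategy, the Filter value for each loader can be generated rather than typed by hand. The helper below is a hypothetical convenience that produces one filter string per calendar year. Whether modifiedAfter and modifiedBefore include the boundary day is not stated here, so check that files modified on 1 January or 31 December are neither skipped nor loaded twice.

```python
from datetime import date


def yearly_filters(first_year, last_year, extra=""):
    """Return one Filter string per calendar year, optionally with extra filters appended."""
    filters = []
    for year in range(first_year, last_year + 1):
        parts = [
            f"modifiedAfter={date(year, 1, 1).isoformat()}",
            f"modifiedBefore={date(year, 12, 31).isoformat()}",
        ]
        if extra:
            parts.append(extra)
        filters.append("; ".join(parts))
    return filters


for value in yearly_filters(2023, 2025, "fileTypes=pdf"):
    print(value)
```

Each printed line would go into the Filter field of a separate loader in the same Document Store.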
Monitoring Progress
Server logs show batch progress and memory usage:
[SharePoint] Processing batch 3/5 (items 201-300 of 500)
[SharePoint] Batch 3 complete. Memory: 450MB / 2048MB heap

If memory exceeds 85% of heap, processing stops gracefully with a warning.
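
The batching behaviour described here can be sketched as a loop with a memory guard. This is an illustration only: the function and parameter names are hypothetical, memory usage is supplied by the caller rather than measured, and the wording of the final warning is invented.

```python
import math


def process_in_batches(items, batch_size, handle_batch, heap_used_mb, heap_limit_mb, threshold=0.85):
    """Process items in batches and stop early if memory crosses the threshold."""
    total = len(items)
    batches = math.ceil(total / batch_size)
    processed = 0
    for index in range(batches):
        start = index * batch_size
        batch = items[start:start + batch_size]
        print(f"[SharePoint] Processing batch {index + 1}/{batches} "
              f"(items {start + 1}-{start + len(batch)} of {total})")
        handle_batch(batch)
        processed += len(batch)
        used = heap_used_mb()  # caller-supplied reading, e.g. from a process monitor
        print(f"[SharePoint] Batch {index + 1} complete. Memory: {used}MB / {heap_limit_mb}MB heap")
        if used > threshold * heap_limit_mb:
            # Illustrative wording; the real warning text is not documented here.
            print(f"Stopping early: memory above {threshold:.0%} of heap")
            break
    return processed


readings = iter([100, 200, 450, 1900, 2000])
done = process_in_batches(list(range(500)), 100, lambda batch: None, lambda: next(readings), 2048)
print(f"{done} of 500 items processed")
```
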
Vector Store Cleanup
Deleting a loader does NOT remove its embeddings from the vector store. To properly clean up:
- Configure a Record Manager when upserting to the vector store
- Use full cleanup mode to remove embeddings for documents no longer in the Document Store
- After deleting a loader, re-upsert with Record Manager to trigger cleanup
For ongoing updates, use modifiedAfter filters with incremental cleanup mode in the Record Manager so that updated documents are automatically replaced.
Security and Privacy
What Gets Stored
All credentials are stored encrypted at rest. The specific data varies by type:
| Credential Type | Stored Data |
|---|---|
| Device Code Flow | Access token, refresh token, token expiry, user email, scopes |
| OAuth2 (Enterprise) | Client ID, client secret, access token, refresh token, tenant URLs |
| Windows NTLM | Domain, username, password, base URL |
| ADFS OAuth2 | Client ID, ADFS metadata URL, access token, base URL |
| Kerberos | SPN, keytab file path, base URL |
PebbleAgent does not store device codes or Microsoft login passwords. SharePoint documents are only accessed when loading into a vector store.
Revoking Access
In PebbleAgent: Go to Credentials, find your SharePoint credential, and click “Delete”.
For SharePoint Online:
- Device Code Flow: Go to account.microsoft.com/privacy/app-access, find “Azure CLI”, and click “Remove”.
- OAuth2: An Azure AD admin can revoke application access or delete the app registration.
For SharePoint Server:
- NTLM / Kerberos: Deleting the credential in PebbleAgent is sufficient. Optionally, an AD admin can disable the account or revoke the keytab.
- ADFS OAuth2: An ADFS admin can revoke the relying party trust or the user’s session.
Troubleshooting
“Document library not found”
You may be using the URL name instead of the display name. Try Documents instead of Shared Documents. Leave the field empty to use the default library. The error message lists available libraries.
“Access denied” or “403 Forbidden”
Verify you can access the SharePoint site in a browser, check your credential is for the correct site URL, and re-authenticate if the token may have expired.
“Site not found”
Check the Site URL in your credential matches exactly. Include the full path (https://company.sharepoint.com/sites/MySite) without trailing slashes or paths beyond the site name.
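
A URL copied from the browser address bar usually contains extra path segments and query parameters. The hypothetical helper below trims such a URL to the site itself. It assumes the site lives under /sites/ or /teams/; for any other layout it only removes the query string and trailing slash.

```python
import urllib.parse


def normalise_site_url(url):
    """Trim a browser URL down to scheme, host, and site path."""
    parts = urllib.parse.urlsplit(url.strip())
    segments = [segment for segment in parts.path.split("/") if segment]
    if len(segments) >= 2 and segments[0].lower() in ("sites", "teams"):
        # Assumption: the site name is the segment right after /sites/ or /teams/.
        segments = segments[:2]
    path = "/" + "/".join(segments) if segments else ""
    return f"{parts.scheme}://{parts.netloc}{path}"


print(normalise_site_url(
    "https://company.sharepoint.com/sites/MySite/Shared%20Documents/Forms/AllItems.aspx?id=1"
))
```
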
No documents found
Try removing the folder path to load from the root, simplify or remove filters, and verify the folder exists in SharePoint. Folder paths are case-sensitive.
Documents loading slowly or memory errors
Reduce Max Documents and Batch Size (try 25–50). Add a maxSize=25MB filter to skip very large files. Use more specific folder paths. For very large sets, see Large-Scale Ingestion.
“Device code expired”
The code must be entered within 15 minutes. Click “Authenticate” again to get a new code. To save time, open microsoft.com/devicelogin before requesting the code.
“Need admin approval”
Your organisation requires admin consent for applications. Contact your IT department and ask them to enable user consent in Azure AD, or configure a custom OAuth2 credential with pre-approved permissions.
“Waiting for authentication…” never completes
Verify you clicked “Accept” in the browser and saw the “You’re all set!” message. If stuck, click “Cancel” and try again. If your server cannot reach login.microsoftonline.com, check outbound network/firewall rules.
Best Practices
- Start small: Begin with Max Documents at 100 to verify configuration, then increase gradually
- Be specific with paths: Target specific folders rather than loading entire libraries
- Filter large libraries: Use fileTypes, maxSize, and modifiedAfter filters to narrow results
- Monitor token expiration: Check credential status periodically and re-authenticate before the 90-day refresh token expiry
- Use multiple loaders for scale: For 1,000+ documents, split across multiple loader instances in the same Document Store
FAQs
Why does the permission dialog say “Azure CLI”?
PebbleAgent uses Microsoft’s public Azure CLI client ID to enable device code flow without requiring an Azure AD app registration. This is standard practice and secure.
When should I use OAuth2 instead of Device Code Flow?
Use OAuth2 when your organisation requires a custom Azure AD app registration, needs centralised admin control over app permissions, or has disabled device code flow via conditional access policies.
Can I use one credential for multiple SharePoint sites?
Yes. One credential can access all SharePoint sites the authenticated user has permission to. Configure multiple document stores or loaders pointing to different sites using the same credential.
What happens if I change my Microsoft password?
Your refresh token continues to work unless your organisation enforces re-authentication after password changes.
Can multiple users share one credential?
No. Each credential is tied to one Microsoft account. Different users should create their own credentials.
Does this work with SAML or Okta SSO?
Yes, as long as your identity provider federates with Azure AD. The device code flow redirects through your organisation’s SSO.
Can I authenticate from a headless server?
Yes. The server displays the code and you open the browser on any device (laptop, phone, etc.) to complete authentication. This is one of the key advantages of device code flow.
Does this work with on-premises SharePoint Server?
Yes. Set the Environment to “SharePoint Server” and use one of the three Server credential types (NTLM, ADFS OAuth2, or Kerberos). SharePoint Server 2016 and later are supported via the SharePoint REST API.
Which SharePoint Server auth method should I choose?
Start with NTLM for the simplest setup. Use ADFS OAuth2 if your organisation requires MFA or you prefer not to store passwords. Use Kerberos for the strongest security with full SSO chain preservation, bearing in mind that it requires the most AD admin setup.