Oxylabs Document Loaders

Oxylabs is a web scraping service that retrieves public web data at scale, with tools designed to navigate regional restrictions.

Oxylabs Document Loader Node

Features

Retrieve data from Google, Amazon and any other website
Set geolocation
Use browser rendering
Parse the data
Specify User Agent types
Process content with text splitters

Required Parameters

Connect Credential: Oxylabs API credentials
Query: Search query or URL
Source: One of the available sources:
- Universal - scrape any website
- Google Search - scrape Google Search results
- Amazon Product - scrape Amazon Product information
- Amazon Search - scrape Amazon Search results

Optional Parameters

Geolocation: Sets the proxy's geolocation, that is, the location from which the data is retrieved. See the documentation for more details.
Render: Enables JavaScript rendering when set to true.
Parse: Returns parsed data when set to true, as long as a dedicated parser exists for the submitted URL’s page type.
User Agent Type: Device type and browser to emulate when making the request.
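
The required and optional parameters above can be pictured as one small configuration object. The following is a minimal sketch, not the node's actual implementation: the class name `LoaderParams`, its field names, and the `options` helper are hypothetical and exist only for illustration, and the source labels are copied from the list on this page (the node's internal values may differ). The credential is left out on purpose, since it is connected separately in the node.

```python
from dataclasses import dataclass
from typing import Optional

# Source labels as listed on this page; internal identifiers are not documented here.
SOURCES = ("Universal", "Google Search", "Amazon Product", "Amazon Search")


@dataclass
class LoaderParams:
    """Hypothetical container for the node's parameters (illustrative names)."""

    query: str                             # required: search query or URL
    source: str                            # required: one of SOURCES
    geolocation: Optional[str] = None      # optional: proxy geolocation
    render: bool = False                   # optional: enable JavaScript rendering
    parse: bool = False                    # optional: request parsed data
    user_agent_type: Optional[str] = None  # optional: device type and browser

    def validate(self) -> None:
        """Raise ValueError if a required parameter is missing or invalid."""
        if not self.query.strip():
            raise ValueError("Query is required: pass a search query or a URL")
        if self.source not in SOURCES:
            raise ValueError(f"Source must be one of {SOURCES}, got {self.source!r}")

    def options(self) -> dict:
        """Return only the optional parameters that were explicitly set."""
        opts = {}
        if self.geolocation:
            opts["geolocation"] = self.geolocation
        if self.render:
            opts["render"] = True
        if self.parse:
            opts["parse"] = True
        if self.user_agent_type:
            opts["user_agent_type"] = self.user_agent_type
        return opts


# Usage: scrape an arbitrary page with JavaScript rendering enabled.
params = LoaderParams(query="https://example.com", source="Universal", render=True)
params.validate()
print(params.options())  # {'render': True}
```

Keeping unset options out of the result mirrors the idea that optional parameters only take effect when they are provided.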

Outputs

Document: Array of document objects containing metadata and pageContent
Text: Concatenated string from pageContent of documents
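
To make the relationship between the two outputs concrete, here is a minimal sketch of how a Text value could be derived from a Document array. It assumes documents are plain dictionaries with `pageContent` and `metadata` keys; the function name `documents_to_text`, the newline separator, and the metadata contents are assumptions for illustration, since this page does not specify them.

```python
from typing import Any, Dict, List

# Assumed document shape: {"pageContent": str, "metadata": dict}
Document = Dict[str, Any]


def documents_to_text(documents: List[Document], separator: str = "\n") -> str:
    """Concatenate the pageContent of each document into one string.

    The separator is an assumption; the node may join content differently.
    """
    return separator.join(doc["pageContent"] for doc in documents)


# Usage with two hypothetical documents.
docs = [
    {"pageContent": "First result", "metadata": {"source": "hypothetical-1"}},
    {"pageContent": "Second result", "metadata": {"source": "hypothetical-2"}},
]
text = documents_to_text(docs)
print(text)
```

The Document output suits downstream nodes that need per-document metadata, while the Text output suits nodes that accept a single string.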

Document Structure

Each document contains:

pageContent: Extracted page content
