Oxylabs Document Loaders

Oxylabs is a web scraping service that retrieves public web data at scale, with tools designed to navigate regional restrictions.

Oxylabs Document Loader Node

Features

  • Retrieve data from Google, Amazon and any other website
  • Set geolocation
  • Use browser rendering for JavaScript-heavy pages
  • Parse the data
  • Specify User Agent types
  • Process content with text splitters

Required Parameters

  • Connect Credential: Oxylabs API credentials
  • Query: Search query or URL
  • Source: One of the available sources:
    • Universal - scrape any website
    • Google Search - scrape Google Search results
    • Amazon Product - scrape Amazon Product information
    • Amazon Search - scrape Amazon Search results

Optional Parameters

  • Geolocation: Sets the proxy’s geographic location for the request. See the documentation for details.
  • Render: Enables JavaScript rendering when set to true.
  • Parse: Returns parsed data when set to true, as long as a dedicated parser exists for the submitted URL’s page type.
  • User Agent Type: Device type and browser to present in the request’s User-Agent.
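
To show how the required and optional parameters fit together, here is a minimal sketch that turns them into a request-style payload. The function name build_payload, the SOURCES mapping, and the payload field names (source, url, query, geo_location, render, parse, user_agent_type) are assumptions made for illustration; they are not taken from this page, and the node may assemble its request differently.

```python
from typing import Any, Dict, Optional

# Hypothetical mapping from the node's Source labels to API-style identifiers.
SOURCES = {
    "Universal": "universal",
    "Google Search": "google_search",
    "Amazon Product": "amazon_product",
    "Amazon Search": "amazon_search",
}


def build_payload(
    query: str,
    source: str,
    geolocation: Optional[str] = None,
    render: bool = False,
    parse: bool = False,
    user_agent_type: Optional[str] = None,
) -> Dict[str, Any]:
    """Combine the node's parameters into a single payload dictionary."""
    if source not in SOURCES:
        raise ValueError(f"Unknown source: {source!r}")
    if not query:
        raise ValueError("Query is required")

    payload: Dict[str, Any] = {"source": SOURCES[source]}

    # Assumption: the Universal source takes a URL, the others a search query.
    key = "url" if source == "Universal" else "query"
    payload[key] = query

    # Optional parameters are only included when they are set.
    if geolocation:
        payload["geo_location"] = geolocation
    if render:
        payload["render"] = "html"  # assumed value for JavaScript rendering
    if parse:
        payload["parse"] = True
    if user_agent_type:
        payload["user_agent_type"] = user_agent_type
    return payload
```

For example, build_payload("laptop", "Google Search", geolocation="Germany", parse=True) would yield a payload with the google_search source, the query, the geolocation, and parsing enabled, while leaving out the render and user agent fields.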

Outputs

  • Document: Array of document objects containing metadata and pageContent
  • Text: Concatenated string from pageContent of documents
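
The relationship between the two outputs can be sketched as follows. This is an illustration only: the helper name documents_to_text is hypothetical, and the newline separator is an assumption, since this page does not state how the pageContent values are joined.

```python
from typing import Any, Dict, List


def documents_to_text(documents: List[Dict[str, Any]], separator: str = "\n") -> str:
    """Join the pageContent of each document into one string.

    The separator is an assumption; the node may use a different one.
    """
    return separator.join(doc["pageContent"] for doc in documents)


# Example documents shaped like the Document output (the metadata keys are made up).
docs = [
    {"pageContent": "First page", "metadata": {"source": "universal"}},
    {"pageContent": "Second page", "metadata": {"source": "universal"}},
]
text = documents_to_text(docs)
```

Use the Document output when downstream nodes need per-document metadata, and the Text output when a single string is enough.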

Document Structure

Each document contains:

  • pageContent: Extracted page content