Langfuse Overview

This page is the original Langfuse project overview, synchronised from the upstream Langfuse documentation. It is kept here as reference for the underlying open-source observability platform that powers PebbleObserve. For the PebbleAI-specific framing of what you can monitor and measure, see PebbleObserve - AI Usage.

Langfuse is an open-source LLM engineering platform (source on GitHub) that helps teams collaboratively debug, analyze, and iterate on their LLM applications. All platform features are natively integrated to accelerate the development workflow. Langfuse is open, self-hostable, and extensible (see Why Langfuse? below).

Observability

Observability is essential for understanding and debugging LLM applications. Unlike traditional software, LLM applications involve complex, non-deterministic interactions that can be challenging to monitor and debug. Langfuse provides comprehensive tracing capabilities that help you understand exactly what’s happening in your application.

  • Traces capture all LLM and non-LLM calls, including retrieval, embedding, and API calls
  • Support for grouping multi-turn conversations into sessions and for user-level tracking
  • Agents can be represented as graphs
  • Capture traces via native SDKs for Python/JS, 50+ library/framework integrations, OpenTelemetry, or via an LLM Gateway such as LiteLLM
  • Based on OpenTelemetry to increase compatibility and reduce vendor lock-in
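The bullets above describe traces as hierarchies of LLM and non-LLM steps, grouped by session and user. As a minimal illustration of that data model (plain Python, not the Langfuse SDK; all names here are hypothetical), a trace can be sketched as a root record with nested spans:

```python
from dataclasses import dataclass, field

@dataclass
class Span:
    """One step inside a trace: an LLM call, retrieval, embedding, etc."""
    name: str
    kind: str                      # e.g. "llm", "retrieval", "embedding", "api"
    children: list["Span"] = field(default_factory=list)

@dataclass
class Trace:
    """A single request through the application, with session/user context."""
    name: str
    session_id: str                # groups multi-turn conversations
    user_id: str                   # attributes the trace to a user
    spans: list[Span] = field(default_factory=list)

    def span_kinds(self) -> list[str]:
        """Flatten the span tree into the ordered list of step kinds."""
        def walk(spans):
            for s in spans:
                yield s.kind
                yield from walk(s.children)
        return list(walk(self.spans))

# A RAG-style request: retrieval (with a nested embedding step) feeding an
# LLM call, followed by a non-LLM API call.
trace = Trace(
    name="answer-question",
    session_id="session-42",
    user_id="user-7",
    spans=[
        Span("fetch-context", "retrieval", children=[
            Span("embed-query", "embedding"),
        ]),
        Span("generate-answer", "llm"),
        Span("post-webhook", "api"),
    ],
)
print(trace.span_kinds())  # ['retrieval', 'embedding', 'llm', 'api']
```

In the real platform this structure is produced for you by the SDKs or OpenTelemetry instrumentation; the sketch only shows why a trace is a tree rather than a flat log of LLM calls.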

Prompt Management

Prompt Management is critical in building effective LLM applications. Langfuse provides tools to help you manage, version, and optimize your prompts throughout the development lifecycle.

  • Get started with prompt management
  • Version prompts and roll out changes without redeploying your application
  • Test prompts interactively in the LLM Playground
  • Run Experiments against datasets to test new prompt versions directly within Langfuse

Evaluation

Evaluation is crucial for ensuring the quality and reliability of your LLM applications. Langfuse provides flexible evaluation tools that adapt to your specific needs, whether you’re testing in development or monitoring production performance.

  • Get started with different evaluation methods: LLM-as-a-judge, user feedback, manual labeling, or custom
  • Identify issues early by running evaluations on production traces
  • Create and manage Datasets for systematic testing in development, ensuring your application performs reliably across different scenarios
  • Run Experiments to systematically test your LLM application
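The evaluation loop described above can be sketched in a few lines of plain Python. Both `app` and `judge` below are hypothetical stand-ins for real model calls (the application under test and an LLM-as-a-judge scorer); the shape of the loop, not the stubs, is the point:

```python
# A dataset is a list of inputs with expected outputs.
dataset = [
    {"input": "2 + 2", "expected": "4"},
    {"input": "capital of France", "expected": "Paris"},
    {"input": "3 * 3", "expected": "6"},
]

def app(question: str) -> str:
    """Stand-in for the LLM application under test."""
    canned = {"2 + 2": "4", "capital of France": "Paris", "3 * 3": "9"}
    return canned[question]

def judge(expected: str, actual: str) -> float:
    """Stand-in for an LLM-as-a-judge call; returns a score in [0, 1]."""
    return 1.0 if expected == actual else 0.0

def run_experiment(dataset) -> float:
    """Score every dataset item and return the average score."""
    scores = [judge(item["expected"], app(item["input"])) for item in dataset]
    return sum(scores) / len(scores)

print(run_experiment(dataset))  # two of three items pass
```

Running the same experiment against each new prompt or model version turns "does this change help?" into a number that can be compared across versions, which is what the Experiments feature automates.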

Where to start in Langfuse

Setting up the full process of online tracing, prompt management, production evaluations to identify issues, and offline evaluations on datasets takes some time. The right starting point depends on your use case:

  • If you want to understand what your agents are doing in production, start with Observability.
  • If you want to version, test, and roll out prompts, start with Prompt Management.
  • If you want to systematically measure output quality, start with Evaluation.

Why Langfuse?

  • Open source: Fully open source with public API for custom integrations
  • Production optimised: Designed with minimal performance overhead
  • Best-in-class SDKs: Native SDKs for Python and JavaScript
  • Framework support: Integrated with popular frameworks like OpenAI SDK, LangChain, and LlamaIndex
  • Multi-modal: Support for tracing text, images, and other modalities
  • Full platform: Suite of tools for the complete LLM application development lifecycle