Observability Data Model
Tracing in Langfuse is a way to log and analyze the execution of your LLM applications. The following reference provides a detailed overview of the data model used. It is inspired by OpenTelemetry.
Traces and Observations
Traces
A trace typically represents a single request or operation.
It contains the overall input and output of the function, as well as metadata about the request (such as user, session, and tags).
Observations
Each trace can contain multiple observations to log the individual steps of the execution. Usually, a trace corresponds to a single API call of an application.
Nesting
Hierarchical structure of traces in Langfuse
Example trace in Langfuse UI
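The nesting described here can be sketched with plain Python dataclasses. This is an illustrative model only, not the Langfuse SDK: the class names, the `parent_id` field, and the `render_tree` helper are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Observation:
    id: str
    name: str
    # Hypothetical field: None means the observation hangs directly off the trace.
    parent_id: Optional[str] = None


@dataclass
class Trace:
    id: str
    name: str
    observations: list[Observation] = field(default_factory=list)


def render_tree(trace: Trace) -> list[str]:
    """Return the trace name followed by one indented line per observation."""
    # Index observations by their parent so children can be looked up quickly.
    children: dict[Optional[str], list[Observation]] = {}
    for obs in trace.observations:
        children.setdefault(obs.parent_id, []).append(obs)

    lines = [trace.name]

    def walk(parent_id: Optional[str], depth: int) -> None:
        # Depth-first traversal: emit an observation, then its nested children.
        for obs in children.get(parent_id, []):
            lines.append("  " * depth + obs.name)
            walk(obs.id, depth + 1)

    walk(None, 1)
    return lines


trace = Trace(
    id="t1",
    name="chat-request",
    observations=[
        Observation(id="o1", name="retrieve-context"),
        Observation(id="o2", name="vector-search", parent_id="o1"),
        Observation(id="o3", name="llm-call"),
    ],
)
print("\n".join(render_tree(trace)))
```

Each observation belongs to exactly one trace, and an observation with a parent is rendered one level deeper than that parent.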
Types
Langfuse supports a number of observation types specific to LLM applications:
- event is the basic building block. An event is used to track discrete events in a trace.
- span represents the duration of a unit of work in a trace.
- generation logs generations of AI models, including prompts, token usage, and costs.
- agent decides on the application flow and can, for example, use tools with the guidance of an LLM.
- tool represents a tool call, for example to a weather API.
- chain is a link between different application steps, like passing context from a retriever to an LLM call.
- retriever represents data retrieval steps, such as a call to a vector store or a database.
- evaluator represents functions that assess the relevance, correctness, or helpfulness of an LLM's outputs.
- embedding is a call to an LLM to generate embeddings and can include model, token usage, and costs.
- guardrail is a component that protects against malicious content or jailbreaks.
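As a quick reference, the observation types listed above can be captured in an enumeration. The sketch below is not taken from the Langfuse SDK; `ObservationType` and `is_known_type` are hypothetical names, and only the string values come from the list.

```python
from enum import Enum


class ObservationType(str, Enum):
    """Observation types named in the list above (hypothetical enum)."""

    EVENT = "event"
    SPAN = "span"
    GENERATION = "generation"
    AGENT = "agent"
    TOOL = "tool"
    CHAIN = "chain"
    RETRIEVER = "retriever"
    EVALUATOR = "evaluator"
    EMBEDDING = "embedding"
    GUARDRAIL = "guardrail"


def is_known_type(value: str) -> bool:
    """Check whether a string matches one of the listed observation types."""
    return value in {t.value for t in ObservationType}


print(is_known_type("generation"))
print(is_known_type("trace"))
```

A trace is not an observation type, so the second check is false.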
Sessions
Optionally, traces can be grouped into sessions. Sessions are used to group traces that are part of the same user interaction. A common example is a thread in a chat interface.
Please refer to the Sessions documentation to add sessions to your traces.
Optionally, sessions aggregate traces
Example session in Langfuse UI
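Conceptually, a session is the set of traces that share a session identifier. The following sketch shows that grouping with plain Python. It is not the Langfuse API: the `Trace` class, its `session_id` field, and `group_by_session` are assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Trace:
    id: str
    # Optional, because traces do not have to belong to a session.
    session_id: Optional[str] = None


def group_by_session(traces: list[Trace]) -> dict[str, list[str]]:
    """Map each session id to the ids of its traces, in input order."""
    sessions: dict[str, list[str]] = defaultdict(list)
    for trace in traces:
        if trace.session_id is not None:
            sessions[trace.session_id].append(trace.id)
    return dict(sessions)


traces = [
    Trace(id="t1", session_id="chat-thread-1"),
    Trace(id="t2", session_id="chat-thread-1"),
    Trace(id="t3"),  # standalone trace without a session
    Trace(id="t4", session_id="chat-thread-2"),
]
print(group_by_session(traces))
```

Traces without a session id are left out of the result, which mirrors the fact that sessions are optional.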
Scores
Scores are flexible objects used to evaluate traces, observations, sessions and dataset runs.
They can be:
- Numeric, categorical, or boolean values
- Associated with a trace, a session, or a dataset run (one and only one is required)
- For trace-level scores only: linked to a specific observation within a trace (optional)
- Annotated with comments for additional context
- Validated against a score configuration schema (optional)
Typically, session-level scores are used for comprehensive evaluation of conversational experiences across multiple interactions, while trace-level scores are used for evaluation of a single interaction. Dataset-run-level scores are used for overall evaluation of a dataset run, e.g. precision, recall, or F1 score.
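The association rules above (exactly one of trace, session, or dataset run, and an observation link only on trace-level scores) can be expressed as a small validation check. This is a minimal sketch under assumed field names, not the actual Langfuse score schema.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Score:
    """Hypothetical score record; field names are assumptions for this sketch."""

    name: str
    value: Union[float, str, bool]  # numeric, categorical, or boolean
    trace_id: Optional[str] = None
    session_id: Optional[str] = None
    dataset_run_id: Optional[str] = None
    observation_id: Optional[str] = None
    comment: Optional[str] = None

    def __post_init__(self) -> None:
        # Exactly one of the three targets must be set.
        targets = [self.trace_id, self.session_id, self.dataset_run_id]
        if sum(target is not None for target in targets) != 1:
            raise ValueError(
                "set exactly one of trace_id, session_id, dataset_run_id"
            )
        # An observation link is only meaningful for trace-level scores.
        if self.observation_id is not None and self.trace_id is None:
            raise ValueError("observation_id requires trace_id")


# Trace-level score linked to one observation.
ok = Score(name="helpfulness", value=0.9, trace_id="t1", observation_id="o3")

# Session-level score with an observation link is rejected.
try:
    Score(name="helpfulness", value=0.9, session_id="s1", observation_id="o3")
except ValueError as err:
    print(f"rejected: {err}")
```

Validation against a score configuration schema is omitted here, since the text above does not describe its format.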
Please refer to the scores documentation to get started. For more details on score types and attributes, refer to the score data model documentation.
Billable Units
Langfuse Cloud pricing is based on the number of ingested units per billing period.
Units = Traces + Observations + Scores
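As a worked example of the formula above, the unit count is a plain sum. The function name and the volumes below are made up for illustration, and no prices are assumed.

```python
def billable_units(traces: int, observations: int, scores: int) -> int:
    """Units = Traces + Observations + Scores."""
    return traces + observations + scores


# Hypothetical monthly volume: 1,000 traces, 5,000 observations, 500 scores.
print(billable_units(traces=1_000, observations=5_000, scores=500))  # 6500
```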
Use our pricing calculator to estimate your monthly costs based on your expected usage.
FAQ
How can I track my Langfuse Cloud usage?
Use the Usage Monitoring Report in the Dashboards tab in Langfuse to analyze your Langfuse Cloud usage.
How can I optimize my Langfuse Cloud usage to reduce cost?
If your application scales and you want to optimize Langfuse Cloud cost, please check out this guide.