Open-source LLM observability, tracing, and prompt management platform
Langfuse is a Berlin-based open-source LLM engineering platform offering tracing, evaluations, prompt management, and analytics for teams building production AI applications. Backed by Y Combinator and Lightspeed, it is used by thousands of teams to debug, monitor, and improve LLM-powered products.
Headquarters: Berlin, Germany
Founded: 2023
Employees: 11-50
Open Source: Yes
EU Data Hosting: Yes
Pricing: Free / $59/mo / $199/mo / Contact Sales (billed monthly or annually)
Ask any engineering team that has shipped an LLM-powered feature to production what their biggest pain point is, and you will hear the same answer: observability. Traditional APM tools were built for request-response web services with predictable schemas and deterministic outputs. Large language models break every assumption they make. Responses are non-deterministic. Costs vary wildly per request. Quality is subjective and drifts silently as prompts and models change. A failing trace is not a stack trace — it is a sprawling tree of model calls, retrieved documents, tool invocations, and token usage that refuses to fit into a Datadog dashboard.
Langfuse, a Berlin-based open-source platform founded in 2023, was built from the ground up to solve this specific problem. Rather than retrofitting existing observability tools, its creators designed a data model, UI, and evaluation framework that treats LLM applications as first-class citizens. The result is the closest thing the GenAI engineering community has to a standard for tracing, evaluating, and improving production AI systems — used by teams at Samsara, Khan Academy, Twilio, and thousands of smaller startups.
At its core, Langfuse captures a structured trace of every LLM-powered interaction in your application. A trace is a hierarchical record of everything that happened: the user input, the prompts sent to the model, the model outputs, the retrieved documents in a RAG pipeline, the tool calls, the latency and token usage at each step, and the final response returned to the user. From this foundation, Langfuse builds four tightly integrated capabilities.
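The hierarchy described above can be sketched as a plain data model. This is an illustration only, not the Langfuse schema — the field names and the `total_tokens` helper are assumptions for the example:

```python
from dataclasses import dataclass, field

@dataclass
class Observation:
    # One step inside a trace: a model call, a retrieval, or a tool invocation.
    name: str
    kind: str                  # e.g. "generation", "span", or "event"
    input_tokens: int = 0
    output_tokens: int = 0
    latency_ms: float = 0.0
    children: list["Observation"] = field(default_factory=list)

@dataclass
class Trace:
    # The root record for one user interaction.
    user_input: str
    final_output: str
    observations: list[Observation] = field(default_factory=list)

    def total_tokens(self) -> int:
        # Walk the tree and sum token usage across every nested step.
        def walk(obs: Observation) -> int:
            return (obs.input_tokens + obs.output_tokens
                    + sum(walk(c) for c in obs.children))
        return sum(walk(o) for o in self.observations)

trace = Trace(
    user_input="What is our refund policy?",
    final_output="Refunds are accepted within 30 days.",
    observations=[
        Observation("retrieve-docs", "span", children=[
            Observation("embed-query", "generation", input_tokens=12),
        ]),
        Observation("answer", "generation", input_tokens=850, output_tokens=42),
    ],
)
print(trace.total_tokens())  # 904
```

The point of the tree shape is that aggregate questions — total tokens, slowest step, deepest retrieval — become simple traversals rather than joins across log lines.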
The tracing layer is the foundation. SDKs for Python, TypeScript, and raw REST instrument your application with one or two lines of code. Integrations with LangChain, LlamaIndex, Haystack, Vercel AI SDK, LiteLLM, and the OpenAI and Anthropic SDKs mean most teams do not write instrumentation code at all — they add a decorator or enable an environment variable and traces start flowing. Sessions group multi-turn conversations, users are tagged for cohort analysis, and dashboards show cost, latency, and error rates broken down by model, feature, user, or any custom dimension you choose.
Langfuse treats prompts as versioned artefacts, not strings hardcoded in application logic. You write prompts in the Langfuse UI or push them via the API, label versions as production or staging, and fetch them at runtime using the SDK. A deployment gone wrong is a one-click rollback. Prompt performance is tied directly to traces, so you can answer questions like "did average latency go up after we shipped prompt v14?" without building your own analytics pipeline.
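The version-plus-label model can be illustrated with a toy registry; the class and method names below are invented for the sketch and do not mirror the Langfuse API:

```python
class PromptRegistry:
    # Toy stand-in for versioned prompt management with deployment labels.
    def __init__(self):
        self._versions: dict[int, str] = {}
        self._labels: dict[str, int] = {}

    def push(self, text: str) -> int:
        # Each push creates a new immutable version.
        version = max(self._versions, default=0) + 1
        self._versions[version] = text
        return version

    def label(self, name: str, version: int) -> None:
        # Point a label like "production" or "staging" at a version.
        self._labels[name] = version

    def get(self, name: str = "production") -> str:
        # What the application fetches at runtime.
        return self._versions[self._labels[name]]

reg = PromptRegistry()
v13 = reg.push("Answer concisely: {question}")
v14 = reg.push("Answer concisely and cite sources: {question}")
reg.label("production", v14)
# A rollback is just relabelling an earlier version — no redeploy:
reg.label("production", v13)
print(reg.get("production"))  # Answer concisely: {question}
```

Because the application only ever asks for "the production prompt", shipping and rolling back become label moves rather than code changes.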
Datasets are curated collections of inputs — golden questions, edge cases, regression samples — that you run your application against. Each run produces traces you can score automatically using LLM-as-judge, custom Python evaluators, or human annotation queues. This is the closed-loop improvement workflow that serious LLM teams eventually need and that generic observability tools cannot provide.
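The closed loop reduces to a small pattern: run every dataset item through the application, score the output, aggregate. A minimal sketch, with a stub application and an exact-match evaluator standing in for LLM-as-judge or human review:

```python
# Curated inputs with expected answers (golden questions, regression samples).
dataset = [
    {"input": "2 + 2", "expected": "4"},
    {"input": "capital of France", "expected": "Paris"},
]

def app(question: str) -> str:
    # Stand-in for the LLM application under test.
    return {"2 + 2": "4", "capital of France": "Paris"}[question]

def exact_match(output: str, expected: str) -> float:
    # Simplest possible evaluator; an LLM-as-judge or a human
    # annotation queue slots into the same position.
    return 1.0 if output.strip() == expected else 0.0

scores = [exact_match(app(item["input"]), item["expected"]) for item in dataset]
print(sum(scores) / len(scores))  # 1.0
```

Run the same dataset before and after a prompt or model change and the score delta tells you whether the change was a regression — the spreadsheet-free version of what most teams do by hand.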
Every trace tracks input tokens, output tokens, and cost per model. Dashboards surface which features, users, or prompts are consuming the most budget, and alerting rules let you catch runaway cost regressions before they hit the monthly invoice. For teams operating LLM features at scale, this alone justifies adoption.
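The underlying arithmetic is simple: tokens times a per-model rate. The prices in this sketch are illustrative placeholders — real rates vary by model, provider, and date:

```python
# Illustrative per-million-token prices (USD); NOT current provider pricing.
PRICE_PER_MILLION = {
    "model-large": {"input": 2.50, "output": 10.00},
    "model-small": {"input": 0.15, "output": 0.60},
}

def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    # Cost of a single generation, given token counts from the trace.
    p = PRICE_PER_MILLION[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

cost = call_cost("model-large", input_tokens=850, output_tokens=120)
print(round(cost, 6))  # 0.003325
```

Doing this per observation and aggregating by feature, user, or prompt version is what turns a raw trace store into a cost dashboard.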
Langfuse offers a generous free tier on both cloud and self-hosted deployments. The Hobby plan includes 50,000 observations per month, two users, and 30 days of data retention — enough for most early-stage projects. The Core plan at $59/month unlocks unlimited users and projects, 90 days retention, and 100,000 included observations. The Pro plan at $199/month adds unlimited data retention, SSO, team RBAC, and the full prompt management and evals feature set.
Enterprise pricing is custom and bundles self-hosted deployment support, SLAs, SOC 2 reports, and a Data Processing Agreement. For teams that need to run Langfuse entirely inside a customer VPC or air-gapped environment, the enterprise tier is the obvious path.
One important note on pricing mechanics: Langfuse bills on observations, not on traces. A single trace typically contains multiple observations (spans, generations, events), so teams should instrument a representative workload early to avoid pricing surprises. The self-hosted option bypasses this entirely — run as many observations as your infrastructure allows, with no usage-based charges.
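A back-of-envelope estimate makes the trace/observation distinction concrete. The workload shape below (one root span, three generations, two retrieval spans per trace) is an assumption for illustration; the 100,000 figure is the Core plan's included volume from the section above:

```python
# Assumed shape of one trace in a typical RAG request:
# 1 root span + 3 generations + 2 retrieval spans.
observations_per_trace = 1 + 3 + 2
traces_per_month = 20_000

observations = observations_per_trace * traces_per_month
print(observations)             # 120000
print(observations <= 100_000)  # False: over the Core plan's included volume
```

Twenty thousand traces sounds comfortably inside a 100,000-observation allowance until you multiply by the fan-out — which is exactly why measuring a representative workload early matters.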
This is where Langfuse has a clear structural advantage over its American competitors. Langfuse GmbH is a German limited company registered in Berlin, fully subject to GDPR and the EU AI Act. Langfuse Cloud offers an EU region hosted in Frankfurt on AWS — you choose EU or US when creating a project and data stays in that region. A standard Data Processing Agreement is available, and the company is SOC 2 Type II and ISO 27001 certified.
For teams with stricter data residency requirements, the self-hosted option is unmatched in the observability space. Because Langfuse is MIT licensed, you can deploy the entire stack inside your own VPC, Kubernetes cluster, or on-premises environment. No telemetry is sent back to Langfuse. No feature is gated behind the cloud version. This is the kind of guarantee that regulated industries — finance, healthcare, government — cannot get from LangSmith, Arize, or Datadog's LLM observability module.
LLM engineering teams building production GenAI products who have outgrown print-debugging and spreadsheet eval tracking. The combination of tracing, prompts, and evals in one platform eliminates the usual Frankenstein stack of Postgres tables, LangSmith projects, and ad hoc Jupyter notebooks.
EU-regulated organisations that need LLM observability with genuine data sovereignty. The Frankfurt cloud region and MIT-licensed self-hosted option satisfy compliance teams that would otherwise block adoption.
Open-source friendly engineering cultures that dislike vendor lock-in and want the option to audit, fork, or self-host their observability stack. Langfuse is one of the very few platforms in this category where the OSS version is genuinely production-grade, not a crippled community edition.
Teams already using LangChain or LlamaIndex, where the native integrations make onboarding almost frictionless.
The feature surface is broad and the learning curve is real. Teams new to LLM engineering can find the distinction between traces, sessions, generations, and observations confusing, and documentation assumes a baseline familiarity with these concepts. The self-hosted deployment is production-grade but operationally heavier than a single-container solution — you are running Postgres, ClickHouse, Redis, and an object store, which pushes Langfuse out of reach for solo developers who just want a quick local dashboard.
Integration with general-purpose observability stacks is improving but still behind what a Datadog customer might expect. If your LLM application is one service inside a larger microservices estate, you will likely run Langfuse alongside — not instead of — a traditional APM tool.
Langfuse has become the de facto open-source standard for LLM observability for good reason. The data model is right, the integrations are comprehensive, the evals workflow closes the quality loop, and the MIT licence plus EU cloud region make it the obvious choice for European teams that take data sovereignty seriously. The operational overhead of self-hosting and the breadth of the feature surface mean it is not the right first tool for every project — but for any team scaling an LLM-powered product toward production, it is the most serious option available.
Yes. Langfuse GmbH is a Berlin-registered German company fully subject to GDPR. Langfuse Cloud offers an EU region hosted in Frankfurt on AWS, a standard Data Processing Agreement is available, and the company holds SOC 2 Type II and ISO 27001 certifications. For maximum control, the self-hosted option keeps data entirely inside your own infrastructure.
Yes. Langfuse is fully open-source under the MIT licence with no feature gates between the OSS and cloud versions. You can deploy the stack via Docker Compose, Helm on Kubernetes, or Terraform. Self-hosting requires Postgres, ClickHouse, Redis, and an S3-compatible object store such as MinIO.
Langfuse is open-source, framework-agnostic, and offers both an EU-hosted managed cloud and a self-hosted option. LangSmith is a closed-source US-hosted product tightly coupled to the LangChain ecosystem. Teams requiring data sovereignty, working outside LangChain, or wanting the ability to audit and fork their observability stack tend to prefer Langfuse.
Yes. The Hobby plan is free forever on both cloud and self-hosted, including 50,000 observations per month, two users, and all core features. Self-hosted deployments have no usage limits beyond the infrastructure you run yourself.
Langfuse Cloud offers EU (Frankfurt, AWS) and US regions that you choose at project creation. Data never crosses regions. Self-hosted deployments store data entirely within your own infrastructure — Langfuse never receives any telemetry from self-hosted installations.