Open-source LLM observability, tracing, and prompt management platform
Langfuse is a Berlin-based open-source LLM engineering platform offering tracing, evaluations, prompt management, and analytics for teams building production AI applications. Backed by Y Combinator and Lightspeed, it is used by thousands of teams to debug, monitor, and improve LLM-powered products.
Headquarters: Berlin, Germany
Founded: 2023
Employees: 11-50
Open Source: Yes
EU Data Hosting: Yes
Pricing: Free / $59/mo / $199/mo / Contact Sales (billed monthly or annually)
Ask any engineering team that has shipped an LLM-powered feature to production what their biggest pain point is, and you will hear the same answer: observability. Traditional APM tools were built for request-response web services with predictable schemas and deterministic outputs. Large language models break every assumption they make. Responses are non-deterministic. Costs vary wildly per request. Quality is subjective and drifts silently as prompts and models change. A failing trace is not a stack trace — it is a sprawling tree of model calls, retrieved documents, tool invocations, and token usage that refuses to fit into a Datadog dashboard.
Langfuse, a Berlin-based open-source platform founded in 2023, was built from the ground up to solve this specific problem. Rather than retrofitting existing observability tools, its creators designed a data model, UI, and evaluation framework that treats LLM applications as first-class citizens. The result is the closest thing the GenAI engineering community has to a standard for tracing, evaluating, and improving production AI systems — used by teams at Samsara, Khan Academy, Twilio, and thousands of smaller startups.
At its core, Langfuse captures a structured trace of every LLM-powered interaction in your application. A trace is a hierarchical record of everything that happened: the user input, the prompts sent to the model, the model outputs, the retrieved documents in a RAG pipeline, the tool calls, the latency and token usage at each step, and the final response returned to the user. From this foundation, Langfuse builds four tightly integrated capabilities.
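The hierarchy described above can be sketched as a plain data model. This is an illustration only, not the Langfuse schema — the field names and the `total_tokens` helper are assumptions for the example:

```python
from dataclasses import dataclass, field

@dataclass
class Observation:
    # One step inside a trace: a model call, a retrieval, or a tool invocation.
    name: str
    kind: str                  # e.g. "generation", "span", or "event"
    input_tokens: int = 0
    output_tokens: int = 0
    latency_ms: float = 0.0
    children: list["Observation"] = field(default_factory=list)

@dataclass
class Trace:
    # The root record for one user interaction.
    user_input: str
    final_output: str
    observations: list[Observation] = field(default_factory=list)

    def total_tokens(self) -> int:
        # Walk the tree and sum token usage across every nested step.
        def walk(obs: Observation) -> int:
            return (obs.input_tokens + obs.output_tokens
                    + sum(walk(c) for c in obs.children))
        return sum(walk(o) for o in self.observations)

trace = Trace(
    user_input="What is our refund policy?",
    final_output="Refunds are accepted within 30 days.",
    observations=[
        Observation("retrieve-docs", "span", children=[
            Observation("embed-query", "generation", input_tokens=12),
        ]),
        Observation("answer", "generation", input_tokens=850, output_tokens=42),
    ],
)
print(trace.total_tokens())  # 904
```

The point of the tree shape is that aggregate questions — total tokens, slowest step, deepest retrieval — become simple traversals rather than joins across log lines.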
The tracing layer is the foundation. SDKs for Python, TypeScript, and raw REST instrument your application with one or two lines of code. Integrations with LangChain, LlamaIndex, Haystack, Vercel AI SDK, LiteLLM, and the OpenAI and Anthropic SDKs mean most teams do not write instrumentation code at all — they add a decorator or enable an environment variable and traces start flowing. Sessions group multi-turn conversations, users are tagged for cohort analysis, and dashboards show cost, latency, and error rates broken down by model, feature, user, or any custom dimension you choose.
Langfuse treats prompts as versioned artefacts, not strings hardcoded in application logic. You write prompts in the Langfuse UI or push them via the API, label versions as production or staging, and fetch them at runtime using the SDK. A deployment gone wrong is a one-click rollback. Prompt performance is tied directly to traces, so you can answer questions like "did average latency go up after we shipped prompt v14?" without building your own analytics pipeline.
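The version-plus-label model can be illustrated with a toy registry; the class and method names below are invented for the sketch and do not mirror the Langfuse API:

```python
class PromptRegistry:
    # Toy stand-in for versioned prompt management with deployment labels.
    def __init__(self):
        self._versions: dict[int, str] = {}
        self._labels: dict[str, int] = {}

    def push(self, text: str) -> int:
        # Each push creates a new immutable version.
        version = max(self._versions, default=0) + 1
        self._versions[version] = text
        return version

    def label(self, name: str, version: int) -> None:
        # Point a label like "production" or "staging" at a version.
        self._labels[name] = version

    def get(self, name: str = "production") -> str:
        # What the application fetches at runtime.
        return self._versions[self._labels[name]]

reg = PromptRegistry()
v13 = reg.push("Answer concisely: {question}")
v14 = reg.push("Answer concisely and cite sources: {question}")
reg.label("production", v14)
# A rollback is just relabelling an earlier version — no redeploy:
reg.label("production", v13)
print(reg.get("production"))  # Answer concisely: {question}
```

Because the application only ever asks for "the production prompt", shipping and rolling back become label moves rather than code changes.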
Datasets are curated collections of inputs — golden questions, edge cases, regression samples — that you run your application against. Each run produces traces you can score automatically using LLM-as-judge, custom Python evaluators, or human annotation queues. This is the closed-loop improvement workflow that serious LLM teams eventually need and that generic observability tools cannot provide.
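The closed loop reduces to a small pattern: run every dataset item through the application, score the output, aggregate. A minimal sketch, with a stub application and an exact-match evaluator standing in for LLM-as-judge or human review:

```python
# Curated inputs with expected answers (golden questions, regression samples).
dataset = [
    {"input": "2 + 2", "expected": "4"},
    {"input": "capital of France", "expected": "Paris"},
]

def app(question: str) -> str:
    # Stand-in for the LLM application under test.
    return {"2 + 2": "4", "capital of France": "Paris"}[question]

def exact_match(output: str, expected: str) -> float:
    # Simplest possible evaluator; an LLM-as-judge or a human
    # annotation queue slots into the same position.
    return 1.0 if output.strip() == expected else 0.0

scores = [exact_match(app(item["input"]), item["expected"]) for item in dataset]
print(sum(scores) / len(scores))  # 1.0
```

Run the same dataset before and after a prompt or model change and the score delta tells you whether the change was a regression — the spreadsheet-free version of what most teams do by hand.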
Every trace tracks input tokens, output tokens, and cost per model. Dashboards surface which features, users, or prompts are consuming the most budget, and alerting rules let you catch runaway cost regressions before they hit the monthly invoice. For teams operating LLM features at scale, this alone justifies adoption.
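The underlying arithmetic is simple: tokens times a per-model rate. The prices in this sketch are illustrative placeholders — real rates vary by model, provider, and date:

```python
# Illustrative per-million-token prices (USD); NOT current provider pricing.
PRICE_PER_MILLION = {
    "model-large": {"input": 2.50, "output": 10.00},
    "model-small": {"input": 0.15, "output": 0.60},
}

def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    # Cost of a single generation, given token counts from the trace.
    p = PRICE_PER_MILLION[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

cost = call_cost("model-large", input_tokens=850, output_tokens=120)
print(round(cost, 6))  # 0.003325
```

Doing this per observation and aggregating by feature, user, or prompt version is what turns a raw trace store into a cost dashboard.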
Langfuse offers a generous free tier on both cloud and self-hosted deployments. The Hobby plan includes 50,000 observations per month, two users, and 30 days of data retention — enough for most early-stage projects. The Core plan at $59/month unlocks unlimited users and projects, 90 days retention, and 100,000 included observations. The Pro plan at $199/month adds unlimited data retention, SSO, team RBAC, and the full prompt management and evals feature set.
Enterprise pricing is custom and bundles self-hosted deployment support, SLAs, SOC 2 reports, and a Data Processing Agreement. For teams that need to run Langfuse entirely inside a customer VPC or air-gapped environment, the enterprise tier is the obvious path.
One important note on pricing mechanics: Langfuse bills on observations, not on traces. A single trace typically contains multiple observations (spans, generations, events), so teams should instrument a representative workload early to avoid pricing surprises. The self-hosted option bypasses this entirely — run as many observations as your infrastructure allows, with no usage-based charges.
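A back-of-envelope estimate makes the trace/observation distinction concrete. The workload shape below (one root span, three generations, two retrieval spans per trace) is an assumption for illustration; the 100,000 figure is the Core plan's included volume from the section above:

```python
# Assumed shape of one trace in a typical RAG request:
# 1 root span + 3 generations + 2 retrieval spans.
observations_per_trace = 1 + 3 + 2
traces_per_month = 20_000

observations = observations_per_trace * traces_per_month
print(observations)             # 120000
print(observations <= 100_000)  # False: over the Core plan's included volume
```

Twenty thousand traces sounds comfortably inside a 100,000-observation allowance until you multiply by the fan-out — which is exactly why measuring a representative workload early matters.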
This is where Langfuse has a clear structural advantage over its American competitors. Langfuse GmbH is a German limited company registered in Berlin, fully subject to GDPR and the EU AI Act. Langfuse Cloud offers an EU region hosted in Frankfurt on AWS — you choose EU or US when creating a project and data stays in that region. A standard Data Processing Agreement is available, and the company is SOC 2 Type II and ISO 27001 certified.
For teams with stricter data residency requirements, the self-hosted option is unmatched in the observability space. Because Langfuse is MIT licensed, you can deploy the entire stack inside your own VPC, Kubernetes cluster, or on-premises environment. No telemetry is sent back to Langfuse. No feature is gated behind the cloud version. This is the kind of guarantee that regulated industries — finance, healthcare, government — cannot get from LangSmith, Arize, or Datadog's LLM observability module.
LLM engineering teams building production GenAI products who have outgrown print-debugging and spreadsheet eval tracking. The combination of tracing, prompts, and evals in one platform eliminates the usual Frankenstein stack of Postgres tables, LangSmith projects, and ad hoc Jupyter notebooks.
EU-regulated organisations that need LLM observability with genuine data sovereignty. The Frankfurt cloud region and MIT-licensed self-hosted option satisfy compliance teams that would otherwise block adoption.
Open-source friendly engineering cultures that dislike vendor lock-in and want the option to audit, fork, or self-host their observability stack. Langfuse is one of the very few platforms in this category where the OSS version is genuinely production-grade, not a crippled community edition.
Teams already using LangChain or LlamaIndex, where the native integrations make onboarding almost frictionless.
The feature surface is broad and the learning curve is real. Teams new to LLM engineering can find the distinction between traces, sessions, generations, and observations confusing, and documentation assumes a baseline familiarity with these concepts. The self-hosted deployment is production-grade but operationally heavier than a single-container solution — you are running Postgres, ClickHouse, Redis, and an object store, which pushes Langfuse out of reach for solo developers who just want a quick local dashboard.
Integration with general-purpose observability stacks is improving but still behind what a Datadog customer might expect. If your LLM application is one service inside a larger microservices estate, you will likely run Langfuse alongside — not instead of — a traditional APM tool.
Langfuse has become the de facto open-source standard for LLM observability for good reason. The data model is right, the integrations are comprehensive, the evals workflow closes the quality loop, and the MIT licence plus EU cloud region make it the obvious choice for European teams that take data sovereignty seriously. The operational overhead of self-hosting and the breadth of the feature surface mean it is not the right first tool for every project — but for any team scaling an LLM-powered product toward production, it is the most serious option available.
Yes. Langfuse GmbH is a Berlin-registered German company fully subject to GDPR. Langfuse Cloud offers an EU region hosted in Frankfurt on AWS, a standard Data Processing Agreement is available, and the company holds SOC 2 Type II and ISO 27001 certifications. For maximum control, the self-hosted option keeps data entirely inside your own infrastructure.
Yes. Langfuse is fully open-source under the MIT licence with no feature gates between the OSS and cloud versions. You can deploy the stack via Docker Compose, Helm on Kubernetes, or Terraform. Self-hosting requires Postgres, ClickHouse, Redis, and an S3-compatible object store such as MinIO.
Langfuse is open-source, framework-agnostic, and offers both an EU-hosted managed cloud and a self-hosted option. LangSmith is a closed-source US-hosted product tightly coupled to the LangChain ecosystem. Teams requiring data sovereignty, working outside LangChain, or wanting the ability to audit and fork their observability stack tend to prefer Langfuse.
Yes. The Hobby plan is free forever on both cloud and self-hosted, including 50,000 observations per month, two users, and all core features. Self-hosted deployments have no usage limits beyond the infrastructure you run yourself.
Langfuse Cloud offers EU (Frankfurt, AWS) and US regions that you choose at project creation. Data never crosses regions. Self-hosted deployments store data entirely within your own infrastructure — Langfuse never receives any telemetry from self-hosted installations.