Production traces pile up fast. A high-traffic app can produce millions of observations in a week, and when something looks wrong you need to pull the one trace that says “refund failed” out of hundreds of gigabytes in a fraction of a second. A scroll-and-hope UI does not cut it at that scale.

Day 3 rolls out full-text search to Langfuse Cloud. In our benchmarks, large input/output searches that took 18 seconds and scanned 494 GB now return in under half a second and read less than a gigabyte. Metadata-heavy queries dropped from 1.6 seconds to 0.2 seconds. The UI gets faster for humans hunting a bug, and the new matches operator on Observations API v2 gives agents and scripts the same token-based search programmatically.
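To make "token-based search" concrete, here is a toy sketch of the general idea behind an inverted index: text is split into tokens once, and a query only touches the entries for its own tokens instead of scanning every document. This is an illustration only. It is not how Langfuse or ClickHouse implement full-text search, and the names `tokenize`, `build_index`, `search`, and the sample observations are all hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def build_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every token in the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    # Only the posting sets for the query tokens are read, not the documents.
    postings = [index.get(t, set()) for t in tokens]
    return set.intersection(*postings)


# Hypothetical observations, made up for this example.
observations = {
    "obs-1": "Refund failed: card declined by issuer",
    "obs-2": "Refund succeeded for order 1182",
    "obs-3": "Payment failed, retrying",
}
index = build_index(observations)
print(sorted(search(index, "refund failed")))  # ['obs-1']
```

The lookup cost depends on how many documents contain the query tokens, not on the total volume of stored text, which is the intuition behind reading far less data per search.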

Full-text search in Langfuse is built on top of ClickHouse’s new full-text search release. We worked closely with the ClickHouse team on the underlying engine, so features like this land in Langfuse days after they ship in ClickHouse core.

Read the changelog ↗ · Observations API v2 docs ↗ · ClickHouse FTS GA post ↗

Day 02 · Tuesday, May 26, 2026

Langfuse agent skill.

Building an agent is easy. Getting it to production is hard. You set up tracing and evaluators, but how do you know what your agent's real failure modes are? How do you know your LLM-as-a-judge is actually calibrated against your human annotators?

The Langfuse Skill lets you hand your AI coding agent a playbook for working with Langfuse. It teaches Claude Code, Cursor, Codex, etc. how to instrument an app, query traces, manage prompts, and set up evaluators. Drop it into your editor, then describe the job in plain language and the agent runs with it.

In the video below, Marlies uses the LLM-as-a-Judge calibration skill with Codex to produce a full analysis with accuracy, F1, precision, recall, and cost, all graphed directly in the new Langfuse Experiments view.
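For readers who want to see what such a calibration analysis measures, the following is a minimal sketch of the standard metrics, treating the human annotations as ground truth and the judge's verdicts as predictions. It is not the skill's own code. The function name `calibration_metrics` and the sample labels are hypothetical, and cost is left out because it depends on your model pricing.

```python
def calibration_metrics(human, judge):
    """Compare binary judge verdicts against human labels (the ground truth)."""
    if len(human) != len(judge):
        raise ValueError("human and judge must have the same length")

    # Confusion-matrix counts, with True as the positive class.
    tp = sum(1 for h, j in zip(human, judge) if h and j)
    fp = sum(1 for h, j in zip(human, judge) if not h and j)
    fn = sum(1 for h, j in zip(human, judge) if h and not j)
    tn = sum(1 for h, j in zip(human, judge) if not h and not j)

    total = len(human)
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical labels: True means "the response passed".
human_labels = [True, True, False, False, True, False]
judge_labels = [True, False, False, True, True, False]
metrics = calibration_metrics(human_labels, judge_labels)
print(metrics)
```

Low precision suggests the judge passes responses that humans reject, while low recall suggests it rejects responses that humans accept, so the two numbers point to different fixes in the judge prompt.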

Read the changelog ↗ · Get started in docs ↗ · View skills on GitHub ↗

Day 01 · Monday, May 25, 2026

Experiments in CI/CD.

Run your Langfuse experiments inside GitHub Actions. The new action tests every pull request against a Langfuse dataset, fails the workflow when scores drop below the threshold you set, and posts the result back to the PR as a comment. Every run is tracked in Langfuse so you can dig into regressions later.
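The gating idea can be sketched in a few lines: average the experiment scores, compare against a threshold, and turn the result into an exit code that a CI runner interprets as pass or fail. This is a generic illustration under assumptions, not the action's implementation. The names `gate` and `main`, the sample scores, and the choice of a simple mean are all hypothetical.

```python
from statistics import mean


def gate(scores, threshold):
    """Return (passed, average). An empty score list is treated as a failure."""
    if not scores:
        return False, 0.0
    avg = mean(scores)
    return avg >= threshold, avg


def main(scores, threshold):
    """Print a short summary and return a process exit code (0 = pass, 1 = fail)."""
    passed, avg = gate(scores, threshold)
    status = "PASS" if passed else "FAIL"
    print(f"{status}: average score {avg:.3f} against threshold {threshold:.3f}")
    return 0 if passed else 1


# Hypothetical per-item scores from one experiment run.
exit_code = main([0.92, 0.81, 0.77], threshold=0.8)
# In a real CI script you would hand this to sys.exit(exit_code),
# since a non-zero exit code is what marks a workflow step as failed.
```

Treating an empty score list as a failure is a deliberate choice in this sketch: a run that produced no scores should not silently pass the check.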

Read the changelog ↗ · Get started in docs ↗ · View the action on GitHub ↗

Launch Week #5

One drop a day, every day. Monday through Friday.

Full-Text Search.

Langfuse agent skill.

Experiments in CI/CD.
