We launched Langtail into public beta six months ago under this simple premise: Testing is what will make or break an AI app.
The AI space has evolved dramatically over the past six months: new frontier models, massive context windows, inexpensive models for certain use cases, and much more.
Want to know what hasn't changed? You still need to test your LLM prompts to make sure they behave the way you intend.
Want to know why that hasn't changed? At their core, LLMs are word predictors. They're inherently untrustworthy.
We are more convinced than ever that to build trustworthy AI apps, you need to test.
For many of the companies we've talked to, the LLM testing story goes something like this:
That's usually when the head of engineering starts frantically searching, finds Langtail, and calls us.
We've been working really hard to answer that question for the past 18 months. We think we have a compelling answer.
Today, we're excited to introduce Langtail 1.0 – a major step forward for building and testing AI apps.
Honestly, the best way to explain it is to show it:
Notice how fast the feedback loop is between editing your prompt and seeing if the test cases pass. Langtail keeps you in flow and focused on optimizing your prompt for your use case.
At the core of Langtail 1.0 is our new spreadsheet-like interface for testing LLM applications. Our goal was to make it feel natural to anyone used to working in Excel or Google Sheets to build up a set of test cases.
Today we're also launching test configurations. Let's say a new frontier model gets released and you're wondering if it'll benefit your app. With a few clicks you can create a new test config with that model and see a side-by-side comparison.
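Conceptually, a test configuration is a matrix: the same test cases run against each model config, with results lined up for comparison. Here's a minimal sketch of that idea in plain JavaScript (not Langtail's implementation; the test cases, model stubs, and exact-match check are all illustrative assumptions):

```javascript
// Conceptual sketch (not Langtail's implementation) of what test
// configurations do: run one set of test cases against two model
// configs and compare pass rates side by side.

const testCases = [
  { input: "2+2", expected: "4" },
  { input: "capital of France", expected: "Paris" },
];

// Stub "models" standing in for, say, your current model and a newly
// released frontier model. Real configs would call actual LLM APIs.
const models = {
  "model-a": (input) => (input === "2+2" ? "4" : "Paris"),
  "model-b": (input) => (input === "2+2" ? "4" : "paris"),
};

// Simple exact-match assertion; a real test runner would support
// richer checks (contains, regex, LLM-as-judge, and so on).
const results = Object.entries(models).map(([name, run]) => {
  const passed = testCases.filter((t) => run(t.input) === t.expected).length;
  return { name, passed, total: testCases.length };
});

console.log(results); // model-a passes 2/2, model-b passes 1/2
```

The value is in the side-by-side view: one run immediately tells you whether the new model is worth switching to for your prompt.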
Your prompts can now call tools and have their code run directly within Langtail. That makes prototyping and testing much easier, since you no longer have to mock responses. And because our sandboxed environment is powered by QuickJS, JavaScript code executes safely and quickly.
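To make that concrete, a tool is typically just a JSON schema the model can see plus a JavaScript function that runs when the model calls it. The sketch below is a hypothetical `get_weather` tool with a canned-data stub, not Langtail's actual API; the shape follows OpenAI-style function calling:

```javascript
// Hypothetical tool definition (OpenAI-style function calling schema).
// This is what the model sees when deciding whether to call the tool.
const getWeatherTool = {
  name: "get_weather",
  description: "Look up the current temperature for a city",
  parameters: {
    type: "object",
    properties: {
      city: { type: "string", description: "City name" },
    },
    required: ["city"],
  },
};

// The implementation that runs when the model calls the tool. It's
// plain JavaScript with no Node APIs, so it can execute inside a
// sandboxed interpreter like QuickJS. A real tool would hit a weather
// API; this stub returns canned data.
function getWeather({ city }) {
  const fakeData = { Prague: 21, London: 17 };
  return JSON.stringify({ city, tempC: fakeData[city] ?? null });
}

console.log(getWeather({ city: "Prague" })); // {"city":"Prague","tempC":21}
```

Because the stub returns deterministic data, the tool call itself can be exercised in tests without any external service.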
Langtail 1.0 introduces Assistants—stateful entities that automatically manage memory and conversation history, so you don't have to. This means less code to write and maintain, and more focus on building. Assistants can be used across models, in tests, deployed as APIs, or integrated with tools.
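To illustrate what Assistants take off your hands, here's the bookkeeping you'd otherwise write yourself: accumulating the conversation history so every model call sees the full context. This is a conceptual sketch, not Langtail's API; the `Conversation` class and the stubbed model are assumptions for the example:

```javascript
// Conceptual sketch (not Langtail's API) of the state management an
// Assistant handles for you: each turn appends to the history, and the
// whole history is passed to the model on every call.

class Conversation {
  constructor(systemPrompt) {
    this.messages = [{ role: "system", content: systemPrompt }];
  }

  // Append the user turn, call the model with the full history,
  // store the assistant's reply, and return it.
  async send(userText, callModel) {
    this.messages.push({ role: "user", content: userText });
    const reply = await callModel(this.messages);
    this.messages.push({ role: "assistant", content: reply });
    return reply;
  }
}

// Usage with a stubbed model so the sketch is self-contained:
const echoModel = async (messages) =>
  `You have sent ${messages.filter((m) => m.role === "user").length} message(s).`;

const convo = new Conversation("You are a helpful assistant.");
convo.send("Hello!", echoModel).then((r) => console.log(r)); // "You have sent 1 message(s)."
```

An Assistant owns this loop (and persists the history), which is why deploying one as an API or reusing it across models doesn't require you to ship this plumbing yourself.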
We're incredibly proud of Langtail 1.0, and we'd love for you to experience it for yourself. Whether you're an AI engineer, a product team pushing the boundaries of LLM apps, or an enterprise seeking more control and security, Langtail 1.0 has the tools you need.
We can't wait to see what you'll build.