At first glance, language models seem simple: you feed in a prompt and receive text output. But once they're applied to real-world products, the picture gets more complex. This post follows a fictional narrative that reveals the challenges of building an LLM chatbot.
Imagine you are creating an educational chatbot. On the surface, it seems like a straightforward task of building an API wrapper on top of OpenAI’s API.
“One to two weeks should cover it,” estimates a developer on the project. Bolstered by this optimism, you develop a Socratic-style prompt within the OpenAI playground, which appears to function well.
Fast forward a week. You’ve spent the time developing a user interface and backend API, and the team now has a functional prototype. The progress seems impressive. A custom educational chatbot completed in just one week, ready to ship. But hold on – how confident are you about its performance and safety?
You're faced with a flood of questions: How do you handle chat history given the finite prompt size? Can you summarize previous messages without losing critical details? Can your chatbot decline tasks outside its abilities, or will it fabricate answers? And what about security: can users manipulate the bot into ignoring its instructions?
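One common answer to the chat-history question is to keep the most recent messages verbatim and fold older ones into a summary. Below is a minimal sketch, assuming a `summarize` callable (in practice this would likely be another LLM call) and a crude characters-per-token heuristic; both are illustrative, not a prescribed implementation:

```python
# Sketch: fit chat history into a fixed token budget by keeping recent
# messages verbatim and summarizing the overflow. `summarize` is a
# hypothetical callable, e.g. a cheap LLM call over the old messages.

def approx_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    return max(1, len(text) // 4)

def compact_history(messages, budget, summarize):
    """Return a message list that fits `budget`, summarizing overflow."""
    kept, used = [], 0
    # Walk backwards: the newest messages matter most.
    for msg in reversed(messages):
        cost = approx_tokens(msg["content"])
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    overflow = messages[: len(messages) - len(kept)]
    if overflow:
        summary = summarize(overflow)
        kept.append({"role": "system",
                     "content": f"Summary of earlier turns: {summary}"})
    return list(reversed(kept))
```

The trade-off is that summarization is lossy, which is exactly the "no critical details lost" worry from the story.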
Then you learn about GPT-4. It handles complex topics better, but it’s more expensive and slower than GPT-3.5. A developer suggests classifying incoming messages first, then using either GPT-3.5 or GPT-4 based on complexity. You create a prompt to classify inputs and test it in the OpenAI playground. It seems to work, but how certain can you be, based on only a handful of test examples?
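The routing idea might be sketched like this. The classifier below is a stand-in heuristic; in a real system it would be the cheap LLM call with the classification prompt from the story, and the model names, length threshold, and "complexity hints" are illustrative assumptions:

```python
# Sketch of complexity-based model routing. classify() stands in for an
# LLM classifier; the thresholds and hint words are arbitrary examples.

CHEAP_MODEL = "gpt-3.5-turbo"
STRONG_MODEL = "gpt-4"

COMPLEX_HINTS = ("prove", "derive", "step by step", "compare and contrast")

def classify(message: str) -> str:
    """Return 'complex' or 'simple' -- a stub for an LLM classifier."""
    lowered = message.lower()
    if len(message) > 500 or any(hint in lowered for hint in COMPLEX_HINTS):
        return "complex"
    return "simple"

def pick_model(message: str) -> str:
    """Route complex questions to the stronger, pricier model."""
    return STRONG_MODEL if classify(message) == "complex" else CHEAP_MODEL
```

Note that the router itself now needs evaluation: a misclassified message silently gets a worse (or needlessly expensive) answer, which is precisely the "how certain can you be" problem.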
Your product launches, and to your delight, users are engaging. But then you realize you lack a dashboard for tracking conversations, usage stats, and costs. OpenAI's stats are limited, and your database isn't exactly user-friendly for reviewing conversations.
Soon, you start receiving feedback from users. The chatbot sometimes generates nonsensical responses, forgets information, and performs slowly. How can you address these issues?
Additionally, unexpected usage spikes strain your budget. An investigation reveals a user sent hundreds of long messages within an hour. "We need rate limiting," you realize.
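A per-user sliding-window limiter is one simple way to implement the rate limiting the team needs. This is a sketch; the limit and window values are arbitrary, and a production system would likely back this with Redis rather than in-process memory:

```python
import time
from collections import defaultdict, deque

class RateLimiter:
    """Sliding-window limiter: at most `limit` messages per `window` seconds."""

    def __init__(self, limit=30, window=3600.0):
        self.limit = limit
        self.window = window
        self.events = defaultdict(deque)  # user_id -> timestamps of messages

    def allow(self, user_id, now=None):
        """Record one message attempt; return False if over the limit."""
        now = time.monotonic() if now is None else now
        q = self.events[user_id]
        # Drop timestamps that have fallen out of the window.
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) >= self.limit:
            return False
        q.append(now)
        return True
```

Each chat request would call `allow(user_id)` before hitting the OpenAI API and return an error (or a friendly message) when it comes back `False`.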
In response to faulty bot responses, you adjust the prompt. It fixes the reported case, but will the fix hold up for similar yet different inputs?
Enter Erika, a writer and psychology student introduced by your CEO. "She's a perfect fit," he says. "With her unique skill set, she can definitely help improve our prompts. And it's just English sentences, right?"
Upon hearing this, one of your developers casually suggests, "Well, she could just clone our Python repository, make her changes, and submit a pull request."
The room falls silent for a moment as the suggestion sinks in. Erika, who's not a programmer, is supposed to navigate a Python repository just to fine-tune English prompts? The developer's suggestion, while well-intentioned, highlights a glaring issue: why should editing simple English prompts require programming skills or code repository manipulation?
At this point, a new idea sparks in the developer's mind. "Hmm, the prompts could probably live outside of the codebase. They're sort of similar to translation strings," they think aloud. This could open up a whole new way of dealing with prompts, making them more accessible to non-developers like Erika.
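Treating prompts like translation strings could be as simple as keeping them in a JSON file that anyone can edit. Here is a minimal sketch; the file layout, key names, and `$subject` placeholder are illustrative assumptions, not any particular tool's format:

```python
import json
from pathlib import Path
from string import Template

# Sketch: prompts live in a plain JSON file, much like translation
# strings, so a non-developer can edit them without touching the code.
#
# prompts.json might look like:
#   {"socratic_tutor": "You are a Socratic tutor helping a student with $subject."}

def load_prompts(path):
    """Read a {name: template} mapping from a JSON file."""
    raw = json.loads(Path(path).read_text(encoding="utf-8"))
    return {name: Template(text) for name, text in raw.items()}

def render(prompts, name, **values):
    """Fill a named prompt's $placeholders with concrete values."""
    return prompts[name].safe_substitute(**values)
```

With something like this, Erika edits a text file instead of a Python repository, and the code only ever asks for a prompt by name.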
This fictional story ends here, but it's clear we're only at the beginning. To summarize: the journey of creating a chatbot involves a myriad of complex, interlinked issues, and there's a real need for tools that make life easier for developers and non-developers alike. We're currently working on an MVP of a solution we believe can help developers navigate these challenges: Langtail.
We believe Langtail could offer the developers from our earlier story the toolset they need to create chatbots more efficiently and confidently.
As we get closer to bringing Langtail to life, if what we're building resonates with you, we'd be thrilled if you subscribed to our MVP newsletter.