Just Get Started with LLM Use Cases


If you’re hesitant to get started with Large Language Model (LLM) use cases, don’t be. Despite the hype and proliferation of AI options, today’s technology makes it pretty easy to build your own generative AI tool.


Here is the process that we recommend for getting started with LLMs:

  1. Identify potential use cases
  2. Determine the most viable and valuable option
  3. Build a Proof of Concept (POC)
  4. Review implications before implementation

In a recent example, we built a POC for a Q&A Answer Bot for a financial services company in a relatively short time. This is how we did it:

1. Identify Potential Use Cases

We knew that we wanted to use publicly accessible tools (an existing LLM) and readily available custom data. These were our constraints. While ChatGPT is useful for general questions, GPT models can also be grounded in company-specific data. With that in mind, we thought about ways to use a GPT model to make company knowledge more accessible.

Here are the potential use cases we identified for the financial services company:

  • Expert Q&A Answer Bot for Customer Service – a product expert tool
  • Debugging Q&A Answer Bot for IT DevOps – find similar bugs and fixes
  • Data Q&A Answer Bot for Leadership Team – quickly access financials, operations, resource utilization, etc.
  • Contract Q&A for Customer Success Team – answer detailed questions about particular contracts
  • Code Generation Q&A for Software Development Teams – improve the efficiency of software developers

2. Don’t Let Good Get in the Way of Passable

The use case that we selected was No. 1 – Expert Q&A Answer Bot for Customer Service.

  • First, we knew that information was widely dispersed across hundreds of documents and thousands of pages (multiple types of insurance, numerous policies for each insurance type, unique state-based clauses for each, etc.). It is time-consuming for customer service reps to look up answers about individual policies, especially when the documents are not formatted consistently. A Q&A Answer Bot would provide significant, measurable time savings and efficiency value.
  • Second, the user-friendly chatbot interface enables reps to ask questions in plain English instead of structured queries. This is an ideal use case for an LLM, which can understand unstructured prose questions, process millions of documents, and retrieve relevant information within milliseconds.

The other use cases mentioned above were evaluated but were not selected for various reasons:

  • The Debugging Q&A Answer Bot for IT DevOps turned out not to be a viable use case because of the lack of readily available data. Many of the IT problems an organization encounters every day are captured in emails or trouble tickets. The fixes, meanwhile, are captured elsewhere – in code repositories, operations manuals, readme.txt files, etc. In fact, Stack Overflow – the largest online community for developers to learn and share their programming knowledge – had challenges with this exact use case.
  • The use case for creating a Data Q&A Answer Bot for the Leadership Team is already addressed by readily available BI tools. If you want exact information from your database, stick with your BI tools.
  • Creating a Contract Q&A for the Customer Success Team is similar to our selected use case; however, the dataset is smaller and the data around contract terms is more sensitive. Thus, we preferred our selected use case.
  • Finally, the Code Generation Q&A for Software Development Teams is already being addressed by copilot tools.

3. Build a Proof of Concept (POC)

Model Used: For our POC, we used the GPT-4 model from OpenAI, accessed through Azure OpenAI Studio. The codebase was written primarily in Python, using the LangChain library, and we communicated with the model through the API endpoints provided by Azure OpenAI.

Training Data: We used a limited test set of documents detailing insurance policies, including all salient terms and conditions. These documents were in PDF format, and no format conversion was performed. Since feeding all the data from the documents (which could run to hundreds of thousands of tokens) to the model in a single request was not feasible, we chunked the docs into smaller, predefined sizes. We then fed these chunks into OpenAI’s Ada embeddings model to create vector embeddings, and indexed them with FAISS (Facebook AI Similarity Search) so that, for each question posed, we could find the chunks most similar to the query. The text of the best-matching chunks was then fed, along with the question, to the GPT-4 model, which generated a contextually coherent answer from the extracted information and replied to the user.
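The chunk → embed → retrieve flow described above can be sketched in miniature. The following is a simplified, self-contained illustration rather than our production code: a toy bag-of-words “embedding” stands in for OpenAI’s Ada model, and a brute-force cosine search stands in for a FAISS index; all names and sizes are illustrative.

```python
import math
from collections import Counter

def chunk_text(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into overlapping word-based chunks of a predefined size."""
    words = text.split()
    chunks, step = [], chunk_size - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks

def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words vector (stand-in for Ada embeddings)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Brute-force nearest-neighbor search (stand-in for a FAISS index)."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]
```

In the real pipeline, `embed` would call the Ada embeddings endpoint, `retrieve` would query a FAISS index, and the retrieved chunks would then be passed as context to GPT-4.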

Prompt Engineering: Extensive prompt engineering was not necessary, as the task at hand was pretty straightforward. We just needed the bot to use our docs as the knowledge base and continue to have a conversation chain with the user while retaining the context. The prompt was simple and just posed the user’s query along with the past few exchanges.
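A prompt of that shape can be assembled in a few lines of Python. The template wording and function name below are illustrative, not our exact prompt:

```python
def build_prompt(context: str, history: list[tuple[str, str]], question: str) -> str:
    """Assemble a prompt from retrieved context, recent exchanges, and the new question."""
    lines = [
        "Answer the question using only the policy excerpts below.",
        "If the answer is not in the excerpts, say you don't know.",
        "",
        f"Excerpts:\n{context}",
        "",
    ]
    # Retain only the last few exchanges to keep the context window small.
    for user_msg, bot_msg in history[-3:]:
        lines.append(f"User: {user_msg}")
        lines.append(f"Assistant: {bot_msg}")
    lines.append(f"User: {question}")
    lines.append("Assistant:")
    return "\n".join(lines)
```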

Chat Interface: With the Q&A pipeline in place, we created a chat interface using the React.js framework. This interface allowed the user to type in questions that would appear in a dialogue bubble, similar to the text-messaging interface found on most smartphones. The responses from the LLM then appeared in a separate dialogue bubble, so that the user could easily distinguish between their questions and the LLM’s responses.

Accessibility: The user interface was made accessible via a public IP address so that the POC could be accessed from anywhere via the public internet. This allowed for direct access to the POC for demonstrations, presentations, etc.

Accuracy/Tuning: One of the biggest questions surrounding LLMs is their accuracy. If the answers to the questions were wrong, the POC would not be successful. Because the model was answering from a fixed, factual document set, we opted for the lowest possible temperature setting of 0 to minimize randomness and reduce the risk of hallucinations. We tested performance incrementally and confirmed that a temperature of 0 provided the best results, since we didn’t need creative answers from the model. It’s important to note that data quality is critical to managing hallucinations. In addition, hallucination management should be part of any comprehensive AI governance and responsible AI strategy.
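In practice, temperature is simply a parameter on the chat-completion request. A minimal sketch of how such a request body might be assembled (the system message and token limit are placeholder values, not our actual configuration):

```python
def make_request_body(prompt: str, temperature: float = 0.0) -> dict:
    """Build a chat-completion request body; temperature 0 gives the least random sampling."""
    return {
        "messages": [
            {"role": "system", "content": "You are a policy Q&A assistant."},
            {"role": "user", "content": prompt},
        ],
        "temperature": temperature,  # 0 = most deterministic output
        "max_tokens": 512,           # illustrative cap on the response length
    }
```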

Additional Considerations: While we were able to rely on the accuracy of the model, there were still limitations. First, we wanted to make sure the Answer Bot would not be relied upon to make judgments, as that would require contextual knowledge related to an individual customer. Even if the information was provided to the Answer Bot through the chat interface, the system was not trained to reason out the appropriate recommendations based on life stage, financial circumstances, etc.

Second, we wanted to make sure that the model would not be used for the explicit purpose of providing financial advice. To help enforce this limitation, the model was instructed never to give advice; instead, it would always ask the user to consult an actual financial advisor. It also never made predictions about returns on investments based on existing data.

This was designed as an internal tool to support customer service representatives. While end customers might have been able to use the Answer Bot directly, they could easily ask the wrong questions and therefore make erroneous decisions based on answers to those questions. For all of the above reasons, the Answer Bot was not designed as a replacement for a financial agent, but rather as a tool to enhance the capabilities of an agent.

4. Review Implications Before Implementation

We deemed our POC a success. We asked a wide variety of typical questions with a variety of phrasings, and the system returned reasonable answers, most within 1 to 5 seconds. The chat interface was intuitive and attractive. And the system could carry on a chain of follow-up questions, which required the model to retain context to keep the conversation going.

Our biggest challenge was finding the optimal configurations (chunk size, temperature) to make sure that the model’s responses always adhered to the constraints we set. This required numerous iterations of trial and error to reach an optimal set of values.
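That kind of trial and error can be made more systematic with a small grid search over the configuration space. A sketch, assuming a hypothetical `evaluate` function that scores a fixed question set against the pipeline for a given configuration:

```python
from itertools import product

def grid_search(chunk_sizes, temperatures, evaluate):
    """Try every (chunk_size, temperature) pair and return the best-scoring one.

    `evaluate` is assumed to run a fixed set of test questions through the
    pipeline and return a numeric quality score (higher is better).
    """
    best_config, best_score = None, float("-inf")
    for chunk_size, temperature in product(chunk_sizes, temperatures):
        score = evaluate(chunk_size, temperature)
        if score > best_score:
            best_config, best_score = (chunk_size, temperature), score
    return best_config, best_score
```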

We would want to test additional functionality before launching the Answer Bot. First, the data we used was a combination of text, images, and tables. The model was able to extract text and tabular data, but not data from images. A future iteration could use newer libraries or multimodal models to extract data from images, too. (Image input only became widely available in LLMs after this POC was built.) Also in a future iteration, we could use feedback from actual users to improve the accuracy of responses.

Another consideration not tackled in our POC was integration into the broader customer service and operating systems.

Building vs. Buying an Answer Bot?

There are chatbots on the market that you could simply buy. Either way works, but here are some benefits of building your own implementation:

  • You have access to all the configurations and can create the optimal bot for your use cases
  • You can easily plug in other models (Azure OpenAI, Google Gemini, Anthropic’s Claude, etc.) at a later date if necessary
  • The data your users input into your bot stays within your environment and is not sent to an external location/third-party
  • The solution is inexpensive, as you have to pay only for the GPT subscription
  • A custom implementation provides you with the versatility to operate either as an independent instance or to integrate seamlessly into your other applications

In summary, the POC was accomplished relatively quickly by a single data engineer, with good results. So if you’re considering getting started with LLM use cases, our takeaway is don’t wait – just jump in and do it.

CoStrategix is a strategic technology consulting and implementation company that bridges the gap between technology and business teams to build value with digital and data solutions. If you are looking for guidance on data management strategies and how to mature your data analytics capabilities, we can help you leverage best practices to enhance the value of your data. Get in touch!
