Chat Model Response Caching With Upstash Redis in LangChain TS
LLMs can be a resource drain in production, consuming both time and money as user queries scale into the millions of tokens. Regenerating the same responses over and over adds latency and cost, often resulting in a subpar user experience. Caching is a potent remedy for these challenges. In this article, we'll take a deep dive into setting up caching for your chat models with Upstash Redis and LangChain in a production-ready way.
Prerequisites:
In this blog, we'll be using the LangChain TS ChatOpenAI class, LLMChain, and Upstash Redis.
First, create a project directory, install your dependencies (LangChain, the Upstash Redis client, and dotenv), and set up git.
For a starter .gitignore file, copy the contents from this link.
We've now initialized the project; next, we need to set up an Upstash account for our Redis server.
Sign up for an Upstash account by visiting the login page, selecting the "Sign Up" option, and following the prompts.
After signing up you'll be redirected to the Upstash console. Make sure you're on the Redis tab (click Redis in the navigation bar in the top left). Click the "Create Database" button to create a new database.
For this example, we'll name it test-cache-db, select the N. California (us-west-1) region, and check the TLS (SSL) Enabled checkbox.
Once your database has been created, we can move on to setting up our TypeScript project.
Let's create a TypeScript file inside our ./src directory to initialize and export our Upstash Redis client.
Open the project in your preferred IDE to start coding.
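A minimal sketch of that file, assuming LangChain's UpstashRedisCache class and the standard UPSTASH_REDIS_REST_URL / UPSTASH_REDIS_REST_TOKEN variable names (the exact import path depends on your LangChain version), might look like this:

```typescript
// src/upstashRedisCache.ts (filename and environment variable names are assumptions)
import * as dotenv from "dotenv";
import { UpstashRedisCache } from "langchain/cache/upstash_redis";

dotenv.config();

const url = process.env.UPSTASH_REDIS_REST_URL;
const token = process.env.UPSTASH_REDIS_REST_TOKEN;

// Fail fast if the required environment variables are missing
if (!url || !token) {
  throw new Error(
    "Missing UPSTASH_REDIS_REST_URL or UPSTASH_REDIS_REST_TOKEN in your .env file"
  );
}

// A single cache instance backed by the Upstash Redis REST API,
// shared by any chat model that imports it
export const upstashRedisCache = new UpstashRedisCache({
  config: { url, token },
});
```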
In this file we're initializing and exporting our Upstash Redis client. We're also loading our environment variables from a .env file and ensuring the required variables are present.
To get your REST URL and REST TOKEN, go back to the tab where you created your database, scroll to the section titled REST API, and click to copy the URL and token.
You will also need an OpenAI API key. To get one, create an OpenAI account, navigate to the API section, and generate a new key.
Once you've gathered all your keys, create a .env file in the root of your project and paste in the URL, the token, and your OpenAI API key.
Next, we need to create the LLM chat function, where we'll pass in our Upstash Redis client and start making OpenAI requests.
Import the upstashRedisCache client we created in the previous step, and the classes you see below from LangChain.
In this file we'll create a function called llmChat that takes a prompt as an argument and returns a response from the OpenAI API. Before sending our requests to OpenAI, we'll also create a chat prompt using the ChatPromptTemplate from LangChain.
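The full file ends up looking roughly like the sketch below. The import paths assume the classic langchain package layout (newer releases move these classes into @langchain/* packages), and the prompt wording is illustrative:

```typescript
// src/llmChat.ts (filename and prompt text are assumptions)
import { upstashRedisCache } from "./upstashRedisCache";
import { ChatOpenAI } from "langchain/chat_models/openai";
import { LLMChain } from "langchain/chains";
import { ChatPromptTemplate } from "langchain/prompts";
import * as dotenv from "dotenv";

dotenv.config();

export const llmChat = async (prompt: string): Promise<string> => {
  // Fail fast if the OpenAI API key is missing
  if (!process.env.OPENAI_API_KEY) {
    throw new Error("OPENAI_API_KEY is not set in your .env file");
  }

  // System prompt: the assistant's persona (a pet store owner)
  const systemTemplate =
    "You are a friendly pet store owner who gives helpful advice to customers.";
  // Human prompt: forwards whatever the caller passed in
  const humanTemplate = "{prompt}";

  // Combine the two templates into a single chat prompt
  const chatPrompt = ChatPromptTemplate.fromMessages([
    ["system", systemTemplate],
    ["human", humanTemplate],
  ]);

  // The cache option wires the model up to Upstash Redis, so identical
  // requests are answered from Redis instead of hitting OpenAI again
  const chat = new ChatOpenAI({
    cache: upstashRedisCache,
    openAIApiKey: process.env.OPENAI_API_KEY,
    temperature: 1,
  });

  // Chain the chat model and the prompt together
  const chain = new LLMChain({ llm: chat, prompt: chatPrompt });

  // predict() fills in the {prompt} variable and returns the reply as a string
  return chain.predict({ prompt });
};
```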
Whew, that was a lot. Let's go over line by line what we just did.
At the top of the file we imported our upstashRedisCache client, the ChatOpenAI class, the LLMChain class, and the ChatPromptTemplate class from LangChain. We also imported the dotenv package to load our environment variables from our .env file.
Next, we called dotenv.config() to load our environment variables from our .env file.
We then created our llmChat function, which takes a prompt as an argument and returns a response from the OpenAI API.
Inside llmChat, we first check whether our OpenAI API key is set, and throw an error if it isn't.
After that we define two variables, one for the system prompt and one for the user prompt. The system prompt sets the context of the chat; in our case we want the assistant to act like a pet store owner. The human template simply passes through the prompt given as an argument to our llmChat function.
Next, we create the prompt message by passing our system and human templates into the ChatPromptTemplate.fromMessages method.
After that we create a new ChatOpenAI instance and pass in our upstashRedisCache client, our OpenAI API key, and a temperature of 1. The temperature parameter controls how creative the responses are: the higher the temperature, the more varied the output. The cache parameter defines the cache we want to use for our chat model; in this case we're using Upstash Redis.
Next, we initialize an LLMChain by passing in our ChatOpenAI instance and our chat prompt.
Finally, we call chain.predict and pass in our prompt as an argument. This returns a string containing the generated response from OpenAI.
Now that we've created our llmChat function, let's create an index file to pull everything together and test it out.
To test that our llmChat function is caching results, we'll make two calls with identical prompts. If they are cached correctly, the second call's result will match the first.
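A sketch of that index file, assuming llmChat is exported from ./llmChat and using an arbitrary prompt, might look like this:

```typescript
// src/index.ts (filename, import path, and prompt text are assumptions)
import { llmChat } from "./llmChat";

const main = async () => {
  const prompt = "What should I feed my new puppy?";

  // First call: goes to OpenAI and stores the response in Upstash Redis
  console.time("first call");
  console.log(await llmChat(prompt));
  console.timeEnd("first call");

  // Second call: identical prompt, so the response is read back from the cache
  console.time("second call");
  console.log(await llmChat(prompt));
  console.timeEnd("second call");
};

main().catch(console.error);
```

The timers are just there to make the effect visible: the second call should return noticeably faster because it is served from Redis rather than OpenAI.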
Run this file in your terminal with your TypeScript runner of choice. You should then see the same response from OpenAI printed twice, confirming that the second call was served from the cache.
All of the code outlined in this blog can be found in this GitHub repository.