In this post, we will showcase how Upstash Ratelimit can be used with LangChain in JavaScript and in Python.
Motivation
Large Language Models (LLMs) are powerful tools but can be costly. To manage their cost, it's essential to limit the number of requests processed, ensuring the use of LLMs remains affordable. This is where Upstash Ratelimit comes into play.
Using Upstash Ratelimit in LangChain allows you to:
- Allow a limited number of chain invocations over a time period.
- Allow a limited number of tokens (prompt tokens only, or prompt plus completion tokens) over a time period.
Usage
Start by installing `@upstash/ratelimit` and `@langchain/community` (you can find more information about installing LangChain community here):
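For a TypeScript/JavaScript project, the installation could look like this (assuming npm; `@upstash/redis` is also needed to connect to the database):

```shell
npm install @upstash/ratelimit @upstash/redis @langchain/community @langchain/core
```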
If you are using Python, run:
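A sketch of the equivalent installation, assuming pip (the Python clients are published as `upstash-ratelimit` and `upstash-redis`):

```shell
pip install upstash-ratelimit upstash-redis langchain-community langchain-core
```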
Then, set the environment variables `UPSTASH_REDIS_REST_URL` and `UPSTASH_REDIS_REST_TOKEN`. You can get these environment variables by going to the Upstash console and creating a Redis instance.
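The values below are placeholders; copy the real ones from your database's page in the Upstash console:

```shell
export UPSTASH_REDIS_REST_URL="https://****.upstash.io"
export UPSTASH_REDIS_REST_TOKEN="****"
```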
Now we can demonstrate how you can add rate limiting to your chain in LangChain.
First, create a rate limit instance as shown below:
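A minimal sketch in TypeScript, assuming the environment variables above are set. `Ratelimit.fixedWindow(10, "60 s")` allows 10 requests per 60-second window:

```typescript
import { Ratelimit } from "@upstash/ratelimit";
import { Redis } from "@upstash/redis";

// Allow 10 requests per 60-second window for each user.
const ratelimit = new Ratelimit({
  // Reads UPSTASH_REDIS_REST_URL and UPSTASH_REDIS_REST_TOKEN:
  redis: Redis.fromEnv(),
  limiter: Ratelimit.fixedWindow(10, "60 s"),
});
```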
This `ratelimit` object will use the Redis database to store how many requests were made by different users. Learn more about Ratelimit from its documentation.
Next, create a mock chain in LangChain to showcase the callback:
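As a stand-in for a real LLM chain, a simple `RunnableLambda` that greets the user can serve as the mock chain (any runnable would do):

```typescript
import { RunnableLambda } from "@langchain/core/runnables";

// A mock chain that echoes a greeting instead of calling an LLM.
const chain = new RunnableLambda({
  func: (userInput: string) => `Hello ${userInput}!`,
});
```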
Finally, create a callback and invoke the chain:
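A sketch, assuming the `ratelimit` and `chain` objects created above. The handler takes an identifier for the user and the rate limit to apply, and an `UpstashRatelimitError` is thrown when the limit is exceeded:

```typescript
import {
  UpstashRatelimitError,
  UpstashRatelimitHandler,
} from "@langchain/community/callbacks/handlers/upstash_ratelimit";

// The identifier of the user making the request:
const user_id = "user_id";

// Create a fresh handler for this invocation and pass it as a callback:
const handler = new UpstashRatelimitHandler(user_id, {
  requestRatelimit: ratelimit,
});

try {
  const response = await chain.invoke("world", {
    callbacks: [handler],
  });
  console.log(response);
} catch (err) {
  if (err instanceof UpstashRatelimitError) {
    console.log("Handling ratelimit:", err);
  }
}
```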
Here is the same code written in Python:
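A sketch of the full flow in Python, assuming the `langchain-community`, `upstash-ratelimit`, and `upstash-redis` packages and the environment variables above:

```python
from langchain_community.callbacks import (
    UpstashRatelimitError,
    UpstashRatelimitHandler,
)
from langchain_core.runnables import RunnableLambda
from upstash_ratelimit import FixedWindow, Ratelimit
from upstash_redis import Redis

# Allow 10 requests per 60-second window for each user:
ratelimit = Ratelimit(
    redis=Redis.from_env(),
    limiter=FixedWindow(max_requests=10, window=60),
)

# A mock chain standing in for an LLM chain:
chain = RunnableLambda(lambda user_input: f"Hello {user_input}!")

# The identifier of the user making the request:
user_id = "user_id"

# Initialize a fresh handler for each invocation and pass it as a callback:
handler = UpstashRatelimitHandler(identifier=user_id, request_ratelimit=ratelimit)

try:
    response = chain.invoke("world", config={"callbacks": [handler]})
    print(response)
except UpstashRatelimitError:
    print("Handling ratelimit.")
```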
Note that we initialize the callback when we invoke the chain. This is because the handler keeps state that needs to be reset for each invocation.
In the configuration above, the handler will allow 10 requests per minute for each user. However, this is not the only way to configure the handler. To learn more about Ratelimit callbacks in LangChain, see the Upstash Ratelimit Callback documentation for TypeScript and Python.
You can also rate limit based on the number of tokens: the handler can be configured to limit by requests, by tokens, or both. To add token-based rate limiting, pass the `tokenRatelimit` parameter when initializing the handler:
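A sketch, assuming a second `Ratelimit` instance that counts tokens instead of requests:

```typescript
import { Ratelimit } from "@upstash/ratelimit";
import { Redis } from "@upstash/redis";
import { UpstashRatelimitHandler } from "@langchain/community/callbacks/handlers/upstash_ratelimit";

// A separate limit of 500 tokens per 60-second window for each user:
const tokenRatelimit = new Ratelimit({
  redis: Redis.fromEnv(),
  limiter: Ratelimit.fixedWindow(500, "60 s"),
});

const handler = new UpstashRatelimitHandler("user_id", {
  tokenRatelimit: tokenRatelimit,
});
```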
For token-based rate limiting, we expect the LLM in your chain to return an `LLMResult` in this format:

```json
{
  "tokenUsage": {
    "totalTokens": 123,
    "promptTokens": 456,
    "otherFields": "..."
  },
  "otherFields": "..."
}
```
Not all LLMs in LangChain use this format, however. If the keys are different, you can set the `llmOutputTokenUsageField`, `llmOutputTotalTokenField`, and `llmOutputPromptTokenField` fields of `UpstashRatelimitHandler`.
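For example, if an LLM reports usage under different key names, the handler could be configured like this (the key names below are hypothetical; match them to your LLM's output, and assume `tokenRatelimit` is a `Ratelimit` instance created as above):

```typescript
import { UpstashRatelimitHandler } from "@langchain/community/callbacks/handlers/upstash_ratelimit";

const handler = new UpstashRatelimitHandler("user_id", {
  tokenRatelimit: tokenRatelimit,
  // Hypothetical key names; replace with the ones your LLM returns:
  llmOutputTokenUsageField: "usage",
  llmOutputTotalTokenField: "total_tokens",
  llmOutputPromptTokenField: "prompt_tokens",
});
```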
Conclusion
The `UpstashRatelimitHandler` provides a simple way to add rate limiting to your LangChain applications. With just a few steps, you can control the number of requests and tokens used. For more detailed information and advanced configurations, explore the LangChain documentation.