One of the best things about serverless is its ability to scale even under huge traffic spikes. Unfortunately, scaling is not free, either financially or technically, which is why developers need to control their applications' scalability. Here are the main reasons you may need a rate limiting mechanism in your serverless application:
1- Protect your resources: If you're providing a public API, traffic spikes can degrade the quality of the service and may lead to a service outage for all your users. You need to protect your system against cascading failures as well as self-inflicted DDoS incidents. A bug in your application can trigger such problems: an internal process that retries an endpoint indefinitely on failure can easily exhaust your resources.
2- Manage user quotas: You may want to define quotas for your users to ensure fair use of your services. You may also need quotas if you offer your services in different pricing tiers.
3- Control the cost: There are many real-life examples of how an uncontrolled system can run up a huge bill. Serverless applications are especially exposed to this risk due to their highly scalable nature. Rate limiting will help you keep these costs under control.
Solutions
Rate limiting can be applied at several different layers. I will list the three main approaches with a brief pros/cons analysis.
1- Concurrency Level of Function:
Cloud providers create multiple containers to scale your serverless function executions. You can set a limit on the maximum number of concurrent containers/instances. Although this helps you cap concurrency, it does not control how many times your function will be called in a second.
Here is how you can limit concurrency for AWS Lambda and Google Cloud Functions; a programmatic sketch follows the pros/cons lists below.
Pros:
- No overhead
- Easy to configure
Cons:
- Not a complete solution: it only controls concurrency; the number of executions per second is not limited.
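As an illustration, here is a minimal sketch of setting a concurrency limit programmatically with the AWS SDK for JavaScript v3; the function name and the limit of 10 are hypothetical placeholders:

```ts
import {
  LambdaClient,
  PutFunctionConcurrencyCommand,
} from "@aws-sdk/client-lambda";

const lambda = new LambdaClient({ region: "us-east-1" });

// Cap "my-function" (hypothetical name) at 10 concurrent containers.
// Invocations beyond this limit are throttled by AWS, but the number of
// sequential executions per second is still unbounded.
await lambda.send(
  new PutFunctionConcurrencyCommand({
    FunctionName: "my-function",
    ReservedConcurrentExecutions: 10,
  })
);
```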
2- Rate limiting on API Gateway
If you are accessing your functions through API Gateway, you can apply your rate limiting policy at that layer. Both AWS and GCP have guides on how to configure their solutions; a sketch of creating a throttled usage plan on AWS follows the pros/cons lists below.
Pros:
- No overhead
- Easy to configure
Cons:
- Only applies if you are using API Gateway.
- It does not support more sophisticated cases like quotas per user or per IP.
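To make this concrete, here is a minimal sketch of creating a throttled usage plan on AWS API Gateway with the AWS SDK for JavaScript v3; the API id, stage name, and limits are hypothetical placeholders:

```ts
import {
  APIGatewayClient,
  CreateUsagePlanCommand,
} from "@aws-sdk/client-api-gateway";

const apigw = new APIGatewayClient({ region: "us-east-1" });

// Throttle clients of the "prod" stage to 100 requests/second
// (bursts up to 200) and at most 10,000 requests per day.
await apigw.send(
  new CreateUsagePlanCommand({
    name: "basic-plan",
    apiStages: [{ apiId: "abc123", stage: "prod" }], // hypothetical API id/stage
    throttle: { rateLimit: 100, burstLimit: 200 },
    quota: { limit: 10000, period: "DAY" },
  })
);
```

Note that usage plan limits are enforced per API key attached to the plan, so clients must send a key for the plan to apply.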
3- Rate limiting with Redis
This is the most complete and powerful solution, and there are many Redis-based rate limiting libraries available. In his blog post, Jeremy Daly rejects ElastiCache as a possible option because it adds a "non-serverless" component and another thing to manage. Here Upstash becomes a very good alternative with its serverless model and per-request pricing.
Pros:
- Powerful, you can implement a customized logic that fits your user model.
- Scalable solution. See how GitHub uses Redis for rate limiting.
- Rich ecosystem with many open-source libraries: redis_rate, redis-cell, node-ratelimiter.
Cons:
- Overhead of using Redis.
Code: Rate Limiting with Redis
Thanks to rate limiting libraries, it is very easy to add rate limiting to your application code. The example below limits executions of an AWS Lambda function per IP per second.
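This is a minimal sketch using node-ratelimiter with ioredis; it assumes the function is invoked through API Gateway (so the caller's IP is available on the event), and the REDIS_URL environment variable and the limit of 5 requests per second are illustrative values:

```ts
import Redis from "ioredis";
import Limiter from "ratelimiter";
import type { APIGatewayProxyEvent, APIGatewayProxyResult } from "aws-lambda";

// Create the client outside the handler so warm invocations reuse it.
// REDIS_URL is assumed to point at your Redis instance, e.g. an Upstash endpoint.
const db = new Redis(process.env.REDIS_URL!);

export const handler = async (
  event: APIGatewayProxyEvent
): Promise<APIGatewayProxyResult> => {
  // Key the limiter on the caller's IP: at most 5 executions per IP per second.
  const limiter = new Limiter({
    id: event.requestContext.identity.sourceIp,
    db,
    max: 5,
    duration: 1000, // window length in milliseconds
  });

  // node-ratelimiter is callback-based; wrap it in a promise.
  const limit = await new Promise<{ remaining: number }>((resolve, reject) =>
    limiter.get((err, res) => (err ? reject(err) : resolve(res)))
  );

  if (!limit.remaining) {
    return { statusCode: 429, body: "Too many requests" };
  }
  return { statusCode: 200, body: "Success" };
};
```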
Visit the tutorial for the full example.
Reading List
https://cloud.google.com/architecture/rate-limiting-strategies-techniques
https://www.jeremydaly.com/throttling-third-party-api-calls-with-aws-lambda/
https://medium.com/google-cloud/rate-limit-your-api-usage-with-cloud-endpoints-quotas-1270da55d2bf
https://github.blog/2021-04-05-how-we-scaled-github-api-sharded-replicated-rate-limiter-redis/