A while back, a client wanted to create a monthly report from data in their Shopify store. They had tons of products, and getting the data required calling a couple different Shopify endpoints repeatedly. Easy to write, except Shopify also rate-limits their API (as all real APIs do), so I couldn’t just blast the endpoints all at once (or in rapid succession) and expect good results.
I retrofitted some code to help me mostly respect the rate limits, but it was never a great solution. It was also annoying to write, and it still wouldn’t let me scale the solution out horizontally. The solution was implemented with Google Cloud Functions, and it would have been great to launch multiple functions at once to crunch through the data, but then they’d all need to talk to each other to make sure they all respected the same rate limit.
There are bits and pieces of libraries and services that can help with these issues across various platforms. Istio might have something with their service mesh stuff, but if you can figure out how to use it, you’re one of the smartest people I know. Polly has a Rate Limit policy available, but it uses a specific algorithm, is actually designed for the server side (not the client side), and can’t support coordinating multiple instances (since it’s a library run locally). On the JS side, bottleneck seems incredibly popular with over 1.4M weekly downloads, but it hasn’t been updated in 3 years, and requires you to set up and manage your own Redis/cache layer in order to coordinate multiple instances.
Basically, nothing I found seemed very easy to configure and use for my dumb brain.
I’m going on a journey now to see if I can help solve some of these challenges with HallPass.
What is a Rate Limit?
If you’re not a developer (I’m surprised you’re reading this), allow me to briefly explain what a rate limit is. Actually, I can’t explain it any better than this link does.
What is HallPass?
Right now, HallPass is a .NET library in pre-release status that allows developers calling rate-limited APIs to easily respect those API rate limits within a single instance, and using the basic Token Bucket algorithm. Ease of use is the most important motivation behind HallPass, even more so than pristine accuracy.
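For reference, the basic Token Bucket algorithm can be sketched in a few lines. This is an illustrative Python version of the general algorithm, not HallPass’s actual code:

```python
import time

class TokenBucket:
    """Minimal token bucket sketch: tokens accrue at `refill_rate` per second,
    up to `capacity`; each request spends one token."""

    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = float(capacity)       # start with a full bucket
        self.last_check = time.monotonic()

    def try_acquire(self) -> bool:
        now = time.monotonic()
        # Add tokens for the time elapsed since the last check, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_check) * self.refill_rate)
        self.last_check = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(capacity=2, refill_rate=1.0)  # burst of 2, then ~1 request/second
print(bucket.try_acquire())  # True
print(bucket.try_acquire())  # True
print(bucket.try_acquire())  # False, until roughly a second passes
```

The appeal of the algorithm is that it allows short bursts (up to `capacity`) while enforcing the average rate over time.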
What’s easier than using the same HttpClient that we already use in .NET?
(Shoutout to Jarrod for the insight; let me know if you want your last name mentioned.)
I want HallPass to be as unobtrusive as possible. After a simple configuration, it should just work (whatever that means). For .NET, I think this is pretty close to the ideal look.
When branching out to other languages (NodeJS/TypeScript is high on the list of priorities), I’ll try to keep the configuration and usage as native-feeling as I can, as well. Since JS devs use many different HTTP clients, I figure I’ll need to provide some easy hooks for some of the more popular ones (axios for sure).
Anyway, the other nice thing about hooking into .NET’s native HttpClient is that we can still transparently use other great libraries (like Polly) for things like automatic retries.
Also, most third-party HTTP clients in .NET – like RestSharp – are switching over to using .NET’s native HttpClient under the hood, so HallPass should (not tested!) be compatible with those, as well.
As of now, HallPass.NET is built assuming that developers are working purely in async/await flows. If you’re calling an external API from .NET code, it would be very rare to do so via synchronous code.
To that end, we only have async versions of methods. It’s also built from the start to assume that multiple threads will be interacting with it, so it needs to be thread-safe all the way through.
Given it’s a pre-release, I’m still writing tests to make sure this claim holds up, but so far it looks pretty good. Also, all of the SDKs (for .NET, JS (soon), and other languages) will be open-source… so hopefully we can get some good feedback to fix any holes I will undoubtedly miss.
As an example, here’s one of my tests:
HallPass will soon enable developers to easily respect API rate limits, even if their calling clients are distributed horizontally across multiple instances. For example:
Suppose you have a service implemented as a serverless function, which could easily spin up multiple instances at the same time. And suppose this function calls a rate-limited API. How can you ensure that you easily respect this rate limit, shared across all of your function instances? You could spin up your own DB or cache layer and figure out how to implement a fault-tolerant and concurrent rate-limit strategy yourself. But that’s hard. Remember, HallPass wants to make all of this easy.
Or suppose you have an application implemented with micro-services, and some of these micro-services call a rate-limited API. They also need to share rate-limit consumption information amongst each other in a robust and performant manner.
To accomplish this, HallPass plans to offer a remote service from which calling clients can request chunks of hall passes, which are then used by the SDK library to get local permission to call the rate-limited APIs in question.
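As a thought experiment, that coordination could look something like the sketch below: a lock-based, in-process stand-in for the remote service, plus a client-side stash. All names here are hypothetical, and the real service would live behind a REST API rather than a shared object:

```python
import threading
import time

class CentralBucket:
    """Hypothetical stand-in for the planned remote HallPass service:
    atomically hands out batches of time-stamped hall passes so that all
    registered instances together stay under `rate` requests per `window` seconds."""

    def __init__(self, rate: int, window: float):
        self._lock = threading.Lock()
        self.rate, self.window = rate, window
        self.start = time.monotonic()
        self.issued = 0  # total passes handed out across all instances

    def take_batch(self, count: int):
        # "Taking from the central bucket" must be atomic, hence the lock.
        with self._lock:
            batch = []
            for _ in range(count):
                # every `rate` passes pushes the valid-from time one window later
                valid_from = self.start + (self.issued // self.rate) * self.window
                batch.append(valid_from)
                self.issued += 1
            return batch

class RemoteBucketClient:
    """Client-side stash: refills in batches from the central service, then
    operates like a local bucket until it needs more."""

    def __init__(self, central: CentralBucket, batch_size: int = 5):
        self.central, self.batch_size = central, batch_size
        self.stash = []

    def wait_for_pass(self) -> None:
        if not self.stash:
            self.stash = self.central.take_batch(self.batch_size)
        delay = self.stash.pop(0) - time.monotonic()
        if delay > 0:
            time.sleep(delay)  # this pass isn't valid yet; wait it out
```

Because later batches carry later valid-from times, two instances sharing the same central bucket can’t exceed the combined rate no matter how their requests interleave.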
Though there will be a REST API, I’m hoping that the SDK itself is still the preferred way to use the remote service. Here’s an example of how that would look:
- More rate limit algorithms: Leaky Bucket, Sliding Window, Fixed Window …
- Composable rate limits: “global limit of 100/min + endpoint specific limit of 30/sec”
- Easier configuration
- Better performance: more accuracy, less resource consumption, higher throughput …
- Better cloud integrations: “deploy to AWS in zone us-east-1”
- More deployment methods: dedicated cloud hosting, custom configurations for on-prem / hybrid clouds, etc.
How Does it Work?
For full details, check out the source code.
TLDR: Each rate limit has a bucket. When we make an HTTP request using the HallPass HttpClient, the request has to pass through the HallPass DelegateHandler before proceeding to the actual request. The DelegateHandler checks if the request should be rate limited. If so, it finds the corresponding bucket and asks it for a hall pass, waiting until it gets one. Once it has a hall pass, it proceeds with the actual HTTP request.
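That interception flow can be sketched language-neutrally. Here’s a toy Python version (the class and function names are my own, not HallPass’s):

```python
import time

class SimpleBucket:
    """Toy stand-in for a rate-limit bucket: one hall pass every `interval` seconds."""

    def __init__(self, interval: float):
        self.interval = interval
        self.next_free = time.monotonic()

    def wait_for_pass(self) -> None:
        now = time.monotonic()
        if now < self.next_free:
            time.sleep(self.next_free - now)  # wait until a pass is available
        self.next_free = max(now, self.next_free) + self.interval

class RateLimitingHandler:
    """Sits in front of the real sender, the way a delegating handler sits
    in .NET's HttpClient pipeline."""

    def __init__(self, send, buckets):
        self.send = send        # the "inner" function that performs the HTTP call
        self.buckets = buckets  # url prefix -> bucket

    def __call__(self, url: str):
        for prefix, bucket in self.buckets.items():
            if url.startswith(prefix):     # should this request be rate limited?
                bucket.wait_for_pass()     # block until the bucket grants a pass
                break
        return self.send(url)              # proceed with the actual request
```

Requests to URLs that don’t match a configured bucket pass straight through, which is the “unobtrusive” behavior described above.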
Local vs. Remote
The biggest difference between the Local and Remote buckets is that Remote buckets refill their local stash of hall passes in batches by calling the REST API. Once a Remote bucket has a batch of hall passes, it operates essentially as a local bucket until it needs more.
So the REST API is responsible for the following in order to coordinate everything appropriately:
- Registering individual instances using the same shared key (for “fairly” distributing hall passes to the various instances)
- Refilling the central bucket (atomically)
- Taking from the central bucket (atomically)
To manage state, the REST API is currently using Redis in the cloud. Redis is fast, but it’s slower than in-memory, so there’s some baked-in complexity that I’m not yet sure how to best solve for scenarios that require high throughput (rate limits of 1,000 requests per second, for example)… but I have some ideas that I’ll be testing shortly.
Fun Stuff for Programmers
One thing that was fun to implement was a ConcurrentSortedStack<T>. I (think I) needed this for the RemoteTokenBucket, because it’s possible that different threads ask for refills from the remote service at the same time, and each gets back hall passes with different valid windows that are not in order.
For example, let’s say threads A and B ask for refills. A gets theirs first on the API server, but B’s come back to the client code first, and B’s hall passes have ValidFrom times that are later than A’s. If I used a normal ConcurrentQueue, I would add B’s first and start grabbing them FIFO-style. I would need to wait until they became valid before proceeding, though. Eventually, I would get through the B tickets and then start pulling the A tickets. But the A tickets would be expired by this point, so that refill was worthless.
If instead of a normal FIFO queue I had a list sorted by ValidFrom times, then I would be assured that I’m using as many of the requested hall passes as possible.
To implement the ConcurrentSortedStack, I ended up using a LinkedList under the hood rather than a tree or other structure, because I wanted to optimize these two points:
- Removals, specifically from the top
- Insertions of groups
The most common operation would be plucking items from the top of the stack, so that needed to be the greatest priority. A LinkedList in .NET offers removal from the front or back at O(1).
For insertions, I could optimize my custom structure a bit better than the worst case for a sorted insert of O(n) per item, because I would generally be inserting groups at a time (actually, at its worst case, inserting groups naively would be O(m · n), where m is the size of the group of new items to be inserted). That means that I could do the following:
- Sort the input group first
- Insert the first item from the new sorted group, starting from the front, and remember the node that it was inserted before
- Insert the next item starting from the last insertion point, knowing that it must be greater than or equal to the first item
- Repeat until all items are inserted
This should bring the efficiency back down to O(n + m · log m), assuming .NET sorts arrays in O(m · log m).
But practically speaking, I expect the efficiency to often be just O(m · log m) thanks to .NET’s LinkedList also keeping a reference to the last item. Since each new group being inserted will likely belong after the last item, I can check that first before starting item-by-item from the start of the list. So then I only pay for the sorting of the new group.
Anyway, I’m looking for feedback, testers, and contributors. Reach out if you have interest.