Building a Drift Detection Engine with Upstash
In this article, we will go through the steps of building a basic drift detection engine, utilizing the power of Upstash for our online remote state, and building the necessary functions (in TypeScript) for managing the local state computation. Let's get started!
Introduction
One challenge when managing infrastructure as code is drift. Drift management has been a hot topic in the Infrastructure As Code (IAC) area ever since IAC rose in popularity. While IAC frameworks provide the ability to initialize infrastructure, the actual state of that infrastructure is dynamic and can diverge from the original declarations.
Note that while "drift management" is most commonly associated with the IAC domain, similar techniques are showing up in many other areas. For instance, Vercel's Turborepo engine likely relies on a comparable approach to validate and invalidate cached artifacts that are no longer useful or active.
In this article, we will build a drift management engine using TypeScript. Let's get started!
User Acceptance Criteria:
- Persisting state remotely
- Compute the difference of local vs remote state
- Resolve discrepancies in either remote or local
Persisting state remotely with Upstash
One tremendous benefit of Upstash is how easy it is to get started: you effectively get both a Redis database and an HTTP API URL. This is essential, as we can create local functions that persist state and follow the Redis protocol, imitating a CRUD API:
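A minimal sketch of such a CRUD-style wrapper is shown below. It assumes the Upstash REST endpoint accepts a Redis command as a JSON array in a POST body, authenticated with a bearer token. The names `createRemoteState`, `RemoteState` and `FetchLike` are hypothetical and not part of any Upstash SDK, and the `fetch` implementation is passed in so the wrapper can be exercised without a network.

```typescript
// Minimal fetch shape so the wrapper can be driven by a fake in tests.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Hypothetical CRUD surface for the remote state.
interface RemoteState {
  create(key: string, value: unknown): Promise<void>;
  read(key: string): Promise<unknown>;
  remove(key: string): Promise<number>;
  list(): Promise<string[]>;
}

function createRemoteState(restUrl: string, token: string, fetchImpl: FetchLike): RemoteState {
  // Send one Redis command as a JSON array to the REST endpoint.
  async function command(...args: string[]): Promise<unknown> {
    const res = await fetchImpl(restUrl, {
      method: "POST",
      headers: { Authorization: `Bearer ${token}` },
      body: JSON.stringify(args),
    });
    const payload = (await res.json()) as { result?: unknown; error?: string };
    if (!res.ok || payload.error) {
      throw new Error(payload.error ?? `Request failed with status ${res.status}`);
    }
    return payload.result;
  }

  return {
    // SET overwrites existing keys, so this doubles as "update".
    async create(key, value) {
      await command("SET", key, JSON.stringify(value));
    },
    // GET yields null for a missing key; otherwise parse the stored JSON.
    async read(key) {
      const raw = await command("GET", key);
      return typeof raw === "string" ? JSON.parse(raw) : null;
    },
    // DEL reports how many keys were removed.
    async remove(key) {
      return Number(await command("DEL", key));
    },
    // KEYS * is fine for a small demo database; SCAN scales better.
    async list() {
      return (await command("KEYS", "*")) as string[];
    },
  };
}
```

With a real database, this could be wired up as `createRemoteState(url, token, fetch)`, where `url` and `token` are the REST URL and token shown in the Upstash console.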
Persisting state locally
For this use case, we will use local JSON blobs. Each JSON file can store its metadata, and the filename will be its primary key:
example.json:
This resource was given a name, a description and an expiry date. The expiry date is a particularly useful attribute, as in the future we could define automatic deprecation policies based on resource under-utilization. For instance, if this resource falls short of our expected usage, it could be flagged and then removed from the system.
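As a rough sketch of how such files could be written and read back, the helpers below treat a directory of JSON files as the local state, keyed by filename. The field names in `Resource` and the helper names `saveLocalResource` and `loadLocalState` are assumptions made for illustration; the text above only says that a resource carries a name, a description and an expiry date.

```typescript
import { mkdirSync, readdirSync, readFileSync, writeFileSync } from "node:fs";
import { basename, join } from "node:path";

// Hypothetical resource shape; exact field names and date format are assumptions.
interface Resource {
  name: string;
  description: string;
  expiryDate: string; // ISO 8601 date, e.g. "2030-01-01"
}

// Persist one resource as <dir>/<key>.json; the filename is the primary key.
function saveLocalResource(dir: string, key: string, resource: Resource): void {
  mkdirSync(dir, { recursive: true });
  writeFileSync(join(dir, `${key}.json`), JSON.stringify(resource, null, 2));
}

// Load every *.json file in `dir` into a map keyed by filename (minus extension).
function loadLocalState(dir: string): Map<string, Resource> {
  const state = new Map<string, Resource>();
  for (const file of readdirSync(dir)) {
    if (!file.endsWith(".json")) continue;
    const resource = JSON.parse(readFileSync(join(dir, file), "utf8")) as Resource;
    state.set(basename(file, ".json"), resource);
  }
  return state;
}
```

Keeping the key in the filename rather than inside the blob means the set of local keys can be listed without parsing any file.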
At a later stage, our files will start looking like this:
Drifting local versus remote
We have set up both local and remote state, and now we are ready to start computing the drift between them. Our drift formula is something along these lines:
DRIFT should be equal to 0 every single time. There are 2 possible cases other than the DRIFT=0 case:
- case A: remote contains more resources than local
- case B: local contains more resources than remote
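One way to make these cases concrete is to compare the two key sets directly. The sketch below assumes DRIFT is the number of remote resources minus the number of local ones, and that resources are identified by their keys. The names `Drift` and `computeDrift` are hypothetical.

```typescript
interface Drift {
  onlyRemote: string[]; // keys present remotely but missing locally (case A)
  onlyLocal: string[];  // keys present locally but missing remotely (case B)
  count: number;        // assumed formula: remote size minus local size
}

// Compare local and remote keys and report what each side is missing.
function computeDrift(localKeys: string[], remoteKeys: string[]): Drift {
  const local = new Set(localKeys);
  const remote = new Set(remoteKeys);
  const onlyRemote = Array.from(remote).filter((key) => !local.has(key)).sort();
  const onlyLocal = Array.from(local).filter((key) => !remote.has(key)).sort();
  return { onlyRemote, onlyLocal, count: remote.size - local.size };
}

// No drift means neither side holds a key the other lacks.
function isInSync(drift: Drift): boolean {
  return drift.onlyRemote.length === 0 && drift.onlyLocal.length === 0;
}
```

Note that a count-based DRIFT can be 0 while the two sides still hold different keys (one added, one removed), which is why `isInSync` checks the key lists rather than `count`.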
Let's see what to do in each of these cases:
CASE 1: remote contains more resources than local due to out-of-sync git
This can usually happen when a developer is working on a branch, and hasn't pulled the latest changes from the trunk. While the trunk has been updated with the latest resources, the local branch hasn't pulled all the recent changes, resulting in discrepancies with the local state. A way to visualize this:
# Branch | Checksum | Resource #
In this case, we don't modify the remote state, because it is already up to date. The local state should remain as-is until the latest commit is pulled from the trunk.
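A possible way to detect this situation is to ask git how many trunk commits the current branch is missing. The sketch below assumes the trunk is tracked as `origin/main` and that remote refs have been fetched recently. The helper names `commitsBehindTrunk` and `mayReconcileRemote` are hypothetical, and the git runner is injectable so the logic can be checked without a repository.

```typescript
import { execFileSync } from "node:child_process";

// Runs a git subcommand and returns its stdout; injectable for tests.
type GitRunner = (args: string[]) => string;

const runGit: GitRunner = (args) =>
  execFileSync("git", args, { encoding: "utf8" });

// Number of trunk commits the current branch has not pulled yet.
// "origin/main" is an assumed trunk name.
function commitsBehindTrunk(trunk = "origin/main", git: GitRunner = runGit): number {
  const out = git(["rev-list", "--count", `HEAD..${trunk}`]);
  const behind = Number.parseInt(out.trim(), 10);
  if (Number.isNaN(behind)) {
    throw new Error(`Unexpected git output: ${out}`);
  }
  return behind;
}

// Case 1 policy: while the branch is behind the trunk, leave the remote state alone.
function mayReconcileRemote(behind: number): boolean {
  return behind === 0;
}
```

Only when `mayReconcileRemote` returns true does it make sense to treat extra remote resources as local removals.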
CASE 2: remote contains more resources than local due to resource removal
Say our git branches are in sync, but we find that there are more resources available remotely than locally. That's an indication that a resource has been removed locally. In this scenario, we want to remove the resource from the remote so that it is back in sync with the local state. Here's how we can do so:
This function will compute the resources that need to be deleted from the remote state. Note that it will return an array, which we later need to iterate over, sending a request to the Upstash Redis server for each entry to remove those keys from the Redis DB.
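The function described above could look roughly like the following sketch. It assumes resources are identified by their keys; `resourcesToDelete` and `removeFromRemote` are hypothetical names, and `del` stands in for whatever call issues a Redis DEL against Upstash.

```typescript
// Keys that exist remotely but no longer exist locally.
function resourcesToDelete(localKeys: string[], remoteKeys: string[]): string[] {
  const local = new Set(localKeys);
  return remoteKeys.filter((key) => !local.has(key));
}

// Iterate over the computed keys and delete each one from the remote.
// `del` is an injected, hypothetical callback that performs the actual DEL request.
async function removeFromRemote(
  keys: string[],
  del: (key: string) => Promise<unknown>,
): Promise<string[]> {
  const removed: string[] = [];
  for (const key of keys) {
    await del(key); // sequential on purpose: stop at the first failing request
    removed.push(key);
  }
  return removed;
}
```

Deleting sequentially keeps the failure mode simple: if one request throws, the keys processed so far are known and the rest remain untouched for the next run.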
The terraform drift management approach
When it comes to managing drift, few tools compare to the power of Terraform. There have been detailed blogs, like this one, on how Terraform manages drift, and it's worth looking into the Terraform model. A few key takeaways:
- terraform state is similar to the "local" state of our example.
- terraform refresh/plan/config are commands that calculate state on demand.
- terraform apply executes the configured resources, after alerting of possible addition or deletion of resources.
In our example, we tied the apply mechanism directly to a commit being merged into the trunk. While this is straightforward, there might be cases where treating the trunk as "pristine" does not work, and hence Terraform decouples its commands from any predefined branch mechanism.
Many other IAC platforms have similar concepts for managing drift, including AWS CloudFormation, Microsoft Azure, and so on.
The idea of coupling drift to branches could be appealing for teams who do trunk-based development, but if you do not fall into this category, a more on-demand, API-driven approach might work much better for you.
Summary
In this article, we built the bare bones of a drift management engine. By using controls for both remote and local states, we were able to compute desired states and add or remove resources. Something to highlight is how easy it was to achieve the remote computations by using Upstash. While we could contemplate using other databases within AWS (e.g. DynamoDB), having a RESTful API in under 30 seconds was very handy to get started.
In the drift management space, certain players, like HashiCorp's Terraform, are leading the way. Drift management mechanisms could also be applied in other domains: the general concept of differentiating remote and local state is very close to cache invalidation, remote caching and similar problems. I feel that the industry is just at the beginning of realizing how powerful some of these concepts are, and one example is Vercel's Turborepo implementation. My bet is that there will be a steady increase of vendors utilizing such solutions in the near future, which makes the space very exciting to be part of!