CrowdStrike is like all of us

Published July 23, 2024

The CrowdStrike failure is a reminder that we are all human. The difference is that nothing I do can cause that much noise, because I am not that successful.

For somebody like me, always looking for new challenges at scale, the idea that something I do could have such an impact is fascinating; I won’t lie. I still remember the feeling, a few years ago, when I realized that the SaaS I was operating ran on more than 5000 EC2 instances.

Operating at the scale of “I can brick billions of devices!” sounds interesting. I don’t know the details yet about what happened at CrowdStrike, but this is the perfect opportunity to remind ourselves that mistakes like this happen very often. What is rare is software that runs on billions of devices with such a high permission level.

cURL runs on billions of devices, but it can’t patch itself or the kernel the way this antivirus can. I am not here to discuss whether it is good or bad for software to have such freedom. Probably not that good, but this is what cybersecurity is all about: keeping your software stack up to date at any cost. This time the bill surprised many people!

Anyway, here is a quick and dirty list of good practices that I am sure CrowdStrike knows and applies 99% of the time, just not this time.

Do not deliver your software all at once

Blue-green deployments, canary releases, feature flags, you name it. There are many well-known strategies to mitigate the risk of shipping a buggy version of your software to all your customers at the same time. Adopt one of them and use it as much as you can, even if it adds a bit of complexity to your delivery pipeline.
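To make the idea concrete, here is a minimal sketch of a percentage-based canary rollout in Python. It is not how CrowdStrike or any specific tool does it. The stage fractions, the 1% error threshold, and the function names are all assumptions picked for illustration.

```python
import hashlib

# Hypothetical rollout stages: the fraction of the fleet offered the new version.
STAGES = [0.01, 0.10, 0.50, 1.00]


def bucket(device_id: str) -> float:
    """Map a device ID to a stable value in [0, 1)."""
    digest = hashlib.sha256(device_id.encode("utf-8")).digest()
    # 4 bytes give an integer below 2**32, so the result is always below 1.0.
    return int.from_bytes(digest[:4], "big") / 2**32


def in_rollout(device_id: str, fraction: float) -> bool:
    """A device gets the new version when its bucket falls under the fraction."""
    return bucket(device_id) < fraction


def next_fraction(current: float, error_rate: float,
                  max_error_rate: float = 0.01) -> float:
    """Widen the rollout only while the devices already updated look healthy."""
    if error_rate > max_error_rate:
        return 0.0  # halt: stop offering the new version to anyone
    for stage in STAGES:
        if stage > current:
            return stage
    return current  # already at 100%
```

Because the bucket comes from a hash of the device ID, a device that received the update at 1% keeps it at 10% and 50%, so each stage only adds new devices. The health signal (here a plain error rate) is the part that needs the most thought in a real pipeline.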

Early adopters

Build a community of early adopters. Docker Captains, AWS Heroes… Many companies have built their own programs with solo developers or partners who are happy to get their hands on a new version of the software as early as they can, for many different reasons.

Get one of those partnerships going, ship software to them, and add their green light to your release checklists.
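An early-adopter sign-off can be an explicit gate in the release checklist rather than an informal habit. A minimal Python sketch, assuming approvals are tracked as a set of names; the partner names and function names below are hypothetical:

```python
from __future__ import annotations

# Hypothetical early adopters whose approval is required before a general release.
REQUIRED_SIGNOFFS = frozenset({"early-adopter-a", "early-adopter-b"})


def missing_signoffs(approved: set[str],
                     required: frozenset[str] = REQUIRED_SIGNOFFS) -> set[str]:
    """Return the early adopters who have not approved this release yet."""
    return set(required - approved)


def can_release(approved: set[str],
                required: frozenset[str] = REQUIRED_SIGNOFFS) -> bool:
    """The release may go out only when nobody is missing."""
    return not missing_signoffs(approved, required)
```

A pipeline step could call `can_release` and fail the job when it returns `False`, printing the result of `missing_signoffs` so it is clear who still needs to answer.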

Stable and unstable channels

Be clear about your update strategy and let customers decide which channel they want to be on.
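As a rough illustration, release channels can be modeled as a small feed where every build lands on the unstable channel first and is promoted to stable later. This is a sketch under assumptions: the channel names, the `ReleaseFeed` class, and its methods are made up for the example and do not describe any particular product.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

CHANNELS = ("unstable", "stable")


@dataclass
class ReleaseFeed:
    # Latest version published on each channel; None means nothing yet.
    latest: Dict[str, Optional[str]] = field(
        default_factory=lambda: {channel: None for channel in CHANNELS}
    )

    def publish(self, version: str) -> None:
        """New builds always land on the unstable channel first."""
        self.latest["unstable"] = version

    def promote(self) -> None:
        """Copy the current unstable version to stable once it has proven itself."""
        if self.latest["unstable"] is None:
            raise ValueError("nothing has been published yet")
        self.latest["stable"] = self.latest["unstable"]

    def version_for(self, channel: str) -> Optional[str]:
        """Return the version a customer on the given channel should run."""
        if channel not in CHANNELS:
            raise ValueError(f"unknown channel: {channel}")
        return self.latest[channel]
```

Customers who opt into `unstable` see every `publish` immediately, while customers on `stable` only move when someone calls `promote`, which is where the time to soak and the sign-offs fit in.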

Conclusion

Sleep at night. 99.9% of us operate systems that cannot cause this much damage, and there are easy solutions you can adopt to mitigate such risk. CrowdStrike knew about those as well, and I am sure they applied them to almost all of their rollouts, but that didn’t save them from making a mistake, because mistakes happen. We just have to figure out whether we want to buy their stock at this low price or not.

I can help you introduce one of those practices into your current workflow. Feel free to schedule a slot to chat about what you are up to.
