A few days ago, the world watched as many services failed: planes were grounded, news outlets couldn’t publish or broadcast, and ATMs refused to cooperate with customers. Not every country was affected in the same way, but enough systems around the world were struck by this digital malady to make it a global issue.
We now know that the culprit was a security component made by CrowdStrike that prevented Windows machines from booting completely, and that it could be remedied with relatively simple steps (boot into Safe Mode, delete the faulty file, reboot) that any sysadmin worth their salt could easily follow. The much bigger issue, though, was that this had to be repeated on every affected machine, and there are so many of them around the world.
The storm is now over: many a sysadmin is recovering from the shock, and the world is moving on. Here, I’d like to answer a few questions that have arisen from this incident and to share my technological concerns about it.
Q: It’s all Microsoft’s fault!
A: It is not. The failing piece of code was provided by a third-party company. Microsoft isn’t to blame for this one.
Q: Couldn’t Microsoft have prevented this?
A: Nearly impossible with that architecture (and with many other modern architectures, for that matter). The problem is that the software that malfunctioned was security software, and it had “special” access to Windows. Ordinary software is not allowed to integrate into Windows: it has to use the services the OS provides and be happy with what it gets. Such software might be buggy and fail, but the failure affects only that particular program and nothing else. Your mail client might crash, and you’d simply restart it without thinking twice. That’s because “ordinary” software is not allowed to meddle with Windows. CrowdStrike’s solution, however, is a different beast: it is designed to reach deep into Windows, because that’s the only way to provide the security services it offers. It cannot rely on Windows to hand it everything it needs; because it has to protect the Windows system itself, it has to go “low-level” and integrate with the very kernel (the “core”) of the operating system. And, as you have probably surmised, when such a component crashes, it takes the whole system down with it. Microsoft can prevent your email client from burning down the system, but it can’t do much in a case like this one. The best we have is the ability to boot into Safe Mode and fix things manually, which is great, considering how low-level this software is.
Q: How could that happen to a serious company?
A: We will likely never know. A lapse of attention, a failure to follow procedures, insufficient testing before release… anything could have happened. And it does happen, even to very big, very serious companies. Honestly, after decades of the IT industry treating its customers to beta-quality releases, I think we have all become a little too relaxed.
Q: Was it an act of sabotage?
A: Again, we might never know. In my opinion, this wasn’t sabotage but a simple, idiotic oversight. However, it could have been a test.
And here I will express my concern regarding what happened…
There’s a term in cybersecurity: “supply chain attack”. It describes attacks that don’t go after the desired target directly; instead, they go after a part of its supply chain, meaning any of the other parties that contribute to the final target in some way and that can be compromised. Supply chain attacks are neither new nor rare: just last year, 2769 companies were affected by supply chain attacks, and the numbers are rising.
What we’ve seen in this incident is a perfect simulation (assuming the event was a mistake and not an act of sabotage) of what happens when an entity in the supply chain is attacked or compromised. Many diverse businesses that relied on one single piece of software were incapacitated when it went rogue. And those businesses don’t even have to know they’re using that particular software: it can be well hidden, embedded in other code that has been embedded in other code, that has been… At the end of that chain is the customer who cannot get the service they need: a traveller, a hungry patron who can’t pay with their credit card, a patient in an intensive care unit who can’t get their medication or medical procedure… The consequences can range from mere inconvenience to total disaster.
This is the main security concern with supply chains: the weakest link defines the security of the entire chain. It doesn’t matter how much you spend on security if some third-party provider does a sloppy job.
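To make the “hidden dependency” point concrete, here is a minimal Python sketch, assuming a system’s dependencies could be written down as a simple graph. All component names (airline_checkin, endpoint_agent and so on) are hypothetical and do not describe any real deployment. Walking the graph upward from one failed component shows that everything depending on it, directly or through several layers, goes down with it:

```python
from collections import deque

# Hypothetical dependency graph: each component maps to the components it depends on.
# The names are illustrative only and do not describe any real system.
DEPENDS_ON = {
    "airline_checkin": ["booking_backend"],
    "booking_backend": ["windows_server"],
    "pos_terminal": ["payment_gateway"],
    "payment_gateway": ["windows_server"],
    "windows_server": ["endpoint_agent"],
    "news_cms": ["linux_server"],
    "linux_server": [],
    "endpoint_agent": [],
}


def affected_by(failed, graph):
    """Return every component that depends on `failed`, directly or transitively."""
    # Invert the graph: for each component, record who depends on it.
    dependents = {name: [] for name in graph}
    for name, deps in graph.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(name)

    # Breadth-first walk "upward" from the failed component.
    seen = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


print(sorted(affected_by("endpoint_agent", DEPENDS_ON)))
# → ['airline_checkin', 'booking_backend', 'payment_gateway', 'pos_terminal', 'windows_server']
```

Neither the check-in desk nor the card terminal lists endpoint_agent as a dependency, yet both go down, while news_cms, which sits on a different stack, survives. Real dependency graphs are far larger and often only partially known, which is exactly the problem.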
My other concern is digital monoculture, a term modelled on the agricultural one, with pretty much the same outcome. In agriculture, growing a monoculture (just one type of crop) is known to eventually cause multiple problems, from soil nutrient depletion to increased vulnerability to disease.
It’s the same with computers. If you’re old enough, you might remember the (well, not so distant) past when there was more than just Windows, Macintosh and Linux.
In the eighties, we had a Cambrian explosion of computer diversity: many platforms, built around a handful of CPUs, that weren’t very compatible with each other. Over the years, just a few of them survived: the PC because it was a business machine that could exchange data with more serious computers, and the Mac because it was pretty and (mostly) user-friendly. Other great computers, like the Atari ST or the Amiga, that were way ahead of those two just couldn’t compete and died out. Linux showed up in the nineties and slowly overtook the other Unices (excluding macOS, which runs on Macintosh computers but doesn’t do much in the server space), leaving just a few still alive.
This is a problem, as we’ve just witnessed: had the infrastructure been more diverse, the impact of the CrowdStrike incident would have been much smaller. Businesses running Linux were mostly unaffected, or only slightly affected if their Windows desktops were knocked out. This is not to praise Linux over Windows: the same thing could easily have happened on Linux, too. Diversity is key, even if it is more expensive, because it provides the same benefit polyculture provides in agriculture: a single pest (or a single bad update) can no longer wipe out the whole field.
And this is why having only Windows, Mac and Linux around, and focusing on just that trio, is inherently bad for the entire industry.