By Andrew Plato, CEO, Zenaciti
For many years, I had the poster at the top of this article in my office. I liked the cats. Now, I look at that poster and see more than cats. I see the benefits of a monoculture.
A software monoculture arises when an organization standardizes on a single software product, such as Microsoft Windows.
The recent CrowdStrike outage has a lot of IT people, along with the Electronic Frontier Foundation (EFF), pushing to break up software monocultures, like Windows. The fallout has also made it to the courts, with Delta Air Lines suing CrowdStrike for its role in the massive outage, which crippled Delta’s flight operations for days.
While outages such as the CrowdStrike debacle are irritating, they are not unusual. Crashes, outages, and hacking are a normal part of the computing landscape. Software is complex, bugs are common, and recovering from outages is a typical part of an IT team’s duties.
Moreover, the benefits of standardizing on a single software vendor or a small number of them are well known: efficiency, consistency, and security. As a person who has built security operations and incident handling teams, I have found it profoundly easier to spot, track, and remediate malicious behaviors in standardized environments. If the cats did not convince you, then maybe your primal brain will.
Primal Contrast
The reason it is easier to spot problems in a standardized monoculture is a simple fact of our biology. Our primal brains are hardwired to prefer high-contrast environments. Consider these two images:
Each block contains one black dot. It is a lot easier to spot the black dot in the image where all the other dots are the same shade of blue versus the image where every dot is a slightly different shade of blue.
When you have hundreds or thousands of computers, all running a common set of software, it becomes easy to identify the computer that starts doing strange things, like downloading scripts from a sketchy site in Russia.
A software monoculture enforces uniformity, which makes security easier and more efficient.
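The same contrast effect can be shown in a few lines of Python. This is only a rough sketch under stated assumptions: the host names, process names, and the 5% rarity threshold are all hypothetical, and real detection tools are far more sophisticated. The idea is simply to flag anything that appears on only a small share of machines.

```python
from collections import Counter

def rare_items(fleet, max_share=0.05):
    """Map each host to the items seen on at most max_share of all hosts."""
    # Count how many hosts report each item (one vote per host).
    counts = Counter(item for items in fleet.values() for item in set(items))
    threshold = max_share * len(fleet)
    flagged = {}
    for host, items in fleet.items():
        rare = sorted(i for i in set(items) if counts[i] <= threshold)
        if rare:
            flagged[host] = rare
    return flagged

# Hypothetical standardized fleet: every machine runs the same software.
baseline = {"outlook.exe", "chrome.exe", "teams.exe"}
standardized = {f"pc-{n:03d}": set(baseline) for n in range(100)}
standardized["pc-042"] = baseline | {"fetch_script.ps1"}  # the "black dot"

# Hypothetical diverse fleet: every machine runs something unique.
diverse = {f"pc-{n:03d}": {"chrome.exe", f"app-{n}"} for n in range(100)}
diverse["pc-042"] = diverse["pc-042"] | {"fetch_script.ps1"}

print(len(rare_items(standardized)), "host(s) flagged in the standardized fleet")
print(len(rare_items(diverse)), "host(s) flagged in the diverse fleet")
```

In the standardized fleet, only the one misbehaving machine stands out. In the diverse fleet, every machine looks unusual, so the suspicious script is buried among the alerts.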
Single Point of Failure
Critics will, rightly, argue that monocultures create single points of failure (SPOF). This is what happened with the CrowdStrike outage. Companies that standardized on Windows and CrowdStrike were knocked completely offline when the bug hit.
The traditional response to SPOF issues is to build redundancy and diversity into an environment. This is essentially what the EFF is promoting.
However, this focuses on a single monoculture (Windows) while ignoring all the other monocultures. Modern computing environments depend on an enormous collection of standards, from things as rudimentary as power delivery to complex encryption algorithms such as RSA. The Internet itself depends on a huge set of standards, such as TCP/IP. All of these standards have weaknesses, which is why we have security technologies, like CrowdStrike.
While standards can create SPOF, they also create uniformity. This allows organizations to build atop those standards with their own resiliency, redundancies, and security.
The Blame Game
Discussions around monocultures invariably devolve into a Microsoft-bashing experience. Microsoft is a big company, with an oversized presence. They are an easy target when something breaks.
To state the obvious, blaming vendors might feel righteous and relevant, but it accomplishes nothing. Rather than blame, IT departments should look at outages as an opportunity to refine recovery processes.
Outages, crashes, and attacks are a normal part of operating any IT environment. It is foolish to think having a more complex environment will somehow diminish outages. What IT leaders need to focus on is building a robust problem-solving culture. The CrowdStrike outage was disruptive, but it was resolvable. Most organizations were able to recover in a few hours.
Incidentally, this is the crux of CrowdStrike’s response to Delta’s lawsuit. CrowdStrike alleges that Delta was more interested in blaming CrowdStrike than in fixing its antiquated IT infrastructure.
Conclusion
Standardized software monocultures are not going away. Standardization offers far more benefits than weaknesses. For organizations worried about monocultures, there are numerous steps you can take to minimize the impact of the next big outage:
- Have an incident response plan with detailed processes for recovering impacted systems.
- As part of an incident response plan, include formalized methods to communicate with users to advise them of IT outages and repairs.
- Have reliable, tested, and validated backups of critical systems.
- Require users to store critical content in a centralized, redundant storage location (Dropbox, OneDrive, etc.).
- Implement processes to rapidly roll back systems to a known-good state.
- Automate updates to rapidly roll out fixes when released.
- For remote workforces, consider adopting virtual desktops or other technologies that reduce reliance on endpoint computers.
- Conduct regular exercises to test backups and recovery procedures.
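As one illustration of what a "tested and validated" backup might mean in practice, here is a minimal Python sketch that compares a restored copy against the original using SHA-256 checksums. The function names and directory layout are my own assumptions for the example, not a reference to any particular backup product, and a real exercise would also cover databases, permissions, and restore times.

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_restore(source_dir: Path, restored_dir: Path) -> list[str]:
    """Return relative paths that are missing or differ in the restored copy."""
    problems = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        restored = restored_dir / rel
        # A file fails the check if it was not restored or its contents changed.
        if not restored.is_file() or sha256_of(src) != sha256_of(restored):
            problems.append(rel.as_posix())
    return problems
```

An empty list from `verify_restore` means every file in the source directory was restored byte for byte. Anything else is a finding to fix before the next real outage.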
Plato can be reached at andrew.plato@zenaciti.com