The recent global computer outage caused by a CrowdStrike update stands as a poignant reminder of the profound pressures faced by professionals in the cyber security field. While the immediate cause of the outage was reported as a "single content update" conflicting with the Microsoft operating system, leading to the infamous Blue Screen of Death (BSOD) for millions of systems worldwide, the deeper narrative underscores a critical issue: the relentless, unyielding battle against cyber threats and its toll on even the most skilled engineers. It is more than ironic when solutions become the cause of the very problems they aim to solve.

The Cyber Security Landscape: A Perpetual Battle
The cyber security landscape is characterized by a ceaseless wave of threats. Cyber security professionals, such as the engineers at CrowdStrike, operate at the frontline of this digital battlefield. At Check Point, we provide a suite of cyber security solutions similar to that of our peers at CrowdStrike. While some may consider CrowdStrike and Check Point to be competitors from a software sales perspective, it is important to recognize that cyber security solution providers are on the same side: team humanity. Our competitors and adversaries are the nefarious bad actors.

In the cyber security field, our work is crucial to protecting sensitive data, maintaining the integrity of critical infrastructure, and ensuring the seamless operation of global networks. The constant vigilance required to counteract these threats leaves little room for respite. Updates, patches, and defenses must be deployed with unwavering regularity to stay ahead of malicious actors.

Exhaustion and Its Consequences
A third party must investigate. A disruption of this magnitude requires a digital forensics investigation into the cause of the issue; the people involved must be questioned, their machines must be imaged, and accountability is part of the cyber security profession. It's possible that there was foul play. It's possible that there was malicious intent behind this cyber disruption. It's too early in this incident to rule out any possibility. The initial reports about a bad software update causing a global Blue Screen of Death outage for untold systems worldwide are, however, plausible. Device control and root access endpoint platforms can absolutely cause this chaos with a single misplaced line of code and a bug in a bad update.

At first glance, with only the initial reports that we have now, those in the cyber security field will likely agree that this disruption was caused by negligence in following procedure: a failure in a quality assurance best practice. Mistakes like this can happen. Was the update pushed out with inadequate review due to exhaustion, ignorance, or malice? The future investigation and press releases will hopefully provide more confidence in those details and lessons to be learned from them.

The Irony of Information Technology
The Product Development team members at CrowdStrike are among the best in our field: highly skilled, knowledgeable, and dedicated. Their expertise is not in question; their commitment is evident. At Check Point, we know because we face the same threats and share the same goals that they do.

Remember that the CrowdStrike team responsible for the Falcon Content Update that caused a global computer failure is made up of fellow human beings. The relentless pace of their work, driven by the perpetual threat landscape, inevitably leads to exhaustion. This state of fatigue can result in mistakes, not due to incompetence or lazy negligence, but due to the sheer burden of their responsibilities. Accidents like this can happen to any organization for multiple reasons: small budgets, tight timelines, sparse teams, and constant pressure to protect against the next attack innovation by bad actors whose goal is fraud and destruction. Together, these add up to immense stress and anxiety to perform.

This recent incident is not being reported as the result of a hack or breach. George Kurtz, the CEO of CrowdStrike, said on the Today Show, "The system was sent an update. That update had a bug and caused an issue with the Microsoft operating system."

If the issue stemmed from a simple software update that was pushed through to production, it was likely a mistake made by exhausted professionals under immense pressure. Talent turnover and professional burnout have been reported to be major issues for most software companies. This does not absolve them of responsibility but contextualizes it within the broader scope of their challenging environment. The error, while catastrophic, was a consequence of human beings constantly working to meet relentless demands.

Our digital world is fragile. Airports worldwide canceled flights, hospitals canceled surgeries, point-of-sale systems and credit card processing were halted for millions of businesses worldwide, and public safety and law enforcement agencies went without communication. Humanity's dependence upon computers has never been more evident. We rely on technology for almost every facet of our lives.

When our technologies are taken away, we are lost without them. 

As a consequence of a mistake made by one team at one company, members of the cyber security and IT communities throughout the planet have been impacted. Many lost sleep. Countless people felt this mistake, and their plans were derailed. This is the butterfly effect flapping its wings on all of us.

The Role of Bad Actors: Indirect Culprits
While the immediate fault, as currently reported, lies with the update that caused the outage, the root cause extends to the continuous onslaught by fraudulent bad actors. These malicious entities, through their relentless attacks, wear down the defenders. They create an environment where mistakes are more likely due to the overburdened and overstressed state of cyber security teams. In this instance, the bad actors did not need to breach the system; their persistent threats indirectly led to a critical mistake, a fumble in the ongoing cyber war, and dire consequences.

Personally, my vocational stress comes from email-borne threats. Email phishing remains the most common cyber security concern due to its volume: it accounts for a significant portion of email threats, and 94% of malware is delivered via email. These threats evolve daily as innovative bad actors employ new tactics and techniques. These malicious entities continuously develop sophisticated methods to deceive users, making phishing attacks more challenging to detect and prevent. Every day, cyber security teams face fresh threats, including spear phishing, whaling, impersonation, QR code attacks, business email compromise, and advanced credential harvesting, all designed to exploit vulnerabilities and gain unauthorized access to sensitive information.

Daily, hourly, and minute-by-minute updates and vigilance are imperative to counteract these evolving threats, mitigate risks, and address new challenges in real time. Our systems push out updates, minute-by-minute, in real time, because we have to. Real-time cyber threat pre-delivery protection is a standard that we meet every minute of every day because we must. That is the standard my team at Check Point has committed to maintaining across all threat vectors.

Providing security solutions involves many complexities. There is a fine line where threat protection requirements meet infrastructure frailty. It is not easy.   

Cyber Security Paradox
The cyber security paradox is the tightrope walk of balancing business convenience with data compliance. It demands that we protect data without paralyzing the flow of information, and that security measures detect threats without delaying essential processes. The challenge lies in filtering threats effectively for some while maintaining seamless access for others. Ultimately, it is a domain where one must expect the unexpected, preparing for unforeseen challenges while maintaining operational efficiency. This constant balancing act highlights the complexities and critical nature of modern cyber security.

The people responsible for figuring out these paradoxes and constantly keeping them in balance are just people. Many are highly skilled and well trained with decades of experience. Some are just breaking in and getting a rude awakening to the everyday challenges, ceaseless alerts, and perpetual stress to protect, report, and repeat. Threat research teams, product developers, analysts, engineers, and others are rarely thanked or remembered while things are going well.

For any organization impacted by this disruption, uncomfortable conversations will ensue. What will we lose? What did it cost? Who do we blame? What do we do now? Who is going to fix it? 

According to George Kurtz, “This is not a security incident or cyberattack. The issue has been identified, isolated, and a fix has been deployed.” (Source: independent.co.uk) The irony of navigating the cyber security paradox is that in the attempt to solve problems, so many others can be created. The best way to deal with an issue is to avoid it in the first place. The greater problem is that in avoiding one problem, other obstacles will inevitably force your method of deployment down a narrow path of code integration landmines. The “fix has been deployed,” said CrowdStrike's CEO, with undeniable damage already done. The immediate accountability is respectable, and provided that further disclosures are transparent, this global incident is a learning opportunity for everyone in our field.

The Need for Sustainable Cyber Security Practices
The CrowdStrike incident highlights a crucial need for sustainable practices in cyber security and in technology development. It is imperative to recognize the human element within cyber security and DevOps teams and to implement measures that address their well-being. This includes realistic workloads, adequate rest periods, and mental health support. Organizations must foster environments where cyber security professionals can perform optimally without being pushed to the brink of exhaustion. Notwithstanding, quality assurance is a must. Auto-updates must correct course and auto-revert when issues arise. These are ideals that are easier said than done.
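To make the auto-revert idea concrete, here is a minimal sketch in Python. It is an illustration only, not a description of how CrowdStrike, Check Point, or any vendor actually ships updates. The class name Rollout, the stage fractions, the failure threshold, and the simulate helper are all hypothetical. The update goes out to a small fraction of machines first, a health check runs after each stage, and everything is reverted the moment the failure rate crosses the threshold.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Rollout:
    """Hypothetical staged rollout with automatic revert (illustrative only)."""
    stages: List[float]        # fraction of the fleet reached at each stage
    max_failure_rate: float    # health-check threshold that triggers a revert
    log: List[str] = field(default_factory=list)

    def run(self,
            deploy: Callable[[float], None],
            failure_rate: Callable[[], float],
            revert: Callable[[], None]) -> bool:
        """Deploy stage by stage; revert everything if a health check fails."""
        for fraction in self.stages:
            deploy(fraction)
            self.log.append(f"deployed to {fraction:.0%}")
            observed = failure_rate()
            if observed > self.max_failure_rate:
                revert()
                self.log.append(
                    f"reverted at {fraction:.0%} (failure rate {observed:.1%})"
                )
                return False
        self.log.append("rollout complete")
        return True


def simulate(bad_update: bool) -> Tuple[bool, float, List[str]]:
    """Toy fleet: a bad update makes half of the updated machines fail."""
    state = {"fraction": 0.0}  # share of the fleet currently on the new update
    rollout = Rollout(stages=[0.01, 0.10, 1.0], max_failure_rate=0.02)
    ok = rollout.run(
        deploy=lambda f: state.update(fraction=f),
        failure_rate=lambda: 0.5 if bad_update else 0.0,
        revert=lambda: state.update(fraction=0.0),
    )
    return ok, state["fraction"], rollout.log


if __name__ == "__main__":
    for bad in (True, False):
        print(simulate(bad))
```

In this toy model, a bad update is stopped and reverted after reaching 1% of the fleet instead of all of it. Real endpoint software is far harder, which is exactly why these ideals are easier said than done.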

Uptime requirements demand systems that are always on, always working, and always on high alert, and the same pressure falls on the teams that support those systems, servers, platforms, processes, and infrastructure.

There is a need for greater automation and smarter systems that can alleviate some of the burdens on human operators. That has always been the case. Artificial intelligence (AI) and machine learning can play a significant role in handling routine tasks, allowing human experts to focus on more complex issues without becoming overwhelmed. Guess what? Even with newer, better AI systems, even with the latest and greatest new solution, that’s yet one more platform to manage, one more piece of infrastructure to maintain, and at the helm a few mere human brains making sense of the bits and bytes that pass through the system.

AI is an answer. AI is not the only answer.
Artificial intelligence has emerged as a revolutionary force in cyber security, heralded for its potential to keep pace with the ever-evolving landscape of digital threats. Advocates will argue that AI can analyze vast amounts of data, detect anomalies, and respond to threats more swiftly and accurately than human operators. Those of us working to create AI protective systems know that AI-powered platforms are required to defend against AI-created threats and the bombardment of threats that bad actors are creating with the misappropriation of AI systems. At Check Point, we refine our AI ThreatCloud every day to better secure and safeguard against the constantly evolving threats created by bad actors using AI of their own for fraud and deceit.   

Indeed, AI is poised to become a cornerstone of cyber security, providing advanced tools that enhance the capabilities of professionals and hopefully refine practices to reduce the likelihood of incidents like the recent CrowdStrike outage. As of current reporting, CrowdStrike did not have a breach. The incident was caused by something else, a bug, a blunder, that slipped through to production, and my guess is that professional burnout was the reason. There are some problems AI is just not going to fix.

I often like to say, "AI is not actually artificial intelligence; I think of AI as 'Automated Instructions.'" My aim with this phrase is to encapsulate the essence and goal of AI systems: sophisticated equations, algorithms, and pre-determined information designed to perform specific tasks with remarkable efficiency. At the same time, it respects that software still runs on the machines we created, which are still just objects and tools we use.

It is essential to recognize that AI, despite its incredible potential, is not without its limitations or fallibilities. AI systems are ultimately products of their programming and the data sets they are trained on, making them susceptible to errors and biases inherent in their design. Like any technology, AI is fallible and can introduce new variables and unforeseen consequences into the cyber security landscape. While AI can automate many tasks and augment human capabilities, it remains crucial to maintain a balanced perspective. Human oversight, continuous monitoring, and adaptive learning are necessary to ensure AI functions as a reliable ally rather than a source of new vulnerabilities. In embracing AI, we must remain vigilant, acknowledging both its promise and its potential pitfalls.

Conclusion
This situation is as nonsensical as a cartoon character slipping on a banana peel. How can one code update cause a global system outage for machines all over the world?

The global computer outage is a stark reminder of how dependent we all are on one another. One weak link in the chain can cause the rest of us to break. The modern tower of Babel is the technology stack of interlaced systems written in different languages, conflicting dependencies, and fragile architectures stacked on top of one another. 

The pressures faced by cyber security professionals and DevOps teams to build and maintain these systems in the name of security are incredible. The CrowdStrike disruption underscores the fact that even the most skilled and dedicated teams of talented individuals have faults. The systems we depend on will always have unknown vulnerabilities until they are known. Moving forward, every organization should better consider the sustainability and well-being of its professionals. More rules, more change control steps, additional sign-off requirements, and more policy reviews will not fix or prevent mistakes like this from being made again. The Falcon product team for CrowdStrike is probably made up of talented, experienced, and skilled members, none of whom gets enough sleep.

I would say that it is crucial to recognize the root cause: the perpetual threat posed by bad actors. Relentless attacks and cyber threats have indirect but profound impacts. Even when companies do well to create solutions to combat these threats, the bad actors have proven to be successful at wearing down the very individuals who combat them.

We all must respect just how delicate and fragile our electronic lives really are. As we strive for a more secure digital future, let us strive for a more humane and sustainable approach to cyber security.

Effective Incident Response planning is crucial for minimizing damage during cyber incidents and infrastructure outages. Redundant systems, backup alternatives, and everything else you cannot afford to lose if you are to maintain operational continuity need to be documented. If possible, have analog processes as part of your contingency plan. That may sound absurd, but I am serious. Your organization's technology dependence is a glass house with an uneven foundation. This disruption was just a crack, so be prepared for when it shatters.

"The impediment to action advances action. What stands in the way becomes the way."
             – Marcus Aurelius

In the meantime, as a community, we can work together, embrace the challenges we face today, and look forward to tomorrow's opposition. Success is our only choice, and working together is the only way we will succeed.  

 

CrowdStrike Disruptions – Ensuring Business Continuity
For the Falcon Update workaround, see https://blog.checkpoint.com/security/crowdstrike-disruptions-ensuring-business-continuity/

For detailed guidance, refer to CrowdStrike’s advisory.