
Global Outage Caused by CrowdStrike Highlights the Cost of Insufficient Testing

Last week, a faulty CrowdStrike content update shut down banking and retail operations, interrupted 911 and other communications systems, grounded or delayed 3,000 flights, and sowed chaos worldwide.

This particular glitch left enterprises pointing their fingers at Microsoft until CEO Satya Nadella issued a statement saying, “CrowdStrike released an update that began impacting IT systems globally.” Accurate, and somehow an understatement.

CrowdStrike vaguely attributed the issue to a “defect found in a single content update for Windows.” The “good news,” at least for CrowdStrike, was that the defect did not appear to stem from a cyber threat. The bad news was that the havoc wrought by an undetected issue in what should have been a routine update raises the question:

Why didn’t CrowdStrike find the issue first?

This short blog offers an inescapable explanation and best-practice tips for making sure your products work as expected and your business stays out of the news for all the wrong reasons.

Where did CrowdStrike go wrong?

The official company statements cited the failure of an internal quality control mechanism. In this case, CrowdStrike released a sensor configuration update to Windows systems as part of the protection mechanisms for its Falcon platform. The update triggered a logic error that crashed the affected systems and left 8M+ users facing the dreaded “blue screen of death” (BSOD).

From the vague explanation provided, we can’t specify what, when, and how more testing would have exposed this specific issue. Microsoft Copilot writes:

“The issue arose due to a flaw in the Content Validator, allowing problematic data to slip through safety checks. CrowdStrike has since added a new check to prevent a recurrence.”[i]

The old saying “measure twice, cut once” applies here. Rather than having to conduct what CrowdStrike called “a thorough root cause analysis to determine how this logic flaw occurred” after the fact, the company could have surfaced the flaw early in development through more thorough quality testing, when it could have been fixed quietly without the whole world knowing it was there.

Although you can’t possibly plan for every contingency, this disaster, like most disasters, could have been averted through more rigorous testing. The bottom line: more testing equals fewer surprises.

[i] MSN.com

Finding issues early safeguards revenues — and saves face

Negative headlines cost companies money: lost customer loyalty, dips in stock prices, and, under certain circumstances, fines. The takeaway from Friday’s global debacle was that CrowdStrike didn’t properly test its update before rolling it out. Thorough testing early in development saves time and money and protects brand reputation.

Apposite has worked with numerous world-leading tech companies that can attest to the difference between the cost of finding issues early in the development cycle and the cost of letting customers discover them once the product is fully released.

In their experience, it may cost $100 for a developer to find and fix a bug, but that same issue could easily cost 10X as much if found by the QA team. If found by the system test team, that cost jumps to 50-100X. If a customer finds the issue, costs may skyrocket by more than 10,000X, not to mention the potential harm to brand reputation and trust, depending on the severity of the issue.
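To make the escalation concrete, the short sketch below simply multiplies out the figures quoted above. It is illustrative arithmetic only: the $100 base and the multipliers are the rough estimates from the text, the stage names are labels chosen for this example, and the customer figure is treated as a lower bound because the text says "more than 10,000X".

```python
# Illustrative arithmetic only: the $100 base and the multipliers are the
# rough figures quoted in the text, not measured data.
BASE_COST_USD = 100

# (low, high) multiplier for each stage at which the bug is found.
# "customer" uses 10,000 as a lower bound ("more than 10,000X" in the text).
STAGE_MULTIPLIERS = {
    "developer": (1, 1),
    "QA team": (10, 10),
    "system test team": (50, 100),
    "customer": (10_000, 10_000),
}


def cost_range(stage: str) -> tuple[int, int]:
    """Return the (low, high) estimated cost in USD for a bug found at `stage`."""
    low, high = STAGE_MULTIPLIERS[stage]
    return BASE_COST_USD * low, BASE_COST_USD * high


if __name__ == "__main__":
    for stage in STAGE_MULTIPLIERS:
        low, high = cost_range(stage)
        label = f"${low:,}" if low == high else f"${low:,} to ${high:,}"
        print(f"{stage:>16}: {label}")
```

On these figures, a bug that costs $100 to fix at a developer's desk costs at least $1,000,000 by the time a customer finds it.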

The failure of CrowdStrike developers to detect the issue underscores the critical importance of thoroughly testing new releases before deploying them to production, and the cost of failing to do so.

What happened to the global cybersecurity giant should never happen to a company that big, or to any company whose products can bring its customers’ businesses to a standstill. And it doesn’t have to.

“Measuring twice” to skip bad press isn’t hard

With a programmatic approach and a versatile, easy-to-use platform, testing doesn’t need to be hard. Before pushing new software and firmware releases, organizations should make it a practice to define and conduct a full range of performance, regression, and quality assurance testing against a range of real-world conditions, such as spikes in traffic, peak or extreme network conditions, sub-par Internet connections, and cyberattacks.

Testing should include generating large volumes of user traffic to simulate typical and unusual use cases, including cyber threats and vulnerabilities, and emulating a realistic variety of access and network technologies and media (LANs, wireless, satellite, cable, fiber, etc.) in a controlled lab environment.
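One way to keep that range of conditions from being tested ad hoc is to enumerate it as an explicit matrix of scenarios. The sketch below is a generic illustration, not an Apposite API: the dimension names and values are assumptions drawn from the conditions listed above, and a real plan would attach concrete traffic rates, latency, and loss figures to each entry.

```python
from itertools import product

# Hypothetical scenario dimensions, taken from the conditions the text lists.
TRAFFIC_PROFILES = ["typical load", "traffic spike", "simulated attack traffic"]
LINK_TYPES = ["LAN", "wireless", "satellite", "cable", "fiber"]
LINK_QUALITY = ["nominal", "sub-par"]


def build_test_matrix() -> list[dict[str, str]]:
    """Enumerate every combination of traffic profile, link type, and quality."""
    return [
        {"traffic": traffic, "link": link, "quality": quality}
        for traffic, link, quality in product(
            TRAFFIC_PROFILES, LINK_TYPES, LINK_QUALITY
        )
    ]


if __name__ == "__main__":
    matrix = build_test_matrix()
    print(f"{len(matrix)} scenarios to run before release")
    for scenario in matrix[:3]:
        print(scenario)
```

Writing the matrix down makes gaps visible: if a release was never exercised under, say, attack traffic over a sub-par satellite link, that shows up as a missing row rather than a surprise in production.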

Apposite’s “lab in a box” solutions make it easy for engineers, technicians, and IT teams with no specialized test expertise to get up and running quickly and make simple-to-execute testing a core element of development workflows. This strategy helps verify the performance and cybersecurity posture of new products throughout application rollouts, point releases, and system upgrades, so you can be confident they perform as expected once deployed.

Start now

To learn more about testing new technologies fully before pushing product releases, check out the use cases and easy-to-use solutions for emulating real-world network conditions and generating realistic application and cyberattack traffic.

