On July 19, 2024, a faulty software configuration update from CrowdStrike caused a global outage of Windows systems, affecting critical sectors such as airlines, banks, hospitals, and emergency services.

It’s impossible to prevent every outage, and in the age of composite software it can be difficult to pinpoint a faulty process or code component. These risks can, however, be mitigated with automation and visibility. Here are a few key process and technology improvements that can help minimize business exposure and reduce the likelihood of outages:

  1. Orchestrate software releases and deployments: Release orchestration helps ensure a more controlled, process-based rollout of updates. With phased deployments and automatic rollback capabilities, issues have a greater chance of being caught and halted during the deployment process, before they have a widespread impact. For example, with the canary deployment pattern, a change is rolled out first to a small group of users and validated before being pushed out to everyone else.
  2. Governance and security: Security and governance frameworks, implemented and analyzed across the entire software delivery process, help ensure adherence to compliance and security policies, providing an additional layer of checks to prevent incidents from occurring.
  3. Predict change risk: Incorporate AI/ML models specifically designed to predict which changes are prone to failure. By analyzing past and persistent trends in change-related incidents, problems, and outages, businesses can reduce the likelihood of change failures. Using risk prediction technology to analyze large amounts of code, historical data, and patterns related to software changes enables teams to better predict potential failures and preemptively flag risky software updates.
  4. Automate testing and quality assurance: Automated testing capabilities thoroughly test code in various development and staging environments, helping identify critical issues before they reach production environments or end users. Integrating strict smoke and sanity tests into major (and minor) software changes makes critical failures significantly less likely. This requires a combination of unit, integration, and end-to-end testing.
  5. Comprehensive security: Continuous testing works like a quality assurance inspector, checking that a new product works as intended. Security capabilities, by contrast, scrutinize the materials and construction of the product, both during coding and in production environments, to ensure it’s built with robust security features that limit the impact of bad actors.
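
To make the phased deployment and automatic rollback idea from item 1 more concrete, here is a minimal Python sketch of a canary rollout. It is an illustration under stated assumptions, not a depiction of any vendor's product or of how the CrowdStrike update was distributed: the function name `canary_rollout`, the stage fractions, the 2% error threshold, and the `deploy`, `rollback`, and `error_rate` callbacks are all hypothetical stand-ins for whatever your release tooling provides.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence


@dataclass
class RolloutResult:
    completed: bool                      # True only if every stage passed its health check
    stages_run: List[float] = field(default_factory=list)  # stage fractions attempted
    final_fraction: float = 0.0          # share of hosts left on the new version


def canary_rollout(
    hosts: Sequence[str],
    deploy: Callable[[str], None],       # hypothetical: installs the new version on one host
    rollback: Callable[[str], None],     # hypothetical: restores the previous version
    error_rate: Callable[[Sequence[str]], float],  # hypothetical health signal in [0, 1]
    stages: Sequence[float] = (0.01, 0.10, 0.50, 1.0),  # assumed rollout schedule
    max_error_rate: float = 0.02,        # assumed tolerance before rolling back
) -> RolloutResult:
    """Deploy in growing stages; roll back every updated host if a stage looks unhealthy."""
    if not hosts:
        raise ValueError("hosts must not be empty")

    updated: List[str] = []
    result = RolloutResult(completed=False)

    for fraction in stages:
        # Number of hosts that should be on the new version after this stage.
        target = min(len(hosts), max(1, int(len(hosts) * fraction)))
        for host in hosts[len(updated):target]:
            deploy(host)
            updated.append(host)
        result.stages_run.append(fraction)

        # Validate the hosts updated so far before widening the rollout.
        if error_rate(updated) > max_error_rate:
            for host in reversed(updated):
                rollback(host)
            result.final_fraction = 0.0
            return result

        result.final_fraction = len(updated) / len(hosts)

    result.completed = True
    return result


# Simulated fleet: every host starts on the known-good version "v1".
fleet: Dict[str, str] = {f"host-{i}": "v1" for i in range(200)}


def faulty_error_rate(updated: Sequence[str]) -> float:
    # Pretend the new version fails on every host it reaches.
    return 1.0 if updated else 0.0


outcome = canary_rollout(
    hosts=list(fleet),
    deploy=lambda h: fleet.__setitem__(h, "v2"),
    rollback=lambda h: fleet.__setitem__(h, "v1"),
    error_rate=faulty_error_rate,
)
print(outcome)
```

In this simulation the faulty update reaches only the first 1% stage (2 of 200 hosts) before the health check fails and those hosts are restored, which is the containment effect that phased rollouts are meant to provide. A real pipeline would also need a soak period between stages and a health signal that a broken host can still report.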

By employing Digital.ai solutions that help automate the varied processes and insights required to deliver secure, quality software, businesses have a better chance of reducing the risk of deploying faulty code and avoiding global outages.

Contact us to learn how we can help you reduce failures.
