Cloud Detection and Response: How Much Auto-Remediation is Safe?
Incident response (IR) automation seeks to reduce cyber risk by accelerating the identification, containment, and mitigation of cyber attacks, thus limiting disruption and damage. AI-driven cybersecurity tools can execute predefined workflows in near real-time to produce fast, consistent responses—reducing the potential for missteps while freeing up precious human bandwidth to focus on more complex remediation tasks.
But how much IR automation is safe, especially in the cloud with all its dynamic interconnections? Could the automated response component of a cloud detection and response (CDR) solution do more harm than good?
This article explores the unique considerations for automated IR in the cloud and argues for a risk-based approach to optimize outcomes.
Why is CDR auto-remediation important?
Rapid IR is especially vital in the cloud, where everything happens at warp speed. The cloud is also a dangerous place, where misconfigurations and a lack of visibility into interconnected, ephemeral, and distributed assets and activities can create hidden vulnerabilities. On top of that, cloud environments are highly complex, and securing them requires scarce expertise.
In these challenging conditions, CDR auto-remediation can reduce alert fatigue and help staff prioritize alerts based on criticality so they can focus their remediation efforts more efficiently. CDR automation also cuts the time it takes to analyze high-priority threats and initiate action in circumstances where manual threat analysis is often too slow to be effective at blocking attacks or reducing their impacts.
Is cloud-based auto-remediation riskier than on-prem?
CDR offers unbeatable visibility into multi-cloud environments, including networks, storage, services, identities, APIs, virtual machines (VMs), containers, serverless, and all other cloud workload types. CDR’s ability to detect cloud-specific attack signatures increases the potential benefits of automating cloud IR, while manual threat detection in the cloud can be especially cumbersome.
But the cloud’s complexity and ephemerality can raise the risk of automated actions such as isolating workloads or disabling user accounts.
Arick Goomanovsky, Vice President of Product Innovation at Tenable Security, emphasizes: “You have to be very careful about what you’re allowing security tools to automatically do in your environment because if you misuse them, or if there is a problem with the tool, the business disruption damages can be pretty harsh.”
“On-premises environments are split between the user environment and the server side,” Arick explains. “In the cloud, it’s all server side—it’s all production and applications. In an on-prem scenario, if I detect malware on a user endpoint, I can automatically shut it down. But you want to be way more cautious on the server side, because of the potential for business disruption from shutting down servers.”
What are the concerns with cloud auto-remediation?
Unanticipated business disruption is just one of the concerns with automated threat remediation in the cloud. Other key issues include:
- AI trust, accuracy, and bias problems.
CDR solutions leverage AI to drive their IR capabilities. AI systems can reach inaccurate conclusions, make incorrect decisions, or generate elaborate false positives, leading security teams to distrust the automation.
- Need for human oversight.
Anytime IR is automated, humans should manage the process to ensure that AI-initiated actions are safe, ethical, and aligned with business priorities. Striking the right balance between human and machine effort is an ongoing challenge.
- Risk from novel threats.
Both AI systems and associated IR automation need to evolve alongside the relentless introduction of new cyber threats. Otherwise, an organization may hold a false sense of security while its defenses become increasingly vulnerable.
Can CDR support safer auto-remediation?
With its cloud-native roots, CDR offers broad yet fine-grained visibility into multi-cloud environments. This could improve the overall decision-making context so security teams can more safely extend their IR automation.
For example, if a VM is suspected of hosting malware, a sophisticated CDR solution might enable an automated response that blocks some of that machine’s access permissions rather than shutting it down completely. For a suspected privileged account compromise, a CDR tool could similarly block the account’s elevated permissions without taking it completely offline and potentially disrupting key administrative processes.
More refined automated actions like these could mitigate a threat’s worst outcomes, while giving humans time to do further investigation. Conversely, a less “cloud-aware” solution might lack visibility into an asset’s permissions or other parameters, reducing auto-remediation choices to either killing the process or not.
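The contrast between a scoped response and an all-or-nothing one can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's API: the `Finding` type, the action names, and the rule that business-critical assets are escalated to a human when no scoped option exists are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    RESTRICT_PERMISSIONS = "restrict_permissions"    # scoped containment
    REVOKE_ELEVATED_ROLES = "revoke_elevated_roles"  # scoped containment
    SHUT_DOWN = "shut_down"                          # blunt, all-or-nothing
    ESCALATE_TO_HUMAN = "escalate_to_human"


@dataclass
class Finding:
    asset_type: str            # hypothetical values: "vm", "privileged_account"
    permissions_visible: bool  # can the tool see the asset's permissions?
    business_critical: bool


def choose_action(finding: Finding) -> Action:
    """Pick the least disruptive containment the available context allows."""
    if finding.permissions_visible:
        if finding.asset_type == "vm":
            return Action.RESTRICT_PERMISSIONS
        if finding.asset_type == "privileged_account":
            return Action.REVOKE_ELEVATED_ROLES
    # No scoped option is available (no permission visibility, or an asset
    # type this sketch does not cover), so the only automated choice is
    # blunt. Assumed policy: leave business-critical assets to a human.
    if finding.business_critical:
        return Action.ESCALATE_TO_HUMAN
    return Action.SHUT_DOWN
```

With permission visibility, a suspect VM gets `RESTRICT_PERMISSIONS`; without it, the same VM is either shut down or handed to a human, depending on how critical it is to the business.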
When does breach risk outweigh the potential business impacts of cloud auto-remediation?
Solutions that automate IR at the level of shutting down processes (e.g., Tripwire, Okena Stormwatch) have been on the market for about 20 years. But many companies have used them sparingly, if at all, especially where revenue-generating systems are concerned.
Meanwhile, the risks and impacts of data breaches have risen dramatically over that same timeframe. In healthcare, for instance, ransomware and other cyberattacks routinely compromise patient care and threaten lives while exposing hundreds of millions of sensitive patient records.
Wouldn’t it be better to ramp up the IR automation in these high-risk environments? Arick recommends “leaning into as much automation as possible, especially in the cloud where everything is automated already.”
As the recent CrowdStrike outage illustrates, the business disruption from security technology failures can be much worse than the parallel risk from cyber threats. But in cloud-native environments, the consequences of taking down and spinning up virtualized systems are generally less severe than with on-premises servers.
Arick advocates a risk-based approach: “You really have to trust those workflows that you build out for those specific use cases. The more advanced organizations have risk ledgers where they map out specific scenarios that are unbearable for the business, where they will enforce automated remediation either pre-breach or post-breach. In other scenarios they’re saying, ‘We’ve got to have a human being in the loop who makes the decision.’”
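One way to picture such a risk ledger is as a lookup from scenario to response mode. The sketch below is hypothetical: the scenario names, the `ResponseMode` values, and the choice to route unlisted scenarios to a human are assumptions for illustration, not details from the interview.

```python
from enum import Enum


class ResponseMode(Enum):
    AUTO_REMEDIATE = "auto_remediate"
    HUMAN_IN_THE_LOOP = "human_in_the_loop"


# Hypothetical ledger entries: scenarios the business has judged
# unbearable are pre-approved for automated remediation.
RISK_LEDGER = {
    "ransomware_on_production_storage": ResponseMode.AUTO_REMEDIATE,
    "privileged_account_compromise": ResponseMode.AUTO_REMEDIATE,
    "anomalous_api_usage": ResponseMode.HUMAN_IN_THE_LOOP,
}


def response_mode(scenario: str) -> ResponseMode:
    """Look up how a scenario should be handled.

    Scenarios missing from the ledger default to a human decision,
    an assumption chosen here to err on the side of caution.
    """
    return RISK_LEDGER.get(scenario, ResponseMode.HUMAN_IN_THE_LOOP)
```

Keeping the ledger as explicit data rather than burying it in workflow logic makes it easier for the business and the security team to review which scenarios are trusted to run without a human.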
Growing cloud usage and complexity are likely to push even less advanced firms toward IR automation, as attempting to analyze and act on cloud cybersecurity data manually becomes increasingly unworkable.
What’s next?
For more guidance on this topic, listen to Episode 148 of The Virtual CISO Podcast with guest Arick Goomanovsky, Vice President of Product Innovation at Tenable Security.