Gremlin Releases “Scenarios” That Simulate Outages, Helping Companies Avoid Costly Downtime

Scenarios make it safe and easy for engineering teams to simulate real-world website and software failures, in order to proactively prepare for disaster.

San Jose, California, UNITED STATES

San Francisco, Sept. 26, 2019 (GLOBE NEWSWIRE) -- Gremlin, a fully hosted Chaos Engineering service founded by engineers from Netflix and Amazon, today launched Scenarios that make it safe and easy for companies to simulate real-world outages that lead to costly downtime. Similar to a flu shot, Chaos Engineering is the practice of purposefully injecting failure into internet systems, with the goal of identifying weaknesses before they can cause serious harm.

“Since we released Gremlin Free back in February, thousands of customers have signed up to get started with Chaos Engineering. But many organizations are still struggling to decide which experiments to run in order to avoid major downtime and outages,” said Kolton Andrus, CEO and Co-Founder of Gremlin. “That’s exactly why we’ve added templates of real-world outages into our application. Now Gremlin users can easily simulate major failures with a couple of clicks, and validate their systems’ ability to withstand them.”

Out-of-the-box Scenarios

Manage Traffic Spikes
Traffic spikes are one of the most common reasons websites fail. This scenario allows DevOps teams to progressively increase CPU load from 10% to 100% on selected hosts, in order to fine-tune thresholds and test failover architectures.
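As a rough illustration of what a progressive CPU ramp looks like, the sketch below builds a staged load schedule. It is not Gremlin's implementation or API; the function name `cpu_ramp`, the 10-point step, and the 60-second hold per stage are all assumptions made for the example.

```python
def cpu_ramp(start=10, stop=100, step=10, hold_seconds=60):
    """Build a list of (cpu_percent, hold_seconds) stages from start to stop.

    Hypothetical helper for illustration only; the step size and hold time
    are assumed values, not taken from any product.
    """
    if not (0 < start <= stop <= 100) or step <= 0:
        raise ValueError("expected 0 < start <= stop <= 100 and step > 0")

    stages = []
    load = start
    while load < stop:
        stages.append((load, hold_seconds))
        load += step
    # Always finish exactly at the target load, even if step overshoots it.
    stages.append((stop, hold_seconds))
    return stages


if __name__ == "__main__":
    for percent, hold in cpu_ramp():
        print(f"hold {percent}% CPU for {hold}s, then check alerts and failover")
```

Pausing between stages gives a team time to confirm that alerts fire and failover engages at the expected threshold before the load increases further.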

Example Outage: Black Friday Technical Issues Cost Retailers Billions

Unreliable Networks

Microservice architectures rely heavily on frequent and responsive API calls. This scenario verifies that customers are not affected when supporting API calls take longer to respond.
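One common way a service stays responsive when a dependency slows down is to put a timeout around the call and fall back to a default. The sketch below shows that pattern under simulated latency. It is a minimal example, not part of Gremlin; `slow_dependency`, `call_with_fallback`, and the timing values are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def slow_dependency(delay_s):
    """Stand-in for a downstream API call with injected latency (hypothetical)."""
    time.sleep(delay_s)
    return "fresh"


def call_with_fallback(fn, timeout_s, fallback):
    """Run fn in a worker thread; return fallback if it exceeds timeout_s."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        return fallback
    finally:
        # Do not block the caller on the slow worker thread.
        pool.shutdown(wait=False)


if __name__ == "__main__":
    # Dependency answers within budget: the caller sees the live value.
    print(call_with_fallback(lambda: slow_dependency(0.01), 1.0, "cached"))
    # Dependency is slower than the budget: the caller gets the fallback.
    print(call_with_fallback(lambda: slow_dependency(0.5), 0.05, "cached"))
```

A latency experiment then checks that the fallback path is actually taken, and that customers see a degraded but working response rather than an error.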

Example Outage: Target Stores Suffer Network Outage

Region Evacuation

Starting with one cloud region is natural -- but it’s a single point of failure. This scenario allows businesses to confirm that their services are available in more than one region, and that customers will not be negatively impacted when their traffic is diverted.

Example Outage: AWS celebrates Labor Day weekend by roasting customer data in US-East-1

Gartner, a global research and advisory firm providing information, advice, and tools for businesses in IT, named Gremlin as a leader in Chaos Engineering in their report Hype Cycle for DevOps 2019. Cowen Equity Research, which provides specialized research and investment analysis, called Gremlin a “category creator” that is “taking Chaos Engineering to the next level” in their report DevOps: The Next Frontier of Automation. DORA (DevOps Research and Assessment) also recently published their State of DevOps 2019 report sponsored by Google. In the report, Mike Garcia, VP of Stability and SRE at Capital One, states that “in a heavily regulated industry like banking, we have obligations that require us to prove our level of resiliency in our responsibility to meet the needs of our customers. The idea is to progressively show more advanced capabilities and automatic resiliency through more sophisticated chaos-testing techniques.”


Quotes of Support

Gremlin is the leader in helping businesses avoid disaster by proactively testing their systems. In a world where nearly every business is an online business, Gremlin makes companies more resilient and saves millions of dollars in unnecessary disasters and outages. -- Tomasz Tunguz, Managing Director at Redpoint Ventures

Scenarios feel like an important step in the natural evolution of chaos. Replicating isolated failures will always be helpful, but Scenarios provide the means to ratchet up pressure on our systems in ways that more closely mirror the complex, orchestrated failure states we observe in production environments. -- Matt Simons, Senior Engineering Manager at Workiva

Some organizations will push past Chaos Engineering tools such as Chaos Monkey and Chaos Kong, and move to Gremlin to perform precise experiments on their path to improving resiliency through orchestrated chaos. It's through exploration of the impact of increased latency and methodical failure of specific services that service teams will build confidence in their system's capability to withstand turbulent conditions in production, and begin to sleep more soundly in 2019. -- Lee Calcote, Head of Technology Strategy at SolarWinds


