Tuesday’s large-scale internet outage brings to mind a major problem for the web today—a vulnerability that your organization is probably exposed to, even if you don’t use Fastly.
First, some background. As you probably already know, on Tuesday many major websites—including Amazon, Reddit, the Guardian, and the New York Times—were down for about an hour, starting at about 6 AM Eastern time. Visitors received errors such as “connection failure” or “Error 503: Service Unavailable.”
The problem originated at Fastly, the popular “edge cloud” platform that many major sites rely upon. According to Fastly, a software bug was introduced in a mid-May update, and it was triggered on Tuesday when a single customer made a “valid configuration change” to their settings. The problem cascaded instantly across Fastly’s global clusters, causing “85% of [the] network to return errors.”
Fastly handled the incident well, identifying and correcting the problem in less than 50 minutes. They also took full blame for it, and a senior engineering executive said, “we’re truly sorry for the impact to our customers and everyone who relies on them.”
There have been numerous articles and post-mortem discussions about this event, many of which question whether the internet has become too dependent on a small number of vendors. But this isn’t the question that we should be asking.
The problem isn’t that a small number of vendors provide services to a large number of customers; the problem is the architecture that many of them use.
And whether or not your organization uses Fastly, you’re probably vulnerable to a similar incident from a different source: your security provider. Almost all security solutions today use an architecture that has the same weakness as Fastly’s. They don’t provide dedicated instances to each customer; instead, they offer multi-tenant systems. This means that a problem originating at one customer can, and usually will, affect many others, just as we saw on Tuesday.
The Problem with Multi-Tenancy
Fastly’s network is a distributed system, and in some respects it does not match the definition of pure multi-tenancy. Nevertheless, it clearly shares one major characteristic of multi-tenant systems: its customer instances are not isolated from one another. In this incident, a single customer’s configuration change quickly poisoned nodes across the globe.
This outage is the kind of event that many organizations exert significant effort to avoid, by placing a lot of emphasis on redundancy and resiliency. They have at least one source of backup power, plus UPS, for their IT systems. They also observe strict protocols for regular backups. Some even have more than one main internet line, running to different exchange points.
We even see this approach being used in sites and web applications. Multi-cloud architectures are growing increasingly popular, often because organizations want to distribute their systems across different service providers to reduce risk.
But despite all this effort, many organizations still have a single point of failure for one vital function: web security. They entrust their traffic filtering to security vendors whose solutions run on external, multi-tenant infrastructure.
The Problems with Multi-Tenant Security
Security solutions that process customer traffic on shared infrastructure expose those customers to several potential problems.
One of them is similar to the Fastly incident: cross-customer vulnerability. When many customers have their traffic processed on the same infrastructure, an attack on one customer can affect the others as well.
This is especially true for DDoS events. A large DDoS attack on one customer can degrade performance and availability for all the others.
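To make the cross-customer effect concrete, here is a minimal toy model in Python. It is only a sketch: the tenant names, capacities, and request rates are invented for illustration, and real platforms use far more sophisticated scheduling and mitigation than an evenly shared pool.

```python
# Toy model of shared vs. isolated capacity.
# All tenant names and numbers are hypothetical, chosen only for illustration.

def served_fraction(total_demand, capacity):
    """Fraction of requests served when demand draws on a single capacity pool."""
    if total_demand <= capacity:
        return 1.0
    return capacity / total_demand  # overload is spread evenly across all requests


def multi_tenant(demand, shared_capacity):
    """Every tenant sees the same served fraction, because the pool is shared."""
    fraction = served_fraction(sum(demand.values()), shared_capacity)
    return {tenant: fraction for tenant in demand}


def single_tenant(demand, capacity_per_tenant):
    """Each tenant has its own isolated capacity, so overload stays local."""
    return {
        tenant: served_fraction(load, capacity_per_tenant[tenant])
        for tenant, load in demand.items()
    }


# Requests per second for three hypothetical customers.
normal = {"a": 100, "b": 100, "c": 100}
attack = {"a": 100, "b": 100, "c": 2800}  # customer "c" is hit by a flood

shared = multi_tenant(attack, shared_capacity=600)
isolated = single_tenant(attack, {"a": 200, "b": 200, "c": 200})

print("shared pool:  ", shared)
print("isolated pool:", isolated)
```

In this sketch, the flood aimed at customer "c" reduces the served fraction for customers "a" and "b" on the shared pool, even though their own traffic has not changed. With isolated capacity, only "c" is degraded.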
Note that this is not a hypothetical situation. At Reblaze, some of our customers first approached us when their sites and applications were affected by attacks that weren’t even targeted at them. As you can imagine, these events are extremely frustrating to experience: the attack traffic does not appear in your analytics or logs, so you don’t know what is happening, and you don’t know how long it might continue. Worst of all, there’s nothing you can do to fix the problem (except for switching to a new security provider).
Of course, for most providers, incidents at this scale don’t happen very often. Nevertheless, they do occur, and they can be very expensive (in lost potential revenue, damaged reputation, etc.) for their customers.
In addition to that occasional problem, there are other problems that are chronic and unavoidable, because they are inherent to multi-tenant security. For example, these solutions compromise your data privacy, because your traffic is decrypted and processed outside your environment. They also degrade the performance of your sites, services, and applications, because your traffic is not routed directly to your environment (it goes to an external solution first, which adds routing latency), and it has to go through an extra cycle of decryption and re-encryption (which adds processing latency).
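The two sources of added latency described above can be sketched as a simple sum over the hops a request takes. The millisecond figures below are assumptions invented for this example, not measurements; actual values depend on geography, the provider’s network, and TLS configuration.

```python
from dataclasses import dataclass


@dataclass
class Hop:
    network_ms: float    # network transit time to reach this hop
    tls_ms: float = 0.0  # time spent on decryption/re-encryption at this hop


def path_latency(hops):
    """Total latency of a request path: routing plus TLS processing at each hop."""
    return sum(hop.network_ms + hop.tls_ms for hop in hops)


# Hypothetical figures, for illustration only.
# Direct path: client -> your environment, with one TLS termination.
direct = [Hop(network_ms=20, tls_ms=1)]

# External path: client -> external security provider -> your environment.
# The provider decrypts, inspects, and re-encrypts before forwarding.
via_external = [Hop(network_ms=15, tls_ms=2), Hop(network_ms=15, tls_ms=1)]

overhead_ms = path_latency(via_external) - path_latency(direct)
print(f"direct: {path_latency(direct)} ms, "
      f"external: {path_latency(via_external)} ms, "
      f"overhead: {overhead_ms} ms")
```

The model simply adds a second network leg and a second round of TLS processing to the path. Under these assumed numbers, the detour through the external solution accounts for the difference between the two totals.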
How to Avoid These Problems
The problems above can be avoided by adopting a single-tenant security solution: one that is isolated from other instances, dedicated solely to processing traffic for your environment, and deployed within your environment.
Reblaze is a cloud-native web security platform that offers a full suite of security technologies (next-gen WAF, DDoS protection, advanced bot management, machine learning-based UEBA, biometric human verification, account takeover prevention, API security, and more), all in a unified, single-tenant solution.
Reblaze runs inside your environment, with a dedicated, isolated instance in each VPC. It can also deploy within container-based architectures and service meshes. For more information on Reblaze, or to get a demo, contact us here.
There is more to be said about the importance of dedicated single-tenant security than the material above covers. For a more in-depth discussion, see Single-Tenant vs. Multi-Tenant Web Security, Part 1: Protection and Privacy and Part 2: Performance and Cost.