In the past, web security solutions were based on the negative security model: they allowed all incoming HTTP/S requests through, except those that match predefined criteria for exclusion (i.e., threat signatures). More recently, some solutions have also adopted the positive security model: they deny access to all incoming traffic except those that match predefined criteria of legitimacy.
The positive model is harder to implement, but it’s now a must-have due to fundamental changes in the threat landscape. Attack vectors on web assets can be highly targeted, and they can have many variants; with the incorporation of IoT and other edge devices into apps and services, more than half of all endpoint attacks are fileless and unique. Their sources can be widely varied, and rapid IP rotation has become common. Their potential targets are extremely diverse, and attack surfaces have become highly distributed, ranging from corporate servers to employee smartphones.
These and other factors make it extremely difficult to maintain a comprehensive database of threat signatures in real time.
In this article, we discuss why the negative security model is limited, explore the positive security model in more depth, and explain why web applications and APIs are not adequately protected unless your web security solution includes an effective positive security model.
Limitations of the Negative Security Model
A negative security model is only as good as the database that defines what is “bad.” Your security team must walk a thin line between keeping your web assets secure and keeping them accessible to their intended audience. As a result, negative security rules tend to be basic and limited to blocking obvious attacks.
Therefore, attackers need only a minimal amount of creativity to succeed. They need only vary an attack just enough that it no longer matches your database of threat signatures, or evades recognition in some other way. (For example, rate limiting can be evaded by rotating IP addresses.)
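To make this concrete, here is a toy sketch (the signature pattern and payloads are invented for illustration) of how a naive signature rule is bypassed by a trivial variant of the same attack:

```python
import re

# A simplified, hypothetical threat signature: block any query string
# containing the classic "' OR 1=1" SQL injection tautology.
SIGNATURE = re.compile(r"'\s*OR\s+1=1", re.IGNORECASE)

def is_blocked(query: str) -> bool:
    """Negative security check: block only known-bad patterns."""
    return bool(SIGNATURE.search(query))

# The textbook attack matches the signature...
print(is_blocked("id=1' OR 1=1 --"))       # True: blocked

# ...but a minor variant (inline comment, different tautology)
# no longer matches, and sails straight through.
print(is_blocked("id=1' OR/**/2>1 --"))    # False: allowed
```

Real WAF signatures are far more elaborate than this one-liner, but the arms race is the same: every signature invites a slightly mutated payload that falls outside it.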
Furthermore, the negative security model can only protect against certain types of attacks. Among the OWASP Top 10 risks, three (A2 [Broken Authentication], A5 [Broken Access Control], and A7 [Cross-Site Scripting]) are not effectively covered by a negative security approach. And even the risks that could be covered by a negative security rule, such as A1 [Injection], are often not covered in enough depth to provide truly robust protection.
Lastly, a negative security model has a weakness related to administration. When a new attack vector arises, criteria for recognizing it must be added to the security solution. There is usually a delay between the threat’s identification and its addition to the solution. Meanwhile, the solution is ineffective against this threat, and the protected network is vulnerable.
In summary: on its own, a negative security model will not provide the protection you need against the volume, velocity, and sophistication of today's attacks on web assets.
The Positive Security Model
Positive security is sometimes referred to as the “whitelisting model.” In a general sense, this is true; one can think of positive security criteria as a whitelist of allowable characteristics. However, in a practical sense whitelisting usually refers to something more specific, as we’ll discuss below. In either sense, whitelisting plays a large role in positive security.
Whitelisting can be applied to different application security layers. For example, when validating inputs, the positive security model creates a whitelist of characteristics that make an input allowable (rather than trying to filter out bad input, as in a blacklist approach). When controlling the access of resources or functions to web apps, the positive security model denies access to all resources or functions except those that have been specifically authorized in a whitelist.
The characteristics used to validate input data include data type, structure or syntax, input length, permitted characters, correct numeric signs, and valid value ranges. In the example below from AWS, we see how the AWS WAF service applies a whitelisting rule that defines www.example.com as a valid referrer. In this illustration, only requests whose headers contain a matching referrer will be allowed access to the CloudFront CDN.
Figure 1: Whitelisting good users based on valid referrer. Source: AWS Presentation
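The whitelist criteria listed above (type, syntax, length, character set, value range) can be sketched in a few lines. This is a hypothetical validator for a "quantity" form field; the specific rules are invented for illustration:

```python
import re

# Hypothetical whitelist rules for a "quantity" form field:
# digits only, 1 to 4 characters, value between 1 and 1000.
FIELD_PATTERN = re.compile(r"^\d{1,4}$")   # syntax, length, permitted characters

def validate_quantity(raw: str) -> bool:
    """Positive security: accept only input matching every whitelist rule."""
    if not FIELD_PATTERN.fullmatch(raw):
        return False
    value = int(raw)
    return 1 <= value <= 1000              # valid value range

print(validate_quantity("42"))              # True: conforms, accepted
print(validate_quantity("42; DROP TABLE"))  # False: fails character whitelist
print(validate_quantity("5000"))            # False: out of range
```

Note the inversion from the negative model: nothing here enumerates what is "bad." The injection attempt is rejected simply because it is not on the list of what is "good."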
However, a major challenge of positive security models in general (and for whitelisting specifically) is how to reduce false positives. It is very difficult to anticipate all of the rules that would define acceptable behavior, and it’s quite possible that atypical behaviors that are not threats will be encountered. In the example above, a user accessing the CDN from a previously unknown but benign referrer would be denied access — and the app owner may have just lost a customer.
It all comes down to how to accurately define “normal” — and then keep that definition updated throughout the application lifecycle. The initial definition is based on a deep understanding of the app and the ecosystem within which the app executes. After deployment, production logs and other monitored metrics are used to fine-tune the criteria that define normal behavior.
We'll return to this below, in the section on Machine Learning. First, let's discuss schema enforcement.
One specific method of input validation is growing increasingly important today: enforcing web and API schemas.
Regardless of the communication interface that’s used, both sender and recipient expect the data to be structured in certain ways. Schemas are ways to formally specify those structures. This makes them ideal for a web security solution to use for validating inputs: if data is submitted that conforms to the schema (in its length, number and sequence of elements, data type, etc.), it is accepted. Otherwise, it is rejected.
This idea is straightforward, but its implementation is not. For a security solution to do this, it must be able to read and comprehend schemas in a wide variety of possible formats, while also being able to enforce whatever data specifications are found within.
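As a minimal sketch of the idea: in production, schemas would be expressed in a standard such as JSON Schema or OpenAPI and ingested by the security layer, but the enforcement logic reduces to checks like these (the schema and payloads below are invented for illustration):

```python
import json

# A hypothetical schema for one API endpoint's JSON payload:
# each field maps to (expected type, max length or None).
SCHEMA = {
    "username": (str, 32),
    "age": (int, None),
}

def conforms_to_schema(payload: str) -> bool:
    """Accept the payload only if it matches the schema exactly."""
    try:
        data = json.loads(payload)
    except ValueError:
        return False
    if not isinstance(data, dict) or set(data) != set(SCHEMA):
        return False  # missing or unexpected fields
    for field, (ftype, max_len) in SCHEMA.items():
        value = data[field]
        if type(value) is not ftype:           # exact type check (rejects bool-as-int)
            return False
        if max_len is not None and len(value) > max_len:
            return False
    return True

print(conforms_to_schema('{"username": "alice", "age": 30}'))        # True
print(conforms_to_schema('{"username": "alice", "role": "admin"}'))  # False
```

The hard part is not this logic; it is doing it automatically, at scale, across every endpoint and every schema format an organization uses, which is where many solutions fall short.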
Many WAFs (Web Application Firewalls) cannot do this at all. Some can do it, but only with a lot of manual configuration and customization. This is better than nothing, but it is still of limited usefulness; it prevents the rollout of any changes to APIs (and thus, the publishing of new apps or significant updates to existing ones) until the WAF is manually reconfigured and tested. In today’s fast-paced era of DevOps, this creates serious problems.
So, it's not enough to accept claims from WAF vendors (such as AWS WAF) that their products support schema enforcement. A WAF must also be able to ingest schemas automatically, without requiring significant time and manual labor.
The breadth of schema support is also an important issue. When considering a web security solution, be sure to ask about this. (JSON payload support in particular is still a problem in the industry; few WAFs provide it.)
Behavioral Profiling and Machine Learning
Behavioral profiling could be called the ultimate form of input validation. It goes beyond examination of the characteristics of the incoming requests (as validated by whitelists) and the allowable structures for data (as defined by schemas). Behavioral profiling is a validation of the requestor’s behavior, by comparing it to profiles built over time from users known to be legitimate.
Relevant profile data includes everything from straightforward analytics (such as the typical environments for legitimate users: their devices, browsers, and so on), all the way to more subtle metrics such as:
- On which web pages do typical users tend to linger? Which ones do they tend to leave quickly? Which entry pages indicate a high likelihood that the visitor is legitimate? Which ones indicate the opposite?
- For pages with a specific CTA (call to action), do legitimate visitors tend to perform other activities first? (For example: scrolling back up to re-read the offer, or watching a product video, or visiting the FAQ page and then returning, etc.) Are these activities usually done in any particular order?
- Which buttons tend to get clicked/tapped a short time after the page loads? For which ones is there usually a delay? Which ones usually will be preceded by page scrolls up and down, and to what degree? Etc.
As you can imagine, this information can be very useful for securing your web assets, and especially for identifying hostile bots that are trying to hide within your web traffic.
Modern bots have grown very sophisticated. They can emulate human users in most ways (in their source IPs, their “browser” environments, their submission of events such as mouse clicks, and so on). But when you create deep behavioral profiles for your web applications and APIs, it creates an insurmountable challenge for threat actors: to evade detection, they must create and fine-tune their bots very precisely, so that the bots conform to behavioral profiles which the threat actors do not have. (And they have to do this separately for each targeted application!)
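A behavioral profile can start as simply as per-page timing statistics. The following is a toy sketch (the dwell times and thresholds are invented for illustration): a baseline is built from visitors known to be legitimate, and each new visitor is scored by how far their behavior deviates from it:

```python
from statistics import mean, stdev

# Hypothetical baseline: dwell times (seconds) on a checkout page,
# collected from traffic already known to be legitimate.
baseline_dwell = [14.2, 9.8, 12.5, 11.0, 15.7, 10.3, 13.1, 12.9]
mu, sigma = mean(baseline_dwell), stdev(baseline_dwell)

def dwell_anomaly_score(observed: float) -> float:
    """How many standard deviations a visitor's dwell time sits
    from the profile of legitimate users (a z-score)."""
    return abs(observed - mu) / sigma

# A human-like visit scores low; a bot that fires the CTA a fraction
# of a second after page load scores very high.
print(dwell_anomaly_score(13.0) < 2)   # True: typical visitor
print(dwell_anomaly_score(0.2) > 3)    # True: suspiciously fast
```

A production system would track many such features at once (scroll patterns, navigation sequences, event timing) and combine them with ML models rather than a single z-score, but the principle is the same: the bot must match a distribution it cannot see.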
Obviously, behavioral profiling is a tremendously powerful tool for web security. But this power comes at a cost: no human could possibly create the profiles. That's where Machine Learning (ML) comes in. ML is the subset of artificial intelligence that allows machines to learn patterns and make predictions without being explicitly programmed for each case.
Earlier it was mentioned that whitelisting validation criteria can be fine-tuned with production logs and other traffic metrics. The cutting edge of web security today is to process all incoming traffic with ML-based analysis (some of it in real time, the rest of it by reading production logs), to construct behavioral profiles that are then used for subsequent user validation.
Done correctly, this is a continual process: an initial training period creates a baseline, and then the security platform continually adapts to changing traffic patterns over time. This means the platform will not only recognize and harden itself against changes in attack traffic, but will also notice changes in legitimate usage (for example, of an API). That is useful for business decisions, and it can also accelerate operations overall, because the WAF can notify staff of new traffic patterns and offer to construct and enforce new security rulesets automatically.
Given the sophistication of cybersecurity attackers these days, a purely negative security model cannot provide adequate protection. However, a purely positive model isn’t ideal either, due to the potential for false positives and other issues.
For robust protection and optimal performance, a hybrid approach is best. Negative security can be used to detect and block a large portion of hostile traffic with relatively little processing workload. Whitelisting and input validation can exclude an additional amount, with the most intensive analysis (ML-based behavioral profiling) reserved for only the requests that pass the earlier scrutiny.
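The hybrid pipeline described above can be sketched as a cheapest-first chain of checks; each layer here is a trivial stand-in (the check functions and request fields are invented for illustration) for a real detection engine:

```python
# A hypothetical layered filter, ordered cheapest-first: a request must
# survive every layer, and only survivors reach the expensive analysis.
def scrub(request, layers):
    """Return the verdict of the first layer that rejects, else 'allow'."""
    for name, check in layers:
        if not check(request):
            return f"blocked by {name}"
    return "allow"

# Toy checks standing in for real engines:
layers = [
    ("negative model (signatures)", lambda r: "DROP TABLE" not in r["body"]),
    ("positive model (whitelist)",  lambda r: r["path"].startswith("/api/")),
    ("behavioral analysis (ML)",    lambda r: r["dwell_seconds"] > 0.5),
]

print(scrub({"body": "x", "path": "/api/v1", "dwell_seconds": 4.0}, layers))
# "allow"
print(scrub({"body": "x", "path": "/admin", "dwell_seconds": 4.0}, layers))
# "blocked by positive model (whitelist)"
```

The ordering is the point of the design: cheap signature checks discard the bulk of hostile traffic early, so the costly behavioral layer only ever runs on the small fraction of requests that pass everything else.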
Indeed, this is the approach that Reblaze takes: a next-generation WAF, along with DDoS protection and advanced bot management, using Machine Learning to protect web applications and provide API security.
Much more can be said about how Reblaze uses the techniques discussed above to accurately scrub HTTP/S traffic. See for yourself how you can integrate Reblaze into your security stack for comprehensive machine-intelligent web security.
image credit: Nicolao Negrello