As discussed previously in this series, hostile bots comprise almost 40 percent of all web traffic today, on average. Bots are used in a wide variety of attacks, and cause problems across different industries.
The most recent article showed why traditional methods of bot detection are no longer effective for identifying all forms of automated traffic.
In this article, we’ll discuss current bot detection methods that are effective against even the latest and most sophisticated bots.
What’s working today
In combination with the traditional detection methods described previously, a modern bot mitigation solution also incorporates newer techniques.
Traditional detection methods rely on passive metrics (e.g., resource consumption rate): the IDS (Intrusion Detection System) waits to see what the requestor will do. A better approach is to proactively verify the requestor’s authenticity and behavior.
The techniques described below are not mutually exclusive. In any given situation, the IDS should use as many of them as possible. (Reblaze includes the methods described below, along with many other proprietary techniques not included here.)
Client Certificates
Thanks to globally available third-party server certificates, clients can be assured that when a connection is established, it has been made to the correct, legitimate endpoint. The next logical step is mutual authentication: requiring that client certificates be used as well.
This is common in some industries (e.g., FinTech), but is still rare in others. Reblaze provides a client certificate mechanism within our platform, and we encourage our customers to use it.
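As a minimal sketch of what requiring client certificates looks like on the server side, the snippet below builds a TLS context that rejects any client lacking a certificate signed by a trusted CA. The file names are assumptions for illustration; in practice they come from your own PKI.

```python
import ssl

# Hypothetical paths; in practice these come from your certificate authority.
SERVER_CERT, SERVER_KEY = "server.pem", "server.key"
CLIENT_CA = "client_ca.pem"   # CA that issued the client certificates

def mutual_tls_context() -> ssl.SSLContext:
    """Server-side TLS context that refuses any client that cannot
    present a certificate signed by our CA (mutual authentication)."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.load_cert_chain(SERVER_CERT, SERVER_KEY)   # prove who we are
    ctx.load_verify_locations(CLIENT_CA)           # whom we trust
    ctx.verify_mode = ssl.CERT_REQUIRED            # reject anonymous clients
    return ctx
```

With `verify_mode = ssl.CERT_REQUIRED`, the TLS handshake itself fails for clients without a valid certificate, so unauthenticated bots are dropped before a single application request is processed.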
Other Client Authentication
Along with client certificates, clients can be authenticated in other ways; e.g., native/mobile applications can have built-in methods of communicating with endpoints and verifying their legitimacy.
At Reblaze, we provide an SDK (for both Android and iOS) to our customers, who rebuild and publish their applications with the SDK embedded. In use, it signs the application, authenticates the device, and verifies user identity. All communications occur over TLS and include an HMAC signature (a cryptographic identity mechanism on the client side) to harden communications between the mobile/native application and the microservice/API endpoint. The signatures are non-reproducible, non-guessable, non-repeating (they are unique per session, and sometimes, per request), and are based on dozens of parameters (time-based, location-based, environment-based, and more). They provide a reliable, secure mechanism to verify that the packets are originating from a legitimate user, and not from an emulator or other bot.
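The general idea of a non-repeating, non-guessable request signature can be sketched as follows. This is not Reblaze's actual SDK logic; it is a minimal HMAC scheme in which each request carries a timestamp and a random nonce, so no two signatures repeat, and the server recomputes and compares the signature in constant time.

```python
import hashlib
import hmac
import os
import time

# Hypothetical shared secret provisioned to the app at build time.
APP_SECRET = b"demo-secret-embedded-in-the-sdk"

def sign_request(method: str, path: str, body: bytes,
                 secret: bytes = APP_SECRET) -> dict:
    """Produce per-request headers carrying a unique HMAC signature."""
    timestamp = str(int(time.time()))
    nonce = os.urandom(16).hex()            # random, unique per request
    message = b"\n".join([method.encode(), path.encode(), body,
                          timestamp.encode(), nonce.encode()])
    signature = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return {"X-Timestamp": timestamp, "X-Nonce": nonce,
            "X-Signature": signature}

def verify_request(method: str, path: str, body: bytes, headers: dict,
                   secret: bytes = APP_SECRET, max_skew: int = 300) -> bool:
    """Server side: recompute the HMAC and compare in constant time."""
    if abs(time.time() - int(headers["X-Timestamp"])) > max_skew:
        return False                        # stale request (replay protection)
    message = b"\n".join([method.encode(), path.encode(), body,
                          headers["X-Timestamp"].encode(),
                          headers["X-Nonce"].encode()])
    expected = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, headers["X-Signature"])
```

A bot without the embedded secret cannot produce a valid signature, and any tampering with the method, path, or body invalidates the one it replays.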
Advanced Browser Verification
Many bots present themselves as ordinary browsers but lack a full browser environment. As a result, some security solutions are now using more rigorous challenges for attempted usage of web applications. For example, a WAF can respond to an incoming request with a JS-based arithmetic challenge. If the client “browser” cannot solve the challenge correctly, it is deemed to be headless.
Other ways of testing browser capabilities can also be used; for example, the browser can be asked to render a sound or an image.
As part of its advanced environmental detection, Reblaze has developed many additional proprietary techniques. (Some are based on JS, while others are not.)
UEBA: User and Entity Behavioral Analytics
UEBA describes a process where data about user behavior is fed back into the intrusion detection process. An IDS uses UEBA to establish a baseline for legitimate user behavior, comparing subsequent users to this baseline in order to assess hostile intent.
UEBA is an approach, not a specific technique. Some security solutions are not yet using it at all; others have adopted it in basic form, while a few are applying it in sophisticated ways.
In its basic form, UEBA compares simple metrics from a current user against baseline values.
For example, a web application might require textual entries from users. Human typing tempos range from hunt-and-peck up to 150+ words per minute, generally at an irregular tempo. If a “user” enters long strings at a rapid, mechanically regular pace, or quickly enters strings into several different fields, then it’s probably a bot.
Other interface metrics can be monitored as well. For example, if the same pixel is clicked in multiple checkboxes in rapid succession, and without corresponding mouse movements, this is probably not a human user.
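A basic UEBA check of this kind can be sketched in a few lines. The snippet below is an illustrative heuristic (thresholds are assumptions, not production values): it flags a text entry as automated if the typing rate exceeds a plausible human ceiling (~150 wpm, roughly 12.5 characters per second) or if the intervals between keystrokes are mechanically regular.

```python
def looks_automated(key_times: list[float], field_len: int) -> bool:
    """Flag a text entry typed faster than a plausible human,
    or at a mechanically regular tempo.

    key_times: keystroke timestamps in seconds (collected client-side).
    field_len: number of characters entered.
    """
    if len(key_times) < 2:
        return False
    intervals = [b - a for a, b in zip(key_times, key_times[1:])]
    mean = sum(intervals) / len(intervals)
    variance = sum((x - mean) ** 2 for x in intervals) / len(intervals)
    chars_per_sec = field_len / (key_times[-1] - key_times[0])
    too_fast = chars_per_sec > 12.5      # ~150 wpm ceiling (assumed)
    too_regular = variance < 1e-4        # near-zero jitter between keys
    return too_fast or too_regular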
Biometric Behavioral Profiling and Machine Learning (ML)
Many modern bots can mimic human behavior to varying degrees. Some are capable of evading detection by basic UEBA.
To defeat these bots, more advanced uses of UEBA are required. One of the most effective approaches today is to use Machine Learning (ML) to construct behavioral profiles based upon biometrics: measurable characteristics of biological (i.e., human) users.
Conceptually, this is the same idea as basic UEBA: it compares the characteristics of current requestors to predetermined criteria which define ‘good’ users. In practice, it is much more complicated than the examples described earlier. For each protected API or application, comprehensive profiles of legitimate human behaviors are compiled. In this context, “comprehensive” refers to using all the data that is available.
As an illustration, here is how Reblaze does this. Every HTTP/S request that Reblaze receives is anonymized and then analyzed according to numerous factors, including (partial list):
- Device data: the user’s hardware, its screen resolution and orientation, the software being used, etc.
- Session data: the number of requests, the frequency of requests, the number of IPs used, how often they repeat, length of requests, etc.
- User interface events: mouse movements, clicks, zooms, taps, scrolls, etc.
- Consumption analytics: pages viewed, time spent, resources requested, etc.
- Application-specific events and other results of user actions.
- And more.
After an initial learning period, the platform understands the patterns, typical values, and common relationships of these metrics for legitimate users of each protected application and API.
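The idea of a learning period can be sketched simply. The snippet below (an illustration, not the platform's actual model) summarizes each session metric observed from legitimate traffic as a mean and standard deviation, and then scores new sessions by how far each metric deviates from that baseline.

```python
from statistics import mean, stdev

def learn_baseline(sessions: list[dict]) -> dict:
    """Summarize legitimate-traffic metrics from the learning period
    as per-metric (mean, stdev) pairs."""
    baseline = {}
    for metric in sessions[0]:
        values = [s[metric] for s in sessions]
        baseline[metric] = (mean(values), stdev(values))
    return baseline

def deviation(baseline: dict, session: dict) -> dict:
    """Z-score of each metric in a new session against the baseline;
    large values indicate behavior unlike legitimate users."""
    return {m: abs(session[m] - mu) / sd if sd else 0.0
            for m, (mu, sd) in baseline.items()}
```

A real system learns far richer structure (joint distributions, per-page patterns, and relationships between metrics), but the principle is the same: characterize normal, then measure distance from it.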
The amount of data that Reblaze processes (over four billion requests per day) is far beyond the capability of human analysts. Therefore, cloud-based compute resources are used, applying ML in order to recognize patterns that analysts could not have identified on their own, or for which they might not have thought to look.
Unlike the basic UEBA described earlier, biometric behavioral profiling usually produces weighted variables. In other words, instead of enforcing a list of static rules which modern bots can evade (e.g., “if text entry rate > 150 wpm, deny requestor as bot”), behavioral profiling monitors the combined relative weights of a list of behavioral metrics for each current requestor. If a requestor’s combined ‘score’ ever rises above a preset threshold, that requestor is identified as a bot and blocked.
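The contrast between a static rule and weighted scoring can be made concrete. In the sketch below (the weights, metric names, and threshold are hypothetical; real profiles are learned per application), each metric contributes a suspicion score in [0, 1], and no single metric alone can trip the threshold:

```python
# Hypothetical weights and threshold; real profiles are learned per app.
WEIGHTS = {"typing_speed": 0.4, "mouse_entropy": 0.35, "request_rate": 0.25}
THRESHOLD = 0.7   # combined score above this marks the requestor as a bot

def bot_score(signals: dict) -> float:
    """Combine per-metric suspicion scores (each in [0, 1]) into one
    weighted score, instead of enforcing any single static rule."""
    return sum(WEIGHTS[m] * signals.get(m, 0.0) for m in WEIGHTS)

def is_bot(signals: dict) -> bool:
    return bot_score(signals) > THRESHOLD
```

Note that a requestor maxing out one metric (e.g., `typing_speed` = 1.0) scores only 0.4 and passes, which is exactly why a bot that defeats any one static rule can still be caught when several weighted indicators accumulate.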
UEBA is meant to detect anomalous behavior of users and machines by comparing current metrics against a baseline of ‘normal’ behavioral events. The analysis is focused on finding anomalies and deviations from what has been established as normal and safe.
This approach is very different from the usual paradigm that older web security products were based upon (i.e., a WAF that enforces static rulesets). Thus, a number of security solutions still do not offer UEBA today.
Behavioral profiling based on Machine Learning can, and should, be performed as granularly as possible: not only per app, but even down to individual pages, screens, and so on. This can be extremely powerful.
For example: if a mobile/native app displays a map, and a high percentage of legitimate users zoom into it as their first action, then API users who do not submit zoom actions are more suspect. On a retail site, if legitimate visitors to a product page often scroll to a certain part of the page (perhaps to confirm that there’s a money-back guarantee) before choosing “add to cart”, then visitors who do not scroll are more suspect. And so on.
Of course, in these examples, not every legitimate user or visitor will perform the actions noted. That’s why these are weighted factors, not binary decisions. But when enough “bot vs. human” indicators accumulate, high levels of accuracy can be achieved.
Using this approach, IDS accuracy is not only high, it is also robust and resistant to reverse-engineering by threat actors. Behavioral profiles are constructed based on private analytics data, and threat actors have no realistic way of obtaining this information.
Plus, unlike the basic UEBA examples discussed previously, many of the legitimate behavioral patterns will be non-obvious. ML often reveals useful patterns and relationships which few human analysts would have even considered.
Summary: Modern Bot Detection
As discussed previously, tracking a requestor based on traffic source (IP and geolocation) is no longer reliable. Today, each requestor’s identity and behavior must be treated as fundamentals.
Behavior is especially important. Even users who are demonstrably human will not necessarily have benign intentions. But all hostile requestors (whether bot or human) must, at some point, deviate from legitimate user behavior. Once they do, behavioral profiling will identify them.
The final article in this series will discuss the future of bot protection: the frontiers of current research, and new techniques for achieving optimal business outcomes.
This article is part 5 of a six-part series. You can download the complete report here: 2019 State of Bot Protection.