A “False Positive” (FP) error occurs when a security system misinterprets a non-malicious activity as an attack. These errors are a critical issue for cybersecurity today.
Although it might seem that FP errors do not necessarily have serious consequences, incorrect security alerts can lead to significant monetary losses. For example, an ecommerce site might incorrectly exclude real online shoppers. Or, it might reject ‘good’ web crawlers, which would reduce its Internet visibility.
FP errors can also have longer-term consequences. For example, many modern security solutions have training periods for developing filtering rules; some even ‘train’ continuously (i.e., they are always analyzing and adapting to current threat conditions). If unrecognized FP errors occur during training, then the rules which caused them will be incorrectly considered as “good,” and will be used as the foundation for future traffic processing — and possibly even future rule development. This can produce cascading error rates.
A further complication arises from the relationship between FPs and FNs (False Negatives, i.e. attacks that go undetected). When attack-detection thresholds are adjusted to minimize FPs, this tends to increase FNs.
Also, the two types of false-alarm errors are asymmetric in their consequences. Generally, FNs incur much higher costs. Therefore, effective FP reduction might actually increase the overall monetary losses from false alarms.
Some analysts have attempted to solve this problem from a business perspective. For example, one study suggested [1] that attack-detection thresholds should be adjusted to achieve an optimal ratio of FPs to FNs, where "optimal" means the lowest total cost of errors. However, such calculations require knowing the precise monetary costs of both FPs and FNs across an organization. It is unrealistic to expect these numbers to be known accurately, or to remain constant over time.
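The cost-ratio idea can be illustrated with a small numerical sketch. Everything below is invented for the example (the score distributions, and the per-error costs), not taken from the cited study:

```python
import numpy as np

# Hypothetical anomaly scores: benign traffic scores low, attacks score high.
rng = np.random.default_rng(0)
benign_scores = rng.normal(0.3, 0.1, 1000)   # actual negatives
attack_scores = rng.normal(0.7, 0.1, 50)     # actual positives

COST_FP = 10     # assumed cost of one false alarm
COST_FN = 500    # assumed (much higher) cost of one missed attack

def total_cost(threshold):
    fp = np.sum(benign_scores >= threshold)  # benign flagged as attacks
    fn = np.sum(attack_scores < threshold)   # attacks that slip through
    return fp * COST_FP + fn * COST_FN

# Sweep the detection threshold and pick the one with the lowest total cost.
thresholds = np.linspace(0, 1, 101)
costs = [total_cost(t) for t in thresholds]
best = thresholds[int(np.argmin(costs))]
print(f"cost-optimal threshold: {best:.2f}")
```

Note how sensitive the result is to the assumed COST_FP and COST_FN values; if those estimates are wrong, or drift over time, the "optimal" threshold is wrong too, which is precisely the objection raised above.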
The best approach to false-alarm errors is twofold: to aim for the complete elimination of FNs, while also not settling for a “good enough” rate of FPs. The rate of False Positives can, and should, be driven down as close to zero as possible. This article discusses current progress toward achieving this latter goal.
Using Machine Learning in cybersecurity
Attack technologies continue to evolve. Therefore, cybersecurity systems must be able to identify new patterns of malicious traffic and intrusion attempts. One of the most promising ways to do this, while simultaneously minimizing False Negatives and False Positives, is to use machine learning (ML).
There are two main categories of ML algorithms: traditional ML (also known as Shallow Learning, or SL), and Deep Learning (DL).
Shallow Learning requires feature engineering: a human expert must extract and represent the relevant parameters from the data before the algorithm runs. SL includes both supervised algorithms (which require a large, pre-classified dataset for training) and unsupervised algorithms (which do not require pre-classified data).
Supervised SL algorithms:
- Shallow Neural Network
- Hidden Markov Models (HMM)
- K-Nearest Neighbour (KNN)
- Random Forest (RF)
- Naïve Bayes (NB)
- Logistic Regression
- Support Vector Machines (SVM)
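To make the supervised shallow approach concrete, here is a toy k-nearest-neighbour (KNN) classifier. The feature vectors and labels are invented for illustration, and real traffic classification would use far richer features:

```python
import numpy as np

# Toy pre-classified dataset: [requests_per_minute, error_rate] (invented values).
X_train = np.array([[5, 0.01], [8, 0.02], [6, 0.00],       # labeled benign (0)
                    [90, 0.40], [120, 0.55], [80, 0.35]])  # labeled attack (1)
y_train = np.array([0, 0, 0, 1, 1, 1])

def knn_predict(x, k=3):
    # Euclidean distance to every training point; majority vote of the k nearest.
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = y_train[np.argsort(dists)[:k]]
    return int(np.round(nearest.mean()))

print(knn_predict(np.array([7, 0.01])))    # resembles the benign examples
print(knn_predict(np.array([100, 0.5])))   # resembles the attack examples
```

Note that the engineer chose the features (request rate, error rate) by hand; this manual step is exactly what distinguishes Shallow Learning from Deep Learning.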
Unsupervised SL algorithms:
- Clustering
- Association
Deep Learning uses complex, multi-layered hierarchical models, and requires less manual feature engineering than SL. Like SL, it includes both supervised and unsupervised algorithms.
Supervised DL algorithms:
- Fully-connected Feedforward Deep Neural Networks (FNN)
- Recurrent Deep Neural Networks (RNN)
- Convolutional Feedforward Deep Neural Networks (CNN)
Unsupervised DL algorithms:
- Stacked Autoencoders (SAE)
- Deep Belief Networks (DBN)
Note that for both SL and DL, a variety of algorithms are available, each with its own strengths and weaknesses. As we shall see, determining the optimal algorithms for specific situations is an area of intense research today.
Both Shallow and Deep Learning often provide better solutions than traditional, manually-programmed algorithms. As a result, they have become very popular in many industries, including cybersecurity.
Many researchers are investigating the use of SL and DL for cybersecurity in areas such as intrusion detection, malware analysis, false positive detection, spam identification, and phishing detection. Examples of recent research include:
- Comparing the performance of Random Forest (Shallow Learning) to that of Fully-connected Feedforward Deep Neural Networks (Deep Learning) for intrusion detection [2]
- Using multi-layer Deep Neural Networks to predict attacks on Network Intrusion Detection Systems [3]
- Comparing the accuracy of different ML algorithms when identifying malicious URLs [4]
- Comparing various shallow and deep networks for traffic analysis with flow-based features [5]
- Using a hybrid Deep Belief Network for malicious code detection [6]
Among other results, these studies have shown that all approaches (whether SL or DL) require training and careful parameter tuning. As some researchers commented, "We can anticipate that, at the current state-of-the-art, no algorithm can be considered fully autonomous with no human supervision." [2]
Evaluating the effectiveness of False Positive alarm reduction
When comparing different approaches for reducing FPs, several issues must be considered. The first is to ensure that a given technique is effective, regardless of the threat-detection model in which it will be used.
Within cybersecurity, there are two general approaches for threat identification: misuse detection (also known as “blacklisting” or “negative security”), and anomaly detection (“whitelisting” or “positive security”). For misuse detection, a security system looks for usage that is consistent with known patterns of malicious activity. For anomaly detection, the security system looks for deviations from normal usage patterns, and anomalies are treated as evidence of hostile intent.
Each approach has its advantages, but also has weaknesses. Misuse detection cannot detect new attack types. And as ‘normal’ usage patterns evolve, anomaly detection generates high rates of False Positives.
To mitigate these weaknesses, modern security solutions often use both approaches simultaneously. This means that optimal algorithms for FP reduction must be effective under both of them.
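A schematic sketch of the two approaches side by side may help. The signature patterns and the "normal" baseline below are invented for illustration; production systems use far larger rule sets and richer traffic models:

```python
import re
import statistics

# Misuse detection ("negative security"): match requests against
# known-bad signatures, e.g. SQL injection and path traversal.
SIGNATURES = [re.compile(p) for p in (r"(?i)union\s+select", r"\.\./\.\./")]

def misuse_alert(request: str) -> bool:
    return any(sig.search(request) for sig in SIGNATURES)

# Anomaly detection ("positive security"): learn a baseline of normal
# behavior (here: request sizes) and flag deviations beyond 3 std devs.
baseline_sizes = [512, 480, 530, 495, 510, 505, 520, 490]
mu = statistics.mean(baseline_sizes)
sigma = statistics.stdev(baseline_sizes)

def anomaly_alert(request_size: int) -> bool:
    return abs(request_size - mu) > 3 * sigma

print(misuse_alert("GET /item?id=1 UNION SELECT password FROM users"))
print(anomaly_alert(5000))  # far outside the learned baseline
```

The weaknesses discussed above are visible even in this toy version: the blacklist cannot match an attack pattern it has never seen, while the anomaly detector will start flagging legitimate traffic as soon as normal request sizes drift away from the old baseline.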
The second issue for evaluating FP reduction methods is more straightforward — the need to quantify a technique’s effectiveness. Here, two metrics are important: False Positive Rate (FPR) and Accuracy.
FPR is the proportion of actual negative events that are wrongly classified as positive: the number of false alarms (FP) divided by the total number of negative events (FP plus TN, the correctly classified negatives). This rate should be as close to zero as possible.
False Positive Rate = FP / (FP+TN)
The second metric is accuracy: the proportion of correctly classified events (both positive and negative) out of the total number of events.
Accuracy = (TP+TN) / (FP+TP+FN+TN)
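Both metrics can be written directly from a confusion matrix. The evaluation counts below are invented to show the arithmetic:

```python
def false_positive_rate(fp: int, tn: int) -> float:
    # FPR = FP / (FP + TN): share of actual negatives wrongly flagged.
    return fp / (fp + tn)

def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    # Accuracy = (TP + TN) / total number of events.
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical IDS evaluation: 40 attacks caught, 10 missed,
# 950 benign flows passed, 50 benign flows wrongly flagged.
print(false_positive_rate(fp=50, tn=950))      # 0.05
print(accuracy(tp=40, tn=950, fp=50, fn=10))   # ~0.943
```

Note that with imbalanced traffic (attacks are rare), accuracy alone can look excellent even when many attacks are missed, which is why FPR must be reported alongside it.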
Obviously, when evaluating a proposed technique for FP detection, it should have an accuracy that is better than (or at least equal to) the baseline. In other words, the reduction of FPs should have minimal impact on FNs. Otherwise, the technique cannot be considered effective.
Methods for reducing False Positive alarms
FP alarm reduction is an area of ongoing investigation. Here are some examples:
- Within an Intrusion Detection System (IDS), parameters such as connection count, IP count, port count, and IP range can be tuned to suppress false alarms [7]
- False alarms can also be reduced by applying different forms of analysis. One study found that when detecting user-to-root attacks and vulnerability probes, an IDS can minimize False Positives by using rule-based classification. However, when detecting other types of malicious traffic (such as DoS and remote-to-user attacks), using Decision Trees will produce fewer FPs [8]
- A data mining technique based on a Growing Hierarchical Self-Organizing Map (GHSOM) neural network model reduced false positives from 15 percent to 4.7 percent [9]
- A two-stage alarm correlation and filtering system using SOM neural networks and K-means clustering reduced false alarms by 87 percent [10]
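To give a flavor of the clustering-based approaches above, here is a minimal k-means sketch that groups alerts by feature similarity, so that a dense cluster of near-identical alarms can be reviewed (and, if benign, suppressed) as one unit. The alert features are invented, and this is only a toy version of the idea, not the cited systems' actual pipeline:

```python
import numpy as np

def kmeans(points, k=2, iters=20, seed=0):
    """Basic k-means: random initial centers, then assign/update loops."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        # Assign each alert to its nearest center, then recompute centers.
        labels = np.argmin(np.linalg.norm(points[:, None] - centers, axis=2), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels

# Alert features: [source-port entropy, alerts-per-minute] (invented values).
alerts = np.array([[0.10, 50.0], [0.20, 55.0], [0.15, 52.0],  # repetitive burst
                   [0.90, 2.0], [0.85, 1.0]])                 # rare, diverse alerts
labels = kmeans(alerts)
print(labels)  # the burst alerts share one cluster; the rare alerts the other
```

An analyst who dismisses one alert from the repetitive cluster can then dismiss the whole cluster, instead of triaging each alarm individually.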
Moving from theory to practice
Many promising techniques for False Positive reduction have been, and continue to be, investigated. However, implementation for real-world usage adds some additional challenges.
It has been shown that FPs can be successfully reduced with techniques such as data mining and clustering. In practice, however, the amount of mislabeled data also depends strongly on IDS parameters, which in turn reflect the security needs of the individual organization. Therefore, for effective cybersecurity, organizations must accept that their IDS and other infrastructure will require some customization.
Another important consideration is the timing of the analysis. Most FP reduction techniques are not performed during the real-time detection of attempted intrusions; instead, they are applied afterwards to collections of traffic data and generated alerts. Of course, FP detection in ‘offline mode’ is tremendously useful. Nevertheless, there is room for improvement here; there is still a need for automated techniques that reduce False Positives in real time.
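As one hypothetical direction for real-time filtering (a sketch of the general idea, not one of the cited techniques): suppress alerts whose signature/source pair has already fired within a short window, so an analyst sees each repetitive alarm once rather than thousands of times. All names and parameters below are invented:

```python
class AlertDeduplicator:
    """Suppress repeats of the same (signature, source) within `window` seconds."""

    def __init__(self, window: float = 60.0):
        self.window = window
        self.recent = {}  # (signature, source) -> timestamp of last alert

    def should_forward(self, signature: str, source: str, now: float) -> bool:
        key = (signature, source)
        last = self.recent.get(key)
        self.recent[key] = now
        # Forward if this pair has never fired, or the window has expired.
        return last is None or now - last > self.window

dedup = AlertDeduplicator(window=60.0)
print(dedup.should_forward("sql-injection", "10.0.0.5", now=0.0))    # first alert
print(dedup.should_forward("sql-injection", "10.0.0.5", now=10.0))   # repeat: suppressed
print(dedup.should_forward("sql-injection", "10.0.0.5", now=120.0))  # window expired
```

Deduplication only reduces alert volume, not the underlying FP rate; a genuinely real-time FP classifier would still need to decide, per alert, whether it is a false alarm.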
Research into FP reduction will continue for the foreseeable future. Even if a zero rate of false alarms were actually possible, it obviously has not yet been achieved. It is also not yet possible to fully automate the reduction of false alarms. Most of the current techniques to reduce them are still dependent on a human analyst in various ways.
Lastly, the need for FP reduction has never been greater. IDS systems can generate thousands of alerts per day, straining the capabilities of human analysts. As Internet traffic continues to increase, effective and automated False Positive reduction will become even more critical than it already is today.
This article has discussed the problem of False Positive errors, and research into reducing them. In the next article (Using Machine Learning to Reduce False Positives, Part 2: Optimizing Outcomes), we’ll discuss how Reblaze addresses this challenge, including discussion of some additional techniques not included above.
1. S.-Y. Lee, B.-H. Lee, Y.-D. Kim, D.-M. Shin, C. Youn, "The Design and Implementation of Anomaly Traffic Analysis System using Data Mining", International Journal of Fuzzy Logic and Intelligent Systems (2008)
2. G. Apruzzese, M. Colajanni, L. Ferretti, A. Guido, M. Marchetti, "On the Effectiveness of Machine and Deep Learning for Cyber Security", 10th International Conference on Cyber Conflict (2018)
3. R. Vigneswaran, R. Vinayakumar, K. P. Soman, P. Poornachandran, "Evaluating Shallow and Deep Neural Networks for Network Intrusion Detection Systems in Cyber Security", International Conference on Computing, Communication and Networking Technologies (ICCCNT) (2018)
4. R. Vinayakumar, K. P. Soman, P. Poornachandran, "Evaluating deep learning approaches to characterize and classify malicious URLs", Journal of Intelligent & Fuzzy Systems (2018)
5. R. Vinayakumar, K. P. Soman, P. Poornachandran, "Secure shell (SSH) traffic analysis with flow based features using shallow and deep networks", International Conference on Advances in Computing, Communications and Informatics (ICACCI) (2017)
6. Y. Li, R. Ma, R. Jiao, "A hybrid malicious code detection method based on deep learning", International Journal of Security and Its Applications (2015)
7. G. J. Víctor, S. R. Meda, V. C. Venkaiah, "False Positives in Intrusion Detection Systems", https://www.academia.edu/1431396/False_Positives_in_Intrusion_Detection_Systems
8. N. B. Anuar, H. Sallehudin, A. Gani, O. Zakari, "Identifying False Alarm for Network Intrusion Detection System Using Hybrid Data Mining and Decision Tree", Malaysian Journal of Computer Science (2008)
9. N. Mansour, M. I. Chehab, A. Faour, "Filtering intrusion detection alarms", Cluster Computing, Springer (2010)
10. G. C. Tjhai, S. M. Furnell, M. Papadaki, N. L. Clarke, "A preliminary two-stage alarm correlation and filtering system using SOM neural network and K-means algorithm", Computers & Security (2010)
image credit: Kevin Horvat