Toward Feasible Machine Learning Model Updates in Network-based Intrusion Detection
Authors: Pedro Horchulhack, Eduardo Viegas, Altair O. Santin
Abstract: Over the last few decades, a plethora of studies have proposed highly accurate machine learning (ML) techniques for network-based intrusion detection systems (NIDS), yet these techniques are rarely used in production environments. In practice, current proposals for ML-based NIDS are unable to cope with changes in network traffic behavior over time and therefore require frequent and difficult model updates. In this study, we propose a new stream learning intrusion detection model with delayed model updates to ease the model update task, implemented in two parts. First, our model maintains its intrusion detection accuracy, even with outdated underlying ML models, by assessing each classification following a classification-with-rejection rationale, thus suppressing potential misclassifications caused by new network traffic behavior. Second, rejected instances are stored for a period of time and then used for incremental model updates. The underlying insight is that old rejected instances can be easily labeled through publicly available attack repositories without human assistance. Experiments conducted on a novel dataset spanning a year of real network traffic, with over 2.6 TB of data, show that current intrusion detection techniques are unable to cope with the evolving behavior of network traffic, and that their accuracy degrades significantly over time. In addition, the proposed model can maintain its classification accuracy for long periods without model updates, even improving false-positive rates by up to 12% while rejecting only 8% of the instances. By contrast, if periodic model updates are conducted, our proposal can improve detection accuracy by up to 6% while rejecting only 2% of the instances, demanding only 3.2% of the computational time during model updates.
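The two mechanisms described in the abstract (classification with rejection, followed by delayed incremental updates from the rejected instances) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the toy centroid classifier, the softmax-style confidence score, the names `CentroidStreamClassifier` and `RejectingDetector`, the `threshold` and `delay` parameters, and the `label_lookup` callback (standing in for a query against a public attack repository) are all assumptions made for illustration.

```python
import math
from collections import deque


class CentroidStreamClassifier:
    """Toy incremental classifier (assumption): one running-mean centroid per class."""

    def __init__(self):
        self.sums = {}    # label -> per-feature running sum
        self.counts = {}  # label -> number of instances seen

    def learn_one(self, x, label):
        # Incremental update: only the running sum and count change.
        if label not in self.sums:
            self.sums[label] = [0.0] * len(x)
            self.counts[label] = 0
        self.sums[label] = [s + v for s, v in zip(self.sums[label], x)]
        self.counts[label] += 1

    def predict_with_confidence(self, x):
        # Confidence is a softmax over negative distances to each centroid.
        weights = {}
        for label, s in self.sums.items():
            centroid = [v / self.counts[label] for v in s]
            weights[label] = math.exp(-math.dist(x, centroid))
        best = max(weights, key=weights.get)
        return best, weights[best] / sum(weights.values())


class RejectingDetector:
    """Rejects low-confidence classifications and learns from them later."""

    def __init__(self, model, threshold=0.8, delay=3):
        self.model = model
        self.threshold = threshold  # hypothetical confidence cut-off
        self.delay = delay          # hypothetical wait before labeling
        self.buffer = deque()       # (arrival_time, features) of rejected instances

    def classify(self, x, now):
        label, confidence = self.model.predict_with_confidence(x)
        if confidence < self.threshold:
            # Reject: store the instance instead of risking a misclassification.
            self.buffer.append((now, x))
            return None
        return label

    def delayed_update(self, now, label_lookup):
        # Label and learn only the rejected instances that are old enough.
        updated = 0
        while self.buffer and now - self.buffer[0][0] >= self.delay:
            _, x = self.buffer.popleft()
            label = label_lookup(x)
            if label is not None:
                self.model.learn_one(x, label)
                updated += 1
        return updated
```

In this sketch, an instance that falls between the known classes is rejected and buffered; once `delay` time units have passed, `delayed_update` obtains its label and updates the model incrementally, without retraining from scratch.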
Dataset Download:
2014_balanced.zip
2014_unbalanced_1_percent.zip