Data mining is the automatic extraction of patterns, associations, and anomalies from large data sets. An interesting application is the use of data mining to detect network intrusions through the analysis of network packet data or computer audit data. With the rise of the internet this timely detection of intrusions and attacks has grown in importance.
A common technique in data mining is to use classification and regression trees (CART) [Breiman, 1984]. A naïve application of CART and related techniques did not prove successful at detecting network. This seems to be due to several factors, the most important of which are:
To address the first problem, as an alternative to CART for this problem, we tried two related algorithms called ART and ACT, which we introduced in [Grossman, 1997b]. Briefly, instead of using a single classification tree to predict whether a given sequence of packets represents an intrusion, in the ACT (Averaged Classification Trees) algorithm, a collection of classification trees are used, and the results are combined. Different variants of the algorithm use different methods for producing the collection of classification trees and for combining the results.
To address the second problem, we used a specialized data management system called an object warehouse [Grossman et. al., 1997a]. This allowed us to run numerically and statistically intensive numerical computations on very large data sets without the overhead incurred with a traditional database, while still enjoying the benefit that a data management system provides of working with data that is too large to fit into memory.
In this paper, we describe we briefly review the ACT algorithm and then describe some experimental results indicating its successful application to the problem of detecting network intrusions. For our experiments, we used the PATTERN™ system developed by Magnify, Inc. The PATTERN system employs a parallel version of the ACT algorithm and is layered over an object warehouse in order to support the mining of very large data sets.
We believe that this brief note makes two contributions: First, we have demonstrated the effectiveness of using data mining to augment the current methods for detecting network intrusions. Second, this application demonstrates the applicability of the ACT and ART algorithms when detecting rare events in very large data sets.
For related work on network intrusion, see [Sundaram]. This approach described here has been applied to other problems in anomaly detection, including detecting fraudulent credit card transactions and predicting delinquency [Grossman and Poor, 1996a] and [Grossman, Bodek, Northcutt and Poor, 1996b]. A preliminary version of this paper appeared in [Bokek et. al., 1997].
In this section, we briefly review the ACT and ART algorithms, following [Grossman, 1997b]. Here is the basic set up:
Build a tree:
An entropy based criterion is often used to determine the splits in Step 2.
The ACT and ART algorithms add two additional steps:
I. Define a cover. A cover U of the objects x consists of a collection of sets U. Each object x is assigned to one or more of the sets Ue U. We allow a single object to be assigned more than once to a single element U of the cover.
II. Build trees. Build a tree TU as usual for the data assigned to each set Ue U.
III. Average the trees. Fix a finite probability measure aU on the cover U:
aU > 0, S aU = 1.
Given an object x, the ART algorithm uses the score
while the ACT algorithm assigns the classification of the class whose score weighted by the measure is largest.
Special cases include: partitioned learning (the cover is a partition) and local-learning (nearest neighbors are used to define the sets U).
Note that if TU is a regression tree, then the sum the
In this section, we follow [Bodek, 1997] and outline the approach we used for detecting network intrusions using ACT:
| ENTRY (36507051) | ||
| [0] | timestamp: | 17:39:04.463226 | 
| [1] | packet from foreign source: | FALSE | 
| [2] | source port: | non-standard (852) | 
| [3] | destination ip 0-7: | 2 | 
| [4] | destination ip 8-15: | 128 | 
| [5] | destination ip 16-23: | 158 | 
| [6] | destination ip 23-31: | 206 | 
| [7] | destination port: | nfsd (2049) | 
| [8] | flag P: | SET | 
| [9] | flag F: | CLEAR | 
| [10] | flag S: | CLEAR | 
| [11] | flag R: | CLEAR | 
| [12] | Packet type: | tcp | 
| [13] | ack number: | 893 | 
| [14] | window length: | n/a | 
| [15] | sequence number: | 3009 | 
| [16] | data length: | 108 | 
Figure 1. This figure contains a sample network packet. Approximately 50 million such packets were analyzed to build 20 models using ACT. These 20 models were used to model the "normal behavior" of the network. Analysis windows containing 100 packets were then compared to these models. Significant deviations were considered to be network intrusions. With this approach, we achieved an accuracy rate of approximately 95%.
In this section, we follow [Bodek, 1997] and describe an experimental study of our algorithm for network intrusions.
For this study, we used a parallel implementation of the ACT algorithm available in Version 1.1 of the PATTERN™ Data Mining System developed by Magnify, Inc. We performed the study during the Supercomputing 96 conference, which took place in Pittsburgh from November 16 to 23. During the conference, the Magnify network was monitored continuously for network intrusions and the results were displayed in real time on the floor of the conference.
Packet sniffers cannot capture all network packets. This is due to the latency inherent in a multitasking operating system. Approximately 60% of packet stream can be captured, with 40% being dropped by the UNIX kernel. During the modeling phase, 50 million packets were modeled, resulting in an data store approximately 4 gigabytes in size. Packets were collected at the rate of 1 million per day. From this data 20 models were generated. The process of classification took approximately 52 hours on a dual processor Sun Sparcstation 20.
Over this period of time, the normal traffic patterns cause a small number alerts. Only when an intrusive action was intentionally taken by the researchers did the alert monitor warn of anomalous patterns. Intrusive actions that correctly caused an anomalous behavior alert included NFS file transfers to the intranet firewall, telnet sessions passing through the firewall to the network file server, and X windows traffic displaying applications outside of the subnet. It should be noted that similar traffic that was modeled during classification did not cause alerts, for instance, the display of X windows applications within the subnet, or NFS packets passing between nodes on the intranet. The accuracy was over 95% given an analysis window of 100 packets. The misclassification was about 15%.
In this paper, we have described a data mining an approach for detecting network intrusions using ACT. Operating systems and networks produce large amounts of audit data. Hidden with this data are patterns indicative of anomalous behavior, including network intrusions. With the appropriate data mining algorithms, these patterns can be extracted automatically to flag such behavior.
The problem is difficult due to the large size of the audit logs and the rarity of the intrusions. The ACT algorithm appears to perform well in these situations. By using 20 models instead of one model, the ACT algorithm increased its accuracy rate and decreased its rate of false positives.
Although the system is very young, the capabilities demonstration which took place during the Supercomputing 96 conference showed the practicality and effectiveness of this approach.
Bodek, H., R. L. Grossman, and I. Pulleyn (1997). The Data Mining and Analysis of Packet Data for Detecting Network Intrusions: Preliminary Results. Proceedings of the First International Conference on The Practical Application of Knowledge Discovery and Data Mining, Practical Application Company, Ltd, pages 91-95.
Breiman, L, J. H. Friedman, A. Olshen, and C. J. Stone (1984). Classification and Regression Trees,. Wadsworth, Belmont, California.
Grossman, R. L. and H. V. Poor (1996a). Optimization Driven Data Mining and Credit Scoring. Proceedings of the IEEE/IAFE 1996 Conference on Computational Intelligence for Financial Engineering (CIFEr), IEEE, Piscataway, pages 104-110.
Grossman, R. L., H. Bodek, D. Northcutt, and H. V. Poor (1996b). Data Mining and Tree-based Optimization. Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, E. Simoudis, J. Han and U. Fayyad, editors, AAAI Press, Menlo Park, California, pages 323-326.
Grossman, R. L., S. Bailey and D. Hanley, (1997a). Data Mining Using Light Weight Object Management in Clustered Computing Environments, Proceedings of the Seventh International Workshop on Persistent Object Stores. Morgan-Kauffmann, San Mateo, pages 237-249.
Grossman, R. L. (1997b). ACT and ART: Averaging Classification and Regression Trees for Scalable Data Mining. Submitted for publication.
A. Sundaram. An Introduction to Intrusion Detection. Submitted for publication.
A draft of this technical note was distributed at the following events: