Detecting Network Intrusions through the
Data Mining of Network Packet Data
Using the ACT Algorithm

H. Bodek, R. L. Grossman and I. Pulleyn

Magnify, Inc.
100 S. Wacker Suite 1130
Chicago, IL 60606, USA
(+1) 312 214 1420     (+1) 312 214 1429 fax
{haim, rlg, ivan}@magnify.com     http://www.magnify.com

Introduction

Data mining is the automatic extraction of patterns, associations, and anomalies from large data sets. An interesting application is the use of data mining to detect network intrusions through the analysis of network packet data or computer audit data. With the rise of the internet this timely detection of intrusions and attacks has grown in importance.

A common technique in data mining is to use classification and regression trees (CART) [Breiman, 1984]. A naïve application of CART and related techniques did not prove successful at detecting network intrusions. This seems to be due to several factors, the most important of which are:

  1. Network intrusions are very rare and detecting rare events is much harder than detecting more common events.
  2. The size of the data analyzed was very large. Sampling the data reduced the performance of our system, while working with the entire data set requires some sort of data management facility. Growing a single tree on the entire data set was problematic.

To address the first problem, we tried two related alternatives to CART, the ART and ACT algorithms, which we introduced in [Grossman, 1997b]. Briefly, instead of using a single classification tree to predict whether a given sequence of packets represents an intrusion, the ACT (Averaged Classification Trees) algorithm uses a collection of classification trees and combines their results. Different variants of the algorithm use different methods for producing the collection of classification trees and for combining the results.

To address the second problem, we used a specialized data management system called an object warehouse [Grossman et al., 1997a]. This allowed us to run numerically and statistically intensive computations on very large data sets without the overhead incurred with a traditional database, while still enjoying the benefit that a data management system provides of working with data that is too large to fit into memory.

In this paper, we briefly review the ACT algorithm and then describe some experimental results indicating its successful application to the problem of detecting network intrusions. For our experiments, we used the PATTERN™ system developed by Magnify, Inc. The PATTERN system employs a parallel version of the ACT algorithm and is layered over an object warehouse in order to support the mining of very large data sets.

We believe that this brief note makes two contributions: First, we have demonstrated the effectiveness of using data mining to augment the current methods for detecting network intrusions. Second, this application demonstrates the applicability of the ACT and ART algorithms to detecting rare events in very large data sets.

For related work on network intrusion detection, see [Sundaram]. The approach described here has been applied to other problems in anomaly detection, including detecting fraudulent credit card transactions and predicting delinquency [Grossman and Poor, 1996a] and [Grossman, Bodek, Northcutt and Poor, 1996b]. A preliminary version of this paper appeared in [Bodek et al., 1997].

Averaging Classification Trees (ACT)

In this section, we briefly review the ACT and ART algorithms, following [Grossman, 1997b]. Here is the basic setup:

  1. We are given a collection of objects x1, …, xn, where each object xj has m attributes xj[1], …, xj[m].
  2. The algorithm builds a finite, rooted tree, each node of which, except for the leaves, is labeled with a decision involving one or more attributes. In the simplest case, a binary tree is built by labeling each node with a threshold comparison of the form: is attribute j < a threshold c? If so, the object is assigned to the left child; if not, to the right child. Initially all the objects are assigned to the root of the tree.
The standard CART algorithm involves the following steps [Breiman, 1984]:

    Build a tree:

    1. Assign a collection of objects to the root.
    2. At each (non-leaf) node k, choose an attribute j_k and a threshold c_k. If object x belongs to the node k and x[j_k] < c_k, assign x to the left child; otherwise, to the right child.
    3. Each leaf node is then assigned a classification (for example by majority vote of the objects assigned to the leaf) or a regression score (for example by averaging the dependent variable of the objects assigned to the leaf). Let T(x) denote the score assigned to an object x by a tree T.

An entropy-based criterion is often used to determine the splits in Step 2.
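As an illustration, the entropy-based split search of Step 2 can be sketched as follows. This is a minimal sketch, not the PATTERN implementation; the function names and toy data are our own.

```python
import math

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    ent = 0.0
    for c in set(labels):
        p = labels.count(c) / n
        ent -= p * math.log2(p)
    return ent

def best_split(objects, labels, attribute):
    """Find the threshold c on one attribute that minimizes the
    weighted entropy of the two children (Step 2 of CART)."""
    n = len(objects)
    best = (None, float("inf"))
    for c in sorted({x[attribute] for x in objects}):
        left = [y for x, y in zip(objects, labels) if x[attribute] < c]
        right = [y for x, y in zip(objects, labels) if x[attribute] >= c]
        if not left or not right:
            continue  # skip degenerate splits
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if score < best[1]:
            best = (c, score)
    return best

# Toy data: attribute 0 separates the two classes perfectly at threshold 7.
xs = [[1], [2], [3], [7], [8], [9]]
ys = [0, 0, 0, 1, 1, 1]
c, score = best_split(xs, ys, 0)  # c = 7, weighted entropy 0.0
```

In practice each candidate split is evaluated this way for every attribute, and the attribute/threshold pair with the lowest impurity is chosen.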

The ACT and ART algorithms add two additional steps:

    I. Define a cover. A cover 𝒰 of the objects consists of a collection of sets U. Each object x is assigned to one or more of the sets U ∈ 𝒰. We allow a single object to be assigned more than once to a single element U of the cover.

    II. Build trees. Build a tree T_U as usual for the data assigned to each set U ∈ 𝒰.

    III. Average the trees. Fix a finite probability measure a_U on the cover 𝒰:

    a_U > 0,       Σ_U a_U = 1.

    Given an object x, the ART algorithm uses the score

    Σ_U a_U T_U(x),

    while the ACT algorithm assigns the classification of the class whose score weighted by the measure is largest.

Special cases include partitioned learning (the cover is a partition) and local learning (nearest neighbors are used to define the sets U).

Note that if T_U is a regression tree, then the sum

Σ_U a_U T_U(x)

gives what is called an Averaged Regression Tree (ART) [Grossman, 1997b].
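To make the averaging step concrete, here is a minimal sketch in Python. The stub trees, weights, and class names are invented for illustration and stand in for the trees T_U built over the cover; they are not part of the PATTERN system.

```python
def art_score(trees, weights, x):
    """ART: the weighted sum  sum_U a_U * T_U(x)  of per-tree regression scores."""
    return sum(a * t(x) for t, a in zip(trees, weights))

def act_classify(trees, weights, x):
    """ACT: weight each tree's per-class scores by a_U and return the
    class whose weighted total is largest."""
    totals = {}
    for t, a in zip(trees, weights):
        for cls, s in t(x).items():  # each tree returns {class: score}
            totals[cls] = totals.get(cls, 0.0) + a * s
    return max(totals, key=totals.get)

# Stub "trees" over a three-set cover, with measure a_U = (0.5, 0.3, 0.2).
weights = [0.5, 0.3, 0.2]
reg_trees = [lambda x: 1.0, lambda x: 0.0, lambda x: 1.0]
cls_trees = [lambda x: {"normal": 1.0, "intrusion": 0.0},
             lambda x: {"normal": 0.0, "intrusion": 1.0},
             lambda x: {"normal": 1.0, "intrusion": 0.0}]

score = art_score(reg_trees, weights, None)    # 0.5*1 + 0.3*0 + 0.2*1 = 0.7
label = act_classify(cls_trees, weights, None) # "normal": 0.7 beats "intrusion": 0.3
```

Note that the measure a_U need not be uniform; different variants of the algorithm weight the trees differently.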

Detecting Intrusions Using ACT

In this section, we follow [Bodek et al., 1997] and outline the approach we used for detecting network intrusions using ACT:

  1. Instead of using ACT to model individual packets directly, we used ACT to model windows containing 100 packets. We broke each packet into 17 attributes, so in essence we modeled the network using feature vectors of length 1700.
  2. We collected network packet data of the network of interest during a "normal" period, that is, a period without network intrusions. Intrusions are modeled simply as statistical deviations from the "normal" behavior of the network.
  3. Next, we defined a cover for the data set and then used the ACT algorithm to average the collection of trees built from the cover.
  4. As a final step, certain anomalous behavior was filtered out since it represented normal but unusual behavior of the system, for example network traffic created by the network monitor's use of the network file system.
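
The windowing in Step 1 can be sketched as follows. The helper name and toy data are ours; packet parsing into the 17 attributes is omitted.

```python
def make_windows(packets, size=100, n_attrs=17):
    """Concatenate the attributes of `size` consecutive packets into one
    flat feature vector of length size * n_attrs."""
    vectors = []
    for i in range(0, len(packets) - size + 1, size):
        vec = []
        for p in packets[i:i + size]:
            assert len(p) == n_attrs, "each packet yields n_attrs attributes"
            vec.extend(p)
        vectors.append(vec)
    return vectors

# Toy example with 5-packet windows of 3 attributes each; the paper uses
# 100-packet windows of 17 attributes, i.e. vectors of length 1700.
packets = [[j, j, j] for j in range(10)]
wins = make_windows(packets, size=5, n_attrs=3)  # two vectors of length 15
```

Each resulting vector is then treated as a single object by the ACT algorithm.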

ENTRY (36507051)
[0]  timestamp: 17:39:04.463226
[1]  packet from foreign source: FALSE
[2]  source port: non-standard (852)
[3]  destination ip 0-7: 2
[4]  destination ip 8-15: 128
[5]  destination ip 16-23: 158
[6]  destination ip 24-31: 206
[7]  destination port: nfsd (2049)
[8]  flag P: SET
[9]  flag F: CLEAR
[10] flag S: CLEAR
[11] flag R: CLEAR
[12] packet type: tcp
[13] ack number: 893
[14] window length: n/a
[15] sequence number: 3009
[16] data length: 108
Figure 1. This figure contains a sample network packet. Approximately 50 million such packets were analyzed to build 20 models using ACT. These 20 models were used to model the "normal behavior" of the network. Analysis windows containing 100 packets were then compared to these models. Significant deviations were considered to be network intrusions. With this approach, we achieved an accuracy rate of approximately 95%.
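The comparison of each analysis window against the 20 models can be sketched as a simple vote. This is hypothetical: the paper reports only that "significant deviations" were flagged, not the exact decision rule or threshold, so the voting scheme and names below are our own.

```python
def alert(models, window, vote_threshold=0.5):
    """Raise an alert when the fraction of models that consider the
    window anomalous exceeds vote_threshold."""
    votes = sum(1 for m in models if m(window) == "anomalous")
    return votes / len(models) > vote_threshold

# 20 stub models standing in for the ACT models of "normal" behavior:
# 15 flag the window as anomalous, 5 do not.
models = [lambda w: "anomalous"] * 15 + [lambda w: "normal"] * 5
flagged = alert(models, window=None)  # 15/20 = 0.75 > 0.5, so an alert is raised
```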

Experimental Study

In this section, we follow [Bodek et al., 1997] and describe an experimental study of our algorithm for detecting network intrusions.

For this study, we used a parallel implementation of the ACT algorithm available in Version 1.1 of the PATTERN™ Data Mining System developed by Magnify, Inc. We performed the study during the Supercomputing 96 conference, which took place in Pittsburgh from November 16 to 23. During the conference, the Magnify network was monitored continuously for network intrusions and the results were displayed in real time on the floor of the conference.

Packet sniffers cannot capture all network packets, due to the latency inherent in a multitasking operating system. Approximately 60% of the packet stream can be captured, with 40% being dropped by the UNIX kernel. During the modeling phase, 50 million packets were modeled, resulting in a data store approximately 4 gigabytes in size. Packets were collected at the rate of 1 million per day. From this data, 20 models were generated. The process of classification took approximately 52 hours on a dual-processor Sun Sparcstation 20.

Over this period of time, normal traffic patterns caused only a small number of alerts. Only when an intrusive action was intentionally taken by the researchers did the alert monitor warn of anomalous patterns. Intrusive actions that correctly caused an anomalous-behavior alert included NFS file transfers to the intranet firewall, telnet sessions passing through the firewall to the network file server, and X Windows traffic displaying applications outside of the subnet. It should be noted that similar traffic that was modeled during classification did not cause alerts, for instance the display of X Windows applications within the subnet, or NFS packets passing between nodes on the intranet. The accuracy was over 95% given an analysis window of 100 packets; the misclassification rate was about 15%.

Summary and Conclusion

In this paper, we have described a data mining approach for detecting network intrusions using ACT. Operating systems and networks produce large amounts of audit data. Hidden within this data are patterns indicative of anomalous behavior, including network intrusions. With the appropriate data mining algorithms, these patterns can be extracted automatically to flag such behavior.

The problem is difficult due to the large size of the audit logs and the rarity of the intrusions. The ACT algorithm appears to perform well in these situations. By using 20 models instead of one model, the ACT algorithm increased its accuracy rate and decreased its rate of false positives.

Although the system is very young, the capabilities demonstration which took place during the Supercomputing 96 conference showed the practicality and effectiveness of this approach.

References

Bodek, H., R. L. Grossman, and I. Pulleyn (1997). The Data Mining and Analysis of Packet Data for Detecting Network Intrusions: Preliminary Results. Proceedings of the First International Conference on The Practical Application of Knowledge Discovery and Data Mining, Practical Application Company, Ltd, pages 91-95.

Breiman, L., J. H. Friedman, R. A. Olshen, and C. J. Stone (1984). Classification and Regression Trees. Wadsworth, Belmont, California.

Grossman, R. L. and H. V. Poor (1996a). Optimization Driven Data Mining and Credit Scoring. Proceedings of the IEEE/IAFE 1996 Conference on Computational Intelligence for Financial Engineering (CIFEr), IEEE, Piscataway, pages 104-110.

Grossman, R. L., H. Bodek, D. Northcutt, and H. V. Poor (1996b). Data Mining and Tree-based Optimization. Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, E. Simoudis, J. Han and U. Fayyad, editors, AAAI Press, Menlo Park, California, pages 323-326.

Grossman, R. L., S. Bailey and D. Hanley (1997a). Data Mining Using Light Weight Object Management in Clustered Computing Environments. Proceedings of the Seventh International Workshop on Persistent Object Stores, Morgan Kaufmann, San Mateo, pages 237-249.

Grossman, R. L. (1997b). ACT and ART: Averaging Classification and Regression Trees for Scalable Data Mining. Submitted for publication.

Sundaram, A. An Introduction to Intrusion Detection. Submitted for publication.

Status

A draft of this technical note was distributed at the following events: