Cyber Security and Confusion Matrix

Mayank_Agarwal
4 min read · Sep 7, 2021

Cyber security and the confusion matrix: its two types of error

The confusion matrix is a fairly common term in machine learning. In this post I will try to relate the importance of the confusion matrix to the detection of cyber crime.

In machine learning, and specifically in the problem of statistical classification, a confusion matrix is also known as an error matrix.

So before we dive deep let’s first understand what a confusion matrix is.

Confusion Matrix …

A Confusion matrix is an N x N matrix used for evaluating the performance of a classification model, where N is the number of target classes. The matrix compares the actual target values with those predicted by the machine learning model. This gives us a holistic view of how well our classification model is performing and what kinds of errors it is making.
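As a minimal sketch of the definition above, the snippet below builds an N x N matrix in plain Python by counting (actual, predicted) pairs. The class names and the toy data are hypothetical and chosen only for illustration; rows are taken to be the actual classes and columns the predicted ones, which is one common convention and not the only one.

```python
def confusion_matrix(actual, predicted, labels):
    """Return an N x N list of lists: rows = actual class, columns = predicted class."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


# Hypothetical toy data with N = 3 target classes.
labels = ["normal", "dos", "probe"]
actual = ["normal", "normal", "dos", "dos", "probe", "probe"]
predicted = ["normal", "dos", "dos", "dos", "probe", "normal"]

matrix = confusion_matrix(actual, predicted, labels)
for label, row in zip(labels, matrix):
    print(label, row)

# Correct predictions sit on the diagonal; everything else is an error.
correct = sum(matrix[i][i] for i in range(len(labels)))
print("correct predictions:", correct, "of", len(actual))
```

Reading along a row shows how the samples of one actual class were distributed over the predicted classes, which is what makes the kind of error visible and not just the error count.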

True Positive:

Interpretation: You predicted positive and it’s true.

True Negative:

Interpretation: You predicted negative and it’s true.

False Positive: (Type I Error)

Interpretation: You predicted positive and it’s false.

False Negative: (Type II Error)

Interpretation: You predicted negative and it’s false.

TWO TYPES OF ERROR IN A CONFUSION MATRIX:

Confusion matrices have two types of errors: Type I and Type II.

First way to remember them:

The first way to remember which error is which is to rewrite False Positive and False Negative. A False Positive is a Type I error because False Positive = False True, which contains only one F. A False Negative is a Type II error because False Negative = False False, which contains two F’s. (Kudos to Riley Dallas for this method!)

Second way to remember them:

The second way is to consider the meanings of the words. False Positive contains one negative word (False), so it’s a Type I error. False Negative contains two negative words (False + Negative), so it’s a Type II error.

CYBER-SECURITY WITH CONFUSION MATRIX

Cyber Security:

Cyber security is the application of technologies, processes and controls to protect systems, networks, programs, devices and data from cyber attacks. It aims to reduce the risk of cyber attacks and protect against the unauthorized exploitation of systems, networks and technologies.

Case study:

The data set used in this case study was prepared by Stolfo et al. and is built on the data captured in the DARPA’98 IDS evaluation program. DARPA’98 is about 4 gigabytes of compressed raw (binary) TCP dump data from 7 weeks of network traffic, which can be processed into about 5 million connection records of about 100 bytes each.

For each TCP/IP connection, 41 quantitative (continuous) and qualitative (discrete) features were extracted. Of these 41 features, 34 are numeric and 7 are symbolic.

To analyze the results, there are standard metrics that have been developed for evaluating network intrusion detection. Detection rate (DR) and false alarm rate are the two most widely used. DR is computed as the ratio between the number of correctly detected attacks and the total number of attacks, while the false alarm (false positive) rate is computed as the ratio between the number of normal connections that are misclassified as attacks and the total number of normal connections.
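The two ratios just described can be written as a short sketch. The function names and the counts below are hypothetical and only illustrate the arithmetic; they are not results from the data set.

```python
def detection_rate(detected_attacks, total_attacks):
    """Correctly detected attacks divided by all attacks."""
    if total_attacks == 0:
        raise ValueError("total_attacks must be greater than zero")
    return detected_attacks / total_attacks


def false_alarm_rate(false_alarms, total_normal):
    """Normal connections flagged as attacks divided by all normal connections."""
    if total_normal == 0:
        raise ValueError("total_normal must be greater than zero")
    return false_alarms / total_normal


# Hypothetical counts, for illustration only.
total_attacks = 200      # connections that really are attacks
detected_attacks = 180   # of those, how many the model flagged
total_normal = 1000      # connections that really are normal
false_alarms = 50        # of those, how many the model flagged as attacks

dr = detection_rate(detected_attacks, total_attacks)
far = false_alarm_rate(false_alarms, total_normal)
print(f"Detection rate:   {dr:.1%}")
print(f"False alarm rate: {far:.1%}")
```

A good intrusion detector pushes the detection rate up while keeping the false alarm rate down; looking at only one of the two can be misleading.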

In the parallel SVM approach, we first reduced the unclassified feature data using a distance matrix of binary patterns. Building on this, the cascade structure is developed by initializing the problem with a number of independent smaller optimizations, whose partial results are combined in later stages in a hierarchical way, as shown in figure 1, assuming the training data subsets are independent of each other.

  • True Positive (TP): The number of connections detected as attacks that are actually attacks.
  • True Negative (TN): The number of connections detected as normal that are actually normal.
  • False Positive (FP): The number of connections detected as attacks that are actually normal (false alarms).
  • False Negative (FN): The number of connections detected as normal that are actually attacks.
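The four definitions above can be tallied directly from labeled connections. The sketch below assumes each connection carries one of two hypothetical string labels, "attack" or "normal"; the helper name ids_counts and the toy data are made up for illustration.

```python
from collections import Counter


def ids_counts(actual, predicted):
    """Tally TP, TN, FP and FN, treating "attack" as the positive class."""
    names = {
        ("attack", "attack"): "TP",  # attack detected, actually attack
        ("normal", "normal"): "TN",  # normal detected, actually normal
        ("normal", "attack"): "FP",  # attack detected, actually normal (false alarm)
        ("attack", "normal"): "FN",  # normal detected, actually attack
    }
    counts = Counter({"TP": 0, "TN": 0, "FP": 0, "FN": 0})
    for a, p in zip(actual, predicted):
        counts[names[(a, p)]] += 1
    return counts


# Hypothetical toy data: 8 connections.
actual = ["attack", "attack", "attack", "normal",
          "normal", "normal", "normal", "attack"]
predicted = ["attack", "attack", "normal", "normal",
             "attack", "normal", "normal", "attack"]

counts = ids_counts(actual, predicted)
print(dict(counts))
```

With these counts, the detection rate defined earlier is TP / (TP + FN) and the false alarm rate is FP / (FP + TN). In a security setting the false negative is usually the costlier mistake, because it is an attack that passed as normal traffic.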

CONCLUSION

Machine learning is an important part of the IT industry. It is used in many domains and keeps developing to meet the industry’s needs. We have also discussed how the confusion matrix works and how it helps with real-world problems.

Thank you!
