The Effect of Sampling Methods and Feature Extraction on Improving the Detection Rate of Minority Classes in an Intrusion Detection System Built from Support Vector Machine, Decision Tree, and Naïve Bayes

Janthake Decuellar(1*), Henry Novianus Palit(2), Justinus Andjarwirawan(3)


(1) Informatics Study Program
(2) Informatics Study Program
(3) Informatics Study Program
(*) Corresponding Author

Abstract


Intrusion detection systems (IDSs) increasingly rely on machine learning to perform misuse detection or anomaly detection. For misuse detection, a machine learning model must be able to detect various types of intrusions, including rare ones. However, machine learning performs poorly on imbalanced datasets, and various methods have been proposed so that models can still classify correctly when the data provided is imbalanced.

This study applies three such methods to the datasets: Principal Component Analysis (PCA) for feature extraction, Tomek Links for under-sampling, and ADASYN for over-sampling. Two datasets are used in this research: KDD’99 and UNSW-NB15.
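To illustrate the under-sampling step, the following is a minimal sketch of Tomek Links in plain Python. It is not the implementation used in the study; the function name `tomek_link_undersample`, the toy points, and the labels are assumptions made for illustration. A Tomek link is a pair of samples from different classes that are each other's nearest neighbour, and the sketch drops the majority-class member of every such pair.

```python
from math import dist

def tomek_link_undersample(X, y, majority_label):
    """Drop majority-class samples that are part of a Tomek link (illustrative sketch)."""
    n = len(X)
    # Index of each sample's nearest neighbour, excluding the sample itself.
    nearest = [
        min((j for j in range(n) if j != i), key=lambda j: dist(X[i], X[j]))
        for i in range(n)
    ]
    drop = set()
    for i, j in enumerate(nearest):
        # Tomek link: mutual nearest neighbours that carry different labels.
        if nearest[j] == i and y[i] != y[j] and y[i] == majority_label:
            drop.add(i)
    keep = [i for i in range(n) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]

# Hypothetical toy data: the third "normal" point sits right next to an "attack" point.
X = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.05, 1.0), (3.0, 3.0)]
y = ["normal", "normal", "normal", "attack", "attack"]

X_kept, y_kept = tomek_link_undersample(X, y, majority_label="normal")
print(y_kept)  # ['normal', 'normal', 'attack', 'attack']
```

The brute-force neighbour search is quadratic in the number of samples, so this sketch only suits small examples; datasets of the size of KDD’99 or UNSW-NB15 would need an indexed nearest-neighbour search.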

On the KDD’99 dataset, the Support Vector Machine model identifies more intrusions than before, and the True Positive Rate of the Decision Tree model for minority classes increases by between 0.03% and 4.762%. On the UNSW-NB15 dataset, the accuracies of the Support Vector Machine and Naïve Bayes models increase by between 0.045% and 1.513%.
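The per-class True Positive Rate reported above is TP / (TP + FN) computed separately for each class, which is why it can expose weak detection of a minority class even when overall accuracy is high. The sketch below shows the calculation; the helper name `per_class_tpr` and the toy labels are assumptions for illustration and are not taken from the study's experiments.

```python
from collections import Counter

def per_class_tpr(y_true, y_pred):
    """Return {class: TP / (TP + FN)} for every class present in y_true."""
    support = Counter(y_true)  # TP + FN for each class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # TP for each class
    return {label: hits[label] / support[label] for label in support}

# Hypothetical predictions: four "normal" records and two minority "u2r" records.
y_true = ["normal", "normal", "normal", "normal", "u2r", "u2r"]
y_pred = ["normal", "normal", "normal", "u2r", "u2r", "normal"]

print(per_class_tpr(y_true, y_pred))  # {'normal': 0.75, 'u2r': 0.5}
```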

Keywords


intrusion detection system; machine learning; over-sampling; under-sampling; feature extraction


