The Effect of Sampling Methods and Feature Extraction on Improving the Minority-Class Detection Rate of an Intrusion Detection System Built from Support Vector Machine, Decision Tree, and Naïve Bayes
Keywords:
Casting Machine, Vacuum, Casting, Jewelry, Gold
Abstract
Intrusion detection systems (IDS) have started to rely on machine learning to perform misuse detection or anomaly detection. For misuse detection, the machine learning model must be able to detect various types of intrusions, including rare ones. However, machine learning has weaknesses, especially when faced with imbalanced datasets. Various methods have been proposed to help machine learning models classify correctly even when the data provided is imbalanced.
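The weakness on imbalanced data can be made concrete with a small sketch. The labels below are hypothetical toy data, not taken from KDD'99 or UNSW-NB15, and the helper names `per_class_tpr` and `accuracy` are illustrative. The example shows how a classifier that always predicts the majority class can score high accuracy while its True Positive Rate on the rare class is zero.

```python
from collections import Counter


def per_class_tpr(y_true, y_pred):
    """Return {class: TP / (TP + FN)} for every class present in y_true."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / totals[c] for c in totals}


def accuracy(y_true, y_pred):
    """Fraction of records whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy labels: 98 "normal" records and 2 rare "u2r" attacks.
y_true = ["normal"] * 98 + ["u2r"] * 2
# A degenerate classifier that always predicts the majority class.
y_pred = ["normal"] * 100

acc = accuracy(y_true, y_pred)
tpr = per_class_tpr(y_true, y_pred)
print(acc)  # 0.98
print(tpr)  # {'normal': 1.0, 'u2r': 0.0}
```

Reporting the True Positive Rate per class, rather than overall accuracy alone, exposes missed minority-class intrusions.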
This study applies three of them to the datasets: Principal Component Analysis (PCA) for feature extraction, Tomek Links for under-sampling, and ADASYN for over-sampling. Two datasets are used: KDD’99 and UNSW-NB15.
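As an illustration of the under-sampling step, a Tomek link is a pair of samples from different classes that are each other's nearest neighbor. Removing the majority-class member of each link cleans the class boundary. The following is a minimal pure-Python sketch on hypothetical 2-D toy data. It is not the authors' implementation. The function names are illustrative, and the brute-force neighbor search is quadratic in the number of samples, so it only suits small examples.

```python
import math


def nearest_neighbor(i, X):
    """Index of the sample closest to X[i] (Euclidean), excluding i itself."""
    return min((j for j in range(len(X)) if j != i),
               key=lambda j: math.dist(X[i], X[j]))


def tomek_links(X, y):
    """Pairs (i, j), i < j, that are mutual nearest neighbors with different labels."""
    nn = [nearest_neighbor(i, X) for i in range(len(X))]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and y[i] != y[j]]


def undersample(X, y, majority_label):
    """Drop the majority-class member of every Tomek link."""
    drop = {k for pair in tomek_links(X, y) for k in pair
            if y[k] == majority_label}
    keep = [k for k in range(len(X)) if k not in drop]
    return [X[k] for k in keep], [y[k] for k in keep]


# Hypothetical toy data: four "normal" points, two "attack" points.
X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (3.0, 3.0), (3.2, 3.0), (5.0, 5.0)]
y = ["normal", "normal", "normal", "normal", "attack", "attack"]

links = tomek_links(X, y)
X_clean, y_clean = undersample(X, y, majority_label="normal")
print(links)    # [(3, 4)]
print(y_clean)  # ['normal', 'normal', 'normal', 'attack', 'attack']
```

Only the "normal" point at (3.0, 3.0), which sits next to an "attack" point, is removed. All minority samples are kept.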
On the KDD’99 dataset, the Support Vector Machine identifies more intrusions than before these methods were applied, and the True Positive Rate of the Decision Tree model for minority classes increases by between 0.03% and 4.762%. On the UNSW-NB15 dataset, the accuracies of the Support Vector Machine and Naïve Bayes models increase by between 0.045% and 1.513%.