Razoqi, S., Al-Talib, G. (2024). A composite Feature Selection Method to improve Classifying Imbalanced Big Data. AL-Rafidain Journal of Computer Sciences and Mathematics, 18(2), 70-81. doi: 10.33899/csmj.2024.149115.1117
A composite Feature Selection Method to improve Classifying Imbalanced Big Data
AL-Rafidain Journal of Computer Sciences and Mathematics
1 Department of Computer Science, College of Education for Pure Science, University of Mosul, Mosul, Iraq
2 Department of Computer Science, College of Computer Science and Mathematics, University of Mosul, Mosul, Iraq
Abstract
Feature selection is one of the methods used to improve the performance of machine learning algorithms, especially when classifying big data. The need for new methods is greater when dealing with imbalanced big data. An imbalance appears when the sample distribution differs markedly between the two classes in the training set. Several methods are used to address the imbalance problem: some redistribute the data, while others improve the classification algorithm itself. Feature selection can also improve the classification of imbalanced data when the features are chosen carefully. Therefore, this research proposes a composite feature selection method that combines the filter feature selection technique with permutation-based feature importance and an ensemble learning method. Three classifiers were evaluated with three performance metrics (AUC, G-mean, and F-score) to show the effect of the proposed feature selection method on imbalanced big data. Using the proposed method improved classification on five standard imbalanced data sets.
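The two-stage idea described in the abstract can be illustrated with a minimal scikit-learn sketch. It is not the authors' implementation: the abstract does not name the filter criterion, the ensemble learner, or the number of features kept, so the ANOVA F-test filter, the random forest, the hypothetical `composite_select` helper with its `k_filter` and `k_final` parameters, and the synthetic data set are all assumptions made for illustration. The G-mean is computed here as the geometric mean of sensitivity and specificity.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.inspection import permutation_importance
from sklearn.metrics import f1_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity (recall on class 1) and specificity (recall on class 0)."""
    sensitivity = recall_score(y_true, y_pred, pos_label=1)
    specificity = recall_score(y_true, y_pred, pos_label=0)
    return float(np.sqrt(sensitivity * specificity))


def composite_select(X, y, k_filter=10, k_final=5, random_state=0):
    """Hypothetical helper: filter step, then permutation importance on an ensemble."""
    # Stage 1 (filter): keep the k_filter features with the highest ANOVA F-score.
    # The choice of f_classif is an assumption; the abstract only says "filter".
    filt = SelectKBest(f_classif, k=k_filter).fit(X, y)
    kept = np.flatnonzero(filt.get_support())

    # Stage 2: fit an ensemble (random forest, assumed) on part of the data and
    # rank the surviving features by permutation importance on the held-out part.
    X_fit, X_val, y_fit, y_val = train_test_split(
        X[:, kept], y, test_size=0.3, stratify=y, random_state=random_state
    )
    forest = RandomForestClassifier(
        n_estimators=100, class_weight="balanced", random_state=random_state
    ).fit(X_fit, y_fit)
    result = permutation_importance(
        forest, X_val, y_val, scoring="roc_auc", n_repeats=5, random_state=random_state
    )
    top = np.argsort(result.importances_mean)[::-1][:k_final]
    return np.sort(kept[top])  # indices into the original feature matrix


# Synthetic two-class data with roughly a 95:5 class ratio (illustrative only).
X, y = make_classification(
    n_samples=2000, n_features=30, n_informative=5, n_redundant=2,
    weights=[0.95, 0.05], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

selected = composite_select(X_train, y_train)
clf = RandomForestClassifier(
    n_estimators=100, class_weight="balanced", random_state=0
).fit(X_train[:, selected], y_train)

y_pred = clf.predict(X_test[:, selected])
y_score = clf.predict_proba(X_test[:, selected])[:, 1]
scores = {
    "AUC": roc_auc_score(y_test, y_score),
    "G-mean": g_mean(y_test, y_pred),
    "F-score": f1_score(y_test, y_pred),
}
print("selected features:", selected)
print(scores)
```

In this sketch the selection is fitted on the training split only, so the test split stays untouched until the final evaluation. Accuracy is not reported, because it can look high on imbalanced data even when the minority class is rarely detected.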