ACTIVE SMOTE for Imbalanced Medical Data Classification
Abstract
Classifying imbalanced data is a major challenge for machine learning techniques, especially on medical data. Many solutions have been proposed to address it. The best-known methods are based on the Synthetic Minority Over-sampling Technique (SMOTE), which creates new synthetic instances in the minority class. In this paper, we study the efficiency of SMOTE-based methods on several imbalanced data sets. We then propose extending these techniques with Active Learning to better control how the minority class grows. Active Learning uses uncertainty and diversity sampling to select the data points from which the synthetic samples are generated. To evaluate our approach, we conduct comprehensive experiments on two medical data sets, one for diabetes diagnosis and one for breast cancer diagnosis.
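The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`uncertainty`, `select_seeds`, `smote_from_seeds`), the toy data, the made-up classifier probabilities, and the specific way the two sampling steps are chained (an uncertainty filter followed by greedy farthest-point selection) are all assumptions made for illustration. The interpolation step follows the usual SMOTE recipe of placing a new point on the segment between a minority sample and one of its nearest minority neighbours.

```python
import math
import random


def uncertainty(p_minority):
    """Uncertainty score: 1 at p = 0.5, 0 at p = 0 or p = 1."""
    return 1.0 - 2.0 * abs(p_minority - 0.5)


def select_seeds(points, probs, n_uncertain, n_seeds):
    """Return indices of seed points (hypothetical selection rule)."""
    # Uncertainty sampling: keep the n_uncertain points whose predicted
    # minority-class probability is closest to 0.5.
    ranked = sorted(range(len(points)),
                    key=lambda i: uncertainty(probs[i]), reverse=True)
    pool = ranked[:n_uncertain]
    # Diversity sampling: start from the most uncertain point, then
    # repeatedly add the pool point farthest from the seeds chosen so far.
    seeds = [pool[0]]
    while len(seeds) < min(n_seeds, len(pool)):
        best = max(
            (i for i in pool if i not in seeds),
            key=lambda i: min(math.dist(points[i], points[s]) for s in seeds),
        )
        seeds.append(best)
    return seeds


def smote_from_seeds(points, seeds, k, n_new, rng):
    """SMOTE-style interpolation starting only from the selected seeds."""
    synthetic = []
    for _ in range(n_new):
        i = rng.choice(seeds)
        # k nearest minority neighbours of the seed (excluding itself).
        neighbours = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )[:k]
        j = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            [a + gap * (b - a) for a, b in zip(points[i], points[j])]
        )
    return synthetic


# Toy minority class (two features) with made-up classifier probabilities.
minority = [[1.0, 1.0], [1.2, 0.9], [3.0, 3.1], [2.9, 3.3], [5.0, 1.0]]
probs = [0.95, 0.55, 0.48, 0.90, 0.40]

seeds = select_seeds(minority, probs, n_uncertain=3, n_seeds=2)
new_points = smote_from_seeds(minority, seeds, k=2, n_new=4,
                              rng=random.Random(0))
```

In this sketch, plain SMOTE would correspond to using every minority index as a seed; restricting the seeds is what lets the selection step steer where synthetic samples appear. In practice the probabilities would come from a classifier trained on the current data rather than being fixed by hand.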
Keywords
Imbalanced medical data
Machine Learning
SMOTE
Active Learning
Diversity Sampling
Uncertainty Sampling
Diabetes Diagnosis
Breast Cancer Detection
Domains
Computer Science [cs]
Origin
Files produced by the author(s)