Breast Cancer Diagnosis from Perspective of Class Imbalance

Document Type: Original Paper


1 School of Information and Technology, Northwest University, Xi'an, China

2 School of Information and Technology, Northwest University, Xi'an, China


Introduction: Breast cancer is the second leading cause of mortality among women, and early detection is essential to reducing the risk of breast cancer mortality. Traditional methods cannot diagnose tumors effectively because they assume a well-balanced dataset. A hybrid method can help alleviate the two-class imbalance problem that arises in breast cancer diagnosis and yield a more accurate diagnosis.
Material and Methods: The proposed hybrid approach, called LS-KNN, was based on an improved Laplacian score (LS) algorithm and the K-nearest neighbor (KNN) algorithm. The improved LS algorithm was used to obtain the optimal feature subset. KNN with an automatically selected K was then used to classify the data, which supported the effectiveness of the proposed method by reducing the computational effort and speeding up classification. The effectiveness of LS-KNN was examined on two representative class-imbalanced breast cancer datasets using classification accuracy, sensitivity, specificity, G-mean, and Matthews correlation coefficient.
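The two stages described above can be illustrated with a minimal sketch. It assumes the standard Laplacian score (heat-kernel weights on a k-nearest-neighbor graph, lower score is better) and a plain majority-vote KNN with a hand-picked K; the abstract does not specify the authors' improvements to LS or how K is chosen automatically, so neither is reproduced here. The function names, parameters, and synthetic data are hypothetical.

```python
import numpy as np


def laplacian_scores(X, n_neighbors=5, t=1.0):
    """Standard Laplacian score per feature (lower = better locality preservation)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise squared Euclidean distances between samples.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    # Symmetric k-nearest-neighbor graph, excluding self-matches.
    d2_no_self = d2.copy()
    np.fill_diagonal(d2_no_self, np.inf)
    nbrs = np.argsort(d2_no_self, axis=1)[:, :n_neighbors]
    adj = np.zeros((n, n), dtype=bool)
    adj[np.repeat(np.arange(n), n_neighbors), nbrs.ravel()] = True
    adj = adj | adj.T
    # Heat-kernel edge weights, degree vector, and graph Laplacian L = D - S.
    S = np.where(adj, np.exp(-d2 / t), 0.0)
    deg = S.sum(axis=1)
    L = np.diag(deg) - S
    # Remove the degree-weighted mean from every feature column.
    F = X - (deg @ X) / deg.sum()
    num = np.sum(F * (L @ F), axis=0)             # f^T L f for each feature
    den = np.sum(F * (deg[:, None] * F), axis=0)  # f^T D f for each feature
    return num / np.maximum(den, 1e-12)


def knn_predict(X_train, y_train, X_test, k=3):
    """Majority-vote KNN for binary labels 0/1 (use odd k to avoid ties)."""
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    y_train = np.asarray(y_train)
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    return (y_train[nearest].mean(axis=1) > 0.5).astype(int)


# Synthetic, imbalanced toy data (an assumption, not the paper's datasets):
# feature 0 separates the classes, feature 1 is pure noise.
rng = np.random.default_rng(0)
y = np.array([0] * 45 + [1] * 15)
informative = np.where(y == 1, 10.0, 0.0) + rng.normal(0.0, 0.5, size=60)
noise = rng.normal(0.0, 0.5, size=60)
X = np.column_stack([informative, noise])

scores = laplacian_scores(X)
keep = np.argsort(scores)[:1]  # keep the single lowest-scoring feature
pred = knn_predict(X[:, keep], y, np.array([[0.0], [10.0]]), k=3)
```

In this toy setup the class-separating feature receives the lower score and is the one retained, after which KNN classifies on the reduced feature set. Selecting features before KNN shrinks the distance computation, which is consistent with the reduced computational effort the abstract mentions.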
Results: Applying the proposed algorithm to two breast cancer datasets indicated that the efficiency of the new method was higher than that of previously introduced methods. The obtained values of accuracy, sensitivity, specificity, G-mean, and Matthews correlation coefficient were 99.27%, 99.12%, 99.51%, 99.42%, respectively.
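For reference, the five reported measures can be computed from a binary confusion matrix as in the sketch below. It uses the standard textbook definitions; the function name and the toy labels are hypothetical, and the numbers are not the paper's results.

```python
import math


def imbalance_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, specificity, G-mean, and MCC for binary labels."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1
        elif t == positive:
            fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    total = tp + tn + fp + fn
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # true positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0  # true negative rate
    accuracy = (tp + tn) / total if total else 0.0
    g_mean = math.sqrt(sensitivity * specificity)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "g_mean": g_mean,
        "mcc": mcc,
    }


# Toy imbalanced case: 10 positives, 90 negatives, classifier says "negative" for all.
y_true = [1] * 10 + [0] * 90
always_negative = imbalance_metrics(y_true, [0] * 100)

# A classifier that finds 8 of 10 positives and raises 5 false alarms.
y_pred = [1] * 8 + [0] * 2 + [1] * 5 + [0] * 85
better = imbalance_metrics(y_true, y_pred)
```

The always-negative classifier reaches 90% accuracy while its sensitivity, G-mean, and MCC are all zero, which is why accuracy alone is a poor summary on imbalanced data and why G-mean and MCC are reported alongside it.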
Conclusion: Experimental results showed that the proposed approach worked well with breast cancer datasets and could be a good alternative to well-known machine learning methods.

