Objectives Breast cancer poses a significant global health challenge, characterized by complex origins and the potential for life-threatening metastasis. The critical need for early and accurate detection is underscored by the 685,000 lives claimed by the disease worldwide in 2020. Deep learning has made strides in advancing the prompt diagnosis of breast cancer. However, obstacles persist, such as dealing with high-dimensional data and the risk of overfitting, necessitating fresh approaches to improve accuracy and real-world applicability.
Methods In response to these challenges, we propose BCED-Net (Breast Cancer Ensemble Diagnosis Network). This framework applies transfer learning and the extreme gradient boosting (XGBoost) classifier to the Breast Cancer RSNA dataset. Features were extracted using five pre-trained models (ResNet50, EfficientNetB3, VGG19, DenseNet121, and ConvNeXtTiny) and then concatenated. The most promising configuration combined features from three deep convolutional neural networks (ResNet50, EfficientNetB3, and ConvNeXtTiny) and classified them using XGBoost.
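The concatenation step described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `concatenate_features` and the tiny toy vectors are hypothetical stand-ins for real backbone embeddings, and the downstream XGBoost classifier is not shown.

```python
from typing import Dict, List


def concatenate_features(
    per_model: Dict[str, List[List[float]]], order: List[str]
) -> List[List[float]]:
    """Join per-image feature vectors from several backbones, in a fixed order."""
    n_images = len(per_model[order[0]])
    for name in order:
        if len(per_model[name]) != n_images:
            raise ValueError(f"{name} has a different number of images")
    fused: List[List[float]] = []
    for i in range(n_images):
        row: List[float] = []
        for name in order:
            # Append this backbone's vector for image i to the fused row.
            row.extend(per_model[name][i])
        fused.append(row)
    return fused


# Hypothetical toy features: 2 images, very short vectors standing in
# for the much longer embeddings a pre-trained network would produce.
toy = {
    "ResNet50": [[0.1, 0.2], [0.3, 0.4]],
    "EfficientNetB3": [[1.0], [2.0]],
    "ConvNeXtTiny": [[5.0, 6.0, 7.0], [8.0, 9.0, 0.0]],
}
fused = concatenate_features(toy, ["ResNet50", "EfficientNetB3", "ConvNeXtTiny"])
print(len(fused), len(fused[0]))
```

The fused vector length is the sum of the individual vector lengths, which is why combining several backbones raises the dimensionality that the classifier has to handle.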
Results The ensemble approach demonstrated strong overall performance with an accuracy of 0.89. The precision, recall, and F1-score values, which were all at 0.86, highlight a balanced trade-off between correctly identified positive instances and the ability to capture all actual positive samples.
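As a reminder of how the reported metrics relate to one another, the following sketch computes accuracy, precision, recall, and F1-score from confusion-matrix counts for a single positive class. The counts are hypothetical and are not taken from the study; the abstract does not state whether its values are averaged across classes.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Compute standard binary classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts chosen only to illustrate the formulas.
metrics = classification_metrics(tp=8, fp=2, fn=2, tn=8)
print(metrics)
```

For a single class, F1 equals precision and recall whenever those two are equal, because the harmonic mean of two identical values is that value.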
Conclusion BCED-Net represents a significant leap forward in addressing persistent issues such as the high dimensionality of features and the risk of overfitting.
Objectives
Many studies based on microRNA (miRNA) expression profiles have revealed a new aspect of cancer classification. Because miRNA expression data are high-dimensional, feature selection methods have been used to reduce dimensionality. These methods have one shortcoming: they consider only the cases in which the feature-to-class relationship is 1:1 or n:1. However, because one miRNA may influence more than one type of cancer, such human miRNAs are ranked low by traditional feature selection methods and are removed most of the time. Given the limited number of miRNAs, low-ranking miRNAs are also important for cancer classification.
Methods
We considered both high- and low-ranking features to cover all cases (1:1, n:1, 1:n, and m:n) in cancer classification. First, we used the correlation-based feature selection method to select the high-ranking miRNAs and built cancer classifiers with the support vector machine, Bayes network, decision tree, k-nearest-neighbor, and logistic classifiers. Then, we used the Chi-square test, information gain, gain ratio, and Pearson's correlation feature selection methods to build the m:n feature subset and used the selected miRNAs for cancer classification.
Results
Using the low-ranking miRNA expression profiles achieved higher classification accuracy than using only the high-ranking miRNAs chosen by traditional feature selection methods.
Conclusion
Our results demonstrate that the m:n feature subset brought out the positive contribution of low-ranking miRNAs to cancer classification.
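Information gain, one of the ranking criteria named in the Methods, can be sketched as follows. This is an illustration only, not the authors' pipeline or their m:n subset construction: the miRNA names, the discretization into "high"/"low" expression, and the toy labels are all hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (in bits) of a non-empty label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values: Sequence[str], labels: Sequence[str]) -> float:
    """Reduction in label entropy after splitting on a discrete feature."""
    n = len(labels)
    groups: Dict[str, List[str]] = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(
    features: Dict[str, Sequence[str]], labels: Sequence[str]
) -> List[Tuple[str, float]]:
    """Return (feature, score) pairs sorted from highest to lowest information gain."""
    scores = {name: information_gain(vals, labels) for name, vals in features.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: four samples, two cancer classes, three miRNAs
# with expression already discretized into "high" / "low".
labels = ["A", "A", "B", "B"]
features = {
    "miR-a": ["high", "high", "low", "low"],   # separates the classes perfectly
    "miR-b": ["high", "low", "high", "low"],   # carries no class information
    "miR-c": ["high", "high", "high", "low"],  # partially informative
}
ranking = rank_features(features, labels)

top_k = 1  # hypothetical cut-off between "high-ranking" and "low-ranking"
high_ranking = [name for name, _ in ranking[:top_k]]
low_ranking = [name for name, _ in ranking[top_k:]]
print(ranking, high_ranking, low_ranking)
```

A conventional filter would keep only `high_ranking`; the point made in the abstract is that features falling into `low_ranking` may still be worth retaining.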
Juyoung Lee, Bhumsuk Keam, Eun Jung Jang, Mi Sun Park, Ji Young Lee, Dan Bi Kim, Chang-Hoon Lee, Tak Kim, Bermseok Oh, Heon Jin Park, Kyu-Bum Kwack, Chaeshin Chu, Hyung-Lae Kim
Osong Public Health Res Perspect. 2011;2(2):75-82. Published online June 30, 2011
Objectives
Recent genetic association studies have provided convincing evidence that several novel loci and single nucleotide polymorphisms (SNPs) are associated with the risk of developing type 2 diabetes mellitus (T2DM). The aims of this study were: 1) to develop a predictive model of T2DM using genetic and clinical data; and 2) to compare misclassification rates of different models.
Methods
We selected 212 individuals with newly diagnosed T2DM and 472 controls in their 60s from the Korean Genome and Epidemiology Study. A total of 499 known SNPs from 87 T2DM-related genes were genotyped using germline DNA. SNPs were analyzed for significant association with T2DM using various classification algorithms, including QUEST (Quick, Unbiased, Efficient Statistical Tree), support vector machine, C4.5, logistic regression, and K-nearest neighbor.
Results
We tested these models using the complete Korean Genome and Epidemiology Study cohort (n = 10,038) and computed the T2DM misclassification rate for each model. Average misclassification rates ranged from 28.2% to 52.7%. The misclassification rates for the logistic and machine-learning algorithms were lower than those for the statistical tree algorithms. Using 1-to-1 matched data, the statistical tree QUEST algorithm using body mass index and SNP variables had the lowest misclassification rate, but overall, logistic regression performed best.
Conclusions
The K-nearest neighbor method exhibited more robust results than the other algorithms. For clinical and genetic data, our “multistage adjustment” model outperformed other models, yielding lower misclassification rates. To improve the performance of these models, further studies are warranted, and strategies to estimate better classifiers for the quantification of SNPs need to be developed.
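The misclassification rate used to compare the models is the fraction of individuals assigned to the wrong class. The sketch below pairs it with a bare-bones K-nearest neighbor classifier, one of the algorithms listed in the Methods. It is a simplified illustration rather than the study's model: the two features (body mass index and a risk-allele count), all data values, and the choice of k = 3 are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def knn_predict(
    train_x: Sequence[Point], train_y: Sequence[int], query: Point, k: int = 3
) -> int:
    """Predict a label by majority vote among the k nearest training points."""
    # Squared Euclidean distance is enough for ranking neighbours.
    # Features are left unscaled here; real data would normally be standardized.
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)), y)
        for x, y in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def misclassification_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that differ from the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label sequences must be non-empty and of equal length")
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


# Hypothetical toy data: (BMI, risk-allele count); 1 = T2DM, 0 = control.
train_x: List[Point] = [
    (22.0, 0), (23.0, 1), (24.0, 1),
    (30.0, 3), (31.0, 4), (29.0, 3),
]
train_y = [0, 0, 0, 1, 1, 1]
test_x: List[Point] = [(23.5, 1), (30.5, 3), (26.0, 2), (27.5, 2)]
test_y = [0, 1, 0, 0]

predictions = [knn_predict(train_x, train_y, q, k=3) for q in test_x]
rate = misclassification_rate(test_y, predictions)
print(predictions, rate)
```

In this toy example one of the four test individuals is misclassified, giving a rate of 0.25; the rates in the abstract are computed the same way, but over the full cohort.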