- Application of Gap-Constraints Given Sequential Frequent Pattern Mining for Protein Function Prediction
Hyeon Ah Park, Taewook Kim, Meijing Li, Ho Sun Shon, Jeong Seok Park, Keun Ho Ryu
Osong Public Health Res Perspect. 2015;6(2):112-120. Published online April 30, 2015
DOI: https://doi.org/10.1016/j.phrp.2015.01.006
- Objectives
Predicting protein function from a protein–protein interaction network is challenging because of the complexity and large scale of the interaction process and the inconsistency of its patterns. Previously proposed methods such as neighbor counting, network analysis, and graph pattern mining have predicted functions by calculating the rules and probabilities of patterns within the network. Although these methods predict well, they still have difficulty finding functions that are exceptions to simple rules and patterns, because they do not consider the inconsistent aspects of the interaction network.
- Methods
In this article, we propose a novel approach using sequential pattern mining with gap constraints. To overcome the inconsistency problem, we propose frequent functional patterns that include every possible functional sequence, including patterns whose search is otherwise limited by the connection structure or the neighborhood level. We also constructed a tree-graph containing the most crucial interaction information of the target protein, and generated candidate function sets for assignment by sequential pattern mining that allows gaps.
- Results
The pattern length, maximum gap, and minimum support parameters were adjusted to find the setting that gives the most accurate prediction. The highest accuracy was 0.972, which was better than that of the simple neighbor counting approach and the link-based approach.
- Conclusion
The comparison with other approaches confirmed that the proposed approach can reach function candidates that previous methods could not obtain.
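The three parameters named in this abstract (pattern length, maximum gap, and minimum support) can be illustrated with a minimal sketch of gap-constrained frequent subsequence mining. This is not the authors' implementation: the function names, the brute-force enumeration, and the toy function labels `F1` to `F4` are all assumptions made for illustration, and the tree-graph construction step is not modeled.

```python
from collections import Counter


def gapped_patterns(seq, length, max_gap):
    """Collect every subsequence of `seq` with `length` items in which
    consecutive items skip at most `max_gap` positions."""
    found = set()

    def extend(prefix, last):
        if len(prefix) == length:
            found.add(tuple(prefix))
            return
        # The next item may sit up to max_gap positions beyond the adjacent one.
        for nxt in range(last + 1, min(last + max_gap + 2, len(seq))):
            extend(prefix + [seq[nxt]], nxt)

    for start in range(len(seq)):
        extend([seq[start]], start)
    return found


def frequent_gapped_patterns(sequences, length, max_gap, min_support):
    """Return {pattern: support} for patterns whose support (fraction of
    sequences containing the pattern) is at least `min_support`."""
    counts = Counter()
    for seq in sequences:
        # A set per sequence, so each sequence counts a pattern once.
        counts.update(gapped_patterns(seq, length, max_gap))
    n = len(sequences)
    return {p: c / n for p, c in counts.items() if c / n >= min_support}


# Hypothetical functional sequences along paths to a target protein.
paths = [["F1", "F2", "F3"], ["F1", "F4", "F3"], ["F2", "F3", "F1"]]
with_gaps = frequent_gapped_patterns(paths, length=2, max_gap=1, min_support=0.6)
no_gaps = frequent_gapped_patterns(paths, length=2, max_gap=0, min_support=0.6)
```

In this toy data, the pattern `("F1", "F3")` is frequent only when one gap is allowed, which is the kind of candidate a strictly adjacent search would miss.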
- A New Direction of Cancer Classification: Positive Effect of Low-Ranking MicroRNAs
Feifei Li, Minghao Piao, Yongjun Piao, Meijing Li, Keun Ho Ryu
Osong Public Health Res Perspect. 2014;5(5):279-285. Published online October 31, 2014
DOI: https://doi.org/10.1016/j.phrp.2014.08.004
- Objectives
Many studies based on microRNA (miRNA) expression profiles have revealed a new aspect of cancer classification. Because miRNA expression data are high-dimensional, feature selection methods have been used for dimensionality reduction. These methods have one shortcoming thus far: they consider only problems in which the feature-to-class relationship is 1:1 or n:1. However, because one miRNA may influence more than one type of cancer, such human miRNAs are ranked low by traditional feature selection methods and are removed most of the time. Given the limited number of miRNAs, low-ranking miRNAs are also important for cancer classification.
- Methods
We considered both high- and low-ranking features to cover all cases (1:1, n:1, 1:n, and m:n) in cancer classification. First, we used the correlation-based feature selection method to select the high-ranking miRNAs, and used support vector machine, Bayes network, decision tree, k-nearest-neighbor, and logistic classifiers to build the cancer classifiers. Then, we used the Chi-square test, information gain, gain ratio, and Pearson's correlation feature selection methods to build the m:n feature subset, and used the selected miRNAs for cancer classification.
- Results
Using the low-ranking miRNA expression profiles achieved higher classification accuracy than using only the high-ranking miRNAs chosen by traditional feature selection methods.
- Conclusion
Our results demonstrate that, with the m:n feature subset, low-ranking miRNAs have a positive effect on cancer classification.
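One of the ranking criteria named in the Methods, information gain, can be sketched as follows to show what "high-ranking" and "low-ranking" mean in practice. This is an illustrative sketch only: it assumes expression values have already been discretized into levels such as `"hi"` and `"lo"`, and the miRNA names and class labels are hypothetical placeholders, not data from the study.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in class entropy after splitting on a discrete feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(table, labels):
    """Return feature names sorted from highest to lowest information gain."""
    scores = {name: information_gain(col, labels) for name, col in table.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical discretized expression levels for four samples.
labels = ["A", "A", "B", "B"]
table = {
    "miR-x": ["hi", "hi", "lo", "lo"],  # separates the classes perfectly
    "miR-y": ["hi", "lo", "hi", "lo"],  # carries no class information here
}
ranking = rank_features(table, labels)
```

A traditional pipeline would keep only the head of `ranking`; the approach in the abstract also draws features from the tail when building the m:n subset.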
- A Novel Approach for Predicting Disordered Regions in A Protein Sequence
Meijing Li, Seong Beom Cho, Keun Ho Ryu
Osong Public Health Res Perspect. 2014;5(4):211-218. Published online August 31, 2014
DOI: https://doi.org/10.1016/j.phrp.2014.06.006
- Objectives
A number of published predictors are based on various algorithms and properties of disordered protein sequences. Although many predictors have been published, the study of protein disordered region prediction is ongoing because different prediction methods can find different disordered regions in a protein sequence.
- Methods
We therefore used a new approach to find a wider variety of disordered regions for more efficient and accurate prediction of protein structures. In this study, we propose a novel approach called “emerging subsequence (ES) mining” that does not use the characteristics of the disordered protein. First, we adapted the approach to generate emerging protein subsequences from public protein sequence data. Second, the disordered and ordered regions in a protein sequence were predicted by searching for the generated emerging protein subsequences with a sliding window, which tends to produce overlapping regions. Third, the scores of the overlapping regions were calculated from the support and growth rate values in both classes. Finally, the score of the predicted regions in the target class was compared with the score in the source class, and the class with the higher score was selected.
- Results
In this experiment, disordered sequence data and ordered sequence data were extracted from DisProt 6.02 and PDB, respectively, and used as training data. The test data came from CASP 9 and CASP 10, where the disordered and ordered regions are known.
- Conclusion
Compared with several published predictors, the proposed approach achieved higher accuracy in the experiment.
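The support and growth rate values mentioned in the Methods can be illustrated with a small sketch. It assumes contiguous fixed-length subsequences and the common definition of growth rate as the ratio of support in the target class to support in the other class; the abstract does not state the exact definitions or the region scoring formula, so those are not reproduced here. The function names and the toy sequences are hypothetical.

```python
def support(subseq, sequences):
    """Fraction of sequences that contain `subseq` as a contiguous substring."""
    return sum(subseq in s for s in sequences) / len(sequences)


def growth_rate(subseq, target, other):
    """Ratio of support in the target class to support in the other class."""
    s_target, s_other = support(subseq, target), support(subseq, other)
    if s_other == 0:
        # Present only in the target class: unbounded growth rate.
        return float("inf") if s_target > 0 else 0.0
    return s_target / s_other


def emerging_subsequences(target, other, k, min_growth):
    """Length-k substrings of the target class whose growth rate is at least
    `min_growth`."""
    candidates = {s[i:i + k] for s in target for i in range(len(s) - k + 1)}
    return {c for c in candidates if growth_rate(c, target, other) >= min_growth}


# Toy sequences standing in for disordered and ordered training data.
disordered = ["SSGS", "GSSG"]
ordered = ["VLIV", "SGVL"]
es = emerging_subsequences(disordered, ordered, k=2, min_growth=3.0)
```

Here `"SG"` occurs in both classes, so its growth rate of 2.0 falls below the threshold, while `"SS"` and `"GS"` occur only in the disordered set and are kept.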