Bioinformatics Tools for Data Processing and Prediction of Protein Function

Green Arther Sandag; Semmy Wellem Taju

doi:10.31154/cogito.v4i2.137.305-315

Authors

Green Arther Sandag, Universitas Klabat
Semmy Wellem Taju

Abstract

Bioinformatics is growing in popularity because it can analyze and process biological data quickly and effectively. An important part of bioinformatics is identifying the functions and characteristics of proteins by building prediction methods with machine learning algorithms. This includes using machine learning to analyze and classify protein functions, which can support disease detection, the design of appropriate medical treatments for patients, and drug development for a number of diseases. Demand for predictive tools that determine protein-ligand models and protein function is increasing, in order to advance biological research in innovative drug design. However, developing prediction tools that can be applied to proteins takes considerable time and effort. In this study, we developed bioinformatics tools that automatically return protein data in the form of amino acid composition (AAC), dipeptide composition (DPC), and position-specific scoring matrices (PSSM). The protein data were retrieved from the UniProt database as FASTA files. The tool is intended to help scientists process and analyze protein data, and it can also predict protein function using machine learning algorithms such as Neural Network and Random Forest.

Keywords: Bioinformatics, AAC, DPC, PSSM
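To make the AAC and DPC descriptors concrete, the following is a minimal Python sketch of how they are commonly computed from a FASTA record. It is not the authors' tool: the function names (`parse_fasta`, `aac`, `dpc`), the 20-letter alphabet ordering, and the choice to skip non-standard residues are assumptions made for illustration. PSSM generation is omitted because it depends on an external alignment search (typically PSI-BLAST) against a sequence database.

```python
from itertools import product

# The 20 standard amino acids; the ordering is an assumption for this sketch.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def parse_fasta(text):
    """Return a {header: sequence} dict from FASTA-formatted text."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Sequence lines may wrap; concatenate them under the current header.
            records[header] += line.upper()
    return records


def aac(seq):
    """Amino acid composition: 20 values, the fraction of each residue type."""
    residues = [r for r in seq if r in AMINO_ACIDS]
    n = len(residues)
    return [residues.count(a) / n if n else 0.0 for a in AMINO_ACIDS]


def dpc(seq):
    """Dipeptide composition: 400 values, the fraction of each adjacent pair."""
    counts = {}
    total = 0
    for i in range(len(seq) - 1):
        a, b = seq[i], seq[i + 1]
        # Pairs containing a non-standard residue (e.g. X) are skipped.
        if a in AMINO_ACIDS and b in AMINO_ACIDS:
            counts[a + b] = counts.get(a + b, 0) + 1
            total += 1
    return [
        counts.get(a + b, 0) / total if total else 0.0
        for a, b in product(AMINO_ACIDS, repeat=2)
    ]


if __name__ == "__main__":
    # Hypothetical toy record, not a real UniProt entry.
    example = ">toy_protein\nACAC\n"
    for name, sequence in parse_fasta(example).items():
        print(name, len(aac(sequence)), len(dpc(sequence)))
```

Each sequence is reduced to a fixed-length vector (20 values for AAC, 400 for DPC) regardless of its length, which is what makes these descriptors convenient inputs for classifiers such as a neural network or a random forest.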
