²Pirogov Russian National Research Medical University, 117997 Moscow, Russia
³Talrose Institute for Energy Problems of Chemical Physics, Russian Academy of Sciences, 119334 Moscow, Russia
⁴Moscow Institute of Physics and Technology (State University), 141701 Dolgoprudny, Moscow Region, Russia
* To whom correspondence should be addressed.
Received October 5, 2017; Revision received November 16, 2017
An important aim of proteogenomics, which combines data from high-throughput nucleic acid and protein analysis, is to reliably identify single amino acid substitutions, which represent the main type of coding genome variants. Exact knowledge of deviations from the consensus genome can be used in several biomedical fields: studying the expression of mutated proteins in cancer, deciphering the mechanisms of heterozygosity, identifying neoantigens for anticancer vaccine production, searching for RNA editing sites at the proteome level, and others. Generating this knowledge requires processing large datasets from high-resolution mass spectrometry, from which information on single-point protein variation is often difficult to extract. Accordingly, a significant problem in proteogenomic analysis is the high rate of false positive identifications of variant-containing peptides. Here we review recently proposed approaches to high-quality proteomic data processing that may provide more reliable identification of single amino acid substitutions, in particular by distinguishing them from residue modifications occurring in vitro and in vivo. Optimized methods for false discovery rate (FDR) assessment save the instrument and computational time spent validating candidate amino acid polymorphisms by orthogonal methods.
KEY WORDS: proteogenomics, proteomics, mass spectrometry, single amino acid polymorphism, single nucleotide polymorphism
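The abstract does not specify which FDR procedures are reviewed. As a minimal sketch, assuming the common target-decoy strategy, the Python code below computes q-values and compares a single global FDR with an FDR estimated separately for variant-containing peptides (often called a class-specific or separate FDR). The `PSM` record, the function names, and the toy scores are hypothetical illustrations, not the reviewed methods themselves.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PSM:
    """Hypothetical minimal peptide-spectrum match record."""
    score: float      # search-engine score; higher is better
    is_decoy: bool    # matched a decoy (e.g., reversed) sequence
    is_variant: bool  # peptide carries a single amino acid substitution


def q_values(psms: List[PSM]) -> List[Tuple[PSM, float]]:
    """Target-decoy q-values, with FDR estimated as #decoys / #targets above each score."""
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    fdrs = []
    targets = decoys = 0
    for p in ranked:
        if p.is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))
    # q-value: the lowest FDR at which this PSM would still be accepted,
    # i.e. the running minimum of FDR taken from the bottom of the ranking upward.
    qs = [0.0] * len(fdrs)
    running_min = float("inf")
    for i in range(len(fdrs) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        qs[i] = running_min
    return list(zip(ranked, qs))


def accepted_targets(psms: List[PSM], threshold: float = 0.01) -> List[PSM]:
    """Target (non-decoy) PSMs whose q-value does not exceed the threshold."""
    return [p for p, q in q_values(psms) if not p.is_decoy and q <= threshold]


def separate_fdr(psms: List[PSM], threshold: float = 0.01) -> Tuple[List[PSM], List[PSM]]:
    """Apply the threshold to canonical and variant PSMs independently."""
    canonical = [p for p in psms if not p.is_variant]
    variant = [p for p in psms if p.is_variant]
    return accepted_targets(canonical, threshold), accepted_targets(variant, threshold)


if __name__ == "__main__":
    # Invented toy data: 300 canonical targets with no canonical decoys, plus a
    # small variant class of 3 targets and 2 decoys interleaved in score.
    psms = [PSM(10.0 + i, False, False) for i in range(300)]
    psms += [PSM(s, False, True) for s in (50.5, 40.5, 30.5)]
    psms += [PSM(s, True, True) for s in (45.5, 35.5)]
    global_hits = accepted_targets(psms)
    _, variant_hits = separate_fdr(psms)
    print(sum(p.is_variant for p in global_hits), len(variant_hits))  # prints "3 1"
```

With these invented scores, a global 1% threshold admits all three variant matches because the abundant canonical targets dilute the two variant decoys, whereas the variant-specific estimate admits only the top-scoring one. This illustrates how a small variant search space can hide a high false positive rate when only a global FDR is reported.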