INTRODUCTION:

    In eukaryotic cells, protein phosphorylation is one of the most ubiquitous posttranslational modifications, orchestrating most cellular processes, including the cell cycle (Lou Y, et al., 2004), transcriptional (Uddin S, et al., 2003) and translational regulation (Yoshizawa F, et al., 2002), metabolic pathways (Meijer AJ, et al., 2004), signal transduction (Choudhary S, et al., 2004), and memory (Dash PK, et al., 2004). Protein kinases (PKs) account for about 2% of the human and mouse proteomes, with 518 and 540 distinct PKs identified in human (Manning G, et al., 2002) and mouse (Caenepeel S, et al., 2004), respectively, of which 510 form reciprocal orthologous pairs. It has been estimated that one-third of all proteins can be phosphorylated, and chromosomal mapping suggests that about half of the kinome is disease- or cancer-related (Manning G, et al., 2002). There is therefore an urgent need to identify kinase substrates together with their phosphorylation sites at the scale of the whole phosphoproteome, which would greatly aid drug design. To date, several large-scale phosphoproteomic studies have been published for yeast (Ficarro SB, et al., 2002), mouse (Ballif BA, et al., 2004), human (Beausoleil SA, et al., 2004; Lim YP, et al., 2003) and plant (Nuhse TS, et al., 2004).

    In silico prediction of phosphorylation sites together with their specific kinases may greatly reduce the labor-intensive in vivo or in vitro identification of phosphorylation sites. For two peptides that differ by a single amino acid at corresponding positions, we may assume with confidence that they have similar 3D structures and biochemical characteristics, especially when the two differing amino acids form a conserved pair, e.g. isoleucine (I) and valine (V), or serine (S) and threonine (T). Based on this observation, we designed a simple scoring method, GPS (Group-based Phosphorylation Scoring method), which provides more meaningful information to biologists while performing satisfactorily in comparison with two existing phosphorylation site prediction systems, ScanSite 2.0 (Obenauer JC, et al., 2003) and PredPhospho (Kim JH, et al., 2004). Using data from the public database Phospho.ELM/PhosphoBase (Diella F, et al., 2004; Kreegipuu A, et al., 1999) and extensive literature curation, we extended the predictor to cover 71 protein kinase (PK) families/PK groups in ver 1.10, up from 52 PK groups in ver 1.0.
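
    The observation about near-identical peptides can be made concrete with a small sketch. The code below is illustrative only and is not the GPS scoring function: it merely checks whether two equal-length peptides differ at exactly one position and whether that difference is one of the conserved pairs named above. The function names, the `CONSERVED_PAIRS` set (limited to the I/V and S/T examples from the text), and the peptide strings in the usage note are all assumptions introduced for illustration.

```python
# Illustrative only: this is NOT the GPS scoring function.
# CONSERVED_PAIRS is an assumption holding just the two example pairs
# named in the text (I/V and S/T); a real method would use a fuller scheme.
CONSERVED_PAIRS = {frozenset("IV"), frozenset("ST")}


def differing_positions(pep_a, pep_b):
    """Return (index, residue_a, residue_b) for each position where the peptides differ."""
    if len(pep_a) != len(pep_b):
        raise ValueError("peptides must have the same length")
    return [(i, a, b) for i, (a, b) in enumerate(zip(pep_a, pep_b)) if a != b]


def is_single_conserved_substitution(pep_a, pep_b):
    """True if the peptides differ at exactly one position and that pair is conserved."""
    diffs = differing_positions(pep_a, pep_b)
    if len(diffs) != 1:
        return False
    _, a, b = diffs[0]
    return frozenset((a, b)) in CONSERVED_PAIRS
```

    For example, with the made-up peptides "RRASVAG" and "RRATVAG", `is_single_conserved_substitution` returns True because the only difference is an S/T pair, whereas "RRASVAG" and "RRADVAG" return False because S/D is not in the assumed set.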