However, the majority of somatic mutations detected in cancer are passenger rather than driver mutations. Identifying the functional consequences of somatic mutations during tumorigenesis and tumor progression remains a monumental challenge for cancer genomic studies. As of April 2014, approximately 100,000 three-dimensional (3D) structures had been deposited in the Protein Data Bank (PDB) database, including approximately 22,000 human protein and nucleic acid 3D structures. Protein structure and function are closely related, especially in the case of protein pockets, which are local regions that perform a variety of critical functions in cells, including binding with small molecules, enzymes, and nucleic acids. Thus, protein pockets are central structural units in proteins that provide site-specific information as to how a protein interacts with small molecules.
With an increasing amount of both protein structural data in the PDB database and somatic mutation data generated by next-generation sequencing experiments, the integration of protein structural information and large-scale somatic mutations offers an alternative, promising approach to uncovering functionally important somatic mutations in cancer. Several recent studies have demonstrated that disease-causing mutations commonly alter protein folding, protein stability, and protein-protein interactions, often leading to new disease phenotypes. Espinosa et al. proposed a predictor, InCa, that integrates somatic mutation profiles from the Catalogue of Somatic Mutations in Cancer (COSMIC) database and neutral mutations from the 1000 Genomes Project with protein structure and interaction interface information.
Using these data, they developed the InCa classifier model to predict cancer-related mutations with 83% specificity and 77% sensitivity. Ryslik et al. developed an approach, SpacePAC, to identify mutational clustering by directly considering the protein tertiary structure in 3D space. Utilizing mutational data from COSMIC and protein structure information from the PDB, they identified several novel mutation clusters using SpacePAC. Ghersi and Singh reported that residues located in nucleic acid, small-molecule, ion, and peptide binding sites are more likely to be affected by somatic mutations than other residues. Furthermore, protein pocket regions play an important functional role in drug design and development through the ligand-dependent mechanism that affects small-molecule binding. For example, several independent research groups found that the presence of mutations in the EGFR gene could activate the gene by altering the ATP binding site, ultimately leading to an enhancement of the gefitinib response.