Analysis of Protein Sequence/Structure Similarity Relationships

Current analysis of protein sequence/structure relationships has focused on expected similarity relationships for structurally similar proteins. We present a general sequence/structure map that covers all combinations of similarity/dissimilarity relationships, and explore structural/functional relationships emerging from the map. In addition, we show that the empirical Chothia-Lesk sequence/structure relation can be derived from the requirement of protein stability and the energetics of sequence and structural changes. To aid our analysis, we define four regions of similarity relationships in the map: expected/unexpected similarity (S and S?) and expected/unexpected dissimilarity (D and D?) relationships. In the expected (high sequence and structural) similarity region, we use a quantitative measure to show that the extent of relatedness among protein functional families is correlated with the fraction of shared fold classes. Our survey shows that proteins in the unexpected (low sequence but high structural) similarity region consist mainly of all-alpha and all-beta folds. Interestingly, sequence/structure relationships in the unexpected dissimilarity region (high sequence but low structural similarity) are well represented by proteins that can accommodate large structural changes owing to flexibly linked regions, ligand-induced conformational changes, mutations in linker regions, conformational plasticity vital to protein function, or interactions between domains. Our analyses imply that the complexity of protein relationships requires consideration of four broad categories of sequence/structure similarity relationships. We suggest that protein energetics provide a basis for understanding these relationships, an area that requires further investigation.
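The four-region map and the Chothia-Lesk relation can be illustrated with a small sketch. It assumes the commonly quoted empirical form of the Chothia-Lesk curve, core RMSD (Å) = 0.40·exp(1.87·H), where H is the fraction of mutated (non-identical) residues. It also assumes hypothetical cut-offs (`SEQ_ID_CUTOFF`, `STRUCT_CUTOFF`) and a generic structural similarity score in [0, 1]; these are illustrative choices, not the measures or thresholds used in this work. The function names are likewise hypothetical.

```python
import math

# Hypothetical cut-offs (assumptions for illustration, not values from the paper).
SEQ_ID_CUTOFF = 0.30   # fraction of identical residues in the alignment
STRUCT_CUTOFF = 0.50   # structural similarity score on a [0, 1] scale


def chothia_lesk_rmsd(fraction_identical: float) -> float:
    """Empirical Chothia-Lesk curve: core RMSD (angstroms) = 0.40 * exp(1.87 * H),
    where H = 1 - fraction_identical is the fraction of mutated residues."""
    if not 0.0 <= fraction_identical <= 1.0:
        raise ValueError("fraction_identical must be in [0, 1]")
    h = 1.0 - fraction_identical
    return 0.40 * math.exp(1.87 * h)


def similarity_region(seq_identity: float, struct_score: float) -> str:
    """Assign a protein pair to one of the four map regions (S, S?, D, D?)."""
    high_seq = seq_identity >= SEQ_ID_CUTOFF
    high_struct = struct_score >= STRUCT_CUTOFF
    if high_seq and high_struct:
        return "S"    # expected similarity: similar sequence, similar structure
    if high_struct:
        return "S?"   # unexpected similarity: low sequence, high structural similarity
    if high_seq:
        return "D?"   # unexpected dissimilarity: high sequence, low structural similarity
    return "D"        # expected dissimilarity: low sequence, low structural similarity
```

For example, `similarity_region(0.15, 0.8)` places a pair with 15% identity but high structural similarity in the S? region. Identical sequences give `chothia_lesk_rmsd(1.0)` of 0.40 Å, and the curve grows exponentially as identity drops. A hard two-threshold split is the simplest way to partition the map; in practice the boundaries between regions are likely to be fuzzier.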
