Nucleos

Background

Nucleotides are ubiquitous molecules involved in key cellular processes. They were among the earliest cofactors bound by proteins during evolution; the study of the interaction between protein structures and nucleotides therefore has great biological significance.
Proteins can bind nucleotides in a large number of different conformations, so nucleotide-binding sites are not always easy to detect. Nucleos is a webserver that identifies nucleotide-binding sites in protein structures: it first identifies binding sites for individual nucleotide modules and then combines them to form complete nucleotide-binding sites.
The method (Parca et al. 2012) is based on the concept of nucleotide modularity, which describes nucleotides as composed of modules: the nucleobase, the carbohydrate and the phosphate. The corresponding binding sites can likewise be considered as composed of binding modules (Gherardini et al. 2010; see References for more information).

Webserver pipeline

The webserver takes as input one or multiple PDB codes and/or one or multiple user-uploaded structures, as specified in the Usage page. Quality controls are applied to the input before Nucleos proceeds with the analysis.

A structural comparison algorithm is used to compare the query structure with three datasets of template binding sites for nucleotide modules (the nucleobase, the carbohydrate and the phosphate).

Whenever a structural similarity is found, the nucleotide module bound by the template binding site is transferred onto the query structure. Structural matches between amino acids are allowed if the Root Mean Square Deviation (RMSD) of the match is equal to or lower than 0.6 Å and if matching residues share a BLOSUM62 matrix score >= 1.
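The acceptance rule for a structural match can be sketched as follows. This is only an illustration of the two criteria stated above, not the Nucleos implementation: the function names are hypothetical, the substitution table is a small excerpt of BLOSUM62 rather than the full matrix, and the shortcut for identical residues relies on the fact that every diagonal entry of BLOSUM62 is at least 4.

```python
# Small excerpt of BLOSUM62 scores, for illustration only (not the full matrix).
BLOSUM62_EXCERPT = {
    ("L", "I"): 2,
    ("D", "E"): 2,
    ("K", "R"): 2,
    ("S", "T"): 1,
    ("D", "K"): -1,
    ("G", "W"): -2,
}

RMSD_CUTOFF = 0.6   # in Å, as stated in the text
MIN_BLOSUM62 = 1    # minimum substitution score, as stated in the text


def residues_compatible(a, b, matrix=BLOSUM62_EXCERPT):
    """Return True if residues a and b score >= MIN_BLOSUM62 against each other."""
    if a == b:
        # All BLOSUM62 diagonal entries are >= 4, so identical residues pass.
        return True
    # The matrix is symmetric; a pair missing from the excerpt raises KeyError.
    score = matrix[(a, b)] if (a, b) in matrix else matrix[(b, a)]
    return score >= MIN_BLOSUM62


def is_acceptable_match(rmsd, residue_pairs):
    """Apply both criteria: RMSD cutoff and per-pair residue similarity."""
    return rmsd <= RMSD_CUTOFF and all(
        residues_compatible(q, t) for q, t in residue_pairs
    )
```

For example, `is_acceptable_match(0.45, [("L", "I"), ("D", "E")])` accepts the match, while the same residue pairs with an RMSD of 0.7 Å are rejected.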

Binding sites identified inside the protein surface, or outside but too close to it, are discarded since it is unlikely that they represent real binding sites.

Binding sites for the same nucleotide module are clustered with a hierarchical clustering procedure (centroid linkage with a threshold of 2 Å).
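A minimal sketch of centroid-linkage agglomerative clustering with a 2 Å cutoff is given below. It assumes that each prediction can be represented by a single 3D coordinate, which the text does not specify, and it simply stops merging once the two closest cluster centroids are farther apart than the threshold. The function names are hypothetical.

```python
import math


def centroid(points):
    """Arithmetic mean of a list of 3D points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def centroid_linkage_clusters(points, threshold=2.0):
    """Greedy agglomerative clustering with centroid linkage.

    Repeatedly merges the two clusters whose centroids are closest,
    and stops when that distance exceeds `threshold` (in Å).
    """
    clusters = [[p] for p in points]
    while len(clusters) > 1:
        best = None  # (distance, index_i, index_j) of the closest pair
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters
```

The size of each resulting cluster is the quantity used as the clustering score in the next step. A library routine such as SciPy's hierarchical clustering could be used instead; the pure-Python version is shown only to keep the example self-contained.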

Each binding site is assigned a score that is the sum of two terms: a clustering score and a conservation score. The clustering score is the number of predictions in the cluster to which a specific prediction belongs. The conservation score is the average conservation score of the residues involved in the structural match. The conservation score of each amino acid is calculated from the Pfam multiple alignments of the domains contained in the query structure. This score is the percentage of chemically similar residues in a specific column of the alignment, normalized according to the distribution of conservation scores in the domain so that conservation scores from different domains and proteins can be compared.
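The two-term score can be illustrated with a short sketch. It assumes the per-residue conservation values have already been computed and normalized as described above (the normalization itself is not reproduced here); the function name and the example numbers are hypothetical.

```python
def binding_site_score(cluster_size, residue_conservation):
    """Sum of the clustering score and the conservation score.

    cluster_size: number of predictions in the cluster of this prediction.
    residue_conservation: already-normalized conservation values of the
        residues involved in the structural match (assumed precomputed).
    """
    if not residue_conservation:
        raise ValueError("at least one matched residue is required")
    conservation = sum(residue_conservation) / len(residue_conservation)
    return cluster_size + conservation


# Hypothetical prediction: cluster of 12 members, three matched residues.
example_score = binding_site_score(12, [80.0, 90.0, 70.0])
```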

Finally, nucleotide modules are combined to build complete nucleotide-binding sites of four different types (AMP-like, ADP-like, ATP-like and NAD-like). Nucleotide modules are joined if they respect distance thresholds empirically derived from the distribution of distances between nucleotide modules in X-ray structures of nucleotide-protein complexes. First, Nucleos tries to build nucleobase-carbohydrate, carbohydrate-phosphate and phosphate-phosphate combinations. Subsequently, these pairs are iteratively extended in order to build the most complete binding site of a given type; if a complete reconstruction is not possible, Nucleos tries to build the biggest sub-architecture for that nucleotide type (e.g. a predicted ADP is a sub-architecture of an ATP).

Module pair	Minimum distance (Å)	Maximum distance (Å)
Nucleobase-Carbohydrate	3.937	5.275
Phosphate-Carbohydrate	3.079	5.151
Nucleobase-Phosphate	5.189	19.063
Nucleobase-Nucleobase	8.846	19.540
Carbohydrate-Carbohydrate	5.528	16.333
Phosphate-Phosphate	2.654	7.148
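The distance thresholds in the table can be expressed as a simple lookup, sketched below. The values are copied from the table; the function name is hypothetical, and two details are assumptions because the text does not state them: the bounds are treated as inclusive, and the distance is taken to be a single precomputed value between the two modules.

```python
# (minimum, maximum) distance in Å for each unordered module pair.
DISTANCE_RANGES = {
    frozenset(["nucleobase", "carbohydrate"]): (3.937, 5.275),
    frozenset(["phosphate", "carbohydrate"]): (3.079, 5.151),
    frozenset(["nucleobase", "phosphate"]): (5.189, 19.063),
    frozenset(["nucleobase"]): (8.846, 19.540),      # nucleobase-nucleobase
    frozenset(["carbohydrate"]): (5.528, 16.333),    # carbohydrate-carbohydrate
    frozenset(["phosphate"]): (2.654, 7.148),        # phosphate-phosphate
}


def can_join(module_a, module_b, distance):
    """Return True if two modules lie within the allowed distance range.

    The pair is unordered, so can_join("a", "b", d) == can_join("b", "a", d).
    Bounds are treated as inclusive (an assumption).
    """
    minimum, maximum = DISTANCE_RANGES[frozenset([module_a, module_b])]
    return minimum <= distance <= maximum
```

Using a `frozenset` as the key makes the lookup independent of argument order and lets same-type pairs collapse to a one-element key.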

Each predicted nucleotide-binding site has a score that is the sum of the scores of its constituent nucleotide modules. In Parca et al. (2012) we demonstrate that, when considering pairs of nucleotide modules, a scoring threshold of 166.33 discriminates nucleotide-binding proteins from proteins that do not bind nucleotides with an MCC (Matthews Correlation Coefficient) of 0.6, an average sensitivity of 0.64 and a specificity of 0.93.
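As an illustration, the site score and the performance measures mentioned above can be written as follows. The MCC, sensitivity and specificity formulas are the standard definitions; the function names are hypothetical, and treating a score equal to the threshold as a positive prediction is an assumption.

```python
import math

SCORE_THRESHOLD = 166.33  # threshold reported for pairs of nucleotide modules


def site_score(module_scores):
    """Score of a predicted site: sum of its constituent module scores."""
    return sum(module_scores)


def passes_threshold(module_scores, threshold=SCORE_THRESHOLD):
    """True if the site score reaches the threshold (inclusive by assumption)."""
    return site_score(module_scores) >= threshold


def classification_metrics(tp, fp, tn, fn):
    """Return (MCC, sensitivity, specificity) from confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention the MCC is set to 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return mcc, sensitivity, specificity
```

Here `tp`, `fp`, `tn` and `fn` are the counts of true positives, false positives, true negatives and false negatives obtained when proteins are classified by the threshold.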

Overview