Residue-wise structural aspects

Predicted structural aspects

A public service to compute all these structural aspects from a given protein structure is under development and will be available soon. Scripts to compute these metrics from a PDB file are available here.

All structural aspects are originally defined as integer or continuous variables. We predict them as discrete variables by splitting each variable's domain into 2, 3 and 5 states, using a uniform-frequency discretisation scheme applied to our training set, so that each state covers roughly the same number of training residues. The cut-points for each feature are available here.
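As an illustration of uniform-frequency discretisation, the following is a minimal sketch and not the code used to produce the published cut-points. The function names `equal_frequency_cut_points` and `discretise`, the toy values, and the rule that a value equal to a cut-point falls into the upper state are all assumptions made for this example.

```python
from bisect import bisect_right


def equal_frequency_cut_points(values, n_states):
    """Return n_states - 1 cut-points splitting `values` into bins of
    roughly equal size (hypothetical helper, not the authors' script)."""
    ordered = sorted(values)
    n = len(ordered)
    # The k-th cut-point is the value found at the k/n_states quantile position.
    return [ordered[(k * n) // n_states] for k in range(1, n_states)]


def discretise(value, cut_points):
    """Map a value to a state index in 0..len(cut_points).
    Assumption: a value equal to a cut-point belongs to the upper state."""
    return bisect_right(cut_points, value)


# Toy stand-in for one structural aspect measured over a training set.
training_values = [0.05, 0.10, 0.20, 0.25, 0.40, 0.55, 0.60, 0.75, 0.90, 0.95]

for n_states in (2, 3, 5):
    cuts = equal_frequency_cut_points(training_values, n_states)
    states = [discretise(v, cuts) for v in training_values]
    print(n_states, cuts, states)
```

The cut-points are derived once from the training set and then reused unchanged to label any new value, which is why the two steps are kept as separate functions.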

Prediction representation

All predictors share the same representation (adapted from the representation of our initial RCH predictor). Each residue for which we predict one or more structural aspects is characterised by a total of 190 variables.

The protein-wise average structural aspect value is predicted, for each protein chain, from its sequence length and the column-wise average of the PSSM profiles of the residues in the chain (as suggested by Kinjo et al.).
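The inputs of the protein-wise predictor described above can be sketched as follows. This is a minimal illustration under assumptions: the PSSM is taken to be a list of per-residue rows of equal width, the function name `protein_wise_features` is hypothetical, and any scaling or encoding applied to these values in the actual predictors is not shown.

```python
def protein_wise_features(pssm):
    """Build the chain-level input vector: sequence length followed by the
    column-wise mean of the PSSM (hypothetical helper).

    pssm: list of rows, one per residue; every row has the same number of
    columns (typically 20, one per amino acid type).
    """
    if not pssm:
        raise ValueError("PSSM must contain at least one residue row")
    length = len(pssm)
    n_cols = len(pssm[0])
    if any(len(row) != n_cols for row in pssm):
        raise ValueError("all PSSM rows must have the same number of columns")
    # Average each column over all residues of the chain.
    column_means = [sum(row[c] for row in pssm) / length for c in range(n_cols)]
    return [length] + column_means


# Toy chain of two residues with a 3-column profile, to keep the example short.
toy_pssm = [[1, 2, 3],
            [3, 2, 1]]
features = protein_wise_features(toy_pssm)
print(features)
```

The resulting vector has one entry for the length plus one entry per PSSM column, regardless of how long the chain is.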

Training process

The list of protein chains used to train the current versions of the predictors is available here. The training set files for all structural aspects, in the ARFF format of the WEKA package, are available here (292 MB).

The predictors were trained using the BioHEL rule-based machine learning system. BioHEL's predictions were enhanced using the ensemble mechanisms for ordinal and consensus prediction studied here.

Format of the prediction results

The predictions provided by the system are annotated using colours assigned to each predicted state. Red indicates a high density of contacts or a buried residue; blue indicates a low density of contacts or an exposed residue. Whenever a structural aspect is predicted with two or more different numbers of states, the consensus between those predictions is also provided. The overall consensus across structural features is also computed. At any time the user can show or hide any of the predicted structural aspects or the consensus predictions. All predictions can also be downloaded in text format.

Contact Map predictor

The contact map predictor follows the CASP rules and file format. The description of our Contact Map predictor is available here. The output of the contact map predictor consists of the list of predicted contacts (in the CASP format) and a visual representation of the predicted contact map, where the colour of each predicted contact indicates the confidence of the prediction (red = high confidence, green = low confidence).
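The two outputs can be illustrated with a short sketch. It rests on assumptions: contact lines follow the commonly used CASP RR layout `i j d_min d_max probability` with a 0 to 8 Å distance range, header and sequence records are omitted, and the linear green-to-red colour scale is only one plausible way to realise the colouring described above. The function names are hypothetical.

```python
def format_rr_lines(contacts, d_min=0, d_max=8):
    """Format predicted contacts as RR-style body lines, most confident first.

    contacts: iterable of (i, j, probability) with 1-based residue indices.
    Assumption: the 'i j d_min d_max p' layout with i < j.
    """
    lines = []
    for i, j, p in sorted(contacts, key=lambda c: c[2], reverse=True):
        lo, hi = min(i, j), max(i, j)
        lines.append(f"{lo} {hi} {d_min} {d_max} {p:.3f}")
    return lines


def confidence_to_rgb(p):
    """Map a confidence in [0, 1] to an (R, G, B) tuple:
    green for low confidence, red for high confidence (assumed linear scale)."""
    p = max(0.0, min(1.0, p))
    return (round(255 * p), round(255 * (1 - p)), 0)


predicted = [(10, 3, 0.5), (1, 20, 0.9)]
for line in format_rr_lines(predicted):
    print(line)
print(confidence_to_rgb(0.9))
```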