Residue-wise structural aspects
Predicted structural aspects
- The definition of cCN is available here
- The definition of DT, GG, RNG, MST and dCN is available here
- The definition of SA and RCH is available here
A public service to compute all these structural aspects from a given protein structure is under development and will be available soon.
Scripts to compute these metrics from a PDB file are available
here.
All structural aspects are originally defined as integer or continuous variables. We predict them as discrete variables
by splitting their domain into 2, 3 and 5 states, using a uniform-frequency discretisation scheme (each state covers approximately the same number of training instances) computed over our training set.
The cut-points for each feature are available
here.
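As an illustration of uniform-frequency discretisation, the following is a minimal sketch, not the code used to produce the published cut-points. The function names and the toy values are hypothetical; the actual cut-points are the ones linked above.

```python
from bisect import bisect_right


def equal_frequency_cut_points(values, n_states):
    """Return n_states - 1 cut-points splitting `values` into bins of roughly equal size."""
    ordered = sorted(values)
    n = len(ordered)
    # The k-th cut-point is taken at the k/n_states quantile of the training values.
    return [ordered[(k * n) // n_states] for k in range(1, n_states)]


def discretise(value, cut_points):
    """Map a raw value to a state index between 0 and len(cut_points)."""
    return bisect_right(cut_points, value)


# Toy training values (made up for illustration only).
training_values = [1, 2, 2, 3, 4, 5, 6, 7, 8, 9]
cut_points = equal_frequency_cut_points(training_values, 2)
states = [discretise(v, cut_points) for v in training_values]
print(cut_points, states)
```

Ties in the training values can make the states slightly unbalanced, which is why the bins are only approximately equal in size.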
Prediction representation
All predictors share the same representation (adapted from the representation of our initial
RCH predictor). Each residue for which we predict one or more
structural aspects is characterised by:
- The position-specific scoring matrix (PSSM) profiles of a window of ±4 residues around the target. The PSSM profiles
are generated using PSI-BLAST and the
NR database.
- The predicted secondary structure of a window of ±4 residues around the target. Predictions were made with
the standalone version of PSIPRED and the NR database.
- The prediction of the protein-wise average value for the feature being predicted (more details below).
In total 190 variables were used to characterise each residue in our representation.
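The representation above can be sketched as follows. This is a hedged illustration only: it assumes 20 PSSM columns and one secondary-structure attribute per window position (9 × 20 + 9 + 1 = 190), and it assumes that window positions beyond the chain ends are padded with zeros and an "X" symbol. The padding scheme, the function name and the toy data are all hypothetical and are not taken from the predictor itself.

```python
WINDOW = 4  # residues on each side of the target, as described above


def residue_features(pssm, sec_struct, target, predicted_average):
    """Build the feature vector of the residue at index `target`.

    pssm: list of per-residue rows of PSSM scores
    sec_struct: string with one predicted state per residue
    predicted_average: predicted protein-wise average of the feature
    """
    n_cols = len(pssm[0])
    positions = range(target - WINDOW, target + WINDOW + 1)
    features = []
    for pos in positions:
        inside = 0 <= pos < len(pssm)
        # Assumption: positions outside the chain are padded with zeros.
        features.extend(pssm[pos] if inside else [0] * n_cols)
    for pos in positions:
        inside = 0 <= pos < len(sec_struct)
        # Assumption: "X" marks a window position outside the chain.
        features.append(sec_struct[pos] if inside else "X")
    features.append(predicted_average)
    return features


# Toy chain of 12 residues (made-up values).
toy_pssm = [[i + 1] * 20 for i in range(12)]
toy_ss = "CCHHHHEEEECC"
first_residue = residue_features(toy_pssm, toy_ss, 0, 0.5)
print(len(first_residue))
```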
The protein-wise average structural aspect value is predicted, for each protein chain, from its sequence length and the column-wise
average of the PSSM profiles of the residues in the chain (as suggested by
Kinjo et al.).
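The inputs of this chain-level prediction can be sketched as below. The sketch only assembles the inputs named in the text (sequence length and column-wise PSSM averages); the model that maps them to the protein-wise average is not specified here, and the function name and toy matrix are hypothetical.

```python
def chain_level_inputs(pssm):
    """Return the sequence length followed by the column-wise mean of the PSSM."""
    length = len(pssm)
    n_cols = len(pssm[0])
    column_means = [sum(row[c] for row in pssm) / length for c in range(n_cols)]
    return [length] + column_means


# Toy PSSM with two residues and two columns (a real profile has 20 columns).
toy_inputs = chain_level_inputs([[1, 2], [3, 4]])
print(toy_inputs)
```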
Training process
The list of protein chains used to train the current versions of the predictors is available
here. The training set files (using the
ARFF format of the
WEKA package) for all structural aspects are available
here (292 MB).
The predictors were trained using the
BioHEL rule-based machine learning
system. BioHEL's predictions were enhanced using the ensemble mechanisms for ordinal and consensus prediction studied
here.
Format of the prediction results
The predictions provided by the system are annotated using colours assigned to each predicted state. Red indicates a high density
of contacts or a buried residue; blue indicates a low density of contacts or an exposed residue. Whenever a structural aspect is predicted
with two or more different numbers of states, the consensus between them is also provided. The overall consensus across structural features
is also computed. At any time the user can show/hide any of the predicted structural aspects or the consensus predictions. All predictions
can also be downloaded in text format.
Contact Map predictor
The contact map predictor follows the
CASP rules and
file format. The description of our Contact Map predictor is available
here.
The output of the contact map predictor consists of the list of predicted contacts (following the CASP format) and a visual representation
of the predicted contact map, where the colour of the predicted contacts indicates the confidence of the prediction (red = high
confidence, green = low confidence).
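For readers who want to post-process the downloaded contact list, the following is a minimal sketch of a reader. It assumes the classic CASP RR layout, in which each contact line holds five whitespace-separated fields (residue i, residue j, lower distance bound, upper distance bound, probability) and header, sequence and END lines have a different shape. The function name and the example text are hypothetical; the CASP format description linked above is the authoritative reference.

```python
def parse_rr_contacts(text):
    """Extract (i, j, probability) tuples from CASP RR-style text."""
    contacts = []
    for line in text.splitlines():
        fields = line.split()
        # Assumed contact line layout: i j d_min d_max probability.
        if len(fields) != 5:
            continue
        try:
            i, j = int(fields[0]), int(fields[1])
            probability = float(fields[4])
        except ValueError:
            continue  # not a contact line (e.g. a header with five words)
        contacts.append((i, j, probability))
    return contacts


# Made-up example file content.
example = """PFRMAT RR
TARGET T0000
MODEL 1
MKV
1 9 0 8 0.85
3 12 0 8 0.40
END
"""
contacts = parse_rr_contacts(example)
# Sort so that the most confident contacts come first.
contacts.sort(key=lambda c: c[2], reverse=True)
print(contacts)
```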