Evolutionary Computation for Big Data and Big Learning Workshop
The results of the competition were presented at the ECBDL'14 workshop within GECCO-2014 in Vancouver, July 13th, 2014.
Aim
The data mining competition of the Evolutionary Computation for Big Data and Big Learning workshop aims to assess the state of the art in evolutionary computation methods for big data and big learning.
The main competition/exercise of the workshop, the Deployment-as-a-Service track, provides a framework that enables large-scale data mining tasks to be distributed in cloud environments with minimal changes to the core machine learning methods. It allows us to perform a very fair comparison exercise because we can ensure that all methods are allocated a uniform amount of resources. The framework controls the overall learning strategy, and the participants just provide the methods.
In contrast, the aim of this self-deployment track is to give total flexibility to the participants, so that they can use any training strategy with their own resources. We just provide a large dataset (details below) and receive predictions from the participants.
Dataset
The dataset selected for this competition comes from the Protein Structure Prediction field; it was originally generated to train a predictor for the residue-residue contact prediction track of the CASP9 competition. The dataset has 32 million instances, 631 attributes, 2 classes, and 98% negative examples, and occupies about 56 GB of disk space when uncompressed. The details of the dataset generation and a learning strategy used to train a method for this problem using evolutionary computation are available at http://bioinformatics.oxfordjournals.org/content/28/19/2441. The dataset is provided in the ARFF format of the WEKA machine learning package.
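Since the uncompressed file is far too large to load into memory at once, participants will likely want to read it incrementally. The following is a minimal sketch of streaming an ARFF file line by line; the function name and the inline sample data are illustrative, not part of the competition materials.

```python
import io

def iter_arff(fh):
    """Yield one row at a time from an ARFF file, as a dict keyed by
    attribute name, without loading the whole file into memory."""
    attributes = []
    in_data = False
    for line in fh:
        line = line.strip()
        if not line or line.startswith('%'):
            continue  # skip blank lines and ARFF comments
        low = line.lower()
        if not in_data:
            if low.startswith('@attribute'):
                # second token of an @attribute line is the attribute name
                attributes.append(line.split()[1])
            elif low.startswith('@data'):
                in_data = True  # everything after @data is instance data
        else:
            yield dict(zip(attributes, line.split(',')))

# Illustrative two-instance ARFF fragment (not the competition data):
sample = """@relation demo
@attribute f1 numeric
@attribute class {0,1}
@data
0.5,1
0.1,0
"""
rows = list(iter_arff(io.StringIO(sample)))
```

In practice the same generator can wrap `open(path)` so that only one instance is held in memory at a time, which matters for a 56 GB file.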
Evaluation
For each prediction we will compute four metrics: true positive rate (TPR), true negative rate (TNR), accuracy, and the final score of TPR · TNR. We have chosen this final score because of the huge class imbalance of the dataset: we want to reward methods that try to predict the minority class of the problem well. During the workshop we will qualitatively evaluate the balance between the final scores and the amount of resources used by each predictor.