Recent efforts in tool development have led to the emergence of multiple prediction programs in various life science problem domains. Meta-prediction seeks to harness the combined strengths of multiple prediction programs in the hope of achieving performance that surpasses that of every existing predictor in a defined problem domain. We investigated meta-prediction strategies for the eukaryotic protein subcellular localization problem. We compiled an unbiased protein subcellular localization dataset consisting of nearly 1,700 nuclear, cytoplasmic, mitochondrial and extracellular proteins. Using this dataset, we assessed the prediction performance of 12 predictors from 8 independent subcellular localization prediction programs, and determined that Proteome Analyst offered the most accurate predictions overall in this 4-compartment prediction problem. Subsequently, we explored several voting-based strategies for constructing meta-predictors, and showed that a reduced voting strategy yields a meta-predictor (RAW-RAG-6) whose prediction performance substantially exceeds that of all existing subcellular localization predictors.
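To make the idea of voting-based meta-prediction concrete, the following is a minimal sketch of plain plurality voting over the four compartments, with an optional restriction to a subset of predictors. It is an illustration only: the function name `vote`, the `members` parameter, and the predictor names `predictor_a` through `predictor_e` are hypothetical, and the sketch does not reproduce the construction of RAW-RAG-6, whose details are not given here.

```python
from collections import Counter

# The four compartments of the prediction problem described above.
COMPARTMENTS = ("nuclear", "cytoplasmic", "mitochondrial", "extracellular")


def vote(predictions, members=None):
    """Return the plurality label, or None on a tie or when no valid vote exists.

    predictions: dict mapping a predictor name to its predicted compartment.
    members: optional set of predictor names allowed to vote (hypothetical
             stand-in for restricting the vote to a reduced set of predictors).
    """
    if members is not None:
        predictions = {
            name: label for name, label in predictions.items() if name in members
        }
    # Ignore any label outside the four compartments (e.g. "unknown").
    counts = Counter(
        label for label in predictions.values() if label in COMPARTMENTS
    )
    if not counts:
        return None
    ranked = counts.most_common(2)
    # A tie for first place means the vote is undecided.
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


# Hypothetical calls from five predictors for a single protein.
calls = {
    "predictor_a": "nuclear",
    "predictor_b": "nuclear",
    "predictor_c": "cytoplasmic",
    "predictor_d": "cytoplasmic",
    "predictor_e": "mitochondrial",
}

full_vote = vote(calls)
reduced_vote = vote(calls, members={"predictor_a", "predictor_b", "predictor_c"})
```

In this toy case the full vote is tied between two compartments, while the vote restricted to three predictors is decided. This only illustrates why the choice of voting members can change the outcome; it says nothing about which subset performs best on real data.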
Please direct questions and comments to meta_pred@biolead.org.