A molecular feature selection tool for identifying candidate biomarkers in “-omics” data
Background
Complex, high-dimensional datasets produced through -omics profiling are challenging to analyze. In biomarker discovery studies, the goal is often to identify a small set of molecular features from large -omics datasets that can be used to detect conditions of interest. However, conventional univariable approaches, which identify molecular features using differential expression, do not consider interactions between features. These approaches are also confounded by sample size, leading to poor and sometimes misleading selection of potential biomarkers. Effective alternative approaches are therefore urgently needed to identify features of interest in -omics datasets for biomarker discovery.
Technology Overview
Queen’s researchers have developed a machine learning technique for identifying features of importance in high-dimensional data, which is implemented in a graphical user interface.
The developed Molecular Feature Selection Tool (MFeaST) ranks the features based on their ability to discriminate between conditions of interest. This is an ensemble-type feature selection tool that utilizes multiple univariable and multivariable, filter-, wrapper- and embedded-type feature selection techniques. The algorithm applies a greedy method to rank all available features based on the ensemble results of a diverse selection of methods and families of predictors. The final shortlist of selected features is a non-redundant list of the most variable predictors. By reducing the number of features, feature selection helps prevent overfitting during classification, reduces computational costs, and provides insight into the processes underlying the generated data.
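The general idea of ensemble ranking followed by a greedy, non-redundant shortlist can be illustrated with a minimal Python sketch. This is not the MFeaST implementation: the two scoring functions are simple stand-ins for the filter-, wrapper- and embedded-type methods named above, the mean-rank aggregation and the correlation cut-off `max_corr` are assumptions, and all function names and the toy data are hypothetical.

```python
from statistics import mean, pstdev

def mean_shift_score(values, labels):
    """Univariable filter stand-in: class-mean difference over overall spread."""
    a = [v for v, y in zip(values, labels) if y == 0]
    b = [v for v, y in zip(values, labels) if y == 1]
    spread = pstdev(values) or 1.0  # guard against constant features
    return abs(mean(a) - mean(b)) / spread

def threshold_accuracy(values, labels):
    """Second stand-in scorer: best accuracy of a one-threshold classifier."""
    best = 0.0
    for cut in values:
        hits = [1.0 if (v > cut) == (y == 1) else 0.0
                for v, y in zip(values, labels)]
        acc = mean(hits)
        best = max(best, acc, 1.0 - acc)  # allow either direction
    return best

def pearson(x, y):
    """Pearson correlation, used here as the redundancy measure."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

def to_ranks(scores):
    """Map feature name -> rank (1 = best) for one scoring method."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: i + 1 for i, name in enumerate(ordered)}

def ensemble_select(features, labels, scorers, k, max_corr=0.9):
    """Average ranks across scorers, then greedily keep non-redundant features."""
    per_method = [to_ranks({n: s(v, labels) for n, v in features.items()})
                  for s in scorers]
    mean_rank = {n: mean([r[n] for r in per_method]) for n in features}
    chosen = []
    for name in sorted(mean_rank, key=mean_rank.get):
        if len(chosen) >= k:
            break
        # Skip a feature that is highly correlated with one already chosen.
        if all(abs(pearson(features[name], features[c])) < max_corr
               for c in chosen):
            chosen.append(name)
    return chosen, mean_rank

# Synthetic toy data, invented for illustration only.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "gene_a":       [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1],
    "gene_a_noisy": [1.3, 1.0, 1.2, 0.8, 2.8, 3.3, 2.7, 3.2],  # near-duplicate
    "gene_b":       [0.5, 1.5, 0.7, 1.4, 1.6, 2.4, 1.3, 2.6],
    "noise":        [1.0, 2.0, 1.5, 0.5, 1.2, 1.8, 0.7, 1.6],
}
chosen, mean_rank = ensemble_select(
    features, labels, [mean_shift_score, threshold_accuracy], k=2)
print(chosen)
```

In this toy run the near-duplicate of the top-ranked feature is passed over in favour of a less correlated one, which is the sense in which the shortlist is non-redundant.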
MFeaST has been used successfully to identify features of interest in many different -omics datasets including:
The results have shown that biomarkers selected using MFeaST can be used to build classification models that perform better than other models.
Benefits
- The univariable differential expression approach cannot detect high-dimensional interactions or nonlinear patterns. The algorithm developed looks at data in higher dimensions, captures feature interactions, and detects molecular patterns.
- Feature selection in -omics studies often uses only one feature selection algorithm, which may work for some datasets and not others. Ensemble feature selection provides a more generalizable approach to identifying features of importance.
- This is the first feature selection approach that combines univariable and multivariable filter-, wrapper- and embedded-type feature selection algorithms
- Allows users to visualize features and assess their discriminatory ability
- Features selected using MFeaST can be used to build classification models for more generalizable and accurate prediction.
- Can be applied to any high-dimensional dataset, molecular or otherwise
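The first benefit above, that a univariable comparison can miss feature interactions, can be seen in a small synthetic example. The Python sketch below uses invented XOR-style data and hypothetical names; it is an illustration of the general point, not output from MFeaST.

```python
from statistics import mean

# Synthetic data: the label is 1 only when exactly one of the two features is high.
samples = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)] * 5  # (f1, f2, label)
f1 = [s[0] for s in samples]
f2 = [s[1] for s in samples]
labels = [s[2] for s in samples]

def mean_difference(values, labels):
    """Univariable view: absolute difference between the two class means."""
    a = [v for v, y in zip(values, labels) if y == 0]
    b = [v for v, y in zip(values, labels) if y == 1]
    return abs(mean(a) - mean(b))

# Each feature on its own shows no difference between the classes.
uni_f1 = mean_difference(f1, labels)
uni_f2 = mean_difference(f2, labels)

# Considering both features together (here via XOR) separates the classes.
joint_pred = [a ^ b for a, b in zip(f1, f2)]
joint_acc = mean([1.0 if p == y else 0.0 for p, y in zip(joint_pred, labels)])

print(uni_f1, uni_f2, joint_acc)
```

Neither feature would be flagged by a per-feature comparison of class means, yet the pair is fully informative, which is the kind of pattern a multivariable method is able to pick up.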
Applications
The method used in this research is suitable for the identification of biomarkers in a range of biomedical and clinical studies. It is broadly applicable and can be used in biomedical or other research to reduce a high-dimensional feature space to the most valuable predictors.
Opportunity
Queen’s University is seeking companies interested in licensing, implementing and/or commercializing this technology.
Patents
- US Provisional Patent Application No. 63/052,267
- Canadian Patent Application No. 3,186,044
IP Status
Patent application submitted
Seeking
- Development partner
- Commercial partner
- Licensing
Posted
May 31, 2023