Machine Learning Technique for Identifying Biomarkers in Complex High-Dimensional Data

A molecular feature selection tool for identifying candidate biomarkers in “-omics” data


Background

Complex, high-dimensional datasets produced through -omics profiling are challenging to analyze. In biomarker discovery studies, the goal is often to identify a small set of molecular features from a large -omics dataset that can be used to detect conditions of interest. However, conventional univariable approaches that select molecular features by differential expression do not consider interactions between features. These approaches are also confounded by sample size, which can lead to poor and sometimes misleading selection of potential biomarkers. Effective alternative approaches are therefore urgently needed to identify features of interest in -omics datasets for biomarker discovery.
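To make the interaction problem concrete, here is a toy simulation. It is an illustration assumed for this listing and not an analysis from the Queen's work. It uses two synthetic features, x1 and x2, and the condition depends only on whether they share a sign. A univariable statistic (the difference in class means, the quantity a differential-expression test is built on) sees almost no signal in either feature alone. A rule that looks at both features together separates the classes perfectly. All function names here are hypothetical.

```python
import random


def simulate(n=2000, seed=0):
    """Synthetic data: the class depends only on the interaction of x1 and x2."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1, x2 = rng.uniform(-1, 1), rng.uniform(-1, 1)
        label = 1 if x1 * x2 > 0 else 0  # class 1 when both features share a sign
        rows.append((x1, x2, label))
    return rows


def class_mean_difference(rows, idx):
    """Univariable statistic: mean of one feature in class 1 minus class 0."""
    a = [r[idx] for r in rows if r[2] == 1]
    b = [r[idx] for r in rows if r[2] == 0]
    return sum(a) / len(a) - sum(b) / len(b)


def interaction_rule_accuracy(rows):
    """Multivariable rule: predict class 1 when x1 and x2 have the same sign."""
    hits = sum((1 if r[0] * r[1] > 0 else 0) == r[2] for r in rows)
    return hits / len(rows)


rows = simulate()
diff_x1 = class_mean_difference(rows, 0)
diff_x2 = class_mean_difference(rows, 1)
acc = interaction_rule_accuracy(rows)
print(f"x1 class-mean difference: {diff_x1:+.3f}")
print(f"x2 class-mean difference: {diff_x2:+.3f}")
print(f"two-feature rule accuracy: {acc:.2f}")
```

Both mean differences land close to zero, so a univariable ranking would discard x1 and x2. The pair, however, fully determines the condition.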

Technology Overview

Queen’s researchers have developed a machine learning technique for identifying features of importance in high-dimensional data, which is implemented in a graphical user interface.

The Molecular Feature Selection Tool (MFeaST) ranks features by their ability to discriminate between conditions of interest. It is an ensemble-type feature selection tool that combines multiple univariable and multivariable feature selection techniques of the filter, wrapper and embedded types. The algorithm applies a greedy method to rank all available features based on the ensemble results of a diverse selection of methods and families of predictors. The final shortlist is a non-redundant list of the most valuable predictors. By reducing the number of features, feature selection helps prevent overfitting during classification, reduces computational cost, and provides insight into the processes underlying the generated data.
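The general idea of ensemble ranking followed by a greedy, non-redundant shortlist can be sketched as follows. This is a minimal sketch under stated assumptions and not MFeaST's published algorithm. It assumes each selection method has already produced an importance score per feature (higher is better). It aggregates those scores by average rank and then walks the ranking greedily, skipping any feature whose absolute Pearson correlation with an already-chosen feature reaches a hypothetical threshold `max_corr`. The feature names, scores and threshold are made up for illustration.

```python
def ensemble_rank(score_tables):
    """Order features by average rank across methods (rank 1 = best).

    score_tables: list of dicts mapping feature name -> importance score,
    one dict per feature selection method (hypothetical inputs).
    """
    features = list(score_tables[0])
    avg_rank = {f: 0.0 for f in features}
    for scores in score_tables:
        ordered = sorted(features, key=lambda f: scores[f], reverse=True)
        for rank, f in enumerate(ordered, start=1):
            avg_rank[f] += rank / len(score_tables)
    return sorted(features, key=lambda f: avg_rank[f])


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    return 0.0 if sa == 0 or sb == 0 else cov / (sa * sb)


def greedy_shortlist(ranked, data, k, max_corr=0.9):
    """Take features in ranked order, skipping ones redundant with earlier picks."""
    chosen = []
    for f in ranked:
        if all(abs(pearson(data[f], data[c])) < max_corr for c in chosen):
            chosen.append(f)
        if len(chosen) == k:
            break
    return chosen


# Hypothetical expression values across four samples.
data = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [1.1, 2.1, 2.9, 4.2],  # nearly a copy of geneA
    "geneC": [4.0, 1.0, 3.0, 2.0],
}
filter_scores = {"geneA": 0.9, "geneB": 0.85, "geneC": 0.4}  # e.g. a univariable filter
model_scores = {"geneA": 0.7, "geneB": 0.8, "geneC": 0.5}    # e.g. an embedded method

ranked = ensemble_rank([filter_scores, model_scores])
shortlist = greedy_shortlist(ranked, data, k=2)
print(ranked)     # ['geneA', 'geneB', 'geneC']
print(shortlist)  # ['geneA', 'geneC']
```

geneB ranks highly but is dropped from the shortlist because it carries almost the same information as geneA. Averaging ranks rather than raw scores sidesteps the fact that different methods report importance on incompatible scales.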

MFeaST has been used successfully to identify features of interest in many different -omics datasets.

The results have shown that biomarkers selected using MFeaST can be used to build classification models that perform better than other models.

Benefits

The univariable differential expression approach cannot detect high-dimensional interactions or multivariate linear patterns. The developed algorithm examines data in higher dimensions, capturing feature interactions and detecting molecular patterns.
Feature selection in -omics studies typically uses only one feature selection algorithm, which may work for some datasets but not others. Ensemble feature selection provides a more generalizable approach to identifying features of importance.
This is the first feature selection approach that combines univariable and multivariable filter-, wrapper- and embedded-type feature selection algorithms.
Allows users to visualize features and assess their discriminatory ability.
Features selected using MFeaST can be used to build classification models for more generalizable and accurate prediction.
Can be applied to any high-dimensional dataset, molecular or otherwise.

Applications

The method is suitable for identifying biomarkers in a range of biomedical and clinical studies. It is broadly applicable and can be used in biomedical or other research to reduce a high-dimensional feature space to the most valuable predictors.

Opportunity

Queen’s University is seeking companies interested in licensing, implementing and/or commercializing this technology.

Patents

US Provisional Patent Application No. 63/052,267
Canadian Patent Application No. 3,186,044

IP Status

Patent application submitted

Seeking

Development partner
Commercial partner
Licensing

Posted

May 31, 2023


Partnerships and Innovation | Vice-Principal Research