Abstract:
Molecular profiling technologies monitor many thousands of transcripts, proteins, metabolites or other species concurrently in a biological sample of interest. Given such high-dimensional data for different types of samples, classification methods aim to assign specimens to known categories. Relevant feature identification methods seek to define a subset of molecules that differentiate the samples. This work describes LIKNON, a specific implementation of a statistical approach for creating a classifier and identifying a small number of relevant features simultaneously. Given two-class data, LIKNON estimates a sparse linear classifier by exploiting the simple and well-known property that minimising an L1 norm (via linear programming) yields a sparse hyperplane. It performs well in retrospective analyses of three cancer biology profiling data sets: (i) small, round, blue cell tumour transcript profiles from tumour biopsies and cell lines; (ii) sporadic breast carcinoma transcript profiles from patients who developed distant metastases within 5 years and those who remained free of distant metastases for more than 5 years; and (iii) serum protein profiles from unaffected individuals and ovarian cancer patients. Computationally, LIKNON is less demanding than the prevailing filter-wrapper strategy, which generates many feature subsets and equates the relevant features with the subset yielding the classifier with the lowest generalisation error. Biologically, the results suggest that the cellular microenvironment plays a role in influencing disease outcome and should be taken into account when developing clinical decision support systems.
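The sparsity property mentioned above can be illustrated with a generic L1-norm soft-margin linear classifier solved as a linear program. This is a minimal sketch under stated assumptions, not LIKNON's actual formulation or code: the objective (L1 norm of the weights plus `C` times the total slack), the helper name `l1_linear_classifier`, the penalty weight `C` and the synthetic data are all hypothetical choices for illustration. Writing `w = u - v` with `u, v >= 0` turns the L1 norm into a linear objective, so SciPy's `linprog` (HiGHS solver) can find the hyperplane directly.

```python
import numpy as np
from scipy.optimize import linprog


def l1_linear_classifier(X, y, C=1.0):
    """Hypothetical helper: fit w.x + b by minimising ||w||_1 + C * sum(slack).

    X: (n, d) array of profiles; y: (n,) array of labels in {-1, +1}.
    Returns (w, b). A generic L1-norm linear SVM sketch, not LIKNON itself.
    """
    n, d = X.shape
    # Variables z = [u (d), v (d), b (1), xi (n)] with w = u - v and u, v >= 0,
    # so sum(u) + sum(v) equals ||w||_1 at the optimum.
    c = np.concatenate([np.ones(2 * d), [0.0], C * np.ones(n)])
    # Margin constraints y_i (w.x_i + b) >= 1 - xi_i, rewritten as A_ub @ z <= b_ub.
    yX = y[:, None] * X
    A_ub = np.hstack([-yX, yX, -y[:, None], -np.eye(n)])
    b_ub = -np.ones(n)
    # u, v, xi are non-negative; the offset b is unbounded.
    bounds = [(0, None)] * (2 * d) + [(None, None)] + [(0, None)] * n
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    z = res.x
    w = z[:d] - z[d:2 * d]
    b = z[2 * d]
    return w, b


# Synthetic two-class data: 40 samples, 50 features, only feature 0 informative.
rng = np.random.default_rng(0)
n, d = 40, 50
y = np.array([1] * 20 + [-1] * 20)
X = rng.normal(size=(n, d))
X[:, 0] = 3.0 * y

w, b = l1_linear_classifier(X, y, C=10.0)
selected = np.flatnonzero(np.abs(w) > 1e-6)  # features with non-zero weight
accuracy = np.mean(np.sign(X @ w + b) == y)
print("selected features:", selected, "training accuracy:", accuracy)
```

Because the optimum of a linear program lies at a vertex of the feasible region, most weights are driven exactly to zero, so the non-zero entries of `w` act as the selected feature subset without a separate filter or wrapper search. The choice of `C` trades sparsity against training errors and would normally be tuned, for example by cross-validation.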