2015-September-24: GRridge: Using co-data to improve clinical prediction with -omics variables

by Prof. Mark van de Wiel (VU/VUmc Department Biostatistics and Epidemiology)

11:00-11:50, room CCA 1.06

For many -omics studies, additional information on the features is available, such as genomic annotation, external p-values, or correlation with another type of genomic variable. In the context of binary (case-control), survival and continuous prediction (or classification), we introduce a method that makes structural use of such additional information, termed ‘co-data’. The co-data are used to group the variables a priori. The method then estimates group-specific penalties (or weights), which may lead to improved prediction performance. The method has several attractive properties: i) it adapts to the informativeness of the co-data for the data at hand; ii) it can deal with multiple sources of co-data; iii) it is fast.
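
To make the idea of group-specific penalties concrete, the following is a minimal sketch of ridge regression for a continuous outcome in which each feature's penalty depends on its co-data group. It is not the GRridge implementation: here the group multipliers are supplied by hand, whereas the method described above estimates them from the data. The function name `group_ridge_fit`, its parameters, and the simulated data are assumptions made purely for illustration.

```python
import numpy as np

def group_ridge_fit(X, y, groups, group_multipliers, base_penalty=1.0):
    """Ridge regression with one penalty multiplier per feature group.

    Solves (X'X + diag(penalties)) beta = X'y, where the penalty of
    feature j is base_penalty * group_multipliers[groups[j]].
    The multipliers are fixed inputs here (an illustrative assumption);
    they are not estimated from the data.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    penalties = base_penalty * np.array(
        [group_multipliers[g] for g in groups], dtype=float
    )
    return np.linalg.solve(X.T @ X + np.diag(penalties), X.T @ y)

# Simulated data: features in group 0 carry signal, group 1 is pure noise.
rng = np.random.default_rng(0)
n_samples = 40
groups = np.array([0, 0, 0, 1, 1, 1])
X = rng.normal(size=(n_samples, groups.size))
true_beta = np.array([1.0, -1.0, 0.5, 0.0, 0.0, 0.0])
y = X @ true_beta + rng.normal(scale=0.5, size=n_samples)

# Equal multipliers reduce to ordinary ridge regression.
beta_equal = group_ridge_fit(X, y, groups, {0: 1.0, 1: 1.0})

# A smaller penalty on the informative group, a larger one on the noise group.
beta_grouped = group_ridge_fit(X, y, groups, {0: 0.5, 1: 5.0})

print("equal penalties:  ", np.round(beta_equal, 3))
print("grouped penalties:", np.round(beta_grouped, 3))
```

With equal multipliers the estimate coincides with ordinary ridge regression, so informative co-data can only help to the extent that the multipliers differ between groups. This is one way to read property i) above.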

We show that the group-specific weights may facilitate post-hoc feature selection. The method, termed GRridge, is implemented in an easy-to-use R package. It is demonstrated on two cancer genomics studies, both of which concern the discrimination of precancerous cervical lesions from normal cervical tissues using methylation microarray data. For the first study, we use genomic annotation concerning the type of genomic region in which the probe is located (e.g. a CpG-island) to define the groups. For the second study, which concerns clinically relevant but impure samples, we use the p-values of the first study as a basis for the groups. For both examples, GRridge clearly improves on the predictive performance of well-known alternatives. In addition, we show that for the second study the relatively good predictive performance is maintained when selecting only 42 probes.
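
As an illustration of how external p-values could serve as a basis for groups, the sketch below ranks features by p-value and splits the ranking into equally sized groups. The abstract does not specify how the groups are formed, so the equal-size split, the function name `groups_from_pvalues`, and the example p-values are all assumptions.

```python
import numpy as np

def groups_from_pvalues(pvalues, n_groups=3):
    """Assign each feature a group label based on its external p-value rank.

    Label 0 holds the features with the smallest p-values. Splitting the
    ranking into (near-)equal-sized groups is an illustrative assumption.
    """
    pvalues = np.asarray(pvalues, dtype=float)
    order = np.argsort(pvalues, kind="stable")  # feature indices, smallest p first
    labels = np.empty(pvalues.size, dtype=int)
    for label, idx in enumerate(np.array_split(order, n_groups)):
        labels[idx] = label
    return labels

# Hypothetical p-values from an earlier study, one per feature.
external_p = np.array([0.001, 0.8, 0.04, 0.3, 0.0005, 0.6])
labels = groups_from_pvalues(external_p, n_groups=3)
print(labels)
```

The resulting labels could then play the role of the co-data groups for which group-specific penalties are estimated.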