Predicting survival from gene expression data by generalized partial least squares regression
© BioMed Central 2005
Published: 17 June 2005
There is considerable interest in linking microarray-based gene expression profiles to clinical endpoint variables such as survival. Standard statistical methodologies typically fail when the number of covariates (genes) far exceeds the number of samples (patients). For example, the standard Cox proportional hazards model cannot be directly applied to data of this form. Several methods have been proposed for dealing with this problem in Cox regression, including partial least squares regression (PLS). Nguyen and Rocke proposed first applying PLS in order to derive a small set of covariates, and then performing proportional hazards regression on the reduced set of covariates. In their approach, however, PLS is applied to survival times without taking into consideration the fact that the latter may be censored. A further problem with their approach is that the PLS step of their procedure is based on the assumption of a Gaussian (normal) likelihood.
Here, we propose a novel method for combining Cox proportional hazards regression and PLS. This method is a direct generalization of PLS to arbitrary likelihoods, whereas the original PLS method (including that used by Nguyen and Rocke) is designed for Gaussian likelihoods only. Furthermore, in our method PLS is directly integrated with the optimization of the Cox partial likelihood. Specifically, we propose to utilize the equivalence between PLS and a modification of the well-known numerical optimization method called the conjugate gradients (CG) algorithm: applying the modified CG algorithm to a Gaussian likelihood yields PLS. We propose instead to apply the modified CG algorithm to the Cox partial likelihood, hence directly generalizing the PLS algorithm to the Cox likelihood. Our method takes the censoring of the outputs into account, as only the original data are used during the estimation. Our method also generalizes easily to likelihoods other than the Cox proportional hazards likelihood.
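The Gaussian case of this equivalence can be checked numerically. The sketch below is an illustration only and is not the authors' modified CG algorithm, whose details are not given here: it runs textbook conjugate gradients, started at zero, on the least-squares normal equations and compares the result after m steps with the coefficients of an m-component PLS1 fit computed by NIPALS. The function names, the simulated data and the choice m = 3 are all assumptions made for the example.

```python
import numpy as np


def cg_normal_equations(X, y, n_steps):
    """Run n_steps of standard CG on X'X b = X'y, starting from b = 0."""
    A = X.T @ X
    c = X.T @ y
    b = np.zeros(X.shape[1])
    r = c.copy()          # residual of the normal equations at b = 0
    d = r.copy()          # first search direction
    for _ in range(n_steps):
        Ad = A @ d
        alpha = (r @ r) / (d @ Ad)
        b = b + alpha * d
        r_new = r - alpha * Ad
        beta = (r_new @ r_new) / (r @ r)
        d = r_new + beta * d
        r = r_new
    return b


def pls1(X, y, n_components):
    """PLS1 regression coefficients via NIPALS with deflation."""
    Xk, yk = X.copy(), y.copy()
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w = w / np.linalg.norm(w)      # weight vector
        t = Xk @ w                     # score vector
        tt = t @ t
        p = Xk.T @ t / tt              # X loading
        qk = (yk @ t) / tt             # y loading
        Xk = Xk - np.outer(t, p)       # deflate X
        yk = yk - qk * t               # deflate y
        W.append(w)
        P.append(p)
        q.append(qk)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    return W @ np.linalg.solve(P.T @ W, q)


# Simulated "many genes, few patients" data (hypothetical sizes).
rng = np.random.default_rng(0)
n_samples, n_genes, m = 20, 50, 3
X = rng.standard_normal((n_samples, n_genes))
X = X - X.mean(axis=0)                 # centre the covariates
y = rng.standard_normal(n_samples)
y = y - y.mean()                       # centre the response

b_cg = cg_normal_equations(X, y, m)
b_pls = pls1(X, y, m)
print("max abs difference:", np.max(np.abs(b_cg - b_pls)))
```

Both procedures minimize the residual sum of squares over the same Krylov subspace generated by X'X and X'y, which is why the two coefficient vectors agree up to rounding error even though the number of covariates exceeds the number of samples.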
We present results from the use of these methods for a dataset containing gene expression data and survival outcome from patients with breast cancer published by Sørlie and colleagues.
We have presented a method for generalizing PLS that utilizes the equivalence between PLS and the well-known conjugate gradients method. We have applied this method to a Cox partial likelihood to predict survival outcome for patients based on gene expression data. The generalized PLS method presented could easily be applied to other likelihoods as well.
- Lingjærde OC, Christophersen N: Shrinkage structures of partial least squares. Scand J Stat. 2000, 27: 459-473. 10.1111/1467-9469.00201.
- Nguyen DV, Rocke DM: Partial least squares proportional hazards regression for application to DNA microarray survival data. Bioinformatics. 2002, 18: 1625-1632. 10.1093/bioinformatics/18.12.1625.
- Sørlie T, Perou CM, Tibshirani R, Aas T, Geisler S, Johansen H, Hastie T, Eisen MB, van de Rijn M, Jeffrey SS, et al: Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proc Natl Acad Sci USA. 2001, 98: 10869-10874. 10.1073/pnas.191367098.