Outcome signature genes in breast cancer: is there a unique set?

MOTIVATION
Predicting the metastatic potential of primary malignant tissues has direct bearing on the choice of therapy. Several microarray studies yielded gene sets whose expression profiles successfully predicted survival. Nevertheless, the overlap between these gene sets is almost zero. Such small overlaps were observed also in other complex diseases, and the variables that could account for the differences had evoked a wide interest. One of the main open questions in this context is whether the disparity can be attributed only to trivial reasons such as different technologies, different patients and different types of analyses.


RESULTS
To answer this question, we concentrated on a single breast cancer dataset, and analyzed it by a single method, the one which was used by van't Veer et al. to produce a set of outcome-predictive genes. We showed that, in fact, the resulting set of genes is not unique; it is strongly influenced by the subset of patients used for gene selection. Many equally predictive lists could have been produced from the same analysis. Three main properties of the data explain this sensitivity: (1) many genes are correlated with survival; (2) the differences between these correlations are small; (3) the correlations fluctuate strongly when measured over different subsets of patients. A possible biological explanation for these properties is discussed.


CONTACT
eytan.domany@weizmann.ac.il


SUPPLEMENTARY INFORMATION
http://www.weizmann.ac.il/physics/complex/compphys/downloads/liate/


INTRODUC
Several attem Tibshirani, 20 Rosenwald et 2003; Sorlie profiling.Sor clustering, to assign breast carcinoma tissues to one of five different subtypes, each with a distinctive expression profile.Robustness of these survival related subclasses was demonstrated (Sorlie et al., 2003) by applying the same analysis procedure to two independent breast carcinoma data sets (van 't Veer et al., 2002;West et al., 2001).van't Veer et al.(van 't Veer et al., 2002) applied a supervised approach to identify a gene-expression signature, based on 70 genes, capable of predicting a short interval to development of distant metastases.First, they randomly selected a set of 78 patients, a training set, which was used to measure the correlation between each gene's of Physics of Complex Systems,Weizmann Institute of Science, Rehovo of Molecular Cell Biology, Weizmann Institute of Science, Rehovot Predicting the metastatic potential of primary malignant tissues has on choice of therapy.Several microarray studies yielded gene sets sion profiles successfully predicted survival (Ramaswamy et al., 2003;001;van 't Veer et al., 2002).Nevertheless, the overlap between these lmost zero.Such small overlaps were observed also in other complex for the differences had evoked a wide interest.One of the main open this context is whether the disparity can be attributed only to trivial as different technologies, different patients and different types of nswer this question we concentrated on one single breast cancer datase it by one single method, the one which was used by van't Veer et al.(van 002) to produce a set of outcome predictive genes.We showed that in ing set of genes is not unique; it is strongly influenced by the subse the sam ) many genes are c ons are small; (c) the correlations fluctuate strongly when measured subsets of patients.A possible biological explanation for these iscussed.

TION
pts were made to predict survival of cancer patients in general (Bair and 04;Beer et al., 2002;Khan et al., 2001;Nguyen and Rocke, 2002;al., 2002), and of breast cancer patients in particular (Ramaswamy et al., et al., 2001;van 't Veer et al., 2002) on the basis of gene expression lie et al. (Sorlie et al., 2001)

Public data-s
The data (van tumors, from size, and N0 ( es).34 of the 96 sporadic patients were treated by modified and 62 underwent breast-conserving treatment, including axi ratios were m cRNA from a genes(Hughes

Preprocessin
The full expr columns (sam ria, based on the entire set of 117 samples, yielding 5852 genes that exhibited two-fold change of expression with a pvalue less tha samples (van't Veer et al applied the same filtering criteria on 98 genes).We di than 20% mi r analysis).Lik nts free of BRCA1/2 germ line mutations.

Correlation analysis
For each gene we test the null hypothesis that its gene expression profile is uncorrelated with the survival vector (over all 96 samples).We randomly permuted the survival vector (10 5 times) and calculated the correlation of the expression of each gene with the randomized survival vector.The p-value is the fraction of times one gets an absolute correlation larger or equal to the absolute correlation of the un-d disease outcome.The genes were ranked according to this correlation, ost correlated genes were used to construct a classifier discriminating nts with good and poor prognosis.The remaining 19 patients served as validate their prognosis classifier.A follow-up study (van de Vijver et ved the efficiency of this classifier as survival predictor on a large set of ecimens.In a third study, Ramaswamy et al.(Ramaswamy et al., 2003) t of 128 genes separating metastases from primary tumors.A refined set ere found to distinguish successfully patients with good versus poor e success of these studies was frustrated by the fact that the sets of ed genes identified by these three studies had only a few genes in y 17 genes appeared in both the list of 456 genes of Sorlie et al

Measuring th
We assumed that the degree of the polynomial fit for the average STD curve (see Fig. 5) is the degre lynomial fit to the STD curve of each individual gene.Using this assumption, we found the polynomial fit to the STD curve of each gene in the data, and used it to estimate their STD values in a sample size of 77.

RESULTS
. Correction for multiple comparisons was performed using the False te (FDR) method.Bounding the expected false discovery rate by 10% f 1234 genes for which the null hypothesis can be rejected.Histogr on (measured for 5852 genes) with the true surviva i data into ten different divisions of 77/19 ow different experiments of 77 samples, influence the composition of correlated genes with survival, we used the bootstrapping method 993).Bootstrapping is a computer simulation enabling to overcome fects.It assumes that the sample is a good approximation of the y generating a large number of new samples from the original sample estimate the statistics parameters of the population.To keep the ognosis ratio of the original training set (33/44) we divided the 96 etitions a random set of 33 samples from the poor pro prognosis.We repeated this procedure ten times and found the top 70 'training set' composition.all blue are far).The genes create an elongated structure at an angle <π/2 with s, implying that a large number of genes exhibit non-vanishing correlations with survival.

Many genes a
This lack of sample prepar to genuine di these sources al. , 2002).Th outcome is r representing d 0 representing expression ve vector (s) and We chose to characterized The 5852 gen equator.If su lies on the pl correlation of geometrical m he expression vectors of very many genes (1234 -at an FDR of 10%, see Methods) are related to survival.According to our model, if the experiment is repeated on a different group of patients (with the same clinical characteristics), the overall appearance of the new "globe" will be quite similar, but the positions of individual genes will swarm around.This swarming suffices to change drastically the relative ranking of the genes on the basis of their correlation with survival.e histogram of the genes' correlation with the real survival vector to the vertical s axis -red curve), and with a random permutation of the r (blue curve).B. Globe of genes in the ``world" spanned by the rvival (s), BUB1 (b) and ESR1 (e).The survival is located at the north-UB1 (chosen from a large cluster of genes characterized by negative ith survival) and ESR1 (chosen from a large cluster of genes by positive correlation with survival) are on the sphere's surface and locations are determined by their angles with survival and with each er (normalized) genes are represented by spots whose size and color lose the gene is to the surface (large red spots are close and sm c re related to survival agreement can be attributed to different chips, different methods of ation, mRNA extraction and analysis of the data and, most importantlyfferences between the patients (tumor grade, stage etc).To eliminate of variation, we focused on data from a single experiment(van 't Veer et e data consist of 96 samples and 5852 genes (see Methods).Disease epresented by a survival vector s, of 96 binary components, with 1 good prognosis (metastasis free time interval >5 years), an poor prognosis (<5 years).The projection of the 96-dimensional ctor of each gene onto a 3-dimensional space (spanned by the survival the expression vectors of ESR1 (e) and BUB1 (b)) is shown in Fig. 1B.use ESR1 and BUB1 as representative genes of two large clusters by positive and negative correlation with survival, respectively.es comprise an oblate spheroid shaped cloud, tilted with respect to the rvival is replaced by a random binary vector, the oblate spheroid cloud ane of the equator.Since the vertical component of each gene is the its expression with survival (see Fig. 1A), this difference is a striking anifestation of the fact that t erage performance of series of classifiers generated by consecutive sets

Many sets of
This data-set correlated wit and third, the set (shown lat to others in predicting disease outcome.To test this hypothesis, we selected the same 77 patients (out of 78, see Methods, and van 't Veer et al., 2002) and ranked all genes according to their correlation with survival.We used the 5852 genes to build a series of classifiers (following the method used by van 't Veer et al. location of these 7 sets on the globe and their predicting performance is 9 and Fig. 11, respectively (see supplementary inform .To ensure tha and test sets s for 1000 diff samples).Eac of classifiers measured for each classifier.Note, that when repeating this procedure for a randomized survival vector, the training error curve fluctuates around 37.5 mistakes (50% rate of errors) while the test error fluctuates around 9.5 mistakes, independent on the genes' rank.The results shown in Fig. 2 imply that indeed, for each of the training sets, classifiers based on very low ranked genes are capable of predicting survival with quality similar to the high ranking ones.To give a quantitative meaning to this claim we generated the histogram presented in the inset of Fig. 2, which shows that more than 70% of the 1000 training sets produced at least one classifier with the assifiers as obtained from classifying all 96 samples.Upper curves probability of remaining free of metastasis in the group of samples having good prognosis signature, while the lower curves describe the s group.(Bertucci et al., 2000;Guerin et al., 1990) which is positively correlated with outcome.None of the genes listed above is ranked among the top 70!Note that as opposed to claims made in (Gruvberger et al., 2003), the success of the classifier is not due to the correlation of outcome to ER status.Creating a data set which lacks this correlation, our 7 classifiers, as well as van't Veer et al.'s, kept their prognostic capabilities (see supplementary information).r) performance as the one based on its own top 70 genes.The average h classifiers is 4. The surprising summary of these observations is that "top 70 genes" of highest correlation with survival depends strongly on t of (77) patients on which the correlation was measured and (b) even anked genes with as good a prognostic performance as that of the top ply that although the top 70 genes may provide good prediction, other genes may do the same.Hence, these 70 genes cannot be considered as didates for targeting anti-cancer treatment.Such candidates should be the much longer list of genes related to survival, as demonstra of cancer-related genes, present in the 7 classifiers mentioned above.al of these genes, and indicate next to each one its correlation rank (in measured on the training set selected by van 't Veer et al. elation with survival: IL-6 (rank=502) is anti-apoptotic, and therefore r survival (Lotem et al., 2003); CDC25B (402) (Nilsson and Hoffmann,(297) (Urbanowicz-Kachnowicz et al., 1999), CDC2 (229) (Winters et CDC20 (341) (Singhal et al., 2003) are known to function in cell cycle DNA replication; oncogenes NRAS (260) (Boon et al., 2003), EZH2 lly et al., 2002).lation with survival may be caused by some indirect relation to tumor ting survival through indirect mechanisms like immunity, apoptosis or oncogenes.Examples: BIN1/AMPH2 (477) by binding to MYC e a tumor suppressor (Sakamuro et al., 1996); BIK (342) is prot al., 2003) via binding to BCL2 (1106) (Li et al., 2003).The positive FLT3 ( 220) is due to its strong effect on dendritic cells and T-cells to umor immunity (Ciavarra et al., 2003).BRAK ( 237) is highly expressed tissues but low in malignant cells (Hromas et al., 1999); IGFBP4 ( 225) osis (Byron and Yee, 2003;Zhou et al., 2003).Expression of GATA3 ly correlated with ER status (Bertucci et al., 2000).Similarly, MYB ositively correlated with breast cancer outcome s

A gene's rank
Say we measu sample of N characteristics these statistic from high in o fluctuations o composition o selected differ the overall go that have the highest correlation with survival.The significant variation of the membership of the top 70 genes is clearly shown in Fig. 10 of the Supplementary Information.Note that every pair of these training sets has at least 58 samples in common, which significantly reduces the fluctuations of r and variation of the genes' ranks.In spite of this, the average overlap between two such gene groups is only 33.7/70.To better estimate the "true" fluctuations of r for independent subgroups of 77 we used bootstrapping (Tibshirani, 1993), drawing subgroups from the 96 samples with repeats (see Methods).This reduces the expected overlap of two top-70-gene en another training-set is used.The two rightmost columns (columns 11 the those of the 70 genes published by van 't Veer et al. and the 128 ng in (Ramaswamy et al., 2003) that are among the top 1000 of our first may fluctuate re the correlation r of a gene's expression with survival on the basis of a patients drawn at random from a larger group with similar clinical .If a different set of N is drawn, the correlation will be different.If al fluctuations of r are sizeable, they may change the ranking of a gene ne sample to a much lower rank in another; the smaller N, the larger the f r.In order to estimate the effect of these fluctuations on the f gene lists such as those of (van 't Veer et al., 2002), we repeatedly ent subgroups of 77 samples out of the 96 (in each group we maintained od/poor prognosis ratio) and for each subgroup identified the 70 genes lists to 12.2/7 10 subgroups.are likely to b from a clinica ld different lists of "top 70 genes" with respect to correlation with survival.0. Fig. 4 shows how large are the variation of gene rank, measured for Genes whose correlation with survival ranked high over one subgroup ecome low ranked in another.Hence different sets of 77 patients, drawn ly similar pool, will yie l Fig. 5. Standa ith disease outcome, averaged over 5852 genes polynomial fi extrapolate th to calculate th

Measuring th
In order to st sample-size K available sam measured ove deviation (STD) of the correlation.We repeated this procedure 5 times (each time creating a different set of subgroups), to obtain rd deviation of gene's correlation w (y-axis) as a function of sample size K (x-axis).The curve is the t to the results obtained for K between 2 to 48.This curve was used to e STD to larger values of K (The values extrapolated to K=77 were used e error-bars presented in Fig. 6).
e correlation fluctuations udy how the fluctuations of the correlation with survival vary with the , we created n K non-overlapping subgroups of size K from the 96 ples.We calculated the correlation of each gene g with survival, r each subgroup, and from these n K values we estimated the standard-0 to 96 (see Fig. 5).As shown in σ g (K), the average STD, for each of the 5852 genes, for K ranging from 2 to 48 (the maximal K allowing for nonoverlapping subgroups).Finally, we extrapolated the correlation noise (estimated by <σ >, the STD averaged over the genes), from K = Fig. 5 the correlation noise decreases as the samples-size increases.For sample-size of 77 (the size of the training set), the expected average noise is ~0.1, whereas the significant genes found by van't Veer et al and by our study show correlation between 0.3 to 0.5.In ll signal to noise ratio, the phenomenon shown in Fig. 4 is not surprising.sample-size K=77 (see Fig. 6), one can see that even relatively low (around 1000), may have a non negligible probability to be included Taking into account the correlation noise, we defined an alternative gene score (instead of correlation coefficient), by calculating its probability to have a correlation above a given threshold for a given sample size (see Supplementary Information).Fig. 8 presents the probability of genes to have correlation higher than a given threshold (y-axis) calculated on the basis of the noise derived for samples size of 77.The x-axis represents the genes' ranks according to their correlation coefficient with all 96 samples.

DISCUSSION
In this work w an attempt to from differen outcome, for between these basis of corre used.These l ranked genes experiment.In quite good.T training and t large ensembl and perform sensitivity of e.g. the list of between lists of survival related genes generated from the same data set, the disagreement between lists obtained from different data sets is not surprising.A possible biological explanation for this may be the individual variations and heterogeneities associated with markers for outcome, even within a clinically homogenous group of patients.Perhaps one has to divide the patients into smaller subgroups (Sorlie et al., 2003) on the basis of some yet unknown attribute and for each subgroup of tumors look for it's much sought "primary master genes" that control metastatic potential.The gene ranks according to their correlation with all 96 samples.The left nds to threshold of 0.4 and the right curve to threshold 0.1.e investigated a single breast cancer data set (van 't Veer et al., 2002) in explain the inconsistency between lists of survival related genes derived t experiments.While no single gene has a very high correlation with many the correlation has intermediate values (Fig. 1).The differences correlation values are small, and the relative ranking of genes on the lation with survival changes drastically when a different training-set is arge fluctuations in gene rank indicate that the identities of the top 70 are not robust, and hence will not be reproduced in a different spite of this sensitivity, the predictive power of several sets of genes is he main lesson is that whenever any arbitrary decision (e.g.choice of est set) is taken throughout analysis of the data, one has to generate a e of the different ways in which this arbitrary decision could be taken, a statistical analysis of the results obtained over this ensemble.High the results to the arbitrary decisions may indicate that the conclusions, survival related genes, are not unequivocal.In light of the inconsistency correlations w and low in oth with survival, of how many note that suc r gene will not necessarily be top-ranked with respect to correlation m subgroups.Since one ma homogenous separate two i prognostic to prognostic too of them will genes for ind necessarily in to study the e must scan the entire, wide list of survival-related genes.By focusing only on those genes that were singled out from one data set a breast and als  Oncogene, 22, 7687-7694. Byron, S.A. and Yee, D. (2003) Potential therapeutic strategies to interrupt insulinlike growth factor signaling in breast cancer.Semin Oncol, 30, 125-132.Ciavarra, R.P., Brown, R.R., Holterman, D.A., Garrett, M., Glass, W.F., 2nd, Wright, G.L., Jr., Schellhammer, P.F. and Somers, K.D. (2003) Impact of the tumor microenvironment on host infiltrating cells and the efficacy of flt3-ligand combination immunotherapy evaluated in a treatment model of mouse prostate cancer.Cancer Immunol Immunother, 52, 535-545.

ACKNOWLEDGEMENTS
ith survival of such a master gene may be very high in its own subgroup ers.The large fluctuations in the correlation of such a gene's expression measured over different training sets, are due to the fluctuating fraction members of the gene's subgroup are in the training set.It is important to h a maste easured in a very large sampling of patients, composed of a mixture of y need much larger numbers of patients to identify such survival-wisesubgroups and their associated potentially master genes, one should ssues: the quest for survival-related master genes and the construction of ols on the basis of a short gene-list.One can produce fairly reliable ls; many genes are related to survival, and using a large enough subset compensate for the fluctuations in the predictive power of individual ividual patients.Membership in a prognostic list, however, is not dicative of the gene's importance for cancer pathology.Rather, in order otential targets for treatment, on p s its preferred prognostic tool, one may miss important key players, in o in other types of cancer.

Fig
Fig. 1. A. Th (projection on survival vecto normalized su pole while B correlation w characterized their relative other.All oth illustrate howall blue are far).The genes create an elongated structure at an angle <π/2 with s, implying that a large number of genes exhibit non-vanishing correlations with survival.
Fig. 2. The av of 70 genes.The fluctuating curves present the number of errors produced by the classifiers res training errors the rank of th black 'x's; the errors.Inset: histogram of the number of classifiers whose training and test errors are at least as low as thos h highest correlation to survival).Of ~6% 5 were f classifier with classifiers is a ), based on consecutive groups of 70 genes.For each classifier we measured the training and the test error, and found seven other sets of 70 genes, producing classifiers with the same prognostic capabilities as those based on the top 70.The genes of some of these seven classifiers appeared way down in the correlation-ranked list; the 70 genes of the first classifier ulting from one particular selection of training and test sets (upperout of 77 samples; lower -test errors, out of 19.The x-axis represents e genes in the classifiers.The average over 1000 partitions is plotted as two grey areas are the 95% confidence intervals of the training and test e of the first classifier (based on the 70 genes wit the 1000 partitions, for ~28% no such classifier was found, whereas for ound.Note that more than 70% of the training sets produce at least one the same performance as the top 70 genes; the expected number of such round 4.70 genes can be used to predict survivalis characterized by three main properties: first, many genes are h survival.Second, the differences between these correlations are small correlation-based rankings of the genes depend strongly on the training er).These properties may indicate that the top 70 genes are not superior are ranked be 281-350, clas 701-770.The shown in Fig ation), and the corresponding Kaplan-Meier plots are shown below (see Fig.3).tween 71 to140, classifier2: 141-210, classifier3: 211-280, classifier4:  sifier4: 351-420, classifier5: 421-490, classifier6: 561-630, classifier7:

Fig. 3 .
Fig. 3. Kaplan-Meier analysis of van't Veer et al.'s classifier and of the seven alternative cl describe the classified as poor prognosi t the aforementioned phenomenon is not unique to the specific training elected by van 't Veer et al., we repeated the procedure described above erent compositions of training sets (of 77 samples) and test sets (19 h training set was used to rank the genes, and for each case the sequence described above was constructed, and the training and test errors were Fig. 4. 10 sets patients (usin column -a tra the first traini ost column).For each training set, the 70 top-ranked genes are colored black.The genes that were top ranked in one training-set can have a much lower rank wh and 12) mark genes appeari training set..

Fig. 6 .
Fig. 6.Correlation of genes with survival vs their ranks.The correlation of each gene (y-axis) was m to their correl gene based on Figure 7 Pro ranked on th samples of on in the top 70 bability of genes to be included in a list of top 70.The genes were e basis of their correlation with outcome, as measured over the 77 e particular (randomly chosen) training set.

Fig. 8 .
Fig. 8. Probability (y-axis) of genes to have correlation higher than a given threshold, calculated on the basis of noise derived for a training set of 77 samples.The x-axis represents the curve correspo