Surface-enhanced laser desorption/ionization time-of-flight proteomic profiling of breast carcinomas identifies clinicopathologically relevant groups of patients similar to previously defined clusters from cDNA expression

Introduction: Microarray-based gene expression profiling represents a major breakthrough in understanding the molecular complexity of breast cancer. However, cDNA expression profiles cannot detect changes in activity that arise from post-translational modifications, and therefore do not provide a complete picture of all biologically important changes that occur in tumors. Proteomic approaches provide additional opportunities to identify and/or validate molecular signatures of breast carcinomas. Surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF MS) offers high-throughput protein profiling, and the resulting protein array data call for effective and appropriate use of bioinformatics and statistical tools.

Methods: Whole tissue lysates of 105 breast carcinomas were analyzed on IMAC 30 ProteinChip Arrays (Bio-Rad, Hercules, CA, USA) using the ProteinChip Reader Model PBS IIc (Bio-Rad) and Ciphergen ProteinChip software (Bio-Rad). Cluster analysis of protein spectra was performed to identify protein patterns potentially related to established clinicopathological variables and/or tumor markers.

Results: Unsupervised hierarchical clustering of 130 peaks detected in spectra from breast cancer tissue lysates yielded six clusters of peaks and five groups of patients that differed significantly in tumor type, nuclear grade, hormonal receptor status, and expression of mucin 1 and cytokeratin 5/6 or cytokeratin 14. These tumor groups closely resembled luminal A, luminal B, basal and HER2-like carcinomas.

Conclusion: Our results show a clustering of tumors similar to that provided by cDNA expression profiles of breast carcinomas. This testifies to the validity of the SELDI-TOF MS proteomic approach in this type of study. As SELDI-TOF MS provides different information from cDNA expression profiles, the results suggest the technique's potential to supplement and expand our knowledge of breast cancer, to identify novel biomarkers and to produce clinically useful classifications of breast carcinomas.


Introduction
Extensive progress has been achieved towards understanding the epidemiology, clinical course and basic biology of breast cancer. Several clinicopathological factors, such as tumor grade, anatomical extent, presence or absence of lymph node metastases, presence of hormonal receptors and HER2/neu oncogene amplification, have been recognized as having prognostic and predictive value and influence the management of patients with breast cancer.
Microarray-based gene expression profiling represents another major breakthrough in understanding the molecular complexity of breast cancer [1,2]. Gene expression signatures have been identified that are associated with the presence of hormonal receptors, tumor grade and the ability to metastasize [3-6]. These approaches can also identify gene expression signatures that predict response to specific chemotherapies or hormone-based therapies [7,8]. However, cDNA expression profiles cannot detect changes in activity that arise from post-translational modifications, and therefore do not provide a complete picture of all biologically important changes that occur in tumors.
Abbreviations: ER = estrogen receptor; HPLC = high-performance liquid chromatography; SELDI-TOF MS = surface-enhanced laser desorption/ionization time-of-flight mass spectrometry.
Additional opportunities to identify and/or validate molecular signatures of breast carcinomas are provided by high-throughput proteomic approaches. Tissue microarrays represent the most developed high-throughput proteomic technology used to refine our knowledge of breast carcinoma. Immunohistochemical studies in tissue microarrays have confirmed the results of cDNA expression profiling and have identified the same breast carcinoma phenotypes, namely two hormonal receptor-positive groups with luminal epithelial differentiation, a group with dominant HER2/neu expression and a group with basal epithelial characteristics [9].
Hierarchical clustering of protein profiles obtained by immunohistochemistry also exhibits prognostic significance [10]. As immunohistochemical studies are able to evaluate only those proteins already described, another approach is necessary to identify novel proteins not yet associated with tumor clinicopathological characteristics. Surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF MS) represents a high-throughput proteomic platform suitable for these types of study. SELDI-TOF MS is based on the surface capture of proteins or peptides from a biologic sample using defined chemical interactions with a solid surface [11]. The specific detection of ionized protein molecules is based on time-of-flight mass spectrometry.
The development of SELDI-TOF MS has overcome a key limitation of other proteomic approaches, namely their inability to analyze hundreds of samples within a short time [12], which is essential for obtaining biologically and statistically relevant data in medical proteomic research. In addition, SELDI-TOF MS requires several-fold less starting material than two-dimensional polyacrylamide gel electrophoresis [13]. SELDI-TOF MS thus offers high-throughput protein profiling, yielding protein array data that are often characterized by a large number of variables (the mass peaks) and therefore call for effective and appropriate use of bioinformatics and statistical tools.
SELDI-TOF MS has been used to generate protein profiles of several cancer types, including breast cancer, and to discriminate between malignant and nonmalignant tumors with good sensitivity and specificity [14-17]. Most studies have analyzed body fluid samples such as serum [18], nipple aspirate fluid [14,19] or ductal lavage fluid [20]. Ricolleau and colleagues detected two prognostic biomarkers, ubiquitin and ferritin light chain, in node-negative breast tumors [21]. Nakagawa and colleagues identified differences between the protein profiles of microdissected primary breast cancer tissue samples with and without axillary lymph node metastasis [17].
The aim of the present study was to evaluate tissue lysates of breast cancers by SELDI-TOF MS in order to identify protein patterns related to clinicopathological variables and/or tumor markers. To reveal similar protein expression profiles among the 105 patients, unsupervised hierarchical clustering, with a distance measure based on Spearman correlation and Ward linkage, was applied both to protein patterns (to reveal subgroups of patients) and to peaks (to reveal groups of peaks). The data show that this high-throughput protein profiling technique identifies expression patterns that discriminate between types of breast tumor, that these groups follow clinicopathological variables, and that the resulting classification resembles that obtained by cDNA expression profiling. The main clinicopathological variables, including tumor type, grade and nuclear grade according to Elston and Ellis [22], and estrogen receptor (ER), progesterone receptor and HER2/neu status, were extracted from pathological records. Additional immunohistochemistry and fluorescence in situ hybridization were performed on tissue microarrays to assess cyclin D1 overexpression and amplification, and the presence of cytokeratin 5/6, cytokeratin 14, mucin 1 and gross cystic fluid protein.

Patient selection and characterization of specimens
Information about the antibodies, probes and protocols applied is presented in Table 1. In our set of samples, pT2 (57 cases, 54%) and pT3 (6 cases, 6%) tumors predominated over pT1 tumors (42 cases, 40%), because larger tumors were preferred owing to the availability of redundant tissue aliquots in the tissue bank. Consequently, metastasis to one or more axillary lymph nodes was seen in 64 cases (61%). The distribution by grade was fairly uniform, with a slight predominance of moderately and poorly differentiated tumors (grade 1, 28%; grade 2, 35%; grade 3, 37%). The mean and median age of the patients were both 59 years. The series contained 66 ductal carcinomas, 22 lobular carcinomas, five atypical medullary carcinomas, two metaplastic carcinomas, three mucinous carcinomas, two papillary carcinomas and five mixed ductal and lobular carcinomas. ER expression was positive in 85 cases (81%) and progesterone receptor expression in 81 cases (77%). Initial immunohistochemical screening for HER2/neu overexpression identified 20 cases (19%) with strong membrane staining (2+ or 3+). Gene amplification was subsequently confirmed by fluorescence in situ hybridization in 14 of these cases (13%).
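As a quick consistency check, the percentages reported above can be recomputed from the case counts. The Python sketch below does this; the dictionary keys are our own labels for illustration, not variables from the study.

```python
# Recompute the cohort percentages from the case counts given in the text.
N_CASES = 105

counts = {
    "pT1": 42,
    "pT2": 57,
    "pT3": 6,
    "node-positive": 64,
    "ER-positive": 85,
    "PR-positive": 81,
    "HER2 IHC 2+/3+": 20,
    "HER2 FISH-amplified": 14,
}

histology = {
    "ductal": 66, "lobular": 22, "atypical medullary": 5, "metaplastic": 2,
    "mucinous": 3, "papillary": 2, "mixed ductal/lobular": 5,
}


def percent(n: int, total: int = N_CASES) -> int:
    """Percentage of the series, rounded to the nearest whole number."""
    return round(100 * n / total)


percentages = {name: percent(n) for name, n in counts.items()}

# Both the pT categories and the histological types should cover every case.
assert counts["pT1"] + counts["pT2"] + counts["pT3"] == N_CASES
assert sum(histology.values()) == N_CASES

for name, pct in percentages.items():
    print(f"{name}: {counts[name]} cases ({pct}%)")
```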

Sample processing
The lumpectomy or mastectomy resection specimens were received within 20 minutes of surgical removal and were immediately evaluated by a pathologist. Tissue pieces of approximately 3 mm × 3 mm × 8 mm were cut from redundant tumor tissue after standard surgicopathological processing, were snap-frozen in liquid nitrogen and were stored at -80°C. All samples analyzed were stored for no longer than 2 years, and were thawed once immediately before analysis using SELDI-TOF MS in January 2006.
The tissue microarrays were constructed from routinely prepared, formalin-fixed, paraffin-embedded tissue blocks taken in parallel, using a manual tissue arrayer TA1 (A Fintajsl, Czech Republic). Tissue lysis was performed in guanidine buffer (100 mM phosphate buffer, pH 6.6; 6 M guanidine-HCl; 1% Triton X-100) [23] with vigorous shaking for 1 hour at room temperature, followed by centrifugation for 30 minutes at 14,000 × g. Total protein concentration was measured by Bradford assay (Bio-Rad, Hercules, CA, USA), and the lysates were aliquoted and stored at -80°C.

SELDI-TOF MS analysis
A pilot study was performed using IMAC30 arrays, CM10 arrays (cation exchange surface, buffer pH 4) and Q10 arrays (anion exchange surface, buffer pH 9). The final selection was based on the number of peaks detected in the spectra (IMAC30 and CM10 displayed a similar number of peaks, whereas Q10 displayed about one-half as many), on the overall intensity of the spectra (higher for IMAC30 than for CM10), and on agreement with published studies of breast tissue lysates that used IMAC30 arrays [17,21]. Protein profiling was performed on IMAC30 ProteinChip Arrays (Bio-Rad) preactivated with copper according to the manufacturer's instructions. A total of 20 μg protein, diluted 16-fold in binding buffer, was applied to each spot. Finally, 2 × 1 μl sinapinic acid (saturated solution in 0.5% trifluoroacetic acid/50% acetonitrile) was applied. Chips were analyzed on the ProteinChip Reader Model PBS IIc (Bio-Rad) in positive linear mode. Samples were applied to the arrays in random order, and each was measured twice. The laser intensity was set at 190 (relative units, range 0 to 300), the detector sensitivity at 5 (relative units, range 1 to 10) and the focus mass at 10 kDa, to obtain mass data in the range of 3 to 100 kDa. Each spot (divided into 100 positions) was measured from positions 20 to 80 in steps of four positions. At each position, 13 transients were acquired, resulting in 195 shots per spot.
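The reported 195 shots per spot equals 15 positions × 13 transients. The Python sketch below makes that arithmetic explicit under one assumption of ours: that "positions 20 to 80 in steps of four" is read as a half-open interval (20, 24, ..., 76), as Python's range() does. If position 80 were also sampled, the grid would contain 16 positions, so the Reader's own numbering convention may differ from this reading.

```python
# Sketch of the acquisition grid (assumption: end position 80 is exclusive).
START_POSITION = 20
END_POSITION = 80           # treated as exclusive, like Python's range()
STEP = 4
TRANSIENTS_PER_POSITION = 13

positions = list(range(START_POSITION, END_POSITION, STEP))
shots_per_spot = len(positions) * TRANSIENTS_PER_POSITION

print(positions)        # 20, 24, ..., 76 (15 positions)
print(shots_per_spot)   # 195
```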
All spectra were collected into one experimental file. Two spectra that contained no peaks as a consequence of an array processing error were eliminated. To ensure mass accuracy, all spectra were calibrated using external mass standards: hirudin BHVK (6,964 Da), equine cardiac myoglobin (16,951.5 Da) and bovine RBC carbonic anhydrase (29,023.7 Da). To reduce inter-spectrum variability, the intensities of all spectra were normalized to the total ion current using an external normalization coefficient (Ciphergen ProteinChip software 3.2.1; Bio-Rad, Hercules, CA, USA). This coefficient was set at 0.09 (the total ion current of the least intense spectrum) so that the normalization factor of every spectrum was no higher than one. The normalization process takes the total ion current of all spots, averages the intensity and adjusts the intensity scale of every spot, so that all data can be displayed on the same scale. Baseline ion current values were automatically subtracted before the total ion current was calculated, and spectra were normalized over the range 2 to 100 kDa. Molecular masses below 2 kDa were not analyzed because peptide signals in that range are masked by peaks from the sinapinic acid matrix.
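A minimal sketch of total ion current (TIC) normalization follows, under stated assumptions: each spectrum is already baseline-subtracted, its TIC is approximated by the mean intensity between 2 and 100 kDa, and each spectrum is scaled by coefficient/TIC. Ciphergen's exact internal computation is not described here, so the function names and the TIC definition are hypothetical. Setting the coefficient to the smallest TIC, as in the text, guarantees that no normalization factor exceeds one.

```python
import numpy as np


def total_ion_current(mz, intensity, lo=2_000.0, hi=100_000.0):
    """Approximate TIC as the mean intensity within [lo, hi] Da.

    Hypothetical definition; baseline subtraction is assumed to be done already.
    """
    mask = (mz >= lo) & (mz <= hi)
    return float(np.mean(intensity[mask]))


def normalize_to_tic(mz, spectra, coefficient=None, lo=2_000.0, hi=100_000.0):
    """Scale each spectrum (row) by coefficient / TIC.

    If no coefficient is given, the TIC of the least intense spectrum is used,
    so every normalization factor is <= 1 (mirroring the choice in the text).
    """
    spectra = np.asarray(spectra, dtype=float)
    tic = np.array([total_ion_current(mz, s, lo, hi) for s in spectra])
    if coefficient is None:
        coefficient = tic.min()
    factors = coefficient / tic
    return spectra * factors[:, None], factors


# Toy usage: three synthetic spectra that differ only in overall intensity.
mz = np.linspace(1_000, 110_000, 500)
toy = np.vstack([np.abs(np.sin(mz / 1_000)) * k for k in (1.0, 2.0, 4.0)])
normalized, factors = normalize_to_tic(mz, toy)
print(factors)  # largest factor is 1.0, for the least intense spectrum
```

After normalization every toy spectrum has the same TIC, which is the property that allows all spectra to be displayed on one intensity scale.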
Finally, we performed internal mass calibration using an endogenous peak at 15,841 Da that appears in all selected spectra. This step adjusts the mass scale (x axis) of each entire spectrum on the basis of a naturally occurring sample peak. External calibration was performed once before the study and was checked during the study period, without any significant changes. Next, peaks were clustered with Biomarker Wizard software (Bio-Rad, Hercules, CA, USA): in the first pass, peaks with a signal/noise ratio >5 detected in at least 5% of spectra were used; peaks with a signal/noise ratio >3 within a cluster mass window of 0.3% were then added, and the valley depth was set at three times the noise. Peak intensities from duplicate samples were then averaged. The final set of 130 peak clusters was used for analysis.
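The internal calibration and peak clustering steps can be approximated as follows. This is a simplified, hypothetical stand-in for Biomarker Wizard, not its actual algorithm: a single reference peak rescales the mass axis multiplicatively; peaks pooled from all spectra are grouped while they stay within a 0.3% relative window above the cluster's lowest mass; clusters seen in fewer than 5% of spectra are dropped; and duplicate measurements are averaged. Signal/noise filtering is assumed to have been done beforehand, and the second (S/N > 3) pass is omitted.

```python
import numpy as np

REFERENCE_MASS = 15_841.0  # endogenous peak used for internal calibration (Da)


def internal_calibrate(masses, observed_reference):
    """One-point recalibration: rescale masses so the reference peak lands on 15,841 Da.

    Assumption: a simple multiplicative correction, not the vendor's algorithm.
    """
    return np.asarray(masses, dtype=float) * (REFERENCE_MASS / observed_reference)


def cluster_peaks(peak_lists, window=0.003, min_fraction=0.05):
    """Group peak masses pooled from many spectra into clusters.

    peak_lists holds one sequence of (already S/N-filtered) peak masses per
    spectrum. A new cluster starts when the next sorted mass exceeds the
    cluster's lowest mass by more than `window` (relative). Clusters found in
    fewer than `min_fraction` of spectra are discarded. Returns cluster centres.
    """
    n_spectra = len(peak_lists)
    pooled = sorted((float(m), i) for i, peaks in enumerate(peak_lists) for m in peaks)
    clusters, current = [], []
    for mass, spectrum in pooled:
        if current and mass > current[0][0] * (1.0 + window):
            clusters.append(current)
            current = []
        current.append((mass, spectrum))
    if current:
        clusters.append(current)

    centres = []
    for members in clusters:
        n_hit = len({spectrum for _, spectrum in members})
        if n_hit / n_spectra >= min_fraction:
            centres.append(float(np.mean([mass for mass, _ in members])))
    return centres


def average_duplicates(intensities):
    """Average replicate pairs; assumes the two replicates of a sample are adjacent rows."""
    a = np.asarray(intensities, dtype=float)
    return a.reshape(-1, 2, a.shape[-1]).mean(axis=1)
```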

Statistical analysis of SELDI-TOF MS mass spectra
The dataset comprised protein expression profiles of 105 samples, each represented by 130 peaks. In addition to the expression profiles, information on 17 clinicopathological variables was available. We performed unsupervised clustering of carcinoma cases according to their protein expression profiles, and tested the distribution of categories of clinical and molecular variables in the resulting groups.
To reveal subgroups of peaks and patients, the dataset was analyzed by hierarchical clustering with the Ward method of linkage [24], using a distance measure derived from the Spearman correlation matrix. At each step, the Ward method merges the pair of clusters whose fusion produces the smallest increase in the total within-cluster sum of squares. The distance was defined as d = |ρ - 1| = 1 - ρ, where ρ is the Spearman correlation. The maximum distance of two therefore corresponds to a Spearman correlation of -1, and the minimum distance of zero to a Spearman correlation of 1.
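The clustering described above can be sketched with SciPy. The helper name spearman_ward_clusters is hypothetical; it computes Spearman correlations, converts them to d = 1 - ρ, and applies Ward linkage. Ward's sum-of-squares interpretation strictly holds only for Euclidean distances; SciPy will still apply its Ward update formula to a correlation-based distance, as this sketch does, so the result should be read as Ward-style linkage on that distance.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from scipy.stats import spearmanr


def spearman_ward_clusters(data, n_clusters, axis=1):
    """Ward-style clustering on the distance d = |rho - 1| = 1 - rho.

    data: samples x peaks matrix. axis=1 clusters the rows (patients),
    axis=0 clusters the columns (peaks), following scipy.stats.spearmanr.
    """
    rho, _ = spearmanr(data, axis=axis)
    rho = np.clip(np.asarray(rho, dtype=float), -1.0, 1.0)
    dist = 1.0 - rho                 # 0 for rho = 1, 2 for rho = -1
    np.fill_diagonal(dist, 0.0)
    dist = (dist + dist.T) / 2.0     # guard against tiny floating-point asymmetries
    tree = linkage(squareform(dist, checks=False), method="ward")
    return fcluster(tree, t=n_clusters, criterion="maxclust"), tree


# Toy demo: two synthetic "patient" groups built around two random peak patterns.
rng = np.random.default_rng(0)
pattern_a, pattern_b = rng.normal(size=130), rng.normal(size=130)
toy = np.vstack(
    [pattern_a + 0.1 * rng.normal(size=130) for _ in range(10)]
    + [pattern_b + 0.1 * rng.normal(size=130) for _ in range(10)]
)
patient_labels, _ = spearman_ward_clusters(toy, n_clusters=2)       # cluster rows
peak_labels, _ = spearman_ward_clusters(toy, n_clusters=6, axis=0)  # cluster columns
```

With a 105 × 130 intensity matrix, n_clusters=5 would cut the patient tree into five groups, and axis=0 with n_clusters=6 would do the same for the peaks; the software used for the original analysis may differ in detail.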
To analyze the relationship between categorical clinicopathological variables and the groups of women defined by cluster analysis, we used the Fisher exact test (for 2 × 2 contingency tables) and the maximum likelihood chi-square test (for larger contingency tables). Kruskal-Wallis analysis of variance was used to compare the age distribution between groups. All hypotheses were tested at a significance level of α = 0.05. To address multiple testing, α was adjusted by a Bonferroni correction for the number of clinicopathological variables, giving α = 0.05/17 ≈ 0.0029.
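The testing scheme can be sketched with scipy.stats: fisher_exact for 2 × 2 tables, chi2_contingency with lambda_="log-likelihood" (a G-test, that is, a maximum likelihood chi-square) for larger tables, and kruskal for age. The helper names and the toy counts below are illustrative only and are not data from this study.

```python
from scipy.stats import chi2_contingency, fisher_exact, kruskal

N_VARIABLES = 17                        # clinicopathological variables tested
ALPHA = 0.05
ALPHA_BONFERRONI = ALPHA / N_VARIABLES  # 0.05 / 17, roughly 0.0029


def compare_categorical(table):
    """Fisher exact test for a 2 x 2 table; G-test (maximum likelihood chi-square) otherwise."""
    if len(table) == 2 and len(table[0]) == 2:
        _, p = fisher_exact(table)
        return "Fisher exact", p
    _, p, _, _ = chi2_contingency(table, correction=False, lambda_="log-likelihood")
    return "G-test", p


def compare_age(*ages_by_cluster):
    """Kruskal-Wallis test of the age distribution across clusters."""
    _, p = kruskal(*ages_by_cluster)
    return p


# Hypothetical toy counts (rows: ER-negative, ER-positive; columns: clusters A, B, C).
toy_table = [[2, 3, 15], [30, 25, 5]]
method, p_value = compare_categorical(toy_table)
print(method, p_value, "significant" if p_value < ALPHA_BONFERRONI else "not significant")
```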
The parameters were tested against the three main clusters (A, B, C) and against the five smaller clusters (I, II, III, IV, V). For some parameters, dividing patients into five clusters resulted in small numbers in some categories; general conclusions therefore cannot be drawn for these parameters, but the corresponding P values are included for information. [26].
The gel, consisting of a 4% sample gel, 10% spacer gel and 16% separation gel, was stained with colloidal Coomassie Blue. Bands of appropriate molecular weight were excised and digested with trypsin according to Havlasova and colleagues [27]. Mass spectra were acquired using a 4800 MALDI TOF/TOF™ spectrometer (Applied Biosystems, Foster City, CA, USA) in both positive reflectron and MS/MS modes. GPS Explorer™ software (version 3.6; Applied Biosystems, Foster City, CA, USA) was used to evaluate the mass spectra and identify proteins.

Results
IMAC 30 (immobilized metal affinity capture) ProteinChip arrays were used to analyze tissue lysates from 105 breast cancer patients. After data processing with Biomarker Wizard software, a total of 130 peaks were selected. Information about the peaks is presented in Additional file 1. Hierarchical clustering of the normalized linear peak intensities revealed subgroups of peaks and of patients. The Spearman correlation matrix of the peaks is shown graphically in Figure 1. These data clearly show groups of highly positively correlated peaks, indicating coexpression of these peaks in individual tumors. Depending on where the tree is cut, hierarchical clustering combines the peaks into two, three or six groups according to their level of positive correlation. In each of these categorizations, groups of adjacent correlated peaks are apparent in Figure 1. The highest correlations are found among peaks 80 to 82, peaks 99 to 105, peaks 117 and 118, and peaks 121 to 124, which together form the first of the six peak groups. Descriptive statistics for the peaks classified into the six groups are summarized in Table 2; the groups are listed in order of decreasing minimal within-group correlation. Because the distance matrix for hierarchical clustering was derived from the Spearman correlation matrix, the categorization of patients was most strongly affected by groups of highly correlated peaks. in the first cluster of peaks. The third group of patients is characterized by higher values in the third cluster of peaks, especially peaks 1 to 9 and peaks 23 to 40. The fourth group of patients shows very high values in the first cluster of peaks, and the fifth group of patients is mainly determined by high values in the second cluster of peaks.
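The ordering used in Table 2 (decreasing minimal within-group correlation) can be computed from a peak correlation matrix and a vector of peak-group labels. The function below is a hypothetical helper written for illustration; for each group with at least two peaks it reports the number of peaks and the minimum and mean off-diagonal correlation.

```python
import numpy as np


def group_correlation_summary(rho, labels):
    """Per-group peak count, minimum and mean within-group (off-diagonal) correlation.

    rho: square correlation matrix of peaks; labels: one group label per peak.
    Groups with a single peak are skipped (they have no within-group pairs).
    Rows are sorted by decreasing minimal within-group correlation.
    """
    rho = np.asarray(rho, dtype=float)
    labels = np.asarray(labels)
    rows = []
    for group in np.unique(labels):
        idx = np.flatnonzero(labels == group)
        if idx.size < 2:
            continue
        block = rho[np.ix_(idx, idx)]
        off_diagonal = block[~np.eye(idx.size, dtype=bool)]
        rows.append((group.item(), idx.size, off_diagonal.min(), off_diagonal.mean()))
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Toy example: four peaks in two groups.
toy_rho = np.array([
    [1.0, 0.9, 0.1, 0.0],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.5],
    [0.0, 0.1, 0.5, 1.0],
])
summary = group_correlation_summary(toy_rho, [1, 1, 2, 2])
for group, size, min_r, mean_r in summary:
    print(f"group {group}: {size} peaks, min r = {min_r:.2f}, mean r = {mean_r:.2f}")
```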
The relative frequencies of selected clinicopathological parameters are illustrated in Figure 3, and the complete distribution of clinical variables and molecular markers within the groups of patients is presented in Table 3. The 105 cases separate into three main clusters (A, B, C) or five smaller clusters (I to V), which are significantly associated with tumor type, ER status and nuclear grade. Older patients whose tumors express hormonal receptors and high levels of mucin 1 tend to group into clusters I to III (or A and B), whereas carcinomas of younger patients with a triple-negative phenotype (that is, negative for ER, progesterone receptor and HER2/neu), high nuclear grade and basal cytokeratin 5/6 and/or cytokeratin 14 expression fall more often into cluster IV and especially cluster V (or C).
Clusters I and III differ in the relative frequency of lobular carcinomas (predominant in cluster I) and ductal carcinomas (predominant in cluster III), but otherwise share similar characteristics (ER-positive, low grade, older patients). Cluster II is characterized by higher nuclear and tumor grade than the adjacent clusters I and III, and contains half (seven) of the 14 cases with HER2/neu gene amplification.
Cluster IV shows characteristics transitional between clusters I to III and cluster V, in which high-grade, triple-negative carcinomas with low expression of mucin 1 and gross cystic fluid protein clearly predominate. Amplification of the cyclin D1 gene is randomly distributed except in cluster V. Cyclin D1 protein expression is distributed similarly to ER, and the same applies to ERβ. The distribution of lymph node metastases shows no specific relationship with clustering.
Clustering of patients into five groups was determined by the expression profile of all 130 peaks. To identify these peaks, we separated the IMAC-binding proteins either by HPLC followed by tricine SDS-PAGE or directly by tricine SDS-PAGE, with subsequent MS/MS identification. For the final identification we excised four gel bands whose tricine gel molecular masses matched the SELDI-TOF MS masses (7,706 Da, 17,599 Da, 27,152 Da and 33,327 Da, corresponding to peaks 81, 107, 114 and 119). Using MS/MS analysis of samples separated by reverse-phase liquid chromatography prior to tricine SDS-PAGE, we identified the 33 kDa peak as annexin 5 (accession number NP_001145). MS/MS analysis of IMAC-binding proteins separated directly by tricine SDS-PAGE led to the identification of the 27 kDa peak as heat shock protein 27 (hsp27) (accession number AAA62175). Both proteins were identified with a protein score confidence interval of 100%. Identification of the two remaining peaks was unsuccessful.

Figure 1
Graphical representation of Spearman correlation matrix of 130 surface-enhanced laser desorption/ionization time-of-flight mass spectrometry peaks. Red color intensity, positive correlation; green color intensity, negative correlation.

Figure 2
Result of hierarchical clustering in the form of a heat map of peak values. Rows represent 105 individual patients and columns represent the 130 peaks used for the analysis. The value of each peak is indicated by color intensity. Unsupervised hierarchical clustering revealed two (not labeled), three (labeled A, B, C) and five (labeled I to V) groups of patients, and six groups of peaks (labeled 1 to 6).
Table 3 Distribution of clinicopathological variables within groups of patients as revealed by hierarchical clustering
For some parameters evaluated in tissue microarrays, information was not available for all patients. (a) A to C, three clusters; I to V, five clusters.

Discussion
The goal of expression profiling of human tumors is to provide information that can assist with tumor diagnosis or classification, or that has prognostic value. To be of value, such data must improve on the clinicopathological assessments already in use. Gene expression profiling has proven its worth in these respects in recent years and, in breast cancer, has provided alternative classification systems that appear valuable in clinical practice. Protein expression profiling has also been useful in studies of human tumors and has often been used to identify serum biomarkers that aid the monitoring of disease and of therapeutic response. In the present article we analyzed protein profiles in breast cancers and evaluated the potential of this approach to provide clinically useful classifications of this heterogeneous disease.
The protein expression profiles obtained in our study, consisting of as few as 130 substantially intercorrelated peaks, can represent only a small fraction of the biology of a tumor. The diversity in profiles necessarily reflects both biological diversity and methodological factors, including variable tumor/stroma ratios. Nonetheless, our data clearly show that this approach provides robust, statistically significant groupings of patients that are related to the recently described classifications based on cDNA expression profiling. Correlating protein expression patterns with tumor morphology, established biomarkers and clinical outcomes is therefore key in high-throughput studies aimed at discovering peaks that merit further investigation.

Figure 3
Distribution of selected clinicopathological parameters within the cluster tree of patients. Each square label represents a case. ER, estrogen receptor; MUC1, mucin 1.
The categories of patients identified by hierarchical clustering of SELDI-TOF MS peaks resemble the recently adopted classification of breast carcinomas based on gene expression profiling [2]. In this study the first subgroup, so-called luminal subtypes (A and B) On the other hand, we found no significant relationship between our tumor clusters and the presence of lymph node metastases. Regarding the potential prognostic significance of our data, follow-up of the patients is not yet long enough to allow a meaningful analysis.
Although controversial, it may be acceptable to use a proteomic polypeptide profile consisting of as yet unsequenced peptides to diagnose disease, to predict the risk of disease development and progression and/or to predict the efficacy of treatment, provided that the profile has proven its value in blinded, multicenter, repeated studies, as recently pointed out [29]. For a fuller understanding of disease processes, and to simplify investigative procedures, it may also be useful to identify the biomarkers. We identified peak 114 (27,152 Da), from the fifth peak group, as hsp27. This protein is a molecular chaperone that participates in cell homeostasis under stress conditions and is associated with resistance of tumors to chemotherapeutics, radiation and hyperthermia [30].
High intensities of all peaks in the fifth and sixth peak groups, including the hsp27 peak, assign patients to subcohorts I and III, which correspond to the luminal A subtype of tumors with high ER expression. This finding agrees with Ciocca and colleagues, who showed a correlation between high levels of hsp27 and ER expression [31]. Recent studies revealed a correlation of hsp27 phosphorylation status (phosphorylation also increases protein binding to the IMAC surface) with HER2/neu and lymph node positivity in breast cancer [32].
We also identified the peak at 33,327 Da, from the sixth peak group, as annexin 5. The annexin family has been linked to inhibition of phospholipase activity, exocytosis and endocytosis, signal transduction, organization of the extracellular matrix, resistance to reactive oxygen species and DNA replication [33]. Annexin 5 normally forms a shield around certain phospholipid molecules that blocks their entry into coagulation reactions [34]. The role of annexin 5 in breast cancer, however, remains unclear.
Our methodological approach of unsupervised hierarchical clustering helped overcome some of the complexities of SELDI-TOF MS data to yield biologically meaningful and interpretable results. In the present study we demonstrated that, using the whole protein spectra, we could cluster our patients into subcohorts paralleling the classification based on DNA microarray profiling data.

Conclusion
Our results show that SELDI-TOF MS protein profiling distinguishes between different groups of primary human breast cancers and produces a clustering of tumors similar to that obtained from cDNA expression profiles. This testifies to the validity of the SELDI-TOF MS proteomic approach in this type of study. As SELDI-TOF MS provides different information from cDNA expression profiles, the results suggest its potential to supplement and expand our knowledge of breast carcinomas. Identifying the proteins that correspond to the peaks of interest, followed by validation by immunohistochemistry, is an essential task for future studies. Although many parameters have been explored in relation to breast cancer biology and outcome, the ability to classify tumors into distinct subclasses by identifying representative protein expression patterns improves our understanding of cancer biology and offers the potential for more precise clinicopathological phenotyping.
data. LH performed the MS analysis and data interpretation. DK prepared all of the tissue samples for SELDI-TOF MS. DV supervised and reviewed the SELDI-TOF MS analysis. RV participated in the study design and supervised the clinical data analysis. BV participated in the initiation of the project and the study design, financially supported the project and helped to draft the manuscript. RN conceived the study, participated in the interpretation of the results and the statistical analysis, supervised the statistical analysis and finalized the manuscript. All authors read and approved the final manuscript.