Establishment of the epithelial-specific transcriptome of normal and malignant human breast cells based on MPSS and array expression data
- Anita Grigoriadis1,
- Alan Mackay2,
- Jorge S Reis-Filho2,
- Dawn Steele2,
- Christian Iseli3,
- Brian J Stevenson3,
- C Victor Jongeneel3,
- Haukur Valgeirsson2,
- Kerry Fenwick2,
- Marjan Iravani2,
- Maria Leao1,
- Andrew JG Simpson4,
- Robert L Strausberg5,
- Parmjit S Jat6,
- Alan Ashworth2,
- A Munro Neville1 and
- Michael J O'Hare1
© Grigoriadis et al.; licensee BioMed Central Ltd. 2006
Received: 17 July 2006
Accepted: 2 October 2006
Published: 2 October 2006
Diverse microarray and sequencing technologies have been widely used to characterise the molecular changes in malignant epithelial cells in breast cancers. Such gene expression studies to identify markers and targets in tumour cells are, however, compromised by the cellular heterogeneity of solid breast tumours and by the lack of appropriate counterparts representing normal breast epithelial cells.
Malignant neoplastic epithelial cells from primary breast cancers and luminal and myoepithelial cells from normal human breast tissue were isolated by immunomagnetic separation methods. Pools of RNA from highly enriched preparations of these cell types were subjected to expression profiling using massively parallel signature sequencing (MPSS) and four different genome-wide microarray platforms. Functionally related transcripts of the differential tumour epithelial transcriptome were used for gene set enrichment analysis to identify enrichment of luminal and myoepithelial type genes. Clinicopathological validation of a small number of genes was performed on tissue microarrays.
MPSS identified 6,553 differentially expressed genes between the pool of normal luminal cells and that of primary tumours substantially enriched for epithelial cells, of which 98% were represented and 60% were confirmed by microarray profiling. Significant expression level changes between these two samples detected only by microarray technology were shown by 4,149 transcripts, resulting in a combined differential tumour epithelial transcriptome of 8,051 genes. Microarray gene signatures identified a comprehensive list of 907 and 955 transcripts whose expression differed between luminal epithelial cells and myoepithelial cells, respectively. Functional annotation and gene set enrichment analysis highlighted a group of genes related to skeletal development that were associated with the myoepithelial/basal cells and upregulated in the tumour sample. One of the most highly overexpressed genes in this category, that encoding periostin, was analysed immunohistochemically on breast cancer tissue microarrays and its expression in neoplastic cells correlated with poor outcome in a cohort of poor prognosis estrogen receptor-positive tumours.
Using highly enriched cell populations in combination with multiplatform gene expression profiling studies, a comprehensive analysis of molecular changes between the normal and malignant breast tissue was established. This study provides a basis for the identification of novel and potentially important targets for diagnosis, prognosis and therapy in breast cancer.
Breast cancer is a clinically heterogeneous disease and consists of many different cell types, including normal and reactive stromal components in addition to the malignant neoplastic compartment. Moreover, it comprises a series of distinct malignant tumours that present diverse cellular features with varying differentiation status, distinct genetic changes, responses to therapy and outcome. Likewise, the normal breast is also composed of different parenchymal and stromal cell types, with the terminal ductal-lobular unit being the most important feature with regard to neoplasia. The latter is composed of two morphologically recognisable cell types, epithelial cells on the luminal surface and basally located myoepithelial cells. While typical breast cancers have been traditionally regarded as exhibiting characteristics akin to luminal epithelial cells, recent data have shown that some also exhibit, in part or whole, myoepithelial/basal features [2–4]. Based on the restricted expression of genes representing the phenotypes of luminal epithelial and basal cells, major subtypes of breast cancer have been defined and linked to both long term survival and their response to therapy. Therefore, detailed characterisation of the normal luminal and myoepithelial/basal phenotypes is a prerequisite for understanding the genetic alterations that occur in breast cancers and how they may impact on disease progression and outcome.
The use of solid tissues, as in most previous breast cancer gene expression analyses, results in greatly enhanced complexity of data because of the widely varying degrees of stromal responses (desmoplasia) and inflammatory infiltrates in individual tumours. Laser capture microdissection partially alleviates this problem in respect to tumour samples, but is unsuited to the large-scale separation of the normal epithelial cell types in breast because of the close contact between these cells. Immunomagnetic separation of individual cell types from normal human breast tissue [7, 8] and primary breast cancers has enabled direct comparisons of normal epithelial and malignant epithelial cells to be made. Previous proteomic [9, 10] and gene expression analyses of such samples [10–13] have established a partial molecular characterisation of the epithelial compartment in the normal breast and breast cancer, but, due to the limitations of technology available at the time of these studies, did not provide a comprehensive comparison of all proteins or transcripts.
Multiple large-scale analytical techniques now make it possible to capture entire transcriptomes of defined cell populations. Breast cancers have been extensively analysed with both expression arrays and with direct sequencing techniques such as serial analysis of gene expression (SAGE). Although several studies have correlated expression data based on microarray and SAGE [16, 17], a comprehensive genome-wide expression profile using a combination of complementary technologies has not yet been achieved for purified malignant epithelial breast cells in comparison with purified normal breast epithelial cells. In this study, massively parallel signature sequencing (MPSS) [18, 19] and multiple genome-wide microarrays have been used to analyse immunomagnetically separated normal luminal epithelial cells and primary breast cancers substantially enriched for the neoplastic epithelial component. The aim of this study was to establish a virtually complete coverage of transcripts deregulated in the neoplastic cells of human breast cancer. In addition, expression profiles from normal luminal and myoepithelial cells have been used to identify cell-type specific transcripts and ontologically related gene sets in the differentially expressed tumour epithelial transcriptome. The use of highly enriched cell preparations in combination with a multiplatform approach to their expression analysis has revealed novel markers and potential targets, the clinical significance of some of which has also been examined using tissue microarrays.
Materials and methods
Ten primary cultures (approximately 10⁷) of normal human breast luminal and myoepithelial cells were prepared from reduction mammoplasty samples by double immunomagnetic sorting methods [7, 8, 10]. In brief, breast epithelial cells were immunomagnetically purified using combined positive magnetic activated cell sorting (MACS; Miltenyi Biotec, Auburn, CA) selection with antibodies against the luminal epithelial marker EMA (rat monoclonal ICR-2, Seralab, Leicestershire, UK) and the myoepithelial membrane antigen CD10 (mouse monoclonal CALLA clone SS2/36, DAKO Corporation, Glostrup, Denmark), followed by negative Dynabead (Dynal, UK) selection using mouse monoclonal antibodies against β-4-integrin (clone A9), a myoepithelial cell-surface antigen (Santa Cruz Biotechnology, CA, USA), and BerEp-4 Epithelial Antigen, a luminal antigen (DAKO Corporation, Glostrup, Denmark). Immunostaining with myoepithelial and luminal-specific lineage markers showed the final sort of epithelial cells used in this study to be >95% pure. Full details of these procedures are not only contained in previous publications [10, 11], but are also appended, as required, to the Minimum information about a microarray experiment (MIAME) protocol that accompanies submission E-TABM-66.
Malignant breast epithelial cells of 50 freshly isolated primary infiltrating ductal carcinomas of histological grade 2 and 3 were enriched from disaggregated tumour tissue as described previously. In brief, fresh tumour biopsies (1 to 2 g) were comminuted to approximately 1 mm³, using scalpel blades, and subjected to a controlled disaggregation using 0.25% collagenase Type 1 (Sigma-Aldrich, Dorset, UK) in L-15 medium with 2% fetal calf serum for 4 to 6 h with intermittent shaking. After brief settling, the supernatant was spun down, and the pellet resuspended in L-15 medium and passed through a 100 μm mesh filter to remove residual undisaggregated tumour fragments, plus disaggregated 'normal' organoids and ducts as well as lobules and ducts distended with ductal carcinoma in situ, leaving only small clusters and single cells. The latter were then reacted with the mouse monoclonal antibody F19 to fibroblast activation protein bound to sheep anti-mouse coated Dynabeads (Dynal, Paisley, UK) using the manufacturer's protocols. Almost all desmoplastic fibroblasts associated with breast cancers express this antigen strongly. Cells attached to beads were removed with a Dynal MP40 magnet; F19-negative cells were then allowed to sediment under unit gravity for 2 to 3 h (to remove most lymphocytes). The resulting preparation was then screened by phase contrast microscopy to identify those preparations in which there were few if any microvessels (the other main potential stromal contaminant not removed by fibroblast activation protein sorting), or normal tissue elements, such as ducts or acini. Of the 50 samples, 15 were selected for this study, based on the criteria of ≥80% malignant cell content as determined by phase-contrast examination, ≥80% viability (as determined by trypan blue exclusion) and the integrity of the total RNA. The purity of both normal and malignant epithelial preparations is illustrated in Additional file 1.
Informed consent to use this material for scientific research was obtained, and details of the pathology of the individual tumours are given as Additional file 2. RNA was prepared from individual samples by standard Trizol methods and pooled to give a luminal, a myoepithelial and a malignant RNA sample of >1 mg for analysis.
MPSS was performed by Lynx Therapeutics (CA, USA) according to the Megaclone 'signature' protocol [18, 19]. Briefly, for each library synthesis, after DNase treatment of approximately 300 μg total RNA from normal luminal and malignant breast epithelial pools, cDNA was generated from poly(A)+ RNA, and amplified copies of each cDNA clone were attached to beads. The sequence adjacent to the poly(A) proximal DpnII site was determined by cycles of ligations to fluorescently tagged 'decoding' oligonucleotides and cleavages by restriction enzymes. Each sequence signature comprises the DpnII restriction recognition site (GATC) and 13 contiguous nucleotides. The raw data resulted from four sequencing runs, collected in two reading frames offset by two nucleotides relative to the anchoring restriction enzyme site and generating approximately 2 to 3 × 10⁶ sequences. Signatures that were seen in at least two independent runs (reproducible) and were present at a frequency of more than three transcripts per million (tpm) in one sample (significant) were selected for further analysis.
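The signature definition and the two selection criteria described above can be illustrated with a minimal Python sketch. It is not the Lynx Therapeutics pipeline: the function names, the input layout (`runs`, `tpm`) and the reading of "in one sample" as "in at least one sample" are assumptions made for illustration only.

```python
DPNII_SITE = "GATC"
SIGNATURE_LENGTH = 17  # 4-nt DpnII site plus 13 contiguous nucleotides


def poly_a_proximal_signature(cdna):
    """Return the 17-nt signature starting at the 3'-most GATC, or None.

    Assumption: a site with fewer than 13 nucleotides after it yields no
    signature in this simplified model.
    """
    seq = cdna.upper()
    pos = seq.rfind(DPNII_SITE)  # last occurrence = closest to the poly(A) tail
    if pos == -1 or pos + SIGNATURE_LENGTH > len(seq):
        return None
    return seq[pos:pos + SIGNATURE_LENGTH]


def is_reproducible(counts_per_run, min_runs=2):
    """Seen in at least `min_runs` independent sequencing runs."""
    return sum(1 for count in counts_per_run if count > 0) >= min_runs


def is_significant(tpm_per_sample, min_tpm=3.0):
    """Present at more than `min_tpm` transcripts per million in a sample."""
    return any(tpm > min_tpm for tpm in tpm_per_sample.values())


def select_signatures(records):
    """Keep signatures that are both reproducible and significant.

    `records` maps signature -> {"runs": [counts], "tpm": {sample: tpm}}
    (a hypothetical layout chosen for this sketch).
    """
    return [
        signature
        for signature, record in records.items()
        if is_reproducible(record["runs"]) and is_significant(record["tpm"])
    ]
```

With this layout, a signature seen in only one of the four runs is dropped regardless of its abundance, and a signature at exactly 3 tpm is dropped because the threshold is strict.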
As a basis for the matching of signature sequences to transcripts, we used our own reconstitution of the human transcriptome database (HTR) [21–23] based on a comprehensive set of cDNA to genome alignments that are merged into gene models representing the detailed structure of human transcribed regions. Each HTR contains a cluster of cDNA sequences, similar to the NCBI/UniGene database. The annotation of the signature was then performed in two steps as described previously, using the NCBI35 assembly of the human genome. Firstly, a 'signature-centric' annotation was performed, where sequence signatures were mapped to either one or more transcribed regions of the genome, including repetitive sequences, ribosomal, mitochondrial and non-mapped transcripts. In the second step, only signatures from the 'signature-centric' annotation that matched exactly or had one nucleotide mismatch to known transcribed regions were retained to form the 'gene-centric' version. When different sequence signatures mapped to the same gene, counts were combined. To identify genes with significant differences (P value ≤ 0.05) in representation in the two RNA pools, the absolute difference in abundance between the malignant and the normal epithelial RNA sample was determined and log2 transformed, resulting in a relative expression measurement.
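The gene-centric step, in which counts from different signatures mapping to the same gene are combined, can be sketched as follows. The relative expression measure shown here is a log2 ratio with a pseudocount of 1 tpm, which is an assumption introduced so that transcripts absent from one sample remain defined; the exact transformation and significance test used in the study are not specified in enough detail to reproduce, and all names are hypothetical.

```python
import math
from collections import defaultdict


def gene_level_tpm(signature_tpm, signature_to_gene):
    """Sum tpm over all signatures assigned to the same gene (HTR cluster).

    Signatures without a gene assignment are ignored.
    """
    totals = defaultdict(float)
    for signature, tpm in signature_tpm.items():
        gene = signature_to_gene.get(signature)
        if gene is not None:
            totals[gene] += tpm
    return dict(totals)


def log2_ratio(tumour_tpm, normal_tpm, pseudocount=1.0):
    """Relative expression as log2((T + c) / (N + c)).

    The pseudocount c is an illustrative assumption, not taken from the paper.
    """
    return math.log2((tumour_tpm + pseudocount) / (normal_tpm + pseudocount))
```

A positive value indicates higher abundance in the malignant pool (T > L) and a negative value higher abundance in the normal luminal pool (L > T).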
The same total RNA pools were hybridised onto a 20 k cDNA microarray (20 k brk, constructed at The Breakthrough Breast Cancer Research Centre, Institute of Cancer Research, London, UK, containing 19,391 sequence-validated IMAGE clones), Affymetrix Human Genome U133 Plus 2.0 GeneChip (Affymetrix, Inc., Santa Clara, CA, USA), CodeLink™ Human Whole Genome Bioarray (GE Healthcare, formerly Amersham Biosciences, Chandler, AZ, USA) and Agilent Whole Human Genome Oligo Microarray 44 k cDNA array (Agilent Technologies, Palo Alto, CA, USA). Three technical replicates of each RNA pool were amplified, labelled and hybridised according to the manufacturers' guidelines. Where necessary, an RNA pool consisting of breast cancer cell lines was used as a reference sample and dye-swap hybridisations were performed. All primary array data are available through ArrayExpress under the accession number E-TABM-66 and comply with MIAME standards. Overlay of each microarray platform with MPSS was done by mapping the sequence information of probes and probe sets to the same HTR database as used for MPSS tag mapping (see above). Only those microarray features that were unambiguously mapped to a single HTR cluster were included for further studies. All preprocessing of each microarray platform and further statistical analysis was performed in the R 2.1.1 environment, making extensive use of the limma package in BioConductor 1.6. For the Affymetrix platform, probe-level data were normalised and expression data were summarised by robust multi-array analysis; cyclic lowess normalisation was applied to the CodeLink™ expression data through the codelink 0.7.2 package in R 2.3; for the Agilent microarrays, global normalisation with no background correction was applied; and for the 20 k brk microarrays, raw expression data were print-tip normalised and background corrected.
Relative measurements for each transcript were given as a log2 fold ratio, and only genes with an adjusted P ≤ 0.05 after Benjamini and Hochberg's false discovery rate correction were regarded as significantly differentially expressed.
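For readers unfamiliar with the Benjamini and Hochberg adjustment, the step-up procedure can be written in a few lines. The study performed this step in R with the limma package; the plain Python function below is only an illustrative re-implementation of the standard procedure, and its name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order.

    For the P value of rank r (1 = smallest) among m tests, the adjusted
    value is min over ranks >= r of p * m / rank, capped at 1.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so the running minimum enforces
    # monotonicity of the adjusted values.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


def significant_genes(gene_p_values, alpha=0.05):
    """Genes whose adjusted P value is at or below `alpha`."""
    genes = list(gene_p_values)
    adjusted = benjamini_hochberg([gene_p_values[g] for g in genes])
    return [g for g, p in zip(genes, adjusted) if p <= alpha]
```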
Genes were categorised with respect to their biological process, cellular role and molecular function using Onto-Express (OE) [29, 30]. The most significantly perturbed biological processes were determined with respect to the number of genes expected for each Gene Ontology (GO) category based on their representation on the Affymetrix U133 Plus 2.0 array. Statistical significance was determined by using OE's hypergeometric probability distribution and Bonferroni correction options, and annotations with P ≤ 0.05 were accepted as significant. Gene set enrichment analysis (GSEA) comparing luminal and myoepithelial gene signatures was done using described methods. Biological processes were ranked according to their significance of enrichment, and the validation mode measure of significance was used to identify those of greatest enrichment.
Total RNA (10 μg) from the normal luminal epithelial and the malignant epithelial RNA pool was used for each 40 μl reverse-transcription reaction, and 10 μl of 1/50 diluted cDNA was used per 30 μl PCR. RT-PCR was performed using AmpliTaq Gold (Applied Biosystems, Cheshire, UK) with either 25 or 30 cycles, each consisting of 30 s at 94°C, 30 s at 55°C, and 45 s at 72°C. PCR products were visualised on 2% agarose E-Gel 96 gels (Invitrogen Life Technologies, Carlsbad, CA, USA).
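The over-representation test behind this kind of GO analysis can be sketched with the hypergeometric upper tail and a Bonferroni correction. This is not Onto-Express itself; the function names and the toy numbers are illustrative assumptions, and the reference set would correspond to the genes represented on the array.

```python
from math import comb


def hypergeom_upper_tail(k, population, annotated, drawn):
    """P(X >= k) when `drawn` genes are sampled without replacement from a
    `population` in which `annotated` genes carry the GO term."""
    upper = min(annotated, drawn)
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(k, upper + 1)
    )
    return favourable / comb(population, drawn)


def bonferroni(p_value, n_tests):
    """Bonferroni-corrected P value, capped at 1."""
    return min(1.0, p_value * n_tests)


def enriched_terms(term_counts, population, drawn, alpha=0.05):
    """Return GO terms with corrected P <= alpha.

    `term_counts` maps term -> (genes with the term in the selected list,
    genes with the term in the reference set); a hypothetical layout.
    """
    n_tests = len(term_counts)
    hits = []
    for term, (k, annotated) in term_counts.items():
        p = hypergeom_upper_tail(k, population, annotated, drawn)
        if bonferroni(p, n_tests) <= alpha:
            hits.append(term)
    return hits
```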
Immunohistochemistry and tissue microarray analysis
A cohort of invasive breast carcinomas from 245 patients treated with surgery (wide local excision or mastectomy) and adjuvant anthracycline-based chemotherapy was retrieved from the Department of Histopathology files of the Royal Marsden Hospital (London, UK) with appropriate local Ethical Committee approval. Representative blocks were reviewed by a pathologist (JSRF) and selected cores were incorporated in two duplicate tissue microarray (TMA) blocks [32, 33]. Full details of the TMA are given as Additional file 3. TMA samples were dewaxed in xylene, cleared in absolute ethanol and blocked in methanol for 10 minutes. Antigen retrieval for cartilage oligomeric matrix protein (COMP) and IL8 was by boiling slides in citrate buffer (pH 6) for 2 minutes in a pressure cooker, after which they were blocked with normal horse serum (2.5% for 20 minutes; Vector Laboratories (VL), Burlingame, CA, USA) and endogenous biotin was blocked by pre-incubating with avidin (15 minutes) and biotin (15 minutes). They were then incubated with anti-COMP antibody (1/50; Serotec, Oxford, UK) or IL8 antibody (1/5; Serotec) for 1 h at room temperature. For immunohistochemistry of periostin (POSTN), sections were pretreated by microwaving in Dakocytomation (Glostrup, Denmark) pH 6 antigen retrieval buffer for 18 minutes and blocked, and anti-POSTN antibody (1/1500; Biovendor Laboratory, Heidelberg, Germany) was applied for 30 minutes at room temperature. Antibody binding was detected using Vectastain Universal ABC (VL) and visualised with 3,3'-diaminobenzidine (DAKO Corporation, Glostrup, Denmark). Full details on the distribution of ER, PR, HER2, EGFR, CK 14, CK 5/6, and CK 17, as well as P53 (DO7, 1/200; DAKO Corporation), are described elsewhere and summarised in Additional file 3.
To evaluate the proliferative activity of tumour cells, immunohistochemical detection of the Ki-67 nuclear antigen, which is associated with cell proliferation, was carried out with the MIB1 antibody (1/300; DAKO Corporation) under the same conditions. For these markers, only nuclear staining was considered specific. Ki67 (MIB1) staining was scored low if less than 10% of neoplastic cells were positive, intermediate if 10% to 30% of neoplastic cells were positive and high if more than 30% of neoplastic cells were positive. Tumours were scored positive for P53 if >10% of the nuclei of neoplastic cells displayed strong staining.
Cumulative survival probabilities were calculated using the Kaplan-Meier method. Differences in disease-free interval and survival were tested with the log-rank test (two-tailed, 95% confidence interval) using the statistical software Statview 5.0 (NC, USA). Multivariate analysis was performed using the Cox proportional hazards model. A P value < 0.05 in the univariate survival analysis was used as the limit for inclusion in the multivariate model.
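The scoring cut-offs stated above translate directly into two small functions. The function names are hypothetical, and the input is assumed to be the percentage of neoplastic nuclei judged positive (for P53, strongly positive) by the scoring pathologist.

```python
def score_ki67(percent_positive):
    """Classify Ki67 (MIB1) staining: <10% low, 10-30% intermediate, >30% high."""
    if not 0 <= percent_positive <= 100:
        raise ValueError("percentage must lie between 0 and 100")
    if percent_positive < 10:
        return "low"
    if percent_positive <= 30:
        return "intermediate"
    return "high"


def p53_positive(percent_strong_nuclei):
    """P53 is scored positive if more than 10% of nuclei stain strongly."""
    if not 0 <= percent_strong_nuclei <= 100:
        raise ValueError("percentage must lie between 0 and 100")
    return percent_strong_nuclei > 10
```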
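As a reminder of what the Kaplan-Meier estimate computes, the product-limit formula S(t) = Π (1 − d_i / n_i) over event times t_i ≤ t can be sketched as below. The analysis in this study was run in Statview; this function is an illustrative stand-in with hypothetical names, and it omits the log-rank test and confidence intervals.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    `times` are follow-up times; `events` are 1 for an observed event and
    0 for a censored observation. Returns [(time, survival)] at each time
    where at least one event occurred.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both events and censored subjects leave the risk set after time t.
        at_risk -= leaving
    return curve
```

Subjects censored at a given time are still counted as at risk for events at that same time, which is the usual convention.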
MPSS analysis of normal luminal and malignant breast cancer cells
Numerical analysis of massively parallel signature sequencing
Columns: Malignant breast epithelium | Normal luminal epithelium
Rows: Uniquely mapped signatures; Unique HTR clusters; Differentially expressed transcripts
Differentially expressed transcripts: 4,311 T > L | 2,242 L > T
Establishing a microarray validated transcriptome
The MPSS derived transcriptomes were compared with gene expression profiles of the same RNA pools obtained using three oligonucleotide genome-wide microarrays (Affymetrix U133 Plus 2.0 GeneChip, CodeLink™ Human Whole Genome Bioarray and Agilent Whole Human Genome Oligo Microarray 44 k cDNA array) and the 20 k brk cDNA microarray. These different microarray platforms were chosen to achieve the highest possible coverage of known transcribed sequences. Features from all platforms were mapped to HTR clusters and our analysis was restricted to those that mapped unambiguously to one HTR cluster. For the Affymetrix platform 41,322 of 54,613 (75.66%) features could be assigned to unique HTR clusters; for CodeLink™ 28,949 of 54,841 (52.78%); for Agilent 32,402 of 44,290 (73.15%); and for the 20 k brk 12,055 of 19,959 (60.4%). Overlay of the transcript coverage of each microarray demonstrated that each platform contributed a set of unique genes as well as those common to other platforms, justifying the use of more than one microarray platform. (Full annotation of each microarray platform to HTR clusters is available as Additional file 5.) The microarray features of all four platforms provided a total coverage of 26,103 HTRs, and 6,342 out of 6,553 (96.8%) of the differentially expressed transcripts obtained by MPSS were represented on one or more of these genome-wide platforms.
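The overlay logic described above (keep only features that map to exactly one HTR cluster, pool the clusters covered by all platforms, then ask what fraction of the MPSS differential transcripts is represented) can be sketched as follows. The data layout, with each feature mapped to a set of HTR cluster identifiers, and all function names are assumptions made for illustration.

```python
def unambiguous_features(feature_to_clusters):
    """Keep features that map to exactly one HTR cluster.

    `feature_to_clusters` maps feature id -> set of HTR cluster ids.
    Returns feature id -> the single cluster id.
    """
    return {
        feature: next(iter(clusters))
        for feature, clusters in feature_to_clusters.items()
        if len(clusters) == 1
    }


def combined_coverage(platforms):
    """Union of HTR clusters covered by unambiguous features on any platform."""
    covered = set()
    for mapping in platforms.values():
        covered.update(unambiguous_features(mapping).values())
    return covered


def percent_represented(mpss_clusters, covered):
    """Percentage of MPSS-derived clusters present in the combined coverage."""
    return 100.0 * len(mpss_clusters & covered) / len(mpss_clusters)
```

Applied to the figures reported above, 6,342 represented clusters out of 6,553 corresponds to 100 × 6,342 / 6,553 ≈ 96.8%.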
Functional classification of differentially expressed genes
The most significantly perturbed functional gene set identified in the down-regulated tumour epithelial transcriptome (Figure 2b) was epidermis development, including members of the kallikrein family (KLK5, KLK7, KLK8, KLK10) and the keratin family (CK10, CK14), as well as the family of extracellular matrix glycoproteins, such as LAMC2, LAMB3 and LAMA3. The second most perturbed subset of down-regulated genes included several members of the RAS-related proteins, RAP1A, RALB, RAB5B, RAB4A, RAB3B, RAB2 and RAB25 (protein transport; Figure 2b), some of which counteract the mitogenic function of RAS-MAPK signalling pathways.
Differentially expressed transcripts in normal breast epithelial cells
Enrichment of luminal and myoepithelial genes and gene sets in the differential tumour epithelial transcriptome
Clinical significance of POSTN using tissue microarray analysis
Multivariate proportional-hazard analysis
Column: Hazard ratio (95% confidence interval)
Using highly enriched populations of malignant breast epithelial cells and normal epithelial cells, obtained from immunomagnetic cell sorting, we have established genome-wide molecular signatures specific to the epithelial compartments of both the normal and the malignant human breast. Combining gene profiles obtained from different expression platforms, including direct high-throughput sequencing (MPSS) and multiple microarray platforms, yielded a validated transcriptome comprising 8,051 differential transcripts. These data provide a basis for the molecular changes that occur in the transition from normal luminal to malignant epithelial cells, and also allow further analysis of solid breast tumour (neoplastic plus stroma) gene expression studies, enabling those genes of specific epithelial origin to be identified in respect to progression, prediction of outcome and metastasis. The expression data obtained from the normal luminal and myoepithelial cells have extended our previous analysis of these normal cell types, and provide gene sets that can be used to comprehensively specify the epithelial phenotype expressed in breast tumours, as well as defining new markers of each cell type.
The data presented here report for the first time the application and validation of the MPSS sequencing technology to malignant human breast epithelial cells and their normal counterparts. MPSS expression studies of different human cell lines and normal tissues have already shown that this technology represents the most comprehensive sequencing methodology available at present, in terms of gene coverage and quantitative assessment of gene expression [22, 39]. With over 10⁶ sequencing reactions per sample [18, 19], it is comparable in scope with the now commonly used genome-wide microarray profiling methods, as also used in the present study. Comparative studies of genome wide data sets are entirely dependent on the choice of common denominator for annotation. By using our sequence based mapping, 97% of MPSS tags could be aligned with individual features on genome-wide microarrays, indicating that the vast majority of the expressed sequence tags in the normal and malignant breast epithelium MPSS libraries represent known transcripts, in agreement with the recent data suggesting that MPSS identifies very few truly novel genes. Given the significant methodological differences between microarray and MPSS analysis, the fact that more than 65% of our MPSS differential data set showed concordance with expression profiling obtained by several different microarray platforms represents a good overlap compared with other examples of sequence versus array data. However, a substantial number of differentially expressed genes (4,149) measured on at least two microarray platforms were not identified as such by MPSS, and a significant number of MPSS differential transcripts (2,440) were not confirmed on any array (Figure 1), implying a relatively high false positive and false negative rate of the MPSS methodology.
This probably reflects the known limitations of the MPSS technology, particularly with regard to transcripts that were not detected (zero counts) in one sample, as well as genes lacking appropriate restriction enzyme sites required for this technology. However, individual microarray platforms themselves differ substantially and a multiplatform approach, as used here, clearly defines a robust differential tumour epithelial transcriptome (DTET) seen by every technology.
Another important feature of our DTET is the use of purified epithelial cells, derived by both positive and negative immunomagnetic sorting in which the contamination of malignant samples with stromal cells is reduced to a minimum, and normal luminal and myoepithelial cells are separated from short-term primary cultures. Although the profiling techniques used represent the global transcriptomes of purified normal and neoplastic breast epithelial cells in highly enriched preparations, it is conceivable that even a small contamination of the malignant samples by normal or reactive stromal cells, as well as the induction of inflammatory genes due to in vitro manipulation, could result in false positives. However, verification of the probable epithelial origin of differentially expressed genes can be obtained by comparing expression data from breast epithelial cell lines, breast tumour cell lines or, as in the present study, by immunohistochemistry, all of which show that, for example, IL8 is a bona fide epithelial tumour-associated product [43, 44]. One of the features of normal luminal epithelial cultures is the loss of estrogen receptor expression. The microarray gene expression profiling currently used to classify breast cancers supports the paradigm that ER status is the most important phenotype in breast cancer and has led to the classification of breast cancers into luminal A (ER-positive good prognosis) and luminal B (ER-positive poor prognosis), and ER-negative myoepithelial/basal and HER2 subtypes, each with distinct differences in prognosis and response to therapy [4, 5, 46]. Genes identified in this study representing the normal luminal epithelial phenotype are distinct from the subset of genes that are associated with ER expression and are used to classify 'luminal' breast tumours. Thus, we are able to define the luminal phenotype independently of ER status.
In contrast, our myoepithelial signature contains several members of the previously reported gene clusters identifying basal-like breast cancers. Some of these have been previously identified as myoepithelial genes in the normal breast epithelium, for example, TIMP3, SPARC, JAG1, PRSS11 and CAV-1, and some of them, such as S100A7, SPARC and CNN1, have previously been shown individually to be correlated to poor outcome [5, 11, 47]. Since our cell type specific gene signatures were derived from phenotypically well characterised cell types rather than from empirical stratification based on expression data, we were also able to identify a range of myoepithelial type genes in ER-positive tumours as well as those in basal-like breast cancers. Thus, although the majority of the primary breast tumours within our malignant pool were ER-positive 'luminal' tumours, a significant number of up-regulated gene sets also showed myoepithelial expression. The observation of myoepithelial genes such as SFRP2, DCN, POSTN, LUM, COL1A2 and COL11A1, which showed higher expression in ER-positive compared to ER-negative breast tumours in two other breast cancer tumour profiling studies [48, 49], proved the value of such an approach and demonstrated the heterogeneity of breast tumours with respect to the levels of luminal epithelial and myoepithelial gene expression. The potential clinical significance of the expression of myoepithelial/basal genes in ER-positive tumours has been highlighted by recent data showing that the promoter DNA methylation of the classic myoepithelial marker S100A2 is correlated with a poor prognosis in ER-positive tumours. In contrast, increased expression of phosphoserine aminotransferase (encoded by PSAT1, another gene identified in our myoepithelial transcriptome) was the strongest predictive marker for a poor response to tamoxifen therapy in ER-positive tumours.
Our observation that the malignant epithelial expression of POSTN, also a myoepithelial/basal gene, is associated with poorer survival (P = 0.0083) in ER-positive tumours demonstrates that the normal epithelial annotation of tumour transcripts can identify many other types of myoepithelial/basal genes, including those associated with a poor outcome.
An important question is whether the expression of myoepithelial/basal genes in breast cancers is responsible for the prognosis and poor response to therapy or is merely a surrogate marker thereof. There are several lines of evidence to suggest that POSTN may play a role in the biology of breast cancer [51, 52]. POSTN is a ligand of αvβ3 integrins and promotes adhesion and migration of epithelial cells. Clinical studies of periostin expression in human cancers have demonstrated that increased expression of POSTN is correlated with tumour angiogenesis and metastasis [52–54]. In primary breast tumours, POSTN causes up-regulation of vascular endothelial growth factor receptor (VEGFR)-2 in endothelial cells. Elevated expression of VEGFs, the ligands for the VEGF receptors, as observed in some breast carcinomas as well as in our study, provides synergistic paracrine signalling through VEGFR-2 on endothelial cells, potentially promoting angiogenesis and dissemination. Although the expression of POSTN shows a weak correlation with Ki67 immunoreactivity, there is no evidence to suggest that POSTN itself influences proliferation or is a surrogate marker of proliferation rate. Rather, it seems more likely that its prognostic significance may be due to the altered therapeutic responses of POSTN positive tumours to drugs like tamoxifen. The fact that tumour-specific expression of VEGFR-2 has been associated with an impaired response to tamoxifen therapy in ER-positive premenopausal breast cancer is in line with the poor prognosis of this cohort of breast cancers. Therefore, further studies are required to investigate if POSTN positivity is correlated with VEGFR-2 expression, thereby providing a molecular mechanism that links POSTN to endocrine resistance for ER-positive breast tumours.
Metastasis to bone occurs frequently in advanced breast cancer and is accompanied by debilitating skeletal complications. Among the up-regulated gene sets in the malignant sample with enrichment in myoepithelial/basal type genes in this study was a small family of genes involved in bone remodelling and skeletal development. Their expression in the human breast epithelial cells, including the normal myoepithelial cells, indicates that they play a significant role in epithelial cell biology, in addition to mesenchymal development. Many of these mesenchymal-specific genes, associated with osteoblasts, have previously been found overexpressed in other primary breast tumours. By acquiring the expression of such mesenchymal genes, the malignant epithelial breast cells may have an advantage in growth in the bone environment correlating with progression into a more aggressive cancer phenotype. Targeting such genes and proteins might, therefore, be a means of suppressing this phenomenon.
In the past decade, several different expression and proteomics studies on purified populations of normal luminal and myoepithelial cells, as well as tumour-enriched cell populations, have been carried out [11–13, 58, 59]. Genes characterising these cell types have been identified, some of which showed altered expression levels in the malignant compared to the normal breast epithelium. In this study, we have taken this profiling forward by comprehensively defining the transcriptomes of highly enriched normal and malignant breast epithelial cell populations on a genome-wide scale using multiple technologies. We present here, for the first time, co-regulated breast tumour-associated gene sets enriched in either luminal or myoepithelial-type genes. These data are important for evaluating the breast cancer stratification systems based on established expression profiling, in which luminal and basal phenotypes have been shown to be prognostically significant. Further analysis of these related gene subsets, including expression studies in individual tumours, will assist in our understanding of the mechanisms involved in the initiation and progression of breast cancer, and the loss or acquisition of luminal or myoepithelial phenotypes in breast tumours. This will lead to the identification of additional luminal and basal markers and targets, with importance in the biology of breast cancer and its treatment.
COMP = cartilage oligomeric matrix protein
DTET = differential tumour epithelial transcriptome
GSEA = gene set enrichment analysis
HTR = human transcriptome database
MAPK = mitogen-activated protein kinases
MIAME = minimum information about a microarray experiment
MPSS = massively parallel signature sequencing
RT-PCR = reverse transcription PCR
SAGE = serial analysis of gene expression
tpm = transcripts per million
VEGF = vascular endothelial growth factor
VEGFR = vascular endothelial growth factor receptor.
This work was funded by the Ludwig Institute for Cancer Research and Breakthrough Breast Cancer. Further thanks are due to Prof. K Felsenstein, Vienna University of Technology, Austria, for statistical advice; and Prof. S Lakhani for reviewing the pathology of the tumours used for MPSS and microarray analysis.
1. Simpson PT, Reis-Filho JS, Gale T, Lakhani SR: Molecular evolution of breast cancer. J Pathol. 2005, 205: 248-254. 10.1002/path.1691.
2. Abd El-Rehim DM, Ball G, Pinder SE, Rakha E, Paish C, Robertson JF, Macmillan D, Blamey RW, Ellis IO: High-throughput protein expression analysis using tissue microarray technology of a large well-characterised series identifies biologically distinct classes of breast cancer confirming recent cDNA expression analyses. Int J Cancer. 2005, 116: 340-350. 10.1002/ijc.21004.
3. Nielsen TO, Hsu FD, Jensen K, Cheang M, Karaca G, Hu Z, Hernandez-Boussard T, Livasy C, Cowan D, Dressler L, et al: Immunohistochemical and clinical characterization of the basal-like subtype of invasive breast carcinoma. Clin Cancer Res. 2004, 10: 5367-5374. 10.1158/1078-0432.CCR-04-0220.
4. Perou CM, Sorlie T, Eisen MB, van de Rijn M, Jeffrey SS, Rees CA, Pollack JR, Ross DT, Johnsen H, Akslen LA, et al: Molecular portraits of human breast tumours. Nature. 2000, 406: 747-752. 10.1038/35021093.
5. Sorlie T, Perou CM, Tibshirani R, Aas T, Geisler S, Johnsen H, Hastie T, Eisen MB, van de Rijn M, Jeffrey SS, et al: Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proc Natl Acad Sci USA. 2001, 98: 10869-10874. 10.1073/pnas.191367098.
6. Rouzier R, Perou CM, Symmans WF, Ibrahim N, Cristofanilli M, Anderson K, Hess KR, Stec J, Ayers M, Wagner P, et al: Breast cancer molecular subtypes respond differently to preoperative chemotherapy. Clin Cancer Res. 2005, 11: 5678-5685. 10.1158/1078-0432.CCR-04-2421.
7. Clarke C, Titley J, Davies S, O'Hare MJ: An immunomagnetic separation method using superparamagnetic (MACS) beads for large-scale purification of human mammary luminal and myoepithelial cells. Epithelial Cell Biol. 1994, 3: 38-46.
8. O'Hare MJ, Ormerod MG, Monaghan P, Lane EB, Gusterson BA: Characterization in vitro of luminal and myoepithelial cells isolated from the human mammary gland by cell sorting. Differentiation. 1991, 46: 209-221. 10.1111/j.1432-0436.1991.tb00883.x.
9. Adam PJ, Berry J, Loader JA, Tyson KL, Craggs G, Smith P, De Belin J, Steers G, Pezzella F, Sachsenmeir KF, et al: Arylamine N-acetyltransferase-1 is highly expressed in breast cancers and conveys enhanced growth and resistance to etoposide in vitro. Mol Cancer Res. 2003, 1: 826-835.
10. Page MJ, Amess B, Townsend RR, Parekh R, Herath A, Brusten L, Zvelebil MJ, Stein RC, Waterfield MD, Davies SC, et al: Proteomic definition of normal human luminal and myoepithelial breast cells purified from reduction mammoplasties. Proc Natl Acad Sci USA. 1999, 96: 12589-12594. 10.1073/pnas.96.22.12589.
11. Jones C, Mackay A, Grigoriadis A, Cossu A, Reis-Filho JS, Fulford L, Dexter T, Davies S, Bulmer K, Ford E, et al: Expression profiling of purified normal human luminal and myoepithelial breast cells: identification of novel prognostic markers for breast cancer. Cancer Res. 2004, 64: 3037-3045. 10.1158/0008-5472.CAN-03-2028.
12. Porter DA, Krop IE, Nasser S, Sgroi D, Kaelin CM, Marks JR, Riggins G, Polyak K: A SAGE (serial analysis of gene expression) view of breast tumor progression. Cancer Res. 2001, 61: 5697-5702.
13. Zucchi I, Mento E, Kuznetsov VA, Scotti M, Valsecchi V, Simionati B, Vicinanza E, Valle G, Pilotti S, Reinbold R, et al: Gene expression profiles of epithelial cells microscopically isolated from a breast-invasive ductal carcinoma and a nodal metastasis. Proc Natl Acad Sci USA. 2004, 101: 18147-18152. 10.1073/pnas.0408260101.
14. Chang JC, Hilsenbeck SG, Fuqua SA: The promise of microarrays in the management and treatment of breast cancer. Breast Cancer Res. 2005, 7: 100-104. 10.1186/bcr1018.
15. Robison JE, Perreard L, Bernard PS: State of the science: molecular classifications of breast cancer for clinical diagnostics. Clin Biochem. 2004, 37: 572-578. 10.1016/j.clinbiochem.2004.05.002.
16. Shen D, He J, Chang HR: In silico identification of breast cancer genes by combined multiple high throughput analyses. Int J Mol Med. 2005, 15: 205-212.
17. van Ruissen F, Ruijter JM, Schaaf GJ, Asgharnegad L, Zwijnenburg DA, Kool M, Baas F: Evaluation of the similarity of gene expression data estimated with SAGE and Affymetrix GeneChips. BMC Genomics. 2005, 6: 91. 10.1186/1471-2164-6-91.
18. Brenner S, Johnson M, Bridgham J, Golda G, Lloyd DH, Johnson D, Luo S, McCurdy S, Foy M, Ewan M, et al: Gene expression analysis by massively parallel signature sequencing (MPSS) on microbead arrays. Nat Biotechnol. 2000, 18: 630-634. 10.1038/76469.
19. Brenner S, Williams SR, Vermaas EH, Storck T, Moon K, McCollum C, Mao JI, Luo S, Kirchner JJ, Eletr S, et al: In vitro cloning of complex mixtures of DNA on microbeads: physical separation of differentially expressed cDNAs. Proc Natl Acad Sci USA. 2000, 97: 1665-1670. 10.1073/pnas.97.4.1665.
20. ArrayExpress. [http://www.ebi.ac.uk/arrayexpress/]
21. Iseli C, Stevenson BJ, de Souza SJ, Samaia HB, Camargo AA, Buetow KH, Strausberg RL, Simpson AJ, Bucher P, Jongeneel CV: Long-range heterogeneity at the 3' ends of human mRNAs. Genome Res. 2002, 12: 1068-1074. 10.1101/gr.62002. Article published online before print in June 2002.
22. Jongeneel CV, Iseli C, Stevenson BJ, Riggins GJ, Lal A, Mackay A, Harris RA, O'Hare MJ, Neville AM, Simpson AJ, et al: Comprehensive sampling of gene expression in human cell lines with massively parallel signature sequencing. Proc Natl Acad Sci USA. 2003, 100: 4702-4705. 10.1073/pnas.0831040100.
23. Naef F, Huelsken J: Cell-type-specific transcriptomics in chimeric models using transcriptome-based masks. Nucleic Acids Res. 2005, 33: e111. 10.1093/nar/gni104.
24. The R Project for Statistical Computing. [http://www.r-project.org/]
25. Smyth GK: Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Stat Appl Genet Mol Biol. 2004, 3: Article 3.
26. Bioconductor. [http://www.bioconductor.org/]
27. Irizarry RA, Hobbs B, Collin F, Beazer-Barclay YD, Antonellis KJ, Scherf U, Speed TP: Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics. 2003, 4: 249-264. 10.1093/biostatistics/4.2.249.
28. Benjamini Y, Hochberg Y: Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc. 1995, 57: 289-300.
29. Draghici S, Khatri P, Bhavsar P, Shah A, Krawetz SA, Tainsky MA: Onto-Tools, the toolkit of the modern biologist: Onto-Express, Onto-Compare, Onto-Design and Onto-Translate. Nucleic Acids Res. 2003, 31: 3775-3781. 10.1093/nar/gkg624.
30. Onto-Express. [http://vortex.cs.wayne.edu/ontoexpress]
31. Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, et al: Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci USA. 2005, 102: 15545-15550. 10.1073/pnas.0506580102.
32. Reis-Filho JS, Savage K, Lambros MB, James M, Steele D, Jones RL, Dowsett M: Cyclin D1 protein overexpression and CCND1 amplification in breast carcinomas: an immunohistochemical and chromogenic in situ hybridisation analysis. Mod Pathol. 2006, 19: 999-1009. 10.1038/modpathol.3800621.
33. Reis-Filho JS, Steele D, Di Palma S, Jones RL, Savage K, James M, Milanezi F, Schmitt FC, Ashworth A: Distribution and significance of nerve growth factor receptor (NGFR/p75(NTR)) in normal, benign and malignant breast tissue. Mod Pathol. 2006, 19: 307-319. 10.1038/modpathol.3800542.
34. Lehmann: Testing Statistical Hypotheses. 1986, New York: Wiley.
35. Dai H, van't Veer L, Lamb J, He YD, Mao M, Fine BM, Bernards R, van de Vijver M, Deutsch P, Sachs A, et al: A cell proliferation signature is a marker of extremely poor outcome in a subpopulation of breast cancer patients. Cancer Res. 2005, 65: 4059-4066. 10.1158/0008-5472.CAN-04-3953.
36. Whitfield ML, George LK, Grant GD, Perou CM: Common markers of proliferation. Nat Rev Cancer. 2006, 6: 99-106. 10.1038/nrc1802.
37. Cheng JM, Ding M, Aribi A, Shah P, Rao K: Loss of RAB25 expression in breast cancer. Int J Cancer. 2006, 118: 2957-2964. 10.1002/ijc.21739.
38. Newton G, Weremowicz S, Morton CC, Copeland NG, Gilbert DJ, Jenkins NA, Lawler J: Characterization of human and mouse cartilage oligomeric matrix protein. Genomics. 1994, 24: 435-439. 10.1006/geno.1994.1649.
39. Jongeneel CV, Delorenzi M, Iseli C, Zhou D, Haudenschild CD, Khrebtukova I, Kuznetsov D, Stevenson BJ, Strausberg RL, Simpson AJ, et al: An atlas of human gene expression from massively parallel signature sequencing (MPSS). Genome Res. 2005, 15: 1007-1014. 10.1101/gr.4041005.
40. Irizarry RA, Warren D, Spencer F, Kim IF, Biswal S, Frank BC, Gabrielson E, Garcia JG, Geoghegan J, Germino G, et al: Multiple-laboratory comparison of microarray platforms. Nat Methods. 2005, 2: 345-350. 10.1038/nmeth756.
41. Oudes AJ, Roach JC, Walashek LS, Eichner LJ, True LD, Vessella RL, Liu AY: Application of Affymetrix array and massively parallel signature sequencing for identification of genes involved in prostate cancer progression. BMC Cancer. 2005, 5: 86. 10.1186/1471-2407-5-86.
42. Draghici S, Khatri P, Eklund AC, Szallasi Z: Reliability and reproducibility issues in DNA microarray measurements. Trends Genet. 2006, 22: 101-109. 10.1016/j.tig.2005.12.005.
43. Bendre MS, Montague DC, Peery T, Akel NS, Gaddy D, Suva LJ: Interleukin-8 stimulation of osteoclastogenesis and bone resorption is a mechanism for the increased osteolysis of metastatic bone disease. Bone. 2003, 33: 28-37. 10.1016/S8756-3282(03)00086-3.
44. Green AR, Green VL, White MC, Speirs V: Expression of cytokine messenger RNA in normal and neoplastic human breast tissue: identification of interleukin-8 as a potential regulatory factor in breast tumours. Int J Cancer. 1997, 72: 937-941. 10.1002/(SICI)1097-0215(19970917)72:6<937::AID-IJC3>3.0.CO;2-Q.
45. Kothari MS, Ali S, Buluwela L, Livni N, Shousha S, Sinnett HD, Vashisht R, Thorpe P, Van Noorden S, Coombes RC, et al: Purified malignant mammary epithelial cells maintain hormone responsiveness in culture. Br J Cancer. 2003, 88: 1071-1076. 10.1038/sj.bjc.6600866.
46. Sorlie T, Tibshirani R, Parker J, Hastie T, Marron JS, Nobel A, Deng S, Johnsen H, Pesich R, Geisler S, et al: Repeated observation of breast tumor subtypes in independent gene expression data sets. Proc Natl Acad Sci USA. 2003, 100: 8418-8423. 10.1073/pnas.0932692100.
47. Emberley ED, Niu Y, Curtis L, Troup S, Mandal SK, Myers JN, Gibson SB, Murphy LC, Watson PH: The S100A7-c-Jun activation domain binding protein 1 pathway enhances poor survival pathways in breast cancer. Cancer Res. 2005, 65: 5696-5702. 10.1158/0008-5472.CAN-04-3927.
48. Wang Y, Klijn JG, Zhang Y, Sieuwerts AM, Look MP, Yang F, Talantov D, Timmermans M, Meijer-van Gelder ME, Yu J, et al: Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer. Lancet. 2005, 365: 671-679.
49. West M, Blanchette C, Dressman H, Huang E, Ishida S, Spang R, Zuzan H, Olson JA, Marks JR, Nevins JR: Predicting the clinical status of human breast cancer by using gene expression profiles. Proc Natl Acad Sci USA. 2001, 98: 11462-11467. 10.1073/pnas.201162998.
50. Martens JW, Nimmrich I, Koenig T, Look MP, Harbeck N, Model F, Kluth A, Bolt-de Vries J, Sieuwerts AM, Portengen H, et al: Association of DNA methylation of phosphoserine aminotransferase with response to endocrine therapy in patients with recurrent breast cancer. Cancer Res. 2005, 65: 4101-4117. 10.1158/0008-5472.CAN-05-0064.
51. Gillan L, Matei D, Fishman DA, Gerbin CS, Karlan BY, Chang DD: Periostin secreted by epithelial ovarian carcinoma is a ligand for alpha(V)beta(3) and alpha(V)beta(5) integrins and promotes cell motility. Cancer Res. 2002, 62: 5358-5364.
52. Shao R, Bao S, Bai X, Blanchette C, Anderson RM, Dang T, Gishizky ML, Marks JR, Wang XF: Acquired expression of periostin by human breast cancers promotes tumor angiogenesis through up-regulation of vascular endothelial growth factor receptor 2 expression. Mol Cell Biol. 2004, 24: 3992-4003. 10.1128/MCB.24.9.3992-4003.2004.
53. Sasaki H, Dai M, Auclair D, Kaji M, Fukai I, Kiriyama M, Yamakawa Y, Fujii Y, Chen LB: Serum level of the periostin, a homologue of an insect cell adhesion molecule, in thymoma patients. Cancer Lett. 2001, 172: 37-42. 10.1016/S0304-3835(01)00633-4.
54. Sasaki H, Yu CY, Dai M, Tam C, Loda M, Auclair D, Chen LB, Elias A: Elevated serum periostin levels in patients with bone metastases from breast but not lung cancer. Breast Cancer Res Treat. 2003, 77: 245-252. 10.1023/A:1021899904332.
55. Ryden L, Jirstrom K, Bendahl PO, Ferno M, Nordenskjold B, Stal O, Thorstenson S, Jonsson PE, Landberg G: Tumor-specific expression of vascular endothelial growth factor receptor 2 but not vascular endothelial growth factor or human epidermal growth factor receptor 2 is associated with impaired response to adjuvant tamoxifen in premenopausal breast cancer. J Clin Oncol. 2005, 23: 4695-4704. 10.1200/JCO.2005.08.126.
56. Coleman RE: Conclusion: Bone markers in metastatic bone disease. Cancer Treat Rev. 2006, 32 (Suppl 1): 27-28. 10.1016/S0305-7372(06)80007-1.
57. Segal E, Friedman N, Koller D, Regev A: A module map showing conditional activity of expression modules in cancer. Nat Genet. 2004, 36: 1090-1098.
58. Barsky SH: Myoepithelial mRNA expression profiling reveals a common tumor-suppressor phenotype. Exp Mol Pathol. 2003, 74: 113-122. 10.1016/S0014-4800(03)00011-X.
59. Polyak K, Hu M: Do myoepithelial cells hold the key for breast tumor progression?. J Mammary Gland Biol Neoplasia. 2005, 10: 231-247. 10.1007/s10911-005-9584-6.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.