Regulation of mRNA expression in breast cancer - a cis-tematic trans-action

Large research consortia such as the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC), The Cancer Genome Atlas and International Cancer Genomics Consortium are systematically interrogating large sets of tumor samples through integrated analysis of genome-wide DNA copy number and promoter methylation, transcriptome-wide RNA expression, protein expression and exome-wide sequencing. A recent METABRIC study explored the effects of cis-acting and trans-acting factors of gene expression regulation in breast cancer. By making their data sets publicly available, these large consortia are inviting new types of analysis that have the potential to drive breast cancer research into previously unexplored avenues.

The central dogma of molecular biology dictates that DNA is transcribed to mRNA, which is translated to protein. DNA dosage is frequently altered in cancer and is an important determinant of mRNA expression. Transcription is organized by trans-acting transcription factors or transcription factor complexes that associate with binding sites of a specific sequence. The number and binding affinity of such cis-binding elements provide a mechanism of transcription regulation [1]. DNA copy number changes affect the cell by altering the amount of DNA for transcription factors to act upon (cis) or by altering the production of transcription factors that would alter the expression of genes elsewhere (trans). Chromosomal gains and losses thus lead to increased or decreased numbers of mRNA molecules that are transcribed from the altered locus, thereby providing a proliferative advantage to the cell. Elucidating the mechanisms underlying the effects of DNA copy number aberrations on expression regulation in cancer would aid in identifying master regulators and the design of therapeutic modalities that specifically block key elements in a regulatory network.
Transcript levels are regulated through multiple processes, including chromatin organization and modification, and effects of microRNAs and long non-coding RNAs. The multitude of regulatory mechanisms complicates the effective unraveling of cis- and trans-acting factors. The Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) circumvented these complicating factors through a tour-de-force approach [2]. In a collaborative effort of British and Canadian institutes, METABRIC collected a very large number of fresh frozen breast cancer tissues (n = 2,136) with long-term follow-up and generated gene expression, genotype and DNA copy number profiles for all cases. This effort resulted in a data set of previously unmatched proportions, adding significantly to the number of breast cancer genomic profiles available in the public domain (The European Genome-phenome Archive [3] accession number EGAS00000000083).
After separating the sample cohort into a training set of approximately 1,000 samples and a similarly sized validation data set, which contained those profiles of lower cellularity or with missing matching normal sample data, the investigators performed integrated analysis of DNA copy number and transcript levels in order to identify target genes of DNA copy number alterations. The large sample size of the data set also allowed investigation of expression quantitative trait loci, which are chromosomal segments whose genotypes or copy number levels show an association with the expression levels of distal genes (>3 Mb). Combined, the two types of analysis were aimed at exposing the cis- and trans-circuitry of breast cancer and effectively showed that whereas trans-acting loci influence expression of a larger number of transcripts, cis-acting loci had a more pronounced effect on transcript levels. The availability of a large validation data set allowed the investigators to provide convincing results. Importantly, recent studies of breast cancer sample purity have shown that the average tumor cellularity of breast cancer tissues is about 49% [4]. The large number of stromal cells, immune cells and tumor-adjacent normal tissue contributes significantly to gene expression levels, which is reflected by the prominent trans-acting association of a T-cell receptor loci gene expression module identified by the investigators. Samples highly expressing the T-cell receptor gene set represent one of the 10 patient clusters that were generated through clustering of expression profiles based on the expression levels of the 1,000 genes that were most strongly regulated in cis. Interestingly, the HER2 and basal gene expression subtypes that are covered by the PAM50 gene expression classification of breast cancer [5] are also identified by cis-element clustering. This suggests that the dominant effects driving the HER2 and basal gene expression subtypes are due to alterations in copy number of the protein targets directly rather than of transcription factors that act in trans.
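To make the cis/trans distinction concrete, the sketch below scores each gene by the Pearson correlation between its own copy number and its expression, and labels a locus-gene pair as trans when the two lie on different chromosomes or more than 3 Mb apart. It is an illustration under stated assumptions and not the statistical procedure used in the METABRIC study: the gene names, the toy values, the choice of Pearson correlation, and the use of the 3 Mb figure as the cis/trans cut-off are all hypothetical.

```python
from math import sqrt

# Assumption: the 3 Mb "distal" distance from the text is used as the cis/trans cut-off.
TRANS_MIN_DISTANCE = 3_000_000


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no association signal
    return cov / sqrt(var_x * var_y)


def pair_type(locus_chrom, locus_pos, gene_chrom, gene_pos):
    """Label a locus-gene pair 'trans' if distal, otherwise 'cis'."""
    if locus_chrom != gene_chrom:
        return "trans"
    return "trans" if abs(locus_pos - gene_pos) > TRANS_MIN_DISTANCE else "cis"


def top_cis_genes(copy_number, expression, k):
    """Rank genes by |r| between own copy number and expression; keep the top k."""
    scores = {
        gene: pearson(copy_number[gene], expression[gene])
        for gene in copy_number
        if gene in expression
    }
    ranked = sorted(scores, key=lambda gene: abs(scores[gene]), reverse=True)
    return [(gene, scores[gene]) for gene in ranked[:k]]


# Hypothetical toy data: six samples, two made-up genes.
copy_number = {
    "GENE_A": [2, 2, 4, 6, 2, 8],   # amplified in some samples
    "GENE_B": [2, 1, 2, 3, 2, 2],   # mostly diploid
}
expression = {
    "GENE_A": [1.0, 1.1, 2.1, 2.9, 0.9, 4.2],  # tracks copy number
    "GENE_B": [5.0, 5.2, 4.9, 5.1, 5.3, 4.8],  # largely independent of it
}

if __name__ == "__main__":
    for gene, r in top_cis_genes(copy_number, expression, k=2):
        print(f"{gene}: cis correlation r = {r:.2f}")
```

In a real analysis the top-ranked genes (1,000 in the study) would then feed a clustering step, and the association test would need to account for sample purity and multiple testing, neither of which this sketch attempts.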
As in publications from The Cancer Genome Atlas Research Network [6,7], the analysis described in the METABRIC publication only scratches the surface of what is possible with a data set of this magnitude. The large number of samples makes the data set particularly useful for identification of mutually exclusive copy number alterations, as well as co-occurring abnormalities [8], but also for further exploration of emergent questions, such as the contributions of non-tumor cells to breast cancer homeostasis.
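One common way to ask whether two alterations are mutually exclusive or co-occurring is a one-sided Fisher's exact test on the 2x2 table of altered versus unaltered samples. The sketch below implements that test with the hypergeometric distribution using only the Python standard library. It is an assumption about how such a question could be approached and not a method taken from the METABRIC publication or from reference [8]; the function names and the alteration vectors are hypothetical.

```python
from math import comb


def contingency(a, b):
    """Count samples with both, only the first, only the second, or neither alteration."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    only_a = sum(1 for x, y in zip(a, b) if x and not y)
    only_b = sum(1 for x, y in zip(a, b) if y and not x)
    neither = sum(1 for x, y in zip(a, b) if not x and not y)
    return both, only_a, only_b, neither


def _overlap_pmf(k, n_a, n_b, n):
    """Hypergeometric probability of exactly k co-altered samples."""
    return comb(n_a, k) * comb(n - n_a, n_b - k) / comb(n, n_b)


def exclusivity_p(a, b):
    """One-sided p-value: chance of seeing this few or fewer co-altered samples."""
    both, only_a, only_b, neither = contingency(a, b)
    n = both + only_a + only_b + neither
    n_a, n_b = both + only_a, both + only_b
    k_min = max(0, n_a + n_b - n)  # smallest overlap the margins allow
    return sum(_overlap_pmf(k, n_a, n_b, n) for k in range(k_min, both + 1))


def cooccurrence_p(a, b):
    """One-sided p-value: chance of seeing this many or more co-altered samples."""
    both, only_a, only_b, neither = contingency(a, b)
    n = both + only_a + only_b + neither
    n_a, n_b = both + only_a, both + only_b
    k_max = min(n_a, n_b)  # largest overlap the margins allow
    return sum(_overlap_pmf(k, n_a, n_b, n) for k in range(both, k_max + 1))


# Hypothetical alteration calls for eight samples (1 = altered, 0 = not altered).
amp_locus_1 = [1, 1, 1, 1, 0, 0, 0, 0]
amp_locus_2 = [0, 0, 0, 0, 1, 1, 1, 1]

if __name__ == "__main__":
    print(f"exclusivity p = {exclusivity_p(amp_locus_1, amp_locus_2):.4f}")
    print(f"co-occurrence p = {cooccurrence_p(amp_locus_1, amp_locus_2):.4f}")
```

With thousands of samples and many candidate loci, the resulting p-values would need correction for multiple testing, which is omitted here.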
The METABRIC study has effectively characterized the DNA copy number alteration landscape of breast cancer. These studies are most powerful when integrated with a series of recent studies that have deciphered the mutational landscape of breast cancer through whole-genome and whole-exome sequencing of four independent cohorts containing 80 to 100 breast cancer tissues [9-12]. These efforts uncovered frequent mutations of genes in the mitogen-activated protein kinase (MAPK) pathway and further highlighted the important role of the phosphoinositide 3-kinase (PI3K) pathway as rational therapeutic targets. Integrated analysis of point mutations, methylation profiles, DNA copy number alterations, gene expression, and functional proteomics, as is under way by The Cancer Genome Atlas (TCGA) Research Network, further improves our understanding of breast cancer tumorigenesis [13]. Using genome sequencing, these and other recent studies showed that breast tumors harbor a clonal hierarchy in which genomic abnormalities may only be present in a subset of tumor cells [14,15]. Such analytical approaches may distinguish tumor-initiating changes from abnormalities that lead to tumor progression and provide insights into the temporal order of genomic alteration events during breast cancer progression [11,14-16]. This may be particularly important, given marked intratumor heterogeneity, by identifying aberrations present in all tumor cells that, if druggable, would represent optimal targets.
Large consortia such as METABRIC, TCGA and the International Cancer Genomics Consortium are generating high quality data sets that invite creative research questions and build a foundation for data analysis for years to come, providing insights into breast cancer tumorigenesis that have never before been possible. Whereas this makes the current era of breast cancer research more exciting than ever before, the key to translating these findings into palpable clinical improvements has not yet been determined. With the emergence of these high quality data sets, the emphasis of genomic studies is rapidly shifting away from data generation towards meta-analysis, integrated analysis, data mining and computational analysis of existing data. With petabytes of high content data available, asking the right research question has become a critical factor towards finding a curative strategy for this complex disease.