© BioMed Central Ltd 2009
Published: 18 December 2009
Next-generation sequencing (also known as massively parallel sequencing) technologies are revolutionising our ability to characterise cancers at the genomic, transcriptomic and epigenetic levels. Cataloguing all mutations, copy number aberrations and somatic rearrangements in an entire cancer genome at base pair resolution can now be performed in a matter of weeks. Furthermore, massively parallel sequencing can be used as a means for unbiased transcriptomic analysis of mRNAs, small RNAs and noncoding RNAs, genome-wide methylation assays and high-throughput chromatin immunoprecipitation assays. Here, I discuss the potential impact of this technology on breast cancer research and the challenges that come with this technological breakthrough.
Since the publication of the first draft of the human genome sequence [1, 2], the field of genomics has changed dramatically. Most importantly, the availability of this information has led to a technological boom, with the development of high-throughput methods that could be used to interrogate the wealth of data available in the human genome and transcriptome. The fields of genomic and transcriptomic science have expanded at an unprecedented pace.
In the past decade we have witnessed the rise of microarrays, a technology that has been extensively applied to the study of cancer genomes and transcriptomes. Of all solid cancers, breast cancer has been the most comprehensively studied using these methods. Although some of the promises of microarrays have not materialised in the time frame that some proponents of the technology foresaw, the high-throughput data generated in microarray-based experiments have changed the way breast cancer is perceived [3, 4]. The approach has brought to the forefront of cancer research the concept of breast cancer heterogeneity: that distinct molecular subtypes of breast cancer are underpinned by distinct genetic and epigenetic aberrations, and that distinct subtypes may have their prognosis and response to therapy governed by distinct molecular pathways and networks [5, 6]. It should be noted, however, that microarray-based expression profiling and comparative genomic hybridisation provide data with important limitations. For instance, microarray-based expression profiling provides only a semiquantitative assessment of gene expression; it is limited by the nature of the probes included in the platform and by their sensitivity and specificity. Comparative genomic hybridisation and SNP array analysis have provided a wealth of data on gene copy number aberrations in breast cancer and have helped identify potential therapeutic targets for subgroups of breast cancer patients; however, these technologies do not provide any information about structural genomic aberrations or base pair mutations.
An ideal tool for the genetic characterisation of cancers is one that could provide information about copy number aberrations, allelic information, somatic rearrangements and base pair mutations in a single experiment. Furthermore, data generated with such technology should be presented in such a way that the presence of cells other than cancer cells in the samples would not constitute an insurmountable hurdle. A few years ago, such a tool would have belonged to the realm of science fiction.
Technology, however, has evolved at an unprecedented pace. We are currently witnessing yet another molecular revolution, one that will most certainly dwarf the paradigm shifts brought about by the introduction of microarrays: the advent of massively parallel sequencing (also known as next-generation sequencing). This technology allows for the accrual of qualitative and quantitative information about any type of nucleic acid in a given sample at an incredible throughput while incurring relatively limited costs (reviewed in [8–13]).
For the past 15 years, Sanger sequencing and fluorescence-based electrophoresis technologies have been extensively used in somatic and germline genetic studies. Improvements in instrumentation coupled with the development of high-performance computing and bioinformatics have reduced the cost of sequencing. However, increases in the throughput of Sanger DNA sequencing are achieved by the use of additional sequencers in parallel, owing to the requirement of gel electrophoresis or additional wells for the capillary sequencing of each reaction.
Summary of massively parallel sequencing technologies
Read length (base pairs)
Templates per run
Commercially available technologies
~900 to 1,100
454 FLX Roche
400 Mb/run/7.5 to 8 hours
Illumina (Solexa) Genome Analyzer
36 to 175
>17 Gb/run/3 to 6 days
10 to 15 Gb/run/6 days
30 to 35
21 to 28 Gb/run/8 days
Single molecule sequence by synthesis
Technologies in development
Single molecule real-time DNA sequencing
Sequence by synthesis
Base-specific FRET emission
ZSG atomic labelling and electron microscopy
Perhaps more important than the sequencing throughput provided by this technology and its relatively low cost compared with traditional sequencing methods is the type of data it generates. Instead of long reads generated from a PCR-amplified sample, massively parallel sequencing methods provide much shorter reads (~21 to ~400 base pairs), but millions of them [8–13]. Unlike previous sequencing methods that required DNA amplification (that is, the final sequence was representative of the modal population of DNA templates), sequencing can now be performed from single DNA molecules. The short reads generated in the sequencing of each DNA molecule can be counted and quantified, allowing the identification of mutations in nonmodal populations of cells (that is, identification of a somatic mutation in a small subpopulation of cells immersed in a modal population with wild-type sequences) and accurate copy number assessment of each genomic region. In addition, with the recent introduction of approaches that sequence both ends of a DNA molecule (that is, paired end massively parallel sequencing or mate pair sequencing), it has become possible to detect balanced and unbalanced somatic rearrangements (for example, those giving rise to fusion genes) in a genome-wide fashion [12, 14, 15].
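The read-counting idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used in any of the cited studies: it assumes that reads have already been aligned and tallied into equally sized genomic bins for a tumour sample and a matched normal sample, and that most bins are copy-number neutral. The function name `log2_copy_ratios`, the pseudocount and the example counts are all hypothetical.

```python
import math
import statistics


def log2_copy_ratios(tumour_counts, normal_counts, pseudocount=0.5):
    """Per-bin log2(tumour/normal) read-count ratios, centred on the median bin."""
    if len(tumour_counts) != len(normal_counts):
        raise ValueError("both samples must use the same genomic bins")
    # The pseudocount (an assumption of this sketch) avoids division by zero
    # in bins that received no reads.
    raw = [
        math.log2((t + pseudocount) / (n + pseudocount))
        for t, n in zip(tumour_counts, normal_counts)
    ]
    # Subtracting the median removes differences in total sequencing depth,
    # assuming that most bins are copy-number neutral.
    centre = statistics.median(raw)
    return [value - centre for value in raw]


# Hypothetical read counts for four equally sized genomic bins.
tumour = [200, 200, 400, 200]
normal = [100, 100, 100, 100]
ratios = log2_copy_ratios(tumour, normal)
```

In this toy input the tumour library is sequenced about twice as deeply as the normal one, so the raw ratios are all shifted upwards; after median centring, only the third bin stands out, with a log2 ratio close to 1, which is what a doubling of copy number relative to the rest of the genome would produce.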
Not surprisingly, this massive increase in throughput has come at a cost, with the accuracy of each short read being significantly lower than that of the output generated by Sanger sequencing. Although this is circumvented by the depth of sequencing (that is, multiple reads of the same region), it is accepted that physical validation using traditional sequencing methods is required. Note that each type of next-generation sequencing leads to specific types of artefacts (reviewed in [8–13]); however, as we are writing the book on next-generation sequencing as we go along, one should be aware of unexpected artefacts, and new findings should be interpreted with caution.
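Why depth compensates for the lower accuracy of individual reads can be illustrated with a simple binomial calculation. The sketch below rests on assumptions that are not taken from the text: sequencing errors are treated as independent between reads, each error is assumed to be equally likely to produce any of the three incorrect bases, and a variant is "called" when at least `min_alt_reads` reads show the same alternative base. Real platforms have correlated, context-specific artefacts, so the numbers are indicative only. The function name and parameters are hypothetical.

```python
import math


def prob_false_variant(depth, error_rate, min_alt_reads):
    """Probability that sequencing errors alone yield at least `min_alt_reads`
    reads carrying one specific wrong base at a site covered by `depth` reads."""
    # Assumption: an error turns the true base into each of the three other
    # bases with equal probability.
    p = error_rate / 3.0
    return sum(
        math.comb(depth, k) * p**k * (1.0 - p) ** (depth - k)
        for k in range(min_alt_reads, depth + 1)
    )


# A single read with a 3% error rate versus 30 reads requiring 3 to agree.
single_read = prob_false_variant(depth=1, error_rate=0.03, min_alt_reads=1)
thirty_reads = prob_false_variant(depth=30, error_rate=0.03, min_alt_reads=3)
```

Under these assumptions, requiring several concordant reads at moderate depth makes a purely error-driven call less likely than trusting one read, which is the intuition behind calling variants from multiple overlapping reads. It does not remove systematic artefacts, which is one reason validation by an independent method remains advisable.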
Next-generation sequencing has already been applied to re-sequencing studies, which have led to sequencing of complete normal and cancer genomes being performed in a matter of weeks [16–18]. Massively parallel sequencing can be employed for the simultaneous characterisation of cancer genomes in terms of somatic base pair and in-del mutations, balanced and unbalanced rearrangements, and copy number changes in a single experiment [14, 18]. Apart from sequencing whole genomes, massively parallel sequencing can be coupled with DNA capturing methods for focused analysis of specific genomic regions, specific genes or the whole exome. In fact, the Breast Cancer International Cancer Genome Consortium has pledged to sequence the genomes of 1,500 breast cancers. This study will provide a comprehensive catalogue of the genetic alterations found in breast cancer in general and in the different subtypes of the disease.
Massively parallel sequencing can be applied to germline DNA for gene association studies and for the analysis of cancer genomes [8–14], and may constitute a paradigm shift in the way mutations that cause rare diseases are identified. The power of this technology to unravel genes whose germline mutations cause rare Mendelian disorders is exemplified by the identification of MYH3 germline mutations as a cause of Freeman-Sheldon syndrome through the targeted sequencing of all protein-coding regions (exomes) of four individuals with this syndrome and eight unrelated individuals. In interpreting the results of targeted exome and whole genome sequencing studies of a small number of subjects, investigators will have to deal with the previously underestimated number of private SNPs and copy number DNA polymorphisms. The 1000 Genomes Project, however, will provide a more complete catalogue of SNPs, copy number polymorphisms, and short insertion and deletion polymorphisms in the general population, which may facilitate the discovery of pathogenic germline mutations.
In addition to the ability to sequence DNA, massively parallel sequencing can be applied to sequencing RNA. Four main applications have already been developed, namely digital gene expression, RNA sequencing, paired end RNA sequencing, and small and noncoding RNA sequencing. An in-depth discussion of these methods and their impact on our ability to perform transcriptomic analyses is beyond the scope of this short communication, and readers are referred to excellent reviews on this topic [13, 22]. Suffice it to say that these approaches have already led to the identification of multiple novel splice variants, novel gene rearrangements [24, 25] and novel fusion genes [26, 27], and to the identification of read-throughs, which are RNA molecules resulting from the co-splicing of two genes that are contiguous in the genome in the absence of a structural genomic aberration. When combined with DNA massively parallel sequencing, RNA sequencing has the potential to unravel RNA editing events, such as the nonsynonymous transcript editing of the COG3 and SRP9 genes in a metastatic invasive lobular carcinoma. Furthermore, massively parallel sequencing studies of noncoding and small RNAs coupled with the results of the ENCODE project are likely to reveal a level of transcriptional regulation far beyond our current models.
Modifications of the protocols for massively parallel sequencing also allow for an unbiased assessment of DNA methylation [29, 30] and histone acetylation, and are likely to replace microarrays in the analysis of high-throughput chromatin immunoprecipitation assays [13, 31]. Next-generation sequencing is also replacing microarrays in high-throughput RNA interference screens: one can perform genome-wide screens to identify genes that interfere with the viability of cancer cells using pools of short hairpin RNAs, and the results can be deconvoluted using next-generation sequencing. This latter approach is likely to provide a wealth of information on genes that are selectively required for cancer cell survival and on potential drug targets.
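The deconvolution step of a pooled short hairpin RNA screen can be sketched as a counting exercise. The example below is a simplified illustration, not the protocol of the cited study: it assumes that each sequencing read has already been assigned to a hairpin identifier, that the pool was sequenced once before and once after a period of cell growth, and that depletion is summarised as a log2 fold change of relative abundance. The hairpin names, the pseudocount and the function name are hypothetical.

```python
import math
from collections import Counter


def hairpin_log2_fold_changes(reads_before, reads_after, pseudocount=1.0):
    """Log2 change in relative abundance of each hairpin between two time points.

    Each argument is an iterable of hairpin identifiers, one per sequencing read.
    """
    before = Counter(reads_before)
    after = Counter(reads_after)
    hairpins = set(before) | set(after)
    # Denominators include the pseudocounts so that the fractions sum to one.
    total_before = sum(before.values()) + pseudocount * len(hairpins)
    total_after = sum(after.values()) + pseudocount * len(hairpins)
    changes = {}
    for hairpin in hairpins:
        frac_before = (before[hairpin] + pseudocount) / total_before
        frac_after = (after[hairpin] + pseudocount) / total_after
        changes[hairpin] = math.log2(frac_after / frac_before)
    return changes


# Hypothetical reads: shB drops out of the population over time.
reads_t0 = ["shA"] * 50 + ["shB"] * 50
reads_t1 = ["shA"] * 90 + ["shB"] * 10
changes = hairpin_log2_fold_changes(reads_t0, reads_t1)
most_depleted = min(changes, key=changes.get)
```

A hairpin whose relative abundance falls over time, such as the hypothetical `shB` here, points to a gene whose silencing may impair cell viability; in practice several hairpins per gene and replicate screens would be needed before drawing that conclusion.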
The multiple applications and uses of massively parallel sequencing are likely to reshape several aspects of breast cancer research. Given the unprecedented ability to identify mutations, copy number aberrations and somatic rearrangements in cancer genomes, the information accrued by massively parallel sequencing of breast cancers may lead to a paradigm shift in the way breast cancers are classified. In fact, this technology offers a unique opportunity to move from the current descriptive and prognostic classification systems to a functional genomic taxonomy that is based on the molecular aberrations that drive specific subgroups of cancers, in a way akin to the classification system currently used for leukaemias and lymphomas. With the availability of information of the genetic alterations required for the survival of cells of a given cancer, tumours may be classified according to the genetic aberrations they harbour, according to the molecular networks activated or inactivated by these genetic aberrations, and, importantly, according to the agents these tumours are sensitive to.
Studies performing large-scale conventional sequencing of breast cancers [33, 34] revealed that a relatively small number of genes are frequently mutated and a large number of genes are rarely mutated in breast cancer. It should be noted, however, that the number of mutations found in oestrogen receptor-negative breast cancer cell lines was higher than that found in an oestrogen receptor-positive breast cancer. It is therefore plausible that different types of breast cancer are driven by distinct constellations of genetic aberrations. Even tumours of the same type, however, may be characterised by mutations of distinct genes in the same or complementary molecular networks, which would result in a similar phenotype.
Recent whole-genome characterisation of M1 leukaemias [35, 36] and of a metastatic deposit of an invasive lobular carcinoma of the breast has demonstrated the power of this technology for the identification of novel potential mutations that drive specific subtypes of complex and heterogeneous diseases such as leukaemias and breast cancer, and has demonstrated how the mutational spectrum of a cancer evolves over time. Furthermore, next-generation sequencing analysis of cancer types whose tumours are rather homogeneous in terms of their molecular makeup, such as some special types of breast cancer [37–41], may lead to the identification of pathognomonic genomic alterations, in a way akin to C134Y FOXL2 mutations in granulosa cell tumours of the ovary. These driver genetic alterations (for example, mutations, amplifications and fusion genes) have the potential to be exploited as therapeutic targets.
Although the presence of non-neoplastic tissues (that is, stroma, inflammatory infiltrate and entrapped normal tissues) represents a challenge for the analysis of the genomes of preinvasive lesions, primary breast cancers and their metastatic deposits, there is evidence to suggest that if a tumour is sequenced at a sufficient depth then accurate sequences at base pair resolution can be obtained and somatic mutations identified.
Another important application of massively parallel sequencing, owing to its ability to deep sequence specific genomic regions, is the identification of secondary mutations as mechanisms of resistance to specific agents [43, 44]. There are several lines of evidence to demonstrate that de novo and acquired resistance to some targeted therapies is driven by secondary mutations in the target genes (for example, the T790M mutation in the EGFR gene causing resistance to anti-epidermal growth factor receptor agents, and secondary KIT mutations leading to resistance to imatinib mesylate and sorafenib) or in genes whose inactivation is synthetically lethal in the presence of the targeted therapy (for example, BRCA2 and BRCA1 revertant mutations as a mechanism of resistance to platinum salts and poly(ADP-ribose) polymerase inhibitors [47–49]).
It should be noted, however, that the deluge of data derived from next-generation sequencing studies might take a relatively long time to be translated into information that is clinically relevant. Given that each cancer genome may have in excess of 10,000 somatic mutations, it is unclear how much validation through the identification of recurrent mutations or by laborious functional studies will be required to separate driver mutations (that is, those that either confer a growth or survival advantage on a tumour or are required by the cancer cells for the maintenance of their malignant behaviour) from passenger alterations (that is, genomic noise). Furthermore, next-generation sequencing is likely to unravel a much greater complexity of the normal human genome in terms of SNPs and copy number polymorphisms, some of which may be confined to some somatic tissues in the same individual [51, 52]. Massively parallel sequencing will also require high-performance computing and bioinformatic support far beyond what is available to most research laboratories. Quality control and standardisation of massively parallel sequencing experiments and data reporting are further important issues to consider. Finally, the ethical aspects of next-generation sequencing are by no means trivial, and readers are referred to excellent reviews covering these aspects [9, 11].
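The interplay between non-neoplastic contamination and sequencing depth can be made concrete with a back-of-the-envelope model. The sketch below is built on simplifying assumptions that do not come from the text: the mutation is heterozygous and present in every cancer cell, the locus is diploid in both tumour and normal cells, reads sample alleles independently, and sequencing errors are ignored. Under these assumptions the expected fraction of reads carrying the mutation is half the tumour cell fraction. Function names and thresholds are hypothetical.

```python
import math


def expected_variant_fraction(purity):
    """Expected fraction of mutant reads for a clonal heterozygous mutation
    at a diploid locus, given the fraction of cancer cells in the sample."""
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must lie between 0 and 1")
    return purity / 2.0


def detection_probability(depth, purity, min_alt_reads):
    """Probability of observing at least `min_alt_reads` mutant reads
    among `depth` reads covering the site (binomial sampling)."""
    p = expected_variant_fraction(purity)
    return sum(
        math.comb(depth, k) * p**k * (1.0 - p) ** (depth - k)
        for k in range(min_alt_reads, depth + 1)
    )


# A sample with 20% cancer cells, requiring at least 3 mutant reads.
shallow = detection_probability(depth=10, purity=0.2, min_alt_reads=3)
deep = detection_probability(depth=100, purity=0.2, min_alt_reads=3)
```

With only 20% cancer cells, the mutation is expected in about one read in ten; at 10-fold coverage it will usually be missed under this model, whereas at 100-fold coverage it is very likely to be seen in at least three reads. Subclonal mutations and copy number changes at the locus would shift these figures.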
One could argue that massively parallel sequencing is not only an end, but also a means for performing experiments that may answer questions that could not even be asked previously. The revolution that is likely to be brought about by massively parallel sequencing methods is akin to the revolution fostered by the introduction of the PCR in the 1980s. It is undeniable that this technology will constitute a quantum leap in breast cancer basic and translational research; however, numerous challenges lie ahead. We ought to learn from our recent experience with microarrays, and avoid any sort of unjustified overoptimism. The greatest danger of using this revolutionary technology is that it comes with new problems; if we move too quickly, the lessons we are beginning to learn from previous high-throughput studies may be forgotten when massively parallel sequencing is applied to clinical and translational questions.
PCR: polymerase chain reaction
SNP: single nucleotide polymorphism
The author is grateful to the Molecular Pathology Team and Dr Britta Weigelt for critically reading this manuscript. JSR-F is funded in part by Breakthrough Breast Cancer. NHS funding to the NIHR Biomedical Research Centre is also acknowledged.
This article has been published as part of Breast Cancer Research Volume 11 Suppl 3 2009: Controversies in Breast Cancer 2009. The full contents of the supplement are available online at http://breast-cancer-research.com/supplements/11/S3.