High resolution genomic analysis of sporadic breast cancer using array-based comparative genomic hybridization

Introduction Genomic aberrations in the form of subchromosomal DNA copy number changes are a hallmark of epithelial cancers, including breast cancer. The goal of the present study was to analyze such aberrations in breast cancer at high resolution. Methods We employed high-resolution array comparative genomic hybridization with 4,134 bacterial artificial chromosomes that cover the genome at 0.9 megabase resolution to analyze 47 primary breast tumors and 18 breast cancer cell lines. Results Common amplicons included 8q24.3 (amplified in 79% of tumors, with 5/47 exhibiting high level amplification), 1q32.1 and 16p13.3 (amplified in 66% and 57% of tumors, respectively). Moreover, we found several positive correlations between specific amplicons from different chromosomes, suggesting the existence of cooperating genetic loci. Queried by gene, the most frequently amplified kinase was PTK2 (79% of tumors), whereas the most frequently lost kinase was PTK2B (hemizygous loss in 34% of tumors). Amplification of ERBB2 as measured by comparative genomic hybridization (CGH) correlated closely with ERBB2 DNA and RNA levels measured by quantitative PCR as well as with ERBB2 protein levels. The overall frequency of recurrent losses was lower, with no region lost in more than 50% of tumors; the most frequently lost tumor suppressor gene was RB1 (hemizygous loss in 26% of tumors). Finally, we find that specific copy number changes in cell lines closely mimicked those in primary tumors, with an overall Pearson correlation coefficient of 0.843 for gains and 0.734 for losses. Conclusion High resolution CGH analysis of breast cancer reveals several regions where DNA copy number is commonly gained or lost, that non-random correlations between specific amplicons exist, and that specific genetic alterations are maintained in breast cancer cell lines despite repeat passage in tissue culture. 
These observations suggest that genes within these regions are critical to the malignant phenotype and may thus serve as future therapeutic targets.


Introduction
Genomic instability is a hallmark of cancer, and specific subchromosomal copy number changes are thought to play a driving role in the transformation of normal cells to malignant clones. These genomic copy number changes may result in deletion of one or both alleles of tumor suppressor genes, overexpression of oncogenes and rearrangements that may alter transcription of target and downstream genes (reviewed in [1]). Several recent studies suggest that fixed genetic abnormalities in human cancers may be highly predictive of response to targeted therapeutics. For example, ERBB2 amplification may be more predictive of response to trastuzumab than protein overexpression with normal gene copy number (reviewed in [2]), and activating mutations in EGFR determine response to gefitinib [3,4].

Abbreviations: aCGH = array-based comparative genomic hybridization; BAC = bacterial artificial chromosome; CGH = comparative genomic hybridization; CRO = common region of overlap; EGF = epidermal growth factor; IDC = infiltrating ductal carcinoma; ILC = infiltrating lobular carcinoma; IR = intensity ratio; Mb = megabase; Q-PCR = quantitative PCR.
There is an extensive literature on DNA copy number alterations in cancer using low resolution technology such as PCR-based allelotyping, spectral karyotyping, and metaphase comparative genomic hybridization (CGH). These studies, however, are limited in their ability to characterize specific abnormalities across the genome and to identify altered genes within the large regions defined by these methodologies. Nonetheless, when considering the breast cancer literature, these studies are consistent, frequently reporting the same regions of gain (1q, 8q, 11q, 17q, 20q) and loss (6q, 8p, 9p, 13q, 16q) [5][6][7][8][9][10][11].
More recent studies have employed higher resolution array-based CGH (aCGH) to characterize primary tumors [10][12][13][14]. These studies demonstrate the enormous complexity of cancer genomes, but also provide evidence that consistent, non-random patterns of copy number alterations are present in human cancers and support the hypothesis that selection for genomic changes conferring a proliferative advantage plays an important role in malignant transformation.
To further characterize the genomic alterations that may drive both transformation and response to targeted therapies, we developed an aCGH platform that covers the genome at 0.9 megabase (Mb) resolution [15]. Here we report the use of these arrays to define the genomic profile of 47 primary breast tumors and 18 breast cancer cell lines. Specifically, we evaluated the most common regions of gain and loss across the genome, assessed correlations with clinical parameters, characterized the ERBB2 locus and pathway in detail, and identified potentially cooperating genetic loci.

DNA copy number detection
Hematoxylin and eosin staining was used to define a region of tissue containing at least 70% tumor, which was dissected away from normal tissue using a scalpel. Thirty 20 µm sections were used for DNA isolation by proteinase K digestion followed by phenol/chloroform extraction in PhaseLock Gel tubes (Eppendorf, Westbury, NY, USA). An additional hematoxylin and eosin stained section was used to verify the persistence of at least 70% tumor in the residual tissue adjacent to the sectioned area. Normal genomic DNA, used as the reference probe for aCGH co-hybridization, was prepared from peripheral leukocytes of healthy female volunteers of diverse ethnic backgrounds using alkaline lysis. DNA from at least five donors was pooled equally before labeling.
For hybridization, 1 µg of test DNA and 1 µg of pooled normal human DNA were labeled with either Cy3-dCTP or Cy5-dCTP incorporated by random priming (Bioprime Labeling Kit, Invitrogen, Carlsbad, CA, USA). After overnight incubation at 37°C, labeling reactions were purified (MinElute PCR Cleanup, Qiagen, Valencia, CA, USA), and the tumor and normal DNAs combined and ethanol-precipitated with 100 µg human Cot-1 DNA (Invitrogen). DNAs were rehydrated in 50 µl of formamide-based hybridization buffer [16], denatured at 70°C for 15 minutes and re-annealed for 30 minutes at 37°C to block repetitive sequences. Tumor and normal DNAs were then cohybridized to the CGH arrays at 37°C for 72 h on a rotating platform and washed as described by Gray and colleagues [16]. For each sample, the test and reference DNA also were labeled with the opposite dye in a separate experiment ('dye swap') to account for differences in dye incorporation and provide additional data points for analysis.
Arrays were scanned on a GenePix 4000B scanner (Axon Instruments, Downingtown, PA, USA) and the composite tiff image was segmented using GenePix Pro 4.0 (Axon Instruments). Foreground (signal) and background intensities were generated separately for Cy3 and Cy5 channels, and the local background intensity was subtracted to generate a corrected intensity for each spot. The ratio of background corrected Cy3 to Cy5 values was then calculated for each spot and, because each BAC clone was printed on the array three times, these measurements were averaged to generate the intensity ratio (IR) for the clone. Two arrays (dye swap) were hybridized for each sample; therefore, a total of six measurements determine the relative DNA copy number of each BAC clone in the test sample relative to the reference sample.
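The intensity ratio calculation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the GenePix Pro or CGHAnalyzer implementation: the function names and the tuple layout of a spot are hypothetical, and the way the two dye-swap arrays are combined (inverting the swapped ratios so that all six measurements are expressed in the same orientation, then averaging) is an assumption, since the text does not specify it.

```python
from statistics import mean


def spot_ratio(cy3_fg, cy3_bg, cy5_fg, cy5_bg):
    """Background-corrected Cy3/Cy5 ratio for a single spot."""
    return (cy3_fg - cy3_bg) / (cy5_fg - cy5_bg)


def clone_ir(spots):
    """Intensity ratio (IR) of one BAC clone on one array:
    the mean of the ratios of its replicate spots (three per clone)."""
    return mean(spot_ratio(*s) for s in spots)


def combined_ir(forward_spots, swapped_spots):
    """Combine the forward and dye-swap arrays (assumed scheme).

    Assumption: the test DNA is Cy3-labeled on the forward array and
    Cy5-labeled on the swapped array, so swapped ratios are inverted
    before all six measurements are averaged.
    """
    forward = [spot_ratio(*s) for s in forward_spots]
    swapped = [1.0 / spot_ratio(*s) for s in swapped_spots]
    return mean(forward + swapped)


# Toy values: (Cy3 foreground, Cy3 background, Cy5 foreground, Cy5 background)
forward = [(2100, 100, 1100, 100)] * 3
swapped = [(1100, 100, 2100, 100)] * 3
ir = combined_ir(forward, swapped)
```

In this toy case every spot shows a twofold excess of test signal, so the combined IR is 2.0. A production version would also need to handle spots whose background-corrected intensity is zero or negative.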
Data were normalized and visualized using CGHAnalyzer [15], which is available for download from CGHcloneDB [17]. Copy number deviations from diploid were called for BACs whose IRs differed significantly (>2 standard deviations for both replicates) from a normal distribution representing diploid copy number, which was developed using 50 normal:normal genomic DNA hybridizations on these arrays [15]. We use the following thresholds and terms throughout the text: IR >2.0 (high-level copy number gain, >5 copies), IR <0.5 (homozygous deletion).
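The calling rule above can be illustrated with a short sketch. It is not the CGHAnalyzer code: the function name, the use of a per-clone normal mean and standard deviation, and the choice to compare the average of the replicates against the IR 2.0 and 0.5 thresholds are all assumptions made for illustration.

```python
def call_clone(ir_replicates, normal_mean, normal_sd):
    """Classify one BAC clone from its replicate intensity ratios.

    normal_mean and normal_sd are assumed to describe the IR of this
    clone in normal:normal hybridizations (hypothetical inputs).
    """
    upper = normal_mean + 2 * normal_sd
    lower = normal_mean - 2 * normal_sd
    average = sum(ir_replicates) / len(ir_replicates)

    # A change is called only if every replicate deviates by >2 SD.
    if all(r > upper for r in ir_replicates):
        return "high-level gain" if average > 2.0 else "gain"
    if all(r < lower for r in ir_replicates):
        return "homozygous deletion" if average < 0.5 else "loss"
    return "no change"


calls = [
    call_clone([2.3, 2.5], 1.0, 0.1),
    call_clone([1.3, 1.4], 1.0, 0.1),
    call_clone([1.3, 1.1], 1.0, 0.1),
    call_clone([0.7, 0.75], 1.0, 0.1),
    call_clone([0.4, 0.45], 1.0, 0.1),
]
```

The third example shows why both replicates are required: one replicate exceeds the upper bound but the other does not, so no change is called.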

DNA and transcript copy number validation
Genomic copy number alterations of selected regions were validated using a relative quantitative (Q)-PCR assay. For the ERBB2 region, the LightCycler HER2/neu DNA Quantification Kit (Roche, Nutley, NJ, USA) was used. An additional 50 × 20 µm sections were cut from 18 primary tumors (15 IDC, one mixed IDC/ILC, two ductal carcinoma in situ) directly into Trizol for RNA extraction and the RNA was further purified using the RNeasy Kit (Qiagen). ERBB2 transcript levels were determined using the High Capacity cDNA Archive Kit, and a TaqMan® Microfluidic Card Assay on Demand (Hs00170433_m1; Applied Biosystems, Foster City, CA, USA). The 18S ribosomal subunit assay (Hs99999904_m1) was used as a control.
These data were directly compared to aCGH data from a BAC clone covering the PTK2 locus (RP11-502G13; chromosome 8, 142,176,518 to 142,176,951 bases) and several neighboring BACs. The concordance of these data was assessed by a t-test between the TaqMan® data for those samples with and without high-level gains (BAC clone IR >2.0).

Statistical analysis
As only a portion of the genome is directly covered by BAC clones on this array, a flanking region algorithm built into CGHAnalyzer extrapolates copy number estimations in uncovered regions between BAC clones. This extrapolates the extent of the copy number alteration represented by a given BAC to the genomic coordinate of the neighboring BAC clone of a different estimated copy number. This approach avoids missing important genetic changes between BACs but, by definition, overestimates the size of alterations. Consecutive BAC clones that are designated as being similarly altered are merged into a single representative region of change. The resultant data structure for a single sample is simply a series of genomic regions designated as either gained or lost. Pearson correlation coefficients were used to evaluate similarities (positive correlations) and differences (negative correlations) in copy number alteration trends within and across data sets (e.g., cell lines versus tumors). Pearson correlations were calculated by weighting the alteration frequency of each locus on a linear scale. Further, all correlations were compared to the distribution of correlations where the copy number alterations were arranged randomly (n = 1,000). To estimate the total portion of the genome gained or lost, each segment was summed and divided by 2,679 Mb, the total Mb in the genome (excluding heterochromatic, centromeric and telomeric regions not covered by BACs, and the sex chromosomes). The Wilcoxon rank sum test was used to compare estimates between tumors and cell lines, as well as between tumor subsets.
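The flanking region approach and the genome fraction estimate can be sketched as follows. This is an illustrative reading of the description above, not the CGHAnalyzer algorithm itself: the function names and the (start, end, state) tuple layout are hypothetical, and extending each region to the nearest edge of the neighboring BAC with a different call is an assumption about which "genomic coordinate" is meant.

```python
from itertools import groupby

GENOME_MB = 2679.0  # genome size used in the text for normalization


def flanking_regions(bacs):
    """Merge consecutive BACs with the same call and extend each altered
    run to the edge of the neighboring BAC with a different call.

    bacs: list of (start_mb, end_mb, state) for one chromosome, sorted by
    position; state is 'gain', 'loss' or 'normal' (assumed encoding).
    """
    # Collapse consecutive BACs sharing a state into runs.
    runs = []
    for state, group in groupby(bacs, key=lambda b: b[2]):
        group = list(group)
        runs.append((group[0][0], group[-1][1], state))

    regions = []
    for i, (start, end, state) in enumerate(runs):
        if state == "normal":
            continue
        # Extend across the uncovered gap on each side, if a neighbor exists.
        left = runs[i - 1][1] if i > 0 else start
        right = runs[i + 1][0] if i + 1 < len(runs) else end
        regions.append((left, right, state))
    return regions


def fraction_altered(regions, state):
    """Fraction of the genome covered by regions with the given state."""
    return sum(end - start for start, end, s in regions if s == state) / GENOME_MB


bacs = [
    (0.0, 0.2, "normal"),
    (1.0, 1.2, "gain"),
    (2.0, 2.2, "gain"),
    (3.0, 3.2, "normal"),
    (4.0, 4.2, "loss"),
]
regions = flanking_regions(bacs)
gain_fraction = fraction_altered(regions, "gain")
```

In the toy example the two gained BACs span 1.0 to 2.2 Mb, but the merged region is reported as 0.2 to 3.0 Mb, which shows how the approach overestimates the size of an alteration. Under this reading, a gained run directly adjacent to a lost run would have both regions extended into the gap between them.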
Correlation matrices were generated to identify the intersection of changes at two loci by a binomial probability-based metric. Specifically, the relative correlation of two loci was scored by their pair-wise comparison based on the genomic regions data structure, and defined in terms of the probability of the number of samples sharing aberrations at two loci if they were distributed among the samples by chance. The Fisher's exact test was used with permutation analysis to determine the statistical significance of the correlation between the most frequently aberrant loci in the data set. Multiple iterations (n = 2,000) were performed and the lowest p-values from these randomized iterations were compared to the p-values from the experimental data. Loci were considered significantly correlated when the associated p-value was less than the lowest p-value from the randomized data.
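A minimal sketch of this procedure is given below, using only the Python standard library. Several details are assumptions rather than the authors' method: the test is taken to be one-sided (more co-occurrence than expected), randomization is done by shuffling each locus independently across samples, and the function names and the dictionary layout are hypothetical.

```python
import random
from itertools import combinations
from math import comb


def fisher_greater(a_flags, b_flags):
    """One-sided Fisher's exact p-value for co-occurrence of two loci.

    a_flags, b_flags: per-sample 0/1 flags (1 = locus altered).
    Returns P(overlap >= observed) under the hypergeometric distribution.
    """
    n = len(a_flags)
    k = sum(a_flags)
    m = sum(b_flags)
    both = sum(1 for a, b in zip(a_flags, b_flags) if a and b)
    total = comb(n, m)
    tail = sum(comb(k, i) * comb(n - k, m - i) for i in range(both, min(k, m) + 1))
    return tail / total


def permutation_threshold(loci, n_iter=2000, seed=0):
    """Lowest pairwise p-value seen across randomized data sets."""
    rng = random.Random(seed)
    lowest = 1.0
    for _ in range(n_iter):
        # Shuffle each locus independently across samples (assumed scheme).
        shuffled = {name: rng.sample(flags, len(flags)) for name, flags in loci.items()}
        for a, b in combinations(shuffled, 2):
            lowest = min(lowest, fisher_greater(shuffled[a], shuffled[b]))
    return lowest


def significant_pairs(loci, threshold):
    """Pairs whose observed p-value is below the permutation threshold."""
    return [
        (a, b)
        for a, b in combinations(loci, 2)
        if fisher_greater(loci[a], loci[b]) < threshold
    ]


# Toy data: three hypothetical loci scored in six samples.
loci = {
    "locus_A": [1, 1, 1, 0, 0, 0],
    "locus_B": [1, 1, 1, 0, 0, 0],
    "locus_C": [1, 0, 1, 0, 1, 0],
}
p_ab = fisher_greater(loci["locus_A"], loci["locus_B"])
threshold = permutation_threshold(loci, n_iter=50)
```

Comparing each observed p-value to the lowest value from the randomized data is a conservative criterion, which suits the large number of pairwise comparisons in a correlation matrix.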

Results
As the initial step in our analysis, we reviewed the existing literature on whole genome DNA copy number analysis of human breast cancers; four chromosomal CGH studies [5][6][7][8] and two array-based CGH studies [9,10] were used for comparison (Table 1). All studies identified recurrent gains on chromosomes 1q, 8q, 11q, 17q, 20q, and losses on 6q, 8p, 9p, 13q, 16q. However, our high-resolution arrays detected a higher percentage of tumors with these gains, as well as several high-level amplifications (IR >2, estimated >5 copies), in these regions. For example, six previous studies [5][6][7][8][9][10] suggest that 40% to 50% of primary breast tumors have copy number gains of chromosome 8q24; however, this region was amplified in 79% of the tumors in the current study. Our aCGH arrays also identified five regions of gain in more than 50%, and four regions of loss in more than 30% of tumors that have not been previously associated with breast cancer (Table 1).

Comparison of primary tumors and cell lines
We compared the location, frequency and size of copy number changes in primary tumors versus cell lines. Surprisingly, the location of more frequent gains and losses in the cell lines very closely mirrored those in the primary tumors (Fig. 1). This is demonstrated by a Pearson correlation coefficient of 0.843 for gains and 0.734 for losses. The mean correlation of randomly placed gains was 0.295 (σ = 0.078) and losses was 0.203 (σ = 0.082), which yield p < 0.0001 for the tumor versus cell line correlations for both cases when modeled to a normal distribution. Additionally, there appeared to be more alterations in cell lines. Primary tumors had gains involving a mean of 410.8 Mb (14.5%) of the genome, which was significantly less than that seen in cell lines (µ = 674.2 Mb (23.8%), σ = 243.6 Mb (8.6%); p = 0.0014). Similarly, losses in primary tumors (µ = 286.1 Mb (10.1%), σ = 195.5 Mb (6.9%)) were less extensive than those seen in cell lines (µ = 589.3 Mb (20.8%), σ = 226.6 Mb (8.0%); p = 0.0001). As
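The comparison of an observed correlation with the random-placement distribution can be reproduced from the reported summary values. The sketch below is illustrative only: the function names are hypothetical, the one-sided normal tail is an assumption about how the p-value was obtained, and the shuffling routine is a simple stand-in for the randomization of copy number alterations described in the statistical methods.

```python
import random
from math import erfc, sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def shuffled_null(x, y, n_iter=1000, seed=0):
    """Correlations obtained after randomly rearranging y (assumed null)."""
    rng = random.Random(seed)
    y = list(y)
    null = []
    for _ in range(n_iter):
        rng.shuffle(y)
        null.append(pearson(x, y))
    return null


def normal_tail_p(observed, null_mean, null_sd):
    """z-score and one-sided p-value under a normal model of the null."""
    z = (observed - null_mean) / null_sd
    return z, 0.5 * erfc(z / sqrt(2))


# Summary values reported in the text.
z_gain, p_gain = normal_tail_p(0.843, 0.295, 0.078)
z_loss, p_loss = normal_tail_p(0.734, 0.203, 0.082)

# Toy per-locus alteration frequencies (hypothetical values).
tumor_freq = [0.1, 0.5, 0.9, 0.3]
line_freq = [0.2, 0.6, 1.0, 0.4]
r_toy = pearson(tumor_freq, line_freq)
null_toy = shuffled_null(tumor_freq, line_freq, n_iter=100)
```

With the reported values, the observed correlations lie roughly 7.0 (gains) and 6.5 (losses) standard deviations above the random-placement means, consistent with the stated p < 0.0001 in both cases.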

Frequent amplifications
We identified 55 regions of gain present in more than 30% of the primary tumors (Additional file 2), ranging in size from 0.1 to 8.7 Mb (median 2.1 Mb). Of those 55 regions, 20 encompassed a region of estimated high-level copy number change with an IR >2 (range 2.1 to 8.7) in more than one tumor (Table 2). Several of these regions contain genes known to be amplified in breast cancer, including ERBB2, EGFR and MYC, while others include genes not previously implicated in breast cancer, including PTK2.
The most frequently gained region in this sample set is chromosome 8q24. Metaphase CGH suggests that this region is a single amplicon, but with the increased resolution of aCGH, two distinct regions of gain become apparent (Fig. 2). The centromeric amplicon extends from 117.8 [...] ELK4, Prostein, and NUCKS. The region on 16p13.3 contains ZNF205, ZNF215 and ZNF200. The region on 17q12 containing ERBB2 (33.6 to 38.9 Mb, CRO 5.3 Mb) was also a commonly gained region in the primary tumors, with increased copy number detected in 21/47 primary tumors (45%) and 12/18 cell lines (67%). Three primary tumors and two cell lines had high-level gains.

Frequent deletions
The frequency of recurrent losses was lower than that of gains; no region was lost in more than 50% of primary tumors ([...] [15]). None of these regions contain known fragile sites.

Table 2. Regions of copy number gain containing more than one amplification with intensity ratio (IR) >2.0, as well as frequency of single copy gain (IR 1.2-2.0) in these regions. a Map positions and cytogenetic locations are based on data available from UCSC genome browser (July 2003 freeze) [35]. b Region boundaries were determined by the next bacterial artificial chromosome with different copy number.

Correlation of genomic alterations with clinical characteristics
Based on the flanking region approach to copy number alteration estimation, the mean percent of the genome gained and lost was calculated for each of the tumor subgroups (e.g., estrogen receptor positive versus negative) and differences evaluated using the Wilcoxon rank sum test (Additional file 3). In this sample set, total percentage of the genome altered did not vary significantly by stage (p = 0.

Genes with frequent copy number changes
We queried the frequency of copy number changes in all known genes, as well as the following gene classes: tumor suppressor genes [18], kinases [19], and cancer-related genes adapted from work by Futreal and colleagues [15,20]. The 10 most frequent gains and losses for each gene class are shown in Additional file 4 (complete list available on request). RB1 was the most frequently lost tumor suppressor gene, hemizygously deleted in 12/47 primary tumors (26%) and 9/18 cell lines (50%). RB1 copy number gains were not seen in any primary tumors and only 1/18 cell lines (5.6%).
The ten most frequently amplified kinases were gained at least six times more frequently than lost, likely indicating a selection for gains of these genes. PTK2 was the most frequently gained gene on both the kinase and cancer-related gene list, amplified in 37/47 primary tumors (79%) and 15/18 cell lines (83%). PTK2 was never deleted in the primary tumors and deleted in only 2/18 cell lines (11%). Amplification of the PTK2 gene within the amplified locus was further analyzed by Taqman quantification, and DNA levels in samples that had amplified PTK2 (IR >2.0 by aCGH) were significantly higher than levels in unamplified samples (p = 0.0018; data not shown). Interestingly, five kinases (PTK2B, PHKB, DCAMKL1, TEK, MAP2K4) were deleted at least five times more frequently than gained, suggesting these kinases may play a role in negatively regulating growth. Of note, inactivating mutations in MAP2K4 have been identified in 5% of breast cancers [19]. Additionally, PTK2B is both the most frequently lost kinase and cancer-related gene, hemizygously deleted in 16/47 tumors (34%), and 12/18 cell lines (67%). The remaining five of the ten most frequently deleted kinases are found with equal frequency in regions of gain and loss, an indication that copy number changes in these genes are not likely to be functionally significant and consistent with the idea that most kinases confer a proliferative advantage.
Tumor suppressor genes and kinases are subsets of the cancer-related gene list, so it is not surprising that seven of the top ten cancer-related gene gains are kinases. However, there are three cancer-related genes (GRB2, GAS6, MLLT6) found in regions of gain at least five times more frequently than lost that are not kinases. GRB2, an adaptor molecule in the epidermal growth factor (EGF) signaling pathway, is gained in 22/47 primary tumors (47%) and 12/18 cell lines (67%). GAS6, the ligand of the tyrosine kinase AXL, is gained in 21/47 primary tumors (45%) and 12/18 cell lines (67%). Two cancer-related genes (RBL2 and CDH8) are three times more frequently lost than gained. RBL2/p130 is deleted in 14/47 primary tumors (30%) and 10/18 cell lines (56%).

Correlation of ERBB2 DNA, RNA and protein levels
We identified five distinct regions of gain on chromosome 17q, including the ERBB2 locus at 17q12. Because of the clinical significance of this gene, we determined ERBB2 genomic DNA copy number (Q-PCR LightCycler) and mRNA transcript levels (TaqMan) in the subset of 13 primary tumors for which clinically obtained HercepTest data and adequate RNA were available (Table 4). Consistent with the previously reported frequency of ERBB2 overexpression in primary breast tumors of approximately 30% (reviewed in [2]), four of these 13 primary tumors had a positive HercepTest (2+ or 3+) reported by a clinical lab. The two tumors with 3+ staining intensity also had IR >2 for the BAC clone closest to ERBB2 on the CGH array (RP11-552K3) and had a Q-PCR relative ratio >1.5, validating the aCGH data. These tumors also showed overexpression of ERBB2 mRNA compared to regions without copy number increase, with a TaqMan ratio >5.

Pathway mapping of aCGH data
As noted above, the EGF/ERBB2 signaling pathway is clinically relevant in breast cancer. We thus used GenMAPP [21] to visualize a composite analysis of DNA copy number of this pathway (Fig. 3). Of note, one or more genes in the canonical EGF signaling pathway were altered in 39/47 primary tumors (83%). Three genes in this pathway had IRs >2 in our set of 47 primary tumors: EGFR in two tumors, ERBB2 in three tumors, and GRB2 in one tumor. A hemizygous deletion of RASA1, which encodes the Ras-GAP that deactivates H-Ras, was detected in four tumors.

Cooperating genetic loci
As a means of identifying genetic alterations that may function coordinately in tumor initiation and progression, we looked for correlation between genetic loci (i.e., for genes that were gained or lost coordinately with other genes more commonly than would be expected by chance). Separate correlation matrices (Fig. 4) were created using the most commonly altered loci (Table 1). Full heatmaps for genes and loci are available online [22]. Each matrix was evaluated for positive correlation (concordant gains or losses) and negative correlation (discordant gains or losses). To account for the large number of comparisons, we determined the significance of the correlation between these loci using the Fisher's exact test to generate a p-value, again with gains and losses considered separately.

Figure 3. GenMAPP view of the epidermal growth factor (EGF) signaling pathway. Array-based comparative genomic hybridization copy number frequency data are represented with gains in green and losses in red.

Although several positive correlations were identified, negative correlations with p < 0.05 were not found. With the exception of loci on the same chromosome, only one pair of common losses was correlated, those on chromosome 4 (141.6-145.2 Mb) and chromosome 13 (44.6-45.1 Mb) (p = 0.004). In contrast, common regions of gain were highly correlated to one another. Chromosomes 1 and 9 showed the strongest correlation (p < 0.0001), but these loci are also significantly correlated to many of the other commonly gained loci in the matrix. One exception is a frequent gain on chromosome 16, which is not correlated to either of these loci, but is highly correlated to chromosome 17q12 (p = 0.0001).

Discussion
aCGH is a powerful technique that allows determination of DNA copy number across the genome of a tumor in a single experiment, with resolution limited only by the number of elements on the array. As a result, aCGH detects changes at higher frequency and with smaller CROs than previous approaches (i.e., chromosomal CGH). Finally, aCGH is semiquantitative, providing an opportunity to narrow regions of copy number gain to those genes most likely to be biologically significant by identifying infrequent high-level amplifications in regions of frequent, lower copy number gain, analogous to using homozygous deletions to narrow larger regions of hemizygous loss.
Perhaps the most surprising finding of this study is the very similar pattern of gains and losses in primary tumors and cell lines. Although there are significantly more frequent genomic alterations in the cell lines, the pattern of gain and loss is strikingly similar to that seen in the primary tumors. It has long been thought that cell lines contained substantial amounts of genomic noise: random, biologically insignificant copy number alterations considered a reflection of the inherent genomic instability of human cancers. The current data argue to the contrary, suggesting that the amplifications and deletions seen in cell lines offer the same growth advantages in cell culture that they do in vivo, and are similarly selected for over multiple passages. Furthermore, the relative concordance of genomic aberrations in cell lines and primary tumors increases the confidence in the former as relevant in vitro models and should in the near future allow a direct assessment of how closely a given cell line reflects the parent genotype from which it was derived. Finally, these data also suggest that cancer genomes are relatively stable over time, unlike expression profiles, which can vary dramatically in short periods of time in response to various growth conditions.
Another surprising finding is the lack of correlation between prognostic clinical parameters such as stage, grade, and receptor status and the overall frequency of genome copy number alterations. A relationship between estrogen receptor status and both overall genomic aberrations as well as specific regions of common gains and losses has been recently reported [14]. Although our data do not confirm their findings, we cannot exclude the possibility that the relatively small sample size of this study provided limited power to see these correlations; however, similar results were recently reported with an aCGH analysis of bladder cancers [23]. If validated in larger series, these data suggest that it is the specific genetic changes, not the total number of copy number alterations, that are determinants of outcome. This hypothesis is consistent with global expression profiling data in breast cancers, where altered expression of a small subset of genes in the human genome was shown to be predictive of outcome years after diagnosis [24]. A larger dataset with longer follow-up than the current sample set is needed to determine whether a subset of copy number alterations will be similarly predictive of long term outcome. As expression profiles are a reflection of the fixed genetic changes in cancer genomes, however, it is likely that aCGH profiles of specific gene sets will be similarly predictive. This would be of great clinical utility, as aCGH may be performed on archival material, which is much more readily available than the frozen tumor currently required for expression profiling. Coupled with the likelihood that fixed genetic abnormalities in cancer genomes may be more predictive of response to specific therapy, aCGH holds significant promise for clinical benefit.

Figure 4. Correlations between gained and lost loci in breast cancer. Correlation matrices depicting the interaction between the most frequently gained (a) and lost (b) loci in our dataset. Black indicates a p-value > 0.05, whereas the green color becomes more intense as the p-value decreases. Cytoband location is indicated, along with the frequency of aberration in parentheses.
Analysis of some of the most commonly altered regions in this sample set illustrates the complex pattern of copy number change that can be clarified with aCGH. For example, the 8q24 amplicon has previously been attributed to MYC; however, aCGH of this sample set detects two distinct regions of amplification, with the most frequent region of gain being more telomeric to that which includes MYC. Expression profiling of breast and other epithelial cancers similarly suggested that MYC is less frequently overexpressed than genes located closer to the 8q telomere [25,26]. PTK2 (FAK) falls within this more telomeric region. PTK2 encodes a cytoplasmic tyrosine kinase, central to several proliferative pathways, including integrin, G-protein coupled, and receptor tyrosine kinase signaling, and thus has a plausible role in cancer biology [27]. Consistent with this hypothesis, breast cancer cell lines grown in monolayer culture frequently express constitutively activated PTK2, whereas normal mammary epithelial cells grown under similar conditions do not [28]. In fact, copy number gains of PTK2 are frequent in cell lines derived from invasive epithelial tumors, and PTK2 amplification correlates with increased protein expression in squamous carcinoma cell lines [29].
In addition to increasing the mapping resolution of known regions of copy number change, aCGH is an unbiased approach to detecting novel regions of genomic alteration, which potentially harbor novel cancer-related genes. For example, these data reveal several novel amplicons on chromosome 1, as well as a narrow region of high-level gain at 1q32.1, which is commonly gained in breast cancer. This region includes CNTN2, RBBP5, ELK4, Prostein, NUCKS and two hypothetical genes. Expression profiling of a subset of these tumors demonstrates that NUCKS and two ESTs (Expressed Sequence Tags) are overexpressed in the tumors with amplifications relative to tumors that are diploid at this locus. Evidence that NUCKS (nuclear ubiquitous casein kinase and cyclin-dependent kinase substrate) is expressed in breast tissue and is believed to play a role in transcriptional regulation makes it an excellent candidate gene in this region [30]. The increased resolution of aCGH also revealed several small regions of modest copy number gain or hemizygous deletion that have not been associated with breast tumors in previous studies. Gains at the telomeres of 4p (4p16.1) and 5p (5p15.33) are examples. While 4p16.1 does not contain any known cancer-related genes, the catalytic unit of telomerase (hTERT) is contained within the 5p15.33 amplicon.
The current data also suggest that homozygous deletions are relatively uncommon in primary tumors. The only homozygous deletion we detected in the primary tumors is on chromosome 9p21. This region contains MOB3B, which shares similarity with the yeast gene scMob1. scMob1 binds Mps1p, a protein kinase essential for spindle pole body duplication and mitotic checkpoint regulation, which in turn plays a role in maintaining genome stability, again providing biological plausibility for loss of MOB3B in cancer. Homozygous deletions are more common in cell lines, suggesting positive selection for loss of the genes in these regions and possibly an increased tolerance to the loss of adjacent genes in immortalized cell lines. An alternative explanation may be that contamination by normal (diploid) cells in primary tumors might decrease the sensitivity of detection of homozygous losses compared to homogeneous cell lines. As an example, the recurrent homozygous deletion on 18q21 (46.8-52.8 Mb), which includes SMAD4 (MADH4), also has been observed only in pancreatic cell lines [31]. Another region of frequent hemizygous loss that contains a recurrent homozygous deletion in cell lines is 8p23. Although this region contains 14 genes, 12 of them belong to the defensin family. Interestingly, defensins play a role in epithelial wound repair, which involves migration, proliferation and EGFR activation [32].
A comparison of the relative frequency of gains and losses at specific loci may provide insight not only into the likelihood that change at a specific locus is of biological significance, but into the biological function of the associated genes as well. For example, several tumor suppressor genes known to play a role in breast cancer, such as RB1, PTEN and BRCA2, were frequently lost, but rarely gained, in our data set. Conversely, most kinases were gained on average five times more frequently than they were deleted. These data suggest that regions that are equally likely to be gained and lost are unlikely to contain genes that confer a selective advantage when altered. As an example, analysis of copy number changes in kinases, expected to be amplified in cancers, revealed an interesting relationship between two members of the focal adhesion kinase family. PTK2 (FAK) is the most frequently gained gene overall and PTK2B (PYK) is the most frequently lost kinase. Interestingly, these gene products differentially regulate progression of the cell cycle, with induction of PTK2B inhibiting G1-S transition, while induction of PTK2 expression increases the rate of this transition [33].
Finally, we have used two different methods, pathway mapping and correlation analysis, to interrogate the aCGH data for evidence of interaction between genomic loci. In this instance, pathway mapping suggests that only a subset of the genes in a pathway may confer a selective advantage when altered in a specific tissue type. Thus ERBB2, EGFR and GRB2 were frequently amplified, but RAS, RAF and MEK were amplified in less than 10% of the tumors. Many tumors had more than one copy number alteration in this pathway, but none had a high level amplification in more than one node. Only RASA1, which encodes a Ras-GAP that deactivates H-Ras, was hemizygously deleted in this pathway. One of the tumors with a RASA1 deletion did not have a gain in any of the activating genes in this pathway, suggesting a mechanism of H-Ras activation in this tumor.
The correlation analysis demonstrated both specific loci that may cooperate in initiating or maintaining the malignant phenotype and some generalized differences between gains and losses. These observations are consistent with the hypothesis that gains and losses in cancer genomes are generated by different mechanisms and subject to different selection. Thus, when compared to deletions, gains are more common, often larger, and much more frequently correlated with other gains. One model of genomic instability that is consistent with these findings is large scale duplication of the entire genome with subsequent loss of whole chromosomes or smaller intrachromosomal regions.
The relatively small numbers of patients in each clinical subgroup did not allow us to identify association between specific aberrations and clinical characteristics at this time; this may require both more samples as well as novel analytical methods to analyze patterns of similar aberrations. In addition to clinical and histological characteristics, expression profiling is emerging as a viable means of molecularly subtyping breast cancer, and a recent report has correlated distinct regions of loss of heterozygosity with specific expression profiles [34]. Further work will be required to realize the potential of combining data from expression analysis and CGH to pinpoint genes affected by amplifications and deletions, which should lead to both a better understanding of the significance of specific genetic aberrations as well as novel targets for therapeutic interventions.

Conclusion
A set of primary breast cancers and breast cancer cell lines subjected to aCGH were found to be strikingly similar to one another and together they generated a rich dataset for inquiry into the cancer genome. Findings from the analysis of these data include fine mapping of previously described regions of gain and loss, identification of novel regions of gain and loss, and unbiased enumeration of the frequency of copy number alterations in specific genes. This analysis also was applied to gene subsets, including kinases and tumor suppressor genes, and specific pathways, which not only provided a rank order list of the most common alterations but had apparent functional implications. Finally, correlation analysis identified specific potential cooperating loci and highlighted possible differences in genomic mechanisms that generate gains and losses. These findings require additional investigation but have the potential to be of substantial biological and therapeutic significance.

Additional files
The following Additional files are available online: