Serum microRNA expression as an early marker for breast cancer risk in prospectively collected samples from the Sister Study cohort

Introduction MicroRNAs (miRNAs) are small, non-coding, single-stranded RNAs between 18-22 nucleotides long that regulate gene expression. Expression of miRNAs is altered in tumor compared to normal tissue; there is some evidence that these changes may be reflected in the serum of cancer cases compared to healthy individuals. This has yet to be examined in a prospective study where samples are collected before diagnosis. Methods We used Affymetrix arrays to examine serum miRNA expression profiles in 410 participants in the Sister Study, a prospective cohort study of 50,884 women. All women in the cohort had never been diagnosed with breast cancer at the time of enrollment. We compared global miRNA expression patterns in 205 women who subsequently developed breast cancer and 205 women who remained breast cancer-free. In addition within the case group we examined the association of miRNA expression in serum with different tumor characteristics, including hormone status (ER, PR, and HER-2) and lymph node status. Results Overall, 414 of 1,105 of the human miRNAs on the chip were expressed above background levels in 50 or more women. When the average expression among controls was compared to cases using conditional logistic regression, 21 miRNAs were found to be differentially expressed (P≤.05). Using qRT-PCR on a small, independent sample of 5 cases and 5 controls we verified overexpression of the 3 highest expressing miRNAs among cases, miR-18a, miR-181a, and miR-222; the differences were not statistically significant in this small set. The 21 differentially expressed miRNAs are known to target at least 82 genes; using the gene list for pathway analysis we found enrichment of genes involved in cancer-related processes. In a separate case-case analyses restricted to the 21 miRNAs, we found 7 miRNAs with differential expression for women whose breast tumors differed by HER-2 expression, and 10 miRNAs with differential expression by nodal status. Conclusions miRNA levels in serum show a number of small differences between women who later develop cancer versus those who remain cancer-free.


Introduction
MicroRNAs (miRNAs) are small, non-coding, singlestranded RNAs ranging in size between 18 and 22 nucleotides; they are typically excised from longer, 60-to 110-nucleotide stem-loop precursors [1,2]. miRNAs are involved in fundamental biological processes, including development, differentiation, apoptosis, and proliferation, and are believed to act predominately as post-transcriptional regulators that can either degrade their mRNA targets or repress their translation [3]. A single miRNA may have multiple mRNA targets, and up to 30% of human genes may be regulated by miRNAs [4,5].
Aberrant expression of miRNAs in cancer was initially identified in B-cell chronic lymphocytic leukemia [6], and miRNA dysregulation has been subsequently reported for many tumor types in which, depending on the specific target mRNA(s), they may act either as tumor suppressor genes or as oncogenes [7,8]. In breast cancer, post-diagnosis miRNA levels have been shown to correlate with a number of tumor characteristics, including stage, vascular invasion, proliferative index, and estrogen receptor/progesterone receptor (ER/PR) status [9,10], and may have prognostic value.
miRNAs have recently been found in human serum and plasma, where they appear to be resistant to RNAase degradation and thus relatively stable, even in stored samples [11]. This stability has made miRNAs appealing candidates for epidemiologic studies of stored samples, particularly since miRNA profiling requires only small amounts of serum or plasma [12]. The use of circulating miRNA profiles as potential early-detection cancer markers has generated considerable interest [13][14][15][16], although data addressing such application remain sparse. Initial studies have suggested that serum levels of miRNAs may differ between diagnosed cancer cases and controls [17], and several recent case control studies of breast cancer have reported evidence of differential miRNA expression levels in serum [18][19][20][21]. These studies have shown little agreement, perhaps because some have measured only a few miRNAs whereas others have used more comprehensive miRNA screens, but with a small number of subjects. None has used samples obtained prior to diagnosis. Use of such prospective samples avoids a number of important potential biases (for example, differential selection and processing of cases and controls or the possibility that the differences observed in case samples are the result of biopsy, cancer treatments, behavioral changes, stress, or other factors experienced by cases but not controls).
Here, we report on a study that prospectively collected serum samples from 205 women who subsequently developed breast cancer and 205 women who remained cancer-free and that used microarrays to comprehensively assess known miRNAs.

Study population
The Sister Study [22] is a prospective cohort study of 50,884 women and was designed to examine the environmental and genetic determinants of breast cancer. The cohort has been previously described [23]; briefly, women from the US or Puerto Rico were eligible to enroll if they themselves had never had breast cancer but had a full or half-sister who had breast cancer. At baseline interview, all participants provided extensive information, including family history, reproductive history, and information about potential risk factors. Informed consent and blood samples were obtained during a home visit. For women who subsequently developed breast cancer, detailed information on diagnosis was collected from medical records and self-report. Pathology reports were abstracted for tumor grade, stage, and other information, including status for ER, PR, and HER-2 (human epidermal growth factor receptor 2) expression. The study was approved by the Institutional Review Board of the National Institute of Environmental Health Sciences, National Institutes of Health, and the Copernicus Group Institutional Review Board.

Selection of cases and controls
We designed a matched-pair nested case control study. We selected patients who had confirmed invasive breast cancer, who completed enrollment by August 2008, and whose diagnosis occurred within 18 months following blood draw (n = 242). We excluded 29 cases who lacked a serum sample or whose sample had integrity issues during collection and shipping and eight cases whose sample had limited volume, leaving 205 cases that are the focus of our study. For each case, a matched control was selected from the 50,884 participants on the basis of the following criteria: no history of cancer (other than non-melanoma skin cancer), having completed enrollment by August 2008, an available blood sample, same race (non-Hispanic white, black, Hispanic, or other), similar age at enrollment (within 5 years), and similar date of blood draw (within 2 months). Three replicate serum samples from three women (nine samples in total) who were not participants in the study but who provided blood samples that were collected and processed in the same manner as Sister Study participants were used to provide technical replicates.

Assignment to extraction batches and array chip lot
To minimize possible processing and chip lot effects, samples were assigned to processing batches of seven to nine pairs, and batches had similar distributions of age, race, and date of enrollment. For array hybridization, each batch was assigned to one of two different chip lots ('A' and 'B') in a manner designed to ensure a balance of these same characteristics. The nine replicates (described above) were assigned to the same batch and chip lot. Laboratory personnel were blind to case control status and other phenotype information.

RNA extraction, labeling, and hybridization
Total RNA was extracted in batches by using a Total RNA purification kit (cat. no. 17200; Norgen Biotek Corp., Thorold, ON, Canada). In accordance with the manufacturer's recommendation not to exceed 200 µL per column, 400 µL of total serum from each individual was split into two equal 200-µL aliquots and then processed separately following the manufacturer's recommended protocol for total RNA purification from serum. An on-column DNase digestion was added before sample elution by using an RNase-Free DNase I Kit (cat. no. 25710; Norgen Biotek Corp.), and the two aliquots were subsequently pooled. Fixed volumes rather than fixed amounts of RNA were used in accordance with other studies [24].
Total RNA (8 µL) was directly labeled by using Flash Tag Biotin HSR Labeling kits (cat. no. HSR30FTA; Genisphere, LLC, Hatfield, PA, USA) in accordance with the instructions of the manufacturer. RNA was heated to 80°C for 10 minutes before labeling to inactivate any residual DNase activity. RNA was hybridized for 42 hours to the GeneChip miRNA 2.0 array (cat. no. 901755; Affymetrix Inc., Santa Clara, CA, USA [25]). The GeneChip miRNA 2.0 arrays contain 100% miRBase version 15 coverage of 131 organisms and contain probes for 3,439 human non-coding RNAs (ncRNAs), including 1,105 miRNAs and 2,334 other ncRNAs (including scaRNAs and snoRNAs). The arrays were washed and stained by using standard Affymetrix protocols and scanned by using an Affymetrix GCS 3000 7G Scanner. Feature intensities were extracted by using miRNA 2.0 array library files. Array hybridization and scanning were completed by Precision Biomarker Resources, Inc. (Evanston, IL, USA). The average Spearman correlation coefficient values for three sets of three technical replicates were all above 0.8 (Additional file 1). Array data were deposited into the NCBI Gene Expression Omnibus (GSE44281).

Replication samples and qRT-PCR
An independent set of 10 women were used to validate selected miRNAs via quantitative reverse transcriptionpolymerase chain reaction (qRT-PCR). Five women who provided consent and blood samples but who developed breast cancer prior to completing enrollment were selected as cases, along with five controls who also provided consent and blood samples and who were cancerfree but did not complete enrollment. Total RNA was extracted from serum samples of these women as described above with the addition of Synthetic C. elegans miScript miRNA Mimic (cat. no. MSY0000010; Qiagen, Valencia, CA, USA). Synthetic cel-39 was spiked-in at a final concentration of 0.25 fmol/µL prior to extraction and used as a PCR normalization control. The RNA concentration, reverse transcription, and pre-PCR steps were carried out in accordance with a previously published protocol [26]. ExoSAP-IT (cat. no. 78250; Affymetrix Inc.) treatment followed by column purification (cat. no. 28004; Qiagen) in accordance with the protocol of the manufacturer was used to purify the pre-PCR product. Individual PCR was run in triplicate by using 1 µL of purified pre-PCR product. The reaction contained the following components: 2x Taqman universal master mix (cat. no. 4324018; ABI, Carlsbad, CA, USA), 1 µM forward primer, 1 µM universal reverse primer, and 0.2 µM probe. The reaction was run on a Bio-Rad CFX 384 Real-Time System (Bio-Rad Laboratories, Inc., Hercules, CA, USA) by using the following parameters: 55°C for 2 minutes, 95°C for 10 minutes, followed by 40 cycles of 95°C for 15 seconds and 55°C for 1 minute. PCR cycle threshold (Ct) values were recorded for each target gene and for normalization controls and were averaged across three independent runs. Primers for miR-222, miR-181a, miR-1825, and miR-18a were custom-ordered from IDT (San Diego, CA, USA) by using previously published sequences [26]. Primers for cel-39 were designed in the same fashion as above and customordered from IDT.
To determine the best candidate miRNA for PCR normalization in our data set, we ran the array expression data from the 47 miRNAs expressed in almost all individuals through the NormFinder software [27]. Norm-Finder uses a model-based variance estimation approach [28]. Using these results, we selected as a qRT-PCR normalization control miR-1825, which showed one of the highest stability values across the 410 cases and controls and had blood levels that were similar to those of the three target miRNAs. We used the average of miR-1825 and an external spike-in cel-39 control, a strategy shown to be effective for controlling both technical and biologic variability in qRT-PCR assays from serum [17,24]. The efficiency of the four PCR assays (for miR-181a, miR-18a, miR-222, and miR-1825) was similar for all four assays (Additional file 2). Normalized relative expression was based on Ct values and calculated as 1/(Ct gene −Ct norm ).
Data processing and statistical analysis miRNA expression intensity values were backgroundcorrected and normalized across arrays by using the robust multichip average method [29]. The intensity data used in all analysis were log (2)-transformed.
For each array, the miRNA probe set signals were compared with the distribution of signals for anti-genomic probes that had matching GC content (miRNA QC Tool, version 1.0.33.0), and in accordance with the recommendation of the manufacturer, Wilcoxon rank-sum test of P value of less than 0.06 was used to identify miRNAs above background. Subsequent analysis was restricted to 414 miRNAs that exceeded background levels in at least 50 women. Conditional logistic regression was used to identify differentially expressed miRNA probes between cases and controls for those 414 probes. Because analysis of circulating miRNAs in prospectively collected samples is still exploratory, we -like some other investigators of circulating miRNAs [30,31] -regard these results as descriptive and not as tests of hypotheses and so provide P values that are unadjusted for multiple comparisons.
The association between miRNAs and the tumor characteristics of hormone receptor status (ER, PR, and HER-2) and lymph node status was tested in a case-only logistic analysis, in which race was adjusted for. Chip lot and batch were specified as random effect variables. All statistical analyses were performed by using R 2.15.
Pathway analysis with ingenuity pathway analysis miRNAs found to be significantly associated with case control status were further analyzed with ingenuity pathway analysis (IPA) [32]. Using IPA's microRNA target filter, we generated a list of predicted mRNA targets for each of the 21 significant miRNAs. The list was then restricted to the mRNAs listed in the IPA database as experimentally verified targets of any of the 21 miRNAs. This mRNA target list was then used to run a canonical pathway analysis.

A large number of miRNAs are detected in serum
In total, 410 serum samples from breast cancer cases (n = 205) and controls (n = 205) were analyzed in this study; baseline characteristics of the cases and controls are summarized in Table 1. Of the 1,105 human miRNAs, 414 miRNAs were detected above background threshold levels in at least 50 women. Forty-seven miRNAs were detected above background in 400 or more women (Table 2), and miR-16 showed the highest average expression. Even though expression of miRNAs showed considerable inter-individual variation, several miRNAs, including miR-1825 and miR-1228, were relatively constant among women (Figure 1).

Discovery of differentially expressed miRNAs in serum
When paired case control analysis of the 414 miRNAs expressed above background was used, 21 miRNAs showed significantly different levels in cases and controls (P ≤0.05) ( Table 3). The differences were small, ranging from 4% to 19%. Higher miRNA expression in women destined to become cases was significantly more common (16 of 21 miRNAs) than would be expected by chance alone (binomial test, two-tailed P <0.05). Differential miRNA expression was not stronger in women close to their time of diagnosis, but sample size was small and all cases were diagnosed within 18 months of blood draw (data not shown). Using qRT-PCR on a small independent replication set of five cases and five controls, we further examined the three miRNAs (miR-18a, miR-181a, and miR-222) with the highest expression in cases. As predicted, all three miRNAs showed higher levels in cases, although none was statistically significant in this small set of women (Additional file 3).

The impact of miRNA alterations on regulatory pathways
To explore potential biological associations, we ran IPA on the 82 experimentally verified mRNA targets of the 21 differentially expressed miRNAs. Sixteen IPA canonical pathways, including molecular mechanisms of cancer, were enriched as were other cancer-related pathways, including p53 signaling, cyclins and cell cycle regulation, and Myc-mediated apoptosis signaling (Additional file 4).

miRNA expression association with tumor characteristics
To investigate the potential association of serum miRNA expression with tumor characteristics in the 205 women who later developed breast cancer, we subclassified them into groups based on tumor characteristics (Table 4) and performed a case-case comparison. There was no evidence of significant differences in serum miRNA levels based on tumor ER or PR staining characteristics. In comparisons of serum samples from the 25 women who developed HER-2-positive tumors with 147 samples from women who developed HER-2-negative tumors, there were seven miRNAs with significantly differential expression (P ≤0.05); one miRNA was overexpressed and six miRNAs were underexpressed in the HER-2-positive tumors (Figure 2A and Additional file 5). Case-case comparison of serum from women who subsequently developed lymph node-negative tumors (pN0, n = 153) with that of women who developed lymph node-positive tumors (pN1, pN2, or pN3, n = 52) revealed 10 differentially expressed miRNAs (P ≤0.05); five were overexpressed and five were underexpressed in node-positive tumors ( Figure 2B and Additional file 6).     Discussion miRNA profiles are gaining interest as potential diagnostic or prognostic markers for breast cancer [33]. However, existing studies have been limited by sample size or the number of miRNAs analyzed, and none has used prospectively collected samples [18,31,34]. Our study minimized potential biases by profiling global serum miRNA expression patterns in samples obtained from women prior to clinical diagnosis (mean time to diagnosis was 10 months).
We found a set of 21 miRNAs differentially expressed in serum samples from 205 women who subsequently developed breast cancer compared with 205 women who remained cancer-free during the time of follow-up. The differences in miRNA levels were small and include both overexpression and underexpression of miRNAs in the cases, and overexpression was significantly more frequent than would be expected had the association been random. Published reports of primary breast tumors or cell lines have examined seven of the 21 differentially expressed miRNAs we found, and all seven showed agreement with the direction of change in our case serum samples ( Table 3). IPA of the mRNA targets of these differentially expressed miRNAs suggested gene enrichment for cancer-related signaling pathways. Although the absolute differences in miRNA levels between serum samples of cases and controls are quite small, differences pre-date clinical diagnosis and may reflect important pathways for breast cancer development. miR-18a, miR-181a, and miR-222 showed the highest percentage difference between cases and controls in our study; qRT-PCR of these miRNAs in a small independent replication set of cases and controls, though not statistically significant, replicated the direction of change for all three. These three miRNAs have been suggested to act as oncogenes through regulation of their potential target mRNAs. miR-18a is part of the oncogenic miR 17-92 cluster, which is often overexpressed in solid tumors, including breast [35]. Overexpression of this cluster is believed to cooperate with c-Myc in stimulating proliferation by negatively regulating E2F1 [36,37]. Increased expression of miR-181a in the bone marrow of patients with breast cancer has been reported to be associated with shorter disease-free survival, higher grade, and breast cancer recurrence [38]. miR-181a is believed to target the tumor suppressor gene programmed cell death protein 4 (PDCD4) [38], which inhibits tumor neoplastic transformation [39]. In breast cancer cell lines, miR-222 overexpression has been reported to be associated with tamoxifen resistance through targeting the cell cycle inhibitor p27 (Kip1) [40]. miR-222 has also been reported to increase proliferation of ERα-negative cells while reducing the expression of various tumor suppressor proteins [41], and expression of miR-222 has been reported to increase cell migration in the epithelialto-mesenchymal transition acting downstream of the RAS-RAF-MEK oncogenic pathway [42].
Interestingly, two recent case control studies have provided evidence that both miR-222 and miR-181a are overexpressed in the serum of patients with breast cancer. One used sequencing by oligonucleotide ligation and detection (SOLiD) of serum samples obtained prior to surgery from 13 breast cancer cases compared with samples from 10 healthy controls and found 26 miRNAs that were overexpressed in cases, including miR-222 and miR-181a; overexpression of miR-222 was validated in an independent group of 50 cases and 50 controls by using qRT-PCR [20]. A second study used Solexa sequencing combined with Taqman low-density array chips on serum samples obtained prior to surgery from 48 breast cancer cases and 48 controls; 10 miRNAs were found to be overexpressed in the cases, and four were validated by using qRT-PCR in an independent group of 76 cases and 76 controls [21]. That study also found overexpression of miR-222 [21]. These studies, combined with our prospective study, provide a growing body of evidence that miR-222 measured in blood is associated with breast cancer. Among cases, we compared the serum miRNA profiles of women with different tumor characteristics, including hormone status (ER, PR, and HER-2) and nodal status. Although there were no differences in ER or PR status, there were differences in HER-2 and lymph node status. Of the seven miRNAs differentially expressed in the serum of women who developed HER-2-overexpressing breast tumors, miR-93, miR-183, and miR-29a have been reported to be associated with breast cancer in previous studies [20,43,44]. In our study, miR-93 was underexpressed in the serum of women who developed HER-2-overexpressing breast tumors; interestingly, miR-93 expression was recently shown to induce a more differentiated cell phenotype in breast cancer cell lines, and expression of miR-93 in mouse mammary fat pads blocked tumor development and metastases [44]. Of the 10 miRNAs differentially expressed in the serum of women with tumors that spread to the lymph nodes (pN1 or higher), four (miR-145, miR-124, miR-125b, and miR-320) have been reported to be associated with breast cancer in previous studies [45][46][47][48]. Of these, miR-320 is of particular interest as we found three miR-320 family members (miR-320b, miR-320d, and miR320e) to be underexpressed in the serum of women who developed lymph node-positive breast tumors. miR-320 has been reported to be decreased in breast tumor tissue and downregulation of miR-320 -through loss of phosphatase and tensin homolog (PTEN) -has been shown to promote tumor proliferation and invasiveness in mouse models; expression of miR-320 distinguished human normal breast stroma from tumor stroma and was correlated with recurrence [49]. A study comparing miRNA expression in inflammatory breast cancer (IBC) with non-IBC also found miR-320 to be downregulated in the more aggressive IBC group of tumors [50]. Thus, loss of miR-320 expression may be associated with a higher likelihood of lymph node involvement and a more aggressive metastatic phenotype.
Although miRNAs that are differentially expressed between tumor and normal tissue are more frequently downregulated in tumor tissue [7], our study (like others [20,21]) has found that circulating miRNAs that differ in levels between breast cancer patients and controls are more frequently at higher levels in case blood samples. The mechanism underlying circulating miRNA stability is still being investigated. One model involves the active release of miRNAs from cells in membrane-bound microvesicles, including exosomes and shedding vesicles [51][52][53]. There is evidence that microvesicles can deliver miRNAs to recipient cells and trigger changes in target mRNA levels [54]. A recent report has shown that vesicleencapsulated miRNAs represent only a minor portion of circulating miRNAs but that a significant portion of circulating miRNAs are associated with Argonaute2 (Ago2) [55], the effector component of the miRNA-induced silencing complex [56]. Both models support the possibility that miRNAs may be actively released into circulation and could act as signaling molecules able to regulate their target mRNA expression in recipient cells. Cancerassociated miRNAs in the circulation could also originate from immunocytes in the tumor microenvironment or from a response mediated by the body's systemic response to disease [57].

Conclusions
We find some evidence of differences in miRNA serum levels between women who subsequently developed cancer compared with women who remained clinically cancerfree. The magnitude of these differences is small, and this may limit their clinical application as circulating earlydetection markers for breast cancer. This is the first study to use prospectively collected samples; limitations include a relatively short follow-up period and a sample size that is not large enough to fully explore etiologic versus diagnostic relevance. Our study was carried out within a cohort of women who each had a sister with breast cancer, putting the former at about twofold increased risk, and so the differences that we observed may not be generalizable to women without a similar family history.

Additional material
Additional file 1: Spearman correlation coefficient values for technical replicates of arrays. Three replicate serum samples from three women (nine samples in total) were processed and hybridized to arrays as described for samples in the main study. Spearman correlation coefficients were calculated for the three pairings of replicate samples for each woman and averaged. One array from Individual 1 appeared to be an outlier but was included in the results shown above. Exclusion of this array resulted in correlation coefficients of greater than 0.97 in all three categories of probes for Individual 1.
Additional file 2: Efficiency of the four polymerase chain reaction (PCR) assays. The efficiency of PCR amplifications for the normalization control and three target microRNAs (miRNAs) was calculated by using DART-PCR version 1.0. The average efficiency of each of three independent PCRs was similar across all four miRNAs.
Additional file 3: Quantitative reverse transcription-polymerase chain reaction (qRT-PCR) validation in 10 cases and controls. Serum levels of five cases and five controls were examined by using qRT-PCR for miR181a, miR18a, and miR-222. Box plots show normalized relative expression. Normalization was carried out by using the mean of miR-1825 and spiked in cel-39. The horizontal line represents the mean for each sample.
Additional file 4: Experimentally observed targets enriched for cancer and signaling pathways. Identified ingenuity pathway analysis (IPA) canonical pathways enriched by the experimentally observed targets of the 21 miRNAs differentially expressed between the cases and non-cases. The negative log (10) false discovery rate-corrected P values are shown. Note that this test has not corrected for possible dependencies across the mRNAs considered and that statistical significance may be overstated. HER-2, human epidermal growth factor receptor 2; ILK, integrin-linked kinase; PTEN, phosphatase and tensin homolog.
Additional file 5: List of differentially expressed serum microRNAs (miRNAs) based on HER-2/NEU expression. Additional file 6: List of differentially expressed serum microRNAs (miRNAs) based on nodal status.
Authors' contributions ACG carried out sample extraction and sample preparation, did some statistical analysis, and helped to design the experiments, interpret the results, and write the manuscript. ZX performed the majority of the statistical analysis. CRW and PAW participated in study design. LDR participated in study design and provided patient data and study variables. DPS participated in study design and collected samples and data. RCG provided study reagents and processing as well as critical advice on study design, QA, and data interpretation. JAT helped to design the experiments, interpret the results, and write the manuscript. All authors read and approved the final manuscript.

Competing interests
The authors declare that they have no competing interests.