Design
DISC was a multicenter randomized controlled clinical trial sponsored by the National Heart, Lung, and Blood Institute (NHLBI) to test the safety and efficacy of a dietary intervention to reduce serum low-density lipoprotein cholesterol (LDL-C) in children with elevated LDL-C. The trial’s design and results have been described previously [20,21,22,23]. Briefly, between 1988 and 1990, 301 healthy, prepubertal 8–10-year-old girls (and 362 boys) with elevated LDL-C were recruited into DISC at 6 clinical centers and randomized by the data coordinating center to a behavioral dietary intervention group or a usual care control group. The planned intervention continued until 1997, when the mean age of participants was 16.7 years. Assent was obtained from DISC participants, and informed consent was obtained from parents/guardians prior to randomization. In 2006–2008, when participants were 25 to 29 years old, the DISC06 Follow-Up Study was conducted to evaluate the longer-term effects of the diet intervention on biomarkers associated with breast cancer in female DISC participants [24]. Informed consent was obtained again from participants prior to the DISC06 follow-up visit.
Participants
DISC participants were originally recruited through schools, health maintenance organizations and pediatric practices. A total of 301 girls aged 8–10 years with elevated serum LDL-C who met several additional eligibility criteria were enrolled [20].
All female DISC participants were invited to participate in the DISC06 Follow-Up Study, and 260 (86.4%) attended visits. Those who were pregnant or breastfeeding at or within 12 weeks before the visit (n = 30), who had undergone breast augmentation or reduction surgery (n = 13), or whose breast MRI was missing or not technically acceptable (n = 35) were excluded, leaving 182 participants with breast density measurements.
DISC participants provided blood samples on multiple occasions during childhood. Because little serum remained, we measured metabolites in a single sample per participant that had adequate volume (>0.1 mL) and was collected at the DISC visit closest in time to menarche, whether before or after it.
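The sample-selection rule can be sketched as follows. This is a minimal illustration, assuming that "closest in time" is measured as the absolute difference between age at the visit and age at menarche; the `SerumSample` record and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SerumSample:
    visit: str          # hypothetical visit label, e.g. "Year-3"
    age_years: float    # participant age at the visit (assumed timing measure)
    volume_ml: float    # serum volume remaining in storage


MIN_VOLUME_ML = 0.1  # adequacy threshold stated in the text (> 0.1 mL)


def select_sample(samples: List[SerumSample], age_at_menarche: float) -> Optional[SerumSample]:
    """Return the adequate-volume sample collected closest to menarche, before or after it."""
    adequate = [s for s in samples if s.volume_ml > MIN_VOLUME_ML]
    if not adequate:
        return None  # no sample has enough serum for the assay
    return min(adequate, key=lambda s: abs(s.age_years - age_at_menarche))
```

Filtering on volume before minimizing the time distance means a sample nearer to menarche is skipped when too little serum remains, which mirrors the two conditions in the text.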
Data collection
Data and serum were collected previously in DISC or the DISC06 Follow-Up Study. DISC data were collected at baseline (before randomization) and annually thereafter. Height and weight were measured, and a brief physical examination including Tanner staging of sexual maturation was performed. Data on demographics, medical history, medication use, physical activity and onset of menses were collected. A venous blood sample was collected at the baseline, Year-1, Year-3, Year-5 and last visits. Postmenarcheal girls completed menstrual cycle calendars so that the cycle day on which blood was collected could be estimated. At these visits, three non-consecutive 24-h dietary recalls were collected over 2 weeks and averaged to estimate nutrient intake [25].
For the DISC06 Follow-Up Study, participants attended a single visit between 2006 and 2008. Whenever possible, visits occurred within 14 days of the onset of the next menses. Participant data, including demographics, medical and reproductive history, hormonal contraceptive and medication use, and physical activity, were collected on the day of the visit, whereas 24-h dietary recalls were collected over 2 weeks.
In both studies, centralized data collection training was conducted prior to data collection. Data were collected by staff masked to treatment group assignment.
Anthropometry
Height and weight were measured annually in DISC and again in DISC06 using the same procedures. Height was measured using a stadiometer, and weight was measured on an electronic or beam balance scale. Each measurement was taken twice. A third measurement was taken if the first two were not within allowable tolerances (0.5 cm for height and 0.2 kg for weight), and the two closest values were averaged. BMI was calculated as weight (kg)/height (m)² and, for DISC visits during childhood, expressed as a z-score relative to the Centers for Disease Control and Prevention (CDC) 2000 Growth Charts [26] to account for changes with age. The BMI z-score at the DISC visit when the blood used for metabolomics assays was collected was used in all analyses.
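The duplicate-measurement rule and the BMI z-score calculation can be sketched as below. This is a minimal illustration, assuming that "within tolerance" means an absolute difference no larger than the stated limit, and that the z-score uses the standard LMS transformation on which the CDC 2000 charts are based; the L, M and S values in the usage example are hypothetical placeholders, not CDC reference values.

```python
import math
from typing import Optional


def average_measurements(first: float, second: float, tolerance: float,
                         third: Optional[float] = None) -> float:
    """Average duplicate measurements; if they disagree by more than the tolerance,
    a third measurement is required and the two closest of the three are averaged."""
    if abs(first - second) <= tolerance:
        return (first + second) / 2
    if third is None:
        raise ValueError("third measurement required: first two exceed tolerance")
    v = sorted([first, second, third])
    # In a sorted triple the closest pair is either (v[0], v[1]) or (v[1], v[2]).
    if v[1] - v[0] <= v[2] - v[1]:
        return (v[0] + v[1]) / 2
    return (v[1] + v[2]) / 2


def bmi(weight_kg: float, height_cm: float) -> float:
    """BMI = weight (kg) / height (m) squared."""
    height_m = height_cm / 100
    return weight_kg / height_m ** 2


def lms_z_score(x: float, L: float, M: float, S: float) -> float:
    """LMS z-score for value x given age- and sex-specific reference parameters L, M, S."""
    if L == 0:
        return math.log(x / M) / S
    return ((x / M) ** L - 1) / (L * S)
```

For example, `average_measurements(140.0, 141.0, 0.5, third=140.2)` returns 140.1, because 140.0 and 140.2 are the closest pair; in practice L, M and S would be looked up for the child's age and sex from the CDC reference tables.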
Blood collection and processing
At DISC and DISC06 follow-up visits, blood was collected by venipuncture using standard procedures in the morning after an overnight fast. Blood was allowed to clot for 45 min at room temperature and centrifuged at 1500×g for 20 min before the serum was separated, aliquoted into glass vials (DISC) or cryovials (DISC06) and stored continuously at −80 °C.
Breast density assessment
Breast density was measured at the DISC06 follow-up visit using non-contrast MRI. Imaging was performed using a whole-body 1.5 Tesla or higher-field-strength MRI scanner and dedicated breast imaging radiofrequency coil. A standard protocol was followed consisting of a 3D T1-weighted fast gradient echo pulse sequence performed with and without fat suppression and in transaxial and coronal orientations. A 32–40 cm field of view was used for bilateral coverage.
MRI technologists at the clinical centers were individually trained to recognize and correct failures due to incomplete fat suppression, motion artifacts and inadequate breast coverage. Acceptable image quality on 3 volunteers was required for site certification. Scans rendered inaccurate by artifacts, motion or technique were excluded (n = 21).
All MRI image data were processed at the University of California San Francisco using customized software to identify the chest wall–breast tissue boundary and skin surface, and to separate breast fibroglandular and fatty tissue using a segmentation method based on fuzzy C-means (FCM) clustering [27]. FCM segmentation was performed using fat-suppressed images; nonfat-suppressed images were used when incorrect or failed segmentation occurred due to poor fat suppression. In problematic cases that could not be segmented with automated FCM methods, manual delineation was used.
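The core of fuzzy C-means segmentation can be sketched on a one-dimensional list of voxel intensities. This is not the UCSF software: it is a minimal illustration that ignores chest-wall and skin boundary detection and spatial information, and it assumes that on fat-suppressed images the brighter of two clusters corresponds to fibroglandular tissue. The fuzziness exponent `m = 2` and the evenly spaced initialization are common defaults chosen here for illustration.

```python
from typing import List, Tuple


def fuzzy_c_means_1d(values: List[float], n_clusters: int = 2, m: float = 2.0,
                     max_iter: int = 100, tol: float = 1e-6) -> Tuple[List[float], List[List[float]]]:
    """Fuzzy C-means on scalar intensities; returns (cluster centres, membership rows)."""
    lo, hi = min(values), max(values)
    # Deterministic start: centres spread evenly across the intensity range.
    centers = [lo + (hi - lo) * (k + 0.5) / n_clusters for k in range(n_clusters)]
    exponent = 2.0 / (m - 1.0)
    eps = 1e-12  # avoids division by zero when a value coincides with a centre
    memberships: List[List[float]] = []
    for _ in range(max_iter):
        # Membership update: u_k = 1 / sum_j (d_k / d_j)^(2 / (m - 1))
        memberships = []
        for x in values:
            d = [abs(x - c) + eps for c in centers]
            memberships.append([1.0 / sum((d[k] / d[j]) ** exponent for j in range(n_clusters))
                                for k in range(n_clusters)])
        # Centre update: mean of the values weighted by membership^m
        new_centers = []
        for k in range(n_clusters):
            weights = [u[k] ** m for u in memberships]
            new_centers.append(sum(w * x for w, x in zip(weights, values)) / sum(weights))
        shift = max(abs(a - b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships


def dense_mask(intensities: List[float]) -> List[bool]:
    """Label each voxel dense if its membership in the brighter cluster exceeds 0.5 (assumption)."""
    centers, memberships = fuzzy_c_means_1d(intensities, n_clusters=2)
    dense = max(range(2), key=lambda k: centers[k])
    return [u[dense] > 0.5 for u in memberships]
```

Soft memberships are what distinguish fuzzy C-means from k-means: voxels with intermediate intensity receive partial membership in both clusters, and the 0.5 cut-off used here to harden them is itself an illustrative choice.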
Separately for each breast, total breast volume and absolute dense breast volume (ADBV) were measured, and absolute non-dense breast volume (ANDBV) was estimated by subtraction. Percent dense breast volume (%DBV) was calculated as (ADBV/total breast volume) × 100. Each breast density measure was highly correlated between the two breasts (r > 0.94). Results for the two breasts were averaged to provide single measures of %DBV, ADBV and ANDBV for each participant.
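The per-breast calculations and the averaging across breasts can be written out as a short sketch. It assumes volumes are expressed in cm³ and that %DBV is computed per breast before averaging, which is how the text reads; the dictionary keys are illustrative names.

```python
from typing import Dict, Tuple


def breast_density(total_volume_cm3: float, dense_volume_cm3: float) -> Dict[str, float]:
    """Density measures for one breast: ADBV is measured, ANDBV is obtained by subtraction."""
    return {
        "pct_dbv": dense_volume_cm3 * 100 / total_volume_cm3,  # %DBV
        "adbv": dense_volume_cm3,                              # ADBV
        "andbv": total_volume_cm3 - dense_volume_cm3,          # ANDBV
    }


def participant_density(left: Tuple[float, float], right: Tuple[float, float]) -> Dict[str, float]:
    """Average each measure over the two breasts; each argument is (total, dense) volume."""
    l = breast_density(*left)
    r = breast_density(*right)
    return {key: (l[key] + r[key]) / 2 for key in l}
```

For a left breast of 400 cm³ with 100 cm³ dense tissue and a right breast of 500 cm³ with 150 cm³, the participant-level values are %DBV 27.5, ADBV 125 cm³ and ANDBV 325 cm³.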
Metabolomics assays
Untargeted metabolomic profiling was performed by Metabolon (Durham, NC). DISC serum samples were randomly ordered with 10% blind quality control (QC) samples integrated throughout to monitor laboratory performance. A pooled matrix sample served as a technical replicate throughout analyses, extracted water samples served as process blanks, and a cocktail of QC standards that was spiked into every sample allowed instrument performance monitoring and aided chromatographic alignment. Forty-two technical replicates from DISC06 samples that had previously been analyzed by Metabolon were re-assayed to facilitate comparison of metabolite levels in serum collected in childhood and young adulthood. A limited dataset with recalibrated levels of named metabolites measured in 180 participants with both DISC and DISC06 samples was created to allow adjustment for adult metabolite levels when analyzing associations of child levels with breast density.
Samples were prepared using the automated MicroLab STAR system (Hamilton Co.). Proteins were precipitated with methanol, and the resulting extract was divided into five fractions: four for analysis by four ultra-high-performance liquid chromatography–tandem mass spectrometry (UPLC-MS/MS) methods and one reserved for backup. Samples were placed briefly on a TurboVap (Zymark) to remove the organic solvent and were stored overnight under nitrogen before preparation for analysis.
All methods used a Waters ACQUITY ultra-high-performance liquid chromatograph and a Thermo Scientific Q-Exactive high-resolution/accurate mass spectrometer interfaced with a heated electrospray ionization (HESI-II) source and an Orbitrap mass analyzer operated at 35,000 mass resolution. After drying, sample extracts were reconstituted in solvents compatible with each of the four analytical methods. Details of the methods have been reported previously [28, 29]. Briefly, two aliquots were analyzed under acidic positive ion conditions using a C18 column; one was chromatographically optimized for more hydrophilic compounds, whereas the other was optimized for more hydrophobic compounds. The third aliquot was analyzed under basic negative-ion-optimized conditions following gradient elution on a dedicated C18 column. The fourth aliquot was also analyzed under negative ionization following gradient elution from a HILIC column. The MS analysis alternated between MS and data-dependent MSn scans using dynamic exclusion, with a scan range of 70–1000 m/z.
Compounds were identified by comparison to library entries of purified standards or recurrent unknowns. Biochemical identifications were based on three criteria: retention index, accurate mass match to the library and the MS/MS forward and reverse scores between the experimental data and authentic standards. At the time DISC samples were analyzed, more than 3300 commercially available purified standard compounds had been acquired and characterized.
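The first two identification criteria, retention index and accurate mass, amount to a tolerance filter against the reference library. The sketch below is hypothetical: Metabolon's matching software is proprietary, the tolerance values are placeholders rather than reported settings, and the third criterion (MS/MS forward and reverse scores) is not modeled; it would be applied to rank the candidates this filter returns.

```python
from dataclasses import dataclass
from typing import List

RI_WINDOW = 50.0       # hypothetical retention-index tolerance
PPM_TOLERANCE = 10.0   # hypothetical accurate-mass tolerance (parts per million)


@dataclass
class LibraryEntry:
    name: str
    retention_index: float
    mz: float  # accurate m/z of the reference ion


def ppm_error(observed_mz: float, reference_mz: float) -> float:
    """Mass error in parts per million relative to the library value."""
    return (observed_mz - reference_mz) / reference_mz * 1e6


def candidate_matches(observed_ri: float, observed_mz: float,
                      library: List[LibraryEntry]) -> List[LibraryEntry]:
    """Library entries passing both the retention-index and accurate-mass criteria,
    ordered by absolute mass error."""
    hits = [
        e for e in library
        if abs(observed_ri - e.retention_index) <= RI_WINDOW
        and abs(ppm_error(observed_mz, e.mz)) <= PPM_TOLERANCE
    ]
    return sorted(hits, key=lambda e: abs(ppm_error(observed_mz, e.mz)))
```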
Proprietary visualization and interpretation software was used to confirm the consistency of peak identification among samples. Peaks were quantified using area under the curve. A data normalization step corrected for day-to-day variation arising from differences in instrument tuning.
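One common way to implement such a normalization is to rescale each compound within each run day so that the day's median equals 1.0. The sketch below assumes that approach; the text does not specify the exact procedure, and `None` is used here to mark undetected values, which are left untouched.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Hashable, List, Optional


def normalize_by_run_day(values: List[Optional[float]],
                         run_days: List[Hashable]) -> List[Optional[float]]:
    """Scale one compound's peak areas so that the median within each run day equals 1.0."""
    by_day: Dict[Hashable, List[float]] = defaultdict(list)
    for v, day in zip(values, run_days):
        if v is not None:
            by_day[day].append(v)
    day_median = {day: median(vs) for day, vs in by_day.items()}
    return [None if v is None else v / day_median[day] for v, day in zip(values, run_days)]
```

Because each day is rescaled by its own median, a uniform shift in instrument response on one day (e.g. all areas 10-fold lower) no longer appears as a difference between samples run on different days.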
Biochemicals are named according to the following conventions. Names without an appended symbol were confirmed against an authentic chemical standard. Names ending in a single asterisk (*) were not confirmed against a standard, but Metabolon is confident in their identity. Names ending in a double asterisk (**) have no standard available, but Metabolon is reasonably confident in their identity. Names ending in a number denote structural isomers of another biochemical in Metabolon’s library.
Statistical analysis
A total of 880 biochemicals, including 650 named biochemicals of known identity and 230 unnamed biochemicals of unknown structural identity, were semi-quantified as relative peak intensity by Metabolon. Metabolites with ≥ 30% of values below the limit of detection or with coefficients of variation ≥ 25% in masked quality control samples were dropped, leaving 571 metabolites for analysis. For metabolites with < 30% of values below the limit of detection, undetected values were imputed at the lowest observed value. Metabolites were transformed to the natural log scale, and extreme values were winsorized using the median absolute deviation [30].
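The filtering, imputation, log transformation and winsorizing steps for a single metabolite can be sketched as follows. The 30% and 25% thresholds come from the text; the winsorizing cut-off (5 scaled MADs from the median) and the 1.4826 consistency constant are assumptions for illustration, since the text does not state them. Undetected values are represented as `None`.

```python
import math
from statistics import mean, median, stdev
from typing import List, Optional

MAX_MISSING_FRACTION = 0.30  # from the text: drop if >= 30% of values are undetected
MAX_QC_CV = 0.25             # from the text: drop if QC coefficient of variation >= 25%
MAD_MULTIPLIER = 5.0         # hypothetical winsorizing threshold (not stated in the text)
MAD_SCALE = 1.4826           # assumed scaling so the MAD estimates a normal SD


def coefficient_of_variation(qc_values: List[float]) -> float:
    return stdev(qc_values) / mean(qc_values)


def preprocess_metabolite(values: List[Optional[float]],
                          qc_values: List[float]) -> Optional[List[float]]:
    """Return ln-scale, winsorized values, or None if the metabolite is dropped."""
    observed = [v for v in values if v is not None]
    missing_fraction = (len(values) - len(observed)) / len(values)
    if missing_fraction >= MAX_MISSING_FRACTION or coefficient_of_variation(qc_values) >= MAX_QC_CV:
        return None
    floor = min(observed)  # impute undetected values at the lowest observed value
    logged = [math.log(v if v is not None else floor) for v in values]
    center = median(logged)
    mad = median(abs(x - center) for x in logged)
    half_width = MAD_MULTIPLIER * MAD_SCALE * mad
    lo, hi = center - half_width, center + half_width
    # Winsorize: pull values beyond the bounds back to the nearest bound.
    return [min(max(x, lo), hi) for x in logged]
```

Winsorizing on the log scale limits the leverage of a few extreme intensities while keeping every participant in the analysis, which matters for the robust regressions that follow.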
Statistical models
The hypothesized associations among childhood BMI z-scores, childhood serum metabolites and young adult breast density phenotypes are shown in Fig. 1. Childhood BMI z-score could influence adult breast density directly and/or indirectly via childhood serum metabolites. We therefore conducted a series of analyses to evaluate associations of BMI z-scores with breast density and with serum metabolites, and of serum metabolites with breast density. Associations were evaluated using robust mixed-effects multivariable linear regression implemented with the R package robustlmm [31]. Because robustlmm does not report P-values, these were estimated using degrees of freedom (df) borrowed from the same model fit with the R package lmerTest [32]. All analyses were conducted with 2 levels of adjustment. Initial models included age at childhood BMI measurement (years, continuous) and treatment group assignment as fixed effects and DISC clinic as a random effect. When breast density phenotypes were the dependent variables, BMI and BMI² at the time of breast density assessment (continuous) were also included as fixed effects. Fully adjusted models with serum metabolites as the dependent variables additionally included a fixed effect for race (white/nonwhite), whereas fully adjusted models with breast density phenotypes as the dependent variables additionally included fixed effects for race, college graduate (yes/no), duration of hormone use (continuous), number of live births (0/1+) and current smoker (yes/no). These covariates were identified by backward stepwise elimination. When serum metabolites were included in models as either dependent or independent variables, menstrual cycle phase at blood collection was adjusted for with a 4-level factor: premenarche, follicular phase, luteal phase or postmenarche with phase unknown. Associations of BMI z-score with serum metabolites and of serum metabolites with breast density were adjusted for multiple comparisons using the Benjamini–Hochberg false discovery rate (FDR).
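The Benjamini–Hochberg adjustment converts the per-metabolite P-values into FDR-adjusted values that can be compared against a threshold such as the FDR < 0.20 used below. The following is a minimal pure-Python sketch of the standard step-up procedure, not the specific implementation used in the analysis.

```python
from typing import List


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P-values, returned in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending P-value
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity of the adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For P-values `[0.01, 0.04, 0.03, 0.20]` the adjusted values are approximately `[0.04, 0.053, 0.053, 0.20]`, so the first three tests would pass an FDR < 0.20 screen and the last would not.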
BMI z-score and breast density
The association of childhood BMI z-score with breast density was assessed separately for %DBV, ADBV and ANDBV. ADBV and ANDBV were natural log transformed prior to analysis. Breast density phenotype was modeled as a continuous dependent variable while childhood BMI z-score was included as a continuous fixed effect. Percent difference in ADBV and ANDBV for each unit increase in BMI z-score was estimated from the model coefficient for BMI z-score as (exp(β) − 1) × 100 [33].
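Because ADBV and ANDBV are modeled on the natural log scale, a one-unit increase in BMI z-score multiplies the outcome by exp(β). The back-transformation can be written as a one-line helper; the function name is illustrative.

```python
import math


def percent_difference_per_unit(beta: float) -> float:
    """Percent difference in a ln-transformed outcome per one-unit increase in the predictor."""
    return (math.exp(beta) - 1) * 100
```

For example, β = −0.10 corresponds to about 9.5% lower ADBV per unit of BMI z-score, since exp(−0.10) ≈ 0.905.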
BMI z-score and serum metabolites
Associations of childhood BMI z-score with serum metabolites were evaluated in the same way as those with ADBV and ANDBV, except that natural log transformed serum metabolite levels were the dependent variable. Percent differences in metabolite levels per unit increase in BMI z-score were calculated by back-transforming the model coefficient for BMI z-score as described above for ADBV and ANDBV.
Serum metabolites and breast density phenotypes
Metabolites that were associated with BMI z-score at FDR < 0.20 were evaluated in association with %DBV and ADBV. For these analyses, breast density phenotype was modeled as a continuous dependent variable and natural log transformed serum metabolite levels and BMI z-scores were included as continuous fixed effects. The difference in %DBV for a 10% increase in serum metabolite was estimated from the model coefficient for the metabolite as β × ln(1.10) [33]. Percent difference in ADBV for a 10% increase in serum metabolite was estimated from the model coefficient for the metabolite as (1.10^β − 1) × 100 [33].
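The two back-transformations differ because %DBV is modeled on its original scale whereas ADBV is log-transformed; in both models the metabolite is on the natural log scale, so a 10% increase shifts ln(metabolite) by ln(1.10). A small sketch with illustrative function names:

```python
import math


def pct_dbv_difference_per_10pct(beta: float) -> float:
    """%DBV (untransformed) regressed on ln(metabolite): absolute %DBV difference for a 10% higher level."""
    return beta * math.log(1.10)


def adbv_percent_difference_per_10pct(beta: float) -> float:
    """ln(ADBV) regressed on ln(metabolite): percent difference in ADBV for a 10% higher level."""
    return (1.10 ** beta - 1) * 100
```

With β = 2, for instance, a 10% higher metabolite level corresponds to about 0.19 percentage points higher %DBV in the first model and about 21% higher ADBV in the second.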
To explore whether dietary nutrient intake influenced associations of nutrient metabolites with breast density phenotypes, average intakes from foods and supplements, estimated from the three 24-h dietary recalls collected at the DISC visit when blood was collected, were added as fixed effects to the fully adjusted models described above.
Spearman correlations were used to estimate associations between metabolite levels in serum from childhood (DISC) and adulthood (DISC06) using the limited dataset described under Metabolomics Assays. Models described earlier were refit including fixed effects for both child and adult metabolite levels to evaluate whether these correlations explained associations between child metabolite levels and young adult breast density.
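A Spearman correlation is the Pearson correlation of the ranks, with tied values given their average rank. The pure-Python sketch below is illustrative; in practice a statistics library routine would be used, and this version assumes complete child–adult pairs with no missing values.

```python
import math
from typing import List


def average_ranks(values: List[float]) -> List[float]:
    """1-based ranks, with tied values assigned the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean 1-based rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: List[float], y: List[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(child_levels: List[float], adult_levels: List[float]) -> float:
    """Spearman rank correlation between paired child and adult metabolite levels."""
    return pearson(average_ranks(child_levels), average_ranks(adult_levels))
```

Working on ranks makes the estimate insensitive to the log transformation and to any monotone recalibration of levels between the two assay runs.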
Mediation analysis
Mediation analysis was performed using the model-based approach as implemented in R package mediation [34]. Because childhood BMI z-scores were inversely associated with %DBV and ADBV, mediation analysis was performed for metabolites associated in opposite directions with BMI z-score (FDR < 0.20) and these breast density phenotypes (P < 0.05). Two multivariable linear regression models were fit for each metabolite–breast density phenotype combination evaluated. The first model included the metabolite as the dependent variable and BMI z-score, age, treatment group assignment, race and menstrual cycle phase at blood collection as fixed effects. The second model included the breast density phenotype as the dependent variable and metabolite, BMI z-score, age, treatment group assignment and several additional potential confounders measured at the DISC06 visit described above as fixed effects. Mediation was evaluated by applying the function mediate to these two models with BMI z-score as the ‘treatment’ and the metabolite as the ‘mediator,’ using bootstrap variances estimated with 5000 simulations.
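For two linear models without an exposure–mediator interaction, the average causal mediation effect reduces to the product of the BMI z-score coefficient in the metabolite model (a) and the metabolite coefficient in the breast density model (b). The sketch below illustrates that product-of-coefficients logic with a nonparametric percentile bootstrap. It is a simplified stand-in for the R mediation package, not a reimplementation of `mediate`: covariates and the random clinic effect are omitted, ordinary least squares replaces the robust fits, and the function names are hypothetical.

```python
import random
from typing import List, Tuple


def ols(columns: List[List[float]], y: List[float]) -> List[float]:
    """Least-squares coefficients for y ~ 1 + columns via the normal equations (Gauss-Jordan)."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def indirect_effect(x: List[float], m: List[float], y: List[float]) -> float:
    a = ols([x], m)[1]     # mediator model: metabolite ~ BMI z-score
    b = ols([m, x], y)[1]  # outcome model: density ~ metabolite + BMI z-score
    return a * b


def bootstrap_acme(x, m, y, sims: int = 5000, seed: int = 1) -> Tuple[float, Tuple[float, float]]:
    """Point estimate of a*b and a 95% percentile bootstrap interval."""
    rng = random.Random(seed)
    n = len(x)
    estimate = indirect_effect(x, m, y)
    draws = []
    for _ in range(sims):
        idx = [rng.randrange(n) for _ in range(n)]  # resample participants with replacement
        draws.append(indirect_effect([x[i] for i in idx], [m[i] for i in idx], [y[i] for i in idx]))
    draws.sort()
    return estimate, (draws[int(0.025 * sims)], draws[int(0.975 * sims) - 1])
```

Under this framing, a metabolite associated with BMI z-score and breast density in opposite directions yields a negative a × b, the same sign as the inverse total association, which is why only such metabolites were carried into mediation analysis.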
All tests of statistical significance were two-sided. All analyses were conducted using SAS 9.4 and R 4.1 statistical software.