Cells and reagents
4T1 cells were obtained from the American Type Culture Collection (ATCC), tagged with luciferase using lentiviral particles expressing firefly luciferase (Amsbio), and grown in Dulbecco’s modified Eagle’s medium (DMEM) supplemented with 10% fetal bovine serum (FBS). MDA-MB-231-Luc cells were obtained from Sibtech and grown in DMEM supplemented with 10% FBS. Where indicated, 4T1-Luc cells were transduced with lentiviral particles expressing H2B-mRFP as previously described, and RFP+ cells were enriched by fluorescence-activated cell sorting (FACS). Cells were regularly authenticated by short tandem repeat (STR) profiling using the StemElite ID system (Promega). Both cell lines were routinely tested for mycoplasma and used within 10 passages after resuscitation. Mouse astrocytes were purchased from ScienCell and maintained in astrocyte basal medium supplemented with FBS and astrocyte growth supplement. Recombinant human transforming growth factor β1 (TGF-β1) and bone morphogenetic protein 7 (BMP7) were purchased from R&D Systems. Details of the short hairpin RNA (shRNA) lentiviruses, full-length open reading frame (ORF) clone expression systems, quantitative reverse-transcription polymerase chain reaction (RT-qPCR) reagents, and antibodies used in this study are provided in Additional file 1 (Tables S1–S4).
For shRNA knockdown of Id2 or Aldh3a1, 5 × 10⁴ 4T1-Luc cells were transduced with lentiviral particles (Sigma; MISSION transduction particles) at a multiplicity of infection (MOI) of 2. At 24 h post-transduction, the medium was replaced with fresh medium containing 10% FBS. Stably transduced cells were selected in 2.5 μg/mL puromycin for two passages.
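Because the MOI is the number of transducing units (TU) per cell, the volume of viral stock needed follows from the cell number and the stock titer. The sketch below illustrates this arithmetic; the titer of 1 × 10⁶ TU/mL and the function name `virus_volume_ul` are hypothetical, and the titer stated for each particle lot should be used in practice.

```python
def virus_volume_ul(n_cells: float, moi: float, titer_tu_per_ml: float) -> float:
    """Volume of lentiviral stock (uL) needed to reach a target MOI.

    MOI = transducing units / cells, so transducing units = MOI * cells.
    The titer is an assumed input (TU per mL of stock).
    """
    transducing_units = moi * n_cells
    return transducing_units / titer_tu_per_ml * 1000.0  # convert mL to uL

# 5 x 10^4 cells at MOI 2, with a hypothetical titer of 1 x 10^6 TU/mL
vol = virus_volume_ul(5e4, 2, 1e6)
print(f"{vol:.1f} uL")  # 100.0 uL
```

At MOI 2, 5 × 10⁴ cells require 1 × 10⁵ TU, which at the assumed titer corresponds to 0.1 mL of stock.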
For ectopic expression of Id2 or Aldh3a1, HEK293T cells were co-transfected using Opti-MEM and Lipofectamine 2000 with 8 μg of the purified bicistronic mCherry mammalian expression vector pReceiver-Lv166, either empty or carrying the full-length ORF of mouse Id2 (EX-Mm03201-Lv166) or Aldh3a1 (EX-Mm28326-Lv166-GS), together with 4 μg of the packaging plasmid psPAX2 and 1.5 μg of the envelope plasmid pMD2.G. At 48 h post-transfection, the virus-containing medium was collected and used directly to infect 4T1-Luc or MDA-MB-231-Luc cells. At 72 h post-infection, cells were FACS sorted to enrich for mCherry-positive cells.
All animals were monitored on a daily basis by staff from the ICR Biological Service Unit for signs of ill health.
To isolate tumour cells disseminated to metastatic sites for gene expression profiling, 1 × 10⁴ 4T1-Luc cells in 50 μL phosphate-buffered saline (PBS) were inoculated subcutaneously into 6- to 8-week-old female BALB/c mice. Mice were sacrificed once primary tumours reached the maximum allowable size (mean diameter ≥ 15 mm), and primary tumours, lungs, and brains were harvested at necropsy. Each primary tumour was cut into small pieces, homogenised using a McIlwain Tissue Chopper (Campden Instruments), and digested in L-15 medium containing 3 mg/mL collagenase type I at 37 °C for 1 h, followed by digestion with 0.025 mg/mL DNase I at 37 °C for 5 min. After erythrocyte lysis using Red Blood Cell Lysing Buffer (Sigma), the cell suspension was plated into a 10-cm dish in 10 mL DMEM plus 10% FBS. Each lung and brain was placed in 1 mL PBS on a 40-μm sieve in a 6-cm plate, mechanically dissociated by pushing through the sieve, and cultured in 2 mL DMEM plus 10% FBS in a 6-cm dish. Once 4T1 colonies derived from primary tumours, brains, and lungs were visible, cells were passaged 3–4 times before RNA was extracted from individual sublines for gene expression profiling.
For experimental metastasis assays, 6- to 8-week-old female BALB/c or NOD SCID gamma (NSG) mice were inoculated with 4T1-Luc or MDA-MB-231 cells. For intracranial inoculation, mice were anaesthetised with isoflurane, and 1 × 10⁴ 4T1-Luc cells in 5 μL PBS were injected into the brain at a rate of 2.5 μL/min using a stereotaxic frame with predefined coordinates relative to bregma (x = −2 mm, y = 1 mm, z = −2 mm). At post-mortem, brains were imaged ex vivo using an in vivo imaging system (IVIS), fixed in 4% paraformaldehyde for 24 h, and paraffin embedded. For intracardiac inoculation, mice were anaesthetised with isoflurane, and 5 × 10⁴ 4T1 cells (BALB/c mice) or 3 × 10⁵ MDA-MB-231 cells (NSG mice) in 100 μL PBS were injected into the left ventricle of the heart. At the end of the experiment, tissues were imaged ex vivo by IVIS, fixed in 4% paraformaldehyde for 24 h, and either paraffin embedded or frozen.
For RNA expression analysis of freshly isolated cells, 4T1-Luc-RFP cells were inoculated subcutaneously (5 × 10⁵ cells), intravenously via the lateral tail vein (1 × 10⁵ cells), or intracranially as described above (1 × 10⁴ cells). Primary tumours, lungs, and brains were collected 9–13 days later. Primary tumours were dissociated using the MACS mouse tumour dissociation kit (Miltenyi Biotec), and lungs and brains using the MACS lung dissociation kit. RFP-positive 4T1-Luc cells were FACS sorted directly into RLT lysis buffer (Qiagen) for RNA extraction.
For fluorescence imaging of brain sections, whole brains fixed in 4% paraformaldehyde were submerged in 30% sucrose in PBS at 4 °C, embedded in OCT, and frozen in dry ice plus isopentane. Frozen brains were cryostat sectioned at 20-μm intervals. To image mCherry-positive cells, sections were thawed, washed in PBS, stained with DAPI, mounted, and scanned using the Vectra 3.0 automated quantitative pathology imaging system (PerkinElmer).
For histological and immunohistochemical analysis, formalin-fixed paraffin-embedded (FFPE) brain sections were stained with haematoxylin and eosin (H&E) or with antibodies and scanned on a NanoZoomer digital slide scanner (Hamamatsu). Tumour burden was quantified using ImageJ on a coronal section taken at the median level of each brain.
Gene expression profiling
RNA extracted (RNeasy Mini kit) from independently isolated 4T1 sublines derived from primary tumours (T, n = 3), brain metastases (B, n = 4), and lung metastases (L, n = 3) was analysed on Mouse WG-6 v2.0 expression BeadChips (Illumina, San Diego, CA, USA). RNA amplification, labelling, and hybridisation were performed at Cambridge Genomic Services according to the manufacturer’s instructions. Raw data were extracted using GenomeStudio software and processed in R using the Bioconductor lumi package (http://www.bioconductor.org). In brief, data were: 1) filtered to remove non-expressed probes (detection p > 0.01) across the samples involved in a given group comparison; 2) transformed using the variance-stabilising transformation; and 3) normalised using the robust spline normalisation method.
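The probe filter in step 1 can be sketched in Python as follows. This is an illustration only (the actual step was run with lumi in R), and it assumes that a probe counts as non-expressed when its detection p value exceeds 0.01 in every sample of the comparison, so a probe detected in at least one sample is kept; a stricter rule requiring detection in all samples would change the `any` to `all`. The data layout and the name `filter_expressed_probes` are hypothetical.

```python
from typing import Dict, List

def filter_expressed_probes(
    detection_p: Dict[str, List[float]],
    samples: List[int],
    threshold: float = 0.01,
) -> List[str]:
    """Return probe IDs detected (p <= threshold) in at least one of the given samples.

    detection_p maps probe ID -> per-sample detection p values (fixed column order).
    samples lists the column indices belonging to the current group comparison.
    """
    kept = []
    for probe, pvals in detection_p.items():
        # Assumed rule: drop a probe only if every sample in the comparison fails detection
        if any(pvals[i] <= threshold for i in samples):
            kept.append(probe)
    return kept

# Hypothetical detection p values for three probes across four arrays
det = {
    "probe_A": [0.001, 0.002, 0.30, 0.40],
    "probe_B": [0.20, 0.50, 0.002, 0.30],
    "probe_C": [0.05, 0.80, 0.60, 0.90],
}
print(filter_expressed_probes(det, [0, 1, 2, 3]))  # ['probe_A', 'probe_B']
print(filter_expressed_probes(det, [0, 1]))        # ['probe_A']
```

Filtering per comparison, as described above, means the same probe can be retained in one comparison and dropped in another.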
Relationships between samples were estimated by unsupervised hierarchical clustering (Euclidean distance, average linkage) based on 17,550 probes. Two-sample t tests (with a random variance model) were performed in BRB-ArrayTools (https://brb.nci.nih.gov/BRB-ArrayTools) to identify genes differentially expressed between 1) L and T, 2) B and L, and 3) B and T sublines, using a threshold of parametric p < 0.001. When multiple probes mapped to the same gene, the probe with the highest variability across samples, measured by interquartile range (IQR), was selected to represent the gene. Gene expression data have been deposited in GEO under accession number GSE110101.
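The probe-to-gene collapsing rule described above (keep the probe with the largest IQR across samples) is sketched below in Python. The function names, data layout, and example values are hypothetical, and the quantile convention (`method="inclusive"`, which corresponds to R's default type 7) is an assumption, since the section does not state how IQR was computed.

```python
from statistics import quantiles
from typing import Dict, List, Tuple

def iqr(values: List[float]) -> float:
    """Interquartile range; method="inclusive" matches R's default quantile type 7."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1

def collapse_to_genes(
    expr: Dict[str, List[float]], probe_to_gene: Dict[str, str]
) -> Dict[str, str]:
    """Map each gene to the probe with the largest IQR across samples."""
    best: Dict[str, Tuple[str, float]] = {}
    for probe, values in expr.items():
        gene = probe_to_gene.get(probe)
        if gene is None:  # unannotated probes are dropped
            continue
        score = iqr(values)
        if gene not in best or score > best[gene][1]:
            best[gene] = (probe, score)
    return {gene: probe for gene, (probe, _) in best.items()}

# Hypothetical log2 expression values for three probes across four sublines
expr = {
    "probe_1": [5.0, 6.0, 7.0, 8.0],  # IQR 1.5
    "probe_2": [5.0, 5.1, 5.2, 5.3],  # IQR about 0.15
    "probe_3": [2.0, 9.0, 2.5, 8.5],  # IQR 6.25
}
probe_to_gene = {"probe_1": "GeneA", "probe_2": "GeneA", "probe_3": "GeneB"}
print(collapse_to_genes(expr, probe_to_gene))  # {'GeneA': 'probe_1', 'GeneB': 'probe_3'}
```

Choosing the most variable probe favours the probe that best separates samples, which suits downstream differential expression and clustering.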
RNA from cultured cells or whole mouse tissue was extracted using the RNeasy Mini kit, and RNA from freshly isolated tumour cells using the RNeasy Plus Micro kit, according to the manufacturer’s instructions. RNA was eluted in 10–30 μL nuclease-free water, and its concentration was measured in a 1-μL sample using a Qubit 2.0 Fluorometer (Invitrogen) or an ND-1000 spectrophotometer (NanoDrop). cDNA was synthesised by reverse transcribing 150–500 ng RNA using the QuantiTect reverse transcription kit (Qiagen) or the SuperScript IV First-Strand Synthesis System (Invitrogen) according to the manufacturer’s instructions. qPCR was performed on 11.25 ng cDNA (4.5 μL) with 0.5 μL TaqMan Gene Expression Assay probe and 5 μL 2× qPCR Master Mix per well. Relative quantification was performed using QuantStudio Real-Time PCR software or on an ABI Prism 7900HT sequence detection system, with each reaction run in triplicate. Data were analysed using QuantStudio Real-Time PCR or SDS 2.2.1 software, and relative expression levels were normalised, unless otherwise stated, to the endogenous control B2m/B2M or Gapdh, with a 95% confidence interval for all assays.
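Normalisation to an endogenous control such as B2m or Gapdh is commonly done with the comparative Ct (2^−ΔΔCt) method. The section does not name the calculation used by the QuantStudio or SDS software, so the sketch below is an assumption for illustration: it averages triplicate Ct values, normalises the target to the reference gene, compares against a hypothetical calibrator sample, and assumes about 100% amplification efficiency for both assays. All names and Ct values are hypothetical.

```python
from statistics import mean
from typing import Sequence

def fold_change_ddct(
    target_ct: Sequence[float],
    reference_ct: Sequence[float],
    calibrator_target_ct: Sequence[float],
    calibrator_reference_ct: Sequence[float],
) -> float:
    """Relative expression by the comparative Ct (2^-ddCt) method.

    Triplicate Ct values are averaged; the reference gene (e.g. B2m or Gapdh)
    corrects for input, and the calibrator sample sets the baseline of 1.0.
    Assumes ~100% amplification efficiency for target and reference assays.
    """
    d_ct_sample = mean(target_ct) - mean(reference_ct)
    d_ct_calibrator = mean(calibrator_target_ct) - mean(calibrator_reference_ct)
    dd_ct = d_ct_sample - d_ct_calibrator
    return 2.0 ** (-dd_ct)

# Hypothetical triplicate Ct values (sample vs. a control calibrator)
fc = fold_change_ddct(
    target_ct=[24.0, 24.25, 23.75],
    reference_ct=[18.0, 18.25, 17.75],
    calibrator_target_ct=[26.0, 26.5, 25.5],
    calibrator_reference_ct=[18.0, 18.0, 18.0],
)
print(fc)  # 4.0
```

Here ΔCt is 6 for the sample and 8 for the calibrator, so ΔΔCt = −2 and the target is about 4-fold higher in the sample.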
Cell based assays
For spheroid growth assays, 7.5 × 10² cells/well were sorted into U-bottom low-adherence 96-well plates (Corning) in DMEM containing 2% FBS. At 7 days post-seeding, the viability of the cells in the three-dimensional tumour spheroids was assessed using CellTiter-Glo (Promega), with luminescence quantified on a Victor X5 plate reader.
For the anoikis assay, 5 × 10⁴ cells/well were seeded into low-adherence six-well plates (Costar) in DMEM containing 2% FBS. At 24 h post-seeding, cells were stained using the Annexin V-APC/PI Apoptosis Detection Kit (eBioscience) and analysed on a BD Biosciences LSRII flow cytometer with FACSDiva and FlowJo software. Cell viability was measured as the proportion of healthy (Annexin V-negative, PI-negative) cells.
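The viability readout defined above is the fraction of gated events in the Annexin V-negative, PI-negative quadrant. A minimal Python sketch of that calculation is shown below; the function name and quadrant counts are hypothetical, and gating itself (done in FACSDiva/FlowJo) is not reproduced.

```python
def percent_viable(
    annexin_neg_pi_neg: int,
    annexin_pos_pi_neg: int,
    annexin_pos_pi_pos: int,
    annexin_neg_pi_pos: int,
) -> float:
    """Percentage of healthy (Annexin V-negative, PI-negative) events among all gated cells."""
    total = annexin_neg_pi_neg + annexin_pos_pi_neg + annexin_pos_pi_pos + annexin_neg_pi_pos
    if total == 0:
        raise ValueError("no events in the gate")
    return 100.0 * annexin_neg_pi_neg / total

# Hypothetical quadrant counts from 10,000 gated single cells
print(percent_viable(7200, 1500, 1000, 300))  # 72.0
```

Early apoptotic (Annexin V+ PI−), late apoptotic (Annexin V+ PI+), and necrotic (Annexin V− PI+) events all count against viability in this definition.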
Human and mouse datasets
The expression levels of ID2 and ALDH3A1 and their relation to oestrogen receptor (ER), progesterone receptor (PR), and HER2 status were assessed in breast cancer samples from The Cancer Genome Atlas (TCGA). BMP7 expression in non-tumour-bearing mice was assessed in 1) brain astrocytes, neurons, and microglia using the RNA-seq dataset of Srinivasan et al., and 2) brain microglia and astrocytes using the microarray dataset of Kamphuis et al. The clinical significance (distant metastasis-free survival) of ID2 and ALDH3A1 expression in ER– breast cancers was assessed using publicly available data from Gyorffy et al. Associations of ID2 and ALDH3A1 mRNA levels with brain metastasis were tested in four breast cancer datasets (GSE2034, GSE2603, GSE12276, and GSE14020), which were normalised by MAS5.0, log2 transformed, and batch corrected. The tumour subtype information was published in a previous study. Together, the datasets contained 104 patients with ER– breast cancer who had either no metastatic relapse (n = 71) or brain-only metastatic relapse (n = 33). Brain metastasis relapse-free survival analysis was performed by dichotomising the breast cancers at the upper tertile of gene expression.
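The dichotomisation and survival analysis described above can be sketched in Python as follows: tumours above the 2/3 quantile of expression form the "high" group, and a Kaplan-Meier curve is estimated for each group. This is an illustration under stated assumptions, not the authors' pipeline: the quantile convention (`method="inclusive"`), the function names, and the cohort values are hypothetical, and no log-rank test is shown.

```python
from statistics import quantiles
from typing import List, Sequence, Tuple

def upper_tertile_groups(expr: Sequence[float]) -> List[bool]:
    """True for tumours whose expression lies above the 2/3 quantile (upper tertile)."""
    cutoff = quantiles(expr, n=3, method="inclusive")[1]
    return [x > cutoff for x in expr]

def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate as (time, survival) at each distinct event time.

    events: 1 = brain relapse observed, 0 = censored (no relapse by last follow-up).
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        relapses = removed = 0
        # Pool all patients sharing this time point (ties)
        while i < len(data) and data[i][0] == t:
            relapses += data[i][1]
            removed += 1
            i += 1
        if relapses:
            survival *= 1.0 - relapses / at_risk
            curve.append((t, survival))
        at_risk -= removed  # both relapsed and censored patients leave the risk set
    return curve

# Hypothetical cohort: log2 expression, follow-up (months), brain relapse indicator
expr = [2.1, 8.4, 5.0, 9.3, 3.3, 7.7]
times = [60, 12, 80, 30, 45, 30]
events = [0, 1, 0, 1, 1, 0]
high = upper_tertile_groups(expr)
for label, flag in (("high", True), ("low", False)):
    idx = [k for k, h in enumerate(high) if h == flag]
    print(label, kaplan_meier([times[k] for k in idx], [events[k] for k in idx]))
```

Using the upper tertile rather than the median places roughly one third of tumours in the "high" group, which targets the most strongly expressing cancers.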
To assess ID2 expression in primary breast cancers and breast cancer brain metastases, the datasets described by Schulten et al. (GSE100534) and Harrell et al. (GSE26338) were retrieved. GSE26338 contains data generated on seven different platforms; samples run on the GPL5325 platform were excluded because they were enriched for metaplastic breast cancers.
Statistical analyses were performed using GraphPad Prism 7. Unless stated otherwise, data represent mean ± standard error of the mean (SEM). Where groups were normally distributed with equal variances, differences were tested using an unpaired Student’s t test (two groups) or one-way analysis of variance (ANOVA; multiple groups) followed by Bonferroni post-hoc testing to correct for multiple comparisons. If groups were not normally distributed, the non-parametric Mann-Whitney (two groups) or Kruskal-Wallis (multiple groups) test was used. Statistical significance was defined as *p < 0.05, **p < 0.01, and ***p < 0.001.
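The Bonferroni correction and the significance thresholds above can be illustrated with the short Python sketch below. It shows only the generic Bonferroni adjustment (each p value multiplied by the number of comparisons, capped at 1); Prism's post-hoc test after ANOVA additionally uses the pooled ANOVA error term, which is not reproduced here. The function names, the "ns" label, and the example p values are hypothetical.

```python
from typing import List, Sequence

def bonferroni(pvalues: Sequence[float]) -> List[float]:
    """Generic Bonferroni adjustment: p * number of comparisons, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]

def significance_stars(p: float) -> str:
    """Map a p value to the asterisk thresholds defined in this section."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "ns"

# Hypothetical raw p values from three pairwise comparisons
adjusted = bonferroni([0.0002, 0.004, 0.03])
print([significance_stars(p) for p in adjusted])  # ['***', '*', 'ns']
```

With three comparisons, a raw p of 0.03 becomes 0.09 after adjustment and is no longer significant at the 0.05 threshold.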