Analysis of EpCAM positive cells isolated from sentinel lymph nodes of breast cancer patients identifies subpopulations of cells with distinct transcription profiles

Introduction The presence of tumor cells in the axillary lymph nodes is the most important prognostic factor in early stage breast cancer. However, the optimal method for sentinel lymph node (SLN) examination is still sought and currently many different protocols are employed. To examine two approaches for tumor cell detection we performed, in sequence, immunomagnetic enrichment and RT-PCR analysis on SLN samples from early stage breast cancer patients. This allowed us to compare findings based on the expression of cell surface proteins with those based on detection of intracellular transcripts. Methods Enrichment of EpCAM and Mucin 1 expressing cells from fresh SLN samples was achieved using magnetic beads coated with the appropriate antibodies. All resulting cell fractions were analyzed by RT-PCR using four chosen breast epithelial markers (hMAM, AGR2, SBEM, TFF1). Gene expression was further analyzed using RT-PCR arrays and markers for epithelial to mesenchymal transition (EMT). Results Both EpCAM and Mucin 1 enriched for the epithelial-marker expressing cells. However, EpCAM-IMS identified epithelial cells in 71 SLNs, whereas only 35 samples were positive with RT-PCR targeting breast epithelial transcripts. Further analysis of EpCAM positive but RT-PCR negative cell fractions showed that they had increased expression of MMPs, repressors of E-cadherin, SPARC and vimentin, all transcripts associated with the process of epithelial to mesenchymal transition. Conclusions The EpCAM IMS-assay detected tumor cells with epithelial and mesenchymal-like characteristics, thus proving to be a more robust marker than pure epithelial derived biomarkers. This finding has clinical implications, as most methods for SLN analysis today rely on the detection of epithelial transcripts or proteins.


Introduction
The presence of metastatic deposits in the axillary lymph nodes is the most powerful predictor of survival in early stage breast cancer patients [1][2][3]. The sentinel lymph node (SLN) is defined as the first node or group of nodes receiving lymph from a tumor area, and the status of the SLN has been shown to reflect the presence of metastases in the axillary lymph nodes [4,5]. Reliable detection of micrometastatic cells in the SLN is, therefore, a subject of great clinical interest, and several different protocols aimed at identifying breast epithelial cells within the lymphatic basin are currently in use. Metastatic cells may exist in low concentrations, making their identification and isolation a difficult task. Studies have shown that extensive re-examination of presumably negative nodes will identify more positive specimens, but techniques using tissue sections are labor intensive if more detailed analysis is warranted [6][7][8][9]. However, several recent papers have concluded that even small cell deposits may be of clinical relevance, suggesting that a comprehensive examination would be worthwhile [10][11][12][13].
Our laboratory has for many years worked with immunomagnetic selection (IMS) using paramagnetic beads coated with antibodies against cell surface proteins for positive selection of tumor cells from cell suspensions [14][15][16]. The method is fast, sensitive and allows further molecular characterization of isolated live cells [17]. The choice of antibodies is decisive for the impact of the IMS method, as only cells expressing the targeted cell surface proteins will be captured by the magnetic beads.
For identification of epithelial-derived cells the epithelial cell adhesion molecule EpCAM is a commonly used target [18]. EpCAM is a transmembrane glycoprotein expressed by both normal and malignant cells of epithelial origin (for reviews see [19,20]), but over-expressed in many carcinomas. In a recent study, EpCAM was shown to be overexpressed on all breast cancer metastases relative to the matched primary tumor [21]. Mucin 1 (MUC1), a membrane bound glycosylated phosphoprotein predominantly expressed by epithelial cells, is suggested to be a marker for detection of breast-cancer cells not expressing EpCAM [22]. Mucin 1 is over-expressed in several human malignancies, especially adenocarcinomas (for a review see [23]).
Metastatic tumor cells may also be identified by RT-PCR which relies on the detection of intracellular gene transcripts. By carefully selecting genes expressed by the target tumor cells, but absent from the normal stroma, this method may allow very sensitive detection of small metastatic deposits.
The aim of our study was to analyze fresh SLN samples from early stage breast cancer patients using in sequence, IMS and RT-PCR techniques for identification of tumor cells [24]. IMS with anti-EpCAM and anti-Mucin 1 antibodies were used in parallel on disaggregated tissue from SLN to enrich for cells expressing these epithelial proteins. The IMS isolated cell fractions were then analyzed by RT-PCR targeting four epithelial cell associated transcripts; hMAM, AGR2, SBEM and TFF1. This allowed us to compare findings based on the expression of external cellsurface proteins (IMS) with those based on intracellular transcripts (RT-PCR), and to elucidate the molecular heterogeneity among the IMS positive cell populations.
Our initial results showed that all RT-PCR positive cells were also EpCAM positive, indicating that the IMS procedure selected an epithelial cell population. However, only approximately 50% of all IMS-positive SLN samples expressed any of the four epithelial transcripts targeted by our RT-PCR assay. We then speculated that the EpCAM positive but epithelial transcript negative cells isolated from the SLNs could be tumor derived cells passing through an epithelial to mesenchymal transition (EMT). To investigate this we analyzed more IMS positive SLN samples using transcripts associated with EMT. This revealed that EpCAM+ and mammaglobin positive (hMAM+) cells represented a cell population also expressing other typical epithelial transcripts. In the EpCAM+ and hMAM negative cells these epithelial transcripts were lost or markedly reduced, while several mesenchymal markers were increased. The observed changes in gene expression were consistent with cells passing through EMT, indicating that this process could be important in the lymphatic dissemination of breast cancer.

Patients and samples
The sentinel nodes used in this study were collected from January 2008 to March 2010. Patients with primary breast cancer and clinically negative axillary lymph nodes underwent sentinel lymph node biopsy (SLNB) and surgery at Ullevaal University Hospital. The majority of patients had T1 (≤2 cm) or T2 (> 2 cm ≤5 cm) tumors. In total, 120 SLN samples from as many patients have been analyzed in this study. The SLNB procedure was performed according to the guidelines of the Norwegian Breast Cancer Group [25]. Informed and written consent was obtained from all patients and the project was approved by the regional ethics committee. Sentinel lymph nodes from each of the patients were bisected along the longitudinal axis and one half was used for routine histopathological assessment. The other half was used for IMS and RT-PCR analysis.

Immunomagnetic selection method Antibodies and magnetic beads
The MOC31 (anti-EpCAM) monoclonal antibody (IQ Products, Groningen, The Netherlands) [26], and BM-7 (from Dr. S. Kaul, Heidelberg, Germany), reacting with the core protein of Mucin 1, were used in parallel for the detection and characterization of tumor cells in the disaggregated cell suspensions prepared from the sentinel nodes. Immunomagnetic M450 Dynabeads (Invitrogen, Oslo, Norway) with a diameter of 4.5 μm and coated with rat anti-mouse antibodies, were coated with the MOC31 or BM-7 antibodies as previously described [27].

Disaggregation of the sentinel lymph node tissue
The tissue from one half of the lymph node was placed in a petri dish containing 2 to 5 ml of PBS/1%HSA for disaggregation by the use of scalpels. A 70 μm cell strainer (BD Biosciences, Franklin Lakes, NJ, USA) was placed in a 50 ml tube and the fragmented tissue was filtered through the cell strainer into the tube. The cell strainer was washed with PBS/1%HSA. The filtered cell suspension was concentrated by centrifugation at 200 g for five minutes and the cells subsequently resuspended in 5 ml PBS/1%HSA and transferred to round bottom tubes for incubation with antibody-coated beads.

Immunomagnetic detection
Each sample was subjected to two different bead-antibody combinations, containing either MOC31 or BM-7coated immunomagnetic beads, while uncoated beads were used for control. Beads were added to the cells in a ratio of 1:2, and the mixture was incubated under constant rotation for 30 minutes at 4°C in a volume of 1 ml. After incubation, the cells were diluted 1:3 with PBS/1%HSA and the tubes were placed in a magnet holder for selection of the cell-immunobead rosettes. Cells expressing the target antigen were trapped in the tube by the magnet, whereas the supernatant, containing unbound cells, was decanted and saved for RNA isolation. The enriched fraction, containing the viable rosetted cells, was resuspended in 200 μl PBS/1%HSA and placed on ice. A 20 μl sample of the positively enriched fraction was pipetted onto a slide and evaluated by microscopy using a Zeiss Axioscope (Carl Zeiss, Jena, Germany).

Criteria for positivity
A sample was defined as positive when it contained at least two cells, each with ≥ 5 immunomagnetic beads bound to its surface, simultaneously evaluating the size, morphology and three-dimensional picture seen in the microscope.

RT-PCR analysis RNA and cDNA synthesis
All RNA extractions were performed using Trizol Reagent (Invitrogen Life Science, Carlsbad, CA, USA). RNA concentration was routinely assessed on the NanoDrop 1000 instrument (Thermo Fisher Scientific, Waltham, MA, USA). Generally 1 μg of RNA was reverse transcribed using the qScript cDNA synthesis kit (Quanta BioSciences Inc., Gaithersburg, MD, USA) in a volume of 20 μl, then diluted to 50 μl using dH 2 O. The positive enriched fractions contain few cells and thus have low RNA counts. RNA from these fractions was not measured, and all available RNA was loaded into the 20 μl RT-synthesis reaction and diluted to 30 μl before PCR.

Positive controls for RT-PCR
Mononuclear cells isolated from fresh bone marrow of a healthy donor were spiked with either SKBR3 or T47d cells in known ratios (1 SkBr3/10 4 MNC and 1 T47d/10 3 MNC). RNA was isolated using Trizol Reagent, dissolved in RNA storage solution (AM7000, Ambion/Applied Biosystems, Austin, TX, USA) and kept as frozen aliquots at -80°C. The final RNA preparation contained the equivalent of 100 SkBr3 cells/ug and 500 T47d cells/ug, and based on this each positive control PCR contained the equivalent of either 10 SkBr3 cells or 50 T47d cells. The T47d control was a positive control for AGR2 and TFF1 while the SkBR3 control was used for hMAM and SBEM. Controls were included on all PCR plates for quality assessment, and for relative quantification purposes. We used spiked samples and not pure cell lines to closer mimic the actual clinical samples.

Real-time PCR
Real-time PCR reactions were performed on the iCycler instrument from BioRad (Hercules, CA, USA). All reactions were run in parallel, and all samples were in 25 μl volume. Each primer mix contained 200 nM FAM-labeled probe, 300 nM of each primer and 1x Perfecta qPCR Supermix (Quanta BioSciences Inc.). Expression of YARS, a t-RNA synthetase, was used for sample validation and normalization of expression. The reference gene (YARS) had been tested in a panel of SLN samples and found to have equal expression per ng of cDNA. All primers have been validated using appropriate controls and negative and positive controls for all targets were always included in all PCR runs. Primers are designed using the probe finder software from Roche Applied Science available online at the Universal ProbeLibrary Assay Design Center [28], and all probes are from the Universal ProbeLibrary collection (Roche Applied Science). Primer sequences are listed in Additional file 1.

Criteria for positivity
The four markers hMAM, AGR2, TFF1 and SBEM were tested and found completely negative in normal bone marrow; therefore, no cut-offs were used on the cycle threshold (Ct) values. Only samples which showed positive signals from the reference gene YARS in both parallels were considered successfully analyzed. To count as a positive sample at least one other marker (either hMAM; AGR2, TFF1 or SBEM) in at least one cell fraction would have to be positive in both parallels.

EMT RT-PCR arrays
Since our RT-PCR results showed no differences between EpCAM-positive and Mucin 1 positive cells, we only used EpCAM positive and EpCAM depleted cell fractions in the array tests. RNA from EpCAM positive SLN fractions were first analyzed for YARS and hMAM expression using our own primers. All RNA was DNase-treated (Ambion, Austin, TX, USA). Samples denoted by numbers (for example, 540+ and 540-) represent EpCAM positive and depleted cells from single SLN samples. These were analyzed using approximately 500 ng DNase treated RNA per array. In order to obtain a higher amount DNase treated RNA before cDNA synthesis, some samples were pooled according to hMAM expression (details of samples are presented in the supplementary data). These are denoted by letters, for example, G+, L+, A+, Q+, P+, K-and C-on the arrays, and were analyzed using 2.5 μg DNase treated RNA per array. Both quantities are within the range recommended by the manufacturer. The cDNA was mixed with Perfecta SYBRGreen supermix (Quanta BioSciences Inc.) and applied on to ready bought 96-well plates containing primers for EMT-associated genes (SABiosciences, Frederick, MD, USA). The samples were run on the iCycler instrument from BioRad using the recommended protocol.

Data analysis
All expression data were analyzed using the software GenEx (MultiD analyses AB, Gothenburg, Sweden). The software includes an application (Normfinder), which determines the best reference gene(s) for normalization of expression data. The arrays contained five possible reference genes, ACTB, GAPDH, B2M, HPRT1 or RPL13A, and Normfinder determined ACTB and RPL13A to be the best references in our dataset. We also tried alternative normalization strategies, but this did not alter the results. The relative expression of target genes was calculated by comparing all samples against the average expression of each gene (Mean log expression = 1). This is an application included in the GenEx software.

Statistical analysis
Statistical analysis and hierarchical clustering was also performed using the GenEx software. Two-tailed unpaired ttests were used to compare array results obtained from all EpCAM+ fractions, EpCAM+/hMAM+ or EpCAM +/hMAM-fractions to results obtained from EpCAM depleted cell fractions. When genes failed the Kolomogorov-Smirnov test (used to determine if samples show a normal distribution) a non-parametric Mann-Whitney test was used to calculate P-values. A P-value ≤0.05 was considered statistically significant.

Validation of array-results
Real-time RT-PCR for EpCAM, EGFR, Wnt5b, SPARC, Col1A2, MMP2, ERRB2, hMAM and YARS were performed as previously described for other RT-PCR targets, but due to low amount of template all reactions were run as singles. YARS was used for normalization and fold change was calculated relative to the average expression of each gene. All calculations and statistics were performed using the GenEx software. Primer sequences are listed in Additional file 1

hMAM, AGR2, TFF1 and SBEM were selected for RT-PCR studies
For detection of breast cancer cells in lymph nodes we searched the literature for transcripts reported to be either breast or epithelial associated. We examined the expression of several potential markers in a panel of 18 normal bone marrow samples, 6 breast cancer cell lines and 8 cell lines of other human malignancies. The four genes hMAM (mammaglobin, MGB1, SCGB2A2) [29], AGR2 (anterior gradient homolog 2) [30], TFF1 (trefoil factor 1, pS2) [31] and SBEM (small breast epithelial mucin) [32] were all found negative in bone marrow, indicating their absence in normal hematopoietic tissue. hMAM proved to be the most breast specific marker, although its expression was restricted to only four of six breast cancer cell lines. TFF1, AGR2 and SBEM were expressed to a varying degree in all breast cancer cell lines, but also in other malignancies ( Table 1).

Comparison of IMS and RT-PCR results
To elucidate the sensitivity and specificity of the IMS method we performed, in sequence, IMS screening and RT-PCR analysis of clinical SLN samples. The IMS assay employed two antibodies against cell surface markers, MOC31 (anti-EpCAM) and BM-7 (anti-Mucin 1) for cell separation. Both immunomagnetically selected cells and their corresponding depleted fractions, that is, the remaining cell population after positive selection, were collected and subjected to RT-PCR using the four selected transcripts as molecular markers. The control fraction, corresponding to the original, non-separated cell population, was also analyzed. A schematic view of the IMS and RT-PCR procedure is presented in Figure 1.
Each lymph node sample gave a maximum of five cell fractions for RT-PCR, and from a total of 71 patients we successfully obtained both IMS and RT-PCR data.
All 71 SLNs were IMS positive with the MOC31 antibody, and 53 of these were also positive with BM-7. To check for associations between IMS and RT-PCR results, we compared the RT-PCR findings with the IMS data ( Table 2).
Expression of the four epithelial transcripts hMAM, AGR2, TFF1 and SBEM was evaluated in selected cell lines and healthy bone marrow by RT-PCR. We then compared the number of IMS positive cells (based on microscopy inspection of 1/10 of the sample) detected in each sample to the RT-PCR results ( Table 3) To further investigate this we extended the RT-PCR studies by the use of pathway-focused PCR-array plates, and tested for transcripts associated with the process of epithelial to mesenchymal transition. EMT is naturally occurring during embryonic development, and is believed to be reactivated in carcinoma metastasis [33][34][35][36]. Epithelial cells undergoing EMT down regulate typical epithelial markers and become mesenchymal-appearing cells with an increased capability for motility and invasion of surrounding tissues. If a similar process occurs in breast cancer cells that migrate to lymph nodes this would have affected the expression of epithelial transcripts including those we used in our RT-PCR assay.
A total of 17 IMS isolated samples representing EpCAM positive or EpCAM depleted cell fractions from SLNs were successfully analyzed on 96-well PCR-array plates. Prior to array analysis we scored all samples for hMAM expression. Information regarding the number of bead-rosetted cells and hMAM status of each sample is presented in the Additional file 2. To further examine The RT-PCR results (positive/negative) are shown as number (%) of the total samples that were positive or negative with each of the antibodies used in the immunomagnetic selection (IMS). Table 3   how the patient samples differed based on expression of the EMT associated genes we first performed an unsupervised hierarchical clustering analysis of the normalized expression data ( Figure 3). This divided all samples into two major groups, the EpCAM positive cells (10 arrays) and the EpCAM depleted cells (7 arrays). The group consisting of EpCAM positive cells was further split into two clusters, one containing the EpCAM+/hMAM-cells (five arrays) and one containing the EpCAM+/hMAM+ cells (five arrays). We then performed statistical t-tests on array results obtained from the different groups, that is, we compared EpCAM+/hMAM+ (5 arrays) vs EpCAM+/hMAM-(5 arrays), EpCAM+ (10 arrays) vs EpCAM depleted cells (7 arrays), and EpCAM+/hMAM-(5 arrays) vs EpCAM depleted (7 arrays). The low number of samples present in each group reduces the power of statistical testing, but the resulting gene lists are useful for evaluating trends in the data. All results from the t-tests are available in Additional file 2.

Epithelial transcripts are associated with hMAM expressing cells
When comparing EpCAM+/hMAM+ cells with EpCAM +/hMAM-cells the array results revealed that cells expressing mammaglobin also expressed high levels of other known epithelial markers such as E-cadherin (CDH1), epithelial cytokeratins (KRT19 and KRT7), and desmoplakin (DSP). This confirmed the epithelial nature of these EpCAM+/hMAM+ cells, and was in agreement with our initial findings from the RT-PCR-assay where the use of four epithelial markers gave little additional information compared to the use of hMAM alone. All epithelial transcripts seem to be expressed by the same population of EpCAM+ cells. The estrogen receptor (ESR1) and ERBB3 (a member of the ErbB family) were also significantly higher expressed in EpCAM+/hMAM+ compared with EpCAM+/hMAM-cells (Figure 4a). In contrast, the EpCAM+/hMAM-cells showed a marked decrease in E-cadherin expression, and no expression of KRT7 or KRT19, but an increased expression of the transcription factor Snai2 (Slug), a known repressor of E-cadherin, the mesenchymal filament vimentin and Wnt 5B, a component of the Wnt signaling pathway (Figure 4b).
EMT is characterized by a down-regulation in epithelial markers, particularly E-cadherin, and an up-regulation of mesenchymal markers, particularly vimentin. These findings are thus consistent with the EpCAM+/hMAM-cells being in an EMT-state.

Transcripts associated with EpCAM enrichment
A number of targets present on the EMT array showed significantly increased expression in all EpCAM enriched cell populations compared with the EpCAM depleted cell fractions. This applied to SPARC, an indirect repressor of E-cadherin implicated in cellular invasion and motility (for a recent review see [37]), the transcription factor MITF, and the epidermal growth factor EGFR. EGFR belongs to the ErbB family of receptor tyrosine kinases (which also includes ERBB2 and ERBB3) and members of this family have been implicated both in EMT induction, and in tumor progression. Members of the family of metalloproteinases (MMP2/3/9), which are matrix degrading enzymes, collagens (Col1A2, Col3A1) and desmocollin (DSC2), a protein important for the formation of desmosomes and primarily found in epithelial cells, were also enriched for in all EpCAM positive cell fractions ( Figure 5).
That so many genes were co-expressed with EpCAM, regardless of hMAM status, confirms that the EpCAM+ cells represented a distinct cell population, different from the EpCAM depleted cells.

Validation of array-results
To validate the results from the RT-PCR array analysis we selected some genes identified as "EpCAM associated" and designed our own RT-PCR assays specific for EGFR, Wnt5b, SPARC, MMP2, and Col1A2. RT-PCR primers for EpCAM, ERBB2, and hMAM were also included in this panel. The selected assays were used to analyze 17 new SLN samples, all sorted by IMS into EpCAM enriched and EpCAM depleted cell populations. All the selected transcripts showed significantly higher expression (P ≤0.05) in the EpCAM enriched cell populations compared with the EpCAM depleted cells, thereby confirming the array-data. This also applied to EpCAM itself, as measured by RT-PCR. However, EpCAM expression was markedly higher in the eight samples also positive for mammaglobin (EpCAM +/hMAM+) than in the nine samples not expressing hMAM (Figure 6). In a t-test, EpCAM was the gene with the single largest mean difference between groups, both when all EpCAM+ and EpCAM depleted cells were compared (difference 9.28), and also when comparing EpCAM+/hMAM+ (difference 10.78) or EpCAM+/hMAM-(difference 7.94) to the EpCAM depleted cells. This confirms that the IMS-enrichment using MOC31 targets EpCAM-RNA expressing cells. EGFR, SPARC and Wnt5B were the genes showing the largest difference between EpCAM+ and EpCAM depleted cells. SPARC was slightly higher in the EpCAM+/hMAMcells, while Wnt5B and EGFR were evenly expressed by all EpCAM+ cell populations (Figure 7).

Discussion
In early stage breast cancer it is well established that the metastatic status of the SLN is an important prognostic parameter, but an optimal method of SLN analysis is still not available. In this study we sought to compare two sensitive methods that both seek to identify tumor cells in sentinel lymph nodes, immunomagnetic selection (IMS) using antibodies targeting the epithelial surface proteins EpCAM and Mucin 1, and RT-PCR targeting breast-epithelial transcripts. The objective was to examine to what extent the results obtained with the different methods did overlap and if one method was better than the other. Our results showed that all RT-PCR positive cells were EpCAM positive, but the EpCAM-IMS assay identified twice as many positive specimens as RT-PCR targeting epithelial transcripts.
When we investigated gene expression using other RT-PCR markers we found expression profiles consistent with the EpCAM positive but epithelial transcript negative cells (EpCAM+/hMAM-) being tumor derived cells passing through an epithelial to mesenchymal transition (EMT). EMT is a developmental program used in cell differentiation during embryogenesis, but parts of this genetic program is believed to be reactivated during metastasis in order to transform malignant epithelial cells into motile and invasive mesenchymal-like cells [33,34,38]. EMT initiates changes in gene transcription which allows epithelial cells to lose polarity and cell-cell contact, detach from the epithelial cell layer, modify the extracellular matrix and migrate through tissues. It is believed that EMT is a transient state and that the process is reversed once the cells form a metastatic lesion. However, as EMT or EMT-like mechanisms affect gene expression and protein translation, identifying metastatic cells while they are en route to a secondary organ might require different biological markers than the markers used to identify cells in the primary tumors.
The challenge in micrometastasis detection consists of identifying very few tumor cells in a background of normal cells. However, tumor cells that pass through EMT adopt the phenotypic features of mesenchymal tissue, thereby making them difficult to distinguish from the surrounding stroma. Lymph nodes are home to a variety of different cell populations, many of which have a mesenchymal character. Consequently, clear evidence of oncogenic EMT is hard to prove in samples of lymph node tissue. In addition, genes are rarely implicated in only one cellular process; their effects may be quite different depending on the context in which they are expressed. Thus, we cannot at present be sure whether we are actually observing cancer cells passing through EMT or cells involved in other differentiation processes. However, a number of indications point to these cells being tumor derived. First, all cells were selected by EpCAM protein expression on the cell surface. The efficiency of this enrichment is demonstrated by EpCAM RT-PCR results from MOC31 enriched and depleted cell fractions ( Figure 6). EpCAM is a cell surface protein generally used as an epithelial specific antigen. Second, our EpCAM-IMS assay proved to enrich for epithelial-transcript expressing cells as demonstrated both by our initial RT-PCR assay targeting four epithelial transcripts ( Figure 2) and the array results ( Figure 4a). Third, all EpCAM positive cells consistently showed increased expression of a number of transcripts compared with the EpCAM depleted cells, demonstrating that EpCAM positive cells represent a distinct cell population independent of hMAM expression. Some genes were nearly exclusively expressed by EpCAM positive cells (EGFR, Serpine-1 (PAI-1), Snai2, Col3A1 and MMP2/3), while others were enriched for (SPARC, MITF, MMP9, MST1R Col1A2/ Col5A2, Wnt5A/Wnt5B and others). This expression pattern was confirmed for some selected genes using different RT-PCR primers and a new set of SLN samples (Figure 7). Fourth, the changes in expression observed between EpCAM+/hMAM+ and EpCAM+/hMAM-cell populations were consistent with changes expected to occur during EMT (Figure 4a, b). Thus, as expression of E-cadherin and other epithelial transcripts were lost or greatly reduced, mesenchymal markers as Snai2 (Slug), vimentin, Wnt5B, and MMP9 were increased. Collectively these results indicate that EpCAM+/hMAM-cells are tumorderived cells similar to EpCAM+/hMAM+ cells, but with phenotypic alterations compatible with an EMT state.
Epithelial cytokeratins are commonly used for antibodybased detection of tumor deposits in SLN by immunohistochemistry (IHC). Recently, both hMAM and CK19 also received FDA approval as markers for the RT-PCR based analysis of SLNs in a diagnostic kit (GeneSearch Breast Lymph Node (BLN) assay; Veridex LLC, Raritan, NJ, USA). Studies that have compared RT-PCR and IHC on SLN samples generally find that the methods agree [39][40][41], demonstrating that epithelial protein-and RNAmarkers are expressed by the same cells. As shown in Figure 4a our EpCAM+/hMAM+ cells also expressed CK19, along with other epithelial markers. However, hMAM is not a universal marker of breast cancer derived tumor cells, as demonstrated by our initial test where only four of six breast cancer cell lines were found to express hMAM. Similarly, the expression of different cytokeratins has been found to vary considerably in DTC detected in bone marrow from breast cancer patients [42]. As our EpCAM IMS assay detected both hMAM +/-cells and CK +/-cells, this demonstrates that EpCAM is able to capture a broader range of tumor cells. Importantly, these mesenchymal-like cells will remain undetected when purely epithelial markers are used.
All patients included in this study were early stage, clinically node-negative patients. Statistically, the expected five-year survival rate is between 85 and 95% (based on T1/T2 patients, prognoses from [43]). Our EpCAM IMS assay therefore detects cells in SLNs of far more patients than those who can be expected to suffer from relapse later. However, although these cells have migrated to the lymph node we do not know whether they will spread further from there. Currently, it is not known which characteristics that identify metastatic founder cells or which factors that influence the fate of cells disseminated to lymph nodes. Emerging evidence from studies on bone marrow suggests that tumor cell dissemination is a common event which starts at an early stage, and that disseminated cells may evolve at the ectopic site independently of the primary tumor [44,45]. We do not know what potential cells disseminated to lymph nodes have to give rise to distant metastases, but the data are relevant in the context of understanding tumor cell dissemination and behavior. Antibodies targeting EpCAM are commonly used for detecting metastatic cells in blood or bone marrow [46][47][48]. Notably, EpCAM is approved for diagnostic use in the CellSearch assay which aims to detect circulating breast cancer cells [49][50][51]. It has been suggested that EpCAM might be down-regulated as a consequence of EMT [52,53]. However, our data indicate the concomitant expression of EpCAM and EMT markers on metastatic cells in SLNs, in line with recent results obtained on samples from bone marrow [54] and blood [55] Our EpCAM RT-PCR assay consistently showed higher EpCAM expression in EpCAM+/hMAM+ samples than in EpCAM+/hMAM-samples ( Figure 6). This difference did not reflect results obtained in the EpCAM IMS assay where many samples later found to be hMAM negative contained high numbers (> 1,000) of heavily MOC31rosetted cells. This apparent imbalance in EpCAM protein expression and RNA expression could reflect the dynamics of an EMT process, where the halt of mRNA transcription and degradation of mRNA transcripts might be more rapidly executed than the internalization and degradation of proteins already present in the cell membrane. Thus, in cells undergoing EMT, proteins associated with an epithelial profile could be detectable longer than the transcripts encoding them.
Breast cancer is a very heterogeneous disease with variation both on the DNA, RNA and protein levels [56,57]. This heterogeneity is found within tumors as well as between patients, and can be explained by several mechanisms; the origin of the cell causing the cancer [58], the different environmental factors involved in driving the tumor, and transient changes in cellular status associated with malignant progression, such as EMT [33][34][35]38,59]. We observed a large heterogeneity on the RNA level among our EpCAM enriched cell populations, also in hMAM expression. Since our RT-PCR results are based on the pool of EpCAM positive cells, it can not be derived from this whether the variable hMAM intensity was due to a difference between patients, or if patients with low hMAM expression had a mixture of EpCAM+/hMAM+ and EpCAM+/hMAM-cells. EMT is a plastic process and the transition from an epithelial cell to a mesenchymallike cell is probably a continuum rather than an abrupt change. Thus, it will be interesting to analyze EpCAM positive cells at the single cell level to see if such differences in cell state are detectable within individual patients.

Conclusions
Our analysis of SLN samples indicates that the activation of EMT like processes is important in the lymphatic dissemination of breast cancer. EpCAM thus appear to be a more universal marker of breast cancer micrometastasis than pure epithelial derived targets, as it will also detect tumor cells with EMT-like characteristics.

Additional material
Additional file 1: Primers and probes used for RT-PCR. Sequence of primers designed using ProbeFinder from Roche Applied Science and the corresponding probes from the Universal ProbeLibrary [28] Additional file 2: SLN samples used on the EMT-arrays and EMT RT-PCR with calculations. A complete overview of all clinical samples used in this study. The number of Immunomagnetic selected (IMS) positive cells per sample and the hMAM status as determined by RT-PCR is listed. Then the EMT-array RT-PCR data are shown, both raw Ct values and the normalized expression data. Following this is the hierarchical clustering analysis and the different t-tests.