CD44 rs13347 C>T polymorphism predicts breast cancer risk and prognosis in Chinese populations

Introduction: It has been demonstrated that the interplay between the adhesion molecule CD44 and its ligands can regulate cancer cell proliferation, migration and invasion, as well as tumor-associated angiogenesis, and is related to breast cancer patient survival. In this two-stage, case-control study, we determined whether common functional tagSNPs (single nucleotide polymorphisms) of CD44 are associated with breast cancer risk and prognosis.
Methods: Five tagSNPs of CD44 (rs10836347C>T, rs13347C>T, rs1425802A>G, rs11821102G>A, rs713330T>C) were selected and genotyped in 1,853 breast cancer patients and 1,992 healthy control subjects from Eastern and Southern Chinese populations. The potential function of rs13347C>T and its association with breast cancer were studied further.
Results: Compared with the most common rs13347CC genotype, the variant genotypes (CT and TT) increased an individual's susceptibility to breast cancer, especially in estrogen receptor (ER)-negative patients (odds ratio (OR) = 1.37, 95% confidence interval (CI) = 1.17 to 1.59 for ER-positive patients; OR = 2.37, 95% CI = 2.00 to 2.80 for ER-negative patients). We also found that the rs13347 CT+TT genotypes predict a lower five-year survival rate (hazard ratio (HR) = 1.85, 95% CI = 1.09 to 3.15, P = 0.023), with the lowest survival probability in ER-negative T allele carriers. Furthermore, our reporter assay findings, although preliminary and rather modest, showed that miR-509-3p may suppress CD44 expression more strongly in C allele carriers than in T allele carriers (P < 0.01). Similarly, carriers of the rs13347 variant genotypes (CT and TT) had higher CD44 expression than CC carriers in both immunohistochemistry (P < 0.001) and western blotting (P = 0.001) results.
Conclusion: These findings suggest that the CD44 rs13347C>T polymorphism may affect breast cancer development and prognosis by increasing CD44 expression.


Introduction
Breast cancer, whose incidence and mortality are gradually increasing, is a malignant tumor originating from breast tissue, most commonly from the inner lining of the milk ducts or the lobules that supply the ducts with milk [1]. Apart from cervical cancer, it is the most frequent cause of cancer death in middle-aged women [2]. Recent studies have established several etiologic factors for breast cancer, such as ionizing radiation [3], alcohol consumption [4], high-fat diets [5], oral contraceptives and the use of hormones in the treatment of certain diseases [6]. In addition to these environmental factors, genetic variations also play an important role in an individual's risk of developing breast cancer [7].
Compelling evidence has demonstrated that breast cancers contain a small number of phenotypically distinct cells, known as breast cancer-initiating cells (BCICs), which account for primary and metastatic tumor growth [8,9]. BCICs can be distinguished from other breast cancer cells by the expression of so-called CIC markers that play a vital role in BCIC maintenance and activity [10]. CD44 is one of the well-known markers of BCICs, and it may contribute not only to the drug and radiation resistance of BCICs but also to the preparation of the pre-metastatic niche [11].
Through cell-cell and cell-extracellular matrix adhesive interactions, CD44 participates in several fundamental biological processes, including lymphocyte homing, cell migration, haematopoiesis, inflammation, wound healing, embryonal development and apoptosis [12]. CD44 also plays an indispensable role in tumor pathology, where it is involved in cell differentiation, invasion and metastasis [13][14][15]. Some studies have reported a strong association between CD44 expression and breast cancer aggressiveness [16,17]. Correspondingly, recent studies have indicated qualitative and quantitative changes in CD44 expression in breast cancer [18].
Since expression of CD44 is closely related to the development of breast cancer, and genetic variations in certain genes may affect their expression [19], we hypothesized that variations in CD44 that can theoretically affect its protein expression may be associated with varying risk and prognosis of breast cancer. In this study, five eligible tag single nucleotide polymorphisms (tagSNPs) of the CD44 gene were selected from the GenBank dbSNP database to evaluate the contribution of these polymorphisms to the risk of developing breast cancer. One of them is an A/G polymorphism (rs1425802) in the promoter region; the conversion from A to G causes loss of an Nkx-2 binding site, which may theoretically affect CD44 transcriptional activity. Another, a T/C polymorphism (rs713330) in an intron, is in linkage disequilibrium with the nonsynonymous rs9666607 G>A polymorphism, which may change amino acid 417 from Arg to Lys. The other three polymorphisms (rs13347C/T, rs10836347C/T, rs11821102G/A) are all located in the 3'UTR of CD44, and for each of them the two alleles can differ in their ability to bind certain microRNAs. Only one published research article has investigated polymorphisms in CD44 exon 2 and breast cancer [20]; however, no study has investigated the role of tagSNPs that cover all common polymorphisms in breast cancer risk. We therefore carried out a hospital-based, case-control study including 1,853 breast cancer patients and 1,992 cancer-free controls to investigate the contribution of the five CD44 polymorphisms to susceptibility to and prognosis of breast cancer.

Materials and methods
Study subjects for case-control and follow-up study
All subjects in the case-control study were ethnically homogenous Han Chinese derived from the Eastern Chinese population or the Southern Chinese population. In the Eastern Chinese population, patients with newly diagnosed breast cancer (n = 1,049) were consecutively recruited from the First Affiliate Hospital of Soochow University (Suzhou) from March 2001 to May 2009. All eligible patients diagnosed at the hospital during the study period were recruited, with a response rate of 89%. Patients were recruited from Suzhou city and its surrounding regions, and there were no age, stage or histology restrictions. Population controls (n = 1,157) were cancer-free people living in the Suzhou region; they were selected from a nutritional survey conducted in the same period as the cases were collected [21]. In the Southern Chinese population, breast cancer cases (n = 804) were recruited from the Tumor Hospitals affiliated with Guangzhou Medical College between 2002 and 2009, with a response rate of 91%. Cancer-free controls (n = 835) were randomly selected from a pool of 5,000 individuals who participated in a community-based screening program for a health checkup conducted in Guangdong province during the same period in which the cases were recruited [22]. The pathological type and tumor staging were evaluated according to the 2002 American Joint Committee on Cancer staging system. The clinical features of the patients are summarized in Additional file 1, Table S1. The patients were frequency matched to controls on age.
In the Suzhou center, the average age was 49 years (range 21 to 79) for case patients and 49 years (range 20 to 81) for control subjects (P = 0.57); in the Guangzhou center, the average age was 48 years (range 14 to 88) for case patients and 47 years (range 17 to 79) for control subjects (P = 0.60). For the five-year survival rate study, 566 breast cancer patients with relatively complete clinical information from the First Affiliate Hospital of Soochow University were followed up as the discovery set. Similarly, 331 patients from the tumor hospitals affiliated with Guangzhou Medical College formed the validation set. Patients were followed up by telephone calls every three months, and survival time was calculated from the date when patients first received a confirmed diagnosis until the date of the last follow-up or death. Dates of death were obtained from inpatient and outpatient records or from the patients' families through telephone follow-up. Clinical features of the subjects in the follow-up studies are shown in Additional file 2, Table S2.
At recruitment, informed consent was obtained from each subject. This study was approved by the Medical Ethics Committee of The First Affiliate Hospital of Soochow University and Tumor Hospitals affiliated with Guangzhou Medical College.

TagSNPs selection
Bioinformatics analysis with Haploview software 4.2 (Mark Daly's lab of Broad Institute, Cambridge, MA, USA) was performed to analyze the haplotype blocks based on the CHB (Chinese Han Beijing) population data of HapMap (HapMap Data Rel 27 Phase II+III, Feb 09, on NCBI B36 assembly, dbSNP b126; International HapMap Project). Six tagSNPs were found to cover all the potentially functional common SNPs (MAF > 0.05) in the CD44 gene: rs8193, rs11821102, rs10836347 and rs13347 in the 3'UTR, rs1425802 in the promoter and rs9666607 in an exon region (Additional file 3, Figure S1). Among them, rs8193 and rs13347 were in high linkage disequilibrium (LD) (D' = 1.0, r² = 0.527), so rs13347 alone was selected to represent the two SNPs. In addition, because rs9666607 is difficult to genotype by the MALDI-TOF method, we replaced it with rs713330, which is in complete LD with rs9666607 (D' = 1.0, r² = 1).
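The two LD measures quoted above can differ considerably for the same pair of SNPs. The following minimal Python sketch computes D' and r² from haplotype and allele frequencies using the standard textbook definitions. It is not the Haploview implementation, and the function name `ld_stats` and the frequencies in the example are hypothetical, chosen only to show that D' = 1 can coexist with r² well below 1.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic loci.

    p_ab : frequency of the haplotype carrying allele a and allele b
    p_a  : frequency of allele a at the first locus
    p_b  : frequency of allele b at the second locus
    Assumes both loci are polymorphic (0 < p_a < 1 and 0 < p_b < 1).
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Hypothetical frequencies (not taken from HapMap): every haplotype
# carrying allele a also carries allele b, but b is more common than a.
d, d_prime, r2 = ld_stats(p_ab=0.2, p_a=0.2, p_b=0.3)
print(f"D = {d:.3f}, D' = {d_prime:.3f}, r^2 = {r2:.3f}")
```

In this illustration one of the four possible haplotypes is absent, which gives D' = 1, while the unequal allele frequencies keep r² below 1.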

Genotyping analysis
Genomic DNA was isolated from the peripheral blood lymphocytes of the study subjects. MassArray (Sequenom, San Diego, CA, USA) was used to genotype all markers by allele-specific MALDI-TOF mass spectrometry [23]. Primers and multiplex reactions were designed using the RealSNP.com website. All breast cancer patients and healthy controls in the Suzhou center were genotyped for the rs10836347, rs13347, rs1425802, rs11821102 and rs713330 polymorphisms. Patients and controls from the Guangzhou center were genotyped only for rs13347, to validate the results from Suzhou.

Construction of CD44 3'UTR luciferase reporter plasmids
Based on bioinformatics analysis, the C allele, but not the T allele, of CD44 rs13347 is predicted to lie in a hsa-mir-509-3p binding site. Therefore, we hypothesized that hsa-mir-509-3p would bind tightly to CD44 mRNA transcripts containing the C allele, negatively regulating CD44 expression. To test this hypothesis, T and C allelic reporter constructs were prepared by amplifying a 362-bp CD44 3'UTR region from subjects homozygous for the T or C allele, introducing artificial XhoI and NotI restriction sites with the forward primer 5'-ATCG CTCGAG GGCCATTGTCAACGGAGA-3' and the reverse primer 5'-ATGC GCGGCCGC CAGGCTTGAAATATGGATTCG-3'. The amplified fragments were then cleaved with the XhoI and NotI enzymes (New England BioLabs, Ipswich, MA, USA). The psiCHECK2 vector (Promega, Madison, WI, USA) was also cleaved with XhoI and NotI, and the prepared fragments were ligated into the vector with T4 DNA ligase (New England BioLabs). The two constructs were sequenced to confirm the allele, orientation and integrity of each insert.

Transient transfections and luciferase assays
293T or MCF-7 cells were maintained in Dulbecco's modified Eagle's medium with high glucose (Gibco, Los Angeles, California, USA) supplemented with 10% heat-inactivated fetal bovine serum (Gibco) and 50 μg/ml streptomycin (Gibco) in a 37°C incubator with 5% CO2. Cells were seeded at 1 × 10⁵ cells per well in 24-well plates (BD Biosciences, Bedford, MA, USA). Sixteen hours after plating, cells were transfected with Lipofectamine 2000 (Invitrogen, Carlsbad, California, USA) according to the manufacturer's instructions. In each well, 800 ng of psiCHECK-2-CD44-3'UTR vector was co-transfected with 50 pmol hsa-mir-509-3p mimics (Ambion, Austin, TX, USA) and 40 pmol hsa-mir-509-3p inhibitor accordingly. The hsa-mir-509-3p inhibitor is a single-stranded RNA molecule that can specifically knock down endogenous hsa-mir-509-3p. In addition, 100 pmol Negative Control #1 from Ambion was included in every transfection experiment. There were six replicates for each group, and the experiment was repeated at least three times. Twenty-four hours after transfection, cells were harvested by adding 100 μl of passive lysis buffer. Renilla luciferase activities in the cell lysates were measured with the Dual-Luciferase Reporter assay system (Promega) in a TD-20/20 luminometer (Turner Biosystems, Sunnyvale, CA, USA) and were normalized to the firefly luciferase activities.
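The normalization step described above can be illustrated with a minimal Python sketch: each well's Renilla reading is divided by its firefly reading, and the mean ratio of a treated group is expressed relative to a control group. The function names and all luminometer readings are hypothetical and are not data from this study.

```python
from statistics import mean


def normalized_ratios(renilla, firefly):
    """Per-well Renilla/firefly ratios (reporter over internal control)."""
    return [r / f for r, f in zip(renilla, firefly)]


def relative_activity(sample_ratios, control_ratios):
    """Mean normalized activity of a group relative to the control group."""
    return mean(sample_ratios) / mean(control_ratios)


# Hypothetical readings for six replicate wells per group (arbitrary units).
control = normalized_ratios(
    renilla=[980, 1010, 1005, 990, 1020, 995],
    firefly=[1000, 1000, 1000, 1000, 1000, 1000],
)
with_mimic = normalized_ratios(
    renilla=[600, 630, 610, 590, 620, 605],
    firefly=[1000, 1010, 990, 1000, 1005, 995],
)
print(f"relative activity: {relative_activity(with_mimic, control):.2f}")
```

Dividing by the firefly signal from the same well corrects for differences in transfection efficiency and cell number between wells.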

Western blotting analysis
To analyze the correlation between the rs13347 C>T polymorphism in the 3'UTR of CD44 and protein expression levels in breast cancer tissues, Western blotting assays were performed. Briefly, 39 breast cancer tissues were homogenized in 800 μl of detergent lysis buffer, and the tissue homogenates were centrifuged at 12,000 g for 15 minutes to obtain the supernatant. Sixty micrograms of total protein (the supernatant) were separated by SDS-polyacrylamide gel electrophoresis (SDS-PAGE) and transferred to PVDF membranes (Millipore, Billerica, MA, USA). The membranes were blocked with 5% milk in tris-buffered saline (TBS) with 0.05% Tween-20 for one hour at room temperature with constant agitation. The polyclonal antibody against CD44 and the monoclonal antibody against GAPDH were both purchased from Santa Cruz Biotechnology (Santa Cruz, CA, USA). The membranes were incubated overnight at 4°C with the primary antibody diluted 1:1,000, and the proteins were detected with a Phototope-horseradish peroxidase Western blot detection kit (Cell Signaling Technology, Danvers, MA, USA). CD44 protein expression levels were normalized to those of GAPDH to obtain relative expression levels.

Immunohistochemistry analysis
After screening hematoxylin and eosin-stained slides for optimal tumor content, we constructed tissue slides. Cores were taken from each formalin-fixed, paraffin-embedded breast cancer sample by using punch cores that measured 0.8 mm in greatest dimension from the center of tumor foci. Immunohistochemistry for CD44 was performed by using the avidin-biotin complex method (ABC; Vector Laboratories, Burlingame, CA, USA), including heat-induced antigen-retrieval procedures. The primary antibodies were mouse anti-human monoclonal antibodies against CD44 (1:200; Santa Cruz Biotechnology). The components of the Envision-plus detection system (EnVision+/HRP/Mo; Dako, Carpinteria, CA, USA) were applied. Reaction products were visualized by incubation with 3,3'-diaminobenzidine. Negative controls were treated identically but with the primary antibody omitted. The images of stained slides were obtained and evaluated by experienced pathologists. The percentage of positive tumor cells was determined and graded (0 to 5): 0% (0), 1 to 20% (1), 21 to 40% (2), 41 to 60% (3), 61 to 80% (4) and ≥ 81% (5) [24].
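The grading scale above maps a percentage of positive tumor cells to a score from 0 to 5. A minimal Python sketch of that mapping follows. The function name `ihc_grade` is hypothetical, and two boundary choices are assumptions rather than statements from the scoring reference: any value above 80% is assigned grade 5, and a non-integer percentage falls into the band whose upper limit it does not exceed.

```python
def ihc_grade(percent_positive):
    """Map the percentage of CD44-positive tumor cells to a grade of 0 to 5.

    Bands follow the scale in the text: 0%, 1-20%, 21-40%, 41-60%,
    61-80% and above 80% (assumed to include 81% itself).
    """
    if not 0 <= percent_positive <= 100:
        raise ValueError("percentage must be between 0 and 100")
    if percent_positive == 0:
        return 0
    for grade, upper in enumerate((20, 40, 60, 80), start=1):
        if percent_positive <= upper:
            return grade
    return 5


# Hypothetical percentages, for illustration only.
for pct in (0, 15, 40, 61, 95):
    print(pct, "->", ihc_grade(pct))
```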

Statistical analysis
Two-sided chi-square tests were used to assess differences between cases and controls in the distributions of age, menstrual history, body mass index (BMI) and family history of breast cancer, as well as in allele and genotype frequencies. Hardy-Weinberg equilibrium (HWE) was tested in cancer-free controls by a goodness-of-fit chi-square test comparing the observed genotype frequencies with those expected (p² + 2pq + q² = 1). The association between case-control status and each SNP, measured by the odds ratio (OR) and its corresponding 95% confidence interval (CI), was estimated using an unconditional logistic regression model, with and without adjustment for age, BMI and family history of cancer. Logistic regression modeling was also used for the trend test [25,26]. The data were further stratified by age, age at menarche (years), menstrual history, BMI, pathological type, stage, estrogen receptor status, progesterone receptor status and family history of cancer to evaluate the stratum variable-related ORs among the CD44 genotypes. Homogeneity among stratum variable-related ORs was tested [25]. The associations between overall survival time and demographic and clinical characteristics were estimated using the Kaplan-Meier method and the Log-rank test. The effect modifications by these characteristics and the effects of SNPs on death risk in patients with breast cancer were assessed using the Wald test in multivariate Cox proportional hazards regression models after adjusting for confounders. The proportional hazards assumption was examined by testing interactions between the genotypes and time (all P-values > 0.05). Differences between alleles in luciferase reporter activity, normalized expression values and CD44 protein levels in cancer tissue (Western blot ratio and IHC scores) were analyzed by Kruskal-Wallis one-way analysis of variance.
The tests were all two-sided and analyzed using the SAS software (version 9.1; SAS Institute, Cary, NC, USA). P < 0.05 was considered statistically significant.
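Two of the calculations named in this section can be sketched in a few lines of Python: the Hardy-Weinberg goodness-of-fit chi-square test and a crude odds ratio with a Wald 95% confidence interval. This is an illustration only. The study used SAS and adjusted logistic regression, which this sketch does not reproduce; the function names and all genotype counts are hypothetical.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Goodness-of-fit chi-square test for Hardy-Weinberg equilibrium.

    Returns (chi2, p_value) with 1 degree of freedom.
    Assumes both alleles are present, so all expected counts are positive.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def odds_ratio_ci(case_exp, case_unexp, ctrl_exp, ctrl_unexp, z=1.96):
    """Crude (unadjusted) odds ratio with a Wald 95% confidence interval."""
    or_ = (case_exp * ctrl_unexp) / (case_unexp * ctrl_exp)
    se = math.sqrt(1 / case_exp + 1 / case_unexp + 1 / ctrl_exp + 1 / ctrl_unexp)
    lower = math.exp(math.log(or_) - z * se)
    upper = math.exp(math.log(or_) + z * se)
    return or_, lower, upper


# Hypothetical control genotype counts (CC, CT, TT), not the study data.
chi2, p = hwe_chi_square(360, 480, 160)
print(f"HWE chi2 = {chi2:.3f}, P = {p:.3f}")

# Hypothetical 2x2 table: variant carriers vs CC among cases and controls.
or_, lo, hi = odds_ratio_ci(case_exp=20, case_unexp=10, ctrl_exp=10, ctrl_unexp=20)
print(f"OR = {or_:.2f}, 95% CI = {lo:.2f} to {hi:.2f}")
```

A large P-value from the first function indicates no detectable departure from HWE in the controls. The second function gives only the crude estimate, so its result would generally differ from an OR adjusted for age, BMI and family history.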

Results
Genotypes and risk of breast cancer
The association of breast cancer with rs13347C>T was examined by two independent laboratories, at Soochow University and Guangzhou Medical College, in Eastern (1,049 cases and 1,157 controls, Jiangsu Province) and Southern (804 cases and 835 controls, Guangdong Province) Chinese populations. The polymorphisms rs10836347, rs1425802, rs11821102 and rs713330 were genotyped only in the Suzhou population (1,049 cases and 1,157 controls) (Additional file 4, Figure S2). Genotypes were confirmed by direct sequencing (Additional file 5, Figure S3). The observed genotype frequencies of the five polymorphisms in controls conformed to HWE (P = 0.84 for rs13347, 0.97 for rs10836347, 0.55 for rs1425802, 0.22 for rs11821102 and 0.39 for rs713330 in the Eastern population; P = 0.89 for rs13347 in the Southern population). Genotyping results showed that only rs13347 was significantly associated with breast cancer in both the Eastern and Southern Chinese populations (Table 1). In the Eastern Chinese population, the frequency of the rs13347 TT and CT genotypes was significantly higher in patients with breast cancer than in healthy controls (P trend < 10⁻⁵). The adjusted ORs for the rs13347 CT and TT genotypes in the Suzhou patient group were 1.69 and 2.22, respectively, compared with the rs13347 CC genotype. The association was confirmed in the Southern population, where the ORs for the rs13347 CT and TT genotypes in the patient group were 1.61 (95% CI = 1.31 to 1.98) and 2.25 (95% CI = 1.51 to 3.35), respectively, compared with the rs13347 CC genotype (P trend < 10⁻⁵).

Stratification analysis of CD44 rs13347 genotypes and risk of breast cancer
The risk of breast cancer related to CD44 rs13347 genotypes was further examined with stratification by age, age at menarche, menstrual history, BMI, family history of breast cancer, pathological type, clinical stage, estrogen receptor status and progesterone receptor status. As shown in Figure 1, we observed a significant difference in genotype frequency between ER-negative and ER-positive patients (P < 10⁻⁵). Compared with the CC genotype, the T allele carriers (CT+TT) had a 2.37-fold increased risk of developing breast cancer among ER-negative patients. Among ER-positive patients, the increased risk for CT+TT carriers was only 1.37-fold. However, there were no differences in the other subgroups.

Regulation effects of hsa-mir-509-3p on CD44 3'UTR translation efficiency
Compared with psiCHECK-2-CD44-3'UTR-rs13347 T, the translation of Renilla luciferase from psiCHECK-2-CD44-3'UTR-rs13347 C was significantly reduced in the presence of hsa-mir-509-3p in a concentration-dependent manner (P < 0.001), indicating that the magnitude of the effect of hsa-mir-509-3p differs between the two alleles in 293T cells (Figure 2A). The same experiments were repeated in MCF-7 cells and similar results were obtained (Figure 2B). When psiCHECK-2-CD44-3'UTR was cotransfected with 50 pmol hsa-mir-509-3p and its corresponding inhibitor into 293T and MCF-7 cells separately, there was no significant difference in luciferase activity between the two recombinants (Figure 2C). These results suggest that hsa-mir-509-3p can indeed bind and negatively regulate the expression of CD44 in the presence of the rs13347 C allele.

Effects of CD44 rs13347C>T variation on CD44 protein levels
As shown in Figure 3 and Additional file 6, Western blotting of the 39 breast cancer tissues showed higher CD44 protein levels in carriers of the variant genotypes (CT and TT) than in CC carriers (P = 0.001). To confirm the results of Western blotting, we further performed the IHC study in 31 breast cancer tissues to verify the association between the expression level of CD44 protein and rs13347C>T in vivo (Figure 3C and Additional file 7, Table S4). CD44 protein expression levels in the breast cancer tissues of the 15 patients carrying the CC genotype were significantly lower than those in the 12 patients carrying the CT genotype or the 4 patients carrying the TT genotype (Kruskal-Wallis test: P = 0.003).

CD44 rs13347C>T variation and five-year survival of breast cancer patients
The demographic and clinical characteristics of breast cancer patients in the survival discovery and validation sets are summarized in Additional file 2, Table S2. In the discovery set, the mean age was 48 years; 63 (11.1%) patients died of breast cancer, 269 (47.5%) were ER negative and 242 (42.8%) were PR negative. In the validation set, which had the same mean age of 48 years, 62 (18.7%) patients died of breast cancer, 139 (42.0%) were ER negative and 133 (40.2%) were PR negative. The five-year survival rates in the two sets were 88.9% and 81.3%, respectively. The Kaplan-Meier analysis, Log-rank test and univariate Cox analysis revealed that breast cancer patients who were ER or PR positive had a significantly decreased death risk (P = 0.0017 and P = 0.002, respectively). There were no significant effects of other characteristics.
Multivariate proportional hazards regression models and the Log-rank test revealed that, when compared with the rs13347 CC genotype, the rs13347 CT+TT genotypes were associated with poor survival (adjusted HR = 1.849 and P = 0.0233) and a lower survival probability (Log-rank P = 0.0211) (Table 2).
The rs13347C>T polymorphism was further tested in the validation set. In this dataset, when compared with the rs13347 CC genotype, the CT and TT genotypes were associated with poor survival (adjusted HR = 2.104 and 3.144; P = 0.0081 and 0.015, respectively) and the rs13347 CT+TT genotypes had a 2.34-fold increased death risk (P = 0.0010). In the pooled analysis of the two cohorts, the rs13347 CT and TT genotypes had a 1.54-fold and a 2.84-fold increased death risk, respectively (P = 0.00378 and P < 0.001), and the HR was 1.873 (P = 0.0007) for CT+TT carriers (Table 2). As is also shown in Figure 4A, B, CT or TT carriers had a lower survival probability in the discovery set, the validation set and the pooled analysis. The contribution of the interaction between rs13347 variation and ER status to the five-year survival rate of breast cancer patients was further investigated, and ER-negative T carriers were found to have the lowest survival probability (Figure 4C). However, no significant contribution was found for the other four polymorphisms.
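The survival probabilities compared here come from Kaplan-Meier estimates. A minimal Python sketch of the product-limit estimator follows, under the assumption that each patient contributes a follow-up time and an indicator of death versus censoring. The function name `kaplan_meier` and the follow-up data are hypothetical; the study's own analysis was run in SAS.

```python
def kaplan_meier(observations):
    """Product-limit estimate of the survival function.

    observations: list of (time, event) pairs, where event is True for a
    death and False for a censored observation (alive at last follow-up).
    Returns a list of (time, survival_probability) at each death time.
    """
    event_times = sorted({t for t, event in observations if event})
    survival = 1.0
    curve = []
    for t in event_times:
        # Patients still under observation just before time t.
        at_risk = sum(1 for time, _ in observations if time >= t)
        deaths = sum(1 for time, event in observations if time == t and event)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical follow-up in months for five patients (not study data).
follow_up = [(12, True), (24, False), (36, True), (48, True), (60, False)]
for time, prob in kaplan_meier(follow_up):
    print(f"month {time}: S(t) = {prob:.3f}")
```

Censored patients leave the risk set without lowering the curve, which is why follow-up that ends before death still contributes information. Comparing curves between genotype groups would additionally require a Log-rank test, which this sketch does not include.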

Discussion
Associations between breast cancer susceptibility and CD44 polymorphisms have not previously been detected in any population using case-control studies. In this molecular epidemiological study we sought to identify genetic factors that confer individual susceptibility to breast cancer. Our results, obtained by analyzing 1,853 breast cancer patients and 1,992 controls from two study centers, showed that the functional variant rs13347 T in CD44 was associated with an increased risk of developing breast cancer and with a lower five-year survival probability. However, there was no significant difference in breast cancer susceptibility or prognosis among the genotypes of the other four polymorphisms.
CD44 is a ubiquitously expressed family of cell adhesion glycoproteins comprising an N-terminal extracellular domain, a membrane-proximal region, a transmembrane domain and a cytoplasmic tail. The family is encoded by the human CD44 gene, which maps to chromosomal locus 11p13 and is composed of two groups of exons [27]. Exons 1 to 5 and 16 to 20 are spliced together to form a transcript encoding the ubiquitously expressed standard isoform (CD44s). The variable exons 6 to 15 (known as v1 to v10) can be alternatively spliced and inserted into the standard form between exons 5 and 16 [28]. The multiple functions of the CD44 family are generated by their binding of HA (hyaluronic acid) and some other extracellular molecules [28]. CD44 regulates breast cancer through several mechanisms. Interaction of hyaluronan and CD44 can promote breast cancer cell adhesion and inhibit invasion [29]. In addition, binding of hyaluronan to CD44v3 can stimulate breast cancer cell growth, survival and invasion through the Rho and PI3K-AKT signaling pathways [30]. Moreover, the migration of metastatic breast cancer cells can be increased by the interaction of CD44v3, 8 to 10 with ankyrin promoted by Rho kinase [31]. Based on the above, it is reasonable to predict that changes in the expression or function of CD44 will play a pivotal role in the development and progression of breast cancer. Krech R. et al. reported a significant increase in CD44 expression in breast cancer compared to normal breast epithelium [18]. These findings correspond with our results that CD44 rs13347 T carriers have higher CD44 protein levels and, therefore, are more susceptible to breast cancer and have a poorer prognosis.
Much interest has been generated by the recent discovery that CD44 is a surface marker of BCICs [9]. Lin et al. found that CD44pos/CD24neg and CD44pos/CD24pos cell populations in estrogen receptor (ER) α-negative breast tumors are tumorigenic in murine xenograft models, which indicates that CD44 is a hallmark of BCICs in ER-negative breast cancer [32]. Similarly, in a study examining the expression profile of cancer stem cell markers in eight human breast cancer cell lines, Lee et al. found that CD44 was expressed mostly in basal-like cell lines, including MDA-MB-468, MDA-MB-231 and HCC1937, which were all ER negative [33]. Recently, substantial progress has been made in the identification of BCICs, and there is accumulating evidence that these cells might be targets for transformation during mammary carcinogenesis [9]. Since CD44, as a surface marker, contributes to the maintenance and activity of BCICs, and BCICs play an important role in breast cancer tumorigenesis, it can be inferred that a quantitative change in CD44 caused by the rs13347 C/T variation would affect breast cancer development, especially in ER-negative patients. In addition, the expression of ER has important prognostic implications; that is, ER-positive tumors have a better prognosis in terms of overall survival, while ER-negative tumors have a more aggressive phenotype and a poorer survival probability [34][35][36]. Although the exact mechanism is still unclear, it is plausible that certain risk factors contribute more to breast cancer initiation, development and prognosis in ER-negative patients. These previous results and inferences are consistent with our findings that the deleterious effect of rs13347 CT+TT is more pronounced in ER-negative patients and that ER-negative patients carrying the rs13347 T allele have the lowest survival probability.
Although we have found that CD44 rs13347 variant genotypes (CT+TT) were associated with an increased risk of breast cancer, our study may have certain limitations caused by the study design. For example, selection bias and/or systematic error may occur because the cases were from hospitals and the controls were from the community. Selection bias is a particular problem inherent in case-control studies, where it gives rise to noncomparability between cases and controls. In case-control studies, controls should be drawn from the same population as the cases, so that they are representative of the population that produced the cases. In our present study, cases and controls in each center were collected from the same region during the same period, and the breast cancer patients were sporadic cases, which reduces the probability of selection bias as far as possible. Moreover, we achieved a study power of more than 95% (two-sided test, α = 0.05) to detect an OR of 1.72 for the rs13347 CT+TT genotypes, which occurred at a frequency of 42.5% in the controls, compared with the rs13347 CC genotype, suggesting that this finding is noteworthy.

Conclusions
Our study indicated that, compared with the CD44 rs13347 CC genotype, the variant genotypes (CT+TT) can elevate the risk of breast cancer and predict a poorer five-year survival rate in both Southern and Eastern Chinese populations. Moreover, the effect is more pronounced in ER-negative breast cancer patients. To the best of our knowledge, this is the first study to demonstrate a significant association between the CD44 rs13347 C/T polymorphism and the risk of breast cancer. Larger, preferably population-based case-control studies, as well as well-designed mechanistic studies, are warranted to validate our findings in Chinese populations and to investigate the association of this polymorphism with different tumors in different ethnicities.

Additional material
Additional file 1: Distributions of characteristics among breast cancer patients and controls in Chinese populations used for association study. Age, age at menarche, body mass index, family history, pathological type, stage, estrogen receptor status and progesterone receptor status distributions among breast cancer patients and healthy controls from Suzhou and Guangzhou center.
Figure 4 Kaplan-Meier curves about survival probability in different rs13347C>T genotype carriers. (A) difference in survival probability between CC, CT and TT carriers (B) difference in survival probability between CC and CT+TT carriers (C) difference in survival probability between ER+ CC, ER+ CT+TT, ER- CC and ER- CT+TT carriers.