Variants on the promoter region of PTEN affect breast cancer progression and patient survival

Introduction The PTEN gene, a regulator of the phosphatidylinositol-3-kinase (PI3K)/Akt oncogenic pathway, is mutated in various cancers and its expression has been associated with tumor progression in a dose-dependent fashion. We investigated the effect of germline variation in the promoter region of the PTEN gene on clinical characteristics and survival in breast cancer. Methods We screened the promoter region of the PTEN gene for germline variation in 330 familial breast cancer cases and further determined the genotypes of three detected PTEN promoter polymorphisms -903GA, -975GC, and -1026CA in a total of 2,412 breast cancer patients to evaluate the effects of the variants on tumor characteristics and disease outcome. We compared the gene expression profiles in breast cancers of 10 variant carriers and 10 matched non-carriers and performed further survival analyses based on the differentially expressed genes. Results All three promoter variants associated with worse prognosis. The Cox's regression hazard ratio for 10-year breast cancer specific survival in multivariate analysis was 2.01 (95% CI 1.17 to 3.46) P = 0.0119, and for 5-year breast cancer death or distant metastasis free survival 1.79 (95% CI 1.03 to 3.11) P = 0.0381 for the variant carriers, indicating PTEN promoter variants as an independent prognostic factor. The breast tumors from the promoter variant carriers exhibited a similar gene expression signature of 160 differentially expressed genes compared to matched non-carrier tumors. The signature further stratified patients into two groups with different recurrence free survival in independent breast cancer gene expression data sets. Conclusions Inherited variation in the PTEN promoter region affects the tumor progression and gene expression profile in breast cancer. Further studies are warranted to establish PTEN promoter variants as clinical markers for prognosis in breast cancer.


Introduction
Hereditary predisposition to breast cancer is caused by variation in multiple genes affecting the cancer risk with varying penetrance. Mutations in the main high penetrance genes BRCA1 and BRCA2 are mostly found in families with multiple breast cancer cases particularly with early onset and with ovarian cancer [1,2], and may also affect breast cancer survival among the mutation carriers [3,4]. Strong familial breast cancer predisposition is also present in rare cancer syndromes. Rare germline mutations in the TP53 gene cause Li-Fraumeni syndrome with highly increased risk for various malignancies, including breast cancer [5]; whereas a common TP53 variant in the population, R72P with functional effect on p53 protein, has been shown to affect breast cancer survival [6,7]. Another rare cancer syndrome with increased breast cancer risk is Cowden syndrome caused by germline mutations in the PTEN gene [8,9]. Patients with Cowden syndrome develop multiple hamartomatous, mostly benign neoplasms especially on the skin and mucous membrane, and also have a lifetime risk of 25 to 50% for breast cancer and an increased risk of developing epithelial thyroid and endometrial carcinomas [10]. PTEN mutations causing Cowden syndrome include a noticeable number of variants on the promoter region affecting transcriptional levels of the gene or causing abnormal translation of the protein [11,12]. The promoter of PTEN has been characterized in the 5' region of the gene between nucleotides -1344 and -747 from translation start site and it contains binding sites, for example, for p53 and Sp1 transcription factors [12][13][14]. So far, PTEN germline variation increasing susceptibility to cancer outside Cowden syndrome, or associating with tumor progression, has not been detected [15][16][17].
The PTEN (Phosphatase and tensin homolog) gene is a tumor suppressor gene located on chromosome 10q23 and is mutated in multiple cancers [18,19]. The PTEN protein, a dual specificity phosphatase with lipid and protein phosphatase activities, functions as a negative regulator of PI3K/Akt oncogenic pathway [20]. Alterations in this pathway are among the most common changes in human carcinogenesis [21]. In addition to the PI3K/Akt pathway regulation, when localized to the nucleus, PTEN takes part, for instance, in regulation of chromosomal integrity, acetylation of p53, DNA-damage response and the induction of apoptosis [22]. In breast tumors, PTEN expression is often lost through mutations or epigenetic mechanisms [23,24]. Reduced PTEN expression [24][25][26] and the dysregulated PI3K/Akt pathway [27,28] have been associated with aggressive breast cancer phenotype and poor outcome of the disease. Breast tumors originating by dysfunctional BRCA1 often suffer PTEN loss through gross mutations [29]. Furthermore, tumors with reduced PTEN protein expression have been shown to carry a particular gene expression signature that predicts worse outcome and metastasis in breast cancer as well as in prostate and bladder carcinomas [30]. Recently, a moderate decrease in PTEN expression to 80% of the normal level has been shown to increase susceptibility to develop cancer in mice, particularly in mammary tissue [31].
To investigate the role of potentially regulatory PTEN germline genetic variation on clinical characteristics and survival in breast cancer, we screened the promoter region of PTEN from 330 familial breast cancer cases. We genotyped the detected promoter variants in a large set of familial and unselected breast cancer patients to evaluate the effects of the variants on tumor phenotype and disease outcome. We also compared the gene expression profiles in breast cancer tumors of the variant carriers and non-carriers, with further survival analyses on the differentially expressed genes in breast cancer gene expression data sets.

Subjects
The promoter region of the PTEN gene was screened for germline variation in 330 patients from families with multiple cases of breast or ovarian cancer, found negative for BRCA1 and BRCA2 mutations by screening the coding regions of the genes using denaturing gradient gel electrophoresis (DGGE) and for the large exon 11 of BRCA1 and exons 10 and 11 of BRCA2 using protein truncation test (PTT).
Altogether 1,870 unselected breast cancer patients and 542 additional familial cases were included in the genotyping of the promoter variants. The unselected cohort consisted of two series of patients, collected in Helsinki University Central Hospital's Department of Oncology in 1997 to 1998 and 2000 (884 patients) [32,33], and on the Department of Surgery in 2001 to 2004 (986 patients) [34] covering 79% and 87% of all consecutive newly diagnosed breast cancer patients, respectively. The familial breast cancer cases were collected at Helsinki University Central Hospital's Department of Clinical Genetics. The familial series consisted of patients with strong familial background with three or more breast or ovarian cancers among first or second degree relatives, including the proband, and of familial cases with one first degree relative and the proband affected with breast cancer. The BRCA1 and BRCA2 mutations were excluded as described earlier [35].
The cancer diagnoses were confirmed through the Finnish Cancer Registry and hospital records. Information on death due to breast cancer was obtained from the Finnish Cancer Registry, which collects diagnostic as well as death information on all cancer patients in Finland, and the distant metastasis data were retrieved from hospital records collected with routine follow-up investigations of breast cancer patients for five years from diagnosis. The survival was calculated as the time from the diagnosis of the first invasive breast cancer to the date of death due to breast cancer or of diagnosis of a distant metastasis. This study was performed with informed consent from the patients as well as permission from the Ethics Committee of the Helsinki University Central Hospital and from the Ministry of Social Affairs and Health in Finland.
Information on tumor histology, grade, size, nodal status and distant metastases at diagnosis were collected from pathology reports [36]. Additionally, a breast cancer pathologist re-reviewed 1,423 tumors (56% of all) for tumor histology and grade, according to Scarff-Bloom-Richardson modified by Elston and Ellis [37]. Estrogen receptor (ER) and progesterone receptor (PR) status were retrieved from pathology reports [36]. For both ER and PR status, samples with > 10% of the cancer cells stained positive with immunohistochemistry were considered as positive. HER2 status was based on immunohistochemical staining (samples with < 10% of the cells stained were considered negative and > 90% positive) and gene amplification with chromogenic in situ hybridization (CISH; over six replications was considered positive and zero to five replications was considered negative) on tumor microarrays as described in [38]. p53 protein expression was evaluated by immunohistochemical staining as previously described [6]. Samples were defined as positive for p53 when more than 20% of the cancer cells were positive for the staining. Ki67 status was defined as described [39] with strong positive (3) expression considered when ≥ 30%, intermediate (2) when 20 to 29%, weak positive (1) when 5 to 19%, and negative (0) when < 5% of the cancer cells were stained with Ki67 antibody. Altogether 2,401 invasive breast tumors from 2,256 patients were taken into the analysis of the variants' association with tumor characteristics.
A total of 2,204 patients with invasive breast cancer and follow-up data available were included in the survival analysis. The median follow-up time in 10-year breast cancer specific analysis was 83 months, and for 5-year breast cancer death or distant metastasis free survival (BDDM) was 47 months. Of all patients in the survival analysis, 298 died from breast cancer within 10 years from diagnosis, and 352 developed distant metastasis or died of breast cancer within 5 years from diagnosis.

Mutation screening and genotyping
The complete promoter region between nucleotides -1344 and -747 [12] of the PTEN gene were screened for variation on genomic DNA isolated from blood samples of 330 familial breast cancer patients using conformation sensitive gel electrophoresis (CSGE) heteroduplex analysis. All variants were verified by bidirectional sequencing.
CSGE was further used to determine the genotypes of the variants -903GA and -975GC, which were located in the same CSGE amplicon (covering nucleotides from -617 to -1087) in the complete data set. All samples showing different banding patterns on gel were verified by sequencing. The -1026CA variant was genotyped using Sequenom i-PLEX with service provided by the Finnish Genome Center. Of the 2,412 patients genotyped, a successful result was obtained from 2,375 (98.5%) individuals for -1026CA and from 2,369 (98.2%) individuals for -903GA and -975GC variants.

Gene expression analysis
Total RNA was extracted from primary breast tumors of 183 patients, including 10 cases carrying a promoter variant. Of them, 151 were collected as a part of the unselected series and 32 patients belonged to additional familial breast cancer cases. The samples were processed and hybridized to Illumina HumanHT-12 v3 Expression BeadChips (Illumina Inc, San Diego, CA, USA) containing 24,660 Entrez Gene entities, according to the manufacturer's recommendations (http://www.illumina.com).
Microarray raw data were imported into R v2.11 (http://cran.r-project.org) and processed by the methods included in the BioConductor facilities [40]. Briefly, after quality control [41], the data were normalized using the quantile method [42] and the gene expression matrix from the tumors of 10 PTEN promoter variant carriers and 10 wild type controls matched with tumor histology, estrogen and progesterone receptor status, HER2 overexpression/amplification, grade, tumor size, p53 status and Ki67 expression was obtained by averaging the probes mapping to the same Entrez Gene IDs (http:// www.ncbi.nlm.nih.gov/Entrez). Differential expression between the tumors of promoter variant carriers and non-carriers was assayed by moderated t-test. Genes with P < 0.01 were considered to be significant [43] and further analyzed. Functional annotation was performed on the differentially expressed gene list using the DAVID functional annotation tools (http://david.abcc. ncifcrf.gov/) [44]. The categories with Fisher's exact test P-value < 0.05 were considered to be significantly enriched.
The 160 differentially expressed genes were further used to cluster the patients of the whole data set and three publicly available breast cancer gene expression series into two groups by an unsupervised clustering method. The K-means algorithm was iterated 100,000 times to ensure maximum reliability and the results were stabilized by imposing a pre-defined random number selection algorithm at the beginning of the process.

Statistical analyses
Statistical analyses were performed using SPSS 15.0 software (SPSS, Inc., Chicago, IL, USA). Pearson's χ 2 test was used to evaluate the association of different genotypes on tumor characteristics. Fischer's exact test was applied when the expected number of cell count was less than five. All P-values were two-sided. The effects of the variants on the prognosis of the patients were analyzed using Kaplan-Maier survival plots with Log-Rank test. The survival analyses were performed on long-term breast cancer specific 10-year follow-up and on short term combined analysis on breast cancer death or distant metastasis (BDDM) with 5-year follow-up. The survival hazard ratios were calculated using univariate Cox's regression analysis. Independence of the variants in relation to common prognostic factors (tumor size, nodal status, primary metastasis, estrogen receptor, progesterone receptor, Her2, p53, Ki67, grade) was evaluated by constructing Cox's multivariate models with SPSS 15.0 backward conditional algorithm. All variables were set as categorical. For increased statistical power, the three promoter variants were combined for the multivariate analysis.

Variant discovery
Two previously known polymorphic variants (-1026CA/ rs34149102 and -903GA/rs1044322) and one novel variant (-975GC) were observed in the screening of the 330 familial breast cancer patients. The promoter variant numbering as nucleotides from translation start site is matched with the numbering used by Zhou et al. [12]. In the complete data set of 1,870 unselected and 542 additional familial breast cancer cases the -1026CA genotype was present in 2%, -975GC in 1%, and -903GA in 3% of the patients. Two patients were also found with both -1026CA and -903GA, and one patient with -903GA and -975GC variants.
PTEN promoter variants associate with markers of aggressive disease and survival of breast cancer patients To evaluate the possible associations of the three recurrent PTEN promoter variants detected in the patient samples (-903GA, -975GC, and -1026CA) with the tumor phenotype, we calculated the correlations between the PTEN promoter variant status and tumor characteristics (tumor size, lymph node and distant metastasis at diagnosis, tumor histology, grade, estrogen and progesterone receptor status, HER2 over-expression, p53 status, and Ki67 proliferation marker expression) (Additional file 1). The -975GC variant was found to associate with a significantly higher frequency with distant metastasis at diagnosis (OR 4.99, 95% CI 1.69 to 14.78; P (Fischer) = 0.013) and the -1026CA variant with high expression of Ki67 (OR 2.21; 95% CI 1.15 to 4.28; P = 0.015).
The effects of the variants on disease outcome were evaluated for 10-year breast cancer specific survival and 5-year BDDM free survival by Kaplan-Meier estimates with LogRank test and Cox's regression analyses. All three promoter variants were found to be significantly associated with worse long term survival and variants -903GA and -975GC also with short term breast cancer death or distant metastasis free survival (Figure 1). Patients with -903GA, -975GC or -1026CA variant had 10-year breast cancer specific survival rates of 71% (P = 0.016), 57% (P = 0.002), and 65% (P = 0.014) respectively, compared to survival rate of 83% of non-carriers ( Figure 1A). In the five-year analysis, the cumulative survival for breast cancer specific death or distant metastasis was 64% for -903GA carriers (P = 0.002), 64% for -975GC carriers (P = 0.010), and 77% for -1026CA carriers (P = 0.279) compared to 83% for the non-carriers, with the difference for the first two variants being statistically significant ( Figure 1B). The univariate Cox's regression analyses concurred with the results of the Kaplan-Meier analysis ( Table 1), showing that harboring any of the three variants affected the 10year breast cancer specific survival with a hazard ratio of 2.17 (P = 0.00002) and the 5-year BDDM with a hazard ratio of 1.97 (P = 0.00011). The multivariate Cox's regression models adjusted with conventional prognostic markers demonstrate that carrying any of the three promoter variants is an independent prognostic factor, with approximately two-fold increased risk of death or distant metastasis for the carriers ( Table 2).

Patients with PTEN promoter variants develop breast tumors with differential gene expression signature
We used microarray expression analysis to compare gene expression patterns in breast tumors from 10 patients with -903GA, -975GC, or -1026CA PTEN promoter variant and 10 matched non-carriers. A total of 160 genes were found to be differentially expressed (Additional file 2) with P < 0.01. No genes were significant after post hoc correction, hence we used a nominal P-value of 0.01 as the threshold of significance. Of the 160 genes, 104 genes were over-expressed and 56 under-expressed in the tumors of variant carriers. The differential expression between the subjects of two groups spanned between 1.81 and -1.85 base 2 log fold changes (3.5/-3.6 in natural scale). The samples successfully clustered using the expression profiles of the 160 differentially expressed genes as visualized in Figure 2, suggesting that this gene-set can be considered a signature for PTEN promoter variant carriers. The gene expression profiles of the tumors of different variant carriers co-segregated within the same branches of the hierarchical cluster. There was no segregation when the whole expression matrix (24,660 genes) was used in hierarchical clustering analysis. Differences in the expression levels of the PTEN gene could not be detected in microarray analysis between promoter variant carriers and non-carriers, possibly due to the limitations of the technology to detected small differences. The differentially expressed genes were further annotated for their biological themes using DAVID functional annotation tools (http://david.abcc.ncifcrf.gov/) [44]. The genes up-regulated in the tumors of PTEN promoter variant carriers represented several biological families related to ATP-binding, protein phosphorylation and protein kinases. Likewise, functional groups of DNA-binding and transcription factor proteins were over-represented among the down-regulated genes (Additional file 3). We compared the 160 differentially expressed genes with the gene expression signature of PTEN deficient tumors defined previously by Saal et al. [30]. The entities in the Saal signature list could be mapped to 151 unique Entrez Gene IDs. Of these, only one (TUBB2C, EG. 10383) was also found among the differentially expressed genes reported here. On the functional level, however, the Saal genes represented, among others, biological themes overlapping with those found here, such as phosphoprotein and ATP-binding from the up-regulated genes as well as the DNA-binding and transcription regulation from the down-regulated gene list.
The gene signature of the PTEN promoter variant carrier tumors stratifies patients into two groups with different recurrence free survival We investigated the effects of the 160 signature genes on breast cancer survival and recurrence in a larger data set of 183 breast tumors (Helsinki data set) and in three independent publicly available breast cancer gene expression data sets with survival information from Sweden (GSE1456 [45] and GSE4922 [46]) and from the Netherlands (GSE2034 [47]). For all the data sets analyzed, the expression patterns for the signature genes were retrieved from the larger expression matrix and unsupervised segregative clustering (k-means) was used to assign the samples to two groups, which were then compared by log-rank test. The survival curves were visualized in Kaplan-Meier plots. The two groups of patients defined according to the expression of the 160 genes had distinct BDDM survival in the Helsinki data set ( Figure 3A) (P = 2.699 × 10 -6 ). The survival effect was further confirmed in two independent data sets ( Figure 3B, C), and while the trend of the effect could be seen also in the third data set, the difference did not reach a statistically significant level ( Figure 3D). We further applied the same approach to see if the gene expression signature of somatic loss of PTEN expression defined by Saal et al. [30] would have similar survival effects. The Saal signature also divided patients into two survival groups in all data sets with LogRank P-values in the Helsinki set P = 1.06 × 10 -6 , in the Stockholm set P = 0.003, in the Rotterdam set P = 0.001, and in the Uppsala set P = 0.002. In these data sets, 93%, 74%, 69% and 76% of the tumors, respectively, clustered in corresponding survival groups as those in the analysis by our PTEN promoter variant signature.

Discussion
Here we have investigated germline genetic variation in the promoter region of the PTEN gene for breast cancer patient survival, with further gene expression analysis of breast tumors from the variant carriers. The -903GA, -975GC and -1026CA promoter variants were found to predict poor survival of breast cancer patients. In multivariate analysis adjusted for conventional prognostic factors, carrying any of the three promoter variants was an independent predictor of poor prognosis and approximately doubled the risk of the patients for distant metastasis or death within the follow-up time. One of the variants, -975GC, was also found to associate with having distant metastasis already at the time of diagnosis, and the -1026CA was found to be associated with increased proliferation of the tumor cells. These results suggest that the PTEN promoter variation has an effect on increased metastatic potential and progression of the tumor.
All the variants detected lie on the promoter region on nucleotides evolutionarily conserved in higher mammals (Additional file 4) and may affect the PTEN expression, although other functional mechanisms of the variants cannot be excluded at this phase. The PTEN gene expression levels in the variant carrier tumors were comparable to those in the non-carriers, possibly due to the limitations of the technology to detect small differences. This is consistent with the recent findings showing that even a subtle decrease in PTEN dose increased cancer susceptibility in mice, affected the cellular proliferation particularly in mouse mammary tissue, and altered the expression of proliferation related genes [31]. Indeed, the -1026CA variant associated significantly with proliferation rate of the tumors. Moreover, considering the role of PTEN as a dosage dependent tumor suppressor [48], even subtle changes in the expression of the PTEN gene might play a role in the very early stages of tumorigenesis. However, further studies will be needed to determine the functional effects of the variants.
The gene expression analysis showed 160 genes being differentially expressed in the tumors of the variant carriers compared to non-carrier tumors, with similar and indistinguishable patterns in the tumors of the patients with -903GA, -975GC or -1026CA variant. The up-regulated genes annotated in large proportion to entities related to ATP or nucleoside binding and phosphorylation while DNA binding and transcriptional functions were common among down-regulated genes (Supplementary Table S4). The most strongly down-regulated gene in the signature was interestingly BAMBI (BMP and Activin Membrane-Bound Inhibitor) (Supplementary Table S3). High expression of BAMBI has been shown to predict metastatic potential in colorectal cancer [49] and it is epigenetically silenced in high grade bladder carcinomas [50]. However, the signature represents a complex network of downstream effects of the PTEN promoter variants on tumor progression, with an overall effect on tumor phenotype and patient survival.
In the comparison of the differentially expressed genes found in this study and the somatic PTEN expression signature (by PTEN protein loss) in the study by Saal et al. [30] little overlap was seen at the single gene level. This was not surprising considering the profound differences in the microarray technologies used in the two studies, such as probe lengths and densities, as well as the laboratory procedures and analysis protocols, which together can make the comparison of the expression levels of individual genes in different studies challenging [51]. Furthermore, the changes in the expression of the PTEN gene caused by the inherited germ line variation could predispose the carriers already in early tumorigenesis to specific tumor progression pathways, whereas the Saal signature represents effects of somatic loss of PTEN expression at some later stage of tumor progression. However, the Saal gene list and the signature described here were overall functionally overlapping, representing similar gene functions. The survival analysis of the 160-gene signature in the Helsinki expression data set of 183 breast tumors revealed a correlation with the disease recurrence. This effect was also confirmed in independent, publicly available breast cancer gene expression data sets. The goal of this analysis was to evaluate the potential of the signature genes in identifying groups of patients with different survival rates. In this analysis, some of the signature genes can be either up-or down-regulated in different patients. Hence, rather than evaluating the differential expression as such, the overall combinatorial effect of the signature genes needs to be evaluated. To achieve this, we applied the approach successfully used by Lukes et al. [52], in which the patients are divided into two groups by clustering analysis based on the overall expression of the signature genes. The survival differences between the two groups were then evaluated. Our results show that the expression signature of 160 genes associates with breast cancer prognosis in independent breast tumor gene expression data sets. These results emphasize the biological importance of the signature genes and their impact on breast cancer progression and show that the signature has a similar effect on breast cancer recurrence as the promoter variants. Indeed, the metastasis susceptibility effect by the germline variants is likely mediated by their downstream effects on the signature genes. Furthermore, when the gene signature of somatically aberrant PTEN expression by Saal et al. was used for the survival analysis, we found the clustering of the tumors into corresponding survival groups in each data set was highly correlated between the two signatures. This highlights the A B C D Figure 3 Survival differences by the expression of the signature genes in independent breast cancer data sets. Expression patterns of the 160 signature genes affect the five-year breast cancer recurrence in Helsinki breast cancer data set GSE24450 (A) as well as in other publicly available breast cancer gene expression data sets from Stockholm GSE1456 (B) and from Rotterdam GSE2034 (C). Similar trend, although not significant was present also in Uppsala GSE4922 data set (D).
biological similarity of the two signatures and overall supports the inherited promoter variant's effect on the PTEN mediated tumor suppression.

Conclusions
The PTEN gene and the AKT/PI3K pathway are among the most intensively studied targets in cancer research. So far, no associations of germline genetic variation in PTEN have been shown to exist with clinically relevant features outside hamartomatous polyposis syndromes. The genetic analyses together with gene expression analysis in this study suggest that inherited genetic variation in the PTEN promoter region affect the metastatic potential and tumor progression as well as gene expression profile in breast cancer, also with clinical implications for reduced survival of breast cancer patients. These findings also further strengthen the proposed role of germline variation on gene expression signatures and on metastatic potential of the tumors [52]. Further studies are warranted to establish the PTEN promoter variants as clinical markers for prognosis in breast cancer.

Additional material
Additional file 1: Table S1. Associations of the PTEN promoter variants on tumor characteristics.
Additional file 2: Table S2. A list of 160 differentially expressed genes in tumors of PTEN promoter variant carriers and matched non-carriers.
Additional file 3: Table S3. Functional annotations for up and down regulated genes in the tumors of PTEN promoter variant carriers and non carriers.
Additional file 4: Figure S1. Evolutional conservation of the PTEN promoter variants in higher mammals.