- Research article
- Open Access
Breast cancer research output, 1945-2008: a bibliometric and density-equalizing analysis
Breast Cancer Research volume 12, Article number: R108 (2010)
Breast cancer is the most common form of cancer among women, with an estimated 194,280 new cases diagnosed in the United States in 2009 alone. The primary aim of this work was to provide an in-depth evaluation of research yield in breast cancer from 1945 to 2008, using large-scale data analysis, the employment of bibliometric indicators of production and quality, and density-equalizing mapping.
Data were retrieved from the Web of Science (WOS) Science Citation Index Expanded database, which was searched by combining different terms related to breast cancer, including "breast cancer", "mammary ductal carcinoma" and "breast tumour", with the Boolean operator 'OR'. Data were then extracted from each file, transferred to Excel charts and visualised as diagrams. Mapping was performed as described by Groneberg-Kloft et al. in 2008.
A total of 180,126 breast cancer-associated items were produced over the study period; these had been cited 4,136,224 times. The United States returned the greatest level of output (n = 77,101), followed by the UK (n = 18,357) and Germany (n = 12,529). International cooperation peaked in 2008, with 3,127 entries produced through international collaboration; relationships between the United States and other countries formed the basis for the 10 most common forms of bilateral cooperation. Publications from nations with high levels of international cooperation were associated with greater average citation rates. A total of 4,096 journals published at least one item on breast cancer, although the 50 most prolific titles together accounted for over 43% (77,517/180,126) of the total output.
Breast cancer-associated research output continues to increase annually. In an era when bibliometric indicators are increasingly being employed in performance assessment, these findings should provide useful information for those tasked with improving that performance.
In 2009, an estimated 194,280 new cases of breast cancer were diagnosed in the United States; breast cancer was estimated to account for 27% of all new cancer cases and 15% of cancer-related mortality in women [1]. Similarly, in Europe in 2008, the disease was estimated to account for some 28% of new cancer cases and 17% of cancer-related mortality in women [2].
The last 50 years have seen an exponential increase in scientific yield generally, and particularly in oncology; a recent report demonstrated that in January 2009 alone, 11,215 new cancer-related papers and 1,220 review articles were indexed in PubMed [3]. The importance of quantitative and qualitative assessment of scientific output has grown in tandem with this information explosion, and such assessments now play an integral role in decisions regarding grant funding and the prioritisation of resources, as exemplified by the Research Assessment Exercise in the UK [4]. Despite the disease burden described above, relatively little effort has previously been made to understand trends in the breast cancer literature. While the bibliometrics of cancer research in general have received some attention [5, 6], only three publications have evaluated breast cancer output specifically: Dalpe et al. focused on the identification of BRCA1 and BRCA2 in the 1990s [7], Donato and De Oliveira analysed the Portuguese contribution [8], and Li and McCain examined the development of research themes in the radiological detection of breast cancer [9]. The primary aim of the present work was therefore to provide an in-depth evaluation of research yield in breast cancer from 1945 to 2008, using large-scale data analysis, bibliometric indicators of production and quality, and density-equalizing mapping.
Materials and methods
Data were retrieved from the Web of Science (WOS) Science Citation Index Expanded (SCI-Expanded), produced by Thomson Reuters. To approximate the overall number of published items on breast cancer, the following search strategy was employed: TS = ((phyllodes tumo$r$) OR (Cystosarcoma Phyllo$des) OR (Malignant Cystosarcoma Phyllodes) OR (breast invasive ductal carcinoma) OR (infiltrating duct carcinoma$) OR (mammary ductal carcinoma$) OR (breast cancer) OR (breast neoplasm$) OR (breast tumo$r$) OR (human mammary neoplasm$) OR (human mammary carcinoma$)), where TS = Topic search and $ = a wildcard standing for zero or one character. Because this work was designed to assess overall activity in relation to breast cancer, we did not restrict the search to particular document types, such as original articles or reviews, nor did we exclude others, such as letters and editorials. The time span analysed was 1945 to 2008 inclusive. The search was performed in November 2009, and 2009 was therefore excluded, as database entries for that year would not have been complete at the time of the search.
Each item of information downloaded from the WOS was contained in a 'data block'. Each block was preceded by a tag indicating its content (for example, AU = authors, TI = title, PY = publication year). Software developed at the Charité University in Berlin was then used to parse the data: each time it found a tag, it read the associated data and saved it to an Access database, from which the information was later transferred to an Excel database for analysis. Published items were analysed using the citation report method as described previously [10, 11]. The number of citations per year and the average number of citations per item were assessed; the latter, calculated as the sum of the times cited divided by the number of items found, indicates how often, on average, each item in the set has been cited.
Mapping was performed as described by Groneberg-Kloft et al. in 2008 [12]. Each contributing nation was resized according to the variable under study, for example the total number of items it had published on breast cancer or its average number of citations per item, so that the area of each country was scaled in proportion to that variable. The calculations were based on the algorithm published by Gastner and Newman in 2004 [13], which uses a diffusion equation borrowed from elementary physics and solved in the Fourier domain; this allows variable resolution and tracks the boundaries of each country as they move [13, 14].
Cooperation analysis was used to determine bilateral and multilateral cooperation between countries in breast cancer research. A cooperation network was computed by checking all combinations of countries that registered international cooperation on at least 25 items over the study period. These data were saved to a matrix (a two-dimensional table), which the software then read to produce a density-equalising map representing the data graphically. The threshold of 25 items was set to improve the readability of the map.
Journals that had published on breast cancer were analysed with respect to both the Journal Impact Factor (IF) and the recently developed Eigenfactor (EF). The IF is a ratio: its numerator is the number of citations in the current year to items published in the previous two years, and its denominator is the number of substantive articles and reviews published in those same two years [15]. The EF is calculated using a more complex algorithm that takes into account not only the quantity of citations but also their "quality", by weighting each citation according to the influence of the citing source. Full details of the algorithm are available online.
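To illustrate how the `$` wildcard broadens individual search terms, the sketch below translates single terms into Python regular expressions. It assumes that `$` matches zero or one character, as in Web of Science search syntax; the function names `wos_term_to_regex` and `matches_any` are hypothetical, and the sketch does not model how WOS treats multi-word terms, field indexing or stemming, so it approximates rather than reproduces the actual query.

```python
import re

def wos_term_to_regex(term):
    """Translate one search term containing '$' wildcards into a regex.

    Assumption: '$' matches zero or one character (Web of Science syntax).
    Matching is case-insensitive and restricted to whole words.
    """
    pieces = []
    for ch in term:
        if ch == "$":
            pieces.append(".?")           # zero or one arbitrary character
        else:
            pieces.append(re.escape(ch))  # everything else is matched literally
    return re.compile(r"\b" + "".join(pieces) + r"\b", re.IGNORECASE)

def matches_any(text, terms):
    """Boolean OR across terms: True if any single term matches the text."""
    return any(wos_term_to_regex(t).search(text) for t in terms)

terms = ["tumo$r$", "neoplasm$", "phyllo$des", "carcinoma$"]
for word in ["tumour", "tumors", "tumoral", "phylloides", "carcinomatosis"]:
    print(word, matches_any(word, terms))
```

Under this reading, `tumo$r$` captures both British and American spellings and their plurals, while longer derived words such as "tumoral" or "carcinomatosis" fall outside the pattern.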
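A minimal sketch of this tag-based parsing step, and of the average-citations-per-item calculation, is given below in Python. It is not the Charité software. It assumes the WOS plain-text export layout, in which each line begins with a two-letter tag and a space, continuation lines are indented, `ER` closes a record and `FN`, `VR` and `EF` are file-level lines, and it assumes that the `TC` tag holds the times-cited count. The function names and the sample records are hypothetical.

```python
def parse_wos_records(text):
    """Split a WOS-style tagged export into records of {tag: [values]}.

    Assumed layout: 'XX value' lines, indented continuation lines,
    'ER' closing each record, and file-level 'FN', 'VR', 'EF' lines.
    """
    records, current, tag = [], {}, None
    for line in text.splitlines():
        if not line.strip():
            continue                          # skip blank separator lines
        if line[0] == " ":
            if tag is not None:               # continuation of the previous tag
                current[tag].append(line.strip())
            continue
        tag, value = line[:2], line[3:].strip()
        if tag == "ER":                       # end of one bibliographic item
            records.append(current)
            current, tag = {}, None
        elif tag in ("FN", "VR", "EF"):       # file header/footer, not item data
            tag = None
        else:
            current.setdefault(tag, []).append(value)
    return records

def average_citations_per_item(records):
    """Sum of times cited (TC) divided by the number of items."""
    if not records:
        return 0.0
    total = sum(int(r.get("TC", ["0"])[0]) for r in records)
    return total / len(records)

SAMPLE = """FN Thomson Reuters Web of Science
VR 1.0
PT J
AU Smith, A
   Jones, B
TI A hypothetical breast cancer title
PY 2008
TC 12
ER

PT J
AU Doe, C
TI Another hypothetical title
PY 2007
TC 4
ER

EF
"""

records = parse_wos_records(SAMPLE)
print(len(records), records[0]["AU"], average_citations_per_item(records))
```

Applied to the totals reported later in this article (4,136,224 citations for 180,126 items), the same ratio gives an average of roughly 22.96 citations per item.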
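For readers unfamiliar with the method, the core of Gastner and Newman's diffusion cartogram can be summarised as follows (notation ours, and a summary of the published method as we understand it rather than of the specific implementation used here). The mapped variable is expressed as a density ρ per unit area, that density is allowed to diffuse until it becomes uniform, and every point on the map is carried along by the resulting velocity field v.

```latex
\rho(\mathbf{r},0) = \frac{\text{value of the mapped variable for the country containing } \mathbf{r}}{\text{area of that country}}

\frac{\partial \rho}{\partial t} = \nabla^{2}\rho ,
\qquad
\mathbf{v}(\mathbf{r},t) = -\frac{\nabla \rho(\mathbf{r},t)}{\rho(\mathbf{r},t)} ,
\qquad
\frac{d\mathbf{r}}{dt} = \mathbf{v}\bigl(\mathbf{r}(t),t\bigr)

\hat{\rho}(\mathbf{k},t) = \hat{\rho}(\mathbf{k},0)\, e^{-\lvert\mathbf{k}\rvert^{2} t}
```

The last line is the Fourier-domain solution of the diffusion equation, which is what makes the calculation efficient; the position each point reaches as t grows large, when the density has become uniform, defines the resized map.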
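The counting behind the cooperation analysis can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the software used in the study: each item is represented by a list of the countries in its author addresses (a hypothetical input format), an item is classed as bilateral, trilateral and so on by its number of distinct countries, and the threshold is applied to country pairs. The function names and example data are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cooperation_counts(items, threshold=25):
    """Count international cooperation from per-item country lists.

    Returns (pair_counts, size_counts, network):
      pair_counts -- items co-authored by each unordered country pair
      size_counts -- items by number of distinct countries (2 = bilateral, ...)
      network     -- pairs whose count reaches the threshold
    """
    pair_counts, size_counts = Counter(), Counter()
    for countries in items:
        distinct = sorted(set(countries))
        if len(distinct) < 2:
            continue                      # single-country item: no cooperation
        size_counts[len(distinct)] += 1
        for pair in combinations(distinct, 2):
            pair_counts[pair] += 1
    network = {p: n for p, n in pair_counts.items() if n >= threshold}
    return pair_counts, size_counts, network

def to_matrix(pair_counts):
    """Symmetric two-dimensional table: matrix[a][b] == matrix[b][a]."""
    countries = sorted({c for pair in pair_counts for c in pair})
    matrix = {a: {b: 0 for b in countries} for a in countries}
    for (a, b), n in pair_counts.items():
        matrix[a][b] = matrix[b][a] = n
    return matrix

# Hypothetical items; a threshold of 2 stands in for the study's 25.
example = [["USA", "Canada"], ["USA", "Canada", "UK"], ["Canada", "USA"],
           ["Germany"], ["USA", "UK"]]
pairs, sizes, network = cooperation_counts(example, threshold=2)
matrix = to_matrix(pairs)
print(network)
```

Keeping the full pair counts separate from the thresholded network mirrors the distinction in the text: all cooperation is counted, but only the stronger links are drawn, which keeps the map readable.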
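The two indicators can be contrasted with a short Python sketch. `impact_factor` follows the two-year definition given above. `influence_scores` is only a simplified PageRank-style stand-in for the Eigenfactor, included to show what "weighting each citation by the influence of the citing source" means: it assumes a citation matrix with self-citations removed and uses uniform random restarts with a damping factor of 0.85, whereas the published Eigenfactor algorithm differs in several details (for example, its citation window and how restarts are weighted). All names and numbers are hypothetical.

```python
def impact_factor(citations_to_prev_two_years, citable_items_prev_two_years):
    """Two-year IF: citations this year to items from the previous two years,
    divided by the citable items (articles and reviews) from those years."""
    return citations_to_prev_two_years / citable_items_prev_two_years

def influence_scores(cites, alpha=0.85, iterations=200):
    """Simplified PageRank-style journal weights (NOT the exact Eigenfactor).

    cites[i][j] = citations from journal j to journal i, self-citations zeroed.
    A citation counts for more when the citing journal is itself influential.
    """
    n = len(cites)
    given = [sum(cites[i][j] for i in range(n)) for j in range(n)]  # per citing journal
    weights = [1.0 / n] * n
    for _ in range(iterations):
        # journals that cite nothing spread their weight evenly
        dangling = sum(weights[j] for j in range(n) if given[j] == 0)
        weights = [
            alpha * (sum(cites[i][j] / given[j] * weights[j]
                         for j in range(n) if given[j]) + dangling / n)
            + (1 - alpha) / n
            for i in range(n)
        ]
    total = sum(weights)
    return [w / total for w in weights]

print(impact_factor(500, 100))  # 5.0 (hypothetical counts)

# Journal 0 is cited heavily by journals 1 and 2.
scores = influence_scores([[0, 5, 5],
                           [1, 0, 1],
                           [1, 1, 0]])
print(scores)
```

The IF treats every citation equally, whereas the eigenvector-style score lets a citation from a heavily cited journal contribute more weight than one from a rarely cited journal.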
Total number of published items
The number of published items on breast cancer was employed as an index of research productivity. During the period 1945 to 2008 (1974 excluded, n = 352), a total of 180,126 items were produced on this topic, as catalogued in the WOS. The earliest studies catalogued were published in 1945 (n = 17), although it was 1990 before activity began to increase considerably, year on year (Figure 1); output more than doubled from 1990 (n = 1,436) to 1992 (n = 3,342). The greatest output for any year was that for 2008 (n = 17,413).
Total number of citations
The 180,126 indexed items have been cited 4,136,224 times since 1945. Figure 1 shows that the number of citations rose in parallel with the number of published items. Articles published in 2001 were responsible for more citations than those published in any other year (n = 274,601). The average number of citations per item was greatest in 1957, however, when 40 items were responsible for 2,767 citations, returning an average of 69.01 citations per item published. There has been a downward trend in the average number of citations per item since the millennium.
Country of origin
A total of 155 different countries contributed to the literature on breast cancer over the study period. The United States was responsible for the greatest output, returning 77,101 items. Other high-output countries included the United Kingdom (n = 18,357), Germany (n = 12,529), Italy (n = 10,828) and Japan (n = 10,109) (Table 1). Density-equalising mapping of this dataset demonstrates that a relatively small number of countries was responsible for the majority of the output (Figure 2). The Gambia had the highest average citation rate per item (67.67), followed by Kenya (40.69) and Costa Rica (39.53) (Table 1). When confined to those countries which had produced at least 30 items, however, those with the highest average citation rate per item were Iceland (56.62), Finland (35.48), Denmark (32.88) and Switzerland (31.85) (Figure 3).
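The effect of the 30-item cut-off can be illustrated with a small Python sketch: an average based on a handful of items can top an unrestricted ranking, and a minimum-output filter removes such cases. The function name, country labels and counts are hypothetical and are not taken from Table 1.

```python
def rank_by_mean_citations(stats, min_items=1):
    """Rank countries by citations per item, ignoring those below min_items.

    stats maps a country label to (items, citations).
    """
    means = {
        country: citations / items
        for country, (items, citations) in stats.items()
        if items >= min_items
    }
    return sorted(means.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical figures: X has very few items, Y and Z have many.
stats = {"Country X": (3, 200), "Country Y": (200, 11000), "Country Z": (5000, 110000)}
print(rank_by_mean_citations(stats)[0][0])                # Country X
print(rank_by_mean_citations(stats, min_items=30)[0][0])  # Country Y
```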
Cooperation analysis was employed to assess bilateral and multilateral cooperation from 1973 to 2008; the first item in the dataset produced as a result of international cooperation was published in 1973. In total, 142 different countries had collaborated on at least one item published. International cooperation increased steadily through the study period, reaching a peak in 2008, with 3,127 entries produced as a result of cooperation. Bilateral cooperation was the most common form of cooperation (19,437 entries), followed by trilateral cooperation (n = 3,157) and quadrilateral cooperation (n = 836). Cooperation between the United States and Canada was the most common form of bilateral cooperation (n = 2,223), followed by that between the United States and the United Kingdom (n = 2,007) (Figure 4). Relationships between the United States and other countries formed the basis for the 10 most common forms of bilateral cooperation (Table 2).
A total of 4,096 journals had published at least one item on breast cancer. The journals which have published most prolifically on breast cancer, led by Cancer Research (5,290 items), are listed in Table 3. The 50 most prolific titles, representing just 1.2% of all contributing journals, together accounted for over 43% (77,517/180,126) of the total output. Thirty of these 50 titles were in the category 'Oncology' of the Journal Citation Reports (JCR); other represented subject categories included 'Surgery' (n = 5), 'Pathology' (n = 4), 'Radiology, Nuclear Medicine and Medical Imaging' (n = 4), 'Biochemistry and Molecular Biology' (n = 3) and 'Medicine, General and Internal' (n = 3). The median IF and EF of these titles were 4.73 and 0.05, respectively. Cancer Research also recorded the highest number of citations overall (n = 309,568), followed by the Journal of Clinical Oncology (n = 177,189), Cancer (n = 166,834), the Journal of the National Cancer Institute (JNCI) (n = 131,637) and the British Journal of Cancer (n = 110,307) (Table 3).
In his seminal work on the exponential growth of science, Little Science, Big Science, Price noted in 1963 that all of the scientific periodicals founded since the first, the Journal des Sçavans (first published in 1665), had together produced a world total of six million scientific papers over the preceding 300 years [17]. By contrast, Druss and Marcus showed that in just 23 years, from 1978 to 2001, a total of 8.1 million articles were published in Medline [18]. The present analysis demonstrates this growth in breast cancer research specifically, with an average increase in output of 15% per year since 1945 and an increase of more than 100% since the millennium alone. This compares with a recent analysis of total scientific output in PubMed, which estimated an average growth rate of 4% per year between 1957 and 2007.
This analysis has used the citation count as a proxy measure of research quality. Citations form an essential component of the dialogue of medical research [19] and are regarded as a key indicator of the relevance and importance of a published item. We have shown that citation counts increased in parallel with the number of breast cancer-related articles, an unsurprising finding that has recently been mirrored in analyses of scientific output on scoliosis [20] and asthma [10]. The average number of citations per item was highest in 1957, largely owing to the citation classic by Bloom and Richardson, which outlined their system for the histological grading of breast cancer and its association with prognosis [21]; it has since been cited 2,259 times. To put this figure into perspective, Garfield noted in 2006 that of 38 million items cited from 1900 to 2005, only 0.5% were cited more than 200 times [15]. Although the average number of citations per item has tended to decrease since the mid-1990s, it is difficult to draw firm conclusions from this finding; it may be explained by the sharp increase in the number of outputs in the intervening years, or by the time lag inherent in citation analysis, which biases counts towards older publications.
This analysis has demonstrated the leading role that the United States plays in breast cancer research, a finding previously noted in other scientific disciplines [22, 23]. This is not surprising given the enormous sums spent on the management of breast cancer there each year: the new cases of breast cancer diagnosed worldwide in 2009 alone were estimated to cost $28 billion, of which $16 billion was spent in the United States [24]. In addition to being the single largest contributor to the literature on breast cancer, the United States has also played a key role in fostering international cooperation, particularly with its neighbour Canada, but also with many European nations, including Germany, the United Kingdom and Italy.
The large number of nations involved in breast cancer research reflects the disease's global burden. That said, the map of global production shown in Figure 2 clearly demonstrates the marked underrepresentation of South America, Africa and, to a lesser extent, Asia. Given that most of the predicted 26% increase in the incidence of breast cancer by 2020 will occur in the developing world [24], a concerted effort is needed to involve these regions further in future research initiatives, focusing particularly on how cost-effective diagnosis and management of breast cancer can be delivered with efficacy similar to that currently achieved in Europe and the United States.
The quality of breast cancer output from both the United States and the United Kingdom was high, as measured by the average citation rate per published item. Many smaller countries, including Iceland, Finland, Switzerland and Denmark, also produced high-quality output, all four being associated with impressive average citation rates. Interestingly, all of these countries collaborated internationally on a high proportion of their output (Figure 4) (Iceland 110/216, 50.93%; Finland 1,045/2,334, 44.77%; Switzerland 1,741/2,989, 58.25%; Denmark 1,050/2,377, 44.17%), suggesting that such cooperation may improve the quality, and hence the citation rate, of the resulting output.
Our finding that breast cancer research has been published in over 4,000 journals reinforces the view that it is now impossible for those working in the field to appraise all of the relevant literature. Our work has, however, identified a core set of journals publishing on breast cancer, with the 50 most prolific accounting for 43% of the total output. The median IF and EF of these titles (4.73 and 0.05) compare favourably with the median values for all 143 journals in the JCR category 'Oncology' in 2008 (2.66 and 0.01, respectively), which points to the quality of output in this subject area.
There are a number of limitations to this work. Output from 1974 (n = 352, 0.2% of total output) was inadvertently excluded during data collection and hence was not included in the subsequent analysis. In addition, this study has focused on entries contained in the Web of Science only, and the use of other databases, including PubMed and Scopus, may have yielded slightly different results. That said, Web of Science covers the oldest publications, with archived records back to 1900 [25], and should provide an accurate overview of output over the entire study period. Finally, while we have provided an overview of geographic output on breast cancer, we have not related our findings to underlying socio-economic and demographic variables; this would clearly be an interesting avenue for future investigation.
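The concentration figures above are simple ratios; a small Python sketch makes the calculation explicit. `top_k_share` is a hypothetical helper that ranks per-journal item counts and returns the fraction contributed by the k most prolific titles. The example counts are hypothetical, and the final line applies the plain ratios to the published totals.

```python
def top_k_share(counts, k):
    """Fraction of all items contributed by the k most prolific journals."""
    ranked = sorted(counts, reverse=True)
    return sum(ranked[:k]) / sum(ranked)

# Hypothetical per-journal counts: the top 2 of 4 journals hold 75% of items.
print(top_k_share([10, 5, 3, 2], k=2))  # 0.75

# The ratios reported above, from the published totals.
print(f"{77_517 / 180_126:.1%} of items from {50 / 4_096:.1%} of journals")
```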
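The article does not state how the 15% average annual increase was derived, and the two common definitions of an average growth rate give different answers. The Python sketch below, with hypothetical function names, computes the compound annual growth rate implied by the 1945 and 2008 counts reported in the Results (17 and 17,413 items), about 11.6% per year, alongside the arithmetic mean of year-on-year changes. Because the arithmetic mean of the yearly growth factors is never smaller than their geometric mean, the mean-of-changes definition always gives a figure at least as large as the compound rate.

```python
def compound_annual_growth(start, end, years):
    """Constant yearly rate that turns `start` into `end` over `years` years."""
    return (end / start) ** (1 / years) - 1

def mean_annual_change(counts):
    """Arithmetic mean of the year-on-year proportional changes in a series."""
    changes = [later / earlier - 1 for earlier, later in zip(counts, counts[1:])]
    return sum(changes) / len(changes)

# Endpoints reported in the Results: 17 items in 1945, 17,413 in 2008.
print(f"{compound_annual_growth(17, 17_413, 2008 - 1945):.1%}")  # 11.6%

# Hypothetical series: output doubles, then halves.
print(mean_annual_change([100, 200, 100]))     # 0.25
print(compound_annual_growth(100, 100, 2))     # 0.0
```

The hypothetical series shows the gap in its starkest form: output ends where it started, so the compound rate is zero, yet the average of the two yearly changes is +25%.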
This work represents the first bibliometric assessment of research quantity and quality in breast cancer-associated literature. The results have demonstrated the ongoing expansion of that literature, while also identifying the key nations and journals involved in its production over the past half-century. In an era when bibliometric indicators are increasingly being employed in the assessment of individual, institutional and national performance, these findings should provide useful information for those tasked with improving that performance.
Abbreviations
IF: Journal Impact Factor
WOS: Web of Science
Jemal A, Siegel R, Ward E, Hao Y, Xu J, Thun MJ: Cancer statistics, 2009. CA Cancer J Clin. 2009, 59: 225-249. 10.3322/caac.20006.
Ferlay J, Parkin DM, Steliarova-Foucher E: Estimates of cancer incidence and mortality in Europe in 2008. Eur J Cancer. 2010, 46: 765-781. 10.1016/j.ejca.2009.12.014.
Michon F, Tummers M: The dynamic interest in topics within the biomedical scientific community. PLoS One. 2009, 4: e6544. 10.1371/journal.pone.0006544.
Hannaford P: Assessing the quality of primary care research in the United Kingdom: the 2008 Research Assessment Exercise. Ann Fam Med. 2009, 7: 277-278. 10.1370/afm.1009.
Ugolini D, Mela GS: Oncological research overview in the European Union. A 5-year survey. European Journal of Cancer. 2003, 39: 1888-1894. 10.1016/S0959-8049(03)00431-3.
Grossi F, Belvedere O, Rosso R: Geography of clinical cancer research publications from 1995 to 1999. European Journal of Cancer. 2003, 39: 106-111. 10.1016/S0959-8049(02)00239-3.
Dalpe R, Bouchard L, Houle AJ, Bedard L: Watching the race to find the breast cancer genes. Sci Technol Human Values. 2003, 28: 187-216. 10.1177/0162243902250904.
Donato HM, De Oliveira CF: [Breast pathology: evaluation of the Portuguese scientific activity based on bibliometric indicators]. Acta Med Port. 2006, 19: 225-234.
Li G, McCain KW: Visualizing research themes in radiological applications for breast cancer detection, diagnosis and treatment. AMIA Annu Symp Proc. 2008, 1023.
Borger JA, Neye N, Scutaru C, Kreiter C, Puk C, Fischer TC, Groneberg-Kloft B: Models of asthma: density-equalizing mapping and output benchmarking. J Occup Med Toxicol. 2008, 3: S7. 10.1186/1745-6673-3-S1-S7.
Groneberg-Kloft B, Dinh QT, Scutaru C, Welte T, Fischer A, Chung KF, Quarcoo D: Cough as a symptom and a disease entity: scientometric analysis and density-equalizing calculations. J Investig Allergol Clin Immunol. 2009, 19: 266-275.
Groneberg-Kloft B, Scutaru C, Kreiter C, Kolzow S, Fischer A, Quarcoo D: Institutional operating figures in basic and applied sciences: scientometric analysis of quantitative output benchmarking. Health Res Policy Syst. 2008, 6: 6. 10.1186/1478-4505-6-6.
Gastner MT, Newman ME: Diffusion-based method for producing density-equalizing maps. Proc Natl Acad Sci USA. 2004, 101: 7499-7504. 10.1073/pnas.0400280101.
A perfect distortion? Cartograms deserve more attention. [http://www.geoplace.com/]
Garfield E: The history and meaning of the journal impact factor. JAMA. 2006, 295: 90-93. 10.1001/jama.295.1.90.
Price D: Little Science, Big Science. 1965, Columbia University Press
Druss BG, Marcus SC: Growth and decentralization of the medical literature: implications for evidence-based medicine. Journal of the Medical Library Association. 2005, 93: 499-501.
Kulkarni AV, Busse JW, Shams I: Characteristics associated with citation rate of the medical literature. PLoS One. 2007, 2: e403. 10.1371/journal.pone.0000403.
Vitzthum K, Mache S, Quarcoo D, Scutaru C, Groneberg DA, Schoffel N: Scoliosis: density-equalizing mapping and scientometric analysis. Scoliosis. 2009, 4: 15. 10.1186/1748-7161-4-15.
Bloom HJG, Richardson WW: Histological grading and prognosis in breast cancer - a study of 1409 cases of which 359 have been followed for 15 years. British Journal of Cancer. 1957, 11: 359.
van Rossum M, Bosker BH, Pierik E, Verheyen C: Geographic origin of publications in surgical journals. British Journal of Surgery. 2007, 94: 244-247. 10.1002/bjs.5571.
Klar M, Foldi M, Denschlag D, Stickeler E, Gitsch G: Estimates of global research productivity in gynecologic oncology. Int J Gynecol Cancer. 2009, 19: 489-493. 10.1111/IGC.0b013e3181a40561.
Breast cancer in developing countries. Lancet. 2009, 374: 1567. 10.1016/S0140-6736(09)61930-9.
Falagas ME, Pitsouni EI, Malietzis GA, Pappas G: Comparison of PubMed, Scopus, Web of Science, and Google Scholar: strengths and weaknesses. FASEB Journal. 2008, 22: 338-342. 10.1096/fj.07-9492LSF.
The authors declare that they have no competing interests.
All four authors have made substantial contributions to the conception and design, acquisition, analysis and interpretation of the data in this study. All have also been involved in drafting the manuscript or revising it critically for important intellectual content and all have given final approval of the version to be published.
About this article
Cite this article
Glynn, R.W., Scutaru, C., Kerin, M.J. et al. Breast cancer research output, 1945-2008: a bibliometric and density-equalizing analysis. Breast Cancer Res 12, R108 (2010). https://doi.org/10.1186/bcr2795
- Breast Cancer
- Citation Count
- Scientific Output
- Bibliometric Indicator
- Breast Cancer Research