Breast cancer research output, 1945-2008: a bibliometric and density-equalizing analysis
© Glynn et al.; licensee BioMed Central Ltd. 2010
Received: 12 August 2010
Accepted: 22 December 2010
Published: 22 December 2010
Breast cancer is the most common form of cancer among women, with an estimated 194,280 new cases diagnosed in the United States in 2009 alone. The primary aim of this work was to provide an in-depth evaluation of research yield in breast cancer from 1945 to 2008, using large-scale data analysis, the employment of bibliometric indicators of production and quality, and density-equalizing mapping.
Data were retrieved from the Web of Science (WOS) Science Citation Index Expanded database; this was searched using the Boolean operator 'OR' with different terms related to breast cancer, including "breast cancer", "mammary ductal carcinoma" and "breast tumour". Data were then extracted from each file, transferred to Excel charts and visualised as diagrams. Mapping was performed as described by Groneberg-Kloft et al. in 2008.
A total of 180,126 breast cancer-associated items were produced over the study period; these had been cited 4,136,224 times. The United States returned the greatest level of output (n = 77,101), followed by the UK (n = 18,357) and Germany (n = 12,529). International cooperation peaked in 2008, with 3,127 entries produced as a result; relationships between the United States and other countries formed the basis for the 10 most common forms of bilateral cooperation. Publications from nations with high levels of international cooperation were associated with greater average citation rates. A total of 4,096 journals published at least one item on breast cancer, although the top 50 most prolific titles together accounted for over 43% (77,517/180,126) of the total output.
Breast cancer-associated research output continues to increase annually. In an era when bibliometric indicators are increasingly being employed in performance assessment, these findings should provide useful information for those tasked with improving that performance.
In 2009, an estimated 194,280 new cases of breast cancer were diagnosed in the United States; breast cancer was estimated to account for 27% of all new cancer cases and 15% of cancer-related mortality in women. Similarly, in Europe in 2008, the disease was reckoned to account for some 28% and 17% of new cancer cases and cancer-related mortality in women, respectively.
The last 50 years have seen an exponential increase in scientific yield generally, and particularly in oncology; a recent report demonstrated that in January of 2009 alone there were 11,215 new cancer-related papers and 1,220 review articles indexed in PubMed. The importance of quantitative and qualitative assessment of scientific output has increased in tandem with this information explosion, and these assessments now play an integral role in decisions regarding grant funding and prioritisation of resources, as exemplified by the Research Assessment Exercise in the UK. Despite the disease burden of breast cancer, relatively little effort has previously been made to understand the trends emanating from the breast cancer-associated literature. While there has been some concentration on the bibliometrics of cancer research generally [5, 6], just three publications have evaluated breast-related output specifically: Dalpe et al. focused on the identification of BRCA1 and BRCA2 in the 1990s, Donato et al. published an analysis of the Portuguese contribution, and Li and McCain focused specifically on the development of research themes in the radiological detection of breast cancer. The primary aim of this present work was thus to provide an in-depth evaluation of research yield in breast cancer from 1945 to 2008, using large-scale data analysis, the employment of bibliometric indicators of production and quality, and density-equalizing mapping.
Materials and methods
Data were retrieved from the Web of Science (WOS) Science Citation Index Expanded database (SCI-Expanded) produced by Thomson Reuters. In order to approximate the overall number of published items on breast cancer, the following search strategy was employed: TS = ((phyllodes tumo$r$) OR (Cystosarcoma Phyllo$des) OR (Malignant Cystosarcoma Phyllodes) OR (breast invasive ductal carcinoma) OR (infiltrating duct carcinoma$) OR (mammary ductal carcinoma$) OR (breast cancer) OR (breast neoplasm$) OR (breast tumo$r$) OR (human mammary neoplasm$) OR (human mammary carcinoma$)), where TS = topic search and $ = a wildcard for zero or one character. Because this work was designed to assess overall activity in relation to breast cancer, we did not refine our search to include some document types such as original articles or reviews, or to exclude others such as letters and editorials. The time span analysed was 1945 to 2008 inclusive. The search was performed in November 2009, and thus 2009 was excluded as database entries for this period would not have been complete at the time of the search.
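As an illustration only, the topic query above can be assembled from a plain list of terms. The sketch below uses Python, which is an assumption of this example rather than a tool named in the study; `build_topic_query` is a hypothetical helper, and no Web of Science interface is called.

```python
# Search terms copied from the strategy described above.
TERMS = [
    "phyllodes tumo$r$",
    "Cystosarcoma Phyllo$des",
    "Malignant Cystosarcoma Phyllodes",
    "breast invasive ductal carcinoma",
    "infiltrating duct carcinoma$",
    "mammary ductal carcinoma$",
    "breast cancer",
    "breast neoplasm$",
    "breast tumo$r$",
    "human mammary neoplasm$",
    "human mammary carcinoma$",
]


def build_topic_query(terms):
    """Wrap each term in parentheses and join them with OR in one TS clause."""
    joined = " OR ".join(f"({term})" for term in terms)
    return f"TS = ({joined})"


query = build_topic_query(TERMS)
print(query)
```

Keeping the terms in a list makes it straightforward to add or drop a synonym and regenerate the exact string that was searched.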
Each item of information downloaded from the WOS was contained in a 'data block'. Each block was preceded by a tag which gave information about the content of the block (for example, AU = authors, TI = title, PY = publication year). Software developed at the Charité University in Berlin was then employed to parse the data. Each time it found a tag it read the associated data and saved it to an Access database; the information was later transferred to an Excel database for analysis. Published items were analysed using the citation report method as described previously [10, 11]. The number of citations per year and the average number of citations per item were assessed; the latter is the sum of the times cited divided by the number of results found, and indicates the average number of citing articles for all items in the set.
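The tag-by-tag parsing and the average-citations calculation described above can be sketched as follows. This is a minimal Python illustration and not the software used in the study: the sample records are invented, and the layout (a two-letter tag at the start of a line, indented continuation lines, `ER` closing a record, `TC` holding the times-cited count) is assumed to follow the usual WOS plain-text export.

```python
from collections import defaultdict

# Hypothetical two-record export in a WOS-style tagged layout; all values are invented.
SAMPLE = """\
AU Smith, J
   Jones, A
TI An example title
PY 2001
TC 10
ER

AU Brown, K
TI Another example title
PY 2001
TC 4
ER
"""


def parse_records(text):
    """Yield one dict per record, mapping each tag to the list of its lines."""
    record, tag = defaultdict(list), None
    for line in text.splitlines():
        if not line.strip():
            continue                      # blank separator between records
        head = line[:2]
        if head == "ER":                  # end-of-record marker
            yield dict(record)
            record, tag = defaultdict(list), None
        elif head.strip():                # a new tag opens a new data block
            tag = head
            record[tag].append(line[3:].strip())
        elif tag is not None:             # indented line continues the current block
            record[tag].append(line.strip())


records = list(parse_records(SAMPLE))
total_cited = sum(int(r["TC"][0]) for r in records)
average_citations = total_cited / len(records)  # sum of times cited / number of items
print(len(records), average_citations)
```

In practice each parsed record would be written to a database table rather than held in memory, as the text describes.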
Mapping was performed as described by Groneberg-Kloft et al. in 2008. Those nations which had contributed output were resized according to one of a number of different variables under study, for example the average number of citations per item from each country. As part of this resizing procedure, the area of each country was scaled relative to the chosen variable, for example the total number of items it had published on breast cancer. Specific calculations were based on Gastner and Newman's algorithm, published in 2004. These calculations employ a diffusion equation in the Fourier domain borrowed from elementary physics, which allows variable resolution by tracking moving boundaries [13, 14].
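For orientation, the core of the diffusion-based method in the cited Gastner and Newman paper can be summarised by the equations below. This is a summary under the convention of a unit diffusion constant, with the symbols chosen here for illustration: ρ is the density of the mapped variable (for example, published items per unit area), v is the velocity field it induces, and r(t) is the position of a boundary point at time t. It does not describe the specific software settings used in this study.

```latex
\frac{\partial \rho}{\partial t} = \nabla^{2}\rho, \qquad
\mathbf{v}(\mathbf{r},t) = -\frac{\nabla \rho}{\rho}, \qquad
\mathbf{r}(t) = \mathbf{r}(0) + \int_{0}^{t} \mathbf{v}(\mathbf{r},t')\,dt'
```

As t grows the density becomes uniform, and the displaced boundary points give each country an area proportional to its value of the mapped variable.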
Cooperation analysis was employed to determine bilateral and multilateral cooperation between countries on breast cancer research. A cooperation network between countries was computed by checking all combinations of those countries which registered international cooperation on at least 25 items over the study period. These data were then saved to a "matrix" or two-dimensional table; the software read this matrix and produced a density-equalizing map which graphically represented the data. The threshold of 25 items was set to improve readability.
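The pairwise counting and thresholding described above can be sketched in a few lines. This is an illustrative Python example, not the study's software; the function names and the input data are hypothetical, and only the threshold of 25 is taken from the text.

```python
from collections import Counter
from itertools import combinations

THRESHOLD = 25  # minimum number of joint items, as stated in the text


def cooperation_counts(items):
    """Count joint items for every unordered pair of countries."""
    counts = Counter()
    for countries in items:
        # sorted(set(...)) makes each pair unordered and counted once per item
        for pair in combinations(sorted(set(countries)), 2):
            counts[pair] += 1
    return counts


def filter_pairs(counts, threshold=THRESHOLD):
    """Keep only the pairs that cooperated on at least `threshold` items."""
    return {pair: n for pair, n in counts.items() if n >= threshold}


# Hypothetical input: one list of author countries per published item.
items = [["USA", "Canada"]] * 30 + [["USA", "Germany", "UK"]] * 10
counts = cooperation_counts(items)
print(filter_pairs(counts))
```

The `counts` mapping is a sparse form of the two-dimensional table mentioned in the text: each key is a (row, column) pair of countries and each value is the number of joint items.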
Journals which had published on breast cancer were analysed relative to both the Journal Impact Factor (IF) and the recently developed Eigenfactor (EF). The former is based on two elements: the numerator, which is the number of citations in the current year to items published in the previous two years, and the denominator, which is the number of substantive articles and reviews published in the same two years. The EF is calculated based on a complex algorithm that takes into account not only the quantity of citations but also their "quality" by assigning weights to the source of the citations. The full details of the algorithm can be found online.
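The IF definition above reduces to a single ratio, shown here as a short Python sketch. The function name and the example figures are hypothetical and serve only to illustrate the numerator and denominator described in the text.

```python
def impact_factor(citations_to_previous_two_years, citable_items_previous_two_years):
    """Citations this year to items from the previous two years, divided by
    the number of substantive articles and reviews published in those years."""
    if citable_items_previous_two_years <= 0:
        raise ValueError("the denominator must be a positive number of items")
    return citations_to_previous_two_years / citable_items_previous_two_years


# Hypothetical journal: 600 citations in the current year to 200 citable items.
example_if = impact_factor(600, 200)
print(example_if)
```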
Total number of published items
Total number of citations
The 180,126 indexed items have been cited 4,136,224 times since 1945. Figure 1 demonstrates the parallel increase in the number of citations in conjunction with the increase in published items. Articles published in 2001 were responsible for more citations than those published in any other year (n = 274,601). The average number of citations per item was greatest in 1957, however, when 40 items were responsible for 2,767 citations, returning an average of 69.01 citations per item published. There has been a downward trend in the average number of citations per item since the millennium.
Country of origin
[Table: Leading countries by output and average citations per item, 1945-2008. Columns: top countries by output; top countries by average citations per item.]
Top 25 collaborating relationships, breast cancer-related items, 1949-2008
[Table: Leading titles, breast cancer-related items, 1945 to 2008.
Titles listed under "Top 50 journals by output": Breast Cancer Res Tr; Brit J Cancer; J Clin Oncol; Biochem Biophys R Co; Eur J Cancer; Breast Cancer Res; Int J Cancer; J Nucl Med; Clin Cancer Res; J Surg Oncol; Cancer Chemoth Pharm; J Biol Chem; Am J Surg; Am J Pathol; Int J Radiat Oncol; Cancer Epidem Biom; Am J Roentgen; Int J Oncol; Am J Epidemiol.
Titles listed under "Top 15 journals by citation count": Ann Surg Oncol; Eur J Cancer Suppl; J Clin Oncol; Brit J Cancer; N Engl J Med; Int J Cancer; J Steroid Biochem; J Biol Chem; Proc Amer Ass Cancer Re; Brit Med J; Clin Canc Res; Breast Cancer Res Tr; Brit J Surg.]
In his seminal work on the exponential growth of science, Little Science, Big Science, Price noted in 1963 that all of the scientific periodicals founded since the first, the Journal des Sçavans (first published in 1665), had together produced a world total of six million scientific papers over the course of the preceding 300 years. By contrast, Druss demonstrated that in just 23 years, from 1978 to 2001, a total of 8.1 million articles were published in Medline. The results of this present analysis have demonstrated this growth in breast cancer research specifically, with an average 15% increase in output annually since 1945, and a greater than 100% increase since the millennium alone. This compares with a recent analysis of total scientific output from PubMed, which estimated an average growth rate of 4% per year between 1957 and 2007.
This analysis has employed the citation count as a proxy measure of research quality. Forming an essential component in the dialogue of medical research, citations are regarded as a key indicator of the relevance and importance of a published item. We have shown a parallel increase in citation count with the number of breast cancer-related articles, a not unexpected finding recently mirrored in analyses of scientific output on scoliosis and asthma. The average number of citations per item was highest in 1957, although this was thanks largely to the citation classic by Bloom and Richardson in which they outlined their system for the histological grading of breast cancer and its association with prognosis; it has since been cited 2,259 times. To put this figure into perspective, Garfield noted in 2006 that of 38 million items cited from 1900 to 2005, only 0.5% were cited more than 200 times. Although there has been a decreasing trend in the average number of citations per item since the mid-1990s, it is difficult to draw firm conclusions on the relevance of this finding; it may be explained by the sharp increase in the number of outputs in the intervening years, or indeed by the time-lag associated with citation analysis, which results in an inherent bias towards older publications.
This analysis has demonstrated the leading role which the United States plays in breast cancer research, a finding previously noted in other scientific disciplines [22, 23]. This is not surprising given the enormous amount of money spent on the management of breast cancer there annually; it has been estimated that new cases of breast cancer diagnosed globally in 2009 alone will have cost $28 billion, of which $16 billion was spent in the United States. In addition to being the single largest contributor to the literature on breast cancer, the United States has further played a key role in fostering international cooperation, in particular with its neighbour Canada, but also with many European nations, including Germany, the United Kingdom and Italy.
The large number of nations involved in breast cancer research reflects its global burden. That said, the map of global production shown in Figure 2 clearly demonstrates the dramatic underrepresentation of South America, Africa, and to a lesser extent, Asia. Given that the majority of the predicted 26% increase in the incidence of breast cancer by 2020 will occur in the developing world, there needs to be a concerted effort to further involve these areas in future research initiatives, particularly focusing on how the cost-effective diagnosis and management of breast cancer can be delivered with levels of efficacy similar to those presently seen in Europe and the United States.
The quality of breast-related output from both the United States and the United Kingdom was high as measured using the average citation rate per published item as a proxy measure for quality. In addition, the contribution of many smaller countries, including Iceland, Finland, Switzerland and Denmark, was of high quality, with all four associated with impressive average citation rates. Interestingly, all of these countries collaborated internationally in a high proportion of their output (Figure 4) (Iceland 110/216, 50.92%; Finland 1,045/2,334, 44.77%; Switzerland 1,741/2,989, 58.24%; Denmark 1,050/2,377, 44.17%), suggesting perhaps that this form of cooperation results in improved quality, and hence citation rate, of associated output.
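The collaboration shares quoted above follow directly from the item counts. The short Python sketch below recomputes them from the figures in the text; the variable names are hypothetical, and conventional rounding to two decimals can differ in the last digit from a quoted figure that was truncated.

```python
# (internationally co-authored items, total items) as quoted in the text
collaboration = {
    "Iceland": (110, 216),
    "Finland": (1045, 2334),
    "Switzerland": (1741, 2989),
    "Denmark": (1050, 2377),
}

# Share of each country's output produced through international cooperation, in percent
shares = {
    country: 100 * joint / total
    for country, (joint, total) in collaboration.items()
}

for country, share in shares.items():
    print(f"{country}: {share:.2f}%")
```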
Our finding that breast cancer-associated research has been published across over 4,000 journals reiterates the view that it is now impossible for those working in breast cancer to ensure that they appraise all of the relevant literature. Our work has, however, identified a core set of journals publishing on breast cancer, with the top 50 accounting for 43% of the total output. The median IF and EF of these titles compare particularly well with the median values for all 143 journals in the JCR category oncology in 2008 (2.66, 0.01, respectively), and point to the quality of output in this subject area.
There are a number of limitations to this work. Output from 1974 (n = 352, 0.2% of total output) was accidentally excluded during data collection, and hence was not included in the subsequent analysis. In addition, this study has focused on entries contained in the Web of Science only, and it should be noted that the employment of other databases including PubMed and Scopus may have yielded slightly different results. That said, Web of Science covers the oldest publications with archived records back to 1900, and should provide an accurate overview of output over the entire study period. Finally, while we have provided an overview of geographic output on breast cancer, we have not related our findings to underlying socio-economic and demographic variables, and clearly this would be an interesting future avenue for investigation.
This work represents the first bibliometric assessment of research quantity and quality in breast cancer-associated literature. The results have demonstrated the ongoing expansion of that literature, while also identifying the key nations and journals involved in its production over the past half-century. In an era when bibliometric indicators are increasingly being employed in the assessment of individual, institutional and national performance, these findings should provide useful information for those tasked with improving that performance.
IF: Journal Impact Factor
WOS: Web of Science.
- Jemal A, Siegel R, Ward E, Hao Y, Xu J, Thun MJ: Cancer statistics, 2009. CA Cancer J Clin. 2009, 59: 225-249. 10.3322/caac.20006.
- Ferlay J, Parkin DM, Steliarova-Foucher E: Estimates of cancer incidence and mortality in Europe in 2008. Eur J Cancer. 2010, 46: 765-781. 10.1016/j.ejca.2009.12.014.
- Michon F, Tummers M: The dynamic interest in topics within the biomedical scientific community. PLoS One. 2009, 4: e6544. 10.1371/journal.pone.0006544.
- Hannaford P: Assessing the quality of primary care research in the United Kingdom: the 2008 Research Assessment Exercise. Ann Fam Med. 2009, 7: 277-278. 10.1370/afm.1009.
- Ugolini D, Mela GS: Oncological research overview in the European Union. A 5-year survey. European Journal of Cancer. 2003, 39: 1888-1894. 10.1016/S0959-8049(03)00431-3.
- Grossi F, Belvedere O, Rosso R: Geography of clinical cancer research publications from 1995 to 1999. European Journal of Cancer. 2003, 39: 106-111. 10.1016/S0959-8049(02)00239-3.
- Dalpe R, Bouchard L, Houle AJ, Bedard L: Watching the race to find the breast cancer genes. Sci Technol Human Values. 2003, 28: 187-216. 10.1177/0162243902250904.
- Donato HM, De Oliveira CF: [Breast pathology: evaluation of the Portuguese scientific activity based on bibliometric indicators]. Acta Med Port. 2006, 19: 225-234.
- Li G, McCain KW: Visualizing research themes in radiological applications for breast cancer detection, diagnosis and treatment. AMIA Annu Symp Proc. 2008, 1023.
- Borger JA, Neye N, Scutaru C, Kreiter C, Puk C, Fischer TC, Groneberg-Kloft B: Models of asthma: density-equalizing mapping and output benchmarking. J Occup Med Toxicol. 2008, 3: S7. 10.1186/1745-6673-3-S1-S7.
- Groneberg-Kloft B, Dinh QT, Scutaru C, Welte T, Fischer A, Chung KF, Quarcoo D: Cough as a symptom and a disease entity: scientometric analysis and density-equalizing calculations. J Investig Allergol Clin Immunol. 2009, 19: 266-275.
- Groneberg-Kloft B, Scutaru C, Kreiter C, Kolzow S, Fischer A, Quarcoo D: Institutional operating figures in basic and applied sciences: scientometric analysis of quantitative output benchmarking. Health Res Policy Syst. 2008, 6: 6. 10.1186/1478-4505-6-6.
- Gastner MT, Newman ME: From the cover: Diffusion-based method for producing density-equalizing maps. Proc Natl Acad Sci USA. 2004, 101: 7499-7504. 10.1073/pnas.0400280101.
- A perfect distortion? Cartograms deserve more attention. [http://www.geoplace.com/]
- Garfield E: The history and meaning of the journal impact factor. JAMA. 2006, 295: 90-93. 10.1001/jama.295.1.90.
- eigenFACTOR. [http://www.eigenfactor.org/]
- Price D: Little Science Big Science. 1965, Columbia University Press.
- Druss BG, Marcus SC: Growth and decentralization of the medical literature: implications for evidence-based medicine. Journal of the Medical Library Association. 2005, 93: 499-501.
- Kulkarni AV, Busse JW, Shams I: Characteristics associated with citation rate of the medical literature. PLoS One. 2007, 2: e403. 10.1371/journal.pone.0000403.
- Vitzthum K, Mache S, Quarcoo D, Scutaru C, Groneberg DA, Schoffel N: Scoliosis: density-equalizing mapping and scientometric analysis. Scoliosis. 2009, 4: 15. 10.1186/1748-7161-4-15.
- Bloom HJG, Richardson WW: Histological grading and prognosis in breast cancer - a study of 1409 cases of which 359 have been followed for 15 years. British Journal of Cancer. 1957, 11: 359.
- van Rossum M, Bosker BH, Pierik E, Verheyen C: Geographic origin of publications in surgical journals. British Journal of Surgery. 2007, 94: 244-247. 10.1002/bjs.5571.
- Klar M, Foldi M, Denschlag D, Stickeler E, Gitsch G: Estimates of global research productivity in gynecologic oncology. Int J Gynecol Cancer. 2009, 19: 489-493. 10.1111/IGC.0b013e3181a40561.
- Breast cancer in developing countries. Lancet. 2009, 374: 1567. 10.1016/S0140-6736(09)61930-9.
- Falagas ME, Pitsouni EI, Malietzis GA, Pappas G: Comparison of PubMed, Scopus, Web of Science, and Google Scholar: strengths and weaknesses. Faseb Journal. 2008, 22: 338-342. 10.1096/fj.07-9492LSF.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.