Since the end of 2019, research on Covid-19 has been carried out at high speed and has generated international cooperation in the form of co-authored scientific articles. China, the hotbed of the epidemic, is the main producer of research on the new coronavirus, but nearly 80 countries are involved as of 20 April 2020. This note reviews the geography of research on Covid-19 and its evolution between 23 March and 20 April 2020.
Foreword on Scientific Production
The rapid spread of Covid-19 and the need to find solutions to curb the epidemic and deal with its consequences explain why we are witnessing an accelerated mobilization of scientists to publish their results. These publications take the traditional form of articles in peer-reviewed journals and the more direct form (as they do not pass through the filter of peer-review) of preprints on open archives (in particular bioRxiv, medRxiv and ChemRxiv).
From the point of view of the dissemination of knowledge, the academic biomedical publishing sector, dominated by private corporations with impressive sales margins (see the revenues of Elsevier or Springer), hastened to provide open access to academic literature on the subject of coronavirus (example at Elsevier: https://www.elsevier.com/connect/coronavirus-initiatives). Exceptionally, the knowledge produced by researchers from all over the world is presented as a public good and its open access as indispensable for a global and accelerated circulation of knowledge.
Hoping that these initiatives will not remain exceptional or confined to the crisis we are going through, Vincent Larivière, Fei Shu and Cassidy R. Sugimoto call for the opening “without delay” of the academic literature (Larivière, Shu & Sugimoto, February 2020).
Of course, this race for a vaccine and publications is not without suspicion of fraud and scientific error, as evidenced by the number of retracted articles and preprints (see the Retraction Watch tracking site) and its share of scientific controversy. The Raoult case in the field of virology has received sufficient media (and political) coverage to bear witness to this (Science at the time of the Coronavirus, Gingras, 2020). Whether relying on the literature published in peer-reviewed journals or on publications available in open archives, one should therefore take a cautious look at the currently available data on Covid-19 related research.
Tracking progress on Covid-19 research
With these precautions in mind, the web application NETSCITY, designed for researchers, scientific information specialists and science journalists to process data from major bibliographic databases, provides an interesting and quick overview of the origin of the first publications on Covid-19.
This application, currently under development, is already available in beta version at: https://www.irit.fr/netscity. It has been jointly developed by three CNRS research units: UMR Géographie-cités (Paris), UMR LISST and UMR IRIT (Toulouse) with the support of the NETSCIENCE group of the SMS LABEX.
In the context of the crisis we are going through, this application can help answer the following questions:
- Where do the scientific articles indexed with “COVID-19”, “2019-nCoV” or “SARS-CoV-2” come from, since December 2019?
- Does this geography reflect the geography of the epidemic or are there specificities characteristic of the traditional geography of the field of virology, with a special effort of the areas where the historical laboratories of this field are located?
- What can be said about scientific cooperation in the form of co-authorship of publications? Despite the epidemic and the closing of borders, are we seeing the emergence of connections between researchers located in different cities, countries and continents?
A first analysis, carried out using Web of Science data on three dates in 2020 (March 23rd, April 6th and April 20th), highlights the pre-eminence of publications from China and the progressive growth of publications from other areas, including the sub-Saharan area.
Here are the details of the data collected, followed by some graphical representations extracted from NETSCITY.
On March 23rd 2020, 197 publications were available in the WoS (SCI-EXPANDED, CPCI-S, ESCI), including 70 peer-reviewed articles, 65 editorials, 35 letters, 17 reviews, 9 news items, and 1 correction. By way of comparison, for similar queries, Alexei Lutay found the following day: 386 publications in Scopus, 1262 in Semantic Scholar and 1766 in Dimensions. Given that the number of journals covered by the WoS remains more limited, these differences do not seem surprising (Lutay, March 2020). From a thematic point of view, the main fields covered by these publications are general medicine, virology, infectious disease, immunology, microbiology, medical imaging and tropical medicine. The medical journal Lancet shows the highest number of publications to date (Table 1). Let us recall here that many weeks or months can elapse between the acceptance of a paper, its inclusion in a journal issue, and the indexing of the journal's table of contents.
On April 6, 2020, the same query in the Web of Science returned 442 publications (more than twice as many as 15 days earlier). These included 146 articles, 137 editorials, 79 letters, 41 news items, 34 reviews, and 5 corrections. The fields of pediatrics, biology and intensive care are more prominent. The field of tropical medicine is becoming more marginal. The main contributions remain in general medicine, infectious diseases and virology. At this date, the British Medical Journal (BMJ) has surpassed the Lancet in number of publications. The first three Web of Science journals publishing on the subject are now the BMJ, the Lancet and the Journal of Medical Virology (Table 2).
On April 20, 2020, there are 1095 publications (2.5 times more than 15 days earlier). These include 346 editorials, 334 articles, 180 news briefs, 127 letters, 92 reviews, and 16 corrections. Contributions in the field of public and environmental health, as well as in anesthesiology, immunology and oncology, have increased. Behind the same four leading journals, we note the journal Cureus, which had only 3 publications at the previous date. Created in 2009, it is distinctive in being open access and in applying the principle of crowdsourcing in its evaluation process. This means that evaluation is open and the evaluators’ remarks are public. It is interesting to see this journal, innovative from the point of view of scientific publishing, at the forefront in this context of urgency (Table 3). Together with the simultaneous development of open archive repositories, this confirms the appeal of new methods of knowledge dissemination and production.
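The totals and growth factors quoted for the three dates can be rechecked from the document-type counts given in the text. The Python sketch below is only an arithmetic check: the counts are those reported above, while the function and variable names are illustrative and are not part of NETSCITY or the Web of Science.

```python
# Document-type counts reported in the text for each Web of Science snapshot.
snapshots = {
    "2020-03-23": {"articles": 70, "editorials": 65, "letters": 35,
                   "reviews": 17, "news": 9, "corrections": 1},
    "2020-04-06": {"articles": 146, "editorials": 137, "letters": 79,
                   "news": 41, "reviews": 34, "corrections": 5},
    "2020-04-20": {"editorials": 346, "articles": 334, "news": 180,
                   "letters": 127, "reviews": 92, "corrections": 16},
}


def totals(snaps):
    """Total number of publications per snapshot date."""
    return {date: sum(counts.values()) for date, counts in snaps.items()}


def growth_factors(snaps):
    """Ratio of each snapshot total to the previous one, in date order."""
    dates = sorted(snaps)  # ISO dates sort chronologically
    t = totals(snaps)
    return {later: t[later] / t[earlier]
            for earlier, later in zip(dates, dates[1:])}


def article_share(snaps):
    """Share of peer-reviewed articles among all publications, per date."""
    return {date: counts["articles"] / sum(counts.values())
            for date, counts in snaps.items()}


if __name__ == "__main__":
    for date, total in totals(snapshots).items():
        print(date, total)
    for date, factor in growth_factors(snapshots).items():
        print(date, f"x{factor:.2f}")
    for date, share in article_share(snapshots).items():
        print(date, f"{share:.1%} articles")
```

The totals match the 197, 442 and 1095 publications reported above, and the share of peer-reviewed articles can be followed alongside the growth of editorials and news items.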
Now let’s move to geography!
The geography of Covid-19 research
March 23, 2020
As of March 23, the publications that can be geographically referenced (177 out of 197) come from 39 different countries (Table 4).
The top 5 countries produced 69% of all publications on the subject. In descending order of production, these countries were China, the United States, the United Kingdom, South Korea and Switzerland. They were closely followed by Italy, Germany and France (Map 1).
Production comes from 159 separate urban areas. The top 55 urban areas contributed nearly 80% of production (Table 5).
Thanks to NETSCITY, the data is normalised so that when a publication comes from several different agglomerations, each one receives an equal fraction of the publication (1/n when n agglomerations participate). To produce these statistics, the urban level considered is that of the agglomeration, in the sense that we have grouped together the central city and its suburbs (see the methodology explained here). The main reporting urban areas are Wuhan, Beijing, Hong Kong, Guangzhou and Seoul. The primacy of the city of Wuhan and the fact that the top 5 cities are Asian suggest that the geography of research in this area is directly linked to that of the epidemic (Map 2). These agglomerations are followed by London, which, at this date, is not the European city most affected by the epidemic. It should therefore be seen as having a special place in the scientific fields concerned and as home to many scientific journals (to date, half of London’s publications are editorials).
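The normalisation described above is a form of fractional counting. Here is a minimal Python sketch of the principle, assuming each publication has already been reduced to a list of agglomeration names. The sample data is hypothetical and this is not NETSCITY's actual code; in particular, counting a repeated agglomeration only once per publication is an assumption of this sketch.

```python
from collections import defaultdict


def fractional_counts(publications):
    """Give each of the n distinct agglomerations of a publication 1/n.

    `publications` is an iterable of lists of agglomeration names.
    Publications without any geolocatable address are skipped.
    """
    scores = defaultdict(float)
    for agglomerations in publications:
        distinct = set(agglomerations)  # assumption: one count per place
        if not distinct:
            continue
        share = 1.0 / len(distinct)
        for place in distinct:
            scores[place] += share
    return dict(scores)


# Hypothetical example, not taken from the Web of Science data.
example = [
    ["Wuhan"],                                 # Wuhan gets 1
    ["Wuhan", "Beijing"],                      # 1/2 each
    ["Wuhan", "Beijing", "London", "London"],  # 1/3 each
    [],                                        # no usable address
]

if __name__ == "__main__":
    for place, score in sorted(fractional_counts(example).items()):
        print(f"{place}: {score:.3f}")
```

With this scheme, the scores of all agglomerations sum to the number of geolocated publications, so that adding co-authors from more cities does not inflate the world total.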
Of the 177 publications, 96 were signed from at least two different agglomerations and 10 were affiliated to more than 6 agglomerations. This density of co-publications makes it possible to focus on networks of cooperation between places. At the country level, the main collaborative links are between China and the rest of the world: United States, Canada, Australia, Germany, United Kingdom, Belgium, France. Italian scientists have collaborated more specifically with the United States and Brazil (Graph 1).
At the interurban level, sub-national collaborations are predominant (in China: Wuhan-Beijing and Wuhan-Shanghai links; in France: Paris-Bordeaux link, these being the cities of the first French coronavirus patients; in Korea: Seoul-Taejon/Daejeon link). Beyond these, there is repeated international cooperation between Philadelphia and Guangzhou; Bangkok and Singapore; Taipei and Wuhan; Rome and Rio; Atlanta and Riyadh; New Haven and Sydney; Copenhagen and Porto; Paris and Wuhan; and between Geneva and Shanghai (Graph 2).
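The collaboration links discussed above can be thought of as pairs of places that appear together on the same publication. The following Python sketch counts such pairs, assuming each publication is given as a list of agglomeration names. The sample data is hypothetical, and counting each pair once per publication (full counting of links) is an assumption of this sketch rather than a description of how NETSCITY weights links.

```python
from collections import Counter
from itertools import combinations


def co_publication_links(publications):
    """Count, for every pair of places, the publications they co-sign.

    Pairs are stored as alphabetically sorted tuples so that
    (A, B) and (B, A) refer to the same link.
    """
    links = Counter()
    for places in publications:
        for a, b in combinations(sorted(set(places)), 2):
            links[(a, b)] += 1
    return links


# Hypothetical example, not taken from the Web of Science data.
example_links = [
    ["Wuhan", "Beijing"],
    ["Beijing", "Wuhan", "Shanghai"],
    ["Paris", "Bordeaux"],
    ["Seoul"],  # single-place publication: contributes no link
]

if __name__ == "__main__":
    for (a, b), n in co_publication_links(example_links).most_common():
        print(f"{a} - {b}: {n}")
```

The same function works at the country level if country names are passed instead of agglomeration names.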
April 6, 2020
As of 6 April, the publications that can be geographically referenced (381 out of 442) come from 57 different countries (Table 6).
The top 5 countries now account for 66% of all publications on the subject, indicating that production is less concentrated than 15 days earlier. The top three countries remain China, the United States, and the United Kingdom. On the other hand, South Korea and Switzerland are overtaken by Italy, the European country most affected by the epidemic, and Germany (Map 3).
Production comes from 262 separate urban areas (that’s one hundred more than 15 days earlier!). The top 54 urban areas contributed nearly 70% of the production, also indicating a deconcentration shift in production between cities (Table 7).
The main reporting urban areas are Wuhan, Beijing, Shanghai, Hong Kong, and Guangzhou. London and Singapore are ahead of Seoul, which was the fifth most publishing city 15 days earlier (Map 4). Tokyo’s normalized number of publications has increased from 1 to 5, propelling the Japanese metropolitan area among the 10 most publishing cities on the subject. A few urban spaces stand out in the southern hemisphere, whose dynamics will be interesting to follow in the coming weeks, especially Singapore, Melbourne and Sydney. Riyadh, Tehran and Beirut in the Middle East are also active, no doubt influenced by the importance of the epidemic in Iran.
Of the 381 publications, 187 were signed from at least two different agglomerations and 20 were signed from more than 6 agglomerations. At the country level, the main collaborative links remain between China and the rest of the world. The United Kingdom is developing cooperation with the United States and Singapore. India (Pune in particular) is connected to China and Thailand. Tanzania is integrated into the global scientific network through one co-publication with South Africa. Similarly, Lebanon is connected to the network through Iran (Graph 3).
At the inter-city level, sub-national collaborations remain important, especially between Chinese cities. In addition to those recorded 15 days earlier, there is privileged cooperation between Atlanta and Seattle in the United States, as well as between Sapporo, Naha and Tokyo in Japan. In addition, there is a very large number of new international cooperations. Links between Toronto and Xian, London and Singapore, Ann Arbor and Shanghai are proving important (Graph 4).
April 20, 2020
On April 20, 2020, out of the 1095 publications retrieved from the Web of Science, 886 contain addresses (professional affiliations) allowing the geographical location of their authors. The remaining 209 publications are mainly news briefs, editorials and letters.
The 886 publications retrievable as of 20 April come from 77 different countries (Table 8).
While publications from China doubled between 6 and 20 April, those from the United States and Italy tripled and those from Iran increased fivefold. The number of contributions from the United Kingdom, Switzerland and Germany also more than doubled over the period (Map 5). On the other hand, despite the importance of the epidemic in Spain, the participation of this country in scientific production related to the disease remains very low.
The top 5 producing countries are the same as 15 days earlier and again account for 66% of the total. In the top 10, Singapore is overtaken by Switzerland, South Korea, Iran (which joins the top 10) and Canada. Japan leaves the top 10 and finds itself in 12th place, behind Australia. The slower increase in production in Asian countries seems to confirm a shift in the centre of gravity of research.
Even more than a diffusion of the theme to new countries, we observe a multiplication of the number of urban areas involved. The number of urban areas involved has risen from 262 to 456. The first 54 urban areas now account for only 62% of the total output, indicating the rapid continuation of the spatial deconcentration movement previously identified (Table 9).
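The deconcentration described above is measured by the share of total output held by the most productive urban areas. As a rough Python sketch, assuming a dictionary of fractional publication counts per urban area (the values below are hypothetical, not those of Table 9), this share can be computed as follows.

```python
def top_k_share(scores, k):
    """Share of total output held by the k highest-scoring places."""
    total = sum(scores.values())
    if total == 0:
        return 0.0
    top = sorted(scores.values(), reverse=True)[:k]
    return sum(top) / total


# Hypothetical fractional counts, for illustration only.
example_scores = {"A": 5.0, "B": 3.0, "C": 1.5, "D": 0.5}

if __name__ == "__main__":
    for k in (1, 2, 3):
        print(f"top {k}: {top_k_share(example_scores, k):.0%}")
```

A falling value for a fixed k between two dates, as reported here for the first 54 urban areas, indicates that output is spreading over more places.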
London confirms its position as the leading publishing city on the subject in Europe by joining the world’s top 5 instead of Guangzhou. Rome and Taipa (an island located opposite the Macao peninsula in China) saw their contributions triple and are now among the top 10 publishing cities, to the detriment of Chengdu and Tokyo. Tehran records an important jump from 34th to 11th place (Map 6).
Of the 886 publications, almost half (431) were signed from at least two agglomerations and almost 50 were affiliated to more than 6 agglomerations. At the country level, the main collaborative links remain between China and the rest of the world. The United States’ network with the rest of the world is becoming significantly denser and the link between India and Thailand is growing stronger (Graph 5).
Sub-national collaborations remain important, particularly in China, where there are over 100 co-publications between Beijing and Wuhan. In the United Kingdom, London-Sheffield and London-Bristol collaborations are developing. Internationally, there is privileged cooperation between Sao Paulo and Gothenburg (24 co-publications) and the development of repeated exchanges between Tehran (Iran) and Suleimani (Iraq). The United States’ main link with China is via Wenzhou, while researchers in Melbourne find their main Chinese collaborators in the city of Shanghai (Graph 6).
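Taken together, the counts reported for the three dates make it possible to follow two simple indicators: the share of retrieved publications that can be geolocated, and the share of geolocated publications signed from at least two agglomerations. The Python sketch below only restates figures given in the text; the variable and function names are illustrative.

```python
# (retrieved, geolocated, signed from >= 2 agglomerations), from the text.
counts = {
    "2020-03-23": (197, 177, 96),
    "2020-04-06": (442, 381, 187),
    "2020-04-20": (1095, 886, 431),
}


def indicators(data):
    """Return, per date, (geolocated share, multi-agglomeration share)."""
    result = {}
    for date, (retrieved, geolocated, multi) in data.items():
        result[date] = (geolocated / retrieved, multi / geolocated)
    return result


if __name__ == "__main__":
    for date, (geo, multi) in indicators(counts).items():
        print(f"{date}: {geo:.1%} geolocated, "
              f"{multi:.1%} from two or more agglomerations")
```

Both shares decline slightly over the period, which is consistent with the growing number of editorials and news briefs published without addresses.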
Understanding this geography
It may come as a surprise that we are dealing with such a dense network of cooperation at a time when the question of research is only just emerging and when we are in a situation where opportunities for exchange are weakened by the closure of borders.
To better understand what we are observing, it would be useful to differentiate between the different types of publications considered and to conduct interviews with the researchers involved. For example, we can imagine that cooperation with China has proved essential both for the medical management of the crisis and for knowledge of the virus, since Chinese scientists rapidly sequenced the genome, followed by those at the Pasteur Institute in Paris (Lemke, January 2020). The laboratories had to coordinate, share their results, schedule clinical trials and exchange biological specimens. This is the case with the Doherty Institute in Melbourne, which announced at the end of January 2020 that it had succeeded in replicating the virus in the laboratory (University of Melbourne, January 2020).
In addition to the accelerated exchanges justified by the urgency of the crisis, we need to take into account the pre-established exchanges between laboratories and researchers who belong to pre-existing scientific communities and who had already worked together. One can think of the community of specialists in coronaviruses, a particular type of virus that Professor Bruno Canard (Aix-Marseille University) has been studying since the early 2000s (Sauvons l’Université, March 2020). Thus, within the ICTV (International Committee on Taxonomy of Viruses), there is the Coronaviridae Study Group, with a majority of American, German and Dutch members.
The role of historical virology laboratories, namely the Pasteur Institutes in Paris, Hanoi and Dakar and the Robert Koch Institute in Germany, in monitoring the spread of the virus and in the search for vaccines is also worth mentioning. To learn more about the history of these two scientists and the institutes that took their names, see the book and documentary of the same name, Pasteur and Koch: a duel of giants in the world of microbes. Finally, we note the coordinating role played by the STAG-IH (Strategic and Technical Advisory Group for Infectious Hazards), a committee of experts set up in 2005 at the time of the Ebola epidemic, which provides reports and advice for the World Health Organization.
This justifies that initiatives to make the scientific literature associated with the epidemic available should include literature that predates December 2019. The knowledge base needed to make progress in this field is not limited to publications published since the emergence of the new coronavirus.
For those interested in delving further into these questions, several corpora have been made available to researchers in recent months:
- The COVID-19 open research database (CORD-19), a free resource of more than 44,000 scientific articles, made available by the Allen Institute for AI and its partners. A sub-part of this corpus has been geographically analysed and is available as an online preprint (Dousset & Mothe, 2020). In addition, the Neural Covidex project (University of Waterloo and NYU) provides automated means to explore this corpus.
- All publications with the keyword “coronavirus” from January 2000 to March 2020 available on the PubMed database (6560 documents). These publications are being searched to extract semantic relationships using Gargantext software (ISCPIF, 2020). For further analyses of this type, the first results of Chaomei Chen can be followed using CiteSpace software (Chen, 2020).
- The publication database specially set up by the World Health Organization on COVID-19, which as of April 12, 2020 includes: 5014 articles including 170 from the BMJ, 120 from the journal Nature and 112 from the Lancet (WHO, 2020).
- The open archives, including a database of 1555 preprints deposited on medRxiv and bioRxiv relating solely to the new coronavirus (MedRxiv, 2020). For a review of the number of contributions related to the new coronavirus in open archives, see the analyses by Nicholas Fraser and Bianca Kramer (Fraser and Kramer, 2020). In the United States, the Harvard library is fast-tracking the deposit of Covid-19 research into DASH. In France, the HAL open archive also offers facilitated access to publications related to the epidemic that have been deposited there (Magron, 2020).
- A review of the open access literature from several databases (Dimensions, Scopus etc.) by a team of scientists from Bandung Institute of Technology in Indonesia (Irawan et al., 2020). A complementary analysis also taking into account the rate of evolution of content on the Web of Science and Scopus is also available (Torres-Salinas, 2020).
- The covid-nma database fed by the Cochrane Institute, INSERM and APHP, which currently includes 275 clinical trials. It has been the subject of an initial analysis including mapping (Vuillemot et al., 2020).
- The list provided by the World Health Organization of current vaccine development programmes (Covid-19 candidate vaccines, 2020). This list has just been the subject of an analysis published in Nature Reviews Drug Discovery (Thanh Le et al., 2020). This work indicates that the majority of initiatives are currently being driven by private North American companies.
- The platform for data mining of scientific publications, research projects and patents on Coronaviruses and Covid-19 by the European Commission in cooperation with TIM Analytics (Knowledge for Policy, 2020).
- The Bibliovid scientific monitoring initiative set up by a collective from the Grenoble Hospital Center with the help of a lung specialist from the Marseille Hospital Center. This platform makes it possible to browse the scientific literature on Covid classified according to 5 main types: prognostic, epidemiological, therapeutic, diagnostic and recommendations.
- Easy access to metadata from Crossref (Kemp, 2020).
- The leak on Reddit of nearly 5,000 documents made available by a hacker wishing to facilitate access to scientific literature that is usually paid for, to researchers in all countries, including sub-Saharan African countries (Freethink, 2020).
Finally, while this contribution has focused solely on biomedical research by restricting the query to the Web of Science’s Science and Technology databases and excluding the Human and Social Sciences indexes, this does not mean that coronavirus research is limited to the fields of medicine and biology. Indeed, the epidemic affects all parts of our society, in terms of the response of public services and epidemiology as well as in its economic, social and environmental aspects. The contribution of the Human and Social Sciences is particularly important in this context, as evidenced by the number of specialists who have been invited by the media in recent weeks to address the lockdown issue. Specific research coordination initiatives are currently being launched to facilitate exchanges between biomedical research and research in the human and social sciences, particularly in the field of epidemiology. In France, one can think of the actions of the CARE committee, as well as research pooling initiatives such as CovidFight.
This note and the results presented were obtained using the NETSCITY application. This web application applies the methodology developed as part of a research program on the geography of science that began in 2010. It allows the rapid processing of large volumes of bibliographic data, the geographic location of publications, the aggregation of data at the level of comparable urban areas, and the building of networks of places between cities and between countries on a global scale.
This web application, still under development (feedback is welcome), is available online at https://www.irit.fr/netscity.
The development team includes Laurent Jégou, geographer and geomatician at the UMR LISST in Toulouse, Guillaume Cabanac, computer scientist and scientometrist at the UMR IRIT in Toulouse and myself, geographer at the UMR Géographie-cités in Paris – Aubervilliers.
Two students from the IUT of computer science in Toulouse also contributed to the web development: Nikita Yakimovich and Nils Bourgon.
A scientific conference paper presented at the International Conference on Science and Technology Indicators in Rome in 2019 situates the application in the context of science data processing applications and explains how to use the web application. To refer to it:
Maisonobe, Marion, Laurent Jégou, Nikita Yakimovich, and Guillaume Cabanac. 2019. ‘NETSCITY: A Geospatial Application to Analyse and Map World Scale Production and Collaboration Data between Cities’. In ISSI’19: 17th International Conference on Scientometrics and Informetrics. Rome.
Laurent Jégou and Guillaume Cabanac contributed to this note.