Setting Your Cites on Open

April 7, 2017 | Guest Contributor | Categories: Advocacy, Announcement, Community, Data, Debate, Editorial policy, News, Open access, Policy, Publishing

The Initiative for Open Citations (I4OC) was launched on April 6th, 2017. Over the course of about 6 months, the initiative has made a large fraction of the citation data that link all scholarship freely available. Mark Patterson (eLife) and Catriona MacCallum (PLOS) were two of the people involved and below they describe how this initiative started and where it might lead.

Blog post by Mark Patterson (eLife) and Catriona MacCallum (PLOS)

It is enormously satisfying when a good idea captures the imagination and takes off, and that’s precisely what happened with the Initiative for Open Citations (I4OC) over the past 6 months.

Citations are the way that researchers communicate how their work builds on and relates to the work of others and they can be used to trace how a discovery spreads and is used by researchers in different disciplines and countries. Creating a truly comprehensive map of scholarship, however, relies on having a curated machine-readable database of citation information, where the provenance of every citation is clear and reusable. With the launch of I4OC that map, and the potential for anyone to use it to explore the scholarly landscape, comes much closer.

The idea of an open database of citation information is not new. We were both present when David Shotton (one of the people who helped launch I4OC) gave a talk at the OASPA conference back in 2013. David set out his vision for a completely open citation corpus and was already doing everything he could to build and implement this. But what hampered David’s efforts (and some other related efforts) most was lack of access to the data.

The good news was that these data were already being made available by all the major scholarly publishers as part of the metadata they send to Crossref. The bad news was that the data were closed by default.

What was even more frustrating was that many publishers – even open-access ones – didn’t realise they could ask Crossref to make these data openly available for public consumption. At the same time, these publishers were also giving their data freely to services such as the Web of Science™ and Scopus (when they were selected for indexing). They did this because it’s so important for journals to be included in these services to improve the discoverability of their content. Access to these services, however, requires a hefty subscription – in part because the citation harvesting and necessary curation are very costly. David (and others) argued that there was much more to be gained, including by publishers themselves, if the publishers simply made the data freely available to everyone.

Moving quickly forward to the OASPA meeting last year, however, not much had changed. Quite a few open-access publishers, such as Hindawi, Faculty of 1000, Pensoft and Ubiquity Press, had made their data open. Subscription publishers had also expressed support for the idea, with some pioneers, like Rockefeller University Press and the Royal Society, making their reference data open. The default, however, was still to restrict access. At the OASPA meeting in 2016, we heard another passionate and compelling call for open citation data, this time from Dario Taraborelli of the Wikimedia Foundation.

As a result, Dario coordinated a small group of us to come together and make a concerted effort over the following months to see whether we could recruit some influential publishers to make their citation data open. We knew that many publishers were supportive, so it was just a question of making the case and pointing out how easy it would be to do this. Crossref had helpfully explained all this in a blog post last year as well.

The upshot is that it’s worked better than we could have possibly hoped. We approached several publishers and many of them made the decision to open up their reference data very quickly. Others are still in discussion, but our hope is that we have a critical mass of publishers doing this now, and so others will easily be able to follow suit. Six months ago, before the initiative started, only 1% of the citation data hosted by Crossref was publicly available. Today – with the pioneering publishers and the recent agreement of the American Geophysical Union, Association for Computing Machinery, BMJ, Cambridge University Press, Cold Spring Harbor Laboratory Press, EMBO Press, Royal Society of Chemistry, SAGE Publishing, Springer Nature, Taylor & Francis, and Wiley – it’s reached 40% (the full list is available on the I4OC website). That amounts to almost 14 million publications with open references.

Why are publishers doing this now? It’s partly because publishers are beneficiaries, given the potential to increase the discoverability and usage of their content, but also because it is so very easy (a simple email to Crossref does the trick and then citations can be set to open in a matter of a day or two). Another factor is that when the first couple of publishers indicated their intention to do this and gave us permission to tell others, it increased everyone’s confidence. A deadline also provided momentum: we wanted to announce I4OC at a certain time and include as many publishers as possible in that announcement.

The depth of interest in these data is indicated by the stakeholders who have endorsed the initiative, including funders such as the Bill & Melinda Gates Foundation, the Alfred P. Sloan Foundation and the Wellcome Trust, institutions such as the California Digital Library and the Max Planck Digital Library, as well as existing service providers such as Altmetric, Dryad, Figshare, ImpactStory and the Internet Archive. It’s important to note that datasets are also increasingly being cited – they too can become part of the open citation corpus.

These data, once openly available, are completely free to reuse. The impact cannot be overstated. Researchers, funders, librarians, publishers, policy makers or any interested person will be able to explore the data. In an impoverished world where citation data is most infamously used for deriving the impact factor of a journal, fully open data will allow the creation of new, transparent and reproducible methods with which to evaluate and study research. The data will also be subject to independent scrutiny and quality control and should foster competition for new tools and services – to everyone’s benefit.

The initial success of I4OC is down to the small band of collaborators who got it off the ground, the publishers who have rapidly decided to share their citation data openly, and the stakeholders who have backed the strength of the idea. Now let’s set our sights on getting closer to 100% open citation data, and creating a publicly available corpus as described on the I4OC homepage. Although it might take years, we are now well on the way to building and curating a well-structured, open database of literally millions of datapoints that anyone can query, mine, consume and explore.
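
To give a flavour of what querying these data might look like, here is a minimal sketch in Python. It assumes the public Crossref REST API (`https://api.crossref.org/works/<DOI>`), whose JSON response carries a `reference` list for a work when its publisher has opened its references. The function names and the sample record (with made-up DOIs) are ours, for illustration only, and not every reference will carry a DOI.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Public Crossref REST API endpoint for a single work (assumed here).
CROSSREF_WORKS_URL = "https://api.crossref.org/works/"


def extract_cited_dois(record):
    """Return the DOIs cited by one Crossref work record.

    `record` is the parsed JSON for a single work. The "reference" list is
    only present when the references are open, and some references have
    no DOI, so both cases are handled quietly.
    """
    message = record.get("message", {})
    references = message.get("reference", [])
    return [ref["DOI"].lower() for ref in references if ref.get("DOI")]


def fetch_cited_dois(doi):
    """Fetch one work from Crossref and list the DOIs it cites (needs network)."""
    with urlopen(CROSSREF_WORKS_URL + quote(doi)) as response:
        return extract_cited_dois(json.load(response))


# Offline illustration with a hypothetical record (the DOIs are made up).
sample_record = {
    "message": {
        "DOI": "10.1234/example.citing",
        "reference": [
            {"key": "ref1", "DOI": "10.1234/EXAMPLE.A"},
            {"key": "ref2", "unstructured": "A cited book with no DOI."},
            {"key": "ref3", "DOI": "10.1234/example.b"},
        ],
    }
}

cited = extract_cited_dois(sample_record)
print(cited)
```

Calling `fetch_cited_dois` with a real DOI would do the same over the network. Repeating this across many works is, in essence, how one could start to assemble a citation graph from the open data.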

This article is also posted on the eLife Blog

Featured Image: “Co-authorship network map of physicians publishing on hepatitis C” by Andy Lamb