Digital Metadata: CAUL Research Repository Days 2018

The 2018 CAUL Research Repository Days were held in Melbourne over 29-30 October. Although many different topics were discussed, the program focused strongly on interoperability between systems, a trend I have observed across the IR community. With a well-running platform, repository work is less about the 'publication' and more about how systems interact with each other.

Below is a summary of some of the themes that were of particular interest to my institution and me.

CAUL Projects

Review of Australian Repository Infrastructure Project

Much of Day 1 was spent discussing FAIR. Drafted in 2014 and published in 2016, the FAIR principles (Findable, Accessible, Interoperable and Reusable) are a set of 14 metrics designed to determine the level of FAIRness of an output or system. In response, CAUL proposed a project in 2017 to determine how repository infrastructure could be improved across the sector to increase the FAIRness of Australian-funded research outputs. The final report has just been released.

The project was organised around seven working groups, tasked with examining the current repository infrastructure, international repository infrastructure developments, repository user stories, the ideal state for Australian repository infrastructure and next generation repository tools, and with making recommendations for a possible "Research Australia" collection of research outputs. The findings of the first six groups are included in the report, while the seventh group's "Research Australia" recommendations are due at the end of 2018.

Each working group reported its findings. Most were unsurprising, confirming what we have known to be the case for some time. In summary (and in no particular order), they include:

Although nine institutions had next generation repository software, many others had ageing infrastructure that may not have been funded since the ASHER funding in 2007, with VITAL installations particularly noted as declining in number
Ageing software was identified as a weakness of repository infrastructure, as was the lack of automation and identifiers
Research outputs were the most common content in IRs, followed by theses and research data; other content types included archival collections, journals, images and course materials
Institutions were split almost equally between those with an OA policy, those with a statement or partial policy, and those with no OA policy at all
Only five institutions supported research activity identifiers (although RAiDs were not specified in particular)
13 institutions had a digital preservation strategy for their IR content, with a further three developing one
Most successful initiatives have stable, secure funding
A recommendation that CAUL seek consortium membership of COAR
A list of general repository requirements

The seventh group is looking at the feasibility of a "Research Australia" portal as a single entry point to a collection of all Australian research outputs. This is similar to the RUN proposal of some years ago. Views on this were extremely mixed. Concerns raised included duplication of what we already have with Google Scholar and TROVE, whether the portal would provide OA content or metadata only, the quality of metadata, and unique institutional requirements. Three options have been proposed: upgrade TROVE to meet all necessary reporting needs, develop a new portal that harvests repositories (similar to the OpenAIRE model), or develop a shared infrastructure.
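As a rough illustration of the harvesting option, the sketch below assumes that each repository exposes Dublin Core metadata over OAI-PMH, as the OpenAIRE model does. The talk did not specify a protocol. A harvester would request `?verb=ListRecords&metadataPrefix=oai_dc` from each repository's endpoint, then keep following the `resumptionToken` until it comes back empty. The function below parses one such page. The sample response and repository identifier are hypothetical.

```python
import xml.etree.ElementTree as ET

# Standard OAI-PMH and Dublin Core namespace URIs.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def parse_list_records(xml_text):
    """Return (records, resumption_token) for one OAI-PMH ListRecords page."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        oai_id = rec.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        title = rec.findtext(".//dc:title", default="", namespaces=NS)
        records.append({"oai_id": oai_id, "title": title.strip()})
    # An empty or absent resumptionToken means this was the last page.
    token_el = root.find(".//oai:resumptionToken", NS)
    token = token_el.text.strip() if token_el is not None and token_el.text else None
    return records, token


# Hypothetical single-record response from an IR's OAI-PMH endpoint.
SAMPLE_PAGE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repo.example.edu:123</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example thesis</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

records, token = parse_list_records(SAMPLE_PAGE)
```

Keeping the OAI identifier alongside each title lets a central portal trace every harvested record back to its source IR. This is relevant to the metadata quality and institutional requirement concerns that were raised.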

Collecting and Reporting of Article Processing Charges (APCs)

Another CAUL project currently underway aims to determine what article processing charges (APCs) cost institutions. Several options have been proposed. Less preferred options include creating a fund code in the institution's finance system, or querying the finance system using a selection of keywords. Another less preferred option is obtaining reports from publishers, or requiring them to provide this information as part of the subscription agreement. The likely proposal is a very manual method. First, extract a dataset from Web of Science, Scopus and Dimensions, either by institution or nationally. Next, run it against the Unpaywall API to identify which outputs are OA publications and deduplicate on DOI. Finally, use publishers' list prices for APCs to estimate the APC cost, attributing each payment to the corresponding author's institution. A couple of institutions have done this calculation internally, with varying results. In my own use, the Unpaywall API has proved unreliable for identifying OA outputs because it can return false positives. However, it seems to be the most promising tool to date in this respect.
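The likely method can be sketched in Python as follows. This is a hypothetical illustration, not CAUL's specification. Several details are assumptions:

- the record field names (`doi`, `publisher`, `corresponding_institution`);
- the list-price table;
- the choice to count only the `gold` and `hybrid` Unpaywall `oa_status` values as APC-bearing.

The Unpaywall lookup is passed in as a function so the calculation can be tested offline. `unpaywall_lookup` shows the shape of a real call to the Unpaywall v2 API, which expects an email parameter.

```python
import json
import urllib.parse
import urllib.request

DOI_PREFIXES = ("https://doi.org/", "http://doi.org/", "https://dx.doi.org/",
                "http://dx.doi.org/", "doi:")
# Assumption: only these Unpaywall OA statuses imply an APC was paid.
APC_BEARING = {"gold", "hybrid"}


def normalise_doi(doi):
    """Lower-case a DOI and strip URL or 'doi:' prefixes so duplicates match."""
    doi = doi.strip().lower()
    for prefix in DOI_PREFIXES:
        if doi.startswith(prefix):
            return doi[len(prefix):]
    return doi


def unpaywall_lookup(doi, email):
    """Fetch one DOI from the Unpaywall v2 REST API (network call)."""
    url = "https://api.unpaywall.org/v2/{}?email={}".format(
        urllib.parse.quote(doi), urllib.parse.quote(email))
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def estimate_apc_cost(sources, institution, list_prices, lookup):
    """Merge exports, dedupe on DOI, and sum list-price APCs for OA outputs
    whose corresponding author is at `institution`.

    sources: iterable of record lists (e.g. from WoS, Scopus, Dimensions).
    lookup: function mapping a normalised DOI to an Unpaywall-style dict.
    """
    merged = {}
    for records in sources:
        for rec in records:
            merged.setdefault(normalise_doi(rec["doi"]), rec)  # first source wins
    total = 0
    for doi, rec in merged.items():
        if rec.get("corresponding_institution") != institution:
            continue
        info = lookup(doi)
        if info.get("is_oa") and info.get("oa_status") in APC_BEARING:
            total += list_prices.get(rec["publisher"], 0)
    return total


# Hypothetical exports and prices; the first Scopus row duplicates the WoS row.
wos = [{"doi": "10.1000/abc", "publisher": "PubA", "corresponding_institution": "USC"}]
scopus = [
    {"doi": "https://doi.org/10.1000/ABC", "publisher": "PubA", "corresponding_institution": "USC"},
    {"doi": "10.1000/xyz", "publisher": "PubB", "corresponding_institution": "Other"},
]
stub_unpaywall = {"10.1000/abc": {"is_oa": True, "oa_status": "gold"}}
cost = estimate_apc_cost([wos, scopus], "USC", {"PubA": 3000, "PubB": 2500},
                         lambda doi: stub_unpaywall.get(doi, {}))
```

Passing the lookup in as a parameter also makes it easy to cache Unpaywall responses. It also allows substituting a manually checked list where false positives are a concern.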

Retaining Rights to Research Publications

A survey of Australian university IP policies has been undertaken to identify potential barriers to implementing a national licence in Australia similar to the UK Scholarly Communications Licence (UK-SCL). The key element of the UK-SCL is retaining the right to make the accepted manuscript of scholarly articles publicly available for non-commercial use (CC BY-NC 4.0) from the moment of first publication. An embargo of up to 12 months can be requested by either the author or the publisher. However, only 13 Australian universities have an IP policy that would support this licence. As a next step, CAUL recommends approaching Universities Australia to consider the licence and to develop guidelines for aligning IP policies.
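The embargo rule described for the UK-SCL reduces to simple date arithmetic. This is a minimal sketch under two simplifying assumptions that are not terms of the licence. First, the embargo runs in whole calendar months from first publication. Second, requests longer than 12 months are capped rather than rejected.

```python
import calendar
from datetime import date

MAX_EMBARGO_MONTHS = 12  # the UK-SCL allows an embargo of up to 12 months


def add_months(d, months):
    """Add whole calendar months, clamping the day to the target month's length."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def aam_public_date(first_published, requested_embargo_months=0):
    """Date from which the accepted manuscript may be made public (CC BY-NC 4.0).

    Assumption: longer embargo requests are capped at the maximum, not rejected.
    """
    if requested_embargo_months < 0:
        raise ValueError("embargo cannot be negative")
    return add_months(first_published, min(requested_embargo_months, MAX_EMBARGO_MONTHS))
```

Without an embargo, the manuscript is available from the date of first publication.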

Statement on Open Scholarship

A final CAUL project is the Statement on Open Scholarship, a call to action covering advocacy, training, publishing, infrastructure, content acquisition and educational resources. The review period ends at the end of October.

FAIR Data

Natasha Simons, ARDC, reported on an American Geophysical Union (AGU) project designed to enable FAIR data. The project looked at FAIR-aligned repositories and FAIR-aligned publishers. There is a push for repositories, rather than the supplementary sections of journals, to be the home for data. A commitment statement has been produced with a set of criteria that repositories must meet in order to enable FAIR data (USC can meet about half of the requirements with its current infrastructure and policies).

For Australian repositories, the AGU project may influence subsequent projects in other research disciplines. As publishers move away from hosting data in the supplementary sections of journals and towards (largely domain) repositories, trusted repositories are becoming increasingly important. Repositories certified under the Core Trust Seal are one example.

Ginny Barbour, AOASG, proposed a new acronym, "PID+L" (pronounced "piddle"), as the essential minimum metadata required for research outputs to be FAIR:

ORCID
DOI for all outputs
PURL for grants
Licence (machine readable)

Note that we are unable to do this with our current infrastructure.
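A PID+L check could be automated against IR records. The sketch below is hypothetical in three respects. The record fields are illustrative only. The DOI pattern is a loose sanity check rather than a full validator. "Machine readable" is approximated as the licence being given as a URI. The ORCID check uses the ISO 7064 MOD 11-2 algorithm that ORCID documents for the final character of its iDs.

```python
import re

# Loose format checks; record field names below are hypothetical.
ORCID_RE = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")
DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")


def orcid_checksum_ok(orcid):
    """Validate an ORCID iD's final check character (ISO 7064 MOD 11-2)."""
    digits = orcid.replace("-", "")
    total = 0
    for ch in digits[:-1]:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return digits[-1] == expected


def pid_l_gaps(record):
    """Return the PID+L elements a repository record is missing."""
    gaps = []
    orcids = record.get("orcids", [])
    # Flag ORCID if none are present or any present iD is malformed.
    if not orcids or not all(ORCID_RE.match(o) and orcid_checksum_ok(o) for o in orcids):
        gaps.append("ORCID")
    if not DOI_RE.match(record.get("doi", "")):
        gaps.append("DOI")
    if not record.get("grant_purls"):
        gaps.append("PURL for grants")
    # Approximation: treat a licence given as a URI as machine readable.
    if not record.get("licence_uri", "").startswith(("http://", "https://")):
        gaps.append("Licence (machine readable)")
    return gaps
```

Run across a full IR export, the returned gap lists would show which elements of PID+L are most often missing.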

ORCID

Simon Huggard, Chair of the ORCID Advisory Group, provided a snapshot of the ORCID Consortium in Australia. There are 41 organisations in the consortium, with 32 integrations completed (by 29 consortium members). Custom integrations are the most common, followed by integrations through Symplectic, Pure, Converis, IRMA, ScholarOne and VIVO. Seven institutions have completed full ORCID authentication integration, so that researchers can sign in to ORCID using their institutional credentials. There are currently 90K Australian researchers registered with an ORCID iD, up from 30K at the beginning of 2016.

The ORCID Consortium has developed a Vision 2020, which aims to have all active researchers in Australia holding an ORCID iD and using it throughout the research lifecycle. The vision also calls for the ARC and NHMRC to integrate ORCID into their grant management systems. Both have done this, and the integration will go live in the next couple of weeks. Where possible, government agencies should also draw on ORCID data for research performance reporting and assessment.

There are challenges in integrating ORCID institutionally. The most common are private profiles (early profiles were set to private by default) and synchronisation issues, particularly duplicates where metadata differs slightly between source systems. Another challenge is getting ORCID iDs displayed in IRs. When asked about this, the ARC replied that it is a requirement of their OA mandate. At present this is not a problem, although it will become one in the future.
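Duplicate detection during synchronisation can be approximated as follows. Where both records have a DOI, compare DOIs. Otherwise, compare normalised titles and publication years. This is a minimal sketch using Python's `difflib`, not part of any ORCID integration. The field names and the 0.9 similarity threshold are assumptions to be tuned against real data.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalise_title(title):
    """Casefold, strip accents and punctuation, and collapse whitespace."""
    title = unicodedata.normalize("NFKD", title)
    title = "".join(c for c in title if not unicodedata.combining(c))
    title = re.sub(r"[^\w\s]", " ", title.casefold())
    return " ".join(title.split())


def is_probable_duplicate(a, b, threshold=0.9):
    """Compare DOIs when both records have one; otherwise require the same
    year and near-identical normalised titles. `threshold` is an assumption."""
    doi_a = (a.get("doi") or "").strip().lower()
    doi_b = (b.get("doi") or "").strip().lower()
    if doi_a and doi_b:
        return doi_a == doi_b
    if a.get("year") != b.get("year"):
        return False
    ratio = SequenceMatcher(None, normalise_title(a["title"]),
                            normalise_title(b["title"])).ratio()
    return ratio >= threshold
```

Checking DOIs first avoids merging distinct works that happen to share similar titles.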

Digital preservation

Jaye Weatherburn, University of Melbourne, gave a keynote presentation on digital preservation and the role that libraries, and IRs in particular, need to play in it. Digital preservation is a series of managed activities necessary to ensure continued access to digital materials for as long as necessary. There are several reasons to address digital preservation:

- decay of storage media;
- rapidly advancing technology leading to obsolescence;
- the fragility of digital materials;
- the need to protect against corruption and accidental deletion.

A digital preservation strategy can be used to monitor these risks. Long-term preservation, however, is not 'set and forget'. It is an iterative process to ensure the life of a document is maintained. Without digital preservation there is no long-term access to materials.
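One of these risks, silent corruption, is commonly monitored through fixity checking. This means recording a checksum for each file and periodically recomputing it. Below is a minimal sketch, assuming files sit on an ordinary filesystem. Dedicated preservation systems typically automate this alongside much else, so the sketch shows the idea rather than a substitute for one.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large master files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    return {p.relative_to(root).as_posix(): sha256_of(p)
            for p in sorted(root.rglob("*")) if p.is_file()}


def audit(root, manifest):
    """Recompute checksums and report files that changed or went missing."""
    current = build_manifest(root)
    changed = sorted(k for k in manifest if k in current and current[k] != manifest[k])
    missing = sorted(k for k in manifest if k not in current)
    return {"changed": changed, "missing": missing}


# Example: a stored master silently altered after the manifest was taken.
with tempfile.TemporaryDirectory() as tmp:
    master = Path(tmp) / "thesis.pdf"
    master.write_bytes(b"original bytes")
    manifest = build_manifest(tmp)
    master.write_bytes(b"altered bytes")
    report = audit(tmp, manifest)
```

Store the manifest separately from the files and run the audit on a schedule. Corruption is then detected, rather than being silently copied into every backup.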

It should be noted that our IR doesn’t ‘do’ digital preservation beyond saving PDF files of outputs, where available, along with metadata. The FIA collection does digital preservation slightly better, in that the PDF/A standard is used for master representations. The Herbarium perhaps does it best, with RAW, TIFF and JPG files saved for each image. However, without a digital preservation system such as Rosetta, we are not so much preserving our digital data as simply backing it up to protect against deletion.

Closely aligned with the theme of preservation is the trustworthiness of a repository, which also includes the organisation behind it. Two frameworks are commonly used for examining the trustworthiness of repositories: the Core Trust Seal, and the Audit and Certification of Trustworthy Digital Repositories based on ISO 16363. Both can be self-assessed and provide a good means of documenting gaps, although the Core Trust Seal is less intensive in terms of resourcing and time. I have been keen to do this for USC since I first heard about it at the iPRES conference in 2014. I will complete it once a decision is made regarding a future system.

A word cloud was also compiled of what attendees thought digital preservation meant to them.

Other interesting things

Incentivising scholarly communication via cryptocurrency

Chris Berg, RMIT, opened with a keynote on blockchains as a tool to govern the creation of knowledge. Blockchains are economic infrastructure on which new forms of social organisation can be built. Chris argued that academic publishing is a subset of a general problem that has afflicted publishing and the knowledge economy since the invention of the internet. The RMIT Blockchain Innovation Hub project proposed incentivising scholarly communication via cryptocurrency, using a token to pay for and reward peer review, sharing citations, reading and so on. In terms of economic modelling, journal publishing can be viewed as a 'club'. The project aimed to bring transparency to the peer review process, to provide digital copyright authentication and verification, and to provide incentives and rewards for the different aspects of the scholarly communication lifecycle. Enter 'JournalCoin'… Subscriptions, article processing charges and peer reviewers could be paid in JournalCoin. So could rewards for such things as fast peer reviews, formatting, royalties, rankings and citations. The journal then becomes the platform through which the incentives are paid.
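To make the idea concrete, the toy sketch below records JournalCoin-style rewards in an append-only, hash-chained ledger. It is purely illustrative and not the RMIT design. The reward amounts and action names are hypothetical. It also omits everything that makes a real blockchain decentralised, such as consensus, distribution and mining. It does show why a chained record supports the transparency goal: altering any past entry breaks verification.

```python
import hashlib
import json

# Hypothetical reward schedule in made-up JournalCoin units.
REWARDS = {"peer_review": 10, "fast_review_bonus": 5, "citation": 1}
GENESIS = "0" * 64


class RewardLedger:
    """Append-only ledger in which each entry commits to the previous entry's
    hash, so rewriting history is detectable. No consensus or network."""

    def __init__(self):
        self.entries = []

    @staticmethod
    def _digest(body):
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

    def reward(self, account, action):
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        body = {"account": account, "action": action,
                "amount": REWARDS[action], "prev": prev}
        self.entries.append({**body, "hash": self._digest(body)})

    def balance(self, account):
        return sum(e["amount"] for e in self.entries if e["account"] == account)

    def verify(self):
        prev = GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev"] != prev or self._digest(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True


ledger = RewardLedger()
ledger.reward("reviewer-1", "peer_review")
ledger.reward("reviewer-1", "fast_review_bonus")
ledger.reward("author-7", "citation")
reviewer_balance = ledger.balance("reviewer-1")  # 15
intact = ledger.verify()
ledger.entries[0]["amount"] = 100  # attempt to rewrite history
tampered_ok = ledger.verify()
```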

IRUS-UK pilot in Australia

CAVAL is currently running a project on implementing IRUS-UK in Australia. IRUS (Institutional Repository Usage Statistics) started in the UK in 2012 and set out to provide a standards-based service with auditable usage data. The aim was to reduce duplication of effort by IR managers and to present a uniform set of usage data regardless of the IR platform. IRUS data is COUNTER-compliant. IRUS-UK now does this for about 140 IRs in the UK. A pilot has been evaluating the usefulness of IRUS in Australia. It involves the University of Melbourne, Victoria University, the University of Queensland, the University of Sydney and Monash University. Several of these institutions reported on their experience, which was largely positive. One advantage of the IRUS statistics is that they exclude 'false positive' usage, resulting in slightly lower figures than the native IR statistics. CAVAL reported that if IRUS is adopted, maximum benefit will be realised if the majority of Australian universities participate, as individual universities will then be able to benchmark against each other.
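COUNTER compliance is part of what lets IRUS exclude inflated usage. Below is a minimal sketch of two such filters, assuming download events are available as (timestamp, user, item, user agent) tuples:

- requests from user agents matching a robot pattern are dropped;
- a repeat request for the same item by the same user within 30 seconds of their previous request is treated as a double-click.

The 30-second window follows COUNTER Release 5. The three robot patterns are a placeholder for COUNTER's maintained robot list. IRUS's actual processing is likely more extensive than this.

```python
from datetime import datetime, timedelta

DOUBLE_CLICK_WINDOW = timedelta(seconds=30)
# Placeholder patterns; real services use COUNTER's maintained robots list.
ROBOT_PATTERNS = ("bot", "crawler", "spider")


def filtered_downloads(events):
    """Count downloads per item, dropping robot traffic and double-clicks.

    events: iterable of (timestamp, user_id, item_id, user_agent) tuples.
    """
    last_request = {}
    counts = {}
    for ts, user, item, agent in sorted(events):
        if any(p in agent.lower() for p in ROBOT_PATTERNS):
            continue
        prev = last_request.get((user, item))
        last_request[(user, item)] = ts
        if prev is not None and ts - prev <= DOUBLE_CLICK_WINDOW:
            continue  # double-click: already counted once
        counts[item] = counts.get(item, 0) + 1
    return counts


t0 = datetime(2018, 10, 29, 9, 0, 0)
sample_events = [
    (t0, "u1", "thesis-1", "Mozilla/5.0"),
    (t0 + timedelta(seconds=5), "u1", "thesis-1", "Mozilla/5.0"),   # double-click
    (t0 + timedelta(minutes=5), "u1", "thesis-1", "Mozilla/5.0"),   # new download
    (t0, "u2", "thesis-1", "Googlebot/2.1"),                        # robot
]
usage = filtered_downloads(sample_events)
```

Filters like these are why COUNTER-compliant figures tend to be slightly lower than raw platform counts.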

Social media campaigns

Susan Boulton, GU, outlined a pilot the Library ran to promote its IR through social media. Tying in with national and international events such as World Malaria Day, Sustainability Week and Dementia Month, the Library wrote blog posts and social media posts showcasing research held in its IR. Preparation involved planning, sourcing open access content, identifying champion event owners, and preparing the social media material. These small social media campaigns produced a significant jump in IR traffic and downloads. Another benefit was an improved relationship between researchers and the Library, as researchers could see the Library providing another value-added service.

Thursday, 1 November 2018