Digital Metadata: Notes from CAUL Repository Community Day 2017

The annual CAUL Repository Community Day was held last Monday as a satellite event alongside the Open Repositories 2017 conference in Brisbane. It was an excellent exchange of ideas and knowledge (as always) with lots of institutions doing many wonderful things in their institutional repository space.

Program: here

Kathleen Shearer (COAR) started the day with a wonderful talk about COAR (Confederation of Open Access Repositories) and the problems with the current scholarly publishing system. As library journal subscription costs continue to rise, Kathleen posed the question: if we are collecting published content and putting it into our repositories, are we just perpetuating a flawed system? In addition, the need researchers feel to be published in "luxury" journals with high impact factors is forcing some researchers to work in 'trendy' areas, rather than doing the more important but less glamorous work that needs to be done. Kathleen used the example of the recent Zika virus outbreak. Prior to the outbreak, those researching in this area had problems getting their articles published in top journals. However, as soon as the virus made it to the US, it became a 'trendy' topic and suddenly researchers could publish anywhere. This model of selective, elitist journal publishing is something that we need to move away from. In today's academic climate, though, performance measurement (both internally in our institutions, and externally by the government) is based on citation rates and visibility in a particular subscription database, so it will be a long time before we change.

COAR is also investigating the so-called 'next generation' repositories, and what such a repository may look like. As new technologies and services are developed, we need to strive to continue to make our repositories relevant. This is something that I feel really strongly about. The answer to this, according to Kathleen, is to strengthen and add value to our repository networks. Repositories are critical to our future visions of libraries and serve two roles: to showcase and provide access to the scholarly record of our own institutions; and as nodes in a global knowledge commons. To support this global nature of repositories, COAR launched the Aligning Repository Networks International Accord on 8th May 2017. A shared vision of this strategic coordination will facilitate data exchange such as cross regional harvesting between networks and repositories. The problem is that Australia doesn't have a formal 'repository network', something that the Australasian Repository Working Group (ARWG) is examining. Australia also lacks a national aggregator to exchange information with international aggregators such as OpenAIRE.

Another shared vision of the Accord is interoperability: common vocabularies and metadata guidelines. This is also a focus of the ARWG, and is seen as a core feature of repositories. There are, however, many challenges in improving interoperability with so many disparate systems and 'business-purposes' for our repositories. The ARWG sees interoperability as requiring collaboration and a common understanding between repositories in Australia, as well as globally. Part of this common understanding is providing a consistent approach to metadata standards and vocabularies. The NISO 'free_to_read' and 'licence_ref' tags are a beginning, as they will help to identify open access content across systems. University of New England and Deakin University are the pioneers in this area, having implemented these tags in their institutional repositories already.

Insofar as we have a national aggregator, TROVE is especially important to Australian repositories in that it harvests content into a single database. Due to the number of disparate systems in Australia, there is much variation in the quality of the data going into TROVE, with a large variety of metadata schemas and formats. Julia Hickie from TROVE spoke about the sort of data that is going into TROVE from our repositories. Identifiers, in particular, have proliferated in the last five years. In spite of this, ORCIDs are nearly invisible in the TROVE data, accounting for only about 1% of harvested records. This indicates that very few repositories are actually recording the ORCID identifier in their metadata records. Having said this, there are now over 17,000 ORCIDs in TROVE, up from 2,000 two years ago. In order to improve the quality of the data in TROVE, Julia advised that it is important to do a 'health check' every now and then on the repository data that TROVE harvests, and particularly if you change something or move to a new system. Things to look at are:

make sure the repository URL is in a dc:identifier
author ORCIDs are URLs in dc:relation
ARC/NHMRC grant identifiers as a URL in dc:relation, and in the format http://dx.doi.org/[doi]
DOIs are URLs in dc:relation or dc:identifier
open access indicator uses free_to_read
Creative Commons licence information in full URL form in dc:rights or ali:licence_ref
rights statements are in full URL form in dc:rights or ali:licence_ref.

Check the OAI-PMH feed yourself to ensure that it is working correctly.
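As a rough illustration of what such a health check might look like, here is a minimal Python sketch that tests a single Dublin Core record against some of the items above. It is only my interpretation of the list, not TROVE's own validation. The sample record, the repository hostname, and the function names are all invented for the example, and the ali namespace URI in the sample is an assumption (the code matches on the element name only). The grant identifier check is left out because the expected URL format would need to be confirmed with TROVE.

```python
import xml.etree.ElementTree as ET

DC = "{http://purl.org/dc/elements/1.1/}"

# Hypothetical oai_dc record, invented purely for illustration.
SAMPLE_RECORD = """<oai_dc:dc
    xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:ali="http://www.niso.org/schemas/ali/1.0/">
  <dc:title>An example thesis</dc:title>
  <dc:identifier>https://repository.example.edu.au/handle/1234/5678</dc:identifier>
  <dc:relation>https://orcid.org/0000-0002-1825-0097</dc:relation>
  <dc:relation>http://dx.doi.org/10.1234/example</dc:relation>
  <dc:rights>https://creativecommons.org/licenses/by/4.0/</dc:rights>
  <ali:free_to_read/>
</oai_dc:dc>"""


def is_url(text):
    """Very loose URL test: good enough for a quick health check."""
    return text.startswith(("http://", "https://"))


def dc_values(root, name):
    """Return the stripped text of every dc:<name> element in the record."""
    return [e.text.strip() for e in root.iter(DC + name) if e.text]


def local_name(element):
    """Element name without its namespace, e.g. 'free_to_read'."""
    return element.tag.rsplit("}", 1)[-1]


def health_check(record_xml, repository_host):
    """Check one record against a simplified reading of the list above."""
    root = ET.fromstring(record_xml)
    identifiers = dc_values(root, "identifier")
    relations = dc_values(root, "relation")
    rights = dc_values(root, "rights")
    # Licence references may also live in an ali element (either spelling).
    licence_refs = [
        e.text.strip() for e in root.iter()
        if local_name(e) in ("licence_ref", "license_ref") and e.text
    ]
    return {
        "repository_url": any(
            is_url(v) and repository_host in v for v in identifiers
        ),
        "orcid_url": any(is_url(v) and "orcid.org/" in v for v in relations),
        "doi_url": any(
            is_url(v) and "doi.org/" in v for v in relations + identifiers
        ),
        "free_to_read": any(local_name(e) == "free_to_read" for e in root.iter()),
        "rights_url": any(is_url(v) for v in rights + licence_refs),
    }


if __name__ == "__main__":
    results = health_check(SAMPLE_RECORD, "repository.example.edu.au")
    for check, passed in results.items():
        print(f"{check}: {'ok' if passed else 'MISSING'}")
```

In practice you would pull the records from your own OAI-PMH feed (a ListRecords request with metadataPrefix=oai_dc) rather than from a hard-coded string, and run the check over a sample of them.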

There are a number of institutions doing fantastic things with their repositories. Some of these presented on the day include:

Robin Burgess from the University of Sydney spoke about collecting non-traditional research outputs (NTROs) and bridging the gap between these outputs and traditional outputs, which had previously been kept in separate repositories. Consultation with academics showed that a repository that would suit these "defiant objects", or the "rule breakers", had to have a visually rich interface, with many academics already having these outputs showcased on personal websites. The system had to be able to display these outputs along with the ephemeral information that accompanied them.

Janice Chan from Curtin University reported on their recent move to DSpace using an external vendor, Atmire. They took the opportunity during the migration to change some work practices, including reducing the organisational structure in the system from many faculties to just two groups (research papers and theses, although they plan to add a third for grey literature), and to improve their metadata standards with the addition of funder and rights fields.

Kate Croker from University of Western Australia talked about their Repository Project, whereby they enriched their Pure repository with grant data, researcher profiles, publication collections, and the like. The project provided scope for extensive collaboration between stakeholders and external departments, strengthening the relationship in the process. Through this collaboration, a shared vision for the repository was produced which has helped to shape the direction, as well as build support, for the repository. [Kate also has presented this at ALIA Online - paper and presentation available here]

Bernadette Houghton from Deakin University spoke on using Omeka software for digital collections. She especially mentioned some recommendations surrounding the use of third party plugins versus those available on Omeka.org.

Bernadette Houghton also spoke about self-assessment for repositories against ISO 16363, which Deakin University completed in 2013. This assesses features such as governance, technical infrastructure, security and preservation. In completing the project, Deakin University looked at the existing literature, performed the self-assessment, and addressed the areas for improvement. Bernadette provided a list of recommendations for anyone wanting to do a repository self-assessment, including:

choose a tool (it doesn't have to be ISO 16363)
review the criteria from the start
understand ISO 16363's conceptual nature (based on OAIS)
preference local knowledge over ISO suggested documentation
allocate resources to address areas of improvement.

[Note: I first heard about this at iPRES 2014, and have been wanting to do this for our repository ever since. However, I know how dismally ours would fail, so I haven't had the nerve until we move to a new system. More from Bernadette can be found here http://dx.doi.org/10.1045/march2015-houghton]

The day finished up with a general discussion on how institutions are dealing/coping with ERA and the absence of publications for HERDC (and whether anything had taken its place). Most (all?) institutions are still collecting publications on an annual basis, whether as ERA preparation, for internal reporting, for KPIs, or "just in case" the government changes its mind about HERDC. There have also been many institutions where the responsibility for publication collection has shifted from the research office to the library.

There was some discussion of the Engagement and Impact Assessment and how institutions went about completing this. Mary-Anne Marrington from University of Queensland reported on their method.

[Note: Our Office of Research decided not to participate in the pilot (held in 2017), so I can't comment on our process.]

The definition of "open access" for ERA (and more generally), in particular the difference between open access and free access, and which was correct for ERA, produced some lively debate. Different institutions have different definitions, so some clear guidelines are needed. The ARC Open Access Policy is due out soon (the draft has been out for some months), so hopefully, at least for ERA purposes, this will be much clearer.

There were others who presented, and their presentations were fantastic. It is always good (although a bit depressing) to hear what others are doing in this space. I always leave with many good ideas that I would love to implement back at home, but as always, resourcing (staffing and money) is a problem. So, many of these ideas remain in limbo. I think it would be good for the powers that be to take note of Kathleen Shearer's comment, "repositories are a technology and technologies change". We need to continue to strive to make our repositories relevant in this changing landscape, and continue to add value through services in either the repository layer or in the network layer above.

Wednesday, 28 June 2017
