Digital Metadata

Wednesday, 28 June 2017

Notes from CAUL Repository Community Day 2017

The annual CAUL Repository Community Day was held last Monday as a satellite event alongside the Open Repositories 2017 conference in Brisbane. It was an excellent exchange of ideas and knowledge (as always) with lots of institutions doing many wonderful things in their institutional repository space.

Program: here

Kathleen Shearer (COAR) started the day with a wonderful talk about COAR (Confederation of Open Access Repositories) and the problems with the current scholarly publishing system. As the cost of library journal subscriptions continues to rise, Kathleen asked: if we are collecting published content and putting it into our repositories, are we just perpetuating a flawed system? In addition, the need researchers feel to be published in "luxury" journals with high impact factors is forcing some of them to work in 'trendy' areas, rather than doing the more important, less glamorous work that needs to be done. Kathleen used the example of the recent Zika virus outbreak. Prior to the outbreak, those researching in this area had problems getting their articles published in top journals. However, as soon as the virus made it to the US, it became a 'trendy' topic and suddenly researchers could publish anywhere. This model of selective, elitist journal publishing is something that we need to move away from. However, while performance measurement (both internally in our institutions, and externally by the government) is based on citation rates and visibility in a particular subscription database, it will be a long time before we change.

COAR is also investigating the so-called 'next generation' repositories, and what such a repository may look like. As new technologies and services are developed, we need to strive to continue to make our repositories relevant. This is something that I feel really strongly about. The answer to this, according to Kathleen, is to strengthen and add value to our repository networks. Repositories are critical to our future visions of libraries and serve two roles: to showcase and provide access to the scholarly record of our own institutions; and as nodes in a global knowledge commons. To support this global nature of repositories, COAR launched the Aligning Repository Networks International Accord on 8th May 2017. A shared vision of this strategic coordination will facilitate data exchange such as cross regional harvesting between networks and repositories. The problem is that Australia doesn't have a formal 'repository network', something that the Australasian Repository Working Group (ARWG) is examining. Australia also lacks a national aggregator to exchange information with international aggregators such as OpenAIRE.

Another shared vision of the Accord is interoperability - common vocabularies and metadata guidelines. This is also a focus of the ARWG, and is seen as a core feature of repositories. There are however many challenges in improving interoperability with so many disparate systems and 'business-purposes' for our repositories. The ARWG sees interoperability as requiring collaboration and a common understanding between repositories in Australia, as well as globally. Part of this common understanding is providing a consistent approach to metadata standards and vocabularies. The NISO 'free_to_read' and 'license_ref' tags are a beginning, as these will help to identify open access content across systems. University of New England and Deakin University are the pioneers in this area, having implemented these tags in their institutional repositories already.

In so much as we have a national aggregator, TROVE is especially important to Australian repositories in that it harvests content into a single database. Due to the number of disparate systems in Australia, there is much variation in the quality of the data going into TROVE, with a large variety of metadata schemas and formats. Julia Hickie from TROVE spoke about the sort of data that is going into TROVE from our repositories. Identifiers, in particular, have proliferated in the last five years. In spite of this, ORCIDs are nearly invisible in the TROVE data, accounting for only about 1% of harvested records. This indicates that very few repositories are actually recording the ORCID identifier in their metadata records. Having said this, there are now over 17,000 ORCIDs in TROVE, up from 2,000 two years ago. In order to improve the quality of the data in TROVE, Julia advised that it is important to do a 'health check' every now and then on the repository data that is being harvested by TROVE, particularly if you change something or move to a new system. Things to look at are:

make sure the repository URL is in a dc:identifier
author ORCIDs are URLs in dc:relation
ARC/NHMRC grant identifiers as a URL in dc:relation, and in the format http://dx.doi.org/[doi]
DOIs are URLs in dc:relation or dc:identifier
open access indicator uses free_to_read
Creative Commons licence information in full URL form in dc:rights or ali:license_ref
rights statements are in full URL form in dc:rights or ali:license_ref.

Check the OAI-PMH feed yourself to ensure that it is working correctly.
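For anyone wanting to automate part of this health check, here is a rough sketch in Python of how a single oai_dc record could be tested against some of the items above. It is only an illustration and not TROVE's own validation: the sample record, the repository URL, the DOI and the function names are all made up, the ORCID shown is ORCID's published example identifier, and I am assuming that the NISO tags (spelled free_to_read and license_ref by NISO) appear in the record under the NISO ALI namespace. How your own feed exposes these will depend on your system. The grant identifier item is left out.

```python
import re
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
ALI = "http://www.niso.org/schemas/ali/1.0/"  # NISO Access and License Indicators

# Hypothetical record, invented for illustration only.
SAMPLE_RECORD = """
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/"
           xmlns:ali="http://www.niso.org/schemas/ali/1.0/">
  <dc:title>An example article</dc:title>
  <dc:identifier>https://repository.example.edu/item/123</dc:identifier>
  <dc:relation>https://orcid.org/0000-0002-1825-0097</dc:relation>
  <dc:relation>http://dx.doi.org/10.1234/example</dc:relation>
  <dc:rights>https://creativecommons.org/licenses/by/4.0/</dc:rights>
  <ali:free_to_read/>
</oai_dc:dc>
"""

ORCID_URL = re.compile(r"https?://orcid\.org/\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")
DOI_URL = re.compile(r"https?://(dx\.)?doi\.org/10\.\S+$")
ANY_URL = re.compile(r"https?://\S+$")


def texts(root, namespace, tag):
    """Return the stripped text of every element with the given namespaced tag."""
    return [(el.text or "").strip() for el in root.iter(f"{{{namespace}}}{tag}")]


def health_check(xml_text):
    """Return a dict mapping each check to True (passed) or False (failed)."""
    root = ET.fromstring(xml_text)
    identifiers = texts(root, DC, "identifier")
    relations = texts(root, DC, "relation")
    rights = texts(root, DC, "rights") + texts(root, ALI, "license_ref")
    return {
        # Only tests that some URL is present; it cannot tell whether
        # that URL really points at the repository record.
        "URL in dc:identifier": any(ANY_URL.match(v) for v in identifiers),
        "ORCID as URL in dc:relation": any(ORCID_URL.match(v) for v in relations),
        "DOI as URL": any(DOI_URL.match(v) for v in identifiers + relations),
        "free_to_read present": root.find(f".//{{{ALI}}}free_to_read") is not None,
        "rights or licence as URL": any(ANY_URL.match(v) for v in rights),
    }


if __name__ == "__main__":
    for check, passed in health_check(SAMPLE_RECORD).items():
        print("PASS" if passed else "FAIL", check)
```

A script like this only looks at the shape of the values (is it a URL, does it look like an ORCID or a DOI), so it complements rather than replaces actually eyeballing the feed.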

There are a number of institutions doing fantastic things with their repositories. Some of these presented on the day include:

Robin Burgess from the University of Sydney spoke about collecting non-traditional research outputs (NTROs) and bridging the gap between these outputs and traditional outputs, which had previously been kept in separate repositories. Consultation with academics showed that a repository that would suit these "defiant objects", or the "rule breakers", had to have a visually rich interface, with many academics already having these outputs showcased on personal websites. The system had to be able to display these outputs along with the ephemeral information that accompanied them.

Janice Chan from Curtin University reported on their recent move to DSpace using an external vendor, Atmire. They took the opportunity during the migration to change some work practices, including reducing the organisational structure in the system from many faculties to just two groups (research papers and theses, although they plan to add a third for grey literature), and to improve their metadata standards with the addition of funder and rights fields.

Kate Croker from University of Western Australia talked about their Repository Project, whereby they enriched their Pure repository with grant data, researcher profiles, publication collections, and the like. The project provided scope for extensive collaboration between stakeholders and external departments, strengthening those relationships in the process. Through this collaboration, a shared vision for the repository was produced, which has helped to shape the direction of the repository as well as build support for it. [Kate has also presented this at ALIA Online - paper and presentation available here]

Bernadette Houghton from Deakin University spoke on using Omeka software for digital collections. She especially mentioned some recommendations surrounding the use of third party plugins versus those available on Omeka.org.

Bernadette Houghton also spoke about self-assessment for repositories against ISO 16363, which Deakin University completed in 2013. This assesses features such as governance, technical infrastructure, security and preservation. In completing the project, Deakin University looked at the existing literature, performed the self-assessment, and addressed the areas for improvement. Bernadette provided a list of recommendations for anyone wanting to do a repository self-assessment, including:

choose a tool (it doesn't have to be ISO 16363)
review the criteria from the start
understand ISO 16363's conceptual nature (based on OAIS)
preference local knowledge over ISO suggested documentation
allocate resources to address areas of improvement.

[Note: I first heard about this at iPRES 2014, and have been wanting to do this for our repository ever since. However, I know how dismally ours would fail, so I won't have the nerve until we move to a new system. More from Bernadette can be found here http://dx.doi.org/10.1045/march2015-houghton]

The day finished up with a general discussion on how institutions are dealing/coping with ERA and the absence of publications for HERDC (and if anything had taken its place). Most (all?) institutions are still collecting publications on an annual basis, whether as ERA preparation, for internal reporting or KPIs, or "just in case" the government changes its mind about HERDC. There have also been many institutions where the responsibility for publication collection has shifted from the research office to the library.

There was some discussion of the Engagement and Impact Assessment and how institutions went about completing this. Mary-Anne Marrington from University of Queensland reported on their method.

[Note: Our Office of Research decided not to participate in the pilot (held in 2017), so I can't comment on our process.]

The definition of "open access" for ERA (and more generally), in particular the difference between open access and free access, and which was correct for ERA, produced some lively debate. Different institutions have different definitions, so some clear guidelines are needed. The ARC Open Access Policy is due out soon (the draft has been out for some months), so hopefully, at least for ERA purposes, this will be much clearer.

There were others that presented, and their presentations were fantastic. It is always good (although a bit depressing) to hear what others are doing in this space. I always leave with many good ideas that I would love to implement back at home, but resourcing (staffing and money) is always a problem. So, many of these ideas remain in limbo. I think it would be good for the powers that be to take note of Kathleen Shearer's comment, "repositories are a technology and technologies change". We need to continue to strive to make our repositories relevant in this changing landscape, and continue to add value through services in either the repository layer or the network layer above.

Friday, 19 February 2016

Easy as 1, 2, 3...

I know what I am interested in, and I know what I am passionate about, but seldom do these coincide with things that others are interested in and passionate about. Except ORCID.

ORCID (Open Researcher and Contributor Identifier) is something that seems to resonate with a whole bunch of people, from hard core researchers to administrators and librarians. For those that are saying "but what do flowers have to do with researchers?", ORCID is a way to disambiguate researchers, especially those with similar sounding names. By giving each researcher a unique number, they can then go and 'tag' their research publications, data, grants, and many other research outputs as theirs. It is like the grand-daddy of researcher identifiers. And it has landed in a big way.

However, I get the feeling that ORCID is more popular with research administrators than with the researchers themselves. There are a multitude of reasons why research institutions can benefit from ORCID (streamlining processes, reporting on research undertaken, identifying research resulting from grants awarded to staff, etc), but the benefits to researchers are not as obvious. Sure, being able to differentiate between the various "Tim Smiths" that work at the institution would be nice, but what else? What is there that drives the researcher to maintain their ORCID profile?

I recently read an article by The Research Whisperer that sums this up nicely, and I created a sketch note about it. And I must say, speaking as someone that rarely gets a like or retweet on Twitter, this has gone galactic! It has definitely struck a chord with many people who work either in research or around research. It is a credit to the author of the article (Jonathan O'Donnell) for writing such a wonderful piece.

Some of the comments I have received via Twitter include:

@BecOwen74, Just made my day. Thanks! (from @jod999)

I love this summary of ORCID from the Research Whisperer. This image makes its uses very straighforward (from @Ashley_UQL)

Get your research & profile out there - for all academics, postdocs, PhD students. Love the graphic! (from @LareenNewman)

And to top it off, the author even included it in his blog post on The Research Whisperer!

How chuffed am I!!

So, without further ado, here is my sketch note. I hope you enjoy.

Tuesday, 12 January 2016

Open Access in scholarly communication

Open Access (OA) has come a long way since the idea was formalised in 2002 as the Budapest Open Access Initiative. It has since become the catch-cry for scholarly communication, providing an ideal tool for disseminating research results and publications far and wide.

Since its inception, many models of implementing OA have emerged - gold, hybrid, delayed and green. Green OA is the preferred model, at least from this Repository Manager's point of view. However, I fully appreciate that some researchers do not want a less than perfect version of their work out on display to the world. This is where the Gold/Hybrid/Delayed route comes into play - still all very legitimate OA options.

No matter which OA method is chosen, the most important thing is that research results are made available to anyone that is interested, regardless of access to subscription library databases, and that universities and other research institutions recognise the importance in providing the infrastructure and the means to make this research accessible.

(Based on the article by Mohammad Reza Ghane (2014) Open Access Policy. International Journal of Information Science and Management)

Thursday, 7 January 2016

Impact of research on society

As I experiment in the sketch noting world, I use as my test topic an excellent article from academics at Charles Sturt University.

Societal impact has come under intense discussion lately in Australia as the Government prepares to trial an Assessment and Impact Framework from 2018, with a pilot to be run in 2017. This will be the first time that institutions nationwide will be involved in an assessment of this nature, which is touted to be along the lines of the UK Research Excellence Framework (REF).

But what is societal impact? According to the Australian Research Council (ARC), impact is

"the demonstrable contribution that research makes to the economy, society, culture, national security, public policy or service, health, the environment, or quality of life, beyond contributions to academia" (ARC Research Impact Principles and Framework).

This has started many conversations by worried university administrators as to how such impact can be measured.

This is where this article, and my naive efforts at sketch noting, helps us to understand. Bracing for Impact: The role of information science in supporting societal impact, by Lisa Given, Wade Kelly and Rebekah Willson, was presented at the ASIST 2015 Annual Meeting held in the United States.

Tuesday, 27 October 2015

Researcher identifiers

Researcher identifiers.....these unique sets of characters that identify a particular researcher as themselves, removing any ambiguity with other researchers of a similar name, are so important to the career of a researcher. Not only the researcher though, but also the institution that needs to report on many different metrics relating to research output.

For those that do not know what a researcher identifier is or what the advantages are, here is a quick rundown.

However I am constantly amazed at the lack of care factor that researchers show towards researcher identifiers. Don't get me wrong, some researchers "get it" and appreciate the importance of these characters. These are the ones that actively maintain their publications and add any missing ones. I love these researchers. But the other 90%....I just don't get it! I don't understand why they do not invest in the process.

The problem with Scopus:
Scopus is slightly different from the rest in that its identifier is a system generated number assigned by Elsevier. When a publisher sends metadata to Scopus, a fancy algorithm tries to match the author based on name spelling, format and affiliation. If there is insufficient evidence to match with an existing author in the system, Scopus will automatically create a new one. You can see how it is very easy for authors to end up with multiple identifiers in Scopus. We recently did a "data cleansing" exercise in Scopus to try to identify and de-dupe multiple Scopus identifiers. I contacted each author to explain the situation and provided step by step instructions on what to do (I thought it was good for the author to engage with the process so that they may then learn and keep on top of it themselves in the future). The most I found was one author that had 13 different identifiers!! When you think about how much their research impact metrics were diluted by having publications spread over 13 different identifiers, the mind boggles.
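To give a feel for what a de-duping exercise like this involves, here is a small Python sketch. It is not how Scopus matches authors and it is not the process we followed; the profiles, identifiers, citation counts and function names are all invented. It simply groups profiles by surname and first initial to flag possible duplicates, and adds up the citations that are currently split between them.

```python
from collections import defaultdict

# Hypothetical profiles: (identifier, name as recorded, citation count).
PROFILES = [
    ("A1", "Smith, Tim", 120),
    ("A2", "Smith, T.", 45),
    ("A3", "Smith, Timothy J.", 30),
    ("A4", "Jones, Mary", 80),
]


def name_key(name):
    """Reduce 'Surname, Given names' to (surname, first initial), lower-cased."""
    surname, _, given = name.partition(",")
    given = given.strip()
    initial = given[0].lower() if given else ""
    return (surname.strip().lower(), initial)


def candidate_duplicates(profiles):
    """Group profiles by name key and keep only keys with more than one profile."""
    groups = defaultdict(list)
    for identifier, name, citations in profiles:
        groups[name_key(name)].append((identifier, citations))
    return {key: group for key, group in groups.items() if len(group) > 1}


if __name__ == "__main__":
    for key, group in candidate_duplicates(PROFILES).items():
        identifiers = [identifier for identifier, _ in group]
        combined = sum(citations for _, citations in group)
        largest = max(citations for _, citations in group)
        print(key, identifiers, "combined:", combined, "largest single profile:", largest)
```

A match on surname and initial is only a candidate. Two genuinely different Tim Smiths would land in the same group, which is why the author still needs to confirm which identifiers are theirs.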

So why do researchers just not care? Is it that they are too busy, the process is too complicated, or do they just not understand the importance? Or maybe they just don't even know about researcher identifiers - no one has told them?

At USC we are trying to address the lack of care-factor with regard to researcher identifiers. A comprehensive online guide has been produced and is regularly updated (however the limitations of our website mean that it is not very discoverable). The librarians mention it when talking to researchers about outputs and metrics. And we even ran a competition during our recent USC Research Week conference for a chance to win a $100 voucher for every researcher identifier reported to the Library. Emails to new staff ask if they have any researcher identifiers (from which we can obtain publications metadata for entry into our institutional repository, the USC Research Bank).

During the recent USC Research Week conference, the Library had a display encouraging researchers to think about their online research profile and what they can do to improve it. One of the most contentious posters was a "Top 10 @ USC" which listed the top 10 authors with, amongst other things, citations in Scopus, Web of Science and Google Scholar. The bottom line is that if a researcher doesn't have a researcher identifier or has a poorly managed researcher identifier then their publications will not be able to be measured using conventional recognised metrics.

Perhaps an addendum to the Top 10 @ USC poster is to put "We really struggled to find you because you didn't have a researcher identifier".

eResearch Australasia 2015

Last week I attended what is one of the best conferences of the year - eResearch Australasia, held in Brisbane. It is always a very inspiring event and I always come back filled with ideas to put into practice at work.

This year was a bit different to previous years in that it had a more library/human capital focus. Previous years have been heavy on the technology side which, while interesting, was often slightly over my head.

The main themes prevalent throughout the conference were:

The importance of libraries and librarians for open data
Linked open data, not just shared data
Data as an institutional asset
The connected researcher.

Next year's conference is being held in Melbourne during October - only a year to implement all my ideas before the next round.

Friday, 26 June 2015

Research data sharing

Sharing research data is becoming increasingly popular, and while not synonymous with traditional scholarly publishing yet, it is nevertheless moving in that direction. We, as an aspiring research institution, need to start thinking about depositing and sharing “publications and data”, rather than treating research data as a special entity, if in fact we treat it as anything at all.

There are a number of benefits to the institution and the researcher for sharing data. Demonstrating good practice and research integrity raises the profile of the university and individual researcher. Sharing data makes it citable, which in turn can lead to increased citation metrics for both the publications associated with the data and the data itself. This is a good thing. Increased exposure from the data records can help foster new collaborations in research areas not previously thought of. And funding opportunities may improve due to a healthier research ecosystem and greater integration between systems and researcher profiles.

When reading about the positives for data sharing it is hard to understand why there is such a resistance to sharing within academic circles. Do researchers fear they will not be recognised or credited for their data? If the data has a good framework around it making it easy to obtain, understand and cite, then this risk should be reduced. Or do they fear “getting scooped”? Embargoing the data may be the solution to this.

Often institutions and policy makers have a perception that it is the “big” data that needs the most help when it comes to managing and sharing. This is usually not the case. Big data often has a more robust framework surrounding the collection and management of it – often due to requirements of funding organisations. The problem is with small data – the multitude of small spreadsheets that researchers maintain, often without adequate management, code keys, storage, backup… If data is managed correctly during the collection and analysis stage, it makes it all the easier for sharing once work has been completed. Data that is managed correctly – i.e. has a good framework around it – is more likely to be used and therefore cited. Unfortunately, citations are the name of the game in order to stay current in research.

For every risk or concern that researchers or institutions can throw up about sharing, there will always be a solution. Data should be shareable. Publicly funded data should definitely always be shareable. The risks to institutions of not sharing data – non-compliance with policy and funding agreements, reputational damage, poor practice, low awareness – mean that institutions should lead by example and facilitate the sharing infrastructure.

Sometimes there are legitimate concerns about sharing data – is it identifiable, confidential, private? What is the best way to manage this sort of data? Is it shareable? In these cases, the metadata can be made available, with mediated access to the data itself. When it comes to data of a sensitive nature, there will always need to be someone that can respond to requests.

Research data is an institutional asset, and should be treated as such. Unrecognised effort is a prime precursor to disengagement from researchers, staff and the community. And as an asset, you (whether the researcher, lab technician, administrator, executive, institution) should be treating research data with the respect it deserves.

“Products of research are not just publications” – NSF senior policy specialist Beth Strausser.

Graphic: http://d7.library.gatech.edu/research-data/home