Monday, 1 April 2019

Projects 2019


It's a new year, nearly (it is April after all), and it is at this time that my organisation tackles the annual "Performance Planning and Review" (PPR) cycle for the next 12 months. This used to be one of the most valuable exercises I completed each year in terms of setting the goals I wanted to achieve. Nowadays it has become a "pick-a-couple-of-your-bosses-goals-and-do-them-as-your-own" exercise.

So with this in mind, and knowing that I like personal goals and challenges, I have come up with my own plan for the year. Of course I will still do the goals I am required to do as part of the PPR process, but there are inevitably many more things that need to be accomplished during the year than the one or two selected from my supervisor's list.

So what am I going to attempt to achieve in 2019?  The list that I have come up with is quite ambitious, and I know that I will likely not get everything done (I am in a team of two, with limited resources, and very limited time), but nevertheless this is what I am aiming to achieve.

The work of my team continues to expand as the sector grows and evolves. While the focus for the team is the institutional repository (IR) and associated tasks such as internal and external assessments and open access, other priority areas such as research data management, digital collection management, digital curation and preservation, system and resource advocacy, and education and awareness are important and need to be addressed.

The following is an overview of the projects that my team is interested in completing during 2019. These are in no particular order and are outside of the normal day-to-day activities of the team.

Review of Research Guides


The research guides utilise the SpringShare LibGuides platform and their take-up has been phenomenal. They are flexible, easy to use, and, most importantly in this organisation, can be created and maintained without any assistance from IT or our web team.

However this has also created problems. The guides, which are created by various librarians, have grown out of control, with much of the content poorly organised and overlapping, leading to conflicting and out-of-date information.

In 2016 my team initiated an overhaul of the research guides to streamline the content to make it easier for users to find the information they required. This work stalled for various reasons. During 2019 it is proposed that this work recommence to ensure that all information is easy to find, easy to view, and up to date. At the same time, additional content reflecting the current research agenda can be created.

I should note that this project is not the responsibility of my team; however, as no one else seems to be taking ownership of it, it falls to us to do.

Review of Library presence on staff intranet


The Research Support information on our staff intranet is intentionally minimal due to duplication with guide content. In the past, linking from the intranet to the guides has been sufficient. However recent feedback from researchers is that guide content, which is not indexed by the corporate website, is not findable (except through Google), and the existing information on the intranet is not sufficient to be returned in search results.

The content on the intranet for research support (research data management and open access, for example) needs to be expanded. Feedback from staff who manage this system suggests that the current layout of the Library Research pages is not appropriate for the type of information that needs to be added, meaning a review of the pages is required.

This review will happen following the Research Guide review.

Digital collections


The digital collection space continues to grow. We currently have three collections, each with its own segment of work required.

This includes digitisation, cataloguing, auditing, software testing and configurations, and promotions.  I have recommended that we start an Archival Collections Working Group to facilitate this work, and to gather interested parties throughout the institution together to discuss future projects.

Audit of research drive content


Use of our network drive dedicated to research data has grown exponentially and continues to grow, currently sitting at 140TB! Old, archivable, duplicate or orphaned data needs to be identified to better manage the space. The current data custodians and the status of existing project folders need to be ascertained, and personal data moved off the research drive to a more appropriate location. In light of recent internal audit recommendations, completing this work is even more pressing.

My team started to audit content on the research drive at the end of 2018. There is much legacy data for which there is minimal metadata and that is not being stored as per current practice.
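Given the scale, the first pass of this audit will need to be scripted rather than clicked through folder by folder. As a minimal sketch of that pass (Python, with an assumed mount point and an assumed five-year threshold for 'old' data - our real paths and cut-offs will differ), the script below walks the drive, writes each file's size and age to a CSV, and flags byte-identical duplicates by checksum:

```python
# First-pass audit of the research drive: write out every file's size and
# age, and flag byte-identical duplicates by checksum. A sketch only: the
# mount point and staleness threshold are assumptions, and a real run would
# hash only size-matched candidates rather than every file on 140TB.
import csv
import hashlib
import os
import time

DRIVE = "/mnt/research"    # hypothetical mount point
STALE_YEARS = 5            # hypothetical threshold for "old" data

def sha256_of(path, chunk=1 << 20):
    """Hash a file in chunks so large files don't exhaust memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

now = time.time()
seen = {}  # checksum -> first path seen with that content

with open("research_drive_audit.csv", "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerow(["path", "size_bytes", "age_years", "stale", "duplicate_of"])
    for root, _dirs, files in os.walk(DRIVE):
        for name in files:
            path = os.path.join(root, name)
            try:
                stat = os.stat(path)
                digest = sha256_of(path)
            except OSError:
                continue  # skip unreadable or orphaned entries rather than crash
            age = (now - stat.st_mtime) / (365 * 24 * 3600)
            writer.writerow([path, stat.st_size, round(age, 1),
                             age > STALE_YEARS, seen.get(digest, "")])
            seen.setdefault(digest, path)
```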

Updating Research Data Management Plans, to date, has been ad hoc, so this is another aim of this project.

Research Data Management


Various RDM activities are required, including:
  • Continue the review of the Research Data Management Plan that stalled in late 2018
  • Review the Research Data Registry (currently an Access database)
  • Review the awareness/education channels of communication to researchers.
I have also recommended to my supervisor that we recommence the Research Data Working Group meetings (discontinued about 4 years ago) to re-engage the various stakeholders in the institution.

Self-Assessments


There are a number of self-assessments that should be undertaken during the year to assess the maturity of Library research services. It should be stressed that these are self-assessments and not all criteria will be able to be satisfactorily met due to the immaturity of the institutional commitment to research support. However, these will provide a gauge of what direction should be taken in the future.

Core Trust Seal (CTS)

The CTS measures the trustworthiness of a repository in both system architecture and institutional policies and workflows, based on 16 criteria. We currently use two systems – the IR and the Digital Collections – each of which should be supported by its own set of institutional policies.

Research Infrastructure Self-Assessment (RISE)

RISE is the UK Digital Curation Centre’s capability model for research data management support services. Based on 21 capabilities across ten research data support areas, it is designed as a benchmarking tool to facilitate research data management service planning and development at an institutional level.

NDSA Levels of Digital Preservation

The National Digital Stewardship Alliance (NDSA) Levels of Digital Preservation is a tiered set of recommendations for how institutions begin to build their digital preservation strategies. It allows institutions to assess the level of preservation achieved for specific material in the areas of storage and geographic location, file fixity and data integrity, information security, metadata, and file formats.

IR software replacement


...And to finish it off, we have a project being completed this year to replace our IR software.  We are finally looking at moving from VITAL to ExLibris Esploro.  So stay tuned, more to come on this topic no doubt. 


(Image: https://pxhere.com/en/photo/1287932)

Tuesday, 11 December 2018

Trustworthy digital repository certification

Today I proposed that my institution complete a trustworthy digital repository certification self-assessment. I wasn't aiming high - not the ISO 16363 one, but rather the entry-level CoreTrustSeal (CTS). Our repository service hasn't been evaluated since it was launched in 2007, and as the landscape has changed immensely since then, I thought it was high time that we had a good solid look at our system and see how it stacks up when evaluated against international standards.

So I approached my boss and asked when the next Library Management meeting was. Our Library is incredibly hierarchical in its staffing, so mere peeps like me are not welcome at the Library Management meetings, and as we have no Library Manager, the three Team Leaders (of which my boss is one) are essentially running the Library. But more on that in the future.

So, finding out that the final Library Management meeting for the year was scheduled for this week, I informed my boss that I would be giving her a paper on repository certification for her to table at the meeting.  The gist of it is that this is what it is, this is why we should do it, and this is what we expect.

I'm expecting much resistance, especially from one of the other Team Leaders who likes to think she knows everything but doesn't, and from my Director who doesn't seem to value the repository as part of the research infrastructure. 



So here is my proposal (slightly edited for public consumption)....

Trustworthy Digital Repository Certification Proposal


Background

Repositories, whether they be institutional, data, disciplinary or archival, are becoming an increasingly critical part of an organisation's research infrastructure, not only as a place to store material and make it accessible and discoverable, but as a secure location that is committed to the preservation and long-term custodianship of scholarly activity. This involves having a technology solution backed by institutional policies that will ensure the longevity and continuity of the service.

One increasingly popular method for organisations to demonstrate their commitment to this is repository certification, which aims to evaluate the sustainability and 'trustworthiness' of a repository service. The concept of trust underpins the relationship between repositories and their users. Depositors trust that the repository will accept responsibility for and safeguard their digital objects, and users trust that the objects they access are accurate and true to their original form. This is a key difference between repositories and other types of information systems - the need to permanently store digital objects and ensure their ongoing accessibility, authenticity and integrity.

The development of the OAIS model (Open Archival Information System) provides an independent consensus of the requirements of an archive or repository for providing long term preservation and access to digital information. Created in 2002 and approved as an ISO standard in 2003, the OAIS model defines six functional entities of a trustworthy system: ingest, archival storage, data management, administration, preservation planning and access. The rise of the OAIS model accompanied a demand for assurance that repositories claiming to use OAIS actually adhere to those standards; that is, a demand for trustworthiness. 

(Image: the OAIS functional model. From: https://www.dpconline.org/docs/technology-watch-reports/1359-dpctw14-02/file)

This has led to several international assessment methodologies for trustworthiness of a repository, all of which assess to varying degrees three primary areas: organisational infrastructure, digital object management and technical infrastructure.

The three most well-known certifications are:
  • nestorSeal
    • based on 34 criteria
    • written by the German nestor group
    • based on the DIN Standards Committee in Germany's DIN 31644 Information and documentation - Criteria for trustworthy digital archives
    • self-assessment and evidence
    • review of the assessment by two reviewers appointed by nestor
  • CoreTrustSeal (CTS)
    • launched in 2017 as a combination of two precursor trust seals - the ICSU World Data System (WDS) certification and the Data Seal of Approval (DSA)
    • based on 16 criteria
    • self-assessment with peer review
    • three-year certification period
  • Audit and Certification of Trustworthy Digital Repositories (ISO 16363)
    • based on the earlier TRAC (Trustworthy Repositories Audit and Certification) criteria
    • formal audit by external auditors
    • the most rigorous (and resource-intensive) of the three
From: https://services.phaidra.univie.ac.at/api/object/o:584413/diss/Content/get


The CTS is the entry-level certification for trustworthiness of repositories. Currently in Australia there are four organisations that have obtained the CTS (or its precursor certifications, WDS or DSA).
  • Space Weather (Sydney) - WDS Certified Repository
  • Australian Antarctic Data Centre (Hobart) - CTS Certified Repository
  • Australian Data Archive (Canberra) - CTS Certified Repository
  • CSIRO Data Access Portal (Canberra) - CTS Certified Repository
There are several other institutions that have gone through the certification self-assessment in order to identify gaps in their organisational environment, including Deakin University (who underwent the full TRAC certification). 

From Our Perspective
We have two repositories - the Research Collections (currently the Research Bank) and the Research Archives. However, our repositories are not just a place to store digital objects. They are a combination of software and services provided to support digital objects and related archival and scholarly communication material.

Our organisation has not undergone an evaluation of its repositories since the launch of the Research Bank in 2007, so the degree of 'trustworthiness' (how well they meet the criteria) is unknown. Currently the collections and the archives are on two different software platforms, with different objectives and missions. Although there is an ongoing commitment to providing access to these digital objects, this is not well documented in policy or guidelines. At present it is expected that our repositories would fail many of the trustworthiness criteria.

Note: ExLibris have made a commitment to ensure that Alma-D meets the criteria for the CTS in 2019. It is anticipated that VITAL (Innovative) would fail the technical components of CTS assessment. 

My Project Proposal
Out of the three certification methodologies, it is recommended that the Library complete the self-assessment of the CTS. This will benefit us by providing an opportunity to examine our research infrastructure (technical and organisational), as well as our data structure, against an internationally recognised set of criteria. This will help to determine the Library's (and the repositories') strengths and weaknesses. Although we will be unable to achieve CTS certification alongside ExLibris in 2019, it is nevertheless an opportunity for the Library to begin to look at the gaps in our commitment, so we can better advocate for change. It is hoped that in the future we will be able to obtain CTS certification for both the research collections and the archival collections. This will provide a solid foundation to apply for higher-level certifications in the future.

The main objectives of this project proposal would be:
  • Assess our infrastructure against the CTS framework for the research collections and archival collections.
  • Identify the function and mission for the research collections and archival collections.
  • Identify areas where the organisational policy and workflows do not meet the criteria.
  • Identify areas where the technical infrastructure does not meet the criteria.
  • Develop a plan of areas that need improvement in order to gain the CTS in 2020 or 2021.
It is anticipated that this project would take 6 months to complete and would not require any additional resourcing. Expected end date would be December 2019 with the outcome being a clearer understanding of the Library’s strengths and weaknesses in terms of our repository landscape. Findings and recommendations would then provide the foundation on which future policies and workflows could be built.

Appendix

CoreTrustSeal Criteria

(listed in the project proposal, but linked here)

Thursday, 29 November 2018

World Digital Preservation Day 2018


Today is World Digital Preservation Day. It is the day when the digital preservation community around the world comes together to celebrate the collections that have been preserved, the access that has been maintained, and the work that is being done to preserve our digital legacy.

Organised by the Digital Preservation Coalition and supported by digital preservation networks all over the world, World Digital Preservation Day raises awareness of the strategic, cultural and technological issues which make up the digital preservation challenge. Since the first public website was launched nearly 30 years ago, there has been an explosion of digital content worldwide. This ‘born digital’ content is tomorrow’s cultural heritage, and it’s our job to ensure that we collect and preserve this digital history for future generations.

When I first heard about this day I was excited. Ever since I attended the iPRES (International Conference on Digital Preservation) conference in 2014, when it was held in Melbourne, I have been fascinated by, passionate about, and intrigued with digital preservation. Unfortunately it isn't a passion shared by my institution. Digital preservation costs money, sometimes a lot of money. And that is something that my institution is very loath to part with. (I should know, I've been campaigning for a new institutional repository system since 2010 with no luck, and that is peanuts compared to a preservation system).

But, in my own way, I have been steadily pushing the preservation agenda, one step at a time.

So, many people don't really understand what digital preservation actually is. Essentially it is the coordinated and ongoing set of processes and activities that ensure long-term, error-free storage of digital information, with a means for retrieval and interpretation, for the entire time span the information is required. Preservation isn't just digitisation; however, digitisation is part of preservation.

So with that in mind, this is what we have been doing in this space for the last few years.

Institutional repository (IR)
Our IR actually does preservation reasonably well (which is surprising as not much else works with it). It produces a PREMIS datastream; PREMIS is the international standard for metadata to support the preservation of digital objects and ensure their long-term usability. If only we understood how the IR platform actually uses it... Our IR also does versioning very well. While not strictly 'digital preservation', it is a form of preservation as each version is maintained in the system (metadata and documents). However this is where our investment in digital preservation ends. Our documents are at best stored as standard PDFs, at worst as Word documents or other proprietary file formats. And we do not actively maintain the files in the system, but rather just 'believe' that they will work when we next try to download one.

Digital collections
Our institution has a number of digital research collections and the degree of "preservation-ness" for each one varies. One has PDF/A as a standard for any scanned documents, another has straight PDFs, and the third (a collection of images) has RAW, TIFF and JPG files. However, like the IR, the files are not actively maintained for fixity and access. This is a new focus area for us (and for the system that we use) and I'm hoping that we will develop further in this space in the future.

Research data
Unfortunately, beyond backup, no digital preservation activities are performed on our research data. Like much of our research infrastructure, we are working with sub-standard, ad hoc systems. When we can better manage the data from an administrative aspect, then hopefully we will be able to better manage its digital preservation.
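Even without a dedicated preservation system, basic fixity checking is one of the cheapest preservation activities we could start with. Below is a minimal sketch (Python, with a hypothetical collection path) that records SHA-256 checksums on its first run and, on later runs, reports any file that has changed or gone missing. A real service would schedule this and log the results as PREMIS fixity-check events:

```python
# Minimal fixity check: record SHA-256 checksums for a collection on the
# first run; on later runs, report any file that has changed or vanished.
# The collection path is hypothetical, and a real service would also log
# each check as a PREMIS fixity event.
import hashlib
import json
import os

COLLECTION = "/data/digital-collections"   # hypothetical collection root
MANIFEST = "fixity_manifest.json"

def checksum(path, chunk=1 << 20):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

current = {}
for root, _dirs, files in os.walk(COLLECTION):
    for name in files:
        path = os.path.join(root, name)
        current[path] = checksum(path)

if os.path.exists(MANIFEST):
    with open(MANIFEST) as f:
        recorded = json.load(f)
    for path, digest in recorded.items():
        if path not in current:
            print(f"MISSING  {path}")
        elif current[path] != digest:
            # Checksum drift means possible corruption: restore from a known
            # good copy rather than silently re-recording the new checksum.
            print(f"CHANGED  {path}")
else:
    with open(MANIFEST, "w") as f:
        json.dump(current, f, indent=2)
    print(f"First run: recorded {len(current)} checksums")
```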


Digital preservation is something that I get very excited about. However I'm not naive enough to think that it isn't a very dry subject to many people. So, just to make it a bit more interesting, here are some websites that I think are pretty cool.

The Museum of Obsolete Media has over 500 current and obsolete physical media formats covering audio, video, film and data storage.

And the ‘Bit List’ of Digitally Endangered Species, a crowd-sourced list of the digital materials the community thinks are most at risk.

But perhaps the coolest thing about World Digital Preservation Day is that it gives us a chance to have cake!

Happy World Digital Preservation Day everyone!

Saturday, 24 November 2018

Just the Pure truth...

On Friday we had another demonstration from Elsevier of their research management system, Pure.  We had a demonstration/sales pitch a few years ago, but at the time things were not as desperate as they are now.  The research FTE has grown by over 100% in the last eight years, with journal articles alone growing by a whopping 230% in the last five years (as submitted for ERA evaluation).  So, in spite of the Library having chosen a preferred new institutional repository, the Research Office decided that they knew better and organised a new demonstration of Pure.



To say that it was interesting would be an understatement. It was an informative demonstration that showed that there are so many holes in Pure that I'm surprised it is keeping afloat!

But, let us look at the reasons why our institution is interested. The Research Office currently uses Research Master for Grants, Ethics and HDR management. They used to use it for Publications but have recently turned this module off in favour of 'linking' their reporting to our very flaky and unstable Access database which shadows our institutional repository (a whole other story). The HDR management is being migrated as we speak to PeopleSoft, which just leaves the Grants and Ethics. In an ideal world the institution would source a new system that would be able to cope with both the Research Office's needs as well as the Library's (which include research outputs, data management and digital collections), but it seems that just such a system has yet to be made. The Library is keen on signing up for ExLibris Esploro, which initially would only cater for publications; however research data management and possibly grants management are on its roadmap. The Research Office is "lukewarm" about Esploro (I know not why), so when the invitation came for another demonstration of Pure, I was ready and waiting to see what had changed in the last few years.
I have to say, not much.  Below are some random thoughts from the demonstration (which was delivered as a webinar).

  • Many institutions, including the University of St Andrews (UK) which was featured in the Elsevier PPT, use Pure but also still maintain a separate institutional repository. When I asked the Elsevier presenter why this would be, he responded that there may be a number of reasons, including the need for a system that can handle collections (Pure doesn't handle collections at all). As a Library we have digital archival and research collections (for example, the K'gari (Fraser Island) Research Archive and the USC Art Gallery Ephemera Collection) that are currently using the ExLibris Alma/Primo VE platforms. Although there have been and still are some teething problems in using this, on the whole it is a big improvement on our previous system, Canto Cumulus. So it is likely that we would continue using Alma/Primo for these types of collections. However there are other research collections housed in our institutional repository that would not be able to be catered for in Pure, such as publications relating to a particular project or a particular research group (especially if it were an informal group), theses, conferences that USC hosts, etc.
  • Reporting and analytics are always a problem with any system, and often the deal-breaker if a system doesn't deliver. During the recent CAUL Research Repository Days in Melbourne it was mentioned that Pure's reporting was "diabolical" and that some coding experience was required. Pure doesn't provide direct access to the database, but provides all reporting through APIs. The API structure is the same as the Scopus API. Reporting on the backend is via APIs, whereas reporting from the front end is via the dashboard. Elsevier is currently enhancing their reporting module, including building a 'write' API. Interestingly, my advance question on output analytics (page views, downloads) was ignored and I didn't realise until afterwards that it wasn't answered.
  • In our previous demonstration some years ago, Pure had no Ethics module.  Now they have a basic Ethics record, with some institutions such as Monash University using Infonetica as their main Ethics system.  I do not know enough about Ethics workflows to know what we would need, however it seems that Pure will not cater for the Ethics requirements of the institution.  

  • When I asked at the end about compliance with the ARC/NHMRC OA mandate, I expected a system such as Pure to be up to code. However I was surprised to hear that they are not compliant. Elsevier is meeting with the ARC in the near future to discuss the requirements, so who knows when it will be released into production. That being said, our current institutional repository is only about half compliant. But shhhhh, don't tell anyone.
  • Preservation is becoming increasingly important for digital data, and research information is no different. It is something that I am very passionate about, while acknowledging that I am a novice in the field. When asked about Pure's preservation strategy, the Elsevier presenter mentioned that a history of metadata records is kept in the background (no clue as to whether this is accessible to the administrator or only to Elsevier staff). No versioning of documents is kept, as the system is not designed for this. However I believe that Pure can plug into third-party proprietary preservation systems, although this wasn't confirmed by the presenter.
  • During the demonstration several years ago it was mentioned that any metadata harvested into Pure from Scopus was unable to be edited.  Scopus is not the most perfect of metadata aggregators and there are often mistakes with the metadata.  So to hear that the system would have a subset of records that are unable to be edited was alarming.  Happy to say that this has now changed.  Pure now lets you edit Scopus records, and is even considering allowing users to edit records for errors which would then feed back to the Scopus database!
  • Editing records is a bugbear in our current institutional repository, with many of our older records unable to be bulk edited. The Elsevier presenter said that all records can be edited in bulk. The documentation, however, states that not all fields can be edited. This was another of my advance questions, but like the analytics question I forgot to ask about it during the demonstration.
  • One of the advantages of Esploro is that research data management, in particular dynamic data management plans, is on the roadmap to be developed. Pure is not going to have research data management capabilities but is going to rely on Mendeley Data as the data management tool. I haven't heard anything about Mendeley Data, so will need to look into it to see what its capabilities are.
  • The Grants management in Pure seems to be a fairly superficial module, at least compared to the richness of the Research Master data.  However, like Ethics, I do not know enough about Grants to know if this would be a suitable system.  I do know that Esploro will interoperate with Pure, so if the Research Office chooses to use Pure for Grants and we end up using Esploro for our institutional repository, they will at least talk to each other.
On the whole, Pure is a system that could be used as long as you make allowances for its limitations - in reporting, institutional repository functions, collections, Ethics, Grants and research data. No system is perfect, however some systems are more perfect than others. And I fear that Pure isn't one of them.


Thursday, 1 November 2018

CAUL Research Repository Days 2018

The 2018 CAUL Research Repository Days were held in Melbourne over 29-30 October. Although there was much discussion across many different topics, the program was very much focused on interoperability between systems, which is a trend that I have observed the IR community heading towards. With a well-running platform, repository work is less about the 'publication' and more about how systems interact with each other.

The below is a summary of some of the themes that were of particular interest to my institution and myself. 
 

CAUL Projects

Review of Australian Repository Infrastructure Project

Much of Day 1 was spent in discussion of FAIR. Drafted in 2014 but published in 2016, the FAIR principles (Findable, Accessible, Interoperable and Reusable) are a set of 14 metrics designed to determine the level of FAIRness of an output or system. In response, CAUL proposed a project in 2017 to determine how improvements to repository infrastructure could be made across the sector to increase the FAIRness of Australian-funded research outputs. The final report has just been released.

The project was structured around seven working groups, designed to examine the current repository infrastructure, international repository infrastructure developments, repository user stories, the ideal state for Australian repository infrastructure, and next generation repository tools, and to make recommendations for the possible "Research Australia" collection of research outputs. The findings of the first six groups are included in the report, while the seventh, the "Research Australia" recommendations, is due at the end of 2018.

Each working group provided a report of their findings. Most were not surprising and were generally what we have known to be the case for some time. In summary (and in no particular order), they include:
  • Although nine institutions had new generation repository software, many of the others had ageing infrastructure that perhaps had not been funded since the ASHER funding in 2007, with VITAL particularly mentioned as dropping in numbers 
  • Ageing software was identified as a weakness of repository infrastructure, as was the lack of automation and identifiers 
  • Research outputs were the most common content type in IRs, followed by theses and research data. Other output types included archival collections, journals, images and course materials 
  • Institutions were almost equally split between having an OA policy, a statement or partial policy, or no OA policy at all 
  • Only 5 institutions supported research activity identifiers (although they didn’t specify RAiDs in particular) 
  • 13 institutions had a digital preservation strategy for the IR content, with a further 3 developing a strategy 
  • Most successful initiatives have stable secure funding. 
  • Recommendation that CAUL seek consortia membership of COAR 
  • List of general repository requirements. 

The seventh group is looking at the feasibility of a "Research Australia" portal as a single entry point to a collection of all Australian research outputs. This is similar to the RUN proposal of some years ago. Views were extremely mixed regarding this. Responses included that it duplicates what we already have with Google Scholar and TROVE, questions over whether it would be OA or metadata only, the quality of metadata, and questions over unique institutional requirements. Three possibilities have been proposed - upgrade TROVE to provide all necessary reporting needs, develop a new portal harvesting repositories (similar to the OpenAIRE model), or develop a shared infrastructure.
 

Collecting and Reporting of Article Processing Charges (APCs)

Another CAUL project currently underway is the APC project, determining the cost of article processing charges for institutions. Several options are proposed. Less preferred options include creating a fund code in the institution's finance system, or querying the finance system using a selection of keywords. Another less preferred option is obtaining reports from publishers, or making them provide this information as part of the subscription agreement. What is likely to be proposed is a very manual method: extract a dataset from Web of Science, Scopus and Dimensions, either by institution or nationally; run it against the unPaywall API to find which are OA publications; dedupe on DOI; then, using the publisher list price for APCs, determine the cost of the APC payment based on the corresponding author's institution. A couple of institutions have done this calculation internally with varying results. My own use of the unPaywall API has shown it to be unreliable in terms of finding OA outputs, as false positives can be returned; however it seems to be the most promising tool to date in this respect.
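To make the proposed workflow concrete, here is a minimal sketch of the unPaywall step (Python; the input file, contact email and throttling are assumptions on my part, and the documented `is_oa` flag is the only response field relied on):

```python
# Check a deduplicated list of DOIs against the unPaywall API and flag
# which are OA. A sketch of the proposed workflow only: the input file and
# contact email are placeholders, and real runs should respect the API's
# rate limits.
import csv
import time
import requests

EMAIL = "repository@example.edu.au"   # unPaywall requires a contact email

def is_oa(doi):
    resp = requests.get(f"https://api.unpaywall.org/v2/{doi}",
                        params={"email": EMAIL}, timeout=30)
    if resp.status_code != 200:
        return None        # DOI not found, or an API error
    return resp.json().get("is_oa")

# One DOI per row, already deduplicated upstream.
with open("dois.csv") as f:
    dois = {row[0].strip().lower() for row in csv.reader(f) if row}

for doi in sorted(dois):
    print(doi, is_oa(doi))
    time.sleep(0.1)        # be polite to the API
```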
 

Retaining Rights to Research Publications

A survey of Australian university IP policies has been undertaken to identify potential barriers to the implementation of a national licence in Australia, similar to the UK-SCL licence. The key element of the UK-SCL licence is to retain the right to make the accepted manuscript of scholarly articles available publicly for non-commercial use (CC BY NC 4.0) from the moment of first publication. An embargo can be requested (by either the author or the publisher) for up to 12 months. However only 13 Australian universities have an IP policy that would be supportive of this licence. The next step recommended by CAUL is to approach Universities Australia for consideration, and to develop guidelines for the alignment of IP policies. 
 

Statement on Open Scholarship

A final CAUL project is the Statement on Open Scholarship which is a call to action around advocacy, training, publishing, infrastructure, content acquisition and education resources. The review period ends at the end of October. 
 

FAIR Data


Natasha Simons, ARDC, reported on an American Geophysical Union project designed to enable FAIR data. The project objectives were to look at FAIR-aligned repositories and FAIR-aligned publishers. There is a push for repositories to be the home for data rather than the supplementary section of journals. A commitment statement has been produced with a set of criteria that repositories must meet in order to enable FAIR data. (USC can meet about half of the requirements with the current infrastructure and policies).

In terms of Australian repositories, the AGU project may influence subsequent projects in other research disciplines. As publishers are moving away from data in supplementary sections of journals to data in (largely domain) repositories, trusted repositories (the Core Trust Seal) are becoming increasingly important.

Ginny Barbour, AOASG, proposed a new acronym, "PID+L" (pronounced "piddle"), as the essential minimum of metadata required for research outputs to be FAIR:
  • PID 
    • ORCID 
    • DOI for all outputs 
    • PURL for grants 
  • Licence (machine readable) 
Note that we are unable to do this with our current infrastructure. 
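To illustrate what the PID+L minimum looks like on an actual record, here is an entirely hypothetical example (the identifiers below are illustrative placeholders, not real outputs or grants):

```python
# A hypothetical repository record meeting the "PID+L" minimum: persistent
# identifiers for the output, the author and the grant, plus a
# machine-readable licence URI. All identifiers below are illustrative.
record = {
    "doi": "https://doi.org/10.9999/example.12345",            # PID for the output
    "author_orcid": "https://orcid.org/0000-0002-1825-0097",   # PID for the author
    "grant_purl": "http://purl.org/au-research/grants/arc/DP999999",  # PID (PURL) for the grant
    "licence": "https://creativecommons.org/licenses/by/4.0/", # machine-readable licence
}
```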

ORCID


Simon Huggard, Chair of the ORCID Advisory Group, provided a snapshot of the ORCID Consortium in Australia. There are 41 organisations in the consortium, with 32 integrations completed (by 29 consortium members). The most popular approach is custom integration, followed by Symplectic, Pure, Converis, IRMA, ScholarOne and VIVO. Seven institutions have done full ORCID authentication integration, so that researchers can sign into ORCID using their institutional credentials. Currently there are 90K Australian researchers registered with an ORCID, up from 30K at the beginning of 2016.

The ORCID Consortium has developed a Vision 2020 which aims to have all active researchers in Australia holding an ORCID iD, and all using their ORCID throughout the research lifecycle. It also aims for the ARC and NHMRC to integrate ORCID into their grant management systems (which they have done, and which will be live in the next couple of weeks), and, where possible, for government agencies to draw upon ORCID data for research performance reporting and assessment.

There are challenges in integrating ORCID institutionally, most common being private profiles (early profiles were set to private by default) and synchronisation issues, particularly duplicates where metadata may be slightly different in varying source data. Another challenge is getting ORCID to be displayed in IRs. When asked about this, the ARC replied that although this is a requirement of their OA mandate, at present it is not a problem although it will be in the future.
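For what it's worth, the read-only half of an integration is straightforward against the public ORCID API; the sketch below (Python, using ORCID's published example iD, and my understanding of the v3.0 JSON field names) pulls the public works list for an iD. Write integration - pushing IR records into researcher profiles - requires member API access and OAuth, and is considerably more involved:

```python
# Pull the public works list for an ORCID iD via the read-only public API.
# Field names reflect my reading of the v3.0 JSON schema; the iD used is
# ORCID's published example account.
import requests

ORCID_ID = "0000-0002-1825-0097"

resp = requests.get(
    f"https://pub.orcid.org/v3.0/{ORCID_ID}/works",
    headers={"Accept": "application/json"},
    timeout=30,
)
resp.raise_for_status()

for group in resp.json().get("group", []):
    # Each group is one deduplicated work; take its first summary record.
    summary = group["work-summary"][0]
    title = summary["title"]["title"]["value"]
    year = (summary.get("publication-date") or {}).get("year") or {}
    print(year.get("value", "????"), title)
```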

Digital preservation


Jaye Weatherburn, University of Melbourne, gave a keynote presentation on digital preservation and the role that libraries, in particular IRs, need to play in this. Digital preservation is a series of managed activities necessary to ensure continued access to digital materials for as long as necessary. There are several reasons for looking at digital preservation - decay of storage media, rapidly advancing technology leading to obsolescence, fragility of digital materials, and protection against corruption and accidental deletion. A digital preservation strategy can be used to monitor these risks. Long term preservation however is not a 'set and forget'. It is an iterative process to ensure the life of a document is maintained. Without digital preservation there is no access to materials in the long term.

It should be noted that our IR doesn’t ‘do’ digital preservation beyond saving PDF files of outputs where available, along with metadata. The FIA collection does digital preservation slightly better, in that the PDF/A standard is used for master representations. While the Herbarium perhaps does it the best, with RAW, TIFF and JPG files being saved for each image. However, without a digital preservation system such as Rosetta, we are not so much preserving our digital data but rather just backing it up to protect against deletion.

Closely aligned with this theme of preservation is that of trustworthiness of a repository (which also includes the organisation). There are two frameworks that are commonly used for examining the trustworthiness of repositories - the Core Trust Seal, and the Audit and Certification of Trustworthy Digital Repositories based on ISO16363. Both can be self-assessed and provide a good means of documenting gaps, although the Core Trust Seal is less intensive on resourcing and time. This is something that I have been keen to do for USC since I first heard about it at the iPRES conference in 2014 and is something I will complete once a decision is made regarding a future system.

Below is a word-cloud of what attendees thought digital preservation meant to them:
 

Other interesting things: 

Idea of incentivising scholarly communication via cryptocurrency.

Chris Berg, RMIT, opened with a keynote on blockchains as a tool to govern the creation of knowledge. Blockchains are economic infrastructure on which new forms of social organisation can be built. Chris states that academic publishing is a subset of a general problem that has afflicted publishing and the knowledge economy since the invention of the internet. The RMIT Blockchain Innovation Hub project had the idea of incentivising scholarly communication via cryptocurrency - a token to pay and reward for peer review, sharing citations, reading, etc. In terms of economic modelling, journal publishing can be viewed as a 'club'. The aim of the project was to bring transparency to the peer review process, provide digital copyright authentication and verification, and to provide incentives and rewards for the different aspects of the scholarly communication lifecycle. Enter 'JournalCoin'… Subscriptions, article processing fees and peer reviewers could be paid by JournalCoin, and rewards for such things as fast peer reviews, formatting, royalties, rankings and citations paid via JournalCoin. The journal is then the platform upon which the incentives are paid. 

IRUS-UK pilot in Australia

CAVAL is currently running a project on implementing IRUS-UK in Australia. IRUS (Institutional Repository Usage Statistics) started in the UK in 2012 and sought to provide a standards-based service with auditable usage data. The aim was to reduce duplication of effort by IR managers and present a uniform set of usage data regardless of the IR platform. IRUS data is COUNTER-compliant. IRUS-UK now does this for about 140 IRs in the UK. A pilot has been underway in Australia involving University of Melbourne, Victoria University, University of Queensland, University of Sydney and Monash University to evaluate the usefulness of IRUS in Australia. Several of these institutions reported on their experience, which was largely positive. One advantage of the IRUS statistics is that they exclude 'false positive' metrics, resulting in slightly lower statistics than the native IR ones. CAVAL reported that if usage of IRUS goes ahead, maximum benefit will be realised if the majority of Australian universities participate and individual universities will be able to benchmark against each other. 

Social Media campaigns

Susan Boulton, GU, provided an outline of a pilot the Library ran to promote their IR through social media. By using national/international events (such as World Malaria Day, Sustainability Week, and Dementia Month), blog posts and social media mentions were written showcasing the research in their IR. To prepare, time was spent planning, sourcing open access content, identifying champion event owners, and preparing the social media material. These small social media events provided a significant jump in IR traffic and downloads. Another benefit was the improved relationship between researchers and the Library, as researchers could see another value-added service.

Thursday, 21 December 2017

The library's role in Research Information Management

Research information management (RIM) has traditionally been the domain of the research office, with little input from other departments.  However, current thinking and systems integration are turning this notion on its head.

An OCLC report, Research Information Management: Defining RIM and the Library's Role [1], published in October 2017, aims to place libraries not only as a stakeholder in the RIM space, but as an active participant.  What follows is a summary of this report.

But what is meant by RIM?  According to the OCLC report, RIM is the “aggregation, curation and utilisation of information about research”, or “institutional curation of the institutional scholarly record”.  Another very succinct definition by Science Europe is that it is the “data about research activities, rather than the research data generated by researchers.”  An important thing to note is that RIM data is metadata only – it rarely, if ever, includes the actual artefact being described.  For example, RIM publication data is at the metadata level, and doesn’t include the actual publication itself.  This is usually held in a different system, such as an institutional repository.

No matter how it is defined, over the years institutions have developed many systems, practices and workflows to capture the varied types of information sourced from many different areas of the institution.  At my own institution, RIM activity is sourced from four enterprise systems and an unknown number of local departmental systems, owned variously by the research office, library, human resources, finance and the faculties themselves.  Not all of this information comes together in one place, but much of it does, with faculties, libraries and the research office selectively (and often manually) supplementing the data with additional information if the situation requires it.  This lack of interoperability between systems is something that many institutions, including my own, are struggling with.  Having such disparate systems, often managed by different departments with different workflows, creates huge barriers to the flow of RIM metadata.  Collaboration and communication between stakeholders is key to effective RIM management.

So what can, and what does, the library bring to the table?  The OCLC report states that RIM systems collect and store metadata on research outputs and activities including those shown in the figure below.

Source: 2017. Research Information Management: Defining RIM and the Library’s Role, pg 6.

This data can be utilised in a variety of ways including academic progress, grants management and in researcher profiles, to name a few.  This is best displayed in the figure below from the OCLC report.

Source: 2017. Research Information Management: Defining RIM and the Library’s Role, pg 8.


So, where does the library fit into this?  Traditionally libraries have been viewed as places of collection development and not much else.  My institution isn't too different.  There is a huge student focus in the activities of the librarians, to the extent that many stakeholders are only just realising the value that the library can bring to RIM discussions, and unfortunately some don't realise it at all. However, libraries are hotbeds of expertise in scholarly communication and they should be pushing to be recognised as such.

The report segregates library expertise into four groups: publications and scholarship; training and support; discoverability, access and reputational support; and stewardship of the institutional record. All are critical ways in which the library can support RIM activities.

Publications and Scholarship Expertise

Metadata, bibliographic records, publication indexes and more are all bread and butter to many librarians.  It is the stuff they have been doing since graduating from university (and in many cases, before).  This is what librarians do, and they do it well.  They have the relationships with the vendors that supply citation products, many of which also now have their own RIM systems.  They are knowledgeable about trends in the publishing sphere, and in my experience, are extremely skilful in identifying and correcting duplicate or incorrect bibliographic metadata for research publications or indeed about the researchers themselves. 

Training and Support for End Users

Librarians are expert trainers, having trained scores of students throughout their degrees in bibliographic searches, data management, citation management, publishing and open access, and research metrics.  While at my institution librarians do not train directly in RIM, I don’t think anyone else does either.  This leaves a perfect opening for the librarians to get involved and fill the gap.

Discoverability, Access and Reputational Support

In an age where reputation is everything, libraries are uniquely situated to provide reputational impact for institutions.  Research publications are a major output of universities and they are largely externally facing, meaning that the reputation of an institution is judged in part on the quality of the research being conducted.  This is largely communicated via publication outputs (although impact metrics and research data are fast becoming legitimate indicators of reputation).  Libraries, through institutional open access repositories, provide data to profile pages, help the institution comply with funder open access mandates, make outputs more discoverable (thereby providing the potential for higher citation rates), and offer a wide range of bibliometrics and altmetrics that support researcher reputation.

Stewards of the Institutional Record

Libraries, including my own, play an integral role as stewards of the institutional record.  Receipt, curation, discoverability and preservation of scholarly outputs, as well as archival material, are a core component of the library's work.  This is often an outward-facing collection that is the basis for a large amount of institutional reporting.


The library has a great deal to offer institutions in the research information management space, much of which goes unnoticed by potential stakeholders in those very institutions. RIM reporting benefits institutions in many ways, providing insights at the school, faculty or institutional level, benchmarking and collaboration information, and impact and engagement narratives.  I recommend that anyone working in the RIM space have a read of the OCLC report, particularly those in the library field, although any RIM stakeholder will find the information valuable. 





[1] Bryant, Rebecca, Anna Clements, Carol Feltes, David Groenewegen, Simon Huggard, Holly Mercer, Roxanne Missingham, Maliaca Oxford, Anne Rauh and John Wright. 2017. Research Information Management: Defining RIM and the Library's Role. Dublin, OH: OCLC Research. DOI: 10.25333/C3NK88

Saturday, 22 July 2017

Your online scholarly identity...why is it important?


There are so many tools out there now to help researchers build their online identity – ORCID, Google Scholar, ResearchGate, Academia.edu, Twitter, Facebook, LinkedIn, and the list goes on. But what do we mean by ‘online identity’ and why is it so important to researchers? (A side question could also be why we, as librarians, care about this when obviously, many researchers don’t? But that is another story altogether).

So, what do we mean by online identity? Everyone who has had anything to do with the internet has an online identity. Type your name into Google and most people will get at least one hit on their name, with some people getting many hits. But why is this important? Your online identity is what people who don't know you (but know about you) search for – prospective employers, colleagues, rivals, people you have just met at a conference. From a professional viewpoint, it is important that these people find your information and, most importantly, find accurate information easily.

Maintaining and curating your professional identity is the key to achieving this. But in a minefield of options, which ones do you choose?

You have probably gravitated to products that you either used in a previous life, or that your colleagues are already using. However, no matter which product (or products) you use, it is critical that they be kept up to date. There is nothing worse than searching for and finding someone's profile, only to discover that the most recent publication listed is from four years ago, or that where they say they work is clearly no longer the case.

Where I work, we have three profiles that we recommend researchers keep up to date – ORCID, a Google Scholar profile and, of course, the institutional repository.

[Note: for the purpose of this I am discounting social media such as Twitter, Facebook and LinkedIn – all of which are important in disseminating information about yourself and your works.]



ORCID has quickly risen to be a universal identifier for scholarly outputs, and why wouldn't it? It is free to use, independent, and has a fantastic development team behind it. Being a truly independent identifier also means that it easily integrates with external systems (unless you work where I do, where nothing really seems to integrate with anything else – but that is a whole other issue). With ORCID you can upload your outputs, or connect to another data source (such as Scopus or Research Data Australia) and any of your works listed in these data sources will be imported into your profile. You can add grant information, education and work details, as well as provide links to other online profiles that you maintain. By sharing your ORCID iD URL with colleagues you can provide a quick, one-stop shop for all professional information about yourself.
The second online profile we recommend curating is a Google Scholar My Citations profile. Google Scholar is big... not as big as Google, but in academic circles it is still pretty awesome. Having a Google Scholar profile set up is one key way for other people to find your research. Of course, it relies on your publications being indexed and available in Google, although there is the option of manually adding selected metadata for those that aren't. A word of warning – it is very easy to accidentally add publications that aren't yours to your profile, or for someone else to add your publications to their profile. Careful checking and periodic searches of your works may be advisable.


Which brings us to the old favourite, the institutional repository. Love it or hate it, it is here to stay and likely tied into your institution's reporting requirements. So, love it or hate it, you may be required to use it. I love our institutional repository, even the funny quirks that make it frustrating to work with; it serves a purpose and a function in our academic community. And it is easy for our researchers. All they need to do is send in the metadata of their recent publications and the Research Collections team does the rest! It is a way for researchers to archive their outputs, preserve them, make them open access, and it provides a nice link that they can send out to colleagues. It is also indexed by Google, so it is readily discoverable for anyone searching for your publications. I can't speak highly enough of institutional repositories around the world – they are by far the best way to promote your research.
This then leaves us with the 'baddies' of the online scholarly identity world – ResearchGate and Academia.edu. I must admit that each of these has its place in the scholarly identity world, and both are wildly popular in various disciplines. My reasons for disliking them are purely selfish – they are a rival to our institutional repository. And in this day and age where everyone is time poor, why invest your energies in keeping these up to date as well as the other critical profiles? ResearchGate and Academia.edu are both proprietary products that could be turned off at any time. They have no preservation strategy and no commitment to keeping your work safe. If you wish to use them to disseminate your works, then please, please, please do so as an additional method to those listed above.



So now that you have all these methods for people to find information about you and your works, what do you do? The key is to spend some time each week, month, or couple of months (depending on your frequency of publication) updating them. As stated above, there is nothing worse than colleagues or other interested people finding an out-of-date profile. By keeping these updated, you are presenting your best self to those who want to find you.