Friday, 11 April 2014

The University Repository Project - the What and Why

This entry outlines the purpose and benefits of the University Repository project.

The key systems


Historically, one of TULIP’s functions has been to act as the University’s publications database, holding bibliographic details of all research publications and outputs produced by University staff.

Bibliographic data held in TULIP are used for purposes such as REF submissions and providing details of publications for personal web pages and the PDR system.

The TULIP publications database was designed as an internal-facing system, rather than as a system to provide bibliographic details to users external to the university, and does not enforce any specific format for storing bibliographic details.

The Repository

Institutional repositories are online databases of publications, and can include metadata-only records (i.e. just the information about research publications) and metadata-and-full-text records.

The University Repository runs on the EPrints software, developed at the University of Southampton. It is the most widely used repository software in the UK.

The Repository is specifically designed to host and make publicly available bibliographic data in a properly structured fashion. It can also help make the outputs of university research more widely available by hosting the full text of outputs.


Interplay between the two systems

The University requires a publications database, a single source of data on its research outputs (currently provided by TULIP). It also needs to be able to comply with the open access requirements of the next REF (for which it will need an institutional repository).

It is inefficient to require staff to enter and maintain publication details in two separate systems. The Repository project therefore aims to integrate the two systems.

The purpose of the project

The project will see the research publications data currently held in TULIP transferred into the Repository, so that the Repository will hold the publication details of all the University’s outputs. By the project’s conclusion, there will therefore be only one system, holding both the details of publication outputs and, where desired or necessary, the full text of those outputs.

Bibliographic data in the Repository will still be available to be called by TULIP for re-use in other systems, such as the PDR system, personal web pages, and so on.

This has the following advantages:

  • staff need only enter publication details into one system, not two;

  • the system they’ll be using is designed to meet the new requirements of an externally facing repository;

  • the new system will continue to feed those other systems (PDR pages, personal web pages, etc.) already in use;

  • the new system will allow for full text of outputs to be stored on a central University system and, where necessary, made available on an open access basis.

How things will be after switchover

Once publication details from TULIP have been imported into the Repository, all those details will be held in what is called the Review Buffer within the Repository. This means that the data are available for internal viewing by logged-in staff, but will not be available on the Repository’s public interface. The data will still be available to other internal systems for PDR, web pages and the other purposes outlined above. Staff can then choose whether to make all, or only some, of their records publicly available through the public Repository interface.

A word on data quality

As already noted, TULIP was not designed to impose full bibliographic standards on data collection.

One example of this is the Author field, which was collected as a single free-format text string, rather than as a list of individual names (as in the new Repository). Although most records have been split into individual names for the data transfer process, in some cases names such as “de Bolins” or “van Houten” may not be migrated correctly, “et al.” may be recorded as an actual author name, and so on.
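To illustrate why such names are awkward to split automatically, here is a minimal Python sketch. It is not the script used for the actual TULIP migration: the semicolon-separated input format, the list of surname particles and all function names are assumptions made purely for illustration.

```python
import re

# Illustrative, non-exhaustive list of surname particles (an assumption).
PARTICLES = {"de", "van", "von", "der", "den", "la", "le", "du", "di"}


def split_authors(raw):
    """Split a free-text author string on ';', '&' or the word 'and'."""
    parts = re.split(r"\s*(?:;|\band\b|&)\s*", raw)
    return [p.strip() for p in parts if p.strip()]


def naive_parse(name):
    """Treat the last word as the family name and the rest as given names."""
    words = name.split()
    return {"given": " ".join(words[:-1]), "family": words[-1]}


def particle_aware_parse(name):
    """As naive_parse, but keep leading particles with the family name."""
    words = name.split()
    i = len(words) - 1
    # Walk backwards while the preceding word is a known particle.
    while i > 0 and words[i - 1].lower() in PARTICLES:
        i -= 1
    return {"given": " ".join(words[:i]), "family": " ".join(words[i:])}


def is_placeholder(name):
    """Detect 'et al' / 'et al.', which is not a real author name."""
    return name.lower().rstrip(".").replace(" ", "") == "etal"


# Hypothetical free-text author field, as it might have been typed in.
raw = "J. van Houten; A. de Bolins; P. Smith; et al."

for name in split_authors(raw):
    if is_placeholder(name):
        print(f"{name!r}: placeholder, not an author")
    else:
        print(f"{name!r}: naive={naive_parse(name)} "
              f"particle-aware={particle_aware_parse(name)}")
```

The naive approach files “van Houten” under the family name “Houten” with “van” as a given name, and would treat “et al.” as a person if it were not filtered out. Even the particle-aware version relies on a fixed list of particles, which is one reason fully automatic cleaning is not reliable and some records will need manual correction.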

Whilst starting with fully cleaned data would be ideal, this would require significant manual intervention and so add a considerable delay to the project. We have therefore chosen to prioritise the release of the new system.

Into the future

After switchover, the Repository will become the University publications database, and bibliographic data on staff outputs will need to be entered into it. This can be done manually or by simple import from other sources of bibliographic data, for example using lists of DOIs or PubMed IDs. As outlined earlier, other systems that use these data (e.g. profile pages) will continue to work as before.
