Library: Archiving and sharing data: Choosing an archive

Benefits of archiving data

Archiving research data means submitting it to a data centre, archive or repository where it will be protected in the long term against loss, deterioration, unauthorised or inappropriate access, and future incompatibility. Archiving is a necessary first step towards data sharing, but it is still important to archive data even if you do not plan to share them with others.


Benefits of archiving your data include:

Your research data will be stored safely and securely in the long term: you can't keep your data on University-managed servers indefinitely.
When you archive your data in a data archive or repository, your dataset will be given a persistent identifier, such as a Digital Object Identifier (DOI), which means that you can share your data in a robust way. Your data will also be discoverable through a variety of search engines.
By archiving your data you will be complying with University and funder data policies.
Once your data are uploaded to a data archive or repository, the administrators of that archive are responsible for managing your data and can, if you wish, manage access to your data on your behalf.

Choosing which data to archive

It might seem safest to keep all of the data that you generate during your project, but doing so can cause problems. For example, temporary and intermediate processing files can clutter up your file system and make it harder to find the files that you actually want to use. Without robust version control, you might also end up using older versions of files by mistake. In addition, if you are generating large quantities of data, you run the risk of exceeding the limits of your storage devices. Buying additional space can be costly, so look carefully at whether you need to keep all of your files or whether you can delete some of them.

In general you should keep:

All data underpinning publications
Data that cannot be easily reproduced, or would be too expensive to reproduce
Data that are of potential future importance to your research field
Data that are re-used regularly by your group or in your field
Data that must be stored for policy, legal or contractual reasons

The table in our Weeding Data guide provides some examples of data that you might choose to keep and which you might choose to delete.

Guidance on choosing a data archive (repository)

The only funder that stipulates where data must be archived at the end of a project is the Natural Environment Research Council (NERC). All other funders and journal publishers allow you to use an institutional archive, such as the University of Bath Research Data Archive, for archiving and sharing data. As well as allowing the use of institutional data archives, they also provide guidance on suitable data archives.

Our guide on finding and reusing research data lists all of the major data archives recommended in the funder and publisher guidance below.

BBSRC list of suitable repositories
Whilst the BBSRC does not specify any particular data archive, it does provide a list of data repositories for its grant holders.
CRUK data sharing guidance for clinical researchers
Initiatives and repositories to support clinical researchers with data management and sharing.
CRUK data sharing guidance for Population Research Committee researchers
Initiatives and repositories to support population health researchers.
CRUK Discovery Research data sharing guidance
Initiatives and repositories to support researchers in discovery research with data management and sharing.
European Research Council document on open research data and Data Management Plans
This guidance document contains extensive information about various discipline-specific repositories.
NERC Data Centres
NERC grant holders are required to submit their data to the most appropriate of the NERC Data Centres.
NIH's supported or recommended repositories
NIH table of Data Sharing Policies
Contains a list of recommended repositories for specific data types
UK Data Service ReShare Data Repository
ESRC grant holders are required to submit their data to the UK Data Service's ReShare data repository or an 'appropriate responsible digital repository (which includes institutional archives or repositories)' within three months of the end of their grant.
Wellcome Trust approved repositories
List of approved repositories for data from Wellcome Trust funded projects.
PLOS recommended archives
Recommended archives / repositories for PLOS journals. It is also acceptable to use an institutional data archive or repository.
Scientific Data (Nature) list of recommended data repositories

If you are considering using an interdisciplinary data archive or repository to preserve and share your research data, we recommend that you use our institutional data archive, the University of Bath Research Data Archive. It is free to use (up to 1TB) for University of Bath staff and students, and expert Research Data Librarians will support you throughout the process of depositing your datasets.

Major funders and journal publishers recommend the following interdisciplinary archives; we have marked our own recommendations with an upload icon next to the archive name in the list below. All of these archives provide a persistent identifier for datasets and open access to datasets.

University of Bath Research Data Archive
Institutional data archive for the University of Bath. Free to use for all University of Bath researchers.
Dryad repository
General data repository used mainly for life sciences data.
figshare
General data repository for a wide range of research outputs including datasets.
Zenodo
General data archive that also supports archiving and sharing snapshots of code, which can be ingested directly from GitHub.

We have extensive guidance on finding discipline-specific data archives in our 'Finding and reusing research datasets' guide. The links below will take you directly to discipline-specific guides to finding suitable data archives.

If you are planning to preserve your data in an external archive, the following features indicate a reliable, good-quality data archive or repository:

Subject focus	The subject focus of the archive is suitable for your dataset
Reputation	The archive has a good reputation and is recommended by your funder or journal
Metadata	The archive requires you to enter detailed information about your dataset and to upload documentation
Persistent identifier	The archive will issue a Digital Object Identifier (DOI) or accession number for your dataset
Access restrictions	The archive allows you to embargo or restrict access to your dataset if this is needed for confidentiality
Intellectual property	The archive does not require you to transfer rights to the data (avoid archives that do)
Licences	The archive offers a range of licences for your data that comply with the University's Research Data Policy
Funding	The archive is well funded and is likely to still be in operation in 10 years

For more guidance, see the Digital Curation Centre's 'Where to keep your research data' checklist (external website).

For advice on the suitability of a given archive contact the Library's Research Data Service.

Each archive will have its own process for depositing data.

Once you have deposited your data, create or update the record for the dataset in Pure. In the 'Data availability' section, enter the name of the archive you used as the publisher and, if your dataset has been assigned a DOI, enter it in the appropriate field.

You can link records held in the University of Bath Research Data Archive to those held in external archives, if they are related to each other, or are from the same project.

Archiving collaborative data

If you are collaborating with others within the University, you can all be involved in the archiving process. If you are working with external collaborators, we recommend that the lead organisation takes responsibility for co-ordinating data archiving, either in a single repository or in multiple repositories where the data records can be linked together.

The lead researcher should register the datasets in Pure.
Let us know the University of Bath usernames for the collaborators who should have access to the dataset record, and when we set up the data record in the University of Bath Research Data Archive, we will ensure that you can all edit the information and upload files.

Archiving non-digital data

Just as with digital data, you must register in Pure any non-digital data that underlie your published findings. The same principles for selecting which data to archive apply to non-digital data as to digital data. Where you have both non-digital and digital versions of data, you should normally retain the non-digital original as the version of record. If, however, you have digitised your data according to a documented procedure, performed systematic quality control, and can back this up with a log of who did what and when, you can retain the digital copy and dispose of the non-digital original.

A limited amount of space is available in the University Records Centre for storing non-digital data. When depositing materials, you will need to pack them in one or more archival-standard boxes and sign a records transfer form for each box. For more information, contact the University Records Manager.

When registering non-digital data in Pure, under 'Data availability' fill out the section marked 'locally-held data'.