How to archive your data

Posted in: GUIDE, IN DEPTH

Do you ever wonder whether archiving your data is worth the effort?  Archiving your data for the first time might seem like a big task.  But organising and preserving your data in a purpose-built archive can save headaches later.

Why should I archive my data?

Data archives perform two functions.  They keep your data safe for the long term and provide a platform to share your data.

During a project, data are usually stored in a reliable place where everyone can find them.  But at the end of a project, researchers leave and new projects take over.  Data archives give you a place to put your data where you can find them in the future.  Adding documentation to the dataset ensures you will still be able to understand the data.

Many funders now expect you to share your data where possible.  It is good practice to provide data to support a publication so readers can verify your results.  You can either cite the dataset, or provide a data access statement.  In both cases, the reference should identify the dataset and explain where to access it.  Archiving the data in a dedicated system gives you a unique identifier.  This is usually a Digital Object Identifier (DOI) which also acts as a link to the data.
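As a minimal sketch of how a DOI doubles as a link, the Python below turns a DOI into a resolvable URL and builds a one-sentence data access statement.  The DOI shown is a made-up placeholder, and the function names and the wording of the statement are illustrative assumptions, not a format required by any journal or archive.

```python
def doi_to_url(doi: str) -> str:
    """Return the resolver link for a DOI (DOIs resolve via https://doi.org/)."""
    doi = doi.strip()
    # Accept a bare DOI or one written with a leading "doi:" prefix.
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    return f"https://doi.org/{doi}"


def data_access_statement(archive_name: str, doi: str) -> str:
    """Build a sentence that identifies the archive and says where to get the data."""
    return (
        f"The data supporting this paper are available from the "
        f"{archive_name} at {doi_to_url(doi)}."
    )


# Placeholder DOI for illustration only; use the one the archive gives you.
print(data_access_statement("University of Bath Research Data Archive",
                            "10.1234/example-00001"))
```

Check your journal's guidance before reusing the wording, as some journals prescribe their own form for data access statements.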

For many researchers, getting a DOI is the reason they archive their data.  Planning ahead to archive your data can save a last-minute rush to make them available before publishing deadlines.

Finding an archive

There are two main options for archiving your data:

Subject archives

Some research areas have well-established subject archives for data.  It is best to check whether an archive exists for your type of data.  These archives will usually collect detailed information about the data.  This makes it easier to find and combine similar data.

Generic archives

Universities and other organisations run generic archives to enable researchers to share their results.  The Library runs the Research Data Archive for University of Bath researchers.  You can read more about how the Archive is being used in the sister post to this one.

Read more: Selecting an archive

Using the Research Data Archive

To use the Research Data Archive, create a dataset record in Pure.

This can be basic - just title, creators and year.  Funding information is also helpful at this stage.

We then import your record into the Research Data Archive.  At this point, we tell you what the DOI will be so you can include it in your paper.

Preparing your data

When you prepare your data, you will need to consider two things.

Format

Save your data in open, standard formats if possible.  This makes files accessible to more people and can ensure the data remain usable in the long term.

Structure

If you have many files, it's best to arrange them in a folder structure on your computer.  Then create either a .zip or a .tar.gz file of the folder and upload this to the archive.  A single file is easier for users to download.
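If you prefer to script this step, the sketch below packs a folder into a single .zip or .tar.gz file using Python's standard library.  The function name bundle_dataset and the folder name in the usage note are hypothetical; the sketch assumes the whole folder, including its top-level name, should go into the archive.

```python
import shutil
from pathlib import Path


def bundle_dataset(folder: str, archive_format: str = "zip") -> Path:
    """Pack a dataset folder into one archive file placed next to it.

    archive_format is "zip" for a .zip file or "gztar" for a .tar.gz file.
    """
    source = Path(folder).resolve()
    if not source.is_dir():
        raise NotADirectoryError(f"{source} is not a folder")
    # base_name is the archive path without its extension.  root_dir and
    # base_dir keep the top-level folder name inside the archive, so the
    # files unpack into one tidy folder.
    created = shutil.make_archive(
        base_name=str(source),
        format=archive_format,
        root_dir=str(source.parent),
        base_dir=source.name,
    )
    return Path(created)
```

For example, bundle_dataset("my_dataset") would write my_dataset.zip alongside the folder, and bundle_dataset("my_dataset", "gztar") would write my_dataset.tar.gz.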

Read more: Organising your data

Describing your data

To make your dataset useful, you need to describe it.  Some description helps users find your data.  The rest helps them understand your data.

Title

Give your dataset a clear, descriptive title.  If it supports a specific paper, using "Dataset for [title of paper]" is effective.  Otherwise try to describe the data in a few words, e.g. ENLITEN household dynamic study datasets.

Abstract and Lay Summary

You should provide an abstract that describes the data.  This should cover the main aspects of the data so a visitor can decide whether it is relevant.  You can write this for an expert audience, but make sure to explain acronyms that are not standard.

If you expect your data to be interesting to the general public, consider writing a lay summary.  This should explain your data in plain language, and help a user to decide whether to explore the data.

Documentation

You need to write documentation explaining what the data are and how you created them.  This doesn't have to be long, but should cover topics such as:

  • collection method
  • equipment used
  • software needed to open or interpret files
  • column headers and definitions
  • units of measurement (if not included in column headers)
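One way to make sure none of these topics is forgotten is to start from a template.  The sketch below writes a plain-text README skeleton with one heading per topic in the list above.  The file name README.txt, the layout, and the function names are assumptions made for illustration; the Archive does not prescribe them here.

```python
from pathlib import Path

# Headings taken from the checklist above.
SECTIONS = [
    "Collection method",
    "Equipment used",
    "Software needed to open or interpret files",
    "Column headers and definitions",
    "Units of measurement",
]


def readme_template(title: str) -> str:
    """Return a plain-text README skeleton with one heading per topic."""
    lines = [title, "=" * len(title), ""]
    for heading in SECTIONS:
        # Each section starts with a TODO marker to be replaced by real text.
        lines += [heading, "-" * len(heading), "TODO", ""]
    return "\n".join(lines)


def write_readme(folder: str, title: str) -> Path:
    """Write README.txt into the dataset folder without overwriting an existing one."""
    path = Path(folder) / "README.txt"
    if path.exists():
        raise FileExistsError(f"{path} already exists")
    path.write_text(readme_template(title), encoding="utf-8")
    return path
```

Creating the file when you start collecting data, then filling in each TODO as you go, keeps the documentation close to the files it describes.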

Read more: Describing your data

Detailed metadata

For some data, it is useful to include details such as geolocation or dates of collection.  Do include this information if it is important to your dataset.

You'll also want to include information about accessing the data.  This is important if they are not open access.  Describe access conditions if users need to make special arrangements to get the data.  Add a licence to help end users make decisions about whether to use your data in their study.

You should let us know by email if there are restrictions on your data.  We can record these so they are clear when future decisions are made.

Read more: Archiving your data

Going public

A DOI needs to link to a landing page, even if the data files are not downloadable.  You have several options for timing the release of data:

  • Make the data available and activate the DOI straight away
  • Make the metadata available and activate the DOI straight away, but embargo the files
  • Wait until the paper is online, then make the data available and activate the DOI

The third option is the most common, but the one you choose depends on journal policies.  The Library Research Data team can help you choose the most appropriate option.

Get started today

Archiving your data can be quick if you've prepared the files and documentation.  Here are our top five tips:

  1. Link to funding information in Pure if possible.  If not, email research-data@bath.ac.uk with details, or with confirmation that the work is not externally funded.  We can predict your DOI within minutes when you provide this information up front.
  2. Have your data files ready to upload.  A zip file with a folder structure is usually the easiest format for larger datasets.
  3. If you are working with collaborators, make sure you know who has which data.  Arrange access for collaborators or agree one person to upload data for the group.
  4. Write documentation when you collect the data.  Use a template or headings to ensure you don't forget details.
  5. Decide when you want to make the dataset available.  Communicate this to the Archive team.

If in doubt, email research-data@bath.ac.uk and we’ll be happy to help!

 

Written by: Lizz Jennings
