Managing data across the institutional research lifecycle

Tagged: Training

Closing workshop poster: sustainability of project outputs

📥  Outreach

Our poster, "Research360: Sustainability of Project Outputs", is now available to download from the University of Bath publication repository. This poster was presented at the JISCMRD programme's final workshop in Birmingham on 25–26 March 2013.

The poster gives an overview of the major strands of work in the Research360 project (Roadmap & Business Case, RDM Policy & Policy Guidelines, RDM Website & Researcher Support, RDM Training Workshops). It describes the outcomes in each area, along with the continuing work which will support and further develop these outcomes going forward.

8th International Digital Curation Conference 2013

📥  Outreach

From Monday 14 to Wednesday 16 January 2013, Cathy and I attended the 8th International Digital Curation Conference (IDCC13) in Amsterdam, Netherlands.

On Monday morning, we ran a workshop, jointly with Hannah Lloyd-Jones from Open Exeter, entitled "Designing Data Management Training Resources: Tools for the provision of interactive research data management workshops". This offered participants the opportunity to learn more about the way we run our face-to-face data management training with staff and students, and to experience first-hand some of the interactive exercises we use.

These included Open Exeter's "Research Data Dating" exercise, in which two concentric circles of participants have 3 minutes to describe their research data to each other before moving on to the next person. I demonstrated how we use "clickers" (audience response systems) to survey our students during face-to-face workshops, and we also had a discussion of the pros and cons of the data management plan (DMP) templates that we've used with students.
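For anyone wanting to plan the rotation for a "Research Data Dating" exercise in advance, here is a minimal sketch of how the pairings could be worked out. It assumes both circles have the same number of people and that the outer circle is the one that moves; the exercise as run may differ on both points, and the participant names and the `dating_rounds` function are made up for illustration.

```python
from collections import deque


def dating_rounds(inner, outer):
    """Yield one list of (inner, outer) pairs per round.

    Assumption: the inner circle stays put and the outer circle steps
    one place each round, so every inner participant meets every outer
    participant exactly once.
    """
    if len(inner) != len(outer):
        raise ValueError("both circles need the same number of people")
    ring = deque(outer)
    for _ in range(len(ring)):
        yield list(zip(inner, ring))
        ring.rotate(1)  # everyone in the outer circle moves on one place


# Hypothetical participants, for illustration only
inner = ["Ana", "Ben", "Cal"]
outer = ["Dee", "Eli", "Fay"]
rounds = list(dating_rounds(inner, outer))
for number, pairs in enumerate(rounds, start=1):
    print(f"Round {number} (3 minutes): {pairs}")
```

With three people in each circle this gives three rounds of three minutes each, so the number of rounds (and the time needed) grows with the size of the circles.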

For more info, take a look at Marieke Guy's blog post, "IDCC13: Exemplar RDM Training Exercises", and Jill Evans's summary of tweets from the workshop. For those who are interested, we used PollEverywhere, a website which permits voting on questions via SMS or the web on a smartphone or laptop.

Update: You can now download Hannah's slides: Designing Data Management Training Resources: IDCC 2013 Workshop

Cathy Pink presented a practice paper entitled "Meeting the Data Management Compliance Challenge: Funder Expectations and Institutional Reality". The paper drew together lessons learnt by the Research360 project, based on its experience in meeting the varied data management needs of both an institution and its external stakeholders. The text of Cathy's paper will be available online soon.

Liz Lyon facilitated the "What is a data scientist?" symposium, an interactive panel discussion around roles and skills required to cope with the growing importance of data in scholarship. See Marieke's blog post, "IDCC13: What’s in a name? The ‘data scientist’ symposium" for more details.

Finally, we also presented a poster, "Creating an Online Training Module on RDM", about the process of developing our research data management e-learning module for postgraduate students. The poster was designed and written by our colleague Marieke Guy from UKOLN, and I gave a "minute madness" presentation summarising it as well.

There is also a searchable archive of tweets with the #idcc13 hashtag.

JISC MRD progress workshop slides & poster

📥  Events

On Wednesday 24 and Thursday 25 October 2012, Cathy and I attended the JISC MRD programme progress workshop in Nottingham. The workshop was an excellent opportunity to share the lessons we're learning on the Research360 project and learn from colleagues working on research data management elsewhere, including a number outside the JISC-funded MRD programme.

Our slides and poster from the progress workshop are now available through our institutional repository.

University of Bath Research Data web pages go live

📥  Progress updates

I'm pleased to announce that our redeveloped Research Data web pages have now been published.

We hope that this resource will be useful for researchers and support staff both within and beyond the University.

They will continue to be added to and updated as time goes on, particularly over the next 6 months or so as we finalise the University's policy on research data.

For those who are interested in such things, we've taken a conscious decision to keep the content as brief as possible and signpost visitors to good advice held elsewhere, by organisations such as the Digital Curation Centre.

There are two ways into the content:

  • The left-hand navigation bar, which appears on each page, is organised logically into categories. It also includes an email link to our main point of contact for research data, to encourage a) people to seek personalised advice; and b) feedback to improve the site further.
  • The home page is intended to signpost useful resources to help with particular activities and situations, broadly divided into stages of the research lifecycle.

Many of the pages have boxouts with crosslinks pointing to other areas of the site with related content.

Any feedback would be most welcome: contact me at

Research data management training take 2

📥  Training

On Thursday 28 June, Cathy and I ran the latest of our postgraduate workshops on research data management.

The session was structured similarly to our last workshop, though without the extended hands-on section on data management planning. The loss of this section was down to time pressures (we had an hour less this time).

We started by showing a series of statements about research data management, such as "I am satisfied that my data is safe", and asking the participants to rate (anonymously using clickers) how much they agreed or disagreed with each statement. The answers gave us an opportunity to start some discussions and get a picture of what the current level of knowledge was.

Cathy then gave a more formal presentation (slides available here) covering the major aspects of research data management, with me pitching in on the more technical bits.

We finished up by revisiting the statements from the start of the session to see how opinions had changed, and handing out some leaflets from the DCC.


Feedback from the attendees was overwhelmingly positive:

  • 95% of respondents were satisfied with the course;
  • 91% would recommend the course to others;
  • 95% found it relevant to their needs.

Some interesting answers to the question "What was most useful?" include:

  • "Need to plan for good management"
  • "Reminding me of how easy it would be to lose my data and how I should look after it better"
  • "Notes on data cycle, info on making data public"
  • "Knowing where to go for help, helping to focus my data management plan"

Actions that participants said they would take as a result of the workshop fell mostly into two categories:

  • "Prepare a management plan"; and
  • "Back up my data"

Although we don't know whether anyone carried out their actions, this was very encouraging to read, as our message had clearly got through.

Room for improvement

One participant felt that there was a bias towards science. This is understandable, since both Cathy and I have science backgrounds, and science/engineering is the main focus of Research360, but we'll see what we can do to rectify this.

Another comment referred to "lack of more contemporary ways of storing data, e.g. Dropbox". We'd intentionally steered clear of Dropbox, as the official University stance on cloud storage is still being decided. Whatever that decision turns out to be, we'll need to deal with Dropbox and other cloud tools.

There was a request for more group discussion, and I think this would be a valuable addition, so we'll try to make the next session a bit more interactive. Group discussions could usefully focus on differences and similarities between disciplines, for example, as I think people in different subjects would have quite a lot of pre-existing knowledge that they could share.

I'd also like to give the participants something concrete to do after they've left the workshop. This could be something specific like "write a data management plan", but I think there would be more likelihood of these actions being carried out if the participants take some ownership. One way to achieve this would be to wrap up the session with an action-planning section and ask each student to define their own data management goal or goals.

Reaching out to staff

Although the session was also open to research staff, only one staff member registered to attend and in the end they didn't turn up.

Helping busy research staff to gain new skills is a difficult task. They have many demands on their time, pulling in many different directions, and many already work far more than their contracted hours.

We aim to deal with this in a number of ways:

  • Providing an e-learning module which researchers can study in their own time at their own pace;
  • Developing a website with concise and practical guidance, structured around specific tasks and situations;
  • Reaching out in a variety of different ways, including:
    • A single point of contact email address for all inquiries relating to research data; this is tied into the university request tracker system, so individual requests can be passed on to the team best placed to help;
    • Presentations to Deans and Heads of Department by Professor Matthew Davidson, chair of the Research Data Steering Group and Associate Dean (Research) for the Faculty of Science, as well as an active researcher himself.

We're also in discussions with the professional services around the University to understand (and help them understand) how research data management fits into their roles and how we can provide the support they need.

Postgraduate DMP template first draft

📥  Training

A lot of people have asked for this to be made available, so here it is: Data Management Plan for PGRs v0.2. It's also available as a PDF.

This template is licensed under a Creative Commons Attribution 3.0 Unported License.

We welcome comments and suggestions for improvements, and we'd love to hear from anyone who finds it useful. I'll be updating it soon based on feedback from our students.

Research Data Management 101 — Data Management Planning

📥  Training

A few weeks ago, we got together some of our students from the Doctoral Training Centre in Sustainable Chemical Technologies to run a pilot training session on data management. As part of that, we asked them to trial a selection of data management plan (DMP) templates.

We split the students into smaller groups, and assigned each group a different template. We then allowed them about an hour to complete as much as they could, while we hovered around to answer questions and make a note of the discussions the students were having. After this, we used an audience response system (ARS or "clickers") to gather feedback on how useful the templates were, using the votes from the clickers as a starting point for discussion.

The templates

DMPonline "GenInst" template


We used the "GenInst" template in DMPonline as an example of an institutional template. As this template is aimed at Principal Investigators, students were told to skip any questions that didn't seem relevant.

The students were immediately put off by the amount of detail they were asked to input, though on a positive note, they definitely felt that this was the most comprehensive template! None of the students using this template got anywhere near to finishing it.

The students reported that very little of what they were being asked felt relevant to them, and that for at least some of the questions it was difficult to understand what they were being asked for.

DataTrain post-graduate DMP form

This is a single-page template developed as part of the DataTrain project and now available via the Archaeology Data Service.

This was found to be the quickest and easiest template to fill in, and all of the students attempting this one completed it fully. However, not all of the students felt that it was sufficiently comprehensive.

This view is borne out by a review of the completed plans: it seems that the questions as phrased don't bring out issues like backup and security.

Expanded post-graduate DMP form

This was developed specifically for this session as an expanded version of the DataTrain form, as an attempt to provide more structure and elicit more detailed answers.

Although this form took longer to complete, most of the students managed to finish it, and felt that it was comprehensive enough.

However, not all of the students found it completely relevant, and some found some of the questions difficult to understand. Both issues could probably be improved by some rephrasing.

Twenty questions about your data

This is a set of questions devised by David Shotton of Oxford University. They are arranged under the headings What, Where, How, When, Who and Why, and include examples of possible responses to each question.

These questions were considered to be mostly relevant and easy to understand, and the students had no problem completing them in the time available. The example responses made it easier to understand what was required for each question.

The only real problem was in the ordering of the questions. Because they were arranged under What, Where, etc., the students found it difficult at times to see how the questions related to each other. Perhaps because of this, the students were undecided as to whether it was comprehensive enough.

Update: Now available on the web — David Shotton's Twenty Questions for Research Data (now restructured based on this feedback)


Getting the template right

The number of students trying each template was very small (2–3), so it's difficult to draw concrete conclusions at this stage, but they have given us some hints as to how to proceed.

The DMPonline approach is attractive, because it is easy to access (being web-based) and comprehensive (mapping directly onto the DCC checklist). However, there isn’t currently a template which seems appropriate for PG students — far too much detail is required, and some of the questions that are relevant are phrased in terms that research students don't really understand.

DMPonline is specifically designed to allow custom templates to be added easily, so it should be possible to greatly improve this situation with some work. In particular, it will be necessary to either reword some of the questions or provide some detailed guidance to clarify what each one is asking for — it became apparent from some of the discussion that part of the perceived irrelevance of some questions came from difficulties understanding them.

It would be useful to be able to not only specify which questions are included in a DMPonline template, but also what order the questions appear in so that they better mirror the research workflow and relate to the aspects of data management that students will already have some experience of.

The students fared better with the shorter templates, managing for the most part to complete them. The DataTrain template seems a good option to fill in as part of an introductory DMP training session, but it needs to be augmented with further prompts, though these could perhaps be given to the students later as their understanding of their project improves.

The structure of the expanded DMP form appeared to aid students in working through all of the questions, with the resulting plans being fairly comprehensive, while the style of the questions in the Twenty Questions template, with example responses given, made it very easy to understand. These strengths could be usefully combined to produce a better template.

Action planning

One thing common to all of these tools is that they focus on recording facts about the researcher’s data. This is valuable, but doesn’t necessarily lead to action — too often, a data management plan is seen as something that is written at the start of a project then filed away.

For PG student training, we are more concerned with students developing data management skills than with producing a comprehensive data management audit for a project. It therefore seems that an action planning approach might be worth trying, along these lines:

  • Where am I now?
  • Where do I need to be?
  • What do I need to do to get there?

with the emphasis placed more on the third point than the first two. This will lead to a plan which is much easier to execute, and hopefully encourage the student to review it periodically by making it easy to measure progress against the plan.
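To make the idea of measuring progress against the plan a little more concrete, here is a rough sketch of how such an action plan might be recorded. Everything in it is an assumption for illustration: the `Action` and `ActionPlan` names, the example entries, and the choice to measure progress as the fraction of actions completed are not part of any template we have used.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    description: str
    done: bool = False


@dataclass
class ActionPlan:
    where_am_i_now: str
    where_do_i_need_to_be: str
    # "What do I need to do to get there?" is the part that carries the emphasis
    actions: list = field(default_factory=list)

    def progress(self) -> float:
        """Fraction of actions completed (0.0 for an empty plan)."""
        if not self.actions:
            return 0.0
        return sum(a.done for a in self.actions) / len(self.actions)


# Hypothetical plan, for illustration only
plan = ActionPlan(
    where_am_i_now="Data lives on one laptop",
    where_do_i_need_to_be="Data backed up and described",
    actions=[
        Action("Back up my data", done=True),
        Action("Prepare a management plan"),
    ],
)
print(f"{plan.progress():.0%} of actions complete")
```

Keeping the first two questions as short free-text notes and the third as a checklist reflects the emphasis on actions: the checklist is the part a student would revisit at each review.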

Research Data Management 101 — Intro & definitions

📥  Training

On Wednesday 15 February, we ran our first workshop/focus group with PhD students from the Doctoral Training Centre for Sustainable Chemical Technologies. This is the first of a series of posts summarising the outcomes of that event.


We had three aims for this session:

  • To introduce the participants to data management planning and have them start writing their own data management plan (DMP);
  • To better understand their current knowledge so that we can plan future training activities;
  • To get feedback on what DMP template would be appropriate for PGR students.

We ran the session with 10 students in the 2010 cohort, who all started in October 2010 and are currently in the first year of their PhD proper, having completed an MRes in 2011. We also invited the 2009 cohort (in their second PhD year), of whom 3 volunteered.

The session consisted of an introductory presentation, given by Professor Matthew Davidson, followed by a hands-on session during which the students worked through a DMP template with support from Cathy Pink and me. Our colleagues Kara Jones and Katy Jordan from the library were also present, and made notes on what was discussed.

Data management definitions

Early on in the session, we split the students up into groups of 2–3 and asked them to discuss what they understood by a handful of common data management terms. Here's what they came up with:

Data
There was general consensus (as you might expect from a single-discipline group) that data is information gathered directly by experiment, survey, etc. for the purposes of research. It became clear that with more thought, ‘data’ isn’t a hard-edged concept: processing data can produce new data, metadata is also data, and so perhaps are the samples from which experimental data were derived.

Metadata
Metadata was described as data behind the data you want to use that gives context and background details. It was noted that this is distinct from the data itself. Chemistry is relatively rare in having a strong history of using metadata in the context of depositing crystallographic data.

Secure storage
The students immediately identified the two sides of security: both guaranteeing that data is (and remains) accessible to those who create and use it, and that it cannot be accessed without permission. It was generally agreed that your required level of security depends on how sensitive your data is.
The most important aspect was seen as ensuring access for the researchers who created the data. Raw data was perceived as not being of much interest to third parties, but a need to better preserve and share experimental protocols was identified.

Intellectual property
It was generally accepted that, for PGR students, the university owns their data and the intellectual property therein. We're hoping to clarify this with our legal team soon, as Bath is unusual in leaving ownership of "scholarly outputs" with the originators; it would be useful to know whether we define data as a scholarly output now. Good data management practice was identified as one way to create a 'paper trail' to prove ownership of ideas in the event of a patent dispute.


Katy Jordan made an interesting comment in her notes:

"Listening in, it struck me quite forcibly that this session needs academics from the relevant department(s) to lead it.  A good level of familiarity with the field, its processes, the department itself, and the way research is carried out, is required to make the session meaningful for the students."

It's occurred to me (and others) before that although the core skills of research data management are mostly discipline-independent, there is a strong need to provide "discipline-flavoured" training sessions, with relevant examples and expertise to ensure that the participants can relate to the content.

We'll be following up soon with more posts on the later part of the session, particularly a discussion of the DMP templates the students tried. Watch this space!