

Object stores

Although my involvement in Research360 is at the level where technology and people interact, I’m also doing my best to understand how our infrastructure is developing at a much lower level so that I’m in a position to better advise non-technical stakeholders.

Bath University Computing Services (BUCS) are currently in the process of procuring a new file store which works in a very different way to our existing storage systems, and I recently had the opportunity to learn more about it from our Database & Systems Manager, Paul Jordan. Since this is a very new area for me, my apologies to you and him for anything that I’ve got wrong.

Like our existing storage, this will be arranged into tiers, with Tier 1 containing the most expensive storage with the quickest access times, and lower tiers providing slower but cheaper storage. Data will be moved between tiers automatically (and invisibly to users) based on configured policies.
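To give a feel for what a tiering policy might look like, here is a minimal sketch in Python. It assumes policies are based on how long a file has sat idle, which is only one possibility; the thresholds, the `StoredFile` type and the function names are all hypothetical and not taken from the system being procured.

```python
from dataclasses import dataclass


@dataclass
class StoredFile:
    name: str
    days_since_access: int
    tier: int = 1  # everything starts on the fastest tier


# Hypothetical policy: (minimum idle days, target tier), coldest rule first.
POLICY = [(365, 3), (30, 2), (0, 1)]


def target_tier(days_since_access, policy=POLICY):
    """Return the tier of the first rule whose idle threshold is met."""
    for min_idle_days, tier in policy:
        if days_since_access >= min_idle_days:
            return tier
    return 1


def apply_policy(files, policy=POLICY):
    """Move each file to its target tier; return the names of files moved."""
    moved = []
    for f in files:
        new_tier = target_tier(f.days_since_access, policy)
        if new_tier != f.tier:
            f.tier = new_tier
            moved.append(f.name)
    return moved
```

Running `apply_policy` on a schedule is what would make the movement invisible to users: the file keeps its name while only its `tier` changes.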

Where this new storage differs from our existing systems is that the lowest tier will not be a tape carousel, but an “object store”. Where a traditional file system stores data in an ordered, hierarchical way, an object store stores individual data objects in a flat namespace.
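The difference between the two models can be sketched in a few lines of Python. This is purely illustrative: the dictionaries, keys and function names are made up for the example, and a real object store would of course do far more than a dictionary lookup.

```python
# Hierarchical view: nested directories, where each level is a real container.
file_system = {
    "projects": {
        "research360": {
            "report.pdf": b"%PDF",
        },
    },
}

# Flat view: a single namespace. The "/" in a key is just another character.
object_store = {
    "projects/research360/report.pdf": b"%PDF",
    "projects/research360/data.csv": b"a,b\n1,2\n",
    "projects/other/notes.txt": b"hello",
}


def read_file(tree, path):
    """Walk the tree one directory at a time until the file is reached."""
    node = tree
    for part in path.split("/"):
        node = node[part]
    return node


def read_object(store, key):
    """Fetch an object with a single lookup by key; nothing is traversed."""
    return store[key]


def list_prefix(store, prefix):
    """Emulate a directory listing by filtering keys on a shared prefix."""
    return sorted(key for key in store if key.startswith(prefix))
```

A translation layer that presents an object store as ordinary folders would be doing something along the lines of `list_prefix`: the folders exist only as a convention in the key names.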

The major advantage of this is that much more of the available space on the physical disks can be used to store actual user data: the overhead is much lower than for traditional filesystems. By virtualising storage across a network in a new way, it’s also much more scalable than anything we currently use. We could easily grow this to the petabyte level or expand out into the cloud if need be.

Now, most users need never know that their data is stored in an object store, just as they don’t need to know whether the disks were made by Hitachi or Western Digital. An extra layer on top does some translation, allowing you to store files over the network just like any other network-attached storage (NAS). Users can access it via a mapped drive in Windows or an NFS mount.

However, the object store is also accessible directly via a RESTful API over HTTP/HTTPS (in fact, that’s how the NAS layer interacts with it too). Despite being sold as a replacement for tape archival, it’s very quick to access over the network, and authentication of users via LDAP or Active Directory is also built in. In addition to this, an object store can perform other clever functions during or after ingestion, such as transforming data into other formats or making use of metadata.
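To show the general shape of storing and retrieving objects over HTTP, here is a toy sketch in Python using only the standard library. It is not the Hitachi Content Platform API: the URL paths, the in-memory `OBJECTS` dictionary and the helper names are assumptions made for illustration, and authentication is left out entirely.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

OBJECTS = {}  # toy flat namespace: URL path -> bytes


class ObjectHandler(BaseHTTPRequestHandler):
    def do_PUT(self):
        # Store the request body under the request path.
        length = int(self.headers.get("Content-Length", 0))
        OBJECTS[self.path] = self.rfile.read(length)
        self._reply(201, b"")

    def do_GET(self):
        # Return the stored bytes, or 404 if the key is unknown.
        body = OBJECTS.get(self.path)
        if body is None:
            self._reply(404, b"")
        else:
            self._reply(200, body)

    def _reply(self, status, body):
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


def start_server():
    """Start the toy store on a free local port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), ObjectHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def put_object(port, key, data):
    """PUT bytes to the given key; return the HTTP status code."""
    conn = http.client.HTTPConnection("127.0.0.1", port)
    conn.request("PUT", key, body=data)
    resp = conn.getresponse()
    resp.read()
    conn.close()
    return resp.status


def get_object(port, key):
    """GET the given key; return (status code, body bytes)."""
    conn = http.client.HTTPConnection("127.0.0.1", port)
    conn.request("GET", key)
    resp = conn.getresponse()
    body = resp.read()
    conn.close()
    return resp.status, body
```

A repository talking to a store in this style would issue a PUT for each deposited file and a GET for each download, with no mounted file system in between.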

It therefore seems like the perfect back-end to a digital repository such as EPrints, DSpace or Fedora. A load of overhead could be cut down by having the repository target the object store directly, rather than doing so via files on a virtual file system using the NAS layer.

Alternatively, if the object store itself is clever enough, it could be used directly as a repository, using only a very thin user interface on top. A SWORD2-compliant interface would open up even more options.

If you’re interested in learning more, there are a number of white papers and other resources available on the Hitachi Content Platform web page.

Are other institutions implementing similar types of storage? Is it possible to integrate a repository with an object store directly via HTTP and if so has it been done?

It would be interesting to hear from anyone else who’s come across anything similar.

Image credit: Kitchen Shelves by John Martinez Pavliga

Posted in General.



South-west meetup

On Wednesday 1 February, we met up with representatives of three other universities in the south-west area to discuss and find common ground on our JISC Managing Research Data projects. Represented were:

Each institution has its own unique set of requirements, and one of the first things we discovered was how well our projects complemented each other. Research360 is focusing on Science and Engineering, data.bris on Arts and Humanities and UWE’s project on Health and Life Sciences; OpenExeter is further down the road than the rest of us, and is rolling out data management across the University of Exeter.

As well as these differences, we also picked out many areas of commonality in which we can work together.

Training

We identified some potential for linking up for shared train-the-trainer events to help our support staff to get up to speed. The data management agenda implies new skills to be acquired right across our institutions, from researchers and research students through to IT supporters and librarians.

Engagement and advocacy

It was noted that “advocacy” shades into “training” quite subtly, especially as many people feel that a need for training implies that they can’t do their job properly. In particular, we should reduce the need for training by integrating data management processes into the research workflow as transparently as possible.

Bristol have a champion in the Faculty of Arts office who is very good at spotting and passing on bids and other queries which relate to data management — this sounds like a useful approach.

There are differing opinions (in the sector generally) about who should have responsibility for data management advice and support. In the long term new staff will need to be recruited, but in the short term it’s about up-skilling existing staff appropriately. The danger here is that rising demand for support may outstrip supply, and we’ll all be working hard to manage expectations and ensure this doesn’t happen.

Repositories

We all have an institutional repository of some sort, mostly for publications, and are keen to develop digital repositories for data too. Both UWE and Bath have EPrints-based repositories and are evaluating whether EPrints will be suitable for data as well.

Research information management

As well as developing repositories for data, Bath, Bristol and Exeter are currently implementing Current Research Information Systems to aid centralised monitoring of research outcomes (especially important for REF2014). Bath and Bristol are using Pure, with Exeter already having established Symplectic — we’re all interested in ways to incorporate information about research data into these systems.

Policy

Discussing policy development is tricky, as it can directly affect competitiveness. Nonetheless, it’s clear that some collaboration can be profitable so we’ll be looking at ways that we can do this appropriately.

We’re also all planning to send representatives to the upcoming policy workshop in Leeds.

Requirements gathering

We’re all making use of various structured tools, such as DAF and CARDIO, so we will be able to share information about how well these tools work for us, along with our general impressions of the results they give us.

Conclusion

We all went away from the meeting with a lot to think about and a few interesting ideas, so stay tuned for more there. In addition to this post, there are also blog posts from Exeter and UWE for you to take a look at.

Many thanks to everyone who made the meeting worthwhile by contributing, and to Exeter for agreeing to host another in a few months’ time.

Posted in General.


Progress update: February 2012


On Thursday 2 February 2012, we got together for a project team meeting, so leading on from that, here’s a brief progress report:

  • As we’re starting to get a number of requests for help specifically related to research data management, we agreed to set up a queue on RT (Request Tracker, our support ticket tracking system) to deal with these; this will give users a single point of contact and allow us to measure the volume and type of incoming queries and our capacity to deal with them;
  • The university’s data management web page has been updated and tweaked to provide a more useful experience until we have fully redeveloped that area of the website;
  • Project start up (Work Package 1) is now complete;
  • We are currently recruiting participants for a CARDIO survey to assess perceptions of our current data management infrastructure and capacity; it’s likely we’ll be following this up by interviewing selected respondents; (WP2 Requirements analysis)
  • Work has begun on our Roadmap for submission to EPSRC; (WP3.1 Implementation plan/roadmap)
  • Neil Beagrie is making good progress with interviewing stakeholders for the business case; (WP3.2 Sustainability and business model)
  • We will be running a hybrid training session and focus group on data management planning (DMP) with second-year DTC students in mid February; we are also planning how we can get feedback on DMPonline from students in the Centre for Digital Entertainment, given that they are based out in industry with their placement partners; (WP3.3 Data management planning/WP6 Liaison, training & advocacy)
  • A rough draft of our high-level data management policy has been produced and is now in the process of being refined; Cathy Pink will be attending the forthcoming workshop on policy development; (WP4 Policy development)
  • We are getting closer to appointing the systems developer who will help us develop and pilot an interface between our VRE, iSusLab, and a pilot repository via SWORD2; in the meantime, we will be making progress by planning how we will pilot Electronic Lab Notebooks (ELNs); (WP5.2 Research workflow & data deposit)
  • Work has begun on our data storage guidelines; (WP5.3 Data storage guidelines)
  • After a very valuable meeting in January, at which we learned more about CERIF and Pure (the Current Research Information System currently being implemented at Bath), we’ve clarified Deliverable 5.4 — more on this to follow.

Posted in Progress updates.



VALA2012: libraries and technology down under

Liz Lyon gave a keynote speech on Wednesday 8 February 2012, entitled “The Informatics Transform: Re-engineering Libraries for the Data Decade”, at the VALA2012 conference in Melbourne, Australia. The talk focused on the transformations required for libraries to keep up with digital trends, and drew on Liz’s own experience for exemplars, including the University of Bath and the Research360 project.

VALA – Libraries, Technology and the Future Inc. (VALA) is “an Australian not-for-profit professional organisation that promotes the use and understanding of information and communication technologies across the galleries, libraries, archives and museum sectors.” (via VALA on Wikipedia)

Posted in Events.



Introducing… Catherine Pink, Institutional Data Scientist

Our new Institutional Data Scientist, Catherine Pink, started yesterday. She’ll be contributing more to the blog soon, but for now, here’s a short introduction:

I’m Catherine Pink and I’m the new Data Scientist for the Research360 project. I have a broad role that will primarily involve acting as an interface with and between the existing support services and research practitioners. I will be working with researchers to identify persistent descriptive identifiers for their work and provide them with technical support. I will also be evaluating current data management tools and determining how applicable they are to the research performed here at Bath.

Prior to this role, I have been both an undergraduate and postgraduate student at the University of Bath and am about to complete a bioinformatics-based PhD in Evolutionary Genetics in the Department of Biology & Biochemistry. I also have over 4 years’ experience in commercial research working in fungicide development.

Posted in Administration.



Vacancy: Systems Developer (Research Data Management)

We are looking to recruit a Systems Developer to develop an interface between our Virtual Research Environment (Sakai) and a SWORD2-compliant digital repository, as well as to provide general development and support of our research data infrastructure.

The post is part-time and fixed-term. For more information and to apply, please see our jobs site:

Systems Developer – Research Data Management

The closing date for this post is Tuesday 10 January 2012.

Posted in Vacancies.



Doctoral Training Centres as catalysts for research data management

As I’ve already mentioned, the focal point for a lot of the pilot work in Research360 is the Doctoral Training Centre (DTC) in Sustainable Chemical Technologies. This presents us with some interesting opportunities and challenges, and is well worth investigating: DTCs (or CDTs depending on the research council) are fast becoming the norm for PhD funding in these straitened times.

A Doctoral Training Centre is typically formed by the award of a large grant to a university or consortium to strengthen a particular area of research excellence. This funding covers the cost of training a large number of PhD students in cohorts over a period of several years, including the infrastructure and administration requirements. The expectation is that other funding sources will be found and the centre will become self-sustaining.

As an example, our DTC in SCT is funded by the EPSRC for 5 cohorts of 10-15 students. It’s a 4-year integrated PhD course, which begins with an MRes year followed by 3 years of more conventional PhD research. The MRes year involves a lot of generic and specialist training alongside two short research projects, one of which may lead into the main PhD project.

In line with the current culture in science, our research is strongly interdisciplinary, involving both chemists and chemical engineers, along with biologists, mathematicians, mechanical engineers and others.

Funding doctoral training in this way has a number of benefits. Because we recruit in cohorts which all start together at the beginning of the academic year, we have groups of students who all work and train together. Not only is this more efficient in terms of the training courses that we provide, it also means that the students are able to support each other through the challenging transition from undergraduate to professional scientist.

For Research360, this is great, as we’re able to provide RDM training to a whole cohort together, and they can (and will, in my experience) support each other as they develop data management plans at the same point in the PhD process.

Because Doctoral Training Centres are well funded and provide a consistent source of high-quality students to work on projects, academics are motivated to engage with them. And because DTCs are typically highly interdisciplinary, these academics and their collaborators will be spread right across the institution, or a whole consortium. This gives us many opportunities to roll out good data management practice organically institution-wide. If our researchers in the centre routinely practise good RDM, they will expect it of their collaborators elsewhere in the University.

Furthermore, much of the administration, such as project proposals and transfer reports, still gets processed through the graduate schools in the Faculties of Science and Engineering & Design. By having our students include data management plans in this documentation, we can get it onto the radar of the graduate schools from below as well as above.

This approach is not without its problems, however. Many of the features which make the DTC model work so well also mean that DTCs are generally not representative of the institution as a whole. There are still many PhD students funded by specific research projects or institutional studentships who may have begun their studies at any point in the academic year, and training developed for DTC students may not be as appropriate for these.

Whatever the advantages and disadvantages, I look forward to exploring the potential of using our doctoral training centre as a catalyst to improve data management here at the University of Bath. It would be great to hear from anyone facing similar issues.

Posted in Training.



RDM training for Postgraduates and Doctoral Training Centres

Catching young researchers early looks to be a key strategy for instilling good data management habits, before they start to make bad ones. This is particularly relevant for Research360, as we’re focusing a lot of our pilot implementation effort on the Doctoral Training Centre in Sustainable Chemical Technologies.

I’m aware that there will be a new funding call going out specifically aimed at developing training materials, but many of the projects already under way have training aspects. At the recent programme launch meeting there was a lot of interest in a breakout session to discuss data management training for postgraduate research students (PGRs), and around Doctoral Training Centres (DTCs). Sadly, there wasn’t enough time for every possible breakout session to happen, so I’m getting the ball rolling now.

If this is an issue that’s relevant to you, whether you’re working on a current JISC project or not, please leave us a comment below (your email address won’t be shown publicly). If we can start a conversation here, we can look at ways to coordinate our efforts perhaps in the new year.

Posted in Training.



JISC MRD project blogs

If you’ve just found this blog, you may not be aware that the Research360 project is part of JISC’s wider Managing Research Data programme. Blogging has recently been introduced as a core reporting channel for JISC-funded projects, and so all of these projects have set up their own blogs, or soon will. For convenience, I’ve set up a Google Reader bundle which lets you read or subscribe to all of these blogs in one go:

If you notice a blog which is missing from this bundle, please drop me a line via email.

Posted in General.



MRD Programme Launch

I attended the JISC MRD Programme Launch meeting in Nottingham on the 1st and 2nd December with Jez Cope, also from Research360. The event was very well organised and led by Simon Hodson, and was an interesting experience for me on my first JISC-funded project. Bath was well represented, as we realised that Brian Kelly and Mansur Darlington were also there. Brian gave a presentation on the art of blogging and Mansur was there to represent the REDmMED project.

The presentations and workshops were wide ranging, including details on applying DCC tools such as DAF, DMP and CARDIO, as well as UMF tools and demos. I found the workshop facilitated by Neil Beagrie very thought provoking, with its focus on the Benefits and Metrics that we should attach to our projects (maybe that’s with my project management head on). Jez, Mansur and I decided that this would be a good exercise to run once we are back with the project team. On the second day of parallel workshops, Jez attended the “Identifying and supporting researcher requirements” session and I got into the depths of “policy development”, as this is one of the requirements on Research360.

The event was very valuable in providing the chance to meet people from the other projects in the Programme and we have arranged for further contact with both regional Universities and those with relevant aspects such as collaboration with industry. Altogether the Launch has set the tone and connected people in a very successful way. Many thanks to Simon and the team!

Posted in Events.