Managing data across the institutional research lifecycle
In which I add LDAP authentication to the test Sakai installation, and sort out a bizarre issue with the Resources tool: a complete lack of the hooks and buttons needed to add content.
The full post is on my personal blog here.
Posted in General.
– 4 September 2012
In which I finally get the test Sakai installation working, and learn about some Apache modules I had not used before, which are needed to work around the problems.
The full post is on my personal blog here.
Posted in General.
– 7 August 2012
In which I work my way through the compilation of Sakai several times, ending up with a limited amount of success and experiencing a fair amount of frustration. This is where things start to get quite technical.
The full post is on my personal blog here.
Posted in General.
– 1 August 2012
On Thursday 28 June, Cathy and I ran the latest of our postgraduate workshops on research data management.
The session was structured similarly to our last workshop, though without the extended hands-on section on data management planning. The loss of this section was down to time pressures (we had an hour less this time).
We started by showing a series of statements about research data management, such as "I am satisfied that my data is safe", and asking the participants to rate (anonymously using clickers) how much they agreed or disagreed with each statement. The answers gave us an opportunity to start some discussions and get a picture of what the current level of knowledge was.
Cathy then gave a more formal presentation (slides available here) covering the major aspects of research data management with me pitching in on more technical bits.
We finished up by revisiting the statements from the start of the session to see how opinions had changed, and handing out some leaflets from the DCC.
Feedback from the attendees was overwhelmingly positive:
Some interesting answers to the question "What was most useful?" include:
Actions that participants said they would take as a result of the workshop fell mostly into two categories:
Although we don’t know whether anyone carried out their actions, this was very encouraging to read, as our message had clearly got through.
One participant felt that there was a bias towards science. This is understandable, since both Cathy and I have science backgrounds, and science/engineering is the main focus of Research360, but we’ll see what we can do to rectify this.
Another comment referred to "lack of more contemporary ways of storing data, e.g. Dropbox". We’d intentionally steered clear of Dropbox, as the official University stance on cloud storage is still being decided. Whatever that decision turns out to be, we’ll need to deal with Dropbox and other cloud tools.
There was a request for more group discussion, and I think this would be a valuable addition, so we’ll try to make the next session a bit more interactive. Group discussions could usefully focus on differences and similarities between disciplines, for example, as I think people in different subjects would have quite a lot of pre-existing knowledge that they could share.
I’d also like to give the participants something concrete to do after they’ve left the workshop. This could be something specific like "write a data management plan", but I think there would be more likelihood of these actions being carried out if the participants take some ownership. One way to achieve this would be to wrap up the session with an action-planning section and ask each student to define their own data management goal or goals.
Although the session was also open to research staff, only one staff member registered to attend and in the end they didn’t turn up.
Helping busy research staff to gain new skills is a difficult task. They have many demands on their time, pulling in many different directions, and many already work far more than their contracted hours.
We aim to deal with this in a number of ways:
We’re also in discussions with the professional services around the University to understand (and help them understand) how research data management fits into their roles and how we can provide the support they need.
Posted in Training.
– 24 July 2012
The virtual host is now operational, so it is time to start setting up the environment for installing the Sakai CLE (2.8.2) from source. Lots of fairly technical detail on how I set up Java, Tomcat, MySQL and so on. The full details are available on my personal blog.
Posted in Technology.
– 18 July 2012
A short post, written while I wait for a development environment to become available. The full content is available on my personal blog.
Posted in Technology.
– 18 July 2012


My next major job will be to write an extension to the Sakai research environment software which enables the deposit of material from Sakai to a SWORD2-compliant repository. (SWORD2 is a protocol for the deposit of digital materials into repositories, so this is a reasonably sensible project to undertake.)
My Sakai background is negligible, so I will be starting from scratch. I have had a look round the Sakai website, and had a chat to some people who know the system better than I do, and have not been left much wiser than when I started. This is really the reason behind this diary: it does not look easy to pick up the information that is needed to start developing for Sakai, and documenting how I went about it and what worked should therefore be helpful to the Sakai community.
First Impressions
In which I look at the online information about Sakai and SWORD2, with an eye to finding the information needed for development work.
For more go here.
Posted in Technology.
– 18 July 2012
Last Friday our colleagues in UKOLN hosted a seminar from Anna Shadbolt, Manager of Information Management Services (part of the library) at the University of Melbourne.
She gave us an overview of e-infrastructure and data management in Australia, including particular data management projects at Melbourne, and finished off with a mention of the DMVitals project from the University of Virginia, which she visited recently as part of her “world tour”.
Australian universities seem to have very strong support for data management and e-research in general. There is a series of national roadmaps to support e-research and, crucially, these are backed by government funding for the development and maintenance of national infrastructure. This is expanding all the time: most recently, digitisation was recognised as being eligible for this infrastructure funding.
NeCTAR is a national e-research infrastructure service, funded by the Australian Government and led by the University of Melbourne. It runs integrated e-research services, including a research cloud and a network of virtual laboratories, which are made available to institutions across Australia. For example, the University of Sydney has been using the LabTrove software for e-lab notebooks. Initially, the University of Southampton (where the software originated) provided hosting, but this service has now been moved to NeCTAR’s infrastructure, making it available to many other Australian universities.
AURIN (Australian Urban Research Infrastructure Network) provides e-infrastructure for urban research. It provides, amongst other things, federated access to data held by members in the network, and has done a great job of showing that old data can continue to be useful.
ANDS (Australian National Data Service) is a collaboration between Monash University, Australian National University and CSIRO to support data management across Australia. Over the last few years, they’ve implemented a national data register, Research Data Australia (RDA) which has made data from all of the country’s major institutions discoverable through a single web interface.
At Melbourne itself, they have had two recent data management successes. The first is Seeding the Commons, part of a national programme supported by ANDS to publish large numbers of existing nationally-relevant datasets and make them available through Research Data Australia. As a result of this project, the University of Melbourne is responsible for, in Anna’s words, “by far the greatest number of records submitted to Research Data Australia by a single institution.”
The second initiative is a series of data capture projects. These have piloted improvements to data management practices in ongoing research projects in order to publish information about valuable collections of data being created. Researchers submit data to an institutional register, which then feeds into RDA to maximise discoverability. This architecture enables the University of Melbourne to retain primary ownership of the records, and avoids the need for researchers to submit information directly to RDA.
Based on these successes they are now engaged in Doing Data Better, a project to roll out data management to the whole institution. They’ve created a policy and had it passed by the university, and each department has to write local rules setting out how they will implement the university-wide policy. That policy, along with other resources they’ve developed for researchers, is available on the project website.
Finally, Anna told us about DMVitals, an assessment tool being developed by the University of Virginia to identify strengths and weaknesses in individual researchers’ data management practice. The aim of this isn’t to “tell off” academics who are “doing it wrong”, but to create a report with a data management action plan to suggest practical steps researchers can take to improve. The tool is partially automated, with a scorecard that takes the answers to a series of yes/no questions and uses a weighted algorithm to create the report. I saw an early version of this tool being demonstrated at IDCC last year and I look forward to seeing it developed further.
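To make the scorecard idea concrete, here is a minimal sketch of how yes/no answers and weights could be turned into a score and a prioritised action list. This is not DMVitals itself: the questions, weights and advice text are all invented for illustration, and the real tool's algorithm may work quite differently.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Question:
    key: str       # short identifier for the practice being assessed
    weight: float  # relative importance; a larger weight moves the score more
    advice: str    # suggested action when the answer is "no"


# Hypothetical questions and weights, made up for this sketch.
QUESTIONS = [
    Question("backup", 3.0, "Keep a second copy of live data on managed storage."),
    Question("plan", 2.0, "Write a short data management plan."),
    Question("metadata", 1.0, "Record descriptive metadata when data is created."),
]


def score_report(answers: dict[str, bool]) -> tuple[float, list[str]]:
    """Return a score between 0 and 1 and the suggested actions, heaviest first."""
    total = sum(q.weight for q in QUESTIONS)
    earned = sum(q.weight for q in QUESTIONS if answers.get(q.key, False))
    # Unanswered questions are treated as "no", so they show up as actions.
    missing = [q for q in QUESTIONS if not answers.get(q.key, False)]
    missing.sort(key=lambda q: q.weight, reverse=True)
    return earned / total, [q.advice for q in missing]
```

Sorting the missing practices by weight means the report leads with the change likely to make the biggest difference, which fits the aim of suggesting practical next steps rather than telling anyone off.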
Posted in Events.
– 21 June 2012
The University of Bath has spent the past few months working on their response to the EPSRC’s letter to Vice-Chancellors. In this letter, the EPSRC set out their nine expectations for how institutions in receipt of their funding should manage their research data.
Responsibility for responding to the EPSRC’s expectations – the roadmap setting out how compliance would be achieved – lay with the University’s Research Data Steering Group (RDSG), a work group set up in January 2011 to advise on Research Data Management (RDM) across the institution. There is considerable overlap between members of the RDSG and the Research360 project team and as such, the Roadmap for EPSRC was developed alongside Research360 project work on a longer term RDM strategy.
We are now able to share the University of Bath Roadmap for EPSRC: Compliance with Research Data Management Expectations. We also wish to share the process that we went through to develop and obtain approval for our Roadmap and the positive feedback that we have received, and to tell you what we intend to do next.
As part of the Research360 project we used Monash University’s “Research Data Management Strategy and Strategic Plan 2012-2015” as a blueprint, from which we developed our own draft strategy and implementation plan. This original strategy consisted of a series of objectives and activities aligned with a number of themes, which in turn demonstrated how management of research data contributes to existing, long term University strategies.
We then turned our attention to the EPSRC’s nine expectations. Following a helpful series of blog posts by the DCC, and based on our experiences over the first few months of Research360, we started by identifying what the University of Bath is already doing to meet the expectations. We then re-structured the proposed objectives and activities from our draft strategy so that they were aligned with the EPSRC’s expectations.
Importantly, this approach meant that whilst fulfilling the requirements of the EPSRC, our proposed activities were primarily focused on building a sustainable infrastructure that will meet the data management needs of the University.
Once approved by the RDSG, we sent the Roadmap for EPSRC to the Pro-Vice-Chancellor (PVC) for Research. Working with the PVC Research was critical to the successful development of the Roadmap and we are extremely grateful for Professor Millar’s support. The PVC Research oversees the RDSG and is Chair of the Research360 Steering Group. As such, she already had a strong awareness of Research Data Management activities at Bath, and was able to provide invaluable guidance and a viewpoint from senior management from within the institution.
Because of the EPSRC’s 1st May 2012 deadline, we did not have time to progress the Roadmap through the normal approval process. We therefore submitted the Roadmap directly to the Vice-Chancellor’s Group (VCG). Despite making positive comments, the VCG was not able to approve the first draft of the Roadmap. This provided us with an opportunity to incorporate their comments – mainly that we had been a little too ambitious in our aims and deadlines – before a resubmission of the Roadmap at the following VCG meeting, where the Roadmap was finally approved.
The University of Bath Roadmap for EPSRC: Compliance with Research Data Management Expectations was submitted by the Vice-Chancellor to EPSRC on 1st May 2012. We have since received some extremely encouraging feedback: Ben Ryan, Senior Evaluation Manager, EPSRC, congratulated Bath on the document, and described it as “an excellent example of an appropriate response”. He stated that the Roadmap “fully meets our needs for assurance that the University is taking our policy framework on research data seriously”. Further comments from Ben Ryan; from Professor Millar, the PVC Research; and Dr Liz Lyon, Director of UKOLN and one of the Roadmap’s authors, can be read in a news item about our Roadmap for EPSRC on the University of Bath website.
Following approval by the VCG, we have been able to present the Roadmap at a number of other relevant committees, to all major stakeholders and to those who will share responsibility for implementing the Roadmap. Over the next few months, we will be working closely with these stakeholders to explain RDM and its benefits in more detail, and to address any concerns that have been raised about the challenging cultural changes that lie ahead.
Now that the RDSG’s Roadmap for EPSRC has been approved, Research360 will continue to work on developing the long term institutional RDM Strategy. The activities and objectives in the Roadmap for EPSRC will form the basis of a dynamic RDM Operational Plan, which will accompany the RDM Strategy as a Research360 project deliverable. We will also continue to work on the supporting Institutional RDM Business Case. These three documents will then undergo a longer review and approval process, starting to progress through the relevant committees in the autumn.
Posted in General, Progress updates.
– 8 June 2012
This post emerged from discussions at the JISC MRD Hack Days, particularly with Joss Winn of the University of Lincoln’s Centre for Educational Research and Development. The event brought together developers and data management experts for two intensive days to discuss and prototype tools for research data management.
Joss has also written a more discursive post about our discussions of file synchronisation, particularly with respect to handling of large files.
For a bit of context, both Joss and I make regular use of Dropbox and Git where appropriate.
Many researchers store the majority of their live data on local disks, with little or no redundancy, leaving them open to data loss through accident or theft. To solve this problem, we provide research users with high resilience, high performance, high capacity network storage, but in spite of these advantages, they often don’t use it as well as they might.
Another requirement is for easy sharing of files. Most data sharing still takes place via email.
The main reason network storage goes underused is that many researchers do a lot of work on their laptops, in locations where their network connection may be intermittent, slow or completely absent. On the train, on a plane, in a cafe, they need access to some or all of their data wherever they are.
When faced with this problem, many researchers turn to Dropbox because it is easy to use and requires no user interaction beyond the initial setup. However, there are serious issues with using Dropbox to store research data, primarily the fact that confidential data is being stored on servers outside the institution’s control.
What is needed is a tool to transparently synchronise local and network storage, effectively providing an offline cache which provides the convenience and speed of local disk access combined with the resilience of network attached storage.
For confidential research data, it is highly desirable for all copies to remain under the control of the institution(s) who are responsible for looking after it. Any solution should at least have the option of storing data on a university-run storage service.
Control of the storage locations on its own is not sufficient. If data is sent over the internet with weak (or worse, non-existent) encryption, it can easily be intercepted by an attacker. Strong encryption should be used to protect all data.
Many of the files which researchers routinely work with may be tens or hundreds of megabytes, or in some cases gigabytes or terabytes. Clearly, there’s a limit to this – it’s reasonable to expect that researchers will have to manage files of a gigabyte or more differently. But a suitable solution should at least work well for files of tens or hundreds of megabytes.
Two important factors spring to mind here. First, the user needs feedback about the progress of a sync so that they aren’t surprised when changes they were expecting haven’t propagated yet. Second, the tool needs to gracefully handle a user cancellation or a dropped connection without losing or corrupting data. Ideally, if this happens it should be able to resume where it left off.
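As a rough illustration of both factors, here is a minimal sketch of a copy routine that reports progress and can pick up where it left off. The function name, the `.part` suffix and the chunk size are my own choices, not taken from any of the tools discussed. It also assumes the source file has not changed since the interrupted attempt; a real tool would want to verify that (with a checksum, say) before trusting the partial copy.

```python
import os
from typing import Callable, Optional

CHUNK = 1024 * 1024  # copy in 1 MiB pieces


def resumable_copy(src: str, dst: str,
                   progress: Optional[Callable[[int, int], None]] = None) -> None:
    """Copy src to dst, resuming from dst + '.part' if an earlier run was cut short."""
    part = dst + ".part"
    total = os.path.getsize(src)
    done = os.path.getsize(part) if os.path.exists(part) else 0
    if done > total:
        done = 0  # partial file is longer than the source, so start again
    # "r+b" keeps the bytes already copied; "wb" starts from an empty file.
    with open(src, "rb") as fin, open(part, "r+b" if done else "wb") as fout:
        fin.seek(done)
        fout.seek(done)
        while True:
            chunk = fin.read(CHUNK)
            if not chunk:
                break
            fout.write(chunk)
            done += len(chunk)
            if progress:
                progress(done, total)  # feedback for the user interface
        fout.flush()
        os.fsync(fout.fileno())
    # Rename only once the copy is complete, so dst is never half-written.
    os.replace(part, dst)
```

Writing to a temporary name and renaming at the end is what stops a cancelled sync from leaving a corrupt file in place of a good one: until the last step, the previous version of the destination is untouched.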
Once you have two copies of your files, you have two different places to modify them, giving you the possibility of making different changes to the same file prior to synchronising. This becomes even more likely when you are sharing the same files between multiple users.
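One common way to spot this situation is to remember a checksum of each file as it was at the last successful sync, and compare both copies against it. The sketch below assumes that approach; the function names and the four outcome labels are hypothetical, and none of the tools mentioned here necessarily work this way.

```python
import hashlib
from typing import Optional


def digest(data: Optional[bytes]) -> Optional[str]:
    """SHA-256 of a file's contents, or None if the file does not exist."""
    return None if data is None else hashlib.sha256(data).hexdigest()


def classify(base: Optional[str], local: Optional[str], remote: Optional[str]) -> str:
    """Compare the local and remote digests with the one recorded at the last sync."""
    if local == remote:
        return "in-sync"   # identical copies, whatever happened in between
    if local == base:
        return "pull"      # only the remote copy changed
    if remote == base:
        return "push"      # only the local copy changed
    return "conflict"      # both changed, and they disagree
```

Only the last case needs a human (or a merge tool) to step in. For example, `classify(digest(b"v1"), digest(b"v2"), digest(b"v3"))` returns `"conflict"`, because neither copy matches the state recorded at the last sync.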
Storing metadata along with data for later (perhaps automated) deposit in a repository is a core research data management practice. Metadata tends to be readily available only when the data is created, but is often useful only when the data is finally published or archived. With a dedicated “Academic Dropbox”, it may be possible for users to associate metadata directly with files at creation time, and then keep that metadata with the file throughout its life through to deposit in an archive.
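One simple way to keep metadata with a file is a "sidecar": a small file with a predictable name that sits next to the data and gets synchronised along with it. The sketch below assumes a JSON sidecar with a `.meta.json` suffix; that convention and the field names in the usage note are my own invention, not a standard.

```python
import json
from pathlib import Path


def sidecar_path(data_file: Path) -> Path:
    """Name of the sidecar for a data file, e.g. run1.csv -> run1.csv.meta.json."""
    return data_file.with_name(data_file.name + ".meta.json")


def write_metadata(data_file: Path, metadata: dict) -> Path:
    """Save metadata next to the data file and return the sidecar's path."""
    path = sidecar_path(data_file)
    path.write_text(json.dumps(metadata, indent=2, sort_keys=True), encoding="utf-8")
    return path


def read_metadata(data_file: Path) -> dict:
    """Load the sidecar for a data file, or an empty dict if there isn't one."""
    path = sidecar_path(data_file)
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))
```

A call such as `write_metadata(Path("run1.csv"), {"creator": "Jez", "instrument": "example"})` at creation time would leave a record that a later deposit step could read back with `read_metadata`, without the researcher having to reconstruct it from memory.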
Here is a whistlestop tour of some of the options we dug up. I’ve listed the pros and cons as I see them (though feel free to correct/update me in the comments), and I’ve not commented on features for which I couldn’t find enough information to judge.
http://www.cis.upenn.edu/~bcpierce/unison/
I (Jez) use this daily.
Both Joss and I (Jez) use this regularly.
E.g. git-bigfiles http://caca.zoy.org/wiki/git-bigfiles, git-media https://github.com/schacon/git-media, git-annex http://git-annex.branchable.com/, mercurial large files extension http://mercurial.selenic.com/wiki/LargefilesExtension, and also Boar https://code.google.com/p/boar/
Boar is a version control system (VCS) designed specifically to work with large files, while the others are extensions to existing VCSs.
https://github.com/chmduquesne/sharebox-fs
There’s no simple solution to this, but we now have a whole range of things to try and to suggest that our users try. Who knows, some of them might even work!
Posted in Technology.
– 4 May 2012