Although my involvement in Research360 is at the level where technology and people interact, I’m also doing my best to understand how our infrastructure is developing at a much lower level so that I’m in a position to better advise non-technical stakeholders.
Bath University Computing Services (BUCS) are currently in the process of procuring a new file store which works in a very different way to our existing storage systems, and I recently had the opportunity to learn more about it from our Database & Systems Manager, Paul Jordan. Since this is a very new area for me, my apologies to you and him for anything that I’ve got wrong.
Like our existing storage, this will be arranged into tiers, with Tier 1 containing the most expensive storage with the quickest access times, and lower tiers providing slower but cheaper storage. Data will be moved between tiers automatically (and invisibly to users) based on configured policies.
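To make "configured policies" a little more concrete, here is a minimal sketch of one possible policy: choose a tier from the time since a file was last accessed. The tier names, the 30-day and 365-day thresholds and the `StoredFile`/`apply_policy` helpers are all hypothetical. They are not the configuration BUCS will use or the vendor's policy language.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical policy: (maximum time since last access, tier name), fastest tier first.
POLICY = [
    (timedelta(days=30), "tier1"),   # recently used: quickest, most expensive storage
    (timedelta(days=365), "tier2"),  # used within a year: slower, cheaper storage
]
LOWEST_TIER = "object-store"  # anything older falls through to the cheapest tier


@dataclass
class StoredFile:
    path: str
    last_accessed: datetime
    tier: str = "tier1"


def target_tier(f: StoredFile, now: datetime) -> str:
    """Return the tier a file should live on, based only on time since last access."""
    age = now - f.last_accessed
    for max_age, tier in POLICY:
        if age <= max_age:
            return tier
    return LOWEST_TIER


def apply_policy(files: list, now: datetime) -> list:
    """Re-tier every file whose current tier differs from the policy; return the moves."""
    moves = []
    for f in files:
        tier = target_tier(f, now)
        if tier != f.tier:
            moves.append((f.path, f.tier, tier))
            # A real system would migrate the data here. The path users see is unchanged.
            f.tier = tier
    return moves
```

For example, a file last opened three days ago stays on Tier 1, while one untouched for 900 days is moved to the object store. Users keep opening both by the same path.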
Where this new storage differs from our existing systems is that the lowest tier will not be a tape carousel, but an "object store". Whereas a traditional file system stores data in an ordered, hierarchical way (directories within directories), an object store keeps individual data objects in a single flat namespace.
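The difference between a hierarchy and a flat namespace is easier to see in a toy example. The sketch below is an in-memory illustration only, not how any real product is built. Every object gets an ID in one flat dictionary, and lookups go through metadata instead of walking folders. The class name and the content-hash ID scheme are assumptions chosen for illustration.

```python
import hashlib


class FlatObjectStore:
    """Toy in-memory object store: one flat namespace of object IDs, no directories."""

    def __init__(self):
        self._objects = {}  # object ID -> (data, metadata)

    def put(self, data: bytes, metadata: dict) -> str:
        # Hypothetical ID scheme: the SHA-256 of the content. Real products choose their own.
        # Storing identical bytes again reuses the same ID and replaces the old metadata.
        object_id = hashlib.sha256(data).hexdigest()
        self._objects[object_id] = (data, dict(metadata))
        return object_id

    def get(self, object_id: str) -> bytes:
        return self._objects[object_id][0]

    def metadata(self, object_id: str) -> dict:
        return dict(self._objects[object_id][1])

    def find(self, **criteria) -> list:
        """Find objects by matching metadata fields, not by walking a directory tree."""
        return [
            oid
            for oid, (_, meta) in self._objects.items()
            if all(meta.get(key) == value for key, value in criteria.items())
        ]
```

Even a familiar path such as "projects/r360/results.csv" would simply be one more metadata value (or a key) here. A translation layer, like the NAS layer described below, can present it to users as if it were a folder.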
The major advantage of this is that much more of the available space on the physical disks can be used to store actual user data: the overhead is much lower than for traditional file systems. Because it virtualises storage across the network in a new way, it is also far more scalable than anything we currently use; we could easily grow it to the petabyte level or expand out into the cloud if need be.
Now, most users need never know that their data is stored in an object store, just as they don't need to know whether the disks were made by Hitachi or Western Digital. An extra layer on top does some translation, allowing you to store files over the network just like any other network-attached storage (NAS). Users can access it via a mapped drive in Windows or an NFS mount.
However, the object store is also accessible directly via a RESTful API over HTTP/HTTPS (in fact, that's how the NAS layer interacts with it too). Despite being sold as a replacement for tape archival, it's very quick to access over the network, and authentication of users via LDAP or Active Directory is built in. In addition, an object store can perform other clever functions during or after ingestion, such as transforming data into other formats or making use of metadata.
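In a RESTful interface, storing and retrieving an object comes down to ordinary HTTP requests. A PUT uploads the bytes to a URL and a GET reads them back. The sketch below only builds the requests with Python's standard library and does not send them. The host name, the `/rest/` path layout and the use of HTTP Basic authentication are all assumptions. The Hitachi Content Platform documents its own URL scheme and authentication headers, and those should be checked before relying on any of this.

```python
import base64
import urllib.request

# Hypothetical endpoint; the real host name and path layout will differ.
BASE_URL = "https://objectstore.example.ac.uk/rest"


def build_request(method: str, path: str, user: str, password: str,
                  data: bytes = None) -> urllib.request.Request:
    """Build (but do not send) a request for the object stored at `path`."""
    # Assumption: HTTP Basic authentication. The product's actual scheme may differ.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    headers = {"Authorization": f"Basic {token}"}
    if data is not None:
        headers["Content-Type"] = "application/octet-stream"
    return urllib.request.Request(
        f"{BASE_URL}/{path.lstrip('/')}",
        data=data,
        method=method,
        headers=headers,
    )


# Store a file as an object, then build a request to read it back.
put_req = build_request("PUT", "deposits/42/thesis.pdf", "alice", "secret", b"%PDF-...")
get_req = build_request("GET", "deposits/42/thesis.pdf", "alice", "secret")
```

Against a real endpoint, each request would be sent with `urllib.request.urlopen(put_req)` and so on. This is also roughly what a repository would do if it talked to the object store directly instead of going through the NAS layer.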
It therefore seems like the perfect back-end for a digital repository such as EPrints, DSpace or Fedora. A good deal of overhead could be cut by having the repository target the object store directly, rather than writing files to a virtual file system through the NAS layer.
Alternatively, if the object store itself is clever enough, it could be used directly as a repository, using only a very thin user interface on top. A SWORD2-compliant interface would open up even more options.
If you’re interested in learning more, there are a number of white papers and other resources available on the Hitachi Content Platform web page.
Are other institutions implementing similar types of storage? Is it possible to integrate a repository with an object store directly via HTTP and if so has it been done?
It would be interesting to hear from anyone else who’s come across anything similar.
Image credit: Kitchen Shelves by John Martinez Pavliga