It is becoming increasingly important these days for research data to be seen to be FAIR. That is, they should abide by the FAIR Data Principles that we discussed back in January. Not only is the Horizon 2020 Data Management Plan template based around these principles, but they also form part of the criteria that the ESRC uses to decide if a data repository is good enough for your data. Most recently, the BioSharing directory of standards, policies and repositories was renamed FAIRsharing, positioning itself as a resource to aid compliance.
The word FAIR is an acronym formed from Findable, Accessible, Interoperable and Reusable. It is certainly catchy, but is it misleading? Are the FAIR Data Principles themselves really fair?
This was a question posed by a team from the 4TU.Centre for Research Data, based at TU Delft. They looked at a sample of 37 research data archives, repositories and databases, and tried to assess them against the 15 principles and sub-principles. The results were at first glance quite disappointing: 41% of the sample satisfied the Findability principles, 76% the Accessibility principles, 38% the Interoperability principles, and a mere 18% the Reusability principles. All these were respected data repositories, so what was going wrong?
The methodology may have played a small part. The researchers made their assessments from the information provided on the repository websites, not from a thorough audit of the services themselves. Also, the figures just quoted were for known compliance; the remaining repositories might have been judged borderline or non-compliant, or their compliance might simply have been unclear from the available information.
More interesting is what the exercise showed about the principles themselves. Some principles, such as the assignment of a globally unique and persistent identifier, can be measured objectively, while others, such as description with rich metadata, are more a matter of opinion. Some are highly specific, such as being able to retrieve metadata using the identifier and a standard protocol; others are much broader, such as meeting domain-relevant community standards (what are they?). And there were some suspicious patterns in the results suggesting that the principles favour some domains over others; the social sciences and climate science fared particularly badly.
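To make the "specific" end of that spectrum concrete, here is a minimal sketch of what retrieving metadata via an identifier and a standard protocol can look like in practice: given nothing but a DOI, HTTP content negotiation against the doi.org resolver can return machine-readable metadata. This is only an illustration, not part of the study; the DOI below is a placeholder, and the example assumes the Python requests library and that the DOI's registration agency supports citeproc JSON.

```python
import requests

# Sketch of the "retrieve metadata using the identifier and a standard
# protocol" idea: the only input is a persistent identifier, and the
# protocol is plain HTTP with content negotiation.
doi = "10.1234/example-dataset"  # placeholder; substitute a real dataset DOI

# Asking the doi.org proxy for citeproc JSON should return machine-readable
# metadata rather than the human-oriented landing page (assuming the
# registration agency supports this media type).
response = requests.get(
    f"https://doi.org/{doi}",
    headers={"Accept": "application/vnd.citationstyles.csl+json"},
    timeout=30,
)
response.raise_for_status()

metadata = response.json()
print(metadata.get("title"))
print(metadata.get("publisher"))
```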
So as a scorecard by which to measure repositories, it seems the FAIR principles may not be so fair after all. But then, maybe it isn't fair to blame the principles themselves for this. The differences in scope and objectivity between the principles reflect the confidence the community has in the solutions to particular issues, and how universally those solutions apply. For example, there is widespread support for persistent identifiers as a way of making data citable and findable, despite the difficulties in making them work for dynamic or evolving datasets. On the other hand, no-one would claim to have a definitive view of what standards the data in each academic domain should adhere to, even though it is generally agreed that using standards is a Good Thing to do.
What does this mean for your own data? If you are asked to justify your choice of data archive or documentation in terms of the FAIR Principles, I think the important thing is to recognise the spirit behind the principles, rather than to stress compliance or otherwise with the exact wording. For example, some archives are so well respected within their domains that their accession numbers hold as much weight as a DOI, say, even though they are not globally unique. What counts is understanding why an archive or dataset might fall short of the principles, and whether that shortfall matters. You can see an element of this in the Horizon 2020 Data Management Plan template, which takes a broad view of the four FAIR elements rather than focusing on the principles themselves.
You can read more about the study by 4TU.Centre for Research Data, and inspect the data yourself, in a post on the Open Working blog.