By the end of this section, you should be able to:
- Describe the FAIR Principles
- Understand how Research Infrastructures ensure their data is FAIR
What are the FAIR Principles?
If we agree that improved and increased sharing of research data would benefit research communities and collection-holding institutions alike, then how should we proceed? What ground rules should govern how people share, when and where? How can we establish a common understanding of how far the ethic of sharing can and should extend?
These questions have been answered by the development of the FAIR principles (FAIR stands for Findable, Accessible, Interoperable, Reusable). Developed by FORCE11 (a pan-disciplinary organisation, not one specific to the arts and humanities), these principles provide a baseline understanding of the value that sharing data can deliver, and the baseline requirements for doing so.
The FAIR principles are described as follows:
TO BE FINDABLE:
F1. (meta)data are assigned a globally unique and eternally persistent identifier.
F2. data are described with rich metadata.
F3. (meta)data are registered or indexed in a searchable resource.
F4. metadata specify the data identifier.
TO BE ACCESSIBLE:
A1. (meta)data are retrievable by their identifier using a standardised communications protocol.
A1.1. the protocol is open, free, and universally implementable.
A1.2. the protocol allows for an authentication and authorisation procedure, where necessary.
A2. metadata are accessible, even when the data are no longer available.
TO BE INTEROPERABLE:
I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.
I2. (meta)data use vocabularies that follow FAIR principles.
I3. (meta)data include qualified references to other (meta)data.
TO BE RE-USABLE:
R1. (meta)data have a plurality of accurate and relevant attributes.
R1.1. (meta)data are released with a clear and accessible data usage license.
R1.2. (meta)data are associated with their provenance.
R1.3. (meta)data meet domain-relevant community standards.
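Some of these principles can be partly checked by machine. The following is a minimal sketch, not an official FAIR assessment: it assumes a metadata record is held as a simple dictionary, and the field names (identifier, description, license, provenance) are hypothetical choices made for illustration rather than terms from any metadata standard. It reports which of four principles have no corresponding value in the record.

```python
# Illustrative mapping from a FAIR principle to one field that could evidence it.
# The field names are assumptions made for this sketch only.
REQUIRED_FIELDS = {
    "F1": "identifier",    # globally unique, persistent identifier
    "F2": "description",   # rich descriptive metadata
    "R1.1": "license",     # clear data usage licence
    "R1.2": "provenance",  # where the data came from
}


def missing_fair_fields(record):
    """Return the principle labels whose illustrative field is absent or empty."""
    return sorted(
        label
        for label, field in REQUIRED_FIELDS.items()
        if not record.get(field)
    )


example_record = {
    "identifier": "https://doi.org/10.1234/example",  # made-up identifier
    "description": "Letters from a regional archive, 1914-1918",
    "license": "CC-BY-4.0",
    "provenance": "",  # left empty on purpose
}

print(missing_fair_fields(example_record))  # prints ['R1.2']
```

A check like this only shows that a field has been filled in. Whether the metadata are rich, accurate, or meet community standards still needs human judgement.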
Not every collection of research data is equally eligible to be shared in a FAIR way. Personal data must be protected and may only be sharable in an anonymised or redacted form, for example, and unprotected research discoveries may require an embargo. Particular problems can arise in the arts and humanities because ownership of cultural data is often shared (e.g. between archives and researchers, or between publishers and authors). The FAIR principles are therefore usually applied with the caveat that data be “as open as possible, as closed as necessary”.
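The idea of “as open as possible, as closed as necessary” can be sketched in code. The example below is an assumption-laden illustration, not a description of any repository's actual policy: the field names (donor_name, donor_address, content, embargo_until) are hypothetical. Personal fields are always removed, the content is withheld while an embargo is running, and the remaining metadata stay visible throughout, in line with principle A2.

```python
from datetime import date

# Hypothetical names of fields that hold personal data.
PERSONAL_FIELDS = {"donor_name", "donor_address"}


def shareable_view(record, today):
    """Return the part of a record that may be shared on the given day."""
    # Always drop personal data.
    view = {key: value for key, value in record.items()
            if key not in PERSONAL_FIELDS}
    # The embargo date is used for the decision but not published.
    embargo = view.pop("embargo_until", None)
    if embargo is not None and today < embargo:
        # Under embargo: keep the metadata, withhold the content itself.
        view.pop("content", None)
    return view


diary = {
    "title": "Diary",
    "content": "Full transcription of the diary",
    "donor_name": "A. Person",
    "embargo_until": date(2030, 1, 1),
}

print(shareable_view(diary, date(2025, 1, 1)))
print(shareable_view(diary, date(2031, 1, 1)))
```

Before the embargo date only the title is shared. After it, the title and content are shared, but the donor's name never is.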
Case Study: CENDARI Data Soup
The Collaborative European Digital Archive Infrastructure (CENDARI) project is one of the PARTHENOS participating e-infrastructures. CENDARI gathers curated data covering two research areas in the community of “Studies of the Past”: the First World War and the Middle Ages. It includes data from different sources (mostly across the GLAM sector), both unique and deposited. The so-called CENDARI ‘data soup’ contains a wide range of data formats and levels of description. Recognised, interoperable standards already in use in the research domains involved were used to encode data and describe cultural objects and collections (e.g. EAD for archival documents).
The CENDARI dataspace contains 829,087 descriptions, represented in several types of data formats. This information is stored in a repository called CKAN, an open source data portal platform developed and maintained by the Open Knowledge Foundation. The kinds of file formats and standards, as well as the level of organisation and accessibility of the data provided by the cultural heritage institutions in contact with CENDARI, vary from case to case. Small archives usually lack resources for metadata standardisation and data storage, so their archival descriptions are often kept in spreadsheets and are not available online (‘hidden archives’). National and international archives, by contrast, usually have a cataloguing and encoding department; nevertheless, they often lack both the technical and the political means to share their data with other institutions and projects.
Alongside the data aggregation work, CENDARI researchers have also encoded information about archival descriptions and archival institutions using the open source software AtoM (‘Access to Memory’), which is promoted by the International Council on Archives and supports all the archival description standards. CENDARI established collaborations with international networks in Digital Humanities in order to engage communities of scholars and digital humanists; this reduces the risk that data collected in the context of research projects become obsolete and unusable.
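A CKAN repository is normally searchable through CKAN's web-based Action API, which is one way such a platform supports findability (F3) and retrieval over a standard protocol (A1). The sketch below only builds the address of a search request and does not contact any server. The host name ckan.example.org is a placeholder: the text does not give the address of CENDARI's CKAN instance or say whether its API is publicly exposed, so both are assumptions here.

```python
from urllib.parse import urlencode


def package_search_url(base_url, query, rows=10):
    """Build a CKAN Action API address that searches datasets for a query."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


# Placeholder host; replace with the address of a real CKAN site.
url = package_search_url("https://ckan.example.org", "war diaries", rows=5)
print(url)
# prints https://ckan.example.org/api/3/action/package_search?q=war+diaries&rows=5
```

Opening an address of this form on a real CKAN site would normally return a JSON document listing the matching dataset descriptions.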
The FAIR Principles in practice
This video shows how data that complies with the FAIR Principles helps researchers to use Linked Open Data.
“Linked Open Data – What is it?” from Europeana (approx 4 minutes)
(To watch this video in another language, visit https://vimeo.com/36752317)
Dieter Van Uytvanck – CLARIN and the FAIR Principles (approx 30 mins)
Look at how Research Infrastructures ensure that they are compliant with the FAIR Principles. Dieter Van Uytvanck of CLARIN gave this presentation at the PARTHENOS-DARIAH-CLARIN ‘FAIR Principles Workshop’ held on the periphery of DHBenelux2017 in Utrecht, July 2017.
The FAIR Data Principles
YouTube video: “Barend Mons / FAIR Principles”, by GODAN Secretariat, published 15th Sept 2016, https://youtu.be/K40utIzUzOk (accessed 23rd Jan 2018)