Data resources in the Humanities can come from many different media, be they books, videos, audio recordings, or photography, and each may use its own standards. Equally, when a project begins to pull together or create datasets, its management strategies and infrastructures are likely to be subject to unconscious biases that could affect how easily future researchers can use the data, whether for discovery, exchange or reuse.
Elsewhere within these modules, we discuss the notion of ‘data soup’ (more commonly known as ‘Data Heterogeneity’): the mixing of all these different data types and standards, and how we can make the result ‘Interoperable’. Ontologies are one of the ways in which we can make datasets interoperable, but it can take a bit of work to fully understand how they function.
One important first step in designing or implementing an ontology is to know your own data, in all its formats, and to think about how you might categorise it in a way that is understandable to a human as well as to a computer. This is how we come to the idea of ‘semantic data’: applying human-motivated categorisation and meaning to data in such a way that a computer can understand it, structure it, and re-represent it to another human while retaining that meaning.
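To make the idea of semantic data a little more concrete, here is a minimal sketch in Python. It assumes that statements about our data can be written as simple subject, predicate, object ‘triples’. Every identifier in it (letter_42, written_by, Example Author, and so on) is invented for illustration and is not taken from any real ontology or dataset.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement about the data: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical statements describing a letter and a photograph of it.
# None of these names come from a real ontology; they are placeholders.
triples = [
    Triple("letter_42", "is_a", "Letter"),
    Triple("letter_42", "written_by", "person_7"),
    Triple("person_7", "is_a", "Person"),
    Triple("person_7", "has_name", "Example Author"),
    Triple("scan_42.tiff", "is_a", "Photograph"),
    Triple("scan_42.tiff", "depicts", "letter_42"),
]


def find_subjects(predicate, obj, statements):
    """The computer's side: find everything that matches a category."""
    return [t.subject for t in statements
            if t.predicate == predicate and t.obj == obj]


def describe(subject, statements):
    """The human's side: turn the stored statements back into readable text."""
    return [f"{t.subject} {t.predicate.replace('_', ' ')} {t.obj}"
            for t in statements if t.subject == subject]


print(find_subjects("is_a", "Letter", triples))
for sentence in describe("scan_42.tiff", triples):
    print(sentence)
```

The same statements serve both audiences: the computer can filter and link them by category, and they can be read back to a person without losing what they meant. A real ontology would replace the made-up predicate and category names with agreed, shared definitions.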
There are different ontologies for different types of information, disciplines, and research communities. For example, CIDOC-CRM is an ontology designed to meet the needs of people working in Museology. GOLD is an ontology that linguists can use to codify language according to its linguistic elements. NeMO can be used by humanities researchers to track the workflow of their scholarly practices.
Throughout this module, we will discuss these concepts in much greater detail, introducing the issues around ontologies and semantic data. The first of these issues is one that we have touched on above: Data Heterogeneity…
“rope” by cocoparisienne, CC0 – available on https://pixabay.com/en/ropes-rope-cordage-dew-tross-341976/