Data in Ontologies

by Trinity College Dublin

What kind of data can be produced using a formal ontology and how can I use it?

The end result of adopting a formal ontology for information integration is the representation of information in a uniform manner. This allows the discovery of information that was previously undiscoverable because it was split across incompatible data silos. To achieve this end, the formal ontology must be used to store, produce and query data.

A formal ontology should be agnostic to data formats in order to be as neutral and usable across as many different platforms as possible. That being said, a formal ontology must be created in a particular format in order to be used to store data. As a logical structure, the S-V-O statements that a formal ontology helps to make form a graph of classes and relations, also known as nodes and edges. In practical terms, this allows the linking of unlimited chains of information, which are logically controlled, and therefore calculable, by the rules of the adopted ontology.

To make it easier to apply an ontology to information in practice, a number of formats based around the logical form of ‘S-V-O’ expressions have been developed. Two of the most popular are RDF and OWL. Both have a growing number of databases, tools and platforms based around them, making them easier to use and implement. These formats allow the encoding of triples, the technical term for an S-V-O relation, and their storage in graph form. Databases of this kind are called triple stores.
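As a rough illustration of what a triple and a chain of triples look like, here is a minimal sketch in plain Python. It is not a real triple store or an RDF library: the `ex:` identifiers, the property names and the helper functions `match` and `follow` are all invented for this example.

```python
# A toy in-memory "triple store": each triple is a (subject, verb, object) tuple.
# The "ex:" identifiers and property names are made up for illustration only.
triples = {
    ("ex:Rembrandt", "ex:created", "ex:TheNightWatch"),
    ("ex:TheNightWatch", "ex:heldBy", "ex:Rijksmuseum"),
    ("ex:Rijksmuseum", "ex:locatedIn", "ex:Amsterdam"),
}


def match(store, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def follow(store, start, predicates):
    """Follow a chain of properties from a start node and return the nodes reached."""
    current = {start}
    for p in predicates:
        current = {t[2] for t in store if t[0] in current and t[1] == p}
    return current


# Which city can be reached from the artist by chaining three statements?
city = follow(triples, "ex:Rembrandt", ["ex:created", "ex:heldBy", "ex:locatedIn"])
print(city)
```

The point of the sketch is that each statement is small and uniform, so chains of any length can be followed with the same simple logic.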

Watch!

Dr. Kristen Schuster discusses strategies for normalising heterogeneous data

Semantic data encoded according to an ontology can be produced in a number of ways. Because most data continues to be stored in relational databases, semantic data is often produced via a mapping process. In this process, the user takes each table and field from a relational database and gives it an equivalent semantic expression using the formal ontology. The mapping transforms the data in the relational tables into a series of triple statements that link nodes to each other via properties, forming a network of human-readable and machine-processable data. This is the semantic expression of the data. A software programme or script can then be used to convert data automatically from the relational form to the triple form (the S-V-O form) used by knowledge bases.

In some cases, rather than transforming the data into triple format, query converters are generated which map semantic queries to relational queries on the source databases. Here an equivalence is drawn between a query expressed against the semantic model and the query that would run on the standard database. You can then create an interface from which to make semantic queries that run against relational databases. This has the advantage of not requiring data transformation processes, but it does encounter challenges if source data values are not compatible (see ‘Standards’), or if the databases from which the sources are derived are not accessible.

With the popularization of semantic data management platforms such as ResearchSpace, Arches and WissKI, it is increasingly possible to generate native semantic data, that is, semantic data generated within the data management platform itself. This is done by hiding the complexity of the semantic data from the user and providing traditional form entry tools that guide users to create well-formed semantic data as part of their data creation and management process.
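The mapping process from relational tables to triples described above can be sketched as follows. This is a minimal illustration using Python's built-in `sqlite3` module, not a real mapping tool: the `artwork` table, the `MAPPING` dictionary, the `ex:` vocabulary and the `rows_to_triples` function are all assumptions made up for the example.

```python
import sqlite3

# Hypothetical relational source: a single table of artworks.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE artwork (id INTEGER PRIMARY KEY, title TEXT, artist TEXT)")
conn.executemany(
    "INSERT INTO artwork (id, title, artist) VALUES (?, ?, ?)",
    [(1, "The Night Watch", "Rembrandt"), (2, "The Milkmaid", "Vermeer")],
)

# Hypothetical mapping: the table becomes a class and each field becomes a
# property from an invented "ex:" vocabulary.
MAPPING = {
    "table": "artwork",
    "class": "ex:Artwork",
    "key": "id",
    "columns": {"title": "ex:hasTitle", "artist": "ex:createdBy"},
}


def rows_to_triples(connection, mapping):
    """Convert every row of the mapped table into (subject, property, object) triples."""
    columns = [mapping["key"], *mapping["columns"]]
    sql = f"SELECT {', '.join(columns)} FROM {mapping['table']} ORDER BY {mapping['key']}"
    result = []
    for row in connection.execute(sql):
        record = dict(zip(columns, row))
        # Build a subject identifier from the table name and the primary key.
        subject = f"ex:{mapping['table']}/{record[mapping['key']]}"
        result.append((subject, "rdf:type", mapping["class"]))
        for column, prop in mapping["columns"].items():
            if record[column] is not None:  # skip empty fields
                result.append((subject, prop, record[column]))
    return result


triples = rows_to_triples(conn, MAPPING)
for triple in triples:
    print(triple)
```

Keeping the mapping as data, separate from the conversion function, means the same script could in principle be reused for other tables by writing a new mapping rather than new code.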

Querying Semantic Data

Querying semantic data requires a query language, in the same way that you would need a language for querying traditional databases. The standard query language for RDF-encoded semantic data is called SPARQL. Querying data encoded in a semantic graph allows researchers to take advantage of the IsA relations described above, as well as the ability to create highly complex searches over multiple relations and nodes. Writing such queries, however, is a non-trivial task, which can seem overly complex to many researchers outside computer science. Research in digital humanities has therefore also focussed on ways to make such query facilities easier, either by making it easier to share queries or easier to build and run them in the first place (e.g. through graphical interfaces).
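To give a feel for how a query can take advantage of IsA relations, here is a small sketch in plain Python. The data, the `ex:` class names and the helper functions are invented for illustration. The SPARQL text is included only for orientation, with prefix declarations omitted, and is not executed by the example.

```python
# Invented example data: a small class hierarchy plus three typed items.
triples = {
    ("ex:Painting", "rdfs:subClassOf", "ex:Artwork"),
    ("ex:Sculpture", "rdfs:subClassOf", "ex:Artwork"),
    ("ex:Artwork", "rdfs:subClassOf", "ex:HumanMadeObject"),
    ("ex:item1", "rdf:type", "ex:Painting"),
    ("ex:item2", "rdf:type", "ex:Sculpture"),
    ("ex:item3", "rdf:type", "ex:Tool"),
}

# Roughly the same question as a SPARQL 1.1 property-path query
# (prefix declarations omitted; shown for orientation, not executed here).
SPARQL_EQUIVALENT = """
SELECT ?item WHERE {
  ?item rdf:type/rdfs:subClassOf* ex:Artwork .
}
"""


def subclasses_of(store, cls):
    """Return cls plus every class that IsA cls, directly or indirectly."""
    found, frontier = {cls}, [cls]
    while frontier:
        parent = frontier.pop()
        for s, p, o in store:
            if p == "rdfs:subClassOf" and o == parent and s not in found:
                found.add(s)
                frontier.append(s)
    return found


def instances_of(store, cls):
    """Return every item whose type is cls or one of its subclasses."""
    classes = subclasses_of(store, cls)
    return sorted(s for s, p, o in store if p == "rdf:type" and o in classes)


artworks = instances_of(triples, "ex:Artwork")
print(artworks)
```

A search for artworks finds both the painting and the sculpture even though neither is typed as an artwork directly, which is the kind of result a flat keyword search over separate databases would miss.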

