What is Data?

by Trinity College Dublin

There are many definitions of what constitutes ‘data’, and which one applies often depends on your area of study. On a conceptual level, data can be seen as the basic starting point for research investigation, the ‘raw material’ from which a researcher begins to construct their understanding of a particular field or question. These materials are often called ‘raw data,’ although that is a highly contextual term, given that in many cases they have already been created or collected by another person or institution. As the work of finding and collecting continues, this material gradually becomes what is known as ‘research data,’ that is, the collected material from which the researcher will construct their final theories and arguments.

At a very simple level, ‘data’ is a collection of observations, facts, objects, texts or statistics that can be analysed, sometimes also referred to as ‘sources’ or ‘evidence’. Other definitions include “citations, software code, algorithms, digital tools, documentation, databases, geospatial coordinates (for example, from archaeological digs), reports, and articles” (NEH, 2015). But even this long list can be expanded, as humanists also study audio and video recordings, collections of images, and other hybrid media.

Given this broad range of definitions and potential types of data, research infrastructures face an immense challenge in collecting appropriate digital data and making it available not only across a range of original sources but also for a variety of users. For example, if an infrastructure has assembled a collection of audio recordings made during the Hungarian Revolution of 1956, a linguist might review them to determine whether the syntax of the Hungarian language (Magyar) has changed between the mid-1950s and the present day, whereas a social historian might want to use the recordings as evidence for shifts in public opinion over the course of events, or as related to social class, age or gender. Each of these researchers is looking for something different in the same material: serving both equally well is the key challenge for data interoperability.