Citizen Science and Research Infrastructures

by Trinity College Dublin

Learning Outcomes

By the end of this section, you should be able to:

  • Understand the need for, and the types of, infrastructural support around Citizen Science
  • Recognize the main European Research Infrastructure stakeholders for Humanities research and their potential and challenges in engaging Citizen Scientists
  • Understand the main forms of collaboration and the mutual benefits between Cultural Heritage Institutions and Citizen Scientists
  • Enrich data from cultural heritage collections on Wikidata and explore its potential for your own research projects



Creating a well-functioning ecosystem that allows new forms of co-creation between professional researchers and citizen scientists is an essential success criterion for Citizen Science. Opening up research processes to actors outside academia requires the empowering of citizens with access not only to research outputs but also to the key sources of scholarly knowledge creation: research data and research tools like virtual research environments.

Research Infrastructures in the Social Sciences and Humanities, like DARIAH for digitally enabled Arts and Humanities, CLARIN for language and text-based studies, E-RIHS for heritage science, CESSDA for Social Sciences, and OPERAS for scholarly communication, have proven potential to connect expert specialists and resources with the public. These organizations play a vital role in opening up, for broader collaborations, resources that are usually available only to established scientists with institutional e-mail accounts and affiliations: databases, corpora, software, research tools and services to annotate, analyse, connect, store, and share data, as well as repositories and training materials. Open data management and storage services provided by research infrastructures, like EUDAT’s B2Drop repository, can be essential for hosting and connecting data produced by independent researchers and citizen scientists who do not have access to institutional services.

From a Citizen Science point of view, the Endangered Languages Archive (ELAR), a founding member of the CLARIN-UK consortium, is an especially valuable resource. As a digital repository specialising in preserving and publishing endangered language documentation materials from around the world, it allows citizens to connect with their own fragile heritage.

How the CLARIN Virtual Language Observatory enables citizens to collect, analyse and store language data

A significant advantage of the CLARIN Virtual Language Observatory over institutionally hosted resources is that it is openly available to independent or non-specialist researchers around the globe. They can find all the types of resources they need to conduct research on language (data sets, corpora, web services for processing, analysing and visualising data, and storage capacity) on the same federated and easy-to-search platform. The necessary know-how is built into the platform, which helps users decide which tools are best suited to their research. The video below illustrates a fictional use case and shows how researchers (experts and non-experts alike) can find answers to the question of how the use of nouns by female members of parliament differs from that of male members.

From language data to insight: the CLARIN use case:

Despite everything that the services and tools of research infrastructures can offer to non-academic researchers interested in the Social Sciences and Humanities, the uptake and use of such services does not yet seem to be common practice. One possible reason is that, despite the open availability of webinars and other training materials, navigating such complex environments still requires a great deal of expert epistemic knowledge. General awareness that these services exist is also limited.


In a Cultural Heritage context, Europeana has a long tradition of collaboration with the broader public. Operating as ‘the digital door to European cultural heritage’, the platform collects and provides access to more than 30 million digitised items. This rich resource is open to use and reuse in a great variety of educational and research contexts, and has also harnessed citizen knowledge to grow the range of perspectives it can represent.

Europeana 1914-1918

One of Europeana’s most far-reaching Citizen Science programmes has been the 1914-1918 collection days, which invited citizens Europe-wide to present family artefacts from the period of 1914-1918, such as photos, diaries, medals, recordings or postcards, together with their associated stories. The aim was to develop a deeper understanding of this important period of European history and to show a plurality of narratives about the Great War, from the front line as well as the home front. To this end, a substantial digital collection of material was gathered from national library collections held by ten libraries and other partners in eight countries that found themselves on different sides of the historic conflict. Contributions from the public were collected, digitised and curated by experts from the affiliate network of archives, who then made the digital objects openly available on the Europeana 1914-1918 website. The project’s results were published in 2014, at the centenary of the outbreak of WW1, but it is still open to contributions from the public. Its success clearly shows the potential of Europeana as an aggregator infrastructure in Citizen Science. On the one hand, a rich network of partner institutions was an essential success criterion for reaching diverse communities in different European countries. On the other hand, Europeana’s interoperability framework made it possible to bring together diverse sources from a variety of institutions on a central platform.

To make public involvement in collection enrichment broader and easier, Europeana is now developing a crowdsourcing platform that will enable citizens to transcribe and enrich cultural heritage material from Europeana Collections and national aggregator portals.

Further information:


Europeana 1914-1918 collection days in Poznan, Poland (June 2016) ENGLISH from Europeana on Vimeo.

Wikidata: an increasingly strong infrastructural link between Citizen Scientists and cultural heritage institutions

Europeana also plays a role in exploring the potential of Wikidata as a platform for galleries, libraries, archives and museums (GLAMs) to enrich, connect and openly share their cultural heritage collections.

Wikidata is a free, multilingual knowledge base, a main pool for sharing scholarly and technical information that can be read and edited by both humans and machines. The ambition of the endeavour is nothing less than to reflect “the sum of all human knowledge” in a structured and interconnected network of knowledge graphs that can be displayed in any language. The interoperability framework underlying these knowledge graphs is flexible enough to accommodate information from all areas of knowledge (including all domains of scholarly and scientific knowledge, or even indigenous knowledge) and allows the Wikidata user communities to shape it iteratively and dynamically through their inputs. These properties enable Wikidata to go beyond its role as a support database for Wikipedia, Wikimedia Commons and the other wikis of the Wikimedia movement, and to acquire the status of a major open data platform for massive online collaboration. It is open to anyone in the world with an internet connection who wants to learn, to share their own knowledge, to link further resources to the knowledge graphs, or to build services, tools and applications on top of it.
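To illustrate what being readable by machines can look like in practice, here is a minimal sketch in Python that builds a query for the public Wikidata Query Service. It assumes that the property P195 ("collection") links an item to the institution holding it; the function names are hypothetical helpers invented for this example, and the item ID Q190804 is assumed here to identify the Rijksmuseum. The sketch only prepares the query and the request URL, it does not contact the service.

```python
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_collection_query(collection_qid: str, limit: int = 10) -> str:
    """Return a SPARQL query listing items held in a given collection.

    Hypothetical helper for illustration; not part of any Wikidata library.
    """
    # Accept only well-formed item IDs such as "Q42".
    if not (collection_qid.startswith("Q") and collection_qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata item ID: {collection_qid!r}")
    return (
        "SELECT ?item ?itemLabel WHERE {\n"
        # P195 is assumed to be the "collection" property.
        f"  ?item wdt:P195 wd:{collection_qid} .\n"
        # The label service returns human-readable names in English.
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


def build_request_url(query: str) -> str:
    """Return a GET URL that asks the endpoint for JSON results."""
    return f"{WDQS_ENDPOINT}?{urlencode({'query': query, 'format': 'json'})}"


# Example: Q190804 is assumed to be the Rijksmuseum item.
example_query = build_collection_query("Q190804", limit=5)
example_url = build_request_url(example_query)
print(example_query)
print(example_url)
```

Pasting the printed query into the Wikidata Query Service web interface, or opening the printed URL, should return a short list of items together with their English labels. Swapping in the ID of another institution is enough to explore a different collection.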

The participation of a global, multilingual community of volunteer data creators and curators on Wikidata can easily be interpreted as an en masse instantiation of Citizen Science. Building on Wikidata allows individual projects to overcome many of the methodological challenges they might face in a smaller, isolated virtual environment, as it increases the capabilities of both professional and citizen scientists to collaborate with each other in mutually beneficial ways (for further details, see Mietchen et al. 2015). First, project designers do not have to struggle to find sufficiently large and diverse communities of contributors. Second, an ecosystem of infrastructure, technologies and tools is already established and made available by Wikidata. Third, the use of Wikidata standards, vocabularies and know-how as shared common ground for data creation and curation reduces the risk of insufficient data production and helps ensure data quality.

The value of connecting collections with Wikidata and opening up their descriptions for community curation is increasingly recognized and explored by the cultural heritage sector worldwide. Cultural heritage institutions can benefit from the Wikimedia and Wikidata community efforts in several ways.

  • Integrating their collection descriptions with Wikidata enables them to share and amplify the knowledge collected in their institutions beyond their own walls and to embed this knowledge in a larger context of related knowledge systems.
  • It enables GLAM collections around the world to be mapped to each other and establishes connections and links between them that would otherwise remain hidden. Uncovering synergies between different institutions’ holdings and curation practices helps them to reduce duplicate work.
  • Community curation of descriptions helps institutions to structure and enrich the metadata of cultural heritage objects from a different source. As a result, their holdings become more searchable and discoverable. The extra contextual information gained this way can help even curators to discover new correlations and connections in their material.
  • Opening up collection descriptions for community enrichment also involves opening them up to different perspectives with different knowledge representation systems. For instance, as Alex Stinson remarks in the blog post Wikidata in Collections: Building a Universal Language for Connecting GLAM Catalogs, fitting instances of indigenous knowledge into library catalogues or authority vocabularies is a difficult task that can easily result in inappropriate and insufficient description of cultural content. By contrast, Wikidata, as the post puts it, “creates opportunities for community participation and allows for a greater diversity in the way people can be represented in data, giving people the power to shape knowledge about their own communities.”


Over the past years, a growing community has formed on Wikidata around the cultural heritage domain. You can learn more about this community and join it here:
