By the end of this section, you should be able to:
- Recognise and understand the issues and processes involved in managing data for a Citizen Science project
- Prepare a data management plan for your project
- Understand the concept of crowdsourcing and how crowdsourced data requires specific data management processes
Most scientific data management techniques apply equally to Citizen Science projects. However, these projects face additional challenges because the public takes part in collecting and managing the data. Methodologies may vary depending on the scope and audience of each project. Here we guide you through the main processes and aspects to keep in mind when planning the data life cycle of your project, along with the different data management techniques.
The process of managing data when it comes to Citizen Science projects can be summarised in the following data life cycle model.
These steps do not always follow a linear sequence but can occur or repeat at different stages, depending on the project’s needs.
Plan – Understand your project’s data needs
Start planning your data life cycle early in your project. You will need to answer questions related to documentation, storage, quality assurance and ownership for each stage of the data life cycle. At each stage, it is advisable to consider cross-cutting elements, such as description (including metadata and documentation), quality management, backup and security. Making such decisions requires understanding the project’s purpose and being familiar with the scope of data that should be collected to best serve the project’s goals.
This step deals with determining the best way to acquire and organise the data that will be collected by the public. The result is a data model that clearly describes what the data are, their formats and how they will be organised and processed. You can acquire new data by collecting them, by adapting old data, by sharing or exchanging data, or by purchasing data. In citizen science and crowdsourcing projects that involve data collection, volunteers typically record their empirical observations or use equipment such as cameras to create data.
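As a minimal sketch of what such a data model could look like, the Python snippet below describes one volunteer observation as a typed record. Every field name, the pseudonymous `observer_id` and the example values are assumptions made for illustration; a real project would derive its fields from its own goals and from any community data standard it adopts.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Observation:
    """One volunteer observation (all field names are illustrative)."""
    observer_id: str                      # pseudonymous ID rather than a real name
    recorded_at: datetime                 # when the observation was made (UTC)
    latitude: float                       # decimal degrees
    longitude: float                      # decimal degrees
    category: str                         # what was observed, e.g. a species name
    count: int = 1                        # how many were seen
    photo_filename: Optional[str] = None  # optional supporting photo


# A hypothetical record, as a volunteer's app might submit it.
example = Observation(
    observer_id="vol-0042",
    recorded_at=datetime(2018, 5, 12, 9, 30, tzinfo=timezone.utc),
    latitude=51.5246,
    longitude=-0.1340,
    category="robin",
    count=2,
)

# Convert to a plain dict, for example before writing to CSV or JSON.
record = asdict(example)
```

Writing the model down in this form makes the expected formats explicit (for example, timestamps in UTC and coordinates in decimal degrees) before any data are collected.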
For a project to be considered successful, it must ensure data quality, usefulness and preservation. This step ensures quality control of the data by defining quality assurance procedures that minimise errors and help identify and treat erroneous data.
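One simple quality assurance procedure is to run automated checks on each submission and set aside anything suspicious for human review instead of silently discarding it. The sketch below assumes records with hypothetical `latitude`, `longitude` and `count` fields; the specific rules are examples only, and a real project would define its own.

```python
def check_record(record):
    """Return a list of problems found in one observation record."""
    problems = []
    lat = record.get("latitude")
    lon = record.get("longitude")
    if lat is None or not -90 <= lat <= 90:
        problems.append("latitude missing or out of range")
    if lon is None or not -180 <= lon <= 180:
        problems.append("longitude missing or out of range")
    count = record.get("count")
    if not isinstance(count, int) or count < 1:
        problems.append("count must be a positive whole number")
    return problems


def split_by_quality(records):
    """Separate records that pass all checks from those needing review."""
    accepted, flagged = [], []
    for record in records:
        problems = check_record(record)
        if problems:
            flagged.append((record, problems))
        else:
            accepted.append(record)
    return accepted, flagged


# Hypothetical submissions: the second has an impossible latitude,
# the third reports a count of zero.
submissions = [
    {"latitude": 51.52, "longitude": -0.13, "count": 3},
    {"latitude": 151.52, "longitude": -0.13, "count": 3},
    {"latitude": 48.85, "longitude": 2.35, "count": 0},
]
accepted, flagged = split_by_quality(submissions)
```

Keeping the flagged records together with the reasons they were flagged lets a reviewer correct or confirm them later, which fits the aim of treating erroneous data instead of just removing it.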
This step focuses on enriching your collected data with information about the data (metadata) to help other users discover, acquire, interpret and effectively use them. Metadata fields can vary but should in general answer the following questions: Why were the data collected? Who collected the data? What do the data include? When were the data collected? Where were the data collected? How were the data collected and how was data quality ensured?
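The questions above can be turned into a small, machine-readable metadata record. The sketch below is one possible layout, not a formal metadata standard: the key names and all the values describe a made-up garden bird survey and are assumptions for illustration.

```python
import json

# Hypothetical metadata for an imaginary project; every value is a placeholder.
metadata = {
    "why": "Monitor changes in garden bird populations over time",
    "who": "Volunteer observers coordinated by the project team",
    "what": "Counts of bird species seen during 30-minute garden watches",
    "when": "2018-01-01 to 2018-12-31",
    "where": "Private gardens across the survey region",
    "how": "Observations submitted through a web form; "
           "automated range checks and expert review of unusual records",
}

# The six questions every record should answer.
REQUIRED_FIELDS = ("why", "who", "what", "when", "where", "how")

# List any question left unanswered (missing key or empty value).
missing = [field for field in REQUIRED_FIELDS if not metadata.get(field)]

# Serialise so the metadata can be stored alongside the data files.
metadata_json = json.dumps(metadata, indent=2)
```

Checking for missing fields before publishing is a cheap way to make sure no data set leaves the project undocumented. Projects that deposit data in a repository would normally map these answers onto the metadata schema that the repository requires.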
This is an important, ongoing data management task for all projects. It tackles the long-term preservation and sustainability of the collected data, which includes submitting data to an appropriate long-term archive, such as a data centre or repository.
You should make sure that your data are discoverable by other users in a medium that people can understand and easily use. To achieve this, work out who will need or want to see your data, whether researchers, journalists, policymakers or a particular community, and then decide on the most efficient ways of giving those users the access they need. Start by providing easy-to-use search and discovery tools.
This step involves enriching your collected data with other resources to create a data set that is more valuable for analysis.
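A common form of this enrichment is joining volunteer observations to an external reference table through a shared key. The sketch below assumes a hypothetical `site_id` key and a made-up table of site characteristics; the field names and values are illustrative only.

```python
# Hypothetical volunteer observations, each tied to a survey site.
observations = [
    {"site_id": "S1", "category": "robin", "count": 2},
    {"site_id": "S2", "category": "robin", "count": 1},
    {"site_id": "S9", "category": "wren", "count": 4},
]

# Hypothetical external resource describing the sites.
site_info = {
    "S1": {"habitat": "urban park", "elevation_m": 35},
    "S2": {"habitat": "woodland", "elevation_m": 120},
}


def enrich(observations, site_info):
    """Attach site attributes to each observation, keeping unmatched rows."""
    enriched = []
    for obs in observations:
        extra = site_info.get(obs["site_id"], {})
        enriched.append({
            **obs,
            "habitat": extra.get("habitat"),          # None if site unknown
            "elevation_m": extra.get("elevation_m"),  # None if site unknown
        })
    return enriched


enriched = enrich(observations, site_info)
```

Observations whose site is absent from the reference table are kept with empty attributes instead of being dropped, so gaps in the external resource stay visible during analysis.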
As in any scientific undertaking, analysis helps you document and describe facts, detect patterns, develop explanations, test hypotheses and illustrate findings. Analysis of citizen science or crowdsourcing data isn’t necessarily different from analysis of data collected by other methods, and can vary widely depending on the nature of the study and type of data. Knowing how you’ll analyse data before you create your final collection plan will help create a better data collection model.
Tips for managing data
In citizen science, as in any research, proper management of data, with a view to making it open and reusable where possible, should be built into the project. In fact, the risks inherent in your decisions about data may be even higher in a citizen science project: you may be less able to change course quickly if you find that an earlier decision has limited your options at a later stage or introduced an unexpected risk.
For that reason, we recommend you be particularly sensitive to the following in your planning stages:
- Know what your data will be, and how you will use it, to ensure you are compliant with GDPR and ethical standards
- Use appropriate standards to model your data
- Use a data management plan to help structure your thinking
In particular, depending on the form of citizen science you are following, you may need to pay particular attention to quality control of the data you receive. Forums that allow for open commenting, for example, may be open to abusive or mischievous posting, which could skew the nature and quality of your data if unmanaged and undetected. On the other hand, standard academic measures of rigour or provenance may not be equally applicable if you want to leave space for meaningful input by non-professional researchers. A good introduction to how to think about data quality in a citizen science project is listed in the ‘further resources’ section below.
Case Study: Setting Tasks and Meeting Technical Requirements in the Transcribe Bentham project
Prof. Melissa Terras and Dr. Justin Tonra talk about the practicalities of building a platform for crowdsourcing, and setting realistic tasks for volunteers on the Transcribe Bentham project.
To find out more about the Transcribe Bentham project, see https://blogs.ucl.ac.uk/transcribe-bentham/
Further resources
- Po Ve Sham – Muki Haklay’s personal blog, “Citizen Science and Scientific Crowdsourcing – week 5 – Data quality”, 2018 https://povesham.wordpress.com/2018/02/09/citizen-science-scientific-crowdsourcing-week-5-data-quality/
- Wiggins, Andrea, et al. “Data management guide for public participation in scientific research.” DataONE Working Group (2013): 1-41. http://www.birds.cornell.edu/citscitoolkit/toolkit/steps/accept/DataONE-PPSR-DataManagementGuide.pdf
- Schade, Sven, and Chrysi Tsinaraki. “Survey report: data management in Citizen Science projects.” Publication Office of the European Union: Luxembourg (2016). https://ec.europa.eu/jrc/en/publication/survey-report-data-management-citizen-science-projects
- Citizen Science Cost Action, “On the importance of data standards in Citizen Science”, 2018 https://www.cs-eu.net/blog/importance-data-standards-citizen-science
- COST Action CA 15212 “Citizen Science to promote creativity, scientific literacy, and innovation throughout Europe” Minutes of WG5 workshop in Geneva: “On the citizen-science ontology, standards & data”, 2018 https://www.cs-eu.net/sites/default/files/media/2018/06/COST-WG5-GenevaDeclaration-Report-2018.pdf
- CitizenScience.gov, How-to Toolkit https://www.citizenscience.gov/toolkit/#