By the end of this section, you should be able to…
- Describe a Data Management Plan
- Describe the need for Data Management Plans
- Start a Data Management Plan for your own research
Data Management Plans
Whether you are working independently on a traditional research project or leading a large collaborative team building a digital research tool, you will need some sort of data management plan (DMP). A DMP is a plan, usually drawn up at the beginning of a project, that outlines how you intend to manage your research data responsibly.
If you are an independent researcher, you may never write your plan down, or even think about it explicitly: your research notes and references may live in your Zotero library or in a particular folder on your computer, while photocopies of source material or articles may fill one or more box files or sit in piles on your desk.
The more complicated your team and your project, however, the more likely it is that you will need not only an explicit data management plan, but a written and agreed one. And if your research is in receipt of external funding, you will almost certainly be required to have one. To help with this, many funding agencies provide a model or template DMP for you to follow.
This need not be an intimidating or even an unwelcome task, however. While a data management plan will take some time to devise and agree, in the end it is simply a tool for thinking systematically about the kinds of material your work will produce, how you will work with that material and ensure its integrity during the project, its possible reuse value, and how it will be kept safe and available into the future. A DMP is like an insurance policy for sustainability: it helps you maximise the value of your research and avoid unpleasant surprises at the close of your project.
Common Headings in a DMP
There is no hard and fast standard for DMPs, largely because the nature of research data can vary so significantly between projects. However, the following three categories, and the questions associated with them, will give you a sense of the kinds of information your DMP should capture. Not all of these questions will apply to your project and your data, but a subset almost certainly will!
What data will be collected, processed and/or generated
What does your data consist of? Why are you collecting it? What file types will be represented, and how many files will there be? How large will the overall corpus be? How will the data be collected (by survey, interview, desk research from secondary sources, or from a data repository)? Will any transformations be applied to the data as you find or capture it? Will it need to be translated, transcribed, structured, anonymised or federated with other sources? Will it be encoded? If so, will the coding protocols be shared among a team, or unique to one individual coder? Will there need to be any short- or long-term restrictions on use, or an embargo? Will the conditions of use (for example, a Creative Commons licence) be readily available in both human- and machine-readable forms? Will the identity and purpose of anyone seeking later access to the data be recorded or monitored? If so, how? How will you ensure data quality?
The handling of research data during and after the project
How will the data be kept secure? What backup protocol will the project use? How will you make the data FAIR (findable, accessible, interoperable, reusable)? Will the data be shared or made open access? How will the data be curated and preserved, including after the end of the project? Who will take responsibility for the data protocols? How will any required investment be funded or supported? How many people will have access to the data? Which data repository will you use to store it during and after the project?
Which methodology and standards will be applied
What metadata will you maintain? What standards will you use for this purpose? Will any standard thesauri, vocabularies or methods be applied? If you are creating your own metadata schema, vocabulary or other convention, will you make available a crosswalk or mapping to commonly used alternatives? Will you apply a particular naming convention to your files? Will you be able to use persistent identifiers (PIDs) to improve the long-term findability of your resources? Will particular software tools be required to access and interrogate the data? If so, can the source code for this software be made available as well? For how long can you commit to keeping the data accessible? Which institution will guarantee this commitment?
You will probably also need to think about the ethical aspects of your data and your data collection processes. For more information on this, see the next section of this module.