Research Data Management: Preserving the treasure trove of research
At the Collaborative Research Centre “Extinction Learning”, researchers are offered support with storing, sharing, archiving and publishing their data.
MRI scans, EEG recordings, microscope images, stress surveys – researchers produce a veritable treasure trove of research data every day. The Collaborative Research Centre (SFB) “Extinction Learning” at Ruhr University Bochum intends to retrieve this treasure, share it with others and make it accessible to future generations of researchers. This requires trust, patience, persuasion – and biscuits, according to Dr Marlene Pacharra and Tobias Otto. They support researchers in research data management in the Information Infrastructure Project (INF) at SFB 1280.
What exactly is research data management (RDM)?
Tobias Otto: Research data management refers to the organisation, storage, documentation and availability of data throughout the entire research process. All this may sound abstract at first, not at all practical and like a lot of work with no clear benefit. The problem with RDM is that it actually feels like that to begin with, but of course it’s not. RDM is a long-term effort that often only pays off after an experiment or project has been completed. Then, however, you immediately notice how much it was worth investing in this effort – and you will ultimately maintain this approach in the long term.
Marlene Pacharra: Ideally, researchers consider how their research data should be stored and documented even before they start their experiments. It’s important to have a well-organised data and folder structure and an idea of what additional information is needed to ensure that the data will still be traceable and reusable in ten years’ time. The latter is the metadata that you often hear mentioned, i.e. descriptions of the research data.
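To make the idea of a defined folder structure with accompanying metadata more concrete, here is a minimal sketch in Python. It is not the SFB's actual scheme: the project/modality/subject layout, the file name metadata.json and the list of required fields are all assumptions chosen for illustration.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical minimum set of descriptive fields; a real project would
# agree on its own discipline-specific metadata schema.
REQUIRED_FIELDS = {"title", "creator", "date_collected", "modality", "description"}


def create_dataset_folder(root: Path, project: str, modality: str, subject: str) -> Path:
    """Create (or reuse) a folder following an assumed project/modality/subject layout."""
    folder = root / project / modality / subject
    folder.mkdir(parents=True, exist_ok=True)
    return folder


def write_metadata(folder: Path, metadata: dict) -> Path:
    """Write a JSON file describing the data stored in the folder.

    Raises ValueError if any of the assumed required fields is missing,
    so that undocumented data is noticed at the time of storage.
    """
    missing = REQUIRED_FIELDS - metadata.keys()
    if missing:
        raise ValueError(f"missing metadata fields: {sorted(missing)}")
    sidecar = folder / "metadata.json"
    sidecar.write_text(json.dumps(metadata, indent=2, sort_keys=True), encoding="utf-8")
    return sidecar


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        folder = create_dataset_folder(Path(tmp), "project-x", "eeg", "sub-001")
        sidecar = write_metadata(
            folder,
            {
                "title": "Example EEG session",
                "creator": "Example Researcher",
                "date_collected": "2024-01-15",
                "modality": "eeg",
                "description": "Placeholder description of the recording.",
            },
        )
        print(sidecar.relative_to(tmp))
```

The point of the check in write_metadata is the one made in the interview: the description is written when the data is stored, not reconstructed years later.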
What does that mean in practice? What is the job of a data steward?
Pacharra: First of all, the SFB’s research data managers are the researchers themselves, who, in keeping with good scientific practice, are also responsible for their own data. We assist them with advice and show them how they can improve their daily RDM routines. We are also creating sustainable infrastructures, for example, building the new research data management system ReSeeD in close cooperation with Research Data Services of Ruhr University Bochum and adapting it to the needs of the SFB. But in order for these tools to be accepted and become part of everyday research, we have to communicate with the researchers, ask them about their needs, and identify where things are going wrong.
Otto: We ask “Where are you?” and “What do you need?” We talk to researchers, build awareness of the need for good management, solve problems collaboratively, improve storage processes and so on. Accordingly, our INF project is a service project that focuses on and supports communication. Sitting down with coffee and biscuits, providing personalised support – we believe this is essential to driving good research data management forward.
An important aspect of the data management strategy at the SFB is the Ruhr University’s research data management system (RDMS) ReSeeD. Can you explain what that will look like?
Otto: At the moment, the researchers at SFB 1280 work with apps we’ve developed for metadata on a network drive in central IT and use a defined folder structure in order to share research data. The system has grown over the years and is still operational, stable and secure. But for publication and archiving, our researchers have to use other systems. The Ruhr University’s new research data management system ReSeeD supports backup, documentation, collaboration, archiving and publication in a single system with a high level of usability for researchers, so that there’s no need to switch between systems.
Pacharra: What makes this research data management system so important for us in the neurosciences is that there are no established guidelines for our field, whereas national infrastructures do exist for other disciplines.
How long has the current system been in place?
Otto: We first started thinking about it in 2016 in the FOR 1581 Extinction Learning research group: how do we describe our different research data with metadata? What must standardised folder structures look like that can work across all areas of neuroscience, i.e. for animal data, human data, EEG recordings, MRI images and so on?
One thing was crucial right from the start: the concept doesn’t have to be perfect, but it has to be right for the researchers, so that they can integrate it into their daily research routine and, ultimately, the concept ends up being applied. We want to approach researchers in their area of expertise, which is research rather than data management. We believe that this is the only way we can be successful in the long term and ensure good research data management.
Ruhr University Bochum has further developed software that is used all over the world and is making it available to its researchers as ReSeeD. Have you had the chance to test the system?
Otto: We’re working closely with Research Data Services at Ruhr University Bochum and have tested the beta version of the new system with other users at the university. The system is going to be excellent! It is and will remain completely open source and will be provided by our IT.Services. Everyone can and should use the system and help develop it. It’s a system from the research community for the research community.
Pacharra: ReSeeD is going to be great. Our data storage is very effective. It can be specifically customised for individual Collaborative Research Centres, for example to take into account discipline-specific metadata schema requirements.
How do researchers react to research data management efforts, such as the introduction of new systems?
Pacharra: The researchers have put an insane amount of work and brainpower into their research and data collection.
Researchers must have trust in the RDM structures in order to share this treasure, this data, in a way that is accessible. A cultural shift is needed in some areas for this to happen. What’s more, data management eats up time. Researchers want to do research, do their experiments and file the data as soon as possible. They are busy and don’t have time to laboriously organise their data. That’s why it’s so important that they get the support they need.
Otto: As research data management structures have been evolving over time, confidence in such systems and awareness of RDM have grown. The new generation of researchers is embracing the mindset of sharing data from the outset.
Which measures have you taken to support researchers when it comes to RDM?
Pacharra: For one, we introduced a Lab Data Cleaning Day, designed to raise awareness among researchers about managing their research data. That has to be carefully thought out in advance, because we’re asking researchers to take a day off from their experiments, to review their data and clean it up. During a Lab Data Cleaning Day, we are on hand to help secure their data troves.
Otto: In order to minimise the time burden, we’ve agreed on fixed workflows. Awareness and commitment have now been established. Researchers keep telling us, “I need one more data cleaning day,” and that’s a positive development.
How do you manage to get everyone to commit and stay motivated?
Pacharra: One way is through our new research data management policy, which we agreed on with all researchers at the SFB in 2022 and which sets out binding rules. It defines responsibilities, roles, workflows and standards.
Otto: The new policy shows our commitment. We don’t conduct research data management and Open Science away from the public eye.
Pacharra: For us, it’s crucial to understand where the researchers are coming from when we reach out to them and to appreciate the problems they’re having with RDM. This is the only way we can provide meaningful advice and effective support with regard to RDM.
What’s the greatest challenge from a professional perspective?
Pacharra: The human data, such as stress questionnaires and brain scans, represent a major challenge. A lot of effort is needed to ensure that these data are anonymised and that ethical standards are maintained. Researchers are quite rightly very cautious about this and have reservations. There’s always a lingering concern that it could be possible to infer information about patients and test participants. Data protection is the top priority in this context.
Otto: This is why our workflow requires three control instances. We provide tools for anonymisation workflows, which we naturally also share with the research community. That, too, falls within the scope of RDM.
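As a rough illustration of one step such a workflow might include, the following Python sketch drops direct identifiers from a questionnaire record and replaces the participant ID with a keyed hash. It is not one of the SFB's tools: the field names, the pseudonymise function and the 12-character pseudonym length are assumptions. Strictly speaking, this is pseudonymisation rather than full anonymisation, and it does not address the review steps mentioned above or imaging data such as brain scans.

```python
import hashlib
import hmac

# Hypothetical names of fields that identify a person directly.
DIRECT_IDENTIFIERS = {"name", "date_of_birth", "email"}


def pseudonymise(record: dict, secret_key: bytes) -> dict:
    """Return a copy of the record without direct identifiers.

    The participant ID is replaced by an HMAC-SHA256 digest, so the same
    participant always maps to the same pseudonym, but the mapping cannot
    be recomputed without the secret key.
    """
    digest = hmac.new(
        secret_key,
        str(record["participant_id"]).encode("utf-8"),
        hashlib.sha256,
    ).hexdigest()
    cleaned = {
        key: value
        for key, value in record.items()
        if key not in DIRECT_IDENTIFIERS and key != "participant_id"
    }
    cleaned["pseudonym"] = digest[:12]  # shortened for readability (assumption)
    return cleaned


if __name__ == "__main__":
    example = {
        "participant_id": "P-0042",
        "name": "Jane Doe",
        "email": "jane@example.org",
        "stress_score": 17,
    }
    print(pseudonymise(example, secret_key=b"keep-this-key-separate"))
```

The key would have to be stored separately from the data; anyone holding both can link pseudonyms back to participant IDs.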
What do you expect of research data management going forward?
Pacharra: We want to ensure that, once compiled, valuable data don’t have to be collected all over again. After all, data collection requires a lot of time, effort and brainpower, as well as a lot of money.
Otto: We hope that ReSeeD will be used sustainably and that the data will still be accessible and the experiments reproducible even a decade from now.
Pacharra: We very much hope that different disciplines will be able to access this treasure trove of data. With the standardised folder structure and the metadata, we want to create a framework in which researchers from different disciplines can easily find their way around. A psychologist could thus check out microscope data from biology and immediately understand the central aspects – and could potentially use them for further research.
That does sound very promising for research.
Pacharra: Indeed. Let’s consider big data. Some research findings only come to light by collecting large amounts of data.
Otto: That’s also the idea behind our focus groups in SFB 1280. The availability of data is essential to formulate new, overarching theories, for example by combining data from animals and humans to gain new insights or to discover mechanisms.
Pacharra: In the future, researchers will be able to look up whether data already exists that’s relevant to their hypotheses and questions. This could speed up research processes. We also hope for more transparency overall. In psychology there’s what is known as the replication crisis. But the more meta-information we have and the more we understand about the contexts of study data, the more likely we are to detect errors and manipulations in research designs. This helps to determine which studies can be reproduced and replicated – and which cannot.