Archiving as a new focus for research data

Updated on: 05/02/2025

The Recherche Data Gouv ecosystem supports the sharing and opening up of research data, notably through the repository of the same name. However, Recherche Data Gouv, like other repositories, is not intended to ensure the long-term archiving of the datasets it contains (secure hosting is guaranteed for a renewable period of approximately 5 years).

Nevertheless, some scientific data are of very long-term interest and deserve to be preserved in a quasi-heritage approach. As the recent R.I.P Data colloquium showed, questions are multiplying in research communities: what criteria can be applied to select the data to be archived? On which infrastructures...?

In this context, a new working group dedicated to the archiving of scientific data was created within the network of data management clusters: WG 7.

Regularly confronted with researchers' questions on the subject, the network decided to dedicate a working group to it, drawing in particular on the complementary profiles of librarians, documentalists and archivists present in its teams. The skills of archivists are particularly valuable in addressing issues related to the regulation and management of archives in France.

The primary task of WG 7 will be to identify the issues facing researchers and support staff from the moment data management plans are drawn up. WG 7 will also seek to establish links with the other working groups concerned with archiving beyond the Recherche Data Gouv ecosystem, whether at the level of the Research Data College of the French Committee for Open Science, CollEx-Persée and its programme dedicated to scientific archives, or the WG "Scientific Archives" of the Aurore section of the Association des Archivistes Français (AAF).

WG 7 has begun work toward two deliverables:

  • a factsheet on the selection criteria for sorting the data to be retained in the long term, in partnership with the resource center DoRANum,
  • an inventory of the concrete problems encountered in the field, which will be reported to the Committee for Open Science.

Clarifying the concept of archiving for research data

Authors: Christine Hadrossek (DDOR CNRS), Laure Bézard and Romain Boissat (Maison de l'Orient et de la Méditerranée), Océane Valencia (Sorbonne University), and Marie-Laure Bachèlerie (DSI CNRS)

In the context of research data management, the concept of archiving is often misunderstood, as it is frequently confused with related concepts such as storage, sharing, or publishing of data. However, archiving has a specific definition and purpose, and it is a distinct process that is essential for ensuring both the preservation and regulatory compliance of data.

What archiving is not

  • Archiving is not storage: Storage involves recording information on a physical medium (USB drive, hard disk, magnetic tape, or equivalent) for individual access. It is intended for immediate or short-term use, while archiving ensures the long-term preservation and intelligibility of data.
  • Archiving is neither sharing nor publishing data: When data is deposited in a repository such as Recherche Data Gouv, the goal is to make it accessible to the scientific community, promoting its reuse or valorization. Archiving, on the other hand, is part of a heritage and regulatory process.

So, what is archiving?

Archiving research data can be defined as the set of practices aimed at preserving data over time, ensuring their integrity, authenticity, and intelligibility, for purposes of evidence, memory, or public interest.

In France, the Code du patrimoine (Article L211-1) reminds us that archiving is a legal obligation for all documents, including data from public research. Long-term archived data must be transferred to a public archives service under the scientific and technical control of the archives administration.

"The legal definition of archives is much broader than the common understanding of 'archives' as old documents. It covers all types of documents and data, regardless of their form or medium. Thus, an infinite number of types of research data (field notebooks or photos, interview recordings, databases, algorithms...) are archives from the moment of their creation. Their management is therefore carried out in consultation with all stakeholders throughout the data lifecycle to, for example, ensure their accessibility or properly preserve those that cannot be opened immediately. Archiving is a set of methods, processes, and tools implemented to manage the preservation and use of documents and information in the short, medium, and long term. In this regard, institutional archives services offer researchers, in addition to open access publication routes (RDG, HAL, Zenodo, Nakala...), solutions for preserving and communicating research data that ensure compliance with various regulations (Code du patrimoine, GDPR, scientific integrity...) to help build the scientific heritage of tomorrow."

Complementary but distinct purposes

When you deposit your data in the Recherche Data Gouv repository, your goal is to facilitate its reuse and contribute to the open science movement. However, this is not archiving, as this deposit is revocable. Archiving, on the other hand, is only effective when you guarantee, with an archives service, the long-term preservation and accessibility of your data. It thus follows a different approach: it ensures preservation, often beyond the lifespan of current IT tools, in order to safeguard scientific and cultural heritage.

Why is this distinction important?

Understanding this difference allows for better planning of data management throughout their lifecycle:

  • For immediate or medium-term use: prioritize deposit in trusted thematic repositories or, if necessary, in the Recherche Data Gouv repository.
  • For long-term preservation: deposit your data with an archives service, which archives it for the history of research and helps build the scientific heritage of tomorrow.

Archiving and depositing in a data repository are not mutually exclusive but serve distinct and complementary purposes. By integrating them judiciously, you ensure the valorization and longevity of your research data.