recherche.data.gouv.fr

Depositing a dataset

Updated at: 13/08/2025
  1. Creating a dataset
  2. Entering the first batch of metadata
  3. Adding the associated files to a dataset
    1. Tabulated data files
    2. Conditions for the effective ingestion of tabulated data
  4. Saving a dataset
  5. Completing the metadata
  6. Indicate the terms of use for the dataset
    1. Licences
    2. Conditions for restricted access files
    3. The guestbook
  7. Managing the rights associated to datasets and files
    1. Rights associated to datasets and files
    2. Assigning a role to a dataset
    3. Restricting access to a data file
    4. Applying an embargo to a data file
    5. Giving access to an unpublished dataset (private URL)
    6. Giving access to an unpublished dataset (URL for anonymized access)
  8. Case of blind peer-reviewed datasets

Creating a dataset

Go to the collection you have identified (see "Identifying your depositing space" in the Before depositing guide).

Click on Add Data > New Dataset

Add data, New dataset

A collection may provide one or more templates in which some metadata, including the General Terms and Conditions and the licence, have been pre-filled. If a suitable template exists, select it when creating the dataset: a template cannot be applied or changed afterwards.

Create a dataset from a template

If the collection does not provide a template you can request one from the collection administrator using the Contact button.

Entering the first batch of metadata

Enter the mandatory metadata, marked with a red asterisk, as well as the recommended metadata available at creation time. The remaining metadata can only be completed by editing the dataset after it has been saved.

Please see the Guide to entering metadata.

FAIR - FIR

Entering the recommended metadata helps comply with the following principles:
- Findable (title, keywords, etc.),
- Interoperable (using metadata standards and controlled vocabularies),
- Reusable (information on the characteristics and context of the creation or collection of the data).

 

Adding the associated files to a dataset

One or more files can be associated with a dataset in the Recherche Data Gouv repository. 

A file is also assigned its own DOI which is linked to the dataset's DOI. If the files have been deposited in another repository, the link to these will be given in the dedicated "Link to data" metadata.

All file types are accepted (tabular, text, pdf, image, video, audio, SHP, etc.). However, in the current context that favours data openness and reuse, it is strongly recommended to choose a format that is open or widely used and also machine readable.

FAIR - I

Using open formats complies with the Interoperable principle, as such files can be read and modified using any software designed to process them (image, text, audio, etc.).

 

Please see the DoRANum resource "Open or Closed Format?".

If the files have been deposited in a different repository to Recherche Data Gouv, please indicate the link to the data in the dedicated "Link to data" metadata.

Note: Files can still be added after the dataset has been saved or published.

Click on Select Files to Add or drag and drop the file(s).

All file formats are accepted.

Upload one or more files

Fill in the specific metadata for the file:

  • File name: auto-filled, can be modified
  • File Path if necessary
  • Description
  • Tags. There are three default labels: Data, Documentation and Code.
  • Provenance
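For scripted deposits, the same file metadata can be prepared as a JSON document. The following is a minimal sketch, assuming the key names used by the Dataverse native API when a file is added (`description`, `directoryLabel` for the file path, `categories` for the tags). The helper name `build_file_metadata` is hypothetical, and Provenance is left out because its API representation is not described in this guide.

```python
import json


def build_file_metadata(description, file_path=None, tags=()):
    """Return a JSON string describing one file.

    Assumption: keys follow the Dataverse native API for adding a file
    (description, directoryLabel for the file path, categories for tags).
    """
    metadata = {"description": description, "categories": list(tags)}
    if file_path:
        # The "File Path" field is assumed to map to "directoryLabel".
        metadata["directoryLabel"] = file_path
    return json.dumps(metadata, ensure_ascii=False)


print(build_file_metadata("Raw sensor readings", "data/raw", ["Data"]))
```

The file name itself is not part of this document: it is taken from the uploaded file and can be modified afterwards, as in the interface.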

File metadata

The media type (MIME type) of the file will be recognized even if the file has no extension. The Dataverse software may propose a preview of the file depending on its type.

The maximum size for each file uploaded is 50 GB.

It is recommended to upload no more than 200 files per transfer via the user interface. If you have more files than that, you must use the DVUploader tool and the Direct Upload Dataverse API.
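Before starting a deposit, a local folder can be checked against these two limits. The sketch below is only an illustration: the function names are hypothetical, and it assumes that 1 GB means 10**9 bytes, which the guide does not specify.

```python
from pathlib import Path

MAX_FILE_BYTES = 50 * 10**9  # 50 GB per file (assumption: 1 GB = 10**9 bytes)
MAX_BATCH_FILES = 200        # recommended maximum per transfer in the interface


def plan_batches(sizes_by_name, batch_size=MAX_BATCH_FILES):
    """Split files into batches and list the files that exceed the size limit.

    sizes_by_name maps a file name to its size in bytes.
    Returns a tuple (batches, too_large).
    """
    too_large = [n for n, s in sizes_by_name.items() if s > MAX_FILE_BYTES]
    accepted = [n for n, s in sizes_by_name.items() if s <= MAX_FILE_BYTES]
    batches = [
        accepted[i:i + batch_size] for i in range(0, len(accepted), batch_size)
    ]
    return batches, too_large


def sizes_in_folder(folder):
    """Collect the size in bytes of every regular file under a local folder."""
    return {
        str(p): p.stat().st_size for p in Path(folder).rglob("*") if p.is_file()
    }
```

For example, `plan_batches(sizes_in_folder("my_dataset"))` returns the batches to transfer and the files that are too large. More than one batch suggests that DVUploader is the better route.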

When files are uploaded to a dataset, they are assigned:

  • a digital fingerprint enabling the integrity of the data (no corruption of the file) to be checked: UNF for tabulated files, MD5 for other formats (please see the footnotes);
  • a DOI.
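For non-tabular files, the MD5 fingerprint can be recomputed locally and compared with the value displayed by the repository. This is a minimal sketch using Python's standard `hashlib` module; the function name `md5_of_file` is hypothetical, and the sketch does not cover UNF, which is computed differently.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Reading in chunks keeps memory use low for large files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

If the local digest matches the fingerprint shown for the file, the file was not corrupted during transfer.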

For more information on depositing a large number of files and on dataset sizes, please refer to "Recommendations on large datasets".

Tabulated data files

The Dataverse software integrates xlsx (Excel), csv, tsv, R data, SPSS and Stata files as a tabulated .tab file (open format). The original format also remains available for download.

Note: Only tabular data files that are smaller than 500 MB are transformed into .tab files.

The file is analysed by the Dataverse software during the upload and the message “Chargement en cours” ("Loading in progress") is displayed:

Uploading a file

When the upload is complete, the message "The operation has succeeded! - The tabular files have been uploaded" is displayed and a message is sent to the depositor ("Your ingest has successfully finished!").

The numbers of variables and observations are displayed in the file metadata:

Number of variables and observations in a tabular file

It is strongly advised to verify that this information is correct.

If the file cannot be analysed by the Dataverse software, an error indicator is displayed, but the file is still imported in its original format.

Tabular file uploaded with an ingestion error

The Dataverse software will send an email entitled “Your ingest has finished with errors!” to the depositor. The type of error is not indicated.

Conditions for the effective ingestion of tabulated data

  • General recommendations
    • UTF-8 encoding for files containing special characters,
    • no empty headers or missing cells (see table below; empty cells are accepted),
    • each column header must have a different name,
    • if your file contains more than 1024 columns, it will be submitted but cannot be ingested,
    • no line break in a cell.
  • If the file is in Microsoft Excel format
    • each Excel file must contain only one tab/sheet, with the variables on the first line (column headers) and one observation per line.
      Warning! if there are several tabs, only the first one is ingested by the Dataverse software and will be taken into account in the display, exploration and export in tabulated format,
    • no merged cells,
    • no legend,
    • to help identify errors in an Excel file, one solution is to open the xlsx file with LibreOffice Calc and save it as a .csv file with UTF-8 encoding (see the procedure in Ingesting csv files).
  • If the file is in csv format
    • use the comma as a separator (the semicolon is not accepted by Dataverse software),
    • the decimal separator must be the full stop (otherwise commas will be understood as separators),
    • in text cells containing commas, the text must be enclosed in inverted commas (otherwise the commas will be understood as separators).

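Example of a csv file with an error (the second line has only two values for three columns):
ColA,ColB,ColC
1,3
4,5,6

Example of a csv file without errors (the empty cell is kept between two commas):
ColA,ColB,ColC
1, ,3
4,5,6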
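Several of the conditions listed above can be checked before upload. The sketch below is an illustration built on Python's standard `csv` module, not the check performed by the Dataverse software; the function names are hypothetical. It assumes a comma-separated file and does not detect a semicolon separator or a decimal comma.

```python
import csv
import io

MAX_COLUMNS = 1024  # files with more columns are stored but not ingested


def check_csv_text(text):
    """Return a list of problems found in comma-separated text."""
    problems = []
    rows = list(csv.reader(io.StringIO(text, newline="")))
    if not rows:
        return ["file is empty"]
    header = [name.strip() for name in rows[0]]
    if any(name == "" for name in header):
        problems.append("empty column header")
    if len(set(header)) != len(header):
        problems.append("duplicate column header")
    if len(header) > MAX_COLUMNS:
        problems.append("more than 1024 columns")
    # Rows are numbered from 1, the header being row 1.
    for number, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(
                f"row {number}: {len(row)} values, expected {len(header)}"
            )
        if any("\n" in cell or "\r" in cell for cell in row):
            problems.append(f"row {number}: line break inside a cell")
    return problems


def check_csv_file(path):
    """Check a file on disk; a decoding failure is reported as a problem."""
    try:
        with open(path, encoding="utf-8", newline="") as handle:
            return check_csv_text(handle.read())
    except UnicodeDecodeError:
        return ["file is not UTF-8 encoded"]
```

Applied to the two csv examples above, the first is reported with a missing value on row 2 and the second returns an empty list.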

Example of an Excel file with an error:

Excel file with error: variable missing in column D

Example of an Excel file without errors:

Excel file without errors

Also see: Broman, K. W., & Woo, K. H. (2018). Data Organization in Spreadsheets. The American Statistician, 72(1), 2–10. https://doi.org/10.1080/00031305.2017.1375989

 

Please see: Tabular Data File Ingest to find out more about the processing of tabulated data by the Dataverse software.

NB: the cheat sheet Ingesting csv files details the steps for converting a CSV file to UTF-8 encoding and setting the comma as the value separator using LibreOffice Calc.

Saving a dataset

Click on Save Changes.

The dataset will be given provisional unpublished status.

A DOI is reserved for the dataset; it will be activated when the dataset is published.

Completing the metadata

When a dataset is created, only a limited amount of metadata is visible and can be filled in. To complete and enrich the metadata description, the dataset must be edited after it has been saved for the first time.

Metadata can be edited on the dataset display page accessed via the menu Edit Dataset > Metadata

Modify the metadata of a dataset

or via the tab Metadata > Add + Edit Metadata.

Modify the metadata of a dataset via the Metadata tab

Please refer to the Guide to entering metadata to find out about the metadata that needs to be entered.

Indicate the terms of use for the dataset

The following can be specified in the terms of use:

  • the licence assigned to the dataset,
  • the conditions for access to restricted files,
  • the existence of a guestbook.

These conditions apply to all the dataset's files.

The terms and conditions of use can be accessed from the dataset's display page via the menu Edit Dataset > Terms.

Modify the terms of use of the dataset

or via the Terms > Edit Terms Requirements tab.

Terms tab, modifying the terms of use

Licences

Note: it is not possible to assign different licences to different files within the same dataset.

The Etalab Open Licence 2.0 is the default licence assigned to a dataset by the Dataverse software.

Unless otherwise stated, all content on this site is under licence etalab-2.0, source code is under license GNU GPL V3.
