Depositing a dataset
Creating a dataset
Please go to the collection you have identified (please see Identifying your depositing space in the Before depositing guide).
Click on Add Data > New Dataset

A collection may provide one or more templates in which some metadata, including the General Terms and Conditions and the licence, have been pre-filled. If a suitable template exists, select it when creating the dataset: a template cannot be applied or changed afterwards.

If the collection does not provide a template you can request one from the collection administrator using the Contact button.
Entering the first batch of metadata
Please enter the obligatory metadata, which is marked with a red asterisk, as well as the recommended metadata that is available when you create your dataset. To complete the remaining metadata, you will need to edit the dataset after saving it.
Please see the Guide to entering metadata.
Entering the recommended metadata helps comply with the following principles:
Adding the associated files to a dataset
One or more files can be associated with a dataset in the Recherche Data Gouv repository.
Each file is also assigned its own DOI, which is linked to the dataset's DOI. If the files have been deposited in another repository, give the link to them in the dedicated "Link to data" metadata.
All file types are accepted (tabular, text, pdf, image, video, audio, SHP, etc.). However, in the current context that favours data openness and reuse, it is strongly recommended to choose a format that is open or widely used and also machine readable.
Using open formats complies with the Interoperable principle, as such files can be read and modified using any software designed to process them (image, text, audio, etc.).
Please see the DoRANum resource "Open or Closed Format?"
Note: Files can still be added after the dataset has been saved or published.
Click on Select Files to Add or drag and drop the file(s).

Fill in the specific metadata for the file:
- File name: auto-filled, can be modified
- File Path if necessary
- Description
- Tags. There are three default labels: Data, Documentation and Code.
- Provenance
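For depositors who script their uploads rather than use the form, the same file metadata can be prepared as a small JSON document. The sketch below is only an illustration, not the repository's official procedure. It assumes the field names used by the Dataverse native API when adding a file (`description`, `directoryLabel` for the file path, `categories` for the tags). The function name `build_file_metadata` and the sample values are hypothetical, and Provenance is not covered.

```python
import json


def build_file_metadata(description, file_path=None, tags=None):
    """Return a JSON string describing one file's metadata."""
    metadata = {"description": description}
    if file_path:
        # "directoryLabel" is assumed to be the API name for the File Path field.
        metadata["directoryLabel"] = file_path
    if tags:
        # "categories" is assumed to be the API name for the Tags field.
        metadata["categories"] = list(tags)
    return json.dumps(metadata, ensure_ascii=False)


# Hypothetical sample values for illustration only.
payload = build_file_metadata(
    "Hourly temperature readings",
    file_path="data/raw",
    tags=["Data"],
)
print(payload)
```

Optional fields are left out of the JSON when they are not provided, which mirrors the form, where only the file name is filled in automatically.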

The media type (MIME type) of the file will be recognized even if the file has no extension. The Dataverse software may propose a preview of the file depending on its type.
The maximum size for each file uploaded is 50 GB.
It is recommended to upload no more than 200 files per transfer via the user interface. If you have more files than that, you must use the DVUploader tool and the Direct Upload Dataverse API.
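Before starting a transfer, the two limits above can be checked locally. The following is a minimal sketch under stated assumptions: "50 GB" is read as 50 × 10^9 bytes, and the function name `check_batch` and the folder name `my_dataset` are hypothetical.

```python
from pathlib import Path

MAX_FILE_BYTES = 50 * 10**9  # 50 GB per file (decimal gigabytes assumed)
MAX_BATCH_FILES = 200        # recommended ceiling for one transfer via the UI


def check_batch(paths):
    """Return a list of human-readable warnings for a planned upload batch."""
    warnings = []
    paths = [Path(p) for p in paths]
    if len(paths) > MAX_BATCH_FILES:
        warnings.append(
            f"{len(paths)} files: more than {MAX_BATCH_FILES}, "
            "consider DVUploader or the API"
        )
    for path in paths:
        size = path.stat().st_size
        if size > MAX_FILE_BYTES:
            warnings.append(f"{path.name}: {size} bytes exceeds the 50 GB limit")
    return warnings


if __name__ == "__main__":
    # Hypothetical folder name; replace with the folder you plan to deposit.
    folder = Path("my_dataset")
    if folder.is_dir():
        files = [p for p in folder.rglob("*") if p.is_file()]
        for warning in check_batch(files):
            print(warning)
```

An empty list of warnings means the batch fits within both limits as interpreted here.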
When files are uploaded to a dataset, they are assigned:
- a digital fingerprint enabling the integrity of the data (no corruption of the file) to be checked: UNF for tabulated files, MD5 for other formats (please see the footnotes);
- a DOI.
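To check integrity yourself, you can compute the MD5 fingerprint of your local copy and compare it with the value shown for the deposited file. This is a minimal sketch using Python's standard `hashlib` module; the function name `md5_of_file` is hypothetical. It applies to files that carry an MD5 fingerprint, not to tabulated files, which receive a UNF.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in fixed-size chunks so large files do not fill memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

If the digest computed locally matches the one displayed in the repository, the file was not altered during the transfer.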
For more information regarding large numbers of deposited files and dataset sizes, please refer to "Recommendations on large datasets".
Tabulated data files
The Dataverse software integrates xlsx (Excel), csv, tsv, R data, SPSS and Stata files as a tabulated .tab file (open format). The original format also remains available for download.
Note: Only tabular data files that are smaller than 500 MB are transformed into .tab files.
The file is analysed by the Dataverse software during the upload and the message “Chargement en cours” ("Loading in progress") is displayed:

When the upload is complete, the message "The operation has succeeded! - The tabular files have been uploaded" is displayed and a message is sent to the depositor ("Your ingest has successfully finished!").
The numbers of variables and observations are displayed in the file metadata:

It is strongly advised to verify that this information is correct.
If the file cannot be analysed by the Dataverse software, an error is displayed, but the file is still imported in its original format.

The Dataverse software will send an email entitled “Your ingest has finished with errors!” to the depositor. The type of error is not indicated.
Conditions for the effective ingestion of tabulated data
- General recommendations
  - UTF-8 encoding for files containing special characters,
  - no empty headers or missing cells (empty cells are accepted; see the examples below),
  - each column header must have a different name,
  - no more than 1024 columns: a file with more columns can be uploaded but cannot be ingested,
  - no line break in a cell.
- If the file is in Microsoft Excel format
  - each Excel file must contain only one tab/sheet, with the variables on the first line (column headers) and one observation per line. Warning: if there are several tabs, only the first one is ingested by the Dataverse software and taken into account in the display, exploration and export in tabulated format,
  - no merged cells,
  - no legend,
  - to help identify errors in an Excel file, one solution is to open the xlsx file with LibreOffice Calc and save it as a .csv file with UTF-8 encoding. See the procedure in the Ingesting csv files cheat sheet.
- If the file is in csv format
  - use the comma as a separator (the semicolon is not accepted by the Dataverse software),
  - the decimal separator must be the full stop (otherwise commas will be understood as separators),
  - in text cells containing commas, the text must be enclosed in double quotation marks (otherwise the commas will be understood as separators).
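If you generate the csv file with a script, a standard csv library can apply the separator and quoting rules for you. The example below is a minimal sketch using Python's built-in `csv` module; the column names and values are invented for illustration.

```python
import csv
import io

rows = [
    ["site", "temperature", "comment"],   # hypothetical column headers
    ["A1", 12.5, "cloudy, light wind"],   # text cell containing a comma
    ["B2", 9.75, "clear"],
]

buffer = io.StringIO()
# Comma as separator; cells containing a comma are quoted automatically.
writer = csv.writer(buffer, delimiter=",", quoting=csv.QUOTE_MINIMAL)
writer.writerows(rows)
print(buffer.getvalue())
```

The second record is written as `A1,12.5,"cloudy, light wind"`: the decimal separator is a full stop, and the comma inside the text is protected by the quotation marks. To write to a file instead of a buffer, open it with `open(path, "w", newline="", encoding="utf-8")` so that the output is UTF-8 encoded.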
[Figure: example of a csv file with an error]
[Figure: example of a csv file without errors]
[Figure: example of an Excel file with an error]
[Figure: example of an Excel file without errors]
Also see: Broman, K. W., & Woo, K. H. (2018). Data Organization in Spreadsheets. The American Statistician, 72(1), 2-10. https://doi.org/10.1080/00031305.2017.1375989
Please see: Tabular Data File Ingest to find out more about the processing of tabulated data by the Dataverse software.
NB: the cheat sheet Ingesting csv files details the steps for converting a csv file to UTF-8 encoding and setting the comma as the value separator using LibreOffice Calc.
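Several of the conditions listed in this section can be checked automatically before upload. The sketch below is an unofficial pre-check and does not reproduce the validation performed by the Dataverse software. The function name `check_csv` is hypothetical, and the semicolon test is a simple heuristic. A decimal comma is not detected directly, but it usually shows up as a record with too many cells.

```python
import csv
import io

MAX_COLUMNS = 1024  # column limit stated in the general recommendations


def check_csv(raw_bytes):
    """Return a list of problems found in the raw bytes of a csv file."""
    try:
        text = raw_bytes.decode("utf-8")
    except UnicodeDecodeError:
        return ["file is not valid UTF-8"]

    rows = list(csv.reader(io.StringIO(text, newline="")))
    if not rows:
        return ["file is empty"]

    problems = []
    header = rows[0]
    if len(header) == 1 and ";" in header[0]:
        problems.append("header looks semicolon-separated; use commas")
    if any(not name.strip() for name in header):
        problems.append("empty column header")
    if len(set(header)) != len(header):
        problems.append("duplicate column header")
    if len(header) > MAX_COLUMNS:
        problems.append(f"{len(header)} columns: more than {MAX_COLUMNS}")

    # Records are numbered from 2 because the header is record 1.
    for number, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(
                f"record {number}: {len(row)} cells, expected {len(header)}"
            )
        if any("\n" in cell or "\r" in cell for cell in row):
            problems.append(f"record {number}: line break inside a cell")
    return problems
```

A typical use is `check_csv(open("data.csv", "rb").read())`, where `data.csv` is a placeholder file name. An empty list means none of these particular problems was found, not that ingestion is guaranteed to succeed.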
Saving a dataset
Click on Save Changes.
The dataset will be given provisional unpublished status.
A DOI will be reserved for the dataset; it will be activated when the dataset is published.
Completing the metadata
When a dataset is created, only a limited amount of metadata is visible and can be filled in. To complete and enrich the metadata description, the dataset must be edited after it has first been saved.
Metadata can be edited on the dataset display page accessed via the menu Edit Dataset > Metadata

or via the tab Metadata > Add + Edit Metadata.

Please refer to the Guide to entering metadata to find out about the metadata that needs to be entered.
Indicating the terms of use for the dataset
The following can be specified in the terms of use:
- the licence assigned to the dataset,
- the conditions for access to restricted files,
- the existence of a guestbook.
These conditions apply to all the dataset's files.
The terms and conditions of use can be edited on the dataset's display page via the menu Edit Dataset > Terms

or via the Terms > Edit Terms Requirements tab.

Licences
Note: it is not possible to assign different licences to different files within the same dataset.
The Etalab Open Licence 2.0 is the default licence assigned to a dataset by the Dataverse software.
