recherche.data.gouv.fr

Recommendations on large datasets

Last updated: 27/05/2025

Introduction

This document sets out the best practices for managing large datasets that are deposited in the Recherche Data Gouv repository.
The values and guidelines below are recommendations based on experience; they do not reflect technical limitations of the tool.
If you have specific requirements that are not covered by this document, please contact the platform's resource centre at support-recherchedatagouv@inrae.fr.

General information on file uploads

There are three methods available for uploading files to the repository:

1. **Deposit Interface**: recommended for datasets smaller than 50 GB and containing fewer than 200 files. Note that the maximum size for a single file uploaded through the interface is 50 GB.
2. **DVUploader Application**: the best option for datasets with more than 200 files or larger than 50 GB. DVUploader also preserves the directory tree when the deposit contains multiple directories.
3. **S3-Direct-Upload API**: intended for users who are comfortable working with APIs. Other APIs are not recommended for file deposits.
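The selection rule above can be checked locally before starting a deposit. The sketch below is a minimal example, assuming that "GB" means decimal gigabytes (10^9 bytes) and that the dataset is simply every file under one local directory. The function names `dataset_stats` and `recommend_upload_method` are hypothetical and not part of any Recherche Data Gouv tool; the sketch does not call the platform and ignores the S3-Direct-Upload API option.

```python
import os

# Thresholds from the recommendations above.
# Assumption: "GB" means decimal gigabytes (10**9 bytes).
MAX_INTERFACE_BYTES = 50 * 10**9
MAX_INTERFACE_FILES = 200


def dataset_stats(root: str) -> tuple[int, int]:
    """Return (number of files, total size in bytes) for every file under root."""
    count = 0
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            count += 1
            total += os.path.getsize(os.path.join(dirpath, name))
    return count, total


def recommend_upload_method(root: str) -> str:
    """Hypothetical helper: suggest the deposit interface or DVUploader."""
    count, total = dataset_stats(root)
    # Mirrors the DVUploader rule: more than 200 files or over 50 GB.
    if count > MAX_INTERFACE_FILES or total > MAX_INTERFACE_BYTES:
        return "DVUploader"
    return "Deposit Interface"
```

For example, `recommend_upload_method("my_dataset")` returns `"DVUploader"` for a directory containing 250 small files, since the file count alone exceeds 200.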

Important considerations

DOI attribution

Each file you deposit is assigned its own Digital Object Identifier (DOI). Depositors are therefore responsible for organizing their files coherently within the dataset: not every file needs to be cited individually, so consider which elements of the dataset actually need their own DOI.

Datasets with multiple files

To preserve the file structure without using DVUploader, you can group files into compressed archives (.zip, .7z, or .tar archives compressed with .xz, .bz2 or .gz). Please note the following:
- Only the ZIP format lets you preview the tree structure and download individual files.
- We recommend keeping the following files outside any compressed archive:
    - Files that require a DOI (citation).
    - Files that improve the dataset's accessibility (such as README files, metadata files and illustrative images).
    - Files that should be previewed and/or ingested.
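For the ZIP route, the sketch below uses Python's standard `zipfile` module to archive a directory while preserving its tree structure, and leaves selected top-level files outside the archive so they can still be cited, previewed or ingested on their own. The patterns in `KEEP_OUTSIDE` and the function names are illustrative assumptions, not platform requirements; adapt them to your own dataset. The archive should be written outside the source directory so it does not include itself.

```python
import fnmatch
import os
import zipfile

# Hypothetical patterns for top-level files to keep outside the archive
# (README, metadata, illustrative images). Adjust to your dataset.
KEEP_OUTSIDE = ["README*", "*.json", "*.png"]


def keep_outside(name: str) -> bool:
    """True if a top-level file name matches one of the KEEP_OUTSIDE patterns."""
    return any(fnmatch.fnmatch(name, pattern) for pattern in KEEP_OUTSIDE)


def package_dataset(src_dir: str, zip_path: str) -> list[str]:
    """Zip src_dir with its directory tree; return the top-level files left outside.

    zip_path must be located outside src_dir.
    """
    outside = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for dirpath, _dirnames, filenames in os.walk(src_dir):
            for name in sorted(filenames):
                full = os.path.join(dirpath, name)
                rel = os.path.relpath(full, src_dir)
                # Only files at the top level of src_dir can stay outside.
                if dirpath == src_dir and keep_outside(name):
                    outside.append(rel)
                    continue
                # arcname stores the relative path, preserving the tree.
                zf.write(full, arcname=rel)
    return outside
```

The files returned by `package_dataset` are then uploaded alongside the ZIP archive rather than inside it.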

Datasets with large files (over 100 GB per dataset)

As noted above, we recommend using DVUploader to submit datasets larger than 50 GB. In addition, please follow these recommendations:

- Do not split large files to get around the 50 GB limit; upload them with DVUploader instead.
- Use an open or discipline-specific compression format.
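To illustrate compressing rather than splitting, the sketch below streams one large file through Python's standard `lzma` module into a single `.xz` file (an open format), reading fixed-size chunks so memory use stays bounded whatever the file size. The function name and chunk size are assumptions; command-line tools such as `xz` achieve the same result. Compression does not guarantee the output falls below 50 GB, so files that remain larger should still be uploaded with DVUploader.

```python
import lzma
import shutil
from typing import Optional

# Read buffer size: an arbitrary choice that keeps memory use bounded.
CHUNK_SIZE = 16 * 1024 * 1024  # 16 MiB


def compress_to_xz(src_path: str, dst_path: Optional[str] = None) -> str:
    """Compress src_path into one .xz file (no splitting) and return its path."""
    if dst_path is None:
        dst_path = src_path + ".xz"
    with open(src_path, "rb") as src, lzma.open(dst_path, "wb") as dst:
        # Stream the data chunk by chunk instead of loading it all in memory.
        shutil.copyfileobj(src, dst, CHUNK_SIZE)
    return dst_path
```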

For datasets in the terabyte (TB) range, please contact the platform's resource centre at support-recherchedatagouv@inrae.fr in advance.
For institutional datasets, make sure you stay within the limit set by your agreement (currently 5 TB). If you are approaching or exceeding this limit, please contact the platform's resource centre at support-recherchedatagouv@inrae.fr.
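To anticipate these thresholds, the sketch below compares a planned deposit with the terabyte range and the 5 TB institutional limit mentioned above. It assumes decimal units (1 TB = 10^12 bytes), reads "terabyte range" as 1 TB or more, and treats 80 % of the limit as "approaching"; the 80 % figure and the function name are hypothetical choices, not platform rules. Check your own agreement for the actual limit and how usage is counted.

```python
# Assumption: decimal units, 1 TB = 10**12 bytes.
TB = 10**12
INSTITUTIONAL_LIMIT = 5 * TB  # "currently 5 TB" per the recommendations
WARNING_RATIO = 0.8           # hypothetical "approaching the limit" threshold


def needs_support_contact(planned_bytes: int, already_used_bytes: int = 0) -> list[str]:
    """Return reasons to contact the resource centre before depositing."""
    reasons = []
    # Assumption: "terabyte range" is read as 1 TB or more.
    if planned_bytes >= TB:
        reasons.append("dataset in the terabyte range")
    projected = planned_bytes + already_used_bytes
    if projected >= WARNING_RATIO * INSTITUTIONAL_LIMIT:
        share = projected / INSTITUTIONAL_LIMIT
        reasons.append(f"projected usage {share:.0%} of the institutional limit")
    return reasons
```

An empty list means neither threshold is reached; any returned reason suggests writing to the resource centre before uploading.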

Ministère chargé de l'enseignement supérieur et de la recherche (French Ministry of Higher Education and Research)
Unless otherwise stated, all content on this site is licensed under etalab-2.0, and the source code is licensed under GNU GPL v3.