DV Uploader
The web interface of the Recherche Data Gouv repository can be limiting when a large number of files needs to be uploaded. While the APIs can address this, there is also a less technical, more user-friendly tool that uploads the content of local folders directly: DVUploader. This article guides you through the use of this tool.
DVUploader is a local tool, developed by the Dataverse community, for depositing files into Dataverse software (and therefore into the Recherche Data Gouv repository). Because it runs on the depositor's workstation, it makes it easy to send the entire content of a folder with its subdirectories, to bypass the limit of 1000 files per upload, and to update a dataset by depositing only the new files in a directory.
This tool requires installation and the use of the command line console, but it is very simple to use.
The commands to be entered in the console will be indicated with the following formatting:
Example of command line
Installation
Prerequisites
This tool requires the installation of Java (8 or higher).
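On Linux or macOS, the prerequisite can be checked from a bash terminal with a sketch like the one below. The function names `java_major_version` and `check_java` are illustrative and not part of DVUploader; the parsing assumes the usual `java -version` output, where Java 8 reports itself as `1.8.x` and later releases as `9`, `11`, `17`, and so on.

```shell
#!/usr/bin/env bash
# Extract the major Java version from a version string.
# "1.8.0_292" -> 8 (legacy numbering), "17.0.2" -> 17.
java_major_version() {
  local version="$1"
  local first="${version%%.*}"        # text before the first dot
  if [ "$first" = "1" ]; then
    local rest="${version#*.}"        # drop the leading "1."
    first="${rest%%.*}"               # second component is the real major version
  fi
  # Strip any non-digit suffix such as "-ea" ("21-ea" -> 21).
  printf '%s\n' "${first%%[!0-9]*}"
}

# Report whether the installed Java (if any) meets the "8 or higher" requirement.
check_java() {
  if ! command -v java >/dev/null 2>&1; then
    echo "Java not found: install Java 8 or higher"
    return 1
  fi
  # `java -version` prints to stderr; the version is the first quoted string.
  local raw major
  raw="$(java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)"
  major="$(java_major_version "$raw")"
  if [ -n "$major" ] && [ "$major" -ge 8 ]; then
    echo "Java $major found: OK"
  else
    echo "Java version '$raw' is too old or could not be parsed"
    return 1
  fi
}
```

Once the functions are defined, running `check_java` prints a one-line verdict.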
Download the tool
The latest version is available at https://github.com/GlobalDataverseCommunityConsortium/dataverse-uploader/releases/. Download the .jar file in the "Assets" section of the page.
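Direct download link for version 1.2.0: https://github.com/GlobalDataverseCommunityConsortium/dataverse-uploader/releases/download/v1.2.0/DVUploader-v1.2.0.jar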
Place this file in the relevant directory.
Note: Make sure you have access to this folder to be able to run it! If it is in "C:\Program Files", launch the console as administrator to be able to launch DVUploader.
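For scripted installations in bash, the download address and the access check can be sketched as follows. The function names are illustrative, and the URL pattern is an assumption: it is taken from the v1.2.0 release and other versions may not follow the same naming.

```shell
#!/usr/bin/env bash
# Build the download URL for a given DVUploader version.
# Assumption: every release follows the naming pattern of v1.2.0.
dvuploader_url() {
  local version="$1"
  printf 'https://github.com/GlobalDataverseCommunityConsortium/dataverse-uploader/releases/download/v%s/DVUploader-v%s.jar\n' "$version" "$version"
}

# Succeed only if the .jar exists and is readable by the current user.
jar_is_usable() {
  local jar="$1"
  [ -f "$jar" ] && [ -r "$jar" ]
}

# Usage (uncomment to download; requires network access and wget):
# wget "$(dvuploader_url 1.2.0)"
# jar_is_usable DVUploader-v1.2.0.jar && echo "ready" || echo "check the file location and permissions"
```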
Using DVUploader on Windows
The example below uses the Recherche Data Gouv sandbox. It is recommended to test the tool there first.
1. Open the terminal
Open the Command Prompt application (C:\WINDOWS\System32\cmd.exe).
Note: This can be done with the shortcut "Win + R", then typing "cmd" and pressing "Enter".
2. Go to the .jar directory
To keep the later commands short, move to the directory where DVUploader is stored (see Installation). In the following command, replace {DVUploader folder}, braces included, with the path of the folder containing the .jar file:
cd {DVUploader folder}
The path can be copied from the folder's properties: right-click the folder, choose "Properties", and copy the value shown after "Location:".
Note: If folder names contain spaces, the path must be enclosed in quotation marks.
3. Define the parameters as variables
Set the following variables as shown, replacing each {value} with the actual value (without braces):
- API Token
The API key, or token, associated with a user account can be found on the repository after login, under "API Token" in the menu opened by clicking the user's name:
Note: this token is valid for one year. A new one can be generated from the same page.
set API_KEY={API key}
Example:
set API_KEY="1212-1212-1212-1212-1212-1212-1212"
- Portal address
For the sandbox:
set SERVER="https://demo.recherche.data.gouv.fr"
Use the address https://entrepot.recherche.data.gouv.fr for production.
- Dataset DOI
Before depositing data, create a dataset, then copy its DOI (e.g. 10.70112/RDGTI6). In the command, replace {DOI} with a value of the form 10.xxx/xxxx:
set DOI=doi:{DOI}
Note: Other variable names can be used, provided the same names are used in the deposit command line.
4. Upload a file
- Path to the file
There is one last variable to configure. It is different in cases 4 and 4bis:
set PATH_TO_DATA={path to data}
where {path to data} indicates the file to deposit. This path can be absolute or relative.
Example (relative path):
set PATH_TO_DATA=Documents\MyFile.csv
Note: If the path contains spaces, it must be enclosed in quotation marks:
set PATH_TO_DATA="Documents\Depot File.csv"
- Deposit
java -jar DVUploader-v1.2.0.jar -key=%API_KEY% -did=%DOI% -server=%SERVER% %PATH_TO_DATA%
The %VARIABLES% are those created previously.
Note:
- It is also possible to put the values directly in the command line instead of using variables. This is not recommended for the API key, for which a variable should always be used.
- This example uses DVUploader-v1.2.0.jar. If another version is used, the .jar file name in the command must be changed accordingly.
4 bis. Uploading files in a tree structure
- Path to the folder
It is also possible to upload the content of a folder or a directory tree. In that case, the path is the location of the folder:
set PATH_TO_DATA={path to data}
Note: the same remarks as for files apply. In addition, if quotation marks are used, do not end the path with a backslash immediately before the closing quotation mark (\").
- Deposit
The command to be executed is the same as for a single file, but in order to include sub-folders, add the -recurse parameter.
java -jar DVUploader-v1.2.0.jar -key=%API_KEY% -did=%DOI% -server=%SERVER% %PATH_TO_DATA% -recurse
5. Example
set API_KEY="1212-1212-1212-1212-1212-1212-1212"
set SERVER="https://demo.recherche.data.gouv.fr"
set DOI=doi:10.70112/RDGTI6
set PATH_TO_DATA="C:\Users\Nom\Documents\Data"
java -jar DVUploader-v1.2.0.jar -key=%API_KEY% -did=%DOI% -server=%SERVER% %PATH_TO_DATA%
6. Additional parameters
Other optional parameters, such as "recurse", can be used. The full set is available in the tool documentation. Only the parameters considered most useful are presented below.
- limit
Allows you to limit the number of files per upload.
This is useful for testing, when the whole folder does not need to be uploaded, or for a large number of files, which can be uploaded gradually in chunks as part of an automated workflow.
Example:
java -jar DVUploader-v1.2.0.jar -key=%API_KEY% -did=%DOI% -server=%SERVER% %PATH_TO_DATA% -limit=1
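A chunked upload can be planned with a short bash sketch such as the one below (bash variable syntax, as in the Linux section). It assumes, as stated in the introduction, that rerunning the tool on the same folder only deposits files that are not yet in the dataset, so each run with -limit picks up the next batch. The function names are illustrative, and the commands are only printed, not executed.

```shell
#!/usr/bin/env bash
# Number of runs needed to upload TOTAL files with at most LIMIT files per run.
runs_needed() {
  local total="$1" limit="$2"
  echo $(( (total + limit - 1) / limit ))   # integer ceiling of total / limit
}

# Print (dry run) one command per chunk; nothing is executed.
# API_KEY, DOI, SERVER and PATH_TO_DATA are expected to be set beforehand.
print_chunked_upload() {
  local total="$1" limit="$2" jar="${3:-DVUploader-v1.2.0.jar}"
  local runs i
  runs="$(runs_needed "$total" "$limit")"
  i=1
  while [ "$i" -le "$runs" ]; do
    echo "run $i/$runs: java -jar $jar -key=\$API_KEY -did=\$DOI -server=\$SERVER \$PATH_TO_DATA -limit=$limit"
    i=$(( i + 1 ))
  done
}
```

For instance, `print_chunked_upload 25 10` lists three runs, the last of which would upload the remaining five files.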
- ex
Allows you to exclude files from being uploaded.
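This parameter excludes files whose names match a given regular expression.
For example, to exclude .txt files:
java -jar DVUploader-v1.2.0.jar -key=%API_KEY% -did=%DOI% -server=%SERVER% %PATH_TO_DATA% -ex=.*\.txt$
Note: This parameter can be repeated to apply several filters.
Note: The patterns follow regular expression syntax, not file wildcard syntax: "any string" is written .* rather than *, and a literal dot is written \. (a pattern beginning with a bare * is not a valid regular expression). In the Windows Command Prompt, ^ is an escape character, so enclose any pattern containing it in quotation marks.
Examples of pattern elements:
.* : any string (Example: .*\.txt matches "file.txt").
. : a single character (Example: fast. matches "fasta" and "fastq").
^ : start of the string (Example: ^bck.* matches "bck20200617").
$ : end of the string (Example: .*txt$ matches "file.txt" but not "datafromtxt.csv").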
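Because -ex takes a regular expression, it can help to try a pattern against a few file names before running an upload. The sketch below does this in bash with grep -E. The function names are illustrative, and grep's extended regular expressions are only assumed to agree with the tool's for simple, fully anchored patterns such as these.

```shell
#!/usr/bin/env bash
# Succeed if NAME matches the regular expression PATTERN.
# grep -E uses POSIX extended regular expressions, which agree with Java-style
# regular expressions for simple patterns but not for every construct.
name_matches() {
  local name="$1" pattern="$2"
  printf '%s\n' "$name" | grep -Eq -- "$pattern"
}

# Show, for each file name given, whether the pattern would exclude it.
preview_exclusions() {
  local pattern="$1"
  shift
  local name
  for name in "$@"; do
    if name_matches "$name" "$pattern"; then
      echo "excluded: $name"
    else
      echo "kept:     $name"
    fi
  done
}

# Usage:
# preview_exclusions '^.*\.txt$' file.txt datafromtxt.csv
```

Anchoring the pattern with both ^ and $ keeps the preview meaningful whether the tool matches the whole file name or only part of it.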
Using DVUploader on Linux
The procedure is the same as on Windows; this section only presents the command lines in bash.
Note: the DVUploader version number appears in the commands. Modify it if a version other than 1.2.0 is used.
1. Define parameters as variables (Recommended)
The various parameters of the command can be directly added to the command, but the use of variables makes it easier to read and facilitates iterative executions if necessary.
export API_KEY={API key}
(found on https://demo.recherche.data.gouv.fr/dataverseuser.xhtml?selectTab=apiTokenTab for the Recherche Data Gouv sandbox)
export SERVER="https://demo.recherche.data.gouv.fr"
(or https://entrepot.recherche.data.gouv.fr in production)
export DOI={DOI of the dataset in which to deposit the files}
(in the format doi:10.70112/XXXX)
export PATH_TO_DATA={path to data}
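Two common slips at this step are forgetting the "doi:" prefix and leaving a variable unset. A minimal bash sketch for catching both is given below; the function names `normalize_doi` and `missing_variables` are illustrative and not part of DVUploader.

```shell
#!/usr/bin/env bash
# Add the "doi:" prefix used in the -did value if it is missing.
normalize_doi() {
  case "$1" in
    doi:*) printf '%s\n' "$1" ;;
    *)     printf 'doi:%s\n' "$1" ;;
  esac
}

# Print the name of each of the four variables that is unset or empty.
missing_variables() {
  local name value
  for name in API_KEY SERVER DOI PATH_TO_DATA; do
    eval "value=\${$name:-}"          # indirect lookup of the variable's value
    [ -n "$value" ] || echo "$name"
  done
}

# Usage:
# export DOI="$(normalize_doi 10.70112/RDGTI6)"
# missing_variables    # prints nothing when everything is set
```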
2. Upload a file
If the specified path points to a file:
java -jar DVUploader-v1.2.0.jar -key=$API_KEY -did=$DOI -server=$SERVER $PATH_TO_DATA
2 bis. Upload files in a directory tree
If the specified path points to a folder, the files in that folder will be uploaded. It is also possible to upload the content of a whole directory tree using the -recurse option.
java -jar DVUploader-v1.2.0.jar -key=$API_KEY -did=$DOI -server=$SERVER $PATH_TO_DATA -recurse
3. Example
mkdir ~/DVUploader
cd ~/DVUploader
wget https://github.com/GlobalDataverseCommunityConsortium/dataverse-uploader/releases/download/v1.2.0/DVUploader-v1.2.0.jar
export API_KEY="1212-1212-1212-1212-1212-1212-1212"
export SERVER="https://demo.recherche.data.gouv.fr"
export DOI=doi:10.70112/RDGTI6
export PATH_TO_DATA="../data/"
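With these variables exported, the upload itself is the command from step 2 or 2 bis. A minimal sketch of a wrapper that assembles it is shown below; the function name `upload` and the optional `JAR` variable are hypothetical conveniences, not part of DVUploader. By default the command is only printed, with the API key masked, so that it can be checked before anything is sent.

```shell
#!/usr/bin/env bash
# Build the DVUploader command from the exported variables.
# Usage: upload [recurse] [run]
#   recurse  include sub-folders (adds -recurse)
#   run      actually execute; without it the command is only printed
upload() {
  local recurse="" execute="" arg
  for arg in "$@"; do
    case "$arg" in
      recurse) recurse="yes" ;;
      run)     execute="yes" ;;
    esac
  done

  # Reuse the positional parameters as an argument list so that paths
  # containing spaces stay intact.
  set -- java -jar "${JAR:-DVUploader-v1.2.0.jar}" \
    "-key=$API_KEY" "-did=$DOI" "-server=$SERVER" "$PATH_TO_DATA"
  if [ -n "$recurse" ]; then
    set -- "$@" -recurse
  fi

  if [ -n "$execute" ]; then
    "$@"
  else
    # Dry run: print the command with the API key masked.
    printf '%s\n' "$*" | sed 's|-key=[^ ]*|-key=****|'
  fi
}
```

Running `upload recurse` shows the command for the whole tree under ../data/, and `upload recurse run` executes it.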
4. Additional parameters
See section "6. Additional parameters" in the Windows part, or the full set in the tool documentation.