European Genome-phenome Archive (EGA)

Genomics

Accepted data

Health and medical data, Phenotypes, DNA microarray

Supporting institutions

European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI)

Centre for Genomic Regulation (CRG)

Persistent identifiers

DOI

Volume limit

Volume limit per file: unknown

Volume limit per dataset: data are submitted through a "submission box" limited to 8 TB.
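
As a rough pre-submission check against the 8 TB submission-box limit, the total size of a local dataset directory can be estimated as sketched below. This is only an illustration, not an EGA tool: the function names are hypothetical, and reading "8 TB" as decimal terabytes (8 × 10^12 bytes) is an assumption, since the repository does not state which unit convention applies. Encryption may also change file sizes slightly, so the result should be treated as an estimate.

```python
from pathlib import Path

# Assumption: "8 TB" is interpreted as decimal terabytes (8 * 10**12 bytes).
SUBMISSION_BOX_LIMIT_BYTES = 8 * 10**12


def dataset_size_bytes(root: Path) -> int:
    """Sum the sizes of all regular files under root, recursively."""
    return sum(p.stat().st_size for p in root.rglob("*") if p.is_file())


def fits_in_submission_box(
    root: Path, limit: int = SUBMISSION_BOX_LIMIT_BYTES
) -> bool:
    """Return True if the total size under root does not exceed limit."""
    return dataset_size_bytes(root) <= limit
```

For example, `fits_in_submission_box(Path("my_dataset"))` (with `my_dataset` a placeholder directory name) returns `False` when the files to be submitted together exceed the assumed limit, which suggests the submission would need to be split.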

Repository provided moderation

Automatic and human. Every submission is subject to a documented quality control (File Quality Control Report). Average time between submission and publication: 1 month.

Notes

Sensitive data from biomedical research. Genetic sequences (generic or specific formats). DNA microarrays: from raw signal files to arrays. Phenotypes (all formats).

The repository is recommended for potentially re-identifiable data that require access control. All submissions must be encrypted with the Crypt4GH tool. For other types of genetic data, other repositories are preferable. See the submission FAQ: https://ega-archive.org/submission/metadata/submission/FAQ/

The repository recommends using controlled vocabularies (Experimental Factor Ontology, EFO) to describe phenotype data.

The metadata schema is based on XML, JSON and the ENA repository: https://ega-archive.org/submission/metadata/ega-schema/

Embargo: 1 year, with the possibility of an extension upon justification. Only metadata are made public. Access to the data requires an authorization request that is strictly reviewed and must be approved by the Data Access Committee designated for each dataset.