Research Data Management: Storing Research Data. Data Directories and Repositories

A guide to define and explore Research Data Management

Data Directories

Open Access Directory of Data Repositories - This is a list of repositories and databases for open data for a wide range of subject areas. 

General Directories

Figshare - A provider of open research repository infrastructure and software that helps organizations and researchers share, showcase and manage their research outputs in a discoverable, citable, reportable and transparent way. It is a repository for sharing all types of research output in any subject, including papers, figures, posters and slides. Figshare supports organizations and researchers in meeting the growing demand for research to become more open, FAIR and connected, and offers the flexibility and control to build custom research management workflows.

Amazon Web Services Public Data Sets * - This registry exists to help people discover and share datasets that are available via AWS resources. It hosts a variety of large public datasets, such as Landsat, census, and genomic data. Creating an account may be required and charges may apply for computing time and data transfer. 

See all usage examples for datasets listed in this registry.

See datasets from Allen Institute for Artificial Intelligence (AI2), Digital Earth Africa, Data for Good at Meta, NASA Space Act Agreement, NIH STRIDES, NOAA Open Data Dissemination Program, Space Telescope Science Institute, and Amazon Sustainability Data Initiative.

Discipline-related Repositories

  • HUMANITIES

Linguistics

OLAC – Open Language Archives Community is an international partnership of institutions and individuals who are creating a worldwide virtual library of language resources by: (i) developing consensus on best current practice for the digital archiving of language resources, and (ii) developing a network of interoperating repositories and services for housing and accessing such resources. It currently has 58 participating archives.

 

TROLLing – Tromsø Repository of Language and Linguistics - An open access repository of linguistic data and statistical code.

 

Music

Mutopia Project - Free sheet music that can be downloaded, modified, printed, copied, distributed, performed, and recorded. All pieces are in the Public Domain or under Creative Commons licenses, and are available in PDF, MIDI, and editable LilyPond file formats.

  • SCIENCES

Biology/Life Sciences

DRYAD - General purpose repository for data underlying scientific and medical publications, historically with a concentration in life sciences. 

 

Gene Expression Atlas - Information on gene expression patterns under different biological conditions, such as different cell types, organism parts, or diseases.

 

genenames.org (HUGO Gene Nomenclature Committee) - Curated repository of HGNC approved gene names and symbols, gene families, and links to related genomic, proteomic, and phenotypic information. 

 

NCBI (National Center for Biotechnology Information) - Advances science and health by providing access to biomedical and genomic information from a variety of sources, including:

  • Conserved Domain Database (CDD) - Sequence alignments and profiles representing protein domains conserved in molecular evolution.
  • Gene - Gene data from a variety of species with related information, such as nomenclature, chromosome location, phenotypes, etc.
  • Database of Genotypes and Phenotypes (dbGaP) - Data and results from investigations of the interaction of genotypes and phenotypes in humans.
  • WormBase - Data on the genetics, genomics, and biology of C. elegans and some related nematodes.

 

UniProt (The Universal Protein Resource) - Collection of databases that provide a comprehensive source for protein sequence and annotation data, including a repository for metagenomics and environmental data. 

 

Chemistry

eCrystals - Mostly open access archive from the University of Southampton and the EPSRC UK National Crystallography Service. Each entry contains all the fundamental and derived data resulting from a single crystal X-ray structure determination, excluding the raw images.

 

PubChem - Database of chemical substances with descriptive and property information along with bioactivity screening data. PubChem mostly contains small molecules, but also larger molecules such as nucleotides, carbohydrates, lipids, peptides, and chemically-modified macromolecules. It covers chemical structures, identifiers, chemical and physical properties, biological activities, patents, health, safety and toxicity data, and more.

 

Zinc15 - Free database of commercially available compounds with 3-D structure representations in a format ready for virtual screening for potential biological activity. ZINC contains over 230 million purchasable compounds in ready-to-dock, 3D formats, and over 750 million purchasable compounds that can be searched for analogs in under a minute.

 

Traditional Journals for Science Data

These traditional "data journals" publish only articles that focus on presenting data, either experimental or computational, or may review experimental methods.

Journal of Physical and Chemical Reference Data - Publishes articles reporting critically evaluated reference data and property measurements.

Journal of Chemical and Engineering Data - Publishes both experimental and computational data.

 

Data Journals or "Data Paper" Journals

These newer style "data journals" primarily publish articles that describe publicly available datasets and link to those datasets. They may also publish articles on data-related topics, such as describing or reviewing certain analytical or statistical methods. However, traditional research articles that actually analyze the data and draw conclusions from that analysis are generally outside the scope of these journals.

Biodiversity Data Journal - Community peer-reviewed and open-access. Promotes the publishing, dissemination and sharing of biodiversity-related data of any kind. Publishes data papers, general articles, software descriptions, species inventories, and more.

Earth System Science Data - An international interdisciplinary journal that provides a distinctive model for publishing papers about original research data sets and encouraging the reuse of high quality data. Includes methods and review articles and a "living data" process for handling datasets that undergo regular updating or extension.

IUCrData - Open-access and peer-reviewed. Provides descriptions of crystallographic datasets and datasets from related disciplines.

Scientific Data - Open-access and peer-reviewed. Its Data Descriptor articles describe data sets, the methods of data collection, and analyses relating to the quality of the data. They also link to one or more published sources of the data.

 

Mixed Journals

These journals publish a mixture of article types, including "data papers" that describe datasets along with traditional research articles and other formats.

International Journal of Robotics Research - Publishes peer-reviewed data papers and multimedia extensions in addition to articles.

Internet Archaeology - Open access and peer-reviewed. Publishes data papers as well as research articles, methodologies, reviews and more.

Nucleic Acids Research - For more than 20 years, has published a special issue each January reporting on databases related to bioinformatics generally, including nucleic acids, proteins, and genomics.

 

A Growing List of Data Journals  (from Data@MLibrary)

Open Data Journals (from the FOSTER project)

  • SOCIAL SCIENCES

Economics

GTAP Database – Global Trade Analysis Project - The centerpiece of the Global Trade Analysis Project is a global database describing bilateral trade patterns, production, consumption and intermediate use of commodities and services.

 

GeoFRED® - Geographical Economic Data - Maps of data contained in FRED®. Create customized maps and download data. 

 

 

Peer Review. Data sets

Sources of Dataset Peer Review  (from the Edinburgh DataShare Wiki)

Survey Data

Survey data, including data from long-running surveys, series and longitudinal studies, are a major part of social science research. This section provides guidance and training resources around using survey and longitudinal data including short videos, on-demand webinars, event materials and more detailed written guides.

What is Stata? (PDF). 

Stata is a statistical software package for data analysis. You can use Stata by pointing and clicking, or by using the command syntax.  The software can support complex analysis, and, as it is so programmable, developers and users continue to add new features.

 

SPSS

SPSS is a software package for Windows that can be used to produce graphics of data as well as other data analysis. It offers:

  1. advanced statistical analysis
  2. a vast library of machine learning algorithms
  3. text analysis
  4. open-source extensibility
  5. integration with big data
  6. seamless deployment into applications.

Its ease of use, flexibility and scalability make SPSS accessible to users of all skill levels. It is suitable for projects of all sizes and levels of complexity, and can help users find new opportunities, improve efficiency and minimize risk.

 

 

Nesstar enables you to search, browse, visualise, analyse and download a selected range of different kinds of social and economic data, from survey data to multidimensional tables.  

UK Data Service

 

  • CLOSER (Cohort & Longitudinal Studies Enhancement Resources) is the interdisciplinary partnership of leading social and biomedical longitudinal population studies. It aims to maximise the use, value, and impact of longitudinal studies. Part of its work includes training and capacity building for researchers: it has a Learning Hub with information and resources on longitudinal data, and it runs training events.

 

  • P|E|A|S (Practical Exemplars and Survey Analysis) provides useful information and examples for researchers analysing data from complex samples. It also includes sections on survey non-response.

  • NCRM provide methodological training and resources to help people interested in social science research methods.