This repository has been archived by the owner on Sep 1, 2021. It is now read-only.

Related Projects

Simon Li edited this page May 30, 2014 · 6 revisions

Several projects have attempted to define standards for exchanging tabular information. Some of these may be useful in defining the OMERO.features API, or could potentially serve as a backend storage mechanism.

Dat

dat is an open source tool that enables the sharing of large datasets, the goal being a collaboration flow similar to what git offers for source code.

  • This is a very recent project started by Max Ogden, GitHub repo
  • What is dat
  • git for open-data
  • Manages data synchronisation and collaborative modifications to data
  • Aims to support billions of rows and real-time access

Some of the GitHub issues look really relevant, such as the one on a common data structure for tabular data.

Data

data - package manager for datasets

Contains some interesting ideas, could be relevant for sharing data across multiple OMERO servers.

OPeNDAP

OPeNDAP provides software which makes local data accessible to remote locations regardless of the local storage format.

NetCDF

NetCDF is a set of software libraries and self-describing, machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data.

  • Variables: a multi-dimensional array, has a name, type and shape
  • Attributes: has a name, type, length and value (can be an array); can be global (per dataset) or per variable
  • Supports user-defined types including compound types
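
The variable and attribute model described above can be sketched in plain Python. This is only an illustration of the concepts, not the NetCDF library API: the `Variable` and `Dataset` classes, the `features` variable and all attribute names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple


@dataclass
class Variable:
    """A named, typed, multi-dimensional array with per-variable attributes."""
    name: str
    dtype: str
    shape: Tuple[int, ...]
    attributes: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Dataset:
    """A set of variables plus global (per-dataset) attributes."""
    attributes: Dict[str, Any] = field(default_factory=dict)
    variables: Dict[str, Variable] = field(default_factory=dict)

    def add_variable(self, var: Variable) -> None:
        # Variables are looked up by name, so the name is used as the key.
        self.variables[var.name] = var


# Hypothetical example: a feature matrix of 100 images x 16 features.
ds = Dataset(attributes={"title": "example feature set"})
ds.add_variable(
    Variable("features", "float64", (100, 16), {"units": "arbitrary"})
)
```

A feature table could map onto this model as one variable per feature set, with per-variable attributes holding column-level metadata.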

data.okfn

We're creating lightweight standards and tooling to make it effortless to share and get data

An Open Knowledge Foundation project that publishes useful datasets in easy-to-use formats and attempts to define simple interchange formats.

  • Data Packages
  • Tabular Data Packages
  • For example, each dataset could consist of a JSON file describing the dataset, a CSV file containing the data, and a JSON file providing additional metadata for each CSV column
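
The three-part layout in the last bullet (dataset description, CSV data, per-column metadata) could look roughly like the sketch below. It uses only the Python standard library and builds everything in memory. The key names (`name`, `data`, `columns`, `type`) and the example columns are assumptions made for illustration and are not taken from the Data Package specifications, which should be consulted for the real layout.

```python
import csv
import io
import json

# Hypothetical feature table; the column names are illustrative only.
rows = [
    {"image_id": 1, "mean_intensity": 0.25},
    {"image_id": 2, "mean_intensity": 0.5},
]
fieldnames = ["image_id", "mean_intensity"]

# Part 1: the CSV file containing the data.
csv_buffer = io.StringIO()
writer = csv.DictWriter(csv_buffer, fieldnames=fieldnames, lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
csv_text = csv_buffer.getvalue()

# Part 2: a JSON document describing the dataset as a whole.
# (Assumed keys, not the official descriptor format.)
dataset_info = {"name": "example-features", "data": "features.csv"}
dataset_json = json.dumps(dataset_info, indent=2)

# Part 3: a JSON document with additional metadata for each CSV column.
column_info = {
    "columns": [
        {"name": "image_id", "type": "integer"},
        {"name": "mean_intensity", "type": "number"},
    ]
}
column_json = json.dumps(column_info, indent=2)
```

Because every part is plain CSV or JSON, existing tools can read the data without knowing anything about the packaging, which matches the stated goal of keeping the format simple.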

Key ideas include keeping the format simple so that everyone can use it without rewriting their tools.

Resource Description Framework

RDF is a standard model for data interchange on the Web

Defines the structure of data in terms of triples: Subject - Predicate - Object.
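
As a minimal sketch of the triple idea, the snippet below stores Subject - Predicate - Object statements as plain tuples and filters them by pattern. It does not use any RDF library, and the `Triple` type, the `match` helper and all identifiers such as `image:1` are hypothetical names chosen for illustration.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical statements; the identifiers are made up for illustration.
triples = [
    Triple("image:1", "hasFeature", "feature:mean_intensity"),
    Triple("image:1", "acquiredBy", "user:alice"),
    Triple("image:2", "hasFeature", "feature:mean_intensity"),
]


def match(
    store: List[Triple],
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
) -> List[Triple]:
    """Return the triples matching the given parts; None acts as a wildcard."""
    return [
        t
        for t in store
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# All features recorded for image:1.
features_of_image_1 = match(triples, subject="image:1", predicate="hasFeature")
```

Any tabular row can be expressed this way as one triple per cell (row identifier, column name, value), at the cost of a much more verbose representation than CSV.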