• Discussion paper
  • 23 November 2016

How can Data Catalog Vocabulary (DCAT) be used to address the needs of databases?


Beata Lisowska

The Development Data Hub is one of many visualisation tools available on the web that aim to make data more accessible, easier to disaggregate and more comparable in an intuitive way. As more such tools become available, and as the World Wide Web Consortium (W3C) argues that data published on the web should always be coupled with metadata, this paper tests how easy it is to use one of the most widely used metadata standards, Data Catalog Vocabulary (DCAT), for that purpose.

DCAT is a well-documented, flexible and practical metadata standard grounded in the solid foundations of Dublin Core. It is an elegant standard for datasets published by a single source; however, it becomes more complicated when applied to the Development Data Hub or its underlying Data Warehouse.

This paper aims to find a practical approach to applying the DCAT standard to satisfy the needs of both a portal that provides dynamic visualisations and a database that provides the data to drive them. As the paper shows, this is a complex task: it appears that a single instance of DCAT cannot handle the complexity of the data journey from its source to the final visual representation.

Why do we need DCAT to handle this problem? The transitions from data source to data warehouse to data series, through datamart and finally to dataset, cannot be comprehensible only from a human point of view. The logic needs to be encoded in a machine-readable way so that machines can point a data user back to the original source of the transformed data and make related datasets discoverable and searchable. This is at its heart a joined-up (meta)data standards problem.
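To make the idea of machine-readable lineage concrete, the following is a minimal sketch rather than the approach this paper arrives at. It assumes that each stage of the data journey is described as its own dcat:Dataset and that stages are linked with the Dublin Core property dct:source. The identifiers (ex:source, ex:warehouse-series, ex:datamart-dataset), the titles and the helper function names are hypothetical and chosen only for illustration.

```python
import json

# Namespace prefixes for DCAT and Dublin Core terms; "ex" is a placeholder.
CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
    "ex": "http://example.org/",
}

# Hypothetical records, one per stage of the data journey.
# Assumption: dct:source points from a derived record to the one it came from.
RECORDS = {
    "ex:source": {
        "@type": "dcat:Dataset",
        "dct:title": "Original source data",
    },
    "ex:warehouse-series": {
        "@type": "dcat:Dataset",
        "dct:title": "Data series held in the warehouse",
        "dct:source": "ex:source",
    },
    "ex:datamart-dataset": {
        "@type": "dcat:Dataset",
        "dct:title": "Dataset behind a visualisation",
        "dct:source": "ex:warehouse-series",
    },
}


def trace_to_source(record_id, records):
    """Follow dct:source links from a record back to its original source."""
    path = [record_id]
    seen = {record_id}
    while "dct:source" in records[path[-1]]:
        parent = records[path[-1]]["dct:source"]
        if parent in seen:
            raise ValueError(f"circular dct:source link at {parent}")
        path.append(parent)
        seen.add(parent)
    return path


def to_jsonld(records):
    """Serialise the records as a JSON-LD document with a shared context."""
    graph = []
    for record_id, body in records.items():
        node = {"@id": record_id}
        for key, value in body.items():
            # Links to other records become JSON-LD node references.
            node[key] = {"@id": value} if key == "dct:source" else value
        graph.append(node)
    return {"@context": CONTEXT, "@graph": graph}


if __name__ == "__main__":
    print(" -> ".join(trace_to_source("ex:datamart-dataset", RECORDS)))
    print(json.dumps(to_jsonld(RECORDS), indent=2))
```

With links of this kind in place, a machine can walk from the dataset behind a visualisation back to the original source without human interpretation. Whether a simple chain like this is expressive enough for a real data warehouse is exactly the question the paper examines.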

This discussion paper was written as a part of the Joined-up Data Standards project, a joint initiative between Development Initiatives and Publish What You Fund.