However, although more data is being published in open formats, data scientists, journalists and analysts still face the daunting and time-consuming task of not only finding relevant data and discovering new datasets but, most importantly, understanding that data before any analysis can be done. The information needed to do so should be found in the metadata that accompanies the published data.
Metadata is, in essence, structured information that makes it easier to retrieve, use or manage an information resource. In practice, metadata describes a dataset and its structure, and helps users discover it. It usually includes basic elements such as the title, who published the dataset, when it was published, how often it is updated and what license is associated with it. These are classed as ‘descriptive metadata’, as opposed to ‘structural metadata’, which describes, for example, page layout or an object’s components and their relationships (such as chapters or tables in a book).
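To make the descriptive elements listed above concrete, the following minimal sketch represents them as a single record in Python. The class name, field names and sample values are illustrative assumptions; they are not taken from any particular metadata standard or portal.

```python
from dataclasses import dataclass, asdict
from datetime import date


@dataclass
class DescriptiveMetadata:
    """Hypothetical record holding the basic descriptive elements of a dataset."""
    title: str
    publisher: str          # who published the dataset
    published: date         # when it was published
    update_frequency: str   # how often it is updated
    license: str            # license associated with the dataset


# Illustrative values only; this is not a real dataset.
record = DescriptiveMetadata(
    title="Example air quality readings",
    publisher="Example City Council",
    published=date(2015, 1, 1),
    update_frequency="monthly",
    license="CC-BY-4.0",
)

# A plain dictionary form, convenient for serialising or comparing records.
metadata_dict = asdict(record)
```

Structural metadata would need a different shape, for instance a nested description of the columns in a table, which this sketch deliberately leaves out.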
This paper investigates how open data portals share their metadata and explores the most prevalent underlying metadata standards used. It seeks to understand to what extent the metadata standards used by the predominant open data platforms are interoperable. Interoperable metadata enables datasets to be discovered, searched and reused across portals rather than ‘siloed’ within them; searching across portals in this way is called federated search.
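As a rough illustration of why interoperable metadata matters for searching across portals, the sketch below maps the native field names of two portals onto a shared set of keys and then runs one query over both catalogues. The portal names, field names, mappings and records are all hypothetical assumptions made for this example; real portals and standards differ in their details.

```python
# Hypothetical field mappings: each portal's native key -> a shared key.
FIELD_MAPS = {
    "portal_a": {"title": "title", "organization": "publisher", "license_id": "license"},
    "portal_b": {"name": "title", "attribution": "publisher", "licence": "license"},
}


def normalize(portal, raw):
    """Rename a portal's native metadata keys to the shared keys."""
    mapping = FIELD_MAPS[portal]
    record = {shared: raw[native] for native, shared in mapping.items() if native in raw}
    record["source"] = portal  # remember which portal the record came from
    return record


def federated_search(catalogues, term):
    """Return normalized records from every portal whose title contains term."""
    term = term.lower()
    hits = []
    for portal, records in catalogues.items():
        for raw in records:
            record = normalize(portal, raw)
            if term in record.get("title", "").lower():
                hits.append(record)
    return hits


# Illustrative catalogues; these are not real datasets.
catalogues = {
    "portal_a": [
        {"title": "Road traffic counts", "organization": "Agency A", "license_id": "OGL"},
    ],
    "portal_b": [
        {"name": "Traffic accidents", "attribution": "Agency B", "licence": "CC0"},
        {"name": "School locations", "attribution": "Agency C", "licence": "CC0"},
    ],
}

results = federated_search(catalogues, "traffic")
```

Here `results` holds one matching record from each portal, both expressed with the same keys. Without a shared schema, every additional portal would need its own hand-written mapping, which is the cost that common metadata standards aim to reduce.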