At the Friday Seminar that preceded this year’s UN Statistical Commission, Open Data Watch’s Eric Swanson asked me a challenging yet pertinent question following my presentation to the plenary. He asked: “The definition and principles of ‘open data’ are quite clear and simple but the principles of joined-up data are less clear. Can you enunciate five principles of joined-up data that could serve as a practical guide for others?”
This is a question that we at the Joined-Up Data Standards (JUDS) project have begun to answer through our discussion papers, blogs and consultation paper. That said, Eric touched on a real gap: there is little concrete guidance, and no commonly recognised list of principles, for interoperability at a global level. By interoperability we mean the ability to access and process data from multiple sources without losing meaning, and to integrate them for mapping, visualisation and other forms of analysis.
This blog builds on the answer that I gave to the Friday seminar and sets out five core interoperability principles:
Principle 1: Use and reuse existing data standards
Perhaps the most basic principle that underpins joined-up data is the notion that new classifications – how data are described – and standards – schema into which data are input – should not be developed unless absolutely necessary. Where possible, those seeking to develop a new standard should spend time considering what is already out there and whether an open data standard already exists that can simply and easily be adapted to their needs. This principle is implicitly recognised within our consultation paper, where we suggest a ‘checklist for new data standards’ as a guide for anyone seeking to produce a new data standard. Moreover, any new standard developed must be compatible with existing standards.
Principle 2: Don’t forget metadata
Metadata standards are arguably the most important prerequisite for joined-up data. Metadata includes information on the source of a piece of data, its author, the version being published and the link to the original dataset. Taken together, this information is crucial to ensuring that both machine and human users can discover, identify and contextualise data. Ensuring that machine-readable metadata formats are standardised and used across data-producing institutions and bodies therefore greatly enhances the ability of data to be joined up.
These attributes make metadata particularly important for the official statistics community as it starts to consider how statistical data can be made open by default. As my colleague Beata Lisowska recently put it in another blog, when it comes to metadata, “in essence, we’re really asking: can we trust this data?”
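As a rough illustration of what machine-readable metadata might look like, the sketch below builds a record containing the four elements mentioned above (source, author, version and link to the original dataset) and checks whether any are missing. The field names, the `missing_metadata` helper and all values are assumptions made for this example; they are not drawn from any particular metadata standard.

```python
import json

# Illustrative field names only; a real metadata standard would define its own.
REQUIRED_FIELDS = ("source", "author", "version", "original_url")


def missing_metadata(record):
    """Return the required fields that are absent or empty in a metadata record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# Hypothetical record; every value here is a placeholder.
record = {
    "source": "Example National Statistics Office",
    "author": "Example Survey Unit",
    "version": "1.2",
    "original_url": "https://example.org/datasets/household-survey",
}

# A second hypothetical record with gaps a machine can detect automatically.
incomplete = {"source": "Example National Statistics Office", "version": ""}

print(json.dumps(record, indent=2))   # the record serialised as JSON
print(missing_metadata(record))       # no gaps in the complete record
print(missing_metadata(incomplete))   # lists the fields that still need filling
```

A check like this only works if publishers agree on the field names in the first place, which is the case for standardising metadata formats.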
Principle 3: Use common classifications wherever possible
As more and more data are made open and proactively published by governments, international institutions, private sector actors, open standard initiatives and others, we need to make sure that the language used – or the classifications to which data are published – is the same. Often, similar information is classified using slightly different definitions, which hinders the machine-readability, and therefore the interoperability, of that data. Within the international development sector, it’s crucial that data standards are fit for purpose and actively used, or at least linked to, by all stakeholders producing data.
Classifications of organisations and time formats are two cases in point where the absence of universally agreed definitions can seriously inhibit broad-scale interoperability. The identify-org.net site succinctly explains why the issue of organisational identifiers is important: “If my dataset tells you I have contacts with ‘IBM Ltd’, ‘International Business Machines’ and ‘I.B.M’ – how many firms am I working with?” Unique identifiers would go a long way to overcoming basic semantic challenges like this.
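The question in that quote can be made concrete with a small sketch. It assumes, purely for illustration, that the three names refer to the same firm and that each record carries a shared identifier; the `org_id` field and its value are made up and do not come from any real register.

```python
# Hypothetical contact list: the name variants from the quote above,
# each paired with an invented identifier (not a real registration number).
contacts = [
    {"name": "IBM Ltd", "org_id": "XX-REG-0001"},
    {"name": "International Business Machines", "org_id": "XX-REG-0001"},
    {"name": "I.B.M", "org_id": "XX-REG-0001"},
]


def count_distinct(records, key):
    """Count the distinct values that appear under `key` across the records."""
    return len({record[key] for record in records})


firms_by_name = count_distinct(contacts, "name")    # a machine sees three different strings
firms_by_id = count_distinct(contacts, "org_id")    # the shared identifier resolves them to one

print(firms_by_name, firms_by_id)  # prints: 3 1
```

Without the identifier, a machine has no reliable way of telling whether the three strings describe one organisation or three.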
The United Nations Statistics Division publishes the classifications that it maintains in its UN Classifications Registry. However, the list does not include other international classifications such as UNESCO’s International Standard Classification of Education (ISCED), WHO’s International Classification of Diseases, or many other important classifications. A comprehensive inventory of all international and relevant national classification systems would be a boon to interoperability.
Principle 4: Publish data in machine-readable formats
For joined-up data solutions to offer real efficiency gains and value, it’s imperative that a machine is able to do most of the hard work in joining up the data. This is already possible but requires many data publishers to change the way they currently publish their data. Publishing data only in PDF format is not enough; data must also be published in machine-readable formats such as RDF, XML and JSON. Publishing in these formats would enable a computer to access, identify and filter data in an automated way, making it far simpler and less time-consuming for data users to put data to good use.
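To show what accessing and filtering data "in an automated way" can mean in practice, here is a minimal sketch using JSON. The dataset, its field names and the `select` helper are all hypothetical; the point is only that a few lines of code can do work that would require manual transcription from a PDF.

```python
import json

# Hypothetical published dataset; countries, indicator names and values are placeholders.
published = """
[
  {"country": "Country A", "year": 2015, "indicator": "school_enrolment", "value": 91.2},
  {"country": "Country A", "year": 2016, "indicator": "school_enrolment", "value": 92.0},
  {"country": "Country B", "year": 2016, "indicator": "school_enrolment", "value": 88.5}
]
"""


def select(rows, **criteria):
    """Keep only the rows whose fields match every given criterion."""
    return [
        row for row in rows
        if all(row.get(key) == wanted for key, wanted in criteria.items())
    ]


rows = json.loads(published)          # the machine parses the file directly
rows_2016 = select(rows, year=2016)   # and filters it without manual re-keying

for row in rows_2016:
    print(row["country"], row["value"])
```

The same filtering could be written against XML or RDF with the appropriate parser; JSON is used here only because it keeps the example short.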
Principle 5: Ensure standards are user-driven
The explosion in open data publication that has taken place over the last twenty-odd years has happened with the key consideration of ‘openness’ at its heart. Whilst this is great and important, openness does not automatically equate to usability. For data to be usable, the standards behind them must be driven by the needs of users themselves. Take the Humanitarian eXchange Language (HXL) standard, for example. Its beauty and functionality emanate from its incredible simplicity and ease of use. The process of ensuring that an interoperable standard is ‘usable’ can be a complex one that requires trial and error. Sticking with the HXL example, a linked-data approach was tried and tested but failed given the complexity of user needs in the humanitarian space. A hashtag approach was later agreed, which put user needs at the heart of the endeavour.
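The hashtag idea can be sketched in a few lines. The example below assumes a simplified layout in which the row directly beneath the human-readable headers carries HXL-style hashtags; it uses only Python's standard library rather than any official HXL tooling, and the `read_hxl` helper, agency names and figures are invented for illustration.

```python
import csv
import io


def read_hxl(text):
    """Parse CSV text whose second row holds hashtags; key each value by its hashtag."""
    rows = list(csv.reader(io.StringIO(text)))
    hashtags = [tag.strip() for tag in rows[1]]  # rows[0] is the human-readable header
    return [dict(zip(hashtags, row)) for row in rows[2:]]


# Two hypothetical spreadsheets with different headers but the same hashtag row.
agency_a = """Organisation,Province,People affected
#org,#adm1,#affected
Example Aid,North,1200
"""

agency_b = """Agency name,Region,Affected
#org,#adm1,#affected
Sample Relief,South,800
"""

# Because both files share hashtags, they can be combined despite differing headers.
combined = read_hxl(agency_a) + read_hxl(agency_b)
total_affected = sum(int(row["#affected"]) for row in combined)

print(total_affected)  # prints: 2000
```

Publishers keep whatever column headers suit them and add one extra row, which is a large part of why the approach is easy to adopt.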
These are some of the core principles of interoperability that we’ve uncovered during our research. They offer a starting point for further discussion and we will continue to explore these issues and others with the various stakeholders involved. One thing that we can be sure of already is that, to find solutions to interoperability challenges, political will and policy coordination between governments, international organisations, open standard setters and others are key.
Tom Orrell is the Senior Advocacy Adviser on Joined-up Data Standards, based at Publish What You Fund’s London office. His role is to highlight the value that joining up open data standards can deliver for standard setters, data producers and users alike.