To date, the Office of the National Data Commissioner has held over forty roundtables to discuss the government’s data reforms. The roundtables have been an extremely valuable process – we have heard from researchers, businesses and citizens. Participants have come with different perspectives, backgrounds and concerns. This diversity has made the consultation particularly interesting, but hasn’t prevented common themes from emerging. Two key messages have come out of the roundtables: people understand that data can be used to improve lives, and they are worried that information about them could fall into the wrong hands.
We have heard concerns about malicious hacking of data and inappropriate information being made publicly available. In this short piece we will try to address the second concern by showing that data can still be made useful without publishing it online.
Making data public online is an excellent way to unlock its potential – anyone can use it and recombine it with their own information. This can be a boon to researchers, businesses and the government, but the flip side of these benefits is the risk that anyone can misuse the data by attempting to identify individuals.
Data custodians seem to be faced with a challenge. Do they lock data safely away, meaning it can’t be used to improve lives; or do they take a chance and release it? Neither seems to be a satisfactory response.
The situation is further complicated by language. For example, the word ‘de-identified’ means different things to different people. It also comes loaded with implications: that de-identified data is safe for release, while identifiable data needs to be kept under lock and key. Some roundtable participants have noted that, both technically and legally, it’s very difficult to establish whether data has been de-identified well enough that it can’t be re-identified.
If data is published online, then the only control we have is on the data itself – we can aggregate it, remove detail or deliberately change some details to protect people’s identities. But we have no control over how it is used or by whom. Once data is published online, it’s effectively out there forever, so we need to be very careful. The Australian Bureau of Statistics (ABS) has developed world-leading skills in safely publishing data, but it is still limited in what it can release this way. Very detailed information poses too much risk to privacy if it is available to a large number of people.
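As a rough illustration of what controlling the data itself can look like, the sketch below removes detail by replacing exact ages with ten-year bands, aggregates the records into counts, and withholds any count below a minimum size. The records, the function names and the threshold of three are all invented for this example; it is not a description of the ABS's methods, and it does not cover deliberately changing details (perturbation).

```python
from collections import Counter

# Hypothetical unit records of (age, postcode), invented for illustration only.
records = [
    (23, "2600"), (27, "2600"), (31, "2600"), (34, "2600"),
    (38, "2600"), (45, "2601"), (52, "2601"), (29, "2601"),
]

# Assumed threshold for this sketch, not an official rule.
MIN_CELL_SIZE = 3


def age_band(age, width=10):
    """Remove detail: replace an exact age with a band such as '20-29'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def aggregate(rows, min_cell_size=MIN_CELL_SIZE):
    """Count records per (age band, postcode) and suppress small cells.

    A suppressed cell is reported as None rather than its true count.
    """
    counts = Counter((age_band(age), postcode) for age, postcode in rows)
    return {
        cell: (count if count >= min_cell_size else None)
        for cell, count in counts.items()
    }


table = aggregate(records)
for cell, count in sorted(table.items()):
    print(cell, "suppressed" if count is None else count)
```

Even in this toy case, most cells end up suppressed, which reflects the limitation described above: the more detailed the data, the less of it can safely be published this way.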
The good news is that we don’t need to operate in this simple lock-it-up vs. set-it-free (de-identified vs. identifiable, or open vs. closed) paradigm. Increasingly, governments and businesses are finding ways to share data in a safe and controlled manner. The five Data Sharing Principles provide a practical way of getting this done by asking us to consider the:
- Project - the purpose for sharing data;
- Data - the level of detail in the data;
- Settings - the environment in which the data will be used;
- People - who is accessing the data; and
- Outputs - what results can be made public.
Applying the Data Sharing Principles
Some researchers need access to more detailed information than can safely be released online.
Rather than changing aspects of the data so that entities can’t be identified (Data Principle), agencies can look holistically across the five Data Sharing Principles to ensure the data is safe.
Additional protections that could be put in place include:
- placing controls on which researchers have access to the data (People Principle)
- being very precise about how and for what purposes researchers can use the data (Project Principle)
- specifying what gets released following the research (Outputs Principle)
- keeping the environment safe – for example using a secure data laboratory such as the Secure Unified Research Environment or the ABS DataLab (Settings Principle).
New approaches are also being developed that allow researchers to work with the data without needing to see it. The Confidential Computing approach of Data61 and the Senate Platform of Data Republic are good examples. While some researchers will always need to work directly with the data, there are many applications where data models can be built and maintained using these new approaches.
One of the aims of the Office of the National Data Commissioner is to build trust in how the government manages its data. People and businesses want credible assurances that government can manage their data competently. By moving beyond the simple open vs. closed data approach, we can take a step towards making the most of government data and building community trust.
The Data Sharing Principles provide a tool that all government agencies can use to complement their existing legislative data protections. When the new Data Sharing and Release legislation commences, the Data Sharing Principles will be enshrined as one of the key protections under that legislation.