Why do we care about data privacy?

How would you feel if your takeaway delivery was accompanied by an offer of a date? Sounds outlandish? Perhaps, but nearly a third of 18-34-year-olds in the United Kingdom (UK) reported receiving unwanted romantic contact after giving their personal information to a business [1]. This research, carried out by the Information Commissioner’s Office (ICO), only adds to the zeitgeist of privacy in the big data age and the accompanying issues of third-party data collection, storage, and exploitation.

Polls have consistently indicated public anxiety about data privacy, though the level varies. In a 2019 poll of 10,000 individuals across nine countries, Amnesty International found that 71 per cent of respondents were worried about how tech companies collected and used their personal data [2]. In 2022, Ipsos Mori published data showing that 84 per cent of Americans were “at least somewhat concerned about the safety and privacy of the personal data that they provide on the internet” [3]. And PwC’s 2023 Trust Survey found that the protection of personal data was the chief factor in how consumers rated their trust in companies [4].

This ongoing discussion encourages us to ask: what do we consider personal information? Why does data collection feel uncomfortable? And why are we indifferent to the collection and analysis of some data but not other data?

How is data collected?

Data privacy is much more complex than a lot of us imagine. Internet providers store information on customers, and governments monitor the activity of some citizens. Almost every website we visit retains some kind of information—or ‘digital fingerprint’—about us: an IP address, username, location, or search history. Social media usernames and handles are saved, and email addresses are added to mailing lists so businesses can mine personal information to target us with advertising. The internet, by design, does not prioritise privacy.

How do organisations actually use our data?

The revelations by Edward Snowden and Chelsea Manning may have exacerbated suspicions that governments cannot be trusted with public data [5]. This view positions the right to data privacy as a civil liberty, similar to free speech or freedom of assembly. Snowden made this analogy himself when he commented that “arguing that you don’t care about the right to privacy because you have nothing to hide is no different than saying you don’t care about free speech because you have nothing to say” [6].

Of course, recurrent stories about data misuse and malpractice aren’t the full picture. In the UK, the ICO published a list of common data-sharing misconceptions [7]. For example, it is often thought that only large tech companies benefit from data sharing. In fact, data sharing can produce significant social and economic benefits for a range of organisations and individuals, including job creation, funding for research, and the delivery of public and social services. It is also widely assumed that consent is always required to share data with organisations. In fact, organisations can usually share data without consent if there is a good reason to do so. Banks share data for fraud protection, insurance companies request information for claims, and local authorities rely on data sharing to distribute council services and process council tax.

Part of the concern stems from the sheer scale of data that exists today. Fifty years ago, it would have been inconceivable for a government to collect metadata on every phone call made by its citizens. Now, counting advertisers alone, billions of us share a digital listening space through our websites, phones and computers. As a direct consequence, countries around the world must now grapple with an evolving legal landscape for data privacy. Hardly a country in the world is not closely monitoring the activity of big tech and exploring a range of responses to strengthen its data privacy laws. In the last few years alone:

The European Union (EU) introduced the General Data Protection Regulation (GDPR) in 2018.
China released its long-awaited Personal Information Protection Law (PIPL).
Ireland fined WhatsApp a record €225 million for breaching EU privacy laws.
Australia drafted a parliamentary bill proposing a suite of new laws aimed at tackling data handling practices by technology companies.

One of the main arguments made in favour of more data privacy regulations is ownership: the idea that your data is your property, and any type of data collection is an infringement on that ownership and therefore your privacy.

Debates about privacy, of course, have existed far longer than the current outcry over data privacy. George Orwell’s novel 1984, with its famous Big Brother figure, warned against the terrors of mass surveillance. Less famous works, such as The Naked Society, published by Vance Packard in 1964, argued that advancements in technology were encroaching on privacy and radically shifting its standards. The right to privacy, however, is a tricky concept: it is difficult to define exactly, yet readily invoked when we feel it is threatened. One of the issues is that tolerance for intrusions on privacy varies widely. We may be happy to share our fingerprint with Google or Apple, possibly because our work or recreation depends on it, but feel uneasy about having our picture taken by the police, where the purpose can be uncertain. Privacy, then, is a relative concept.

How can we make data privacy more robust?

There are a variety of tools and techniques that claim to dispel the current misgivings about data privacy. Many in academic circles treat pseudonymisation as a panacea for data privacy and protection issues. Latanya Sweeney’s seminal 2000 study into anonymised data found that 87 per cent of the US population (216 million people in 1990) could be uniquely identified using only a five-digit zip code, gender, and date of birth [8]. Pseudonymisation seeks to address the ease with which people can be identified through their data by replacing identifiable information with artificial identifiers or pseudonyms. However, the Sweeney study demonstrates that it can be relatively simple to link disparate pieces of pseudonymised data together to identify someone. This challenges the assumption that data privacy is binary: that data is either anonymous or it is not. Instead, it shows that a spectrum of anonymisation exists, in which omitting explicit identifiers such as name or telephone number is insufficient to protect identity completely. Crucially, despite attempts to use pseudonymisation as a tool for protecting personal data, GDPR and other similar legislation still recognise pseudonymised data as personal data, even though it may present less risk.
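The linking risk described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the records are invented, and the names `RECORDS`, `unique_fraction` and `reidentify` are hypothetical rather than taken from the Sweeney study. It measures what share of a pseudonymised dataset is singled out by zip code, gender and date of birth, and shows how facts known from another source can be matched back to a pseudonym.

```python
from collections import Counter

# Invented, pseudonymised records: names are replaced by artificial IDs,
# but the quasi-identifiers (zip code, gender, date of birth) are left intact.
RECORDS = [
    {"pid": "p01", "zip": "02138", "gender": "F", "dob": "1961-07-14"},
    {"pid": "p02", "zip": "02138", "gender": "M", "dob": "1961-07-14"},
    {"pid": "p03", "zip": "02139", "gender": "F", "dob": "1975-03-02"},
    {"pid": "p04", "zip": "02139", "gender": "F", "dob": "1975-03-02"},
    {"pid": "p05", "zip": "02140", "gender": "M", "dob": "1988-11-30"},
]

QUASI_IDENTIFIERS = ("zip", "gender", "dob")


def unique_fraction(records, keys=QUASI_IDENTIFIERS):
    """Return the share of records whose combination of `keys` is unique."""
    if not records:
        return 0.0
    combos = Counter(tuple(r[k] for k in keys) for r in records)
    unique = sum(1 for r in records if combos[tuple(r[k] for k in keys)] == 1)
    return unique / len(records)


def reidentify(records, known, keys=QUASI_IDENTIFIERS):
    """Return the pseudonyms that match facts known from another source."""
    return [r["pid"] for r in records if all(r[k] == known[k] for k in keys)]


print(unique_fraction(RECORDS))                 # 0.6
print(unique_fraction(RECORDS, keys=("zip",)))  # 0.2
print(reidentify(RECORDS, {"zip": "02138", "gender": "F", "dob": "1961-07-14"}))  # ['p01']
```

In this toy dataset, zip code alone singles out one record in five, while the three attributes together single out three in five. No name appears anywhere, yet anyone who already knows a person's zip code, gender and date of birth can recover that person's pseudonym.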

Another technique that has recently emerged is differential privacy. This technique disguises personal data by deliberately introducing errors, in the form of random noise, into the dataset or into the results computed from it, making it almost impossible to identify a person with certainty. This randomisation must be carefully calibrated: too little noise leaves individuals exposed, while too much distorts the summary statistics the data is meant to provide. A glaring security issue with this technique is that if randomisation takes place after unaltered data has been collected, then hackers may still be able to steal the original data.
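One common way to add such noise is the Laplace mechanism, sketched below in Python for a simple counting query. This is a minimal illustration under stated assumptions, not a production implementation: the function names, the `AGES` data and the choice of the Laplace mechanism are all assumptions rather than details from the sources cited here. The parameter `epsilon` is the usual privacy budget, where smaller values mean more noise and stronger privacy.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace distribution centred on 0."""
    # The difference of two independent exponential draws with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(values, predicate, epsilon, rng=None):
    """Return a noisy count of the items in `values` that satisfy `predicate`.

    Adding or removing one person changes a count by at most 1, so the
    noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Invented example data: the true number of people aged 35 or over is 4.
AGES = [23, 35, 41, 29, 52, 38]
print(private_count(AGES, lambda age: age >= 35, epsilon=0.5))
```

Each call returns a different value scattered around the true count of 4. Note that the noise here is added only to the published answer, so the raw ages still exist unaltered, which is exactly the situation the security concern above describes.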

It is unlikely that any technical solution or legislative framework can counter all of the scepticism and concern that currently surrounds data privacy. Misgivings about government surveillance, the collection of aggregated data by companies, and the issues of privacy and consent will continue to cause concern as long as voluminous streams of personal data are exchanged every day. As it currently stands, data is circulated and traded in so many ways that the line between personal and non-personal data is difficult to draw.