Privacy in the Age of AI

The unique privacy risk of LLMs: they cannot actually unlearn the information they were fed.


26 January 2024

It is not a new story that large tech companies collect personal data from users to enhance their services. However, with the advent of consumer-use Generative AI models including Large Language Models (LLMs) such as OpenAI’s ChatGPT, concerns have been raised about the usage of personal data and its implications for privacy.

So, what about AI is particularly concerning for privacy? 

LLMs like ChatGPT are fed large amounts of text-based information to learn how to answer user queries with fully fleshed-out, human-like responses. They interact with the user to provide requested information, complete tasks, and even mimic human creative activity.

Implications for privacy

The privacy concern around these models stems from the two processes mentioned above: training and user interaction. AI models like ChatGPT are trained on publicly available datasets that often contain personal information such as email addresses.

The user-interaction process (referred to as inference in AI terminology) is arguably more concerning, ironically because users have more control over the information they supply to the AI model.

After the training stage, AI models continue to take user input to fine-tune their responses over time. For example, OpenAI states that 'data submitted through non-API consumer services ChatGPT or DALL·E may be used to improve our models.' In plain terms: whatever you tell ChatGPT may be retained and used to train future versions of the model.

Thus, in theory, personal information you submit could later be extracted through prompting. This is why experts have strongly advised against inputting PII (personally identifiable information) such as your name, address, email address, and government-issued codes or numbers (e.g. a National Insurance Number) into AI models.
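If you do want to paste text into a chatbot, one precaution is to scrub the obvious identifiers first. Below is a minimal sketch in Python; the patterns are illustrative examples only (an email address, a generic phone number, and the UK National Insurance Number format mentioned above) and will miss many real-world variants, so treat it as a starting point rather than a guarantee.

```python
import re

# Illustrative patterns only: robust PII detection needs dedicated tooling.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s\-()]{7,}\d"),
    # UK National Insurance Number format: two letters, six digits, one letter.
    "NI NUMBER": re.compile(r"\b[A-Z]{2}\s?\d{2}\s?\d{2}\s?\d{2}\s?[A-D]\b"),
}

def redact(text: str) -> str:
    """Replace anything matching a known PII pattern with a placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label} REDACTED]", text)
    return text

print(redact("I'm Jane (jane.doe@example.com, NI number QQ 12 34 56 C)."))
# -> I'm Jane ([EMAIL REDACTED], NI number [NI NUMBER REDACTED]).
```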

Besides exercising discretion, it is important to find countermeasures that stop your data from being used by AI services without your knowledge or consent. For example, users can opt out of their data being used to improve OpenAI's services and/or disable chat history so that conversations will not be used to train OpenAI's models. Users can even have their content permanently deleted, though only by deleting their account.
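Another countermeasure implied by the policy quoted above is to reach the models through the API rather than the consumer app: OpenAI states that data submitted via its API is not used to train its models by default. Here is a minimal sketch using the official openai Python library (v1.x); the model name and prompt are placeholders, and policy details can change, so check the current terms.

```python
# pip install openai
from openai import OpenAI

# Per OpenAI's stated policy (at the time of writing), API traffic is not
# used for model training by default, unlike the consumer ChatGPT app.
client = OpenAI()  # reads the OPENAI_API_KEY environment variable

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # placeholder model name
    messages=[{"role": "user", "content": "Summarise this text: ..."}],
)
print(response.choices[0].message.content)
```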

However, experts have pointed out a privacy risk unique to LLMs: they cannot actually unlearn the information they were fed. That sits uneasily with the data protection regulations of over 100 countries, in which individual data subject rights such as the "right to be forgotten" are explicitly codified; the European Union's General Data Protection Regulation (GDPR) is one of them.

And even with the opt-outs described above, the content of user conversations can still seep through and influence generated responses. In other words, even if your input is not used for training, the AI model is constantly evolving from its interactions with users.

Takeaways

AI can be immensely helpful and innovative in our daily, professional and academic lives. If used well, the potential is boundless. Yet with the commercialization of new, exciting technologies, users must stay aware of the implications for their digital privacy and security. Simple acts such as reading through a company’s privacy policy and looking for ways to minimize unwanted usage of personal data can do wonders to protect your data privacy.

So next time, before you start a conversation with ChatGPT and reveal anything personal, e.g. submitting a CV for review, take a moment to consider what you are comfortable sharing, so that you can take advantage of AI tools in the way that is most beneficial for you.