What does responsible and ethical AI look like?

The robots are not taking over the world quite yet. Those increasingly chatty Artificial Intelligence (AI) systems with names like ChatGPT and Bard are, however, beginning to take over our conversations. Until recently, chatbots were annoying pop-up boxes that opened when you visited an insurance company’s website. Now they wax lyrical about business strategy, write poems and generate code. But at what point do these systems go from models that produce coherent blocks of text mimicking natural language to artificial general intelligence (AGI) capable of acting (and even thinking) independently? If the goal is to develop sophisticated AGI, then responsible and ethical AI should be front and centre.

A stopgap on AI progress?

Recently, the Future of Life Institute published an open letter that aimed to address this very question. The letter, signed by more than 20,000 individuals including Elon Musk, Steve Wozniak, Stuart Russell and Yuval Noah Harari, calls for a pause of at least six months on the training of AI systems more powerful than GPT-4 (the model behind the most advanced version of ChatGPT). It argues that this moratorium is needed to give AI ethics and safety research time to catch up with the rapid technical progress of recent months.

This seems misguided to me for a few reasons. The sheer potential of AI and the vast number of possible applications mean that it could be some time before there is anything resembling a scientific consensus on AI safety concerns. Is it realistic to pause all the research and commercial development taking place around the world until one is reached? And why only six months? That is surely nowhere near long enough for theoretical concepts from AI safety and ethics to diffuse throughout the entire AI community. Although the approach is less than perfect, I would suggest that a framework for the responsible use of AI needs to be devised in step with broader advances: the two cannot be separated.

The UK as an AI superpower

With this in mind, the UK government is now attempting to position itself as a global custodian of AI ethics and regulation. I recently attended the AI UK event run by the Alan Turing Institute, where AI researchers, industry leaders and government officials gathered to make the case for the UK leading the world in responsible AI. This case rests on the UK’s reputation as a safe and stable place to invest capital: the UK can present itself as the best place to invest in AI technologies, even if more groundbreaking innovation is taking place elsewhere, where rules and regulations may be weaker. Part of this would necessarily involve advocating for the ethical use of AI in new platforms and applications to build a strong international consensus on responsible use. Two questions then arise: what are the best measures of ethical and responsible AI, and are there objective indicators that can be used to evaluate how ethical a particular application or technology is?

Creating a framework for responsible and ethical AI

There seem to be three broad mechanisms that stand out for creating an ethical framework for the responsible use of AI: building sufficient safeguards and guardrails into AI systems; benchmarking and testing applications while they are being developed (and before they are released to the public); and encoding systems to align more closely with ethical frameworks. The UK government can contribute to each of these to varying degrees, although in some cases its ability to influence them will be extremely limited. I will discuss each in turn before outlining a potential way forward.

Safeguards and guardrails for AI systems

Integrating safeguards and guardrails into AI systems is one way of tackling the “black box” problem. Mechanisms such as “constitutional constraints” can build requirements that many of us would see as integral, such as accuracy, explainability and bias mitigation, directly into the algorithms and neural networks that power AI systems. Tools such as Google’s “What-If Tool” [1] and IBM’s “FactSheets” [2] pursue similar goals through algorithmic transparency and the testing of model outputs. Some current large language models (LLMs), such as Claude (developed by Anthropic), are trained against a written set of principles, or “constitution”, so that their behaviour aligns with those ethical requirements. This, however, presents a significant challenge: how do we translate ethical principles into technical specifications and code? It also raises the issue of universally applied standards. Can we build universal standards for technologies whose use is strongly contingent on differences in culture, religion, beliefs and social values? In addition, can we trust AI companies not to fall foul of the data collection, privacy, encryption and terms-of-service issues that have plagued tech companies like Facebook in the past? Implementing these mechanisms also presents a practical challenge, as existing AI systems may need to be re-engineered to add safeguards, which is expensive in terms of both time and computational resources.

Hardcoding trust in AI systems

Although encoding trust into AI systems remains a pertinent challenge, research suggests it can be approached in a few ways. The first is to place wrapper functions on top of the code for AI systems to hardwire ethical use. A wrapper adds a layer around an AI system that monitors its behaviour and enforces guardrails: it could audit data inputs and outputs, add pauses to high-risk decision processes and use override mechanisms to prevent harmful behaviour. Recent research into bias in machine learning models found wrappers to be a viable approach for evaluating bias. One study applied a “wrapper bias detection” technique to an open-source dataset containing gender and race attributes. The technique uses “alternation functions”, which swap the values of a protected attribute, and measures the effect on the model’s predictions using Kullback-Leibler divergence, a measure of how much one probability distribution differs from another. The study found that bias for both gender and race could be located within the dataset using this approach [3]. Techniques like this could make AI systems more explainable and accountable by exposing bias so that it can be mitigated, and by allowing for more effective monitoring. However, wrapper functions are limited in that they are tailored to specific AI applications. Implementing them would also require some visibility of the data used to train AI systems, or access to the models that sit behind them. Since many existing systems are proprietary black boxes, such access is likely to be highly restricted.
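
To make the wrapper idea more concrete, here is a minimal Python sketch of alternation-based bias detection. It is not the cited study’s implementation. It assumes the wrapped model is a black-box function returning the probability of a positive outcome for a record; the wrapper swaps the value of one protected attribute and averages the KL divergence between the original and altered outcome distributions. The function names, toy models and data are all hypothetical.

```python
import math
from typing import Callable, Dict, List

# A record maps attribute names to values; a model returns P(positive outcome).
Record = Dict[str, object]
Model = Callable[[Record], float]


def alternate(record: Record, attribute: str, swap: Dict[object, object]) -> Record:
    """Alternation function: copy the record with one protected attribute's value swapped."""
    altered = dict(record)
    altered[attribute] = swap[record[attribute]]
    return altered


def bernoulli_kl(p: float, q: float, eps: float = 1e-9) -> float:
    """KL(P || Q) for two-outcome distributions P = (p, 1 - p) and Q = (q, 1 - q)."""
    p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def wrapper_bias_score(model: Model, records: List[Record], attribute: str,
                       swap: Dict[object, object]) -> float:
    """Mean KL divergence between predictions on original and alternated records.

    The wrapper only calls the model; it never inspects the model's internals.
    A score of 0 means swapping the attribute never changed a prediction.
    """
    scores = [bernoulli_kl(model(r), model(alternate(r, attribute, swap)))
              for r in records]
    return sum(scores) / len(scores)


# Two hypothetical toy models: one ignores gender, one penalises "F".
def fair_model(r: Record) -> float:
    return 0.8 if r["income"] > 50 else 0.2


def biased_model(r: Record) -> float:
    return fair_model(r) * (0.5 if r["gender"] == "F" else 1.0)


data = [{"gender": "F", "income": 60}, {"gender": "M", "income": 40},
        {"gender": "M", "income": 70}]
swap = {"F": "M", "M": "F"}
print(wrapper_bias_score(fair_model, data, "gender", swap))    # 0.0
print(wrapper_bias_score(biased_model, data, "gender", swap))  # about 0.254
```

Because the wrapper only queries predictions, it could in principle sit around a proprietary system, but it still needs representative records to probe with, which is where the access problem described above bites.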

Another technique, advocated by Andrew Ng and Stuart Russell, is inverse reinforcement learning. This framework infers the objectives, values or rewards behind observed human behaviour. In traditional reinforcement learning, an agent learns through trial and error, receiving a reward as feedback on its actions. In inverse reinforcement learning, the reward is not given: instead, the AI agent infers human goals and objectives from observed behaviour and then uses these inferences to shape how it evaluates its own performance. Despite showing promise for value alignment and explainability, inverse reinforcement learning faces significant challenges in terms of scalability and generalisability.
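
A deliberately tiny Python sketch may help show what inferring goals from behaviour means in practice. It is not Ng and Russell’s original algorithm. It assumes a one-step setting in which a person repeatedly chooses between options described by feature vectors, models the person as picking each option with probability proportional to exp(reward), and recovers linear reward weights by maximising the likelihood of the observed choices (a one-step form of maximum-entropy IRL). The features and demonstrations are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Each demonstration: the feature vectors of the options a person could choose
# from, and the index of the option they actually chose.
Demo = Tuple[List[Sequence[float]], int]


def choice_probs(w: Sequence[float], options: List[Sequence[float]]) -> List[float]:
    """Boltzmann-rational choice model: P(option) is proportional to exp(w . features)."""
    scores = [sum(wi * fi for wi, fi in zip(w, f)) for f in options]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def infer_reward_weights(demos: List[Demo], n_features: int,
                         lr: float = 0.5, steps: int = 500) -> List[float]:
    """Gradient ascent on the log-likelihood of the observed choices.

    For each demo the gradient is (features of the chosen option) minus
    (features expected under the current reward estimate), the one-step form
    of maximum-entropy IRL's feature-matching condition.
    """
    w = [0.0] * n_features
    for _ in range(steps):
        grad = [0.0] * n_features
        for options, chosen in demos:
            probs = choice_probs(w, options)
            for k in range(n_features):
                expected = sum(p * f[k] for p, f in zip(probs, options))
                grad[k] += options[chosen][k] - expected
        w = [wi + lr * g / len(demos) for wi, g in zip(w, grad)]
    return w


# Hypothetical demonstrations with features (speed, safety):
# the person picks the slow, safe route three times out of four.
routes = [[1.0, 0.0], [0.0, 1.0]]  # [fast but risky, slow but safe]
demos: List[Demo] = [(routes, 1), (routes, 1), (routes, 1), (routes, 0)]
w_speed, w_safety = infer_reward_weights(demos, n_features=2)
print(w_safety > w_speed)            # True: safety is inferred to matter more
print(round(w_safety - w_speed, 3))  # 1.099, i.e. log(3) for a 3:1 preference
```

Here the expected features are computed by enumerating every option. In sequential problems the same quantity requires solving or approximating a planning problem at every gradient step, which is part of why scaling these methods is hard.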

Benchmarking and testing

Developing benchmarking and testing practices for AI systems is another potential mechanism for responsible use. These tests would assess how well AI systems conform to desirable constraints such as accuracy against ground truth, transparency and freedom from bias. Benchmarking tests could also evaluate AI systems at different stages to ensure that their research and use align with intended ethical principles. This would include building “sandboxes” where AI tools can be evaluated and adjusted during development to check for dangerous behaviours (although what constitutes “dangerous” would also need to be defined). It would also require either some form of self-regulation on the part of AI companies or an independent regulator that monitors best practice. External review processes that audit AI systems before, during and after deployment, in conjunction with AI companies, could work in a similar manner to the way financial regulators work with banks and financial institutions to create industry standards.
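
A staged benchmark of this kind can be sketched in a few lines of Python. The sketch assumes a hypothetical model that maps a prompt to text, test cases grouped into illustrative categories, each with a check function encoding what counts as acceptable output, and per-category pass-rate thresholds a model must meet before it moves to the next stage. The categories, thresholds and toy model are placeholders, not an established standard.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class EvalCase:
    category: str                  # illustrative label, e.g. "ground_truth"
    prompt: str
    check: Callable[[str], bool]   # True if the model's output is acceptable


def run_benchmark(model: Callable[[str], str], cases: List[EvalCase]) -> Dict[str, float]:
    """Run every test case and return the pass rate for each category."""
    passed: Dict[str, int] = {}
    total: Dict[str, int] = {}
    for case in cases:
        total[case.category] = total.get(case.category, 0) + 1
        if case.check(model(case.prompt)):
            passed[case.category] = passed.get(case.category, 0) + 1
    return {cat: passed.get(cat, 0) / n for cat, n in total.items()}


def release_gate(scores: Dict[str, float], thresholds: Dict[str, float]) -> bool:
    """Clear a model for the next stage only if every category meets its threshold."""
    return all(scores.get(cat, 0.0) >= t for cat, t in thresholds.items())


def toy_model(prompt: str) -> str:
    """Hypothetical stand-in for the system under test."""
    return "Paris" if "France" in prompt else "I don't know"


cases = [
    EvalCase("ground_truth", "What is the capital of France?", lambda out: out == "Paris"),
    EvalCase("ground_truth", "What is the capital of Japan?", lambda out: out == "Tokyo"),
    EvalCase("uncertainty", "What will the stock market do tomorrow?",
             lambda out: "don't know" in out),
]
scores = run_benchmark(toy_model, cases)
print(scores)  # {'ground_truth': 0.5, 'uncertainty': 1.0}
print(release_gate(scores, {"ground_truth": 0.9, "uncertainty": 0.9}))  # False
```

The harness itself is the easy part. The difficult work lies in writing the checks, since deciding what counts as a pass for categories such as transparency or “dangerous” behaviour is the same definitional problem raised above.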

A potential way forward

If the UK plans to position itself at the forefront of responsible and ethical AI, it must do more than create standards and regulations or wait for AI progress to slow down until these issues can be researched more fully. AI systems are built upon algorithms and code. If these are not produced with ethical considerations in mind throughout the end-to-end development cycle, then overarching ethical frameworks will have minimal effect. The real sticking point will be how to champion responsible and ethical AI in a way that integrates seamlessly with the advancement of these systems, rather than trying to rein them in or counteract progress. As with most technological progress, the government will be playing catch-up with commercial breakthroughs in AI for some time to come. It needs to support existing research into AI safety or risk being left behind.

Additional sources:

The Wrapper Approach
Metaethical Perspectives on 'Benchmarking' AI Ethics
Race to the Top: Benchmarks for AI Safety