
Improving responsible access to demographic data to address bias

Posted by: and , Posted on: - Categories: Algorithms, Artificial intelligence, Bias, Data, Demographic data, Intermediaries, Trust

Following our review into bias in algorithmic decision-making, the CDEI has been exploring challenges around access to demographic data for detecting and mitigating bias in AI systems, and considering potential solutions to address these challenges. 

Today we are publishing our report “Enabling responsible access to demographic data to make AI systems fairer”, which explores the potential of novel approaches to overcome some of these challenges. This work, part of our responsible data access work programme, looks at solutions that could help service providers responsibly access data on the demographics of their users in order to assess potential bias.

The use of AI, and of data-driven systems more broadly, is becoming increasingly commonplace. With this, the risks associated with bias in these systems have become a growing concern. Where AI systems produce unfair outcomes for individuals on the basis of protected characteristics and are used in a context within the scope of the Equality Act 2010 (e.g. the provision of a service), this may amount to unlawful discrimination.

The importance of addressing fairness in AI systems has also been recognised in the government's recent white paper “A pro-innovation approach to AI regulation”. The white paper proposes fairness as one of five potential cross-cutting principles for AI regulation. Fairness encompasses a wide range of issues, one of which is avoiding unfair bias, which can lead to discrimination.

Many approaches to detecting and mitigating bias require access to demographic data about users, including characteristics that are protected under the Equality Act 2010 (such as age, sex, and race), as well as socioeconomic attributes.

For example, if an insurer wants to understand the impact of its risk pricing model on different ethnic groups, it needs data about the ethnicity of its customers (and potential customers). However, collecting such data is not common practice in most sectors.
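To make the insurer example concrete, here is a minimal sketch that assumes the insurer already holds a self-reported ethnicity label for each quote and simply compares average premiums across groups. The record layout, the group labels "A" and "B", the helper names and the figures are all hypothetical. A raw gap in averages is only a starting point: a real assessment would also need to consider legitimate risk factors, sample sizes and statistical uncertainty.

```python
from collections import defaultdict
from statistics import mean


def mean_price_by_group(records):
    """Average quoted premium per demographic group.

    `records` is an iterable of (group, premium) pairs. This layout is a
    hypothetical illustration, not a format taken from the CDEI report.
    """
    premiums = defaultdict(list)
    for group, premium in records:
        premiums[group].append(premium)
    return {group: mean(values) for group, values in premiums.items()}


def max_ratio(group_means):
    """Ratio of the highest to the lowest group mean; 1.0 means no gap."""
    values = list(group_means.values())
    return max(values) / min(values)


# Made-up quotes for illustration only.
quotes = [("A", 400.0), ("A", 420.0), ("B", 500.0), ("B", 520.0)]
means = mean_price_by_group(quotes)
print(means)                      # {'A': 410.0, 'B': 510.0}
print(round(max_ratio(means), 3)) # 1.244
```

Without the ethnicity label in each record, none of this comparison is possible, which is why access to demographic data matters for bias monitoring.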

Organisations building or deploying AI systems often struggle to access the demographic data they need. There are a number of reasons for this, including:

  • Legal issues (both real and perceived), such as the misconception that collecting demographic data is not permitted under data protection law and the challenge of ensuring data is collected and used only for bias monitoring purposes.
  • Ethical issues, including the belief that service users do not want their data collected for this purpose, and concerns around privacy and surveillance, representation, transparency, and public trust.
  • Organisational barriers, such as the reputational risks associated with revealing organisational biases and inadequate resource and/or expertise.
  • Practical challenges like ensuring data quality and representativeness. 
  • Difficulties in mitigating the risks that arise when collecting this data, such as data theft or misuse.

Potential solutions

Our work focuses on two contrasting sets of promising approaches to help address some of these challenges: data intermediaries and proxies.

In simple terms, for the purposes of this report, a demographic data intermediary can be understood as an entity that facilitates the sharing of demographic data between those who wish to make their own demographic data available and those who are seeking to access and use demographic data they do not have. Intermediaries could help organisations navigate regulatory complexity, better protect user autonomy and privacy, and improve user experience and data governance standards. However, the overall market for data intermediaries in the UK remains nascent, and the absence of intermediaries offering this type of service may reflect the difficulties of being a first mover in this complex area, where demand is unclear and the risks around handling demographic data require careful management. 

If gathering demographic data is difficult, another option is to attempt to infer it from other proxy data already held. For example, an individual’s forename gives some information about their gender, although the accuracy of the inference depends heavily on context and on the name in question. There are already some examples of service providers using proxies to detect bias in their AI systems.

Proxies offer a potential way to understand bias where direct collection of demographic data is not feasible. In some circumstances, proxies can enable service providers to infer data that is itself the source of the potential bias under investigation, which is particularly useful for bias detection. Methods that draw inferences at higher levels of aggregation could enable bias analysis without requiring service providers to process individually identifiable demographic data.
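As an illustration of inference at a higher level of aggregation, the sketch below estimates what share of applicants, and of approved applicants, belong to a group using only area-level statistics, so no individual is ever assigned a demographic label. It is a minimal sketch under stated assumptions: the area codes, the `estimated_group_share` helper and all figures are hypothetical, the group shares are assumed to come from published aggregate statistics such as census tables, and the method relies on people in each set mirroring the demographic mix of their area, which may not hold.

```python
def estimated_group_share(area_counts, area_group_share):
    """Estimate the fraction of a set of people who belong to a group.

    area_counts: {area_code: number of people from that area in the set}
    area_group_share: {area_code: fraction of the area's population in the
        group}, assumed to come from published aggregate statistics.

    Key assumption: people in the set mirror the demographic mix of their
    area. If that does not hold, the estimate can be badly wrong.
    """
    total = sum(area_counts.values())
    if total == 0:
        raise ValueError("area_counts must contain at least one person")
    expected_members = sum(
        count * area_group_share[area] for area, count in area_counts.items()
    )
    return expected_members / total


# Made-up figures for two hypothetical areas.
group_share = {"AREA_1": 0.1, "AREA_2": 0.5}
applicants = {"AREA_1": 100, "AREA_2": 100}
approved = {"AREA_1": 80, "AREA_2": 40}

share_applicants = estimated_group_share(applicants, group_share)
share_approved = estimated_group_share(approved, group_share)
print(round(share_applicants, 3))  # 0.3
print(round(share_approved, 3))    # 0.233
```

Here the group makes up an estimated 30% of applicants but only about 23% of approved applicants. A gap like this would flag the approval process for closer investigation rather than prove bias, and the area-mix assumption is exactly the kind of inaccuracy that calls for the care and safeguards described next.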

However, significant care is needed. Using proxies does not remove the need to comply with data protection law. Further, using proxies without due care can give rise to damaging inaccuracies and pose risks to service users’ privacy and autonomy. We therefore argue that inferring demographic data for bias monitoring using proxies should only be considered in certain circumstances: for example, when bias can be identified more accurately using a proxy than using information about an actual demographic characteristic, when inferences are drawn at a level of aggregation at which no individual is identifiable, or when no realistic better alternative exists. In our paper, we suggest risk mitigations and safeguards that organisations should consider if using proxies.

Next steps

Our report reflects on what needs to happen to enable an ecosystem to emerge that offers better options for the responsible use of demographic data to improve the fairness of AI systems. 

In the short term, direct collection of demographic data is likely to remain the best option for many service providers seeking to understand bias. Where this isn’t feasible, proxies may be an appropriate alternative, though significant care is needed. In the longer term, however, there is an opportunity for such an ecosystem to emerge. It would be characterised by increased activity in the development and deployment of different solutions that best meet the needs of service providers and service users, as well as ongoing efforts to develop a robust data assurance ecosystem, ensure regulatory clarity, support research and development, and amplify the voices of marginalised groups.

Building on this work, we have announced a new Fairness Innovation Challenge. This will support organisations in their efforts to implement the Fairness principle set out in the UK government’s AI White Paper. The challenge will provide an opportunity to test new ideas for addressing AI fairness challenges in collaboration with government and regulators, including challenges around data access. 

We are currently running a call for use cases, and would welcome submissions of specific fairness-related problems faced by organisations designing, developing, and/or deploying AI systems. For more information and to submit a use case, see this link.
