Data Bias: Identification of Mitigation and Remediation Strategies, Techniques and Tools

Status: 
ongoing
Period: 
November 2019 - November 2022
Funding: 
in kind
Funding organization: 

Politecnico di Torino, Softeng research group, Nexa Center for Internet & Society

Person(s) in charge: 

Antonio Vetrò (Senior Researcher), Marco Torchiano (Nexa Faculty Fellow), Mariachiara Mecati (Ph.D. Student)

Executive summary: 

This PhD project investigates the impact of poor data quality and bias in input data on the automatic decisions made by software applications.

Background: 

Nowadays, many software systems make use of large amounts of data (often personal) to make recommendations or decisions that affect our daily lives. Consequently, computer-generated recommendations or decisions might be affected by poor quality and bias in the input data. This raises relevant ethical considerations about the impact, in terms of both relevance and scale, on the lives of the persons affected by the output of these software systems.

Objectives: 

The PhD proposal aims to investigate the impact of poor data quality and bias in input data on the automatic decisions made by software applications. As a minor aspect, the ethical character of algorithms and its effects on decisions will also be investigated.

The objectives of the PhD plan are the following:
• Build a conceptual and operational data measurement framework for identifying input data characteristics that potentially affect the risk of wrong or discriminatory software decisions. This goal encompasses both identifying which characteristics have an impact and defining the measurement procedure.
• Collect empirical evidence on the actual impact of the measured data quality issues on automated decisions made by software systems. The evidence will be built by means of different research methods (case studies, experiments or simulations), depending on the availability of data, software and third-party collaborations. In particular, a key achievement is the establishment of relational links between data quality issues and output bias features.
• Design mitigation and remediation strategies and specific techniques to reduce the problem; a proof-of-concept implementation should be provided. We anticipate that not all aspects of the problem will be solvable computationally; in such cases it will be important to advance knowledge in the area by identifying explanations and providing critical reflections.

In addition, a secondary goal is to investigate how software quality and bias incorporated in the algorithms can contribute to flawed decisions made by software applications:
• If any evidence is found, investigate how this aspect relates to the data quality and bias issues described above.
• Design and prototype remediation techniques for the problem.

Results: 

At the time of writing, an exploratory study has been conducted ("Evaluating Risk of Discrimination in Automated Decision Making Systems with Measures of Disproportion") and submitted to the EGOV-CeDEM-ePart 2020 conference (http://dgsociety.org/egov-2020/). In this study, the authors investigated measurable characteristics of datasets that can lead to discriminatory automated decisions. The final goal is to ensure a more conscious and responsible use of automated decision-making (ADM) systems.
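To give a flavour of what a measure of disproportion on a dataset might look like, the sketch below computes a simple imbalance ratio over the categories of a protected attribute. This is an illustrative assumption for exposition only: the specific measure, the attribute name and the sample data are hypothetical and not necessarily those used in the study.

```python
from collections import Counter

def imbalance_ratio(values):
    """Ratio between the least- and most-frequent category counts.

    Returns 1.0 for a perfectly balanced attribute; values close to 0
    indicate strong disproportion among categories.
    """
    counts = Counter(values)
    return min(counts.values()) / max(counts.values())

# Hypothetical column of a protected attribute in an input dataset
gender = ["F", "M", "M", "M", "F", "M", "M", "M"]
print(round(imbalance_ratio(gender), 2))  # 2 "F" vs 6 "M" -> prints 0.33
```

A low value of such a measure could flag an input dataset as a potential source of biased automated decisions, triggering further inspection or remediation.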