Data Bias: Identification of Mitigation and Remediation Strategies, Techniques and Tools

Executive summary

This PhD project proposes to investigate how poor data quality and biases in the data affect the automated decisions made by software applications.

Background

Many software systems now use large amounts of data (often personal) to make recommendations or decisions that affect our daily lives. Consequently, computer-generated recommendations or decisions may be affected by poor quality and bias in the input data. This raises significant ethical questions about the impact, in terms of both relevance and scale, on the lives of the people affected by the output of software systems.

Objectives

The PhD proposal aims to investigate the impact of poor data quality and biases in the data on the automated decisions made by software applications. As a minor aspect, the ethical character of algorithms and its effects on decisions will also be investigated.

The objectives of the PhD plan are the following:
• Build a conceptual and operational data measurement framework for identifying the characteristics of input data that may increase the risk of wrong or discriminatory software decisions. This goal encompasses both identifying which characteristics have an impact and defining the measurement procedure.
• Collect empirical evidence concerning the actual impact of the measured data quality issues on automated decisions made by software systems. The evidence will be gathered by means of different research methods (case studies, experiments or simulations), depending on the availability of data, software and third-party collaborations. In particular, a key achievement will be establishing relational links between data quality issues and features of output bias.
• Design mitigation and remediation strategies and specific techniques to reduce the problem; a proof-of-concept implementation should be provided. We anticipate that not all aspects of the problem will be solvable computationally. In such cases, it will be important to advance knowledge in the area by identifying explanations and providing critical reflections.
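As a purely illustrative sketch of the first two objectives, the snippet below pairs one possible input-data measure (group imbalance) with one possible output measure (the gap in favourable-decision rates between groups). Both measures, the function names and the toy records are assumptions made here for illustration; the proposal does not prescribe specific metrics.

```python
from collections import Counter


def imbalance_ratio(groups):
    """Input-data measure (assumed): smallest group size divided by
    largest group size. A value of 1.0 means perfectly balanced groups."""
    counts = Counter(groups)
    return min(counts.values()) / max(counts.values())


def positive_rate_gap(groups, decisions):
    """Output measure (assumed): largest difference in the rate of
    favourable decisions (decision == 1) between any two groups."""
    totals = Counter(groups)
    positives = Counter(g for g, d in zip(groups, decisions) if d == 1)
    rates = [positives[g] / totals[g] for g in totals]
    return max(rates) - min(rates)


# Hypothetical toy records: a protected-attribute value per person and
# the automated decision each person received (1 = favourable).
groups = ["a", "a", "a", "a", "a", "a", "b", "b"]
decisions = [1, 1, 1, 0, 1, 1, 0, 1]

print("imbalance ratio:", imbalance_ratio(groups))
print("positive-rate gap:", positive_rate_gap(groups, decisions))
```

Computing both measures across many datasets or simulated scenarios is one way the relational links between input characteristics and output bias could be explored empirically.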

In addition, a secondary goal is to investigate how software quality and bias embedded in the algorithms can contribute to flawed decisions made by software applications:
• If any such evidence is found, investigate how this aspect relates to the primary one of data quality and bias.
• Design and prototype remediation techniques for the problem.

Results