Data Bias: Identification of Mitigation and Remediation Strategies, Techniques and Tools

Submitted by Anita Botta on Fri, 17/04/2020 - 16:53

Status:

concluded

Period:

November 2019 - January 2023

Funding:

in kind

Funding organization:

Politecnico di Torino, Softeng research group, Nexa Center for Internet & Society

Person(s) in charge:

Antonio Vetrò (Senior Researcher), Marco Torchiano (Nexa Faculty Fellow), Mariachiara Mecati (PhD Student)

Executive summary:

This PhD project investigates the impact of poor data quality and data bias on the automated decisions made by software applications.

Background:

Nowadays, many software systems use large amounts of data (often personal data) to make recommendations or decisions that affect our daily lives. Consequently, computer-generated recommendations or decisions may be affected by poor quality and bias in the input data. This raises significant ethical concerns about the impact, in terms of both relevance and scale, on the lives of the people affected by the output of software systems.

Objectives:

The PhD proposal aims to investigate the impact of poor data quality and data bias on the automated decisions made by software applications. As a secondary aspect, the ethical character of algorithms and its effects on decisions will also be investigated.

The objectives of the PhD plan are the following:
• Build a conceptual and operational data measurement framework for identifying input data characteristics that potentially affect the risk of wrong or discriminating software decisions. This goal encompasses identifying which characteristics have an impact and defining the measurement procedure.
• Collect empirical evidence on the actual impact of the measured data quality issues on automated decisions made by software systems. The evidence will be built through different research methods (case studies, experiments or simulations), depending on the availability of data, software and third-party collaborations. In particular, a key achievement will be establishing relationships between data quality issues and bias features in the output.
• Design mitigation and remediation strategies and specific techniques to reduce the problem, and provide a proof-of-concept implementation. We anticipate that not all aspects of the problem will be solvable computationally; in such cases, it will be important to advance knowledge in the area by identifying explanations and providing critical reflections.

In addition, a secondary goal is to investigate how software quality and the bias embedded in algorithms can contribute to flawed decisions made by software applications:
• If any evidence is found, investigate how this aspect relates to the previous one (data quality and bias).
• Design and prototype remediation techniques for the problem.

Results:

First, an exploratory study ("Identifying Risks in Datasets for Automated Decision-Making") was conducted to investigate measurable characteristics of datasets that can lead to discriminating automated decisions. This initial study was accepted for publication at the EGOV-CeDEM-ePart 2020 conference and was selected as Best Paper in the category "The most innovative research contribution or case study" (which "Awards the paper with the most out-of-the-box and forward-looking idea and concept. Relevance is more important than rigor").
A more detailed study was then submitted to Government Information Quarterly (An International Journal of Information Technology Management, Policies, and Practices). This follow-up study extended the set of imbalance measures in order to examine in more depth their capability to detect imbalance among the classes of a given attribute in a dataset. It also considered a much larger number of datasets from various application domains (from criminal justice to financial services, as well as social topics such as personal earnings and education), in order to assess whether the existing imbalance measures can reveal a discrimination risk when an automated decision-making (ADM) system is trained on such data. The final goal is to ensure a more conscious and responsible use of ADM systems.
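The studies above revolve around measuring how unevenly the records of a dataset are distributed among the classes of an attribute. This page does not list the specific indices used, so the following Python sketch assumes three common balance measures: a normalized Gini heterogeneity index, a normalized Shannon entropy, and the ratio between the largest and the smallest class. The function names are hypothetical and the code is illustrative only, not the project's implementation. For the two normalized indices, 1 means perfectly balanced classes and values near 0 mean that a single class dominates.

```python
from collections import Counter
from math import log

# Illustrative sketch: the choice of indices and all names are assumptions,
# not the implementation used in the studies listed on this page.


def class_frequencies(values):
    """Relative frequency of each observed class of one attribute."""
    counts = Counter(values)
    total = sum(counts.values())
    return [c / total for c in counts.values()]


def gini_balance(values):
    """Normalized Gini heterogeneity index: 1 = balanced, 0 = one class only."""
    freqs = class_frequencies(values)
    m = len(freqs)  # number of classes actually present in the data
    if m < 2:
        return 0.0
    return (1 - sum(f * f for f in freqs)) * m / (m - 1)


def shannon_balance(values):
    """Normalized Shannon entropy: 1 = balanced, 0 = one class only."""
    freqs = class_frequencies(values)
    m = len(freqs)
    if m < 2:
        return 0.0
    return -sum(f * log(f) for f in freqs) / log(m)


def imbalance_ratio(values):
    """Size of the largest class divided by the size of the smallest: 1 = balanced."""
    counts = Counter(values).values()
    return max(counts) / min(counts)


if __name__ == "__main__":
    # Hypothetical protected attribute with an 80/20 split between two classes.
    sex = ["M"] * 80 + ["F"] * 20
    print(round(gini_balance(sex), 2))     # 0.64
    print(round(shannon_balance(sex), 2))  # 0.72
    print(imbalance_ratio(sex))            # 4.0
```

Classes that never occur in the data are not counted, so a dataset containing a single class scores 0 on both normalized indices. In an approach like the one described above, a low balance value would be treated as a risk signal to check against the fairness of the trained system's output; where to place the threshold is a separate question, addressed by one of the publications listed below.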

Related Publications:

Mecati M., Torchiano M., Vetrò A., De Martin J.C.

Measuring Imbalance on Intersectional Protected Attributes and on Target Variable to Forecast Unfair Classifications

3 March 2023

IEEE Access, vol. 11 (2023), pp. 26996-27011

Mecati M., Adrignola A., Vetrò A., Torchiano M.

Identifying Imbalance Thresholds in Input Data to Achieve Desired Levels of Algorithmic Fairness

17-20 December 2022

Second International Workshop on Data Science for equality, inclusion and well-being challenges (DS4EIW 2022), Osaka, Japan, 17-20 December 2022, pp. 4700-4709

Mecati M., Vetrò A., Torchiano M.

Detecting Risk of Biased Output with Balance Measures

April 2022

Journal of Data and Information Quality, April 2022

Mecati M., Vetrò A., Torchiano M.

Detecting Discrimination Risk in Automated Decision-Making Systems with Balance Measures on Input Data

15-18 December 2021

First International Workshop on Data Science for equality, inclusion and well-being challenges (DS4EIW 2021)

Vetrò A.

Imbalanced data as risk factor of discriminating automated decisions: a measurement-based approach

2 December 2021

JIPITEC, 12(4), 2021

Simonetta A., Vetrò A., Paoletti C.M., Torchiano M.

Integrating SQuaRE data quality model with ISO 31000 risk management to measure and mitigate software bias

8 December 2021

3rd International Workshop on Experience with SQuaRE Series and Its Future Direction (IWESQ 2021), Taipei, Taiwan, 8 December 2021, pp. 17-22

Vetrò A., Torchiano M., Mecati M.

A data quality approach to the identification of discrimination risk in automated decision making systems

4 September 2021

Government Information Quarterly, Elsevier, vol. 38, issue 4, 2021, pp. 17, ISSN 0740-624X, DOI 10.1016/j.giq.2021.101619

Simonetta A., Trenta A., Paoletti M.C., Vetrò A.

Metrics for Identifying Bias in Datasets

9 July 2021

ICYRIME 2021, International Conference of Yearly Reports on Informatics, Mathematics, and Engineering, Online, 9 July 2021

Mecati M., Cannavò F.E., Vetrò A., Torchiano M.

Identifying Risks in Datasets for Automated Decision-Making

September 2020

EGOV2020 – IFIP EGOV-CeDEM-ePart 2020, Linköping University (Sweden), 31 August - 2 September 2020, pp. 332-344 (Best Paper Award)
