Responsible AI: from principles to practice

Executive summary

The objective of this project is to develop practical techniques and industrial practices for accountable Artificial Intelligence (AI) decision-making.

Background

Artificial Intelligence (AI) applications automate decisions that affect relevant aspects of human lives. Such automated decisions often unfairly discriminate against groups of individuals, on grounds that are unreasonable or inappropriate. The goal of this research is to translate the principles and guidelines of responsible, human-centric AI into techniques and actionable industrial practices.

Objectives

The PhD proposal aims to research, implement, and test techniques that help detect bias and data quality issues in training data, and to experiment with a variety of mitigation approaches. The objectives of the PhD plan are the following:

(i) Identify data quality and bias measures and test them on available datasets, their mutations, and synthetic datasets; experiment with the propagation of bias and quality problems to the output of classification/prediction tasks; identify and test mitigation techniques.

(ii) Identify guidelines and measures for the quality of dataset documentation; set up and perform measurements on available datasets; analyze results and possible consequences as “data cascades”.

(iii) Design and prototype informative, ethically sensitive data labels that can inform stakeholders (data maintainers, model builders, end users, etc.) about the risk of downstream effects from early data problems in AI pipelines. The data labels will be designed and tested with the aim of facilitating early intervention and mitigation of data cascades, including both human intervention (through interactive visualizations) and seamless implementation in the AI pipeline.
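As a concrete illustration of objective (i), the sketch below computes one common bias measure, the statistical parity difference, on a toy training set and again on a simple model's predictions, showing how a disparity in the data can propagate to the output of a classification task. The data, the column semantics, and the "model" are all illustrative assumptions, not taken from the project.

```python
# Minimal sketch (assumed setup): a binary protected attribute `group`
# and a binary outcome. Statistical parity difference is
# P(outcome=1 | group=1) - P(outcome=1 | group=0); values far from 0
# indicate that one group receives the positive outcome more often.

def statistical_parity_difference(groups, outcomes):
    """P(outcome=1 | group=1) - P(outcome=1 | group=0)."""
    pos = [o for g, o in zip(groups, outcomes) if g == 1]
    neg = [o for g, o in zip(groups, outcomes) if g == 0]
    return sum(pos) / len(pos) - sum(neg) / len(neg)

# Toy training data: group 1 receives the positive label less often.
groups = [1, 1, 1, 1, 0, 0, 0, 0]
labels = [1, 0, 0, 0, 1, 1, 1, 0]
print(statistical_parity_difference(groups, labels))  # -0.5

# A per-group majority-vote "model" not only reproduces the disparity
# in its predictions but amplifies it: bias propagates to the output.
preds = [0 if g == 1 else 1 for g in groups]
print(statistical_parity_difference(groups, preds))   # -1.0
```

Mitigation techniques (e.g. reweighting or resampling the training data) can then be evaluated by re-running the same measure on the retrained model's predictions.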

Results

An empirical investigation was conducted into the state of dataset documentation practices, measuring the completeness of the documentation of several popular datasets in the ML/AI community. We analyzed the documentation presented in the very place where the data can be accessed, to capture the peculiarities of the communities around some popular dataset repositories. A set of information that should always be clear to dataset users, to achieve transparency and accountability, was adapted into a Documentation Test Sheet that measures the completeness of the documentation. Information on how to use the dataset turned out to be the most frequently present; on the contrary, maintenance over time and the processes behind data generation were very poorly documented. Overall, a lack of relevant information was observed, highlighting a lack of transparency. The analysis also shows the potential of repositories to help dataset curators produce better documentation, especially if they provide a more comprehensive documentation schema.
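The Documentation Test Sheet approach can be sketched as a checklist score: each dimension is a yes/no check, and completeness is the fraction of dimensions covered. The dimension names and the example answers below are illustrative assumptions, not the actual sheet used in the study.

```python
# Hypothetical sheet dimensions (illustrative, not the study's list).
SHEET = [
    "uses",           # how the dataset can be used
    "composition",    # what the data contains
    "collection",     # how the data was gathered
    "preprocessing",  # cleaning / labelling steps
    "maintenance",    # who updates the dataset, and how
]

def completeness(answers):
    """Fraction of sheet dimensions documented (answers: dimension -> bool)."""
    return sum(answers.get(dim, False) for dim in SHEET) / len(SHEET)

# Example: a dataset page that documents uses and composition only,
# mirroring the observed pattern (usage well covered, maintenance and
# data-generation processes missing).
doc = {"uses": True, "composition": True}
print(completeness(doc))  # 0.4
```

Scoring many dataset pages this way yields comparable per-repository completeness figures, which is what makes cross-repository analysis possible.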

Related Publications

2024

Marco Rondina; Fabiana Vinci; Antonio Vetrò; Juan Carlos De Martin

Facial Analysis Systems and Down Syndrome (conference paper)

Machine Learning and Principles and Practice of Knowledge Discovery in Databases, Springer, 2024.


2023

Marco Rondina; Antonio Vetrò; Juan Carlos De Martin

Completeness of Datasets Documentation on ML/AI Repositories: An Empirical Investigation (conference paper)

Progress in Artificial Intelligence, vol. 14115, Springer, 2023, ISBN: 978-3-031-49008-8.
