Indirect Bias Exploration - Visualization 2

Introduction

The purpose of this visualization is to explore the potential biases learned by NLP transformer models, which are machine learning models designed to process human language.

You can choose a target category (e.g. sport) and see the correlations the chosen model makes between its elements and those of different feature categories (e.g. beverage or trait). You can also investigate how the target and feature elements correlate with sensitive attributes (such as gender or religion), to check whether target and feature elements could be linked indirectly through these sensitive attributes. Correlation scores are available for several models.

The visualization consists of a scatterplot in which each dot represents an element of the chosen target category. The position of the dots is determined by the feature category: a high-dimensional vector is built for each target element from its correlation scores with the feature elements, and these vectors are mapped to a 2-dimensional space using a dimensionality reduction technique. Several dimensionality reduction techniques are available. You can zoom in on the scatterplot to examine it in more detail. A color scale can be applied to explore whether there is a correlation between the target elements and a selected feature or sensitive element, and how strong that correlation is. The correlations are generated using names as a bridge: names are linked to both the target and the feature elements. You can explore these links by clicking on the dots of the scatterplot.
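
To make the positioning step more concrete, the sketch below shows one way correlation-score vectors could be mapped to 2-dimensional scatterplot coordinates. It is only an illustration under stated assumptions: the text above does not say which dimensionality reduction techniques the tool offers, so PCA (computed here with plain power iteration) is used as a stand-in, and the score values, the element names, and the helper names (center, covariance, top_eigenpair, pca_2d) are all hypothetical.

```python
import math
import random


def center(rows):
    """Subtract the column means so every dimension has mean zero."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in rows]


def covariance(centered):
    """Sample covariance matrix (d x d) of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [
        [sum(row[i] * row[j] for row in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def matvec(matrix, vector):
    """Multiply a square matrix by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def top_eigenpair(matrix, iterations=500, seed=0):
    """Power iteration: dominant eigenvalue and unit eigenvector."""
    rng = random.Random(seed)
    vector = [rng.random() + 0.1 for _ in matrix]
    norm = math.sqrt(sum(x * x for x in vector))
    vector = [x / norm for x in vector]
    for _ in range(iterations):
        product = matvec(matrix, vector)
        norm = math.sqrt(sum(x * x for x in product))
        if norm == 0.0:
            break
        vector = [x / norm for x in product]
    eigenvalue = sum(v * p for v, p in zip(vector, matvec(matrix, vector)))
    return eigenvalue, vector


def pca_2d(rows):
    """Project each row onto the two leading principal components."""
    centered = center(rows)
    cov = covariance(centered)
    d = len(cov)
    components = []
    for _ in range(2):
        eigenvalue, vector = top_eigenpair(cov)
        components.append(vector)
        # Deflation: remove the component just found before the next search.
        cov = [
            [cov[i][j] - eigenvalue * vector[i] * vector[j] for j in range(d)]
            for i in range(d)
        ]
    return [
        [sum(x * c for x, c in zip(row, comp)) for comp in components]
        for row in centered
    ]


# Hypothetical scores: one row per target element (a sport), one column
# per feature element (e.g. three beverages). Values are made up.
scores = {
    "tennis": [0.8, 0.1, 0.3],
    "golf": [0.7, 0.2, 0.4],
    "boxing": [0.1, 0.9, 0.5],
    "rugby": [0.2, 0.8, 0.6],
}

# Each target element becomes one dot with (x, y) coordinates.
points = dict(zip(scores, pca_2d(list(scores.values()))))
for name, (x, y) in points.items():
    print(f"{name}: x={x:.3f}, y={y:.3f}")
```

In this toy data, target elements with similar score profiles (tennis and golf, boxing and rugby) end up close together, which is the kind of grouping the scatterplot is meant to reveal. Other techniques would generally produce different layouts from the same vectors.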