Bias Risk Assessment — A systematic approach — Part 1/3

Sundar Narayanan
3 min read · Nov 28, 2022


Bias can be introduced by the data, model, outcomes, interface, environment, and human actions involved in an AI/ML system. It can exist in both supervised and unsupervised learning settings. While many discussions have focused on data-driven bias, bias needs to be understood from a holistic perspective.

Bias in computer systems can be categorized as pre-existing, technical, and emergent bias [Friedman, B. and Nissenbaum, H. (1996)]:

  • Pre-existing bias is contributed by individual or societal biases in the environment. These are also amplified by the selection and representation of the data gathered. Much of the discussion around bias centers on data bias or the underlying societal bias.
  • Technical bias can arise from tools, lack of context for the model, and inconsistencies in coding abstract human concepts into machine learning models.
  • Emergent bias arises from the feedback loop between human and computer systems.

Existing studies mainly discuss bias, unfairness, and discrimination in general but rarely examine the source of such bias in detail. This article delves into the various sources of bias in automated decision systems.

A single assessment procedure is insufficient, as bias can take different forms (e.g., biased content, bias against underprivileged groups). In addition, individual fairness and group fairness need to be examined in context: group fairness, for instance, may be unfair to underprivileged groups whose interests need to be prioritized. Further, bias in a model is also contributed by its development process, including the hyperparameter choices made in building it.

While statistical parity is often recommended, it does not address all of the bias elements referred to above. Further, equalized odds can be detrimental for protected categories that require positive bias to extend policy benefits. Bias can also arise if causality in the data and models is not appropriately validated. It is therefore necessary to understand the source of bias.
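To make these parity notions concrete, the following is a minimal sketch of statistical parity difference and equalized-odds gaps for binary predictions over two groups. The function names and the 0/1 group encoding are illustrative assumptions, not part of the article.

```python
import numpy as np

def statistical_parity_difference(y_pred, group):
    """Difference in positive-prediction rates between group 0 and group 1."""
    y_pred, group = np.asarray(y_pred), np.asarray(group)
    return y_pred[group == 0].mean() - y_pred[group == 1].mean()

def equalized_odds_gaps(y_true, y_pred, group):
    """Gaps in true-positive and false-positive rates between the two groups."""
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    gaps = {}
    for label, name in [(1, "tpr_gap"), (0, "fpr_gap")]:
        mask = y_true == label  # condition on the true outcome
        gaps[name] = (y_pred[mask & (group == 0)].mean()
                      - y_pred[mask & (group == 1)].mean())
    return gaps
```

A near-zero statistical parity difference can coexist with large equalized-odds gaps (and vice versa), which is one reason no single metric suffices.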

Understanding Bias Cube

Bias Cube attempts to provide a systematic explanation of sources of bias, types of bias, and how it gets exhibited to a user, thereby helping adopt appropriate bias mitigation efforts. Such efforts are essential in the US, given the requirements of bias mitigation in Insurance, Hiring, Fair Credit, Fair Housing, etc.

Bias Cube

Bias in Data: Bias in data can come from various touch points, including data gathering, data augmentation, data merging, data cleaning, data pre-processing, data encoding, and data split.

Bias in Model: Bias can arise from the model due to model choices, feature engineering, training, parametric choices, and testing & tuning.

Bias in Pipeline and infra: Bias can arise from the pipeline and infrastructure due to pipeline design, infrastructure robustness, infrastructure measures, and optimization choices.

Bias in Interface & integrations: Bias can arise from interface and integrations due to nudges, design, and integrations (with tools or other models).

Bias in Deployment: Bias can arise from statistical distribution differences between the training and deployment environments. It can also arise due to changes in the meaning of inferences and causalities.

Bias in Human-in/ on-the-loop: Human decisions associated with inferences, proxies, causalities, outcomes, and subsequent actions (specifically human-in-the-loop and human-on-the-loop decisions on model outcomes) can contribute to bias.
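One common way to detect the training/deployment distribution differences mentioned above is a drift metric such as the population stability index (PSI). The sketch below is a minimal illustration under assumed conventions (the function name and the usual 0.1/0.25 reading thresholds are not from the article).

```python
import numpy as np

def population_stability_index(expected, actual, bins=10):
    """PSI between a training (expected) and deployment (actual) sample
    of one numeric feature. Rough convention: < 0.1 stable, > 0.25 shifted."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    exp_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    act_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Floor tiny proportions to avoid log(0) in empty bins
    exp_pct = np.clip(exp_pct, 1e-6, None)
    act_pct = np.clip(act_pct, 1e-6, None)
    return float(np.sum((act_pct - exp_pct) * np.log(act_pct / exp_pct)))
```

Running such a check per feature (and per protected group) in deployment can surface the distribution shifts that this face of the cube is concerned with.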

A few metrics that examine whether outcomes are sufficiently representative are often treated as adequate for bias assessment. One such metric is the 4/5ths rule, which requires that the selection rate for a protected category be at least four-fifths of the rate for the group with the highest selection rate. Examining outcomes and metrics at the level of each protected-category variable improves the chances of spotting adverse impacts on a particular protected category. However, the 4/5ths metric does not cover all the sources of bias described above. Given these limitations, it is advisable to determine the appropriate bias mitigation strategy based on a bias risk assessment.
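As a concrete illustration, the 4/5ths check can be sketched as follows (the group names and selection rates are hypothetical):

```python
def adverse_impact_ratios(selection_rates):
    """Each group's selection rate divided by the highest group's rate.
    A ratio below 0.8 flags potential adverse impact (the 4/5ths rule)."""
    top = max(selection_rates.values())
    return {g: rate / top for g, rate in selection_rates.items()}

# Hypothetical selection rates per group
rates = {"group_a": 0.60, "group_b": 0.45}
ratios = adverse_impact_ratios(rates)
flagged = [g for g, r in ratios.items() if r < 0.8]
# group_b: 0.45 / 0.60 = 0.75, below the 0.8 threshold
```

Note that a group can pass this ratio test while still being affected by the model-, pipeline-, or deployment-level sources of bias described above, which is the article's point about the metric's limits.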
