AI: The Ripple Effect of Bias in Data

Bias in AI

An AI model reflects the bias present in the data sets on which it was trained, and historical data sets reflect the bias present in the processes that produced them. How do we build an AI model that reflects our current values and does not condemn us to repeat the decisions of the past?

There is strong sentiment that we should eliminate bias from AI models used to adjudicate the fate of individuals. Studies show that AI models trained on incarceration records exhibit a strong bias that penalizes people of color. The history of imprisonment reflects a strong societal and law-enforcement bias toward incarcerating people of color, so models built on that historical record propagate that injustice. Current thinking seeks to eliminate that bias, but can we?
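One concrete way this kind of bias shows up is in error rates that differ by group, such as the false-positive rate: the share of people who did not reoffend but were flagged as high risk anyway. Here is a minimal sketch of that comparison; the arrays `group`, `y_true`, and `y_pred` are synthetic placeholders, not drawn from any real study:

```python
# Compare false-positive rates across groups: how often people who did
# NOT reoffend were nonetheless flagged as high risk.
# `group`, `y_true`, and `y_pred` are synthetic placeholders.
import numpy as np

def false_positive_rate(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Fraction of true negatives that the model flagged as positive."""
    negatives = y_true == 0
    return float((y_pred[negatives] == 1).mean())

rng = np.random.default_rng(1)
group = rng.integers(0, 2, size=2000)   # 0/1 demographic group
y_true = rng.integers(0, 2, size=2000)  # 1 = actually reoffended
# A biased model: flags group 1 as high risk far more often,
# independent of the true outcome.
y_pred = (rng.random(2000) < np.where(group == 1, 0.6, 0.3)).astype(int)

for g in (0, 1):
    m = group == g
    print(f"group {g}: false-positive rate = "
          f"{false_positive_rate(y_true[m], y_pred[m]):.2f}")
```

Even when overall accuracy looks acceptable, a gap between the two printed rates is exactly the penalty described above.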

Sure, “they” say, eliminate the field designating race from your data set and then your model is color blind. But is it? Even if elements such as address, education history, family structure, and employment status are removed, race-related information can still remain, because bias permeates many features that at first glance seem objective. Removing bias from a data set is a difficult task. (I would argue an impossible one.)
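One way to make this concrete is a leakage test: if a classifier can recover the removed race field from the features that remain, the information was never really gone. A minimal sketch with scikit-learn, using synthetic data and hypothetical feature names:

```python
# Leakage test: try to predict the "removed" race field from the
# remaining, seemingly neutral features. High accuracy means the
# information is still in the data set.
# All data and feature names here are synthetic and illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 5000

race = rng.integers(0, 2, size=n)  # protected attribute, to be "removed"

# "Objective" features that correlate with race through historical
# processes: residential segregation, unequal schooling, hiring gaps.
address = 2.0 * race + rng.normal(0, 1, size=n)
education = -1.5 * race + rng.normal(0, 1, size=n)
employment = -1.0 * race + rng.normal(0, 1, size=n)

X = np.column_stack([address, education, employment])  # race column dropped

acc = cross_val_score(LogisticRegression(), X, race, cv=5).mean()
print(f"race recovered from 'neutral' features: {acc:.0%} accuracy")
# Anything well above 50% means dropping the race column did not
# remove race from the data.
```

With correlations like these, the recovered accuracy lands far above chance, which is the sense in which the “color blind” model is not blind at all.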

So what can we do? The data we have is the data we can use. Does it make sense to add artificial data that is unbiased, or that is biased in the opposite direction? Can we do this in a way that doesn’t introduce a manufactured bias of its own? Mucking with the input data in a data-driven process is dangerous business.
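For what biasing “in the opposite direction” might look like in practice, one published technique is reweighing (Kamiran & Calders, 2012): rather than fabricating records, each training example gets a weight chosen so that the protected attribute and the outcome look statistically independent to a weight-aware learner. A minimal sketch, assuming a binary group and a binary label, with synthetic placeholder data; note this is precisely the kind of deliberate intervention in the input data warned about above:

```python
# Reweighing (Kamiran & Calders, 2012): weight each training example by
# P(group) * P(label) / P(group, label), so that group and label appear
# statistically independent to a weight-aware learner.
# `group` and `label` are synthetic placeholders.
import numpy as np

def reweighing_weights(group: np.ndarray, label: np.ndarray) -> np.ndarray:
    """Per-example weights that decouple `group` from `label`."""
    w = np.empty(len(label), dtype=float)
    for g in np.unique(group):
        for y in np.unique(label):
            mask = (group == g) & (label == y)
            expected = (group == g).mean() * (label == y).mean()
            w[mask] = expected / mask.mean()  # >1 for under-represented combos
    return w

# Toy data where group 1 is disproportionately assigned label 1.
rng = np.random.default_rng(0)
group = rng.integers(0, 2, size=1000)
label = (rng.random(1000) < np.where(group == 1, 0.7, 0.3)).astype(int)

weights = reweighing_weights(group, label)
for g in (0, 1):
    m = group == g
    # Weighted label rates are equalized across groups.
    print(f"group {g}: weighted positive rate = "
          f"{np.average(label[m], weights=weights[m]):.2f}")
```

The weights manufacture no new records, but choosing them is still a human judgment about what “unbiased” should mean, which is the danger in question.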

Perhaps the question we should really be asking is “is AI the appropriate method for this application?” AI learns tendencies and relationships in the data that are not understood by the model’s designers and that may not reflect the tendencies or relationships a human would use to make a similar judgment. This is extremely powerful in areas like computer vision, and extremely dangerous in applications that decide the fate of individuals (like loan approval or the length of a prison sentence). We do not have a data set that reflects the world we want to live in, so AI as a data-driven process for deciding our fate may not be ready for prime time.