Biased Data and How to Avoid It: An Interview

Some algorithms may promote racial, gender, or other inequalities by relying on narrow data sets and historical data that's imprinted with bias.

By David Ryan Polgar, Contributor

Algorithms are often viewed as neutral mathematical tools; their automation creates the illusion that they are free of human bias. In reality, algorithms may promote racial, gender, or other inequalities by relying on narrow data sets and historical data that's imprinted with bias.

Following recent cases of algorithmic bias, such as the hiring program one company unintentionally developed that prioritized male candidates' resumes over those containing the word "women," society is calling for greater fairness and transparency in how automated processes are deployed.

One person keen to help companies identify these potential danger zones is Dr. Hannah Fry, an associate professor in the Mathematics of Cities at the Centre for Advanced Spatial Analysis at University College London. Author of the recent book "Hello World: Being Human in the Age of Algorithms" (shortlisted for the 2018 Royal Society Insight Investment Science Book Prize and the 2018 Baillie Gifford Prize for Non-Fiction), Fry has spent her career studying behavioral patterns as they relate to humans and technology.

In this interview, Fry shared three main ways bias can creep into the algorithmic process, the danger of giving up on flawed algorithms, and the questions business leaders should be asking about automation.

What is your definition of algorithmic bias?

Dr. Hannah Fry: Bias is when you can segment the output of your algorithm and find a difference between the groups you've segmented. For example, it might be that you are able to segment by race or gender, or it could be geographical or age-related. In an ideal world, any way that you cut up your data, you want your algorithm to be treating every group identically. A minimal sketch of that kind of check follows.
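To make the idea concrete, here is a minimal sketch of the segmentation check Fry describes: take the algorithm's decisions, split them by a demographic attribute, and compare outcome rates across groups. The data, column names ("group", "approved"), and the notion of a "gap" are illustrative assumptions, not part of any specific auditing tool.

```python
import pandas as pd

# Illustrative data: one row per decision made by the algorithm,
# with a demographic attribute alongside the outcome.
# (Column names and values are assumptions for this sketch.)
results = pd.DataFrame({
    "group":    ["A", "A", "B", "B", "A", "B", "A", "B"],
    "approved": [1,   0,   1,   0,   1,   0,   1,   1],
})

# Segment the output by group and compare approval rates.
rates = results.groupby("group")["approved"].mean()
print(rates)

# A large gap between groups is a signal of possible bias worth investigating.
gap = rates.max() - rates.min()
print(f"Approval-rate gap between groups: {gap:.2f}")
```

In practice the same segmentation would be repeated for every attribute you can cut the data by, since an algorithm that looks fair along one dimension can still show gaps along another.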

How can we identify this type of bias?

HF: I think there are three main ways that bias creeps into your system. The first is when it’s by deliberate design—someone is intentionally trying to bias things towards or against a particular group. To be honest, that’s pretty rare.

The second way it happens is when it's the unintended consequence of a lack of diversity in the design process. It's accidental omission, where you just haven't thought about all the different potential groups that might be using your product and the way the algorithm is going to treat them. One of the most visually arresting examples of this is a soap dispenser that failed to recognize darker skin tones. That's an embarrassing example for the company that produced it.

The final instance is in some ways the most difficult to work around. It is the bias that creeps into our system because of history. Since the data we are using is a record of how the world is right now, sometimes if you take that historical data and project it forward, you can end up perpetuating biases that already exist. A great example of this is what has happened with the bail algorithms in the justice system.

Do you think companies are still viewing algorithms as neutral or objective?

HF: Companies are thinking about this stuff more carefully, but [bias] should be front-and-center at every possible stage. You should be thinking about what you might be accidentally encoding into the system. That’s really important.

At the same time, we have to be careful to temper our expectations of what algorithms are and how perfect they are going to be. Humans have a slightly strange relationship with technology: on one hand, we have this habit of wholeheartedly trusting it, like people blindly following their GPS straight into water, and on the other hand, as soon as an algorithm is shown to be flawed, people throw it away. We have to be really careful not to throw the baby out with the bathwater and get rid of it completely when the initial result is less than desirable.

Who is responsible for algorithmic bias?

HF: This should be everyone’s problem. Any time you have a company that is building or designing an algorithm, [everyone involved] should be aware of the ethical concerns around it, its potential pitfalls, and any unintended consequences.

Do you foresee companies incorporating greater ethical thinking around how they work with algorithms?

HF: It's definitely a trend we're seeing. We're at a stage right now where algorithms are so pervasive and powerful that progress and ethics have to go hand-in-hand. That is quite difficult when you are employing mathematicians, physicists, and computer science majors for whom ethics does not form a keystone of their studies. But it is very important.

How should companies treat third-party algorithms or AI-based programs that aid hiring and other business functions?

HF: People aren’t asking the right questions, and they aren’t asking enough questions. If someone comes in and says, “This algorithm can do this thing,” ask them, “How do you know? How do you know the algorithm can do this thing? How can you prove that it can do that thing? How can you be absolutely sure that it will be able to do what you say it will be able to do?”

When someone is coming in to tell you how brilliant an algorithm will be, it is very easy to get swept up in the positives. But it is really important to think about the negatives as well. Brainstorm all of the possible worst-case scenarios and make sure that you're really mitigating them.

There are lots of different ways that the algorithm could be successful. But [leaders] need to sit down and think very carefully about what success looks like and what they actually want.