What you’ll learn in this article:
- Mathematician and author Cathy O’Neil is developing an “ethical matrix” to serve as a framework for evaluating algorithmic systems and the data that feeds them.
- Processes for evaluating algorithmic bias could be replicated and regulated in the future, but we are not ready for that yet.
- Academic computer science and engineering currently reward speed and accuracy more than fairness or transparency.
Cathy O’Neil has no insurance company clients. No HR analytics clients using AI to filter job applicant resumes either. The mathematician, algorithm bias-busting crusader and author of Weapons of Math Destruction has, however, worked recently with clients including German industrial giant Siemens, a New York-based rental property grading firm called Rent Logic and groups in London and Amsterdam.
These clients, she said, have been willing to go the extra mile to have her algorithmic auditing consultancy put their systems through a customized evaluation process to reduce – or ideally remove – problems that lead to unjust decisions and inaccuracies.
In other words, this isn’t an automated bias removal tool.
“The clients we actually have are clients that actually want to know if their algorithms work or not,” she said during an interview with RedTail earlier this month.
For O’Neil, evaluating the quality of an algorithm depends not only on assessing the data used to train it, but also on its impact on everyone who might be affected by the decisions it makes. Until now, data scientists have rated their algorithmic creations based on factors such as profitability and efficiency. Typically, they have considered only the impact on corporate stakeholders or company goals, not a technology’s impact on the individuals it may eventually affect (people who have applied for a loan or a job, for example).
O’Neil and her firm ORCAA (O’Neil Risk Consulting and Algorithmic Auditing) aim to flip that script. The company employs an “ethical matrix” framework for assessing the quality of an algorithm and its potential impact on all stakeholders. The matrix, which she and her colleagues are still developing, draws on principles inspired by biomedical ethics, including beneficence (i.e., doing good) and justice, and evaluates the effects of algorithms on a more inclusive set of stakeholders.
When asked to define “algorithm,” O’Neil clarifies by describing a “predictive algorithm” as “an automated process trained to look for historical patterns of success and to suggest that whatever led to success in the past will lead once again to success in the future.” She adds, “This requires only historical data and a definition of success, and might be used to suggest that a certain type of DNA will increase risk of breast cancer or that white men will be promoted in the context of a company.”
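To make that definition concrete, here is a minimal sketch of a “predictive algorithm” in O’Neil’s sense: historical data plus a definition of success, and nothing else. The promotion records, group labels and function names are invented for illustration and are not drawn from ORCAA’s work.

```python
from collections import defaultdict

def train(history):
    """Learn the historical success rate for each group.

    `history` is a list of (group, succeeded) pairs, where `succeeded`
    encodes the chosen definition of success (here: "was promoted").
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [successes, total]
    for group, succeeded in history:
        counts[group][0] += int(succeeded)
        counts[group][1] += 1
    return {g: s / n for g, (s, n) in counts.items()}

def predict(model, group):
    """Score a new person with their group's past success rate."""
    return model.get(group, 0.0)

# Made-up promotion records: group A was promoted 3 times out of 4,
# group B once out of 4.
history = [("A", True), ("A", True), ("A", True), ("A", False),
           ("B", True), ("B", False), ("B", False), ("B", False)]

model = train(history)
print(predict(model, "A"))  # 0.75
print(predict(model, "B"))  # 0.25
```

The model knows nothing about why group A was promoted more often. It simply projects the past pattern forward, which is the behavior O’Neil describes.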
RedTail chatted with O’Neil about the ethical matrix, her opinion of some of the automated algorithmic “fairness tools” out there, and about what she thinks the future of practical and repeatable algorithm assessments for insurance or other industries might look like. Here are select portions of that conversation.
Automatic Bias Removal vs. the Ethical Matrix
RedTail: IBM recently launched a tool that it says will “automatically mitigate bias” in AI applications. Consulting firm Accenture has had a product out for a few months that, as the company has told me, evaluates data used to train an AI system and then “de-biases” it. I like to call this “productizing the AI ethics gap.” How does ORCAA fit into the world of AI fairness tools?
Cathy O’Neil: The fairness tools – and I don’t claim to have actually worked with either of the ones you mentioned – are just, like, very simplistic, not very useful, not very flexible. I’m not impressed. And the reason I’m not impressed is because this stuff is actually very nuanced, very contextual and very tricky, and it can be simple. But it mostly isn’t simple.
RT: Tell me a little more about the ethical matrix you’re developing.
CO: Our contention is that technologists trivialize or simplify to a ridiculous extent the question of whether an algorithm works, because they define the quote-unquote working as sort of like being profitable or being efficient or being accurate – something they like. And what this ethical matrix does is it expands that notion very directly by sort of forcing you to list all the stakeholders in a given context for a given algorithm and all of their potential concerns in that context.
So, just to be clear, algorithms don’t have ethics but algorithms used in a context by humans do have ethics. The idea of the ethical matrix is to expand it even past just the people who built the algorithm and the target of the algorithm. Like, who else has a stake in this system?
My job as ORCAA is simply to translate priorities between the different languages, the language of business and the language of data. And I think more broadly the ethical matrix can be used to do that. It surfaces the questions of ethics but also surfaces the question of, like, what are we prioritizing? You can’t minimize false positives, maximize accuracy and minimize false negatives all at once. There always are trade-offs, so this is a way of surfacing those trade-offs in ways that it doesn’t take a math PhD to do.
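The trade-off O’Neil mentions can be illustrated with a small threshold sweep. This is a sketch on made-up risk scores and outcomes, not ORCAA’s method; the function name and the data are assumptions.

```python
def rates(scores, labels, threshold):
    """Return (false-positive rate, false-negative rate, accuracy)
    when every score at or above `threshold` is flagged."""
    tp = fp = tn = fn = 0
    for score, actual in zip(scores, labels):
        flagged = score >= threshold
        if flagged and actual:
            tp += 1
        elif flagged and not actual:
            fp += 1
        elif not flagged and actual:
            fn += 1
        else:
            tn += 1
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    accuracy = (tp + tn) / len(labels)
    return fpr, fnr, accuracy

# Made-up risk scores and true outcomes (1 = the event really happened).
scores = [0.1, 0.3, 0.4, 0.6, 0.7, 0.9]
labels = [0, 0, 1, 0, 1, 1]

for threshold in (0.2, 0.5, 0.8):
    fpr, fnr, acc = rates(scores, labels, threshold)
    print(f"threshold={threshold}: FPR={fpr:.2f} FNR={fnr:.2f} accuracy={acc:.2f}")
```

On this toy data, accuracy is identical at all three thresholds (four of six cases correct), while the errors shift from false positives at the low threshold to false negatives at the high one. Which mix is acceptable is a question of priorities, not of math.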
RT: What do you think about automating the “de-biasing” process, which is what other products seem to stress?
CO: I have not come across anything that is that simple, and the reason is because biases are almost always unknown unknowns, really, when it comes down to it. Let me be more precise. So, there’s an example of a case study [about algorithms used in child abuse risk scoring in Pennsylvania, as described in Virginia Eubanks’ book Automating Inequality]. Basically, it comes down to the fact that poor people and Black people are both more likely to have lots of data in the database they use to populate their risk scores for a child being abused, and more likely to be called out on child abuse by the reporters because they’re poor – and also more likely to have their child removed (because sometimes it’s not really abuse, it’s just that the heat goes out and they have to remove the child).
…. So, for all these reasons, you have an enormous amount of bias. You can’t de-bias that with any tool. What you can do instead is just say what kind of false-positive rate you’re willing to have and understand how that will affect certain populations more than others. But you won’t know exactly how much more or how much that extra scrutiny is fair or not fair because it is simply missing data. Another way of saying this is, rich white people abuse their kids and get away with it without anybody ever building data around it.
RT: So, because there are only certain sets of data that exist which are used to train the system, by its nature, the system arguably is biased because it only encompasses data representative of certain populations.
CO: Exactly. It over-represents certain populations. But the thing that is completely clear to me…is that any automated tool will just be as good as the data that is represented. At some point, they’re either going to assume the data perfectly represents the situation, which is totally wrong, or they’re going to assume it doesn’t in a specific way, which will be arbitrary. So, there’s just no way to automate this. There are ways to say, “We would like the false-positive rate for Black families to be similar to the false-positive rate for white families,” the false-positive rate being very, very hard to measure.
…. And you always come away with the question, “Like, why did we think an algorithm is going to solve this hard problem that humans have trouble with?”
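The comparison O’Neil describes, checking whether false-positive rates are similar across groups, might be sketched as follows. The rows, group names and function names are invented. The sketch also assumes that the true outcome for every case is known, which is precisely the part she says is very hard to measure in practice.

```python
def false_positive_rate(records):
    """FPR = false positives / all actual negatives.

    `records` is a list of (flagged, actual) pairs. Returns None when
    the group has no actual negatives to measure against.
    """
    fp = sum(1 for flagged, actual in records if flagged and not actual)
    tn = sum(1 for flagged, actual in records if not flagged and not actual)
    return fp / (fp + tn) if (fp + tn) else None

def fpr_by_group(rows):
    """Group (group, flagged, actual) rows and compute each group's FPR."""
    groups = {}
    for group, flagged, actual in rows:
        groups.setdefault(group, []).append((flagged, actual))
    return {g: false_positive_rate(recs) for g, recs in groups.items()}

# Made-up audit rows: (group, was flagged by the model, true outcome).
# ASSUMPTION: the true outcome is known for every row.
rows = [
    ("X", True, False), ("X", True, False), ("X", False, False), ("X", True, True),
    ("Y", True, False), ("Y", False, False), ("Y", False, False), ("Y", False, True),
]

result = fpr_by_group(rows)
for group, fpr in result.items():
    print(group, round(fpr, 2))
```

Here group X is wrongly flagged in two of its three true-negative cases and group Y in one of three. A gap like that is what an audit would surface, but the number is only as trustworthy as the outcome labels behind it.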
Making Ethical AI Assessment a Repeatable Process
RT: If I were to put this in a nugget of a sentence, the distinction is that you are doing a very custom approach, whereas some of the other tools out there appear to take a more automated approach. So, maybe you can walk me through a process with a client example.
CO: It is custom, and the first, second and third questions I might ask are, “What are you worried might be happening? List the stakeholders and tell me what’s a worst-case scenario.” That’s already custom because it’s very contextual. And then I invent a test that measures whether what they’re worried about is happening.
…. If a client of mine was, let’s say, a life insurance company – just imagine that – and I built a tool to see whether their life insurance policy deciding who gets life insurance under what conditions and what the price is – deciding whether or not that’s racist – that would be the kind of thing I want to do. Let’s say I develop that tool for them and…that could possibly be re-purposed quickly if not repeated for a different life insurance policy to see whether it’s racist.
So, I’m not suggesting that there’s no way to make this, once it’s tailored, no way to repeat this kind of method. And my vision is that in ten years or twenty years, the regulator in charge of insurance that makes sure that it’s legal, that it’s compliant with anti-discrimination law, will have these tools and will not only use these tools, but require the insurance companies to prove that they have used these tools in monitoring their policies….
But I don’t think we’re there yet. It’s going to take us five or ten years to figure out what exactly conditions look like and how to measure that. We’ll need guidance from the regulators which are absolutely not on the ball right now. Regulation on all these industries – because all these industries are using algorithms in all sorts of ways – they will be basically arguments about what this monitoring code looks like.
On Insurance Firms and Plausible Deniability
RT: You told me you don’t have insurance companies as clients right now. Why?
CO: It’s not because none of them have approached me, because I have been approached multiple times by people in those areas and fields. It’s because the first call is with a concerned senior analytics person who read my book and he’s pretty sure something might be going wrong with their algorithm. The second call, their senior legal counsel joins the call and asks me, along the lines of, “What if you find a problem that we can’t solve?” And I say, “Well, you know, I think we can solve a lot of these problems, but I can’t guarantee that we can solve all these problems.”
And then I never get another call back. I call it the era of plausible deniability.
RT: So what do you hope changes about that?
CO: The clients we actually have are clients that actually want to know if their algorithms work or not; it’s not a plausible deniability issue. They want to know because if it’s not working either their investors are going to lose faith, or their customers are going to lose faith, or the public which they depend on for trust is going to lose faith, or they themselves are going to lose money.
RT: Are you worried that some clients or potential clients just want a “no bias” stamp-of-approval?
CO: Oh yeah, that is what they want. I mean, not my clients. My clients are the best people in the world, but the world in general, for sure. Look, I worked at a hedge fund so I’m a bottomless pit of cynicism in certain ways.
On Bullshit AI Explanations
RT: There are lots of demands for algorithmic transparency and explainable AI, exposing how an AI-based technology got to a decision. Do you think it’s feasible from a technical standpoint?
CO: It’s really easy to give a bullshit explanation…. If we do end up with a meaningful definition of a meaningful explanation and we actually demand it, that’s something I think the data nerds could actually do. In other words, right now they’re focused on one metric, which is accuracy. If you look at the journals in computer science – please take a look – because it spells out what the currency of computer science publishing looks like; it looks like, “Can you do this faster? Can you do this more accurately?” And those are the only two questions that get published. As soon as the priorities change and say, like, “Can you explain this better?” – that’s something that you can get tenure for because you solved that problem, then they’re going to solve that problem.
I have talked to computer scientists who are like, “Don’t tell anybody I talked to you and am interested in fairness or transparency because I won’t get tenure because it’s not considered a serious subject in my department.”
These guys are really infinitely smart, but they just don’t care about that…. I don’t actually anticipate somebody getting tenure for this. The academia silo, like the “What do we care about?” question, is so slow-moving. So I think it’ll take a little longer than just computer scientists saying, “Hey, that’s an interesting question. I want to answer that.”