Guest post: At the mercy of black box algorithms trained by skewed data
Originally a comment by latsot on The AI did not like women.
The interesting part (well, at least to people like me) is this:
It penalized resumes that included the word “women’s,” as in “women’s chess club captain.” And it downgraded graduates of two all-women’s colleges, according to people familiar with the matter.
Nobody told it to take notice of the word “women’s”; it worked that out all by itself. This is one of the deeper problems of machine learning: the software can generate unexpected concepts and make decisions based on them, and there’s often no way to know this is happening. Sometimes these concepts can perform well in a task, but then start to do badly when the input data gradually changes. Sometimes they can bias future learning even more than it is already biased.
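To make that concrete, here is a minimal sketch (not Amazon’s system; it assumes scikit-learn and uses made-up toy data) of how a plain bag-of-words classifier trained on skewed historical hiring decisions ends up penalising a gendered token without ever being told about gender. The coefficient dump at the end is one crude way to surface this, and it only works because the toy model is a simple linear one; a deep network offers no such easy readout.

```python
# A minimal sketch, not Amazon's system: toy, made-up resume snippets with
# skewed historical hire/reject labels, fed to a plain bag-of-words classifier.
# Nothing here names gender as a feature, but because "women" co-occurs with
# past rejections the model learns a negative weight for that token by itself.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

resumes = [
    "captain of the women's chess club, python developer",
    "graduate of a women's college, data analyst",
    "chess club captain, python developer",
    "college graduate, data analyst",
]
hired = [0, 0, 1, 1]  # skewed historical outcomes used as training labels

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(resumes)
model = LogisticRegression().fit(X, hired)

# Dump the learned weights, most heavily penalised tokens first. With the
# default tokenizer "women's" shows up as the token "women".
for token, weight in sorted(
    zip(vectorizer.get_feature_names_out(), model.coef_[0]),
    key=lambda pair: pair[1],
):
    print(f"{token:>10}  {weight:+.3f}")
```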
There are lots and lots (and lots) of problems with algorithms running everything. Having no way to tell why particular decisions have been made is one of them. Trying to fix a bad process by training with the data that it produced is another (in the other room I mentioned the AI system lots of police forces use to predict crime. SPOILER: it picks black neighborhoods).
But by far the biggest problem is the widespread assumption that if the programmers try hard enough, the algorithm can do everything. Which, to be fair, is an assumption everyone I have ever worked for has shared, and it is not only an AI issue. The UK’s porn filters and the proposed EU copyright filters are examples of systems that cannot possibly work. YouTube’s copyright filter has proven this over and over again, but nobody seems to take any notice.
I’ve drifted off-topic but my point is that this story is entirely unsurprising to anyone who works in the field (and, I assume, many who don’t). It’s going to happen more and more. We’re increasingly at the mercy of black box algorithms trained by skewed data with – for all anyone knows – capricious or malevolent intent. It’s as dystopian as hell.
But in a sense we have always been at the mercy of black box algorithms. The decision algorithms in our heads are black box algorithms too. Look at the lengths we have to go to in order to make those black box algorithms produce reliable results.
I guess we will have to check these AI black box algorithms the same way we check society for biases: throw in a lot of duplicated applications, differing only in a number of gendered words, and see how the recruitment tools handle them.
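For what it’s worth, that audit could be as simple as the following sketch, all of it hypothetical: score_resume() stands in for whatever scoring call the recruitment tool actually exposes, and the substitution list is only illustrative.

```python
# A rough sketch of the paired-application audit described above. Everything
# here is hypothetical: score_resume() stands in for whatever scoring call the
# recruitment tool actually exposes, and the word list is illustrative only.
import re
from statistics import mean

GENDERED_SWAPS = {
    r"\bwomen's\b": "students'",
    r"\bwomen\b": "people",
    r"\bshe\b": "they",
    r"\bher\b": "their",
}

def neutralise(resume: str) -> str:
    """Return a copy of the resume with gendered words swapped out."""
    for pattern, replacement in GENDERED_SWAPS.items():
        resume = re.sub(pattern, replacement, resume, flags=re.IGNORECASE)
    return resume

def audit(resumes, score_resume):
    """Average score difference between neutralised and original copies.

    A consistently positive result means the tool scores the neutralised
    copies higher, i.e. it is penalising the gendered originals.
    """
    return mean(score_resume(neutralise(r)) - score_resume(r) for r in resumes)
```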
AI is a setup for bias. The users of a neural network decide which correlations they want to infer as outputs from inputs. You give it the outputs (conclusions) you want, and it may or may not succeed in confirming those conclusions with some degree of reliability. Racial biases in sentencing (COMPAS) and in predicting criminal activity (PredPol) are mentioned in the post.
AI is typically not “artificial intelligence”, at least not yet.
Notwithstanding the impressive scope of ‘AI’ or “machine learning” applications, it’s Artificial Inductive coRrelations: AIR.
An authoritative and accessible introduction is “Machine learning: Is the emperor wearing clothes?” (https://hackernoon.com/machine-learning-is-the-emperor-wearing-clothes-59933d12a3cc) by Cassie Kozyrkov, Chief Decision Intelligence Engineer at Google. Or Google Judea Pearl’s article in The Atlantic.
Nonetheless, it can be very helpful for leveraging ‘big data’.
I was reading an article not long ago about a machine learner that was put on the Internet to learn; within less than an hour it was a flaming misogynist. For some reason this surprised the programmers, who apparently believed you could put a learning machine on Facebook or other social media and it would pick up only the best side of humanity, and learn how to be human.
Learning how to appear human these days seems to require learning misogyny. It’s everywhere, from top to bottom, and on the Internet it appears to be the default mode (maybe it always was, and we just didn’t notice until it became so blatant).
@axxyaan:
The difference is that we can question people about their reasoning. Such reasoning might well be generated after the fact to justify the conclusion, but we can test whether the reasoning is sound and whether the claims are supported by evidence. We can also hold people accountable for bad decisions and for the consequences of some decisions.
People are, of course, vastly more flexible than domain-specific algorithms. They are better at spotting false positives and false negatives, and their training is infinitely superior. We don’t use machine learning because it is superior; we use it because it can be applied at internet scale. YouTube couldn’t possibly use humans to run a copyright filter, so if we’re going to have enormous content hosting sites and we’re also worried about copyright violations, then we have no alternative but to use technology, and there’s no way at all that we can avoid enormous numbers of false positives and negatives. Also, the filters will be laughably easy to fool.
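To put rough numbers on the scale problem (and these are illustrative guesses, not YouTube’s real figures), even a filter with error rates that sound respectable produces a flood of wrong decisions every single day:

```python
# Back-of-the-envelope only: every number here is an illustrative guess,
# not a real YouTube figure. The point is that small error rates applied
# at internet scale still mean huge absolute numbers of mistakes.
uploads_per_day = 12_000_000      # hypothetical daily upload count
infringing_share = 0.01           # assume 1% of uploads actually infringe
false_positive_rate = 0.02        # filter wrongly flags 2% of clean uploads
false_negative_rate = 0.05        # filter misses 5% of infringing uploads

infringing = uploads_per_day * infringing_share
clean = uploads_per_day - infringing

wrongly_blocked = clean * false_positive_rate   # legitimate videos taken down
missed = infringing * false_negative_rate       # infringing videos that slip through

print(f"Legitimate uploads wrongly blocked per day: {wrongly_blocked:,.0f}")  # ~237,600
print(f"Infringing uploads missed per day:          {missed:,.0f}")           # ~6,000
```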
Machine learning research has been around since the 40s, and the kinds of AI systems around today aren’t very much more sophisticated than the ones I worked with in the 80s/90s. But we haven’t really used them on a widespread basis until quite recently. There are two reasons for this: first, we live at internet scale now, and machine learning is the only way to cope with some applications. Second, we have enormously more computing power to throw at software. A lot of AI applications (e.g. speech recognition) can’t improve until they have functionally unlimited computing power and access to data.
This, of course, leads to a third reason: when anyone finds an actually good use for AI and it kinda-sorta works (I’m looking at you, Alexa), VC types will go hog wild trying to stick it in everything they possibly can. Customers will lap it up; they can’t be expected to know that it isn’t necessarily a good idea. The result is that we have a lot of systems around that nobody knows how to maintain or fix but that are running our lives.
Well, there’s nothing new there, but at least reverse engineering and refactoring are theoretically possible in old-skool legacy systems, even though nobody has ever actually done it.
We do indeed need to learn more about how to train AI. We also need to learn more about how to evaluate the results and fix problems that occur in the future. There’s no shortage of research on this but it is (clearly) being ignored by technology companies. Consequently, the research is rarely applied to internet-scale applications so it’s hard to evaluate whether it is even valid at that scale.
But we go ahead and let these people stick AI in every device we own and allow it to govern (whether we know it or not) just about every aspect of our lives.
@John Wasson:
Indeed, but for whose benefit? Not (hardly ever) for the benefit of the people generating the data: the users. It’s great for sorting people into categories and selling that data to other unscrupulous firms.
There’s quite a famous (in privacy circles) interview between two people I can’t remember, which I can no longer find (OK, maybe it’s not that famous). A privacy activist demonstrates on the fly that what big data is really, really good at is making mistakes about people and propagating them around the world. In his example, he looks at the locations where the interviewer has ‘checked in’ using whichever social media app it was. The places the interviewer checks in most are a deli and a doctor’s surgery! The obvious conclusion is that he has health problems related to the frequent consumption of fatty meat, something his insurance company would no doubt love to know.
In fact, the doctor is a paediatrician (the interviewer’s daughter has a long-standing condition), and he tended to check in a lot at the deli because hardly anyone else ever did and he wanted the entirely meaningless ‘honour’ of being ‘mayor’ of that place (the person who has checked in most over some period). It’s a false correlation leading to a false conclusion, which could nevertheless have adverse and possibly irredeemable consequences. How would the guy even know that this conclusion is floating around out there and that companies are buying it and making decisions about him? Even if he did somehow find out, how could he go about fixing it? He wouldn’t know which companies bought that data, let alone whether and how they are using it.
It was a deliberately over-simplified example, but you get the point.
@iknklast:
Yes, there are a few examples of this kind of thing, including racism and homophobia as well as misogyny.
One issue here is that it’s hard to tell whether the bad behaviour was due to ‘honest’ interaction with the bot (learning from actual discourse) or to people realising it was a learning bot and gaming it to encourage the misogynistic output.
Sadly, it is also unclear whether there’s even a difference between these two things. As you say: