Last June, Antonio Radić, the host of a YouTube chess channel with more than a million subscribers, was live-streaming an interview with Grandmaster Hikaru Nakamura when the broadcast suddenly cut out.
Instead of a lively discussion about chess openings, famous games, and iconic players, viewers were told that Radić's video had been removed for "harmful and dangerous" content. Radić saw a message saying that the video, which contained nothing more outrageous than a discussion of the King's Indian Defense, had violated YouTube's Community Guidelines. It stayed offline for 24 hours.
Exactly what happened still isn't clear. YouTube declined to comment beyond saying that removing Radić's video was a mistake. But a new study suggests the incident reflects shortcomings in artificial-intelligence programs designed to automatically detect hate speech, abuse, and misinformation online.
Ashique KhudaBukhsh, a project scientist specializing in AI at Carnegie Mellon University and himself a serious chess player, wondered whether YouTube's algorithm might have been confused by discussions of black and white pieces, attacks, and defenses.
So he and Rupak Sarkar, an engineer at CMU, designed an experiment. They trained two versions of a language model called BERT, one using messages from the racist far-right website Stormfront and the other using data from Twitter. They then tested the algorithms on the transcripts and comments of 8,818 chess videos and found them far from perfect. The algorithms flagged about 1 percent of the transcripts or comments as hate speech, but more than 80 percent of those flagged were false positives: read in context, the language was not racist. "Without a human in the loop," the pair write in their paper, "relying on off-the-shelf classifiers' predictions on chess discussions can be misleading."
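The study itself fine-tuned BERT classifiers, but the failure mode can be illustrated with a much cruder stand-in. The sketch below uses a hypothetical watch list of chess-adjacent words to show how a model keyed to surface vocabulary flags innocuous chess commentary; the word list and comments are invented for illustration, not taken from the paper.

```python
# Illustrative sketch only: a naive keyword matcher standing in for a
# trained classifier, to show how chess talk can trigger false positives.

# Hypothetical watch list; real classifiers learn far subtler patterns.
FLAGGED_TERMS = {"black", "white", "attack", "threat", "capture"}

def naive_flag(comment: str) -> bool:
    """Flag a comment if it contains any term from the watch list."""
    words = {w.strip(".,!?").lower() for w in comment.split()}
    return bool(words & FLAGGED_TERMS)

chess_comments = [
    "White is winning after the attack on the kingside.",
    "Black should capture the knight and defend.",
    "Nice interview, I enjoyed the stories about Fischer.",
]

# The first two innocuous chess comments get flagged; only the third passes.
print([naive_flag(c) for c in chess_comments])  # [True, True, False]
```

Out of context, "black," "white," "attack," and "capture" look alarming; within a chess discussion they are entirely benign, which is exactly the ambiguity the study measured.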
The experiment exposed a core problem for AI language programs. Detecting hate speech or abuse is about more than just catching offensive words and phrases. The same words can have very different meanings in different contexts, so an algorithm must infer meaning from a string of words.
"Fundamentally, language is still a very subtle thing," says Tom Mitchell, a professor at CMU who has previously worked with KhudaBukhsh. "These kinds of trained classifiers are not going to be 100 percent accurate any time soon."
Yejin Choi, an associate professor at the University of Washington who specializes in AI and language, says she is "not at all" surprised by the YouTube takedown, given the limits of language understanding today. Choi says further progress in detecting hate speech will require big investments and new approaches. She says algorithms work better when they analyze more than just a piece of text in isolation, incorporating, for example, a user's history of comments or the nature of the channel in which the comments are posted.
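One way to picture the kind of context Choi describes is to blend a text-only score with a prior drawn from the channel's history. The weighting scheme and numbers below are a hypothetical sketch of the idea, not a method from Choi's research or any deployed system.

```python
# Hypothetical sketch: temper a text-only toxicity score with channel
# context, so borderline language on a chess channel is judged leniently.

def contextual_score(text_score: float, channel_flag_rate: float,
                     weight: float = 0.5) -> float:
    """Blend a per-comment score with the channel's historical record.

    text_score: toxicity probability from a text-only model (0 to 1).
    channel_flag_rate: fraction of the channel's past content confirmed
        abusive by human review (0 to 1).
    weight: how strongly channel history tempers the text score.
    """
    return (1 - weight) * text_score + weight * channel_flag_rate

# The same ambiguous comment scored on two different channels:
ambiguous = 0.8        # text-only model is suspicious
chess_channel = 0.01   # almost no confirmed abuse historically
toxic_channel = 0.6    # frequent confirmed abuse

print(round(contextual_score(ambiguous, chess_channel), 3))  # 0.405
print(round(contextual_score(ambiguous, toxic_channel), 3))  # 0.7
```

The identical sentence lands below a plausible moderation threshold on the chess channel and above it on the channel with a history of abuse, which is the intuition behind looking past the text in isolation.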
But Choi's research also shows how hate-speech detection can perpetuate bias. In a 2019 study, she and others found that human annotators were more likely to label Twitter posts by users who self-identify as African American as abusive, and that algorithms trained to identify abuse using those annotations repeated the same bias.
Companies have spent many millions collecting and annotating training data for self-driving cars, but Choi says the same effort has not gone into annotating language. So far, no one has collected and annotated a high-quality data set of hate speech or abuse that includes lots of "ambiguous" borderline cases. "If we invested that level of data collection, or even a small fraction of it, I'm sure AI can do much better," she says.
Mitchell, the CMU professor, says YouTube and other platforms likely have more sophisticated AI algorithms than the ones KhudaBukhsh built, but even those are still limited.