Linguistics: Algorithms to detect harmful language
Artificial intelligence is capable of identifying slurs, but it still struggles with more hidden forms of linguistic violence, at least for the time being.
Professor Tatjana Scheffler from the Digital Forensic Linguistics Group at Ruhr-Universität Bochum (RUB) is investigating how well algorithms for recognising hate speech can also detect more subtle forms of harmful language. Together with colleagues from Berlin, she analysed a Telegram chat about the storming of the Capitol in 2021 and identified forms of harmful language that algorithms have so far had difficulty detecting. In the future, the researcher would like to use these findings to optimise the algorithms she’s developed for detecting harmful speech. Rubin, the RUB’s science magazine, features an article on her work.
More than just direct insults
Artificial intelligence algorithms can learn to assign statements to different categories: for example, to determine whether or not a text passage contains a direct insult. The algorithms learn these categories from large training datasets that humans have labelled beforehand; they can then apply what they have learned to new, unseen data.
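To illustrate the general approach, here is a minimal sketch of supervised text classification in Python with scikit-learn. The tiny labelled dataset is invented purely for illustration and has nothing to do with the researchers' actual training data or system.

```python
# Minimal sketch of supervised text classification (illustrative only,
# not the researchers' system). The training examples are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Human-labelled training examples: 1 = contains a direct insult, 0 = does not.
train_texts = [
    "You are a complete idiot",
    "Thanks for sharing the article",
    "What a pathetic loser",
    "See you at the meeting tomorrow",
]
train_labels = [1, 0, 1, 0]

# Learn the categories from the labelled data ...
classifier = make_pipeline(TfidfVectorizer(), LogisticRegression())
classifier.fit(train_texts, train_labels)

# ... and transfer what was learned to new messages.
print(classifier.predict(["you absolute idiot", "nice weather today"]))
```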
“Direct insults and swear words are relatively easy to identify,” says Tatjana Scheffler. Often, all the algorithm has to do is to compare the data with a list of frequently used insults. According to the researcher, however, harmful language is much more than obvious hate speech directed at individuals. “There are more implicit forms that are not directed at a specific individual,” she explains. “Harm can also be done by talking about others in a certain way or by creating a certain atmosphere.”
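The kind of list comparison described above can be sketched in a few lines. The word list and helper function below are illustrative placeholders, not the team's tools; the second example hints at why such a lookup misses implicit harm.

```python
# Sketch of lexicon-based matching with a hand-curated word list.
# The entries are placeholders, not a real slur lexicon.
INSULT_LEXICON = {"idiot", "loser", "moron"}

def contains_listed_insult(message: str) -> bool:
    """Flag a message if any token appears in the insult list."""
    tokens = message.lower().split()
    return any(token.strip(".,!?") in INSULT_LEXICON for token in tokens)

print(contains_listed_insult("What an idiot!"))                   # True
print(contains_listed_insult("They create a hostile mood here"))  # False: implicit harm slips through
```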
Telegram chat of Trump supporters analysed
In one study, for example, Tatjana Scheffler’s team looked at 26,431 messages posted to a public Telegram channel between 11 December 2016 and 18 January 2021. People with extreme right-wing sentiments initially commented there on the theoretical idea of overthrowing the US government. Later, they developed concrete plans to storm the Capitol, which happened on 6 January 2021.
Comparing automated processes and humans
The researchers analysed approximately one fifth of the messages manually and compared their results with those of automated algorithms, such as those used by tech companies to detect hate speech or offensive language. 4,505 messages were included in the comparison. Of these, 3,395 were rated as non-harmful by both the researchers and the automated processes, and 275 were agreed to contain harmful language. Humans and machines disagreed on the remaining 835 messages: the algorithms incorrectly classified about half of them as hate speech or insults, and failed to recognise the rest as harmful language, unlike the researchers.
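The reported figures can be checked with a short worked example; this is only a sanity check of the numbers quoted above, not the researchers' evaluation code.

```python
# Sanity check of the reported agreement figures (worked example only).
total = 4505
agreed_non_harmful = 3395
agreed_harmful = 275
disagreed = 835

# The three groups account for all compared messages.
assert agreed_non_harmful + agreed_harmful + disagreed == total

agreement_rate = (agreed_non_harmful + agreed_harmful) / total
print(f"Human-machine agreement: {agreement_rate:.1%}")  # roughly 81.5%
```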
Especially in the case of inflammatory comments, insider terms and othering – language used to set one group of people apart from another – the automated methods were often off the mark. “When we see in which cases established methods make mistakes, it helps us to create better algorithms in the future,” Tatjana Scheffler points out. Together with her team, she is also developing her own automated methods to better recognise harmful speech. This is where linguistic findings come into play: “Certain grammatical structures, for example, can be an indication that a term is meant to be derogatory,” explains Scheffler. “If I call someone ‘you twig’, that’s not the same as simply saying ‘twig’.”
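One such grammatical cue, the vocative “you + noun” pattern from Scheffler's example, could be sketched roughly as follows. The regular expression and the word list are assumptions made for illustration, not the group's actual rules.

```python
# Hedged sketch of one linguistic cue: a noun used as a vocative
# ("you twig") reads as derogatory even when the bare noun does not.
# Pattern and word list are illustrative assumptions only.
import re

NEUTRAL_NOUNS = {"twig", "tool", "sausage"}

def looks_derogatory(message: str) -> bool:
    """Flag 'you <noun>' constructions built from otherwise neutral nouns."""
    for match in re.finditer(r"\byou\s+(\w+)", message.lower()):
        if match.group(1) in NEUTRAL_NOUNS:
            return True
    return False

print(looks_derogatory("You twig!"))            # True: vocative, derogatory use
print(looks_derogatory("The twig fell down."))  # False: plain noun
```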