McGill University: Using online ads to spot human trafficking
Researchers at McGill University and Carnegie Mellon University (CMU) have designed an algorithm that detects organized human trafficking activity in online escort advertisements. Advertising is one of the most common uses of technology for human trafficking purposes.
A majority of victims are advertised online and have no input into the wording of the advertisements posted for them by their pimp, who typically controls four to six victims, says Reihaneh Rabbany, Assistant Professor at McGill’s School of Computer Science and Canada CIFAR AI Chair. This leads to similar phrasing and duplication among listings, which can be used to detect organized activity.
“The proposed algorithm, called InfoShield, can put millions of advertisements together and highlight the common parts,” adds Christos Faloutsos, Fredkin Professor at CMU’s School of Computer Science and the CMU project lead. “If the ads have a lot of things in common, it’s not guaranteed, but it’s highly likely that it is something suspicious.” The algorithm could help law enforcement direct their investigations and better identify human traffickers and their victims.
According to the International Labour Organization, an estimated 24.9 million people are trapped in forced labour. Of those, 55% are women and girls trafficked in the commercial sex industry. In the past decade, human trafficking cases have been on the rise in Canada. In response, the Canadian government, in collaboration with the RCMP, launched the “National Strategy to Combat Human Trafficking 2019-2024”, which names technological advancement and research as one of its focus areas. The InfoShield algorithm takes a step in this direction.
“Human trafficking is a dangerous societal problem that is difficult to tackle,” explain lead authors Catalina Vajiac and Meng-Chieh Lee. “By looking for small clusters of ads that contain similar phrasing rather than analyzing standalone ads, we’re finding the groups of ads that are most likely to be organized activity, which is a strong signal of human trafficking.”
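The clustering idea the authors describe can be illustrated with a toy sketch. This is not InfoShield itself; it simply groups ads whose word-level phrasing overlaps heavily, using Jaccard similarity over 3-word shingles. The sample ads, the shingle size, and the 0.5 threshold are all illustrative assumptions.

```python
from itertools import combinations

def shingles(text, k=3):
    """Split text into overlapping k-word shingles (small phrase units)."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity between two shingle sets (0.0 to 1.0)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_by_phrasing(ads, threshold=0.5):
    """Greedy single-link clustering: ads sharing enough phrasing land together."""
    sets = [shingles(ad) for ad in ads]
    parent = list(range(len(ads)))  # union-find structure over ad indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(ads)), 2):
        if jaccard(sets[i], sets[j]) >= threshold:
            parent[find(i)] = find(j)  # merge the two ads' clusters

    clusters = {}
    for i in range(len(ads)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())

# Hypothetical ads: the first two are near-duplicates with one detail changed.
ads = [
    "new in town sweet girl call now 555 0101",
    "new in town sweet girl call now 555 0202",
    "experienced massage therapist downtown studio open late",
]
print(cluster_by_phrasing(ads))  # the two near-duplicates cluster together
```

A cluster of near-duplicate ads like the first two, differing only in a phone number, is the kind of repeated-template pattern that suggests a single source behind many listings.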
Spotting organized activity on social media
To test InfoShield, the team ran it on a set of escort listings in which experts had already identified trafficking ads. The team found that InfoShield outperformed other algorithms at identifying the trafficking ads, flagging them with 85% precision.
The test data set contained actual ads placed by human traffickers. The information in these ads is sensitive and kept private to protect the victims of human trafficking, so the team could not publish examples of the activities identified or the data set itself. This meant that other researchers could not verify their work. To remedy this, the team looked for public data sets they could use to test InfoShield that mimicked what the algorithm looked for in human trafficking data: text and the similarities in it. They turned to Twitter, where they found a trove of text and similarities in that text created by bots.
“Bots and trolls will often tweet the same information in similar ways,” adds Rabbany. “Like a human trafficking ad, the format of a bot tweet might be the same with some pieces of information changed, since it originates from the same source. In both cases – Twitter bots and human trafficking ads – the goal is to find organized activity.”
Among tweets, InfoShield outperformed other state-of-the-art algorithms at detecting bots. Vajiac said this finding was a surprise, given that the other algorithms consider Twitter-specific metrics such as the number of followers, retweets and likes, while InfoShield does not. The algorithm instead relies solely on the text of the tweets to determine whether an account is a bot.
“That speaks a lot to how important text is in finding these types of organizations,” says Vajiac.