University of Maryland: Researcher Helps Create Big Data ‘Early Alarm’ for Ukraine Abuses
From searing images of civilians targeted by shelling to detailed accounts of sick children and their families fleeing nearby fighting to seek medical care, journalists have created a kaleidoscopic view of the suffering that has engulfed Ukraine since Russia invaded—but the news media can’t be everywhere.
Social media practically can be, however, and a University of Maryland researcher is part of a U.S.-Ukrainian multi-institutional team that’s harvesting data from Twitter and analyzing it with machine-learning algorithms. The result is a real-time system that provides a running account of what people in Ukraine are facing, constructed from their own accounts.
The project, Data for Ukraine, has been running for about three weeks, and has shown itself able to surface important events a few hours ahead of Western or even Ukrainian media sources. It focuses on four areas: humanitarian needs, displaced people, civilian resistance and human rights violations. In addition to simply showing spikes of credible tweets about certain subjects the team is tracking, the system also geolocates tweets—essentially mapping where events take place.
“It’s an early alarm system for human rights abuses,” said Ernesto Calvo, professor of government and politics and director of UMD’s Inter-Disciplinary Lab for Computational Social Science. “For it to work, we need to know two basic things: what is happening or being reported, and who is reporting those things.”
Calvo and his lab focus on the second of those two requirements. They built a “community detection” system to identify the key nodes of Twitter users whose data the system should draw on. Other team members with expertise in Ukrainian society and politics supplied a starting list of about 400 verified users who actively tweet on relevant topics. Calvo, who honed his approach analyzing social media from political and environmental crises in Latin America, and his team then expanded and deepened the collection, drawing on connections and followers of the initial list so that millions of tweets per day now feed the system.
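The article does not describe the lab’s actual algorithm, but the idea of growing a trusted starting list through its connections and followers can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the function name `expand_seed_accounts`, the `follows` mapping, and the rule that a new account must be linked to at least two seed accounts.

```python
from collections import defaultdict


def expand_seed_accounts(seeds, follows, min_seed_links=2):
    """Return the seeds plus accounts linked to enough distinct seeds.

    `follows` maps an account name to the set of accounts it follows.
    A link counts in either direction (the seed follows the account,
    or the account follows the seed). The threshold is hypothetical.
    """
    seeds = set(seeds)
    linked_seeds = defaultdict(set)  # candidate -> distinct seeds it touches
    for account, followed in follows.items():
        for other in followed:
            if account in seeds and other not in seeds:
                linked_seeds[other].add(account)
            elif other in seeds and account not in seeds:
                linked_seeds[account].add(other)
    added = {
        name for name, linked in linked_seeds.items()
        if len(linked) >= min_seed_links
    }
    return seeds | added


# Made-up follower graph, for illustration only.
example_follows = {
    "seed_a": {"x", "y"},
    "seed_b": {"x"},
    "z": {"seed_a", "seed_b"},
    "w": {"seed_a"},
}
expanded = expand_seed_accounts(["seed_a", "seed_b"], example_follows)
```

In this toy graph, accounts `x` and `z` each touch both seeds and are added, while `y` and `w` touch only one and are left out. A real community detection method would look at the structure of the whole network, not just one hop from the starting list.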
Nearly half of the captured tweets are in Ukrainian, 30% are in English and 20% are in Russian. Knowing who to exclude—accounts started the day before the invasion, for instance, or with few long-term connections—is key, Calvo said.
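The kind of exclusion rule Calvo describes could look something like the Python sketch below. It is an illustration only, not the project’s code: the `Account` fields, the 30-day minimum age and the 10-connection minimum are all assumed values chosen for the example.

```python
from dataclasses import dataclass
from datetime import date, timedelta

INVASION_DATE = date(2022, 2, 24)  # start of the full-scale invasion


@dataclass
class Account:
    handle: str
    created: date
    # Hypothetical field: connections that predate the war.
    long_term_connections: int


def is_credible(account, min_age_days=30, min_connections=10):
    """Keep accounts that are old enough and well connected.

    Both thresholds are assumptions made for this sketch.
    """
    cutoff = INVASION_DATE - timedelta(days=min_age_days)
    old_enough = account.created <= cutoff
    connected = account.long_term_connections >= min_connections
    return old_enough and connected


# Made-up accounts, for illustration only.
established = Account("established_user", date(2015, 1, 1), 200)
brand_new = Account("brand_new_user", date(2022, 2, 23), 200)
isolated = Account("isolated_user", date(2015, 1, 1), 2)
```

Here only the first account passes: the second was created the day before the invasion, and the third has too few long-term connections.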
“One of the big concerns when we started was to what extent the information was credible—the objective was not to capture as much data as possible, but to make sure it’s quality data,” he said.
Other team members include Olga Onuch, a Ukrainian associate professor of politics at the University of Manchester, U.K., who helped guide the selection of monitored Twitter accounts and shape the list of more than 600 Ukrainian and Russian keywords the system monitors. The list captures “living language,” she said. For instance, a protest might be referred to in Ukrainian or Russian with the Soviet-era colloquialism of “a meeting.”
Erik Wibbels, a political science professor who leads Duke University’s DevLab, handles the project’s natural language processing element, using artificial intelligence and the keyword list developed by Onuch and others to analyze what the tweets are about. Political science Professor Graeme Robertson of the University of North Carolina at Chapel Hill and his colleagues provide expertise on the region. Unnamed scholars at the Kyiv School of Economics are, among other functions, helping to validate the system’s performance.
In one instance, the system’s tracking of civilian resistance and human rights abuses immediately picked up the beginning of a major event, Russian forces firing on peaceful protesters in the southern city of Kherson on March 21, which soon registered as a spike on one of the main graphs on the project’s website.
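How such a spike might be flagged automatically is not spelled out by the team. One common approach, sketched here in Python under stated assumptions, compares each hour’s tweet count with the average of the hours just before it. The function name `find_spikes`, the six-hour window, the threshold of three standard deviations and the sample counts are all invented for illustration.

```python
from statistics import mean, pstdev


def find_spikes(counts, window=6, threshold=3.0):
    """Return indexes where a count jumps well above its recent history.

    Each count is compared with the mean of the previous `window`
    counts plus `threshold` times their standard deviation.
    """
    spikes = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        baseline = mean(history)
        # Floor the spread at 1.0 so a perfectly flat history
        # does not flag every tiny increase.
        spread = max(pstdev(history), 1.0)
        if counts[i] > baseline + threshold * spread:
            spikes.append(i)
    return spikes


# Made-up hourly tweet counts on one topic, for illustration only.
hourly_counts = [10, 12, 11, 9, 10, 11, 60, 12]
spike_hours = find_spikes(hourly_counts)
```

With these invented numbers, only the seventh hour (index 6), where the count leaps from about 10 to 60, is flagged.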
Onuch hopes the work can help in two ways: in the moment, by perhaps helping aid agencies direct resources to people fleeing fighting, and in the long term, by permanently documenting abuses and atrocities for eventual justice.
“Social scientists have a duty in a time of crisis—if they have special or technical knowledge that can be useful—to use it,” she said. “Even if they can’t directly save human lives, they can use it to record what happened.”
Crunching social media big data is likely to become an increasingly prevalent way to take the pulse not only of wars, but also of other crises and key moments to come, Calvo said.
“Scholars, politicians and regular citizens are using Twitter as one of the main ways to amplify information on the ground in Ukraine,” he said. “What’s new for me is that I have mostly used the data to look backward to analyze political crises, but now, the DevLab and our lab working together are developing the ability to use it for early detection.”