University of Waterloo: Artificial intelligence needs good data to grow the future

Data is the currency of tomorrow, with boundless opportunities, but there needs to be a common standard for how that currency is created and used, said Canada’s chief data steward.

Anil Arora, the Chief Statistician of Canada and head of Statistics Canada, was the keynote speaker at the Fall Industry Day on Monday, hosted by the Waterloo Artificial Intelligence Institute (Waterloo.AI) and Communitech.

Titled “Data — The Fuel for AI,” the event at the University of Waterloo’s Fed Hall brought together more than 350 business leaders, data scientists, academics and government officials from around the country, both in person and online.

“Since launching in 2018, Waterloo.AI’s multi-disciplinary research teams have been collaborating with industry to develop intelligent systems. This hybrid event provides an opportunity to bring together our leading researchers and industry partners to discuss emerging trends and the future of Canada’s data network,” said Harold Godwin, managing director of Waterloo.AI.

Arora told the hybrid audience that every company in the world will be buying and selling data in the future, but the quality of that data has to be assured.

He offered Statistics Canada as a trusted steward of that data, with a century of experience in data collection and use. That century has seen the evolution of the agency, from simply surveying citizenry to integrating data from sources not considered a generation ago, such as satellite imagery.

To gather and share that data, partnerships are crucial, Arora said.

Using the satellite imagery example, Arora explained how data scientists can detect crop types from those images, then run models to predict crop yields or measure the water stress on plants. In another instance, an artificial intelligence-driven program built a virtual population to measure the spread of COVID-19 once workers return to their offices.

The pandemic accelerated the collection and use of data around the world, Arora said, making it critical to establish a common language for data sharing.

“Data is a team sport,” he said, adding that Statistics Canada continues to seek new partners in the private sector, governments and academia to establish those common standards.

Arora emphasized that the role of Statistics Canada is to stimulate, not to compete. He said that there would be messiness and some mistakes going forward, but declared that change agents can’t stay in their own lanes: co-operation and collaboration are key to their success.

He cited the case of applying AI insights to the rising tide of opioid deaths. When the timelines of victims were examined, a significant number of them were found to have worked in the construction industry, where a workplace injury might lead to opioid addiction. Knowing this at the point of treatment could save a life. This kind of data use can demonstrate to Canadians the value proposition of working together, he said.

Those issues, and others, were explored in the late-morning panel on “Data Trends, Opportunities and Challenges,” helmed by Waterloo.AI co-director Vijay Ganesh, which included Arora; Jimmy Lin, data science professor and co-director at Waterloo.AI; and Reem Al-Halimi, Chief Data Scientist at the Airbus company, NAVBLUE.

Ganesh led the questions by asking how Canada ranks beside the U.S. and EU in data legislation. Arora noted that the government is revising legislation written decades ago, and said it is a tricky balance between being relevant in an evolving context and yet not hindering industry’s ability to innovate. Building a consensus can be challenging — “It’s a bit of a messy art.”

Reem Al-Halimi suggested that the EU’s General Data Protection Regulation (GDPR) needs to be understood by all employees working with data, and said companies need a built-in process or team to respond to GDPR-related changes.

Asked how one ensures compliance, Jimmy Lin noted that there is little about data ethics in current university curricula. And, Lin said, the data science community is still “in the deer in the headlights stage.” It was only a few years ago, he noted, that data scientists came to understand that models could be biased by gender or race, for instance. Upcoming data scientists need training in data ethics, he said, not just to track the things one would expect to go wrong, but also to monitor the things one doesn’t expect to go wrong.

A challenge for those using data, said Al-Halimi, is the constant monitoring of the quality of data products. Citing examples from the air travel industry, however, she said that there are opportunities to improve workflow and reduce waste in every aspect of the business.

Lin addressed the common concern that AI is replacing human control, saying that AI is not a substitute for human creativity but an aid to it.

Arora echoed that, noting that good-quality data can be used to understand the inequities in contemporary infrastructure. “The opportunities are endless.” Old problems, he said, can be looked at with the power of today, to reveal relationships not seen before. “The opportunities are truly exciting.”