University of Exeter: ‘Democratic AI’ makes more favoured economic policy decisions
In a paper published in Nature Human Behaviour, researchers trained an artificial intelligence (AI) system named ‘Democratic AI’ to design a way of distributing the proceeds of an investment game and found that it was more popular with the players than any human-designed system.
Thousands of participants were recruited to play an investment game in groups of four. Before every round, each player was allocated funds, with the size of the endowment varying between players.
Each player could keep those funds or invest them in a common pool that was guaranteed to grow, but players did not know in advance how the proceeds of the pool would be shared out.
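The mechanics of one round can be sketched as follows. The endowments, growth multiplier and the proportional baseline policy here are illustrative placeholders, not values or rules taken from the study:

```python
# Illustrative sketch of one round of the investment game.
# Endowments, the multiplier and the policy are placeholder assumptions.

def play_round(endowments, contributions, multiplier, policy):
    """Each player keeps (endowment - contribution); contributions are
    pooled, grown by `multiplier`, and shared out by `policy`."""
    pool = sum(contributions) * multiplier
    shares = policy(endowments, contributions, pool)
    return [e - c + s for e, c, s in zip(endowments, contributions, shares)]

def proportional_policy(endowments, contributions, pool):
    """One human-designed baseline: share the pool in proportion to
    each player's absolute contribution."""
    total = sum(contributions)
    if total == 0:
        return [pool / len(contributions)] * len(contributions)
    return [pool * c / total for c in contributions]

# Four players with unequal endowments (illustrative numbers).
payoffs = play_round(
    endowments=[10, 10, 2, 2],
    contributions=[10, 5, 2, 0],
    multiplier=1.6,
    policy=proportional_policy,
)
```

Because the pool is guaranteed to grow, full contribution maximises the group's total payoff, but how that surplus is divided depends entirely on the redistribution policy.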
The researchers then used different policies to share out the proceeds: human-designed policies, such as redistributing funds in proportion to each player's contribution, and a policy discovered by an AI trained through 'deep reinforcement learning', which learned from data on how people had played earlier versions of the game and was optimised to maximise the preferences of the wider group of players.
When asked to vote for which policy they preferred, the participants chose the AI system over policies such as redistributing the funds equally or redistributing funds in proportion to each player’s contribution.
And when the researchers trained a human 'policy-maker' to design a redistribution scheme, the players still preferred the Democratic AI system.
The research addresses a question that has divided opinion among philosophers, economists and political scientists for many years: how exactly should we distribute resources in economies and societies?
Oliver Hauser, Associate Professor of Economics at the University of Exeter Business School and co-author of the study, said: “AI systems are sometimes criticised for learning policies that are incompatible with human values, but with this approach the AI harnesses the principle of democracy by maximising the majority preferences of a group of people. While this approach is only a prototype, it may help ensure that AI systems are less likely to learn policies that are unsafe or unfair.”
The researchers analysed the policy the AI had discovered and found it incorporated a mixture of ideas that had been previously proposed by human experts to solve the problem of redistributing the funds.
This included taking into account a player’s initial means and redistributing funds in proportion to players’ relative – rather than absolute – contribution.
They also found the AI system rewarded players whose relative contribution was more generous, perhaps encouraging others to do likewise.
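The idea of weighting relative rather than absolute contribution can be illustrated with a hand-written sketch. This is not the policy the AI actually learned, only a simple example of the principle it incorporated; the numbers and the 1.6 multiplier are assumptions:

```python
# Illustrative sketch: sharing the pool by *relative* contribution
# (contribution divided by initial endowment), so a poorer player who
# gives everything is rewarded more than a richer player who gives the
# same absolute amount. Not the learned policy itself.

def relative_policy(endowments, contributions, pool):
    """Share the pool in proportion to each player's contribution
    relative to their initial endowment (c_i / e_i)."""
    weights = [c / e for c, e in zip(contributions, endowments)]
    total = sum(weights)
    if total == 0:
        return [pool / len(contributions)] * len(contributions)
    return [pool * w / total for w in weights]

# Both players contribute 2, but one gives all of a 2-unit endowment
# while the other gives 2 of 10; an absolute-contribution split would
# give them equal shares.
pool = (2 + 2) * 1.6  # total contributions grown by an assumed multiplier
shares = relative_policy(endowments=[2, 10], contributions=[2, 2], pool=pool)
```

Under this sketch the proportionally more generous (poorer) player receives the larger share, which matches the incentive the researchers observed: rewarding relative generosity may encourage others to contribute more.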
“Importantly, the AI only discovered these policies by learning to maximise human votes,” said Professor Hauser. “The method therefore ensures that humans remain ‘in the loop’ and the AI produces human-compatible solutions.”
The study was a collaboration between researchers from the University of Exeter, DeepMind, UCL and the University of Oxford.