Utrecht University: Machine learning can help identify politically connected firms

There appears to be a widespread increase in the share of contracts allocated to politically connected firms during the Covid crisis, and the Netherlands is no exception. The crisis itself may have induced this, and not every case was necessarily corruption; some may simply reflect the absence of proper public procurement. But the lack of transparency was evident, and it raised a lot of eyebrows. Besides a plea for professional procurement, regulatory organisations and the general public are demanding to know more about the influence of firms on political decisions.

Deni Mazrekaj, Fritz Schiltz and Vitezslav Titl, assistant professors at the Utrecht University School of Economics (U.S.E), suggest that machine learning can help identify politically connected firms. A targeted audit investigating potential corruption at these firms can then be far more efficient than a random one. The researchers are developing code in ‘R’ that can help public servants implement this.

Covid as amplification of an existing, urgent matter
‘There appears to be a widespread increase in the share of contracts allocated to politically connected firms during the Covid crisis,’ says Vitezslav Titl. ‘It happened pretty much everywhere during Covid. We have gathered examples from at least twenty countries, from Turkey, the UK and Kenya to the US, Czechia and Tunisia. The audit in the UK, for instance, showed that one third of the firms were connected to politicians and senior officials. In the Netherlands, 100 million euros was spent on a huge batch of face masks that have not been used (the Sywert van Lienden/Stichting Hulptroepen case), and more than a billion euros was awarded to the Testen voor toegang (“Tests for access”) experiment. Both were awarded without a proper public tender, which raises suspicions.’

Prof. Elisabetta Manunza, professor of European and International Public Procurement Law at Utrecht University, also commented on this in The Economist.

Politicians or procurement officers often seem to have different preferences from society
‘The Covid crisis once again shows that politicians or procurement officers often seem to have different preferences from society,’ Titl adds. ‘The pandemic seems to have amplified this. Most governments started doing public procurement under emergency regimes of sorts, with much less strict regulations, so they could do things that normally wouldn’t be allowed or justifiable. This has made our research into politically connected firms even more urgent and important for society, but also more difficult, since data are often not available.’

‘In many countries, the public sector is becoming less able to respond to corruption, tax evasion and similar issues. To give an example, the United States tax office (the IRS) audited merely 0.45% of personal income-tax returns in 2019, less than half the audit rate in 2010. More targeted policy is needed, and machine learning could be part of the solution,’ Titl says.

Politically Connected Firms
In their working paper ‘Identifying Politically Connected Firms: a Machine Learning Approach’, Deni Mazrekaj, Vitezslav Titl and Fritz Schiltz introduce machine learning techniques to identify politically connected firms, i.e. firms with links to politicians that may involve severe conflicts of interest. By assembling information from publicly available sources and the Orbis company database, they constructed a novel dataset covering the population of firms in the Czech Republic, in which various forms of political connections can be determined. The results indicate that over 85% of firms with political connections can be accurately identified by the proposed algorithms. The model achieves this high accuracy using only firm-level financial and industry indicators that are widely available in most countries.

‘Political connections can take the form of donations: for instance, a firm that gives money to a political party directly, which will show up on the party’s list of donations, but also a CEO or board member who gives money to a political party,’ Titl explains. ‘It can also be about personal connections: a CEO or board member who has been on a party’s list of candidates during elections (even if they were not actually elected).

We chose these three types because for each of them there is evidence of adverse effects. The connections usually benefit the firms: they can result in more procurement contracts, overpriced contracts, poorly executed public works, the hiring of less competent individuals and eroding employment standards, as well as regulatory benefits and politically channelled loans. These benefits in general help the firms, not society.’
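To make the three connection types concrete, here is a minimal sketch of how a firm-level label could be derived from them. The field names (`firm_donated`, `officer_donated`, `officer_was_candidate`) are hypothetical placeholders, not the authors' variables, and the rule that any single type is enough to mark a firm as connected is our assumption for illustration.

```python
from dataclasses import dataclass


@dataclass
class FirmRecord:
    """HYPOTHETICAL per-firm flags for the three connection types."""
    firm_id: str
    firm_donated: bool           # the firm itself donated to a political party
    officer_donated: bool        # a CEO or board member donated to a party
    officer_was_candidate: bool  # a CEO or board member was on a candidate list (elected or not)


def is_connected(firm: FirmRecord) -> bool:
    """ASSUMED labelling rule: any one of the three types marks the firm as connected."""
    return firm.firm_donated or firm.officer_donated or firm.officer_was_candidate


firms = [
    FirmRecord("A", firm_donated=False, officer_donated=False, officer_was_candidate=True),
    FirmRecord("B", firm_donated=False, officer_donated=False, officer_was_candidate=False),
]
labels = {f.firm_id: is_connected(f) for f in firms}
print(labels)  # {'A': True, 'B': False}
```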

Data and machine learning leading to targeted audits
‘I have been working on political connections in the Czech Republic for about ten years and have very good, comprehensive data. We have information on certain types of connections for essentially every firm in the country. These data were ideal for testing the predictive power of machine learning algorithms.

There are half a million firms in the Czech Republic, and roughly 10,000 of them are politically connected. Normally you would have to audit 50 firms to find one that is connected (10,000 out of 500,000 is one in 50). With the algorithm, you can flag ten firms, and when you audit them, eight or nine will actually prove to be connected. You have to look at a much lower number of firms to find the politically connected ones. Our research shows that using these algorithms and data (which are available in most countries), we achieve about 85% accuracy and could target these firms for audits.
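The efficiency gain in the quote above comes down to simple arithmetic. The sketch below uses only the figures given there (500,000 firms, about 10,000 connected, eight or nine of ten flagged firms turning out to be connected); reading "eight or nine of ten" as a hit rate of 80 to 90% is our assumption, and the helper name `audits_per_hit` is hypothetical.

```python
def audits_per_hit(hit_rate: float) -> float:
    """Expected number of audits needed to find one connected firm,
    given the share of audited firms that turn out to be connected."""
    return 1.0 / hit_rate


TOTAL_FIRMS = 500_000     # approximate Czech firm population (from the article)
CONNECTED_FIRMS = 10_000  # approximate number of politically connected firms

# Random audits: the hit rate equals the base rate of connected firms (2%).
base_rate = CONNECTED_FIRMS / TOTAL_FIRMS
random_cost = audits_per_hit(base_rate)  # about 50 audits per connected firm

# Targeted audits: ASSUMED hit rates of 80% and 90% ("eight or nine out of ten").
for hit_rate in (0.8, 0.9):
    targeted_cost = audits_per_hit(hit_rate)
    print(f"hit rate {hit_rate:.0%}: {targeted_cost:.2f} audits per connected firm, "
          f"{random_cost / targeted_cost:.0f}x fewer than random auditing")
```

Under these figures, targeted audits need roughly 40 to 45 times fewer audits per connected firm found than random checks.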

The outcome could be that when the audits show that there is a problem, the firms would no longer be allowed to participate in public procurement. This could prevent potential issues and scandals in the future. Instead of doing a random check, you could target potentially corrupt firms.’

R code for implementation by civil servants
‘To implement the algorithms as we suggest, one needs a dataset of (ideally all) firms with their basic financial and industry information, and a small sample of firms with and without connections. In most countries, the first dataset is available in some form. For the latter, public officials could use historical cases in which a conflict of interest (a connection) was or was not found. We show that one needs only about 100 firms with and 100 without political connections to start training the algorithm. This should be possible to obtain almost anywhere with a small time investment.
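As a rough illustration of the second dataset described above, the sketch below draws a balanced training sample of about 100 connected and 100 unconnected firms from historical audit cases. The `(firm_id, connected)` record layout, the function name and the audit history are hypothetical; the authors' own R code may prepare its input differently.

```python
import random


def balanced_training_sample(cases, n_per_class=100, seed=0):
    """Draw n_per_class connected and n_per_class unconnected historical cases.

    `cases` is ASSUMED to be a list of (firm_id, connected) pairs, where
    `connected` records whether an earlier audit found a conflict of interest.
    """
    connected = [c for c in cases if c[1]]
    unconnected = [c for c in cases if not c[1]]
    if len(connected) < n_per_class or len(unconnected) < n_per_class:
        raise ValueError("too few historical cases in at least one class")
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    sample = rng.sample(connected, n_per_class) + rng.sample(unconnected, n_per_class)
    rng.shuffle(sample)  # avoid ordering the training data by class
    return sample


# HYPOTHETICAL audit history: 150 firms with a finding, 2,000 without.
history = [(f"c{i}", True) for i in range(150)] + [(f"u{i}", False) for i in range(2000)]
training = balanced_training_sample(history)
print(len(training), sum(1 for _, connected in training if connected))  # 200 100
```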

The implementation is straightforward for anyone with experience in programming and statistical software such as ‘R’. We even provide easy-to-implement source code (in ‘R’). The procedure would be as follows: collect the necessary training dataset (for a given country), load it, and run our code to let the algorithm learn from this training dataset. The trained algorithm can then be applied to a dataset of all other (not yet audited) firms to predict whether each firm is likely to have a connection.
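The train-then-predict procedure described above can be sketched in a few dozen lines. This is not the authors' R code or the algorithms from their paper: it is a stand-in using plain logistic regression fitted by gradient descent, chosen only because it fits in standard-library Python. The two features (standardised log revenue and a procurement-heavy industry flag) and all data are synthetic and hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.1, epochs=500):
    """Fit logistic-regression weights by batch gradient descent on the log-loss."""
    n, k = len(X), len(X[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(k):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(model, X):
    """Estimated probability that each firm in X is politically connected."""
    w, b = model
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def synthetic_firm(rng, connected):
    """HYPOTHETICAL features: [standardised log revenue, procurement-heavy industry flag]."""
    revenue = rng.gauss(1.0 if connected else -1.0, 1.0)
    industry = 1.0 if rng.random() < (0.6 if connected else 0.2) else 0.0
    return [revenue, industry]


rng = random.Random(42)

# Steps 1-2: a balanced training set (100 connected, 100 not) and model fitting.
y_train = [1] * 100 + [0] * 100
X_train = [synthetic_firm(rng, label == 1) for label in y_train]
model = train_logistic(X_train, y_train)
train_acc = sum(
    (p >= 0.5) == (label == 1)
    for p, label in zip(predict_proba(model, X_train), y_train)
) / len(y_train)

# Step 3: score firms not yet audited and rank them for a targeted audit.
unaudited = {f"firm_{i}": synthetic_firm(rng, rng.random() < 0.02) for i in range(1000)}
scores = dict(zip(unaudited, predict_proba(model, list(unaudited.values()))))
audit_list = sorted(scores, key=scores.get, reverse=True)[:10]

print(f"training accuracy: {train_acc:.2f}")
print("firms to audit first:", audit_list)
```

Ranking firms by score and auditing only the top of the list is one way to read "find ten firms and audit them"; a fixed probability threshold would be another, and in practice a model would be evaluated on held-out cases rather than on its own training data.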

If you are a public servant and would like to implement this, you can download the code from the appendix of a previous version of the paper (published on the OECD website) or simply email the authors. With a bit of adjustment, you should be able to use this code on new datasets.