École Polytechnique: Results of the student data science challenge “Flights”

The “Flights” challenge invited students to use a supervised learning approach to predict the number of passengers on a flight based on the date of their reservation and the route of the plane. Organised jointly by the “Data Science Institute” and “Data Science and Industrial Processes” sponsorship programmes, it closed on 3 February.



More than 100 students from a dozen countries took part in the “Flights” challenge, which asked them to build supervised learning models to predict the actual number of passengers on aeroplane flights. The competition sat at the crossroads of the “Data Science and Industrial Processes” and “Data Science Institute” sponsorship programmes, led by Éric Moulines, researcher at the Centre de mathématiques appliquées (CMAP*), Lambert Tanoh, teacher-researcher at the Institut National Polytechnique Félix Houphouët-Boigny (INP-HB), and Jean-Arnaud Kouakou, teacher-researcher at ENSEA Abidjan. It ran from 13 December to 15 January, and the students presented their models and retraced their progress at the closing event on 3 February.

The students were given real data describing the departure of each plane, its route, and the mean and distribution of the time between the booking of a ticket and the departure of the plane. From these data sets, they built algorithms intended to reproduce the real observations as closely as possible. This is a supervised learning problem: the algorithms are trained on examples for which the result corresponding to each input is already known. Teams could submit their models daily to evaluate their performance, for a total of more than 400 submissions. Outcoder, a startup specialising in data science challenges, coordinated the competition and evaluated the submitted algorithms.
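
As a rough illustration of what “supervised” means here, the sketch below fits a straight line to labelled examples and then scores it on examples held back from training. Everything in it is an assumption made for illustration: the data are synthetic, the single feature `lead_days` is hypothetical, and the root-mean-square error is only one plausible scoring metric, since the article does not say which metric the challenge used.

```python
import math
import random

# Synthetic, illustrative data only: the challenge data set is not reproduced here.
rng = random.Random(0)
lead_days = [rng.uniform(0.0, 30.0) for _ in range(100)]  # hypothetical input feature
passengers = [5.0 + 2.0 * x + rng.uniform(-1.0, 1.0) for x in lead_days]  # known labels

# Hold out the last 20 examples to mimic scoring on data the model has not seen.
train_x, test_x = lead_days[:80], lead_days[80:]
train_y, test_y = passengers[:80], passengers[80:]


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Training uses inputs together with their known results.
intercept, slope = fit_line(train_x, train_y)

# Evaluation compares predictions with results the model never saw.
predictions = [intercept + slope * x for x in test_x]
score = rmse(test_y, predictions)
print(f"slope={slope:.2f} intercept={intercept:.2f} held-out RMSE={score:.2f}")
```

The held-out score plays the role of the daily evaluation described above: it measures how closely the model reproduces observations it was not trained on.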

Of the 31 teams of one to four people, the top three came, in order, from Mohammed VI Polytechnic University, INP-HB in Yamoussoukro, and the Institut national des sciences appliquées Centre Val de Loire. The students examined the data sets in detail, using analyses of variance, which look at how widely the data are spread, and correlation matrices. They then determined how the algorithm should balance the data and select the most relevant variables. The final step before submission was choosing the algorithm and using software tools to optimise the learning process.
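
The correlation step mentioned above can be sketched in a few lines. This is a minimal illustration rather than any team's actual method: the columns are synthetic, the names `mean_lead_days` and `unrelated_code` are hypothetical, and ranking features by absolute Pearson correlation with the target is just one simple way of choosing the most relevant variables. The analysis of variance is not covered here.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return a nested dict giving the correlation of every pair of columns."""
    return {a: {b: pearson(columns[a], columns[b]) for b in columns} for a in columns}


# Synthetic, illustrative columns; the names are hypothetical.
n = 200
columns = {
    "mean_lead_days": [float(i % 20) for i in range(n)],
    "unrelated_code": [float((i * 7) % 13) for i in range(n)],
}
# The target depends strongly on mean_lead_days, plus a small deterministic wobble.
columns["passengers"] = [
    100.0 + 4.0 * lead + float((i * 5) % 3 - 1)
    for i, lead in enumerate(columns["mean_lead_days"])
]

matrix = correlation_matrix(columns)

# Rank candidate features by absolute correlation with the target.
ranked = sorted(
    (name for name in columns if name != "passengers"),
    key=lambda name: abs(matrix[name]["passengers"]),
    reverse=True,
)
for name in ranked:
    print(f"{name}: {matrix[name]['passengers']:+.3f}")
```

A high absolute correlation only flags a linear relationship with the target, so in practice such a ranking is a starting point for feature selection rather than a final answer.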

This international competition allowed data science students from different backgrounds to come together and test the results of their ideas every day. Carried out with real data, the challenge confronted participants directly with the kind of concrete situations they will face at the end of their studies, and rewarded the teams that were able to complete the data intelligently.

*CMAP: a joint research unit of CNRS, École Polytechnique – Institut Polytechnique de Paris

About the sponsorship programmes:

Data Science and Industrial Processes:
Led by Éric Moulines, the International Chair “Data Science and Industrial Processes” contributes to training a new generation of engineers at Mohammed VI Polytechnic University to exploit Data Science and develop innovative industrial processes. Through the teaching activities of the Chair since its inception in 2018 and with the support of OCP, the partners together prepare students to become industry innovators and develop tomorrow’s industrial processes through Data Science.

Data Science Institute:
Supported by the Orange Group since 2017 and by Société Générale Côte d’Ivoire since 2021, the “Data Science Institute” sponsorship programme, led by Éric Moulines, Lambert Tanoh and Jean-Arnaud Kouakou, aims to accelerate the growth of “data management” skills in Côte d’Ivoire and West Africa. Hosted within two existing Ivorian institutions, the Institut national polytechnique Félix Houphouët-Boigny (INP-HB) in Yamoussoukro and the École nationale supérieure de statistique et d’économie appliquée (ENSEA) in Abidjan, it plans to set up an international-level course of excellence in data science, initially for students and then for company executives.