HKU statisticians develop online diagnostic system for screening COVID-19 with AI technologies based on chest CT dataset

A research team led by Professor Guosheng YIN, Head of Department of Statistics and Actuarial Science, the University of Hong Kong, and Dr Bin LIU, Assistant Professor of Centre of Statistical Research, School of Statistics, Southwestern University of Finance and Economics (currently Post-doctoral Fellow at HKU) has integrated radiography and computer vision to develop a digital online diagnostic system for COVID-19 based on chest CT scans.

The diagnostic system (https://www.covidct.cn) can help to screen suspected cases of COVID-19 and evaluate the probability that a person has contracted the disease. It has the following features:
– Fast – The diagnostic result is available immediately from the chest CT images provided
– Accurate – Accuracy: 88%, AUC (area under the ROC curve, a performance measure for binary classification models): 93%, Sensitivity: 86%, Specificity: 90% (See Note 1)
– Easy to Use – Web-based, with a user-friendly interface
– Open Source – All code and data are freely available
(https://github.com/xiaoxuegao499/LA-DNN-for-COVID-19-diagnosis)

With years of research experience in Biostatistics and Clinical Trials, Professor Guosheng Yin and his team have been actively extending AI technologies to applications in the medical field in recent years. Meanwhile, the use of chest CT scans for screening suspected cases has been common in the research of various diseases.

“We decided to perform the diagnosis based on chest CT scans with reference to our many years of research in the field of Computer Vision. There are many issues with the current RT-PCR testing for COVID-19 in terms of false negatives and time lag in diagnosis. The test, which takes a swab from an individual’s nose or throat for a trace of the virus, sometimes requires several trials to make a final confirmation. This would put patients at a great disadvantage, as they cannot be diagnosed in a fast way and be provided with the necessary quarantine and treatment at an early stage,” said Professor Yin.

As discovered in radiological research, CT scanning may be effective in testing for COVID-19, particularly amongst those with no symptoms or minimal symptoms. “This is because the coronavirus will typically first attack the lungs and cause lesions after it enters the body. By integrating AI technologies, we use patients’ chest CT images for early diagnosis. However, since most of the chest CT datasets of COVID-19 patients are not publicly shared, we have to spend much time to search for publicly available samples and tag them,” added Professor Yin.

The digital platform offers further evidence that radiography and computer vision can be integrated effectively, putting AI technologies to practical use in medicine. However, in early studies of CT scanning for COVID-19 diagnosis, the predictive performance reported in some published papers did not meet clinical standards. It is believed that, besides small sample sizes, the rich annotations associated with the CT images may not have been fully utilised.

The major difference between the current batch of CT images and a traditional medical imaging dataset is that each of the CT samples is collected from a research preprint. In these papers, clinical experts have comprehensively annotated the chest CT images of the COVID-19 patients with detailed lesion descriptions. Leveraging these text reports from 760 research papers, the research team identified five types of lesion associated with COVID-19 and labelled each confirmed patient with one or more of the five. These five lesions are the distinctive features that differentiate COVID-19 from general pneumonia or other lung diseases.

In this regard, the research team at HKU has designed a lesion-attention deep neural network (LA-DNN) model based on the CT images. Whilst the proposed data-driven LA-DNN model focuses on the primary task of binary classification for COVID-19 diagnosis, an auxiliary multi-label learning task is implemented simultaneously to draw the model’s attention to the five lesions of COVID-19. Because both tasks are trained simultaneously, the auxiliary task encourages the primary task to focus its attention on the lesion areas and, as a result, the diagnostic accuracy for COVID-19 can be improved markedly.
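The idea of training a primary binary task together with an auxiliary multi-label task can be illustrated with a minimal sketch of a combined loss. This is not the team's implementation, which is available in the GitHub repository listed above. The release does not state which loss function is used or how the two tasks are weighted, so the use of binary cross-entropy, the function names `bce` and `joint_loss`, and the `aux_weight` parameter are all assumptions made for illustration.

```python
import math


def bce(p, y, eps=1e-12):
    """Binary cross-entropy for one predicted probability p and label y (0 or 1)."""
    p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def joint_loss(p_covid, y_covid, p_lesions, y_lesions, aux_weight=1.0):
    """Primary COVID-19 loss plus a weighted auxiliary multi-label lesion loss.

    aux_weight is a hypothetical hyperparameter, not a value from the source.
    """
    primary = bce(p_covid, y_covid)
    # Each of the five lesion types is treated as its own yes/no label.
    auxiliary = sum(bce(p, y) for p, y in zip(p_lesions, y_lesions)) / len(y_lesions)
    return primary + aux_weight * auxiliary


# Made-up example: one COVID-19-positive scan showing lesion types 1 and 4.
lesion_labels = [1, 0, 0, 1, 0]
uninformed = joint_loss(0.5, 1, [0.5] * 5, lesion_labels)
confident = joint_loss(0.9, 1, [0.9, 0.1, 0.1, 0.9, 0.1], lesion_labels)
print(uninformed, confident)
```

Because a single loss value drives training, a model is penalised both for a wrong diagnosis and for missing the lesions, which is one way an auxiliary task can steer a shared network towards the lesion areas.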

After launching the online COVID-19 diagnostic system, the research team will continue to collect new samples and improve the training model periodically. Professor Yin and Dr Liu hope that medical staff battling with the disease can make use of the diagnostic system and share patients’ image data, in order to initiate collaborative research and accommodate the urgent demands for COVID-19 testing.

“At the moment, most of the research papers do not share the data and the computer codes, and this does not facilitate knowledge exchange and disease prevention around the globe, yet our online system, data and computer codes are all publicly and freely available for everyone in the world,” they said.

Note 1:
Sensitivity and specificity are terms used in medical diagnosis. Sensitivity, namely the true-positive rate, measures the percentage of actual positives which are correctly identified. The larger the sensitivity, the better the diagnostic test is at identifying patients. Specificity, or the true-negative rate, measures the percentage of actual negatives which are correctly identified. Likewise, the larger the specificity, the better the diagnostic test is at confirming negative cases. For COVID-19, greater emphasis should be placed on sensitivity so as not to miss any real patients.
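The two definitions above can be expressed as a short calculation. The sketch below is illustrative only: the function name `sensitivity_specificity` and the toy labels are made up for this example and are not taken from the team's data or code.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, where 1 means positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # patients correctly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # patients missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-patients correctly cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-patients wrongly flagged
    sensitivity = tp / (tp + fn)  # true-positive rate
    specificity = tn / (tn + fp)  # true-negative rate
    return sensitivity, specificity


# Toy data: 4 actual patients and 6 non-patients (hypothetical values).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In this toy case one of the four patients is missed, which lowers sensitivity, and one of the six non-patients is wrongly flagged, which lowers specificity.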