At Entelai we want to contribute to the fight against the Covid-19 epidemic, so we are offering free web access to our service for detecting suspicious cases

Introduction

Hi, everybody. We want to tell you the story of the past few days, which we spent working day and night to adapt our Artificial Intelligence (AI) chest X-ray algorithm to detect suspicious cases of Covid-19.

The SARS-CoV-2 pandemic (the virus is popularly known as coronavirus and causes a severe respiratory illness called Covid-19) has surprised us all and will have a very profound impact in the coming months and years. We have no doubt that humanity will once again emerge stronger from this kind of challenge. But that will not happen by magic: it will take the effort of the whole of society and, in this case, of the health professionals who work on the prevention, containment and treatment of cases. For them and their patients, we are making available our chest X-ray algorithms for the detection of suspected cases of Covid-19. Below we summarize what the infection does to the lungs and what can be seen on a chest X-ray, and then we describe how we developed our tool.

Diagnosis and radiological findings in Covid-19 pneumonia

Chest radiography is the first recommended imaging study when Covid-19 infection is suspected, and its interpretation is decisive in the management of these patients. It is fast and accessible for the clinician, and available in centers with different degrees of complexity. In the words of Dr. Mercedes Serra, Medical Director of Entelai: “So far, 77% of patients with severe COVID-19 infection, and 54% of those with milder infection, show some visible alteration on chest X-ray”. This is a very important fact: even a system that detected 100% of the Covid-19 cases visible on chest X-ray would identify only 77% of severe patients, because the rest show no visible alteration.
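The ceiling that visibility imposes can be made concrete with a short calculation. The sketch below assumes, purely for illustration, that an X-ray reader (human or AI) can only flag patients whose X-ray shows a visible alteration. The function name and the 90% reader sensitivity are hypothetical and are not Entelai's figures.

```python
def overall_detection_rate(visible_fraction, sensitivity_on_visible):
    """Fraction of all infected patients that would be flagged.

    Assumes (for illustration only) that cases with no visible
    alteration on the X-ray can never be flagged.
    """
    return visible_fraction * sensitivity_on_visible


# Fractions of patients with visible X-ray alterations, as quoted in the text.
SEVERE_VISIBLE = 0.77
MILD_VISIBLE = 0.54

# A perfect reader of visible findings is still capped by visibility.
print(f"severe, perfect reader: {overall_detection_rate(SEVERE_VISIBLE, 1.0):.0%}")
print(f"mild, perfect reader:   {overall_detection_rate(MILD_VISIBLE, 1.0):.0%}")

# Hypothetical imperfect reader (90% is an invented figure).
print(f"severe, 90% reader:     {overall_detection_rate(SEVERE_VISIBLE, 0.90):.0%}")
```

Under this simplification, any imperfection in the reader pushes the overall detection rate below the visibility ceiling.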


The most characteristic findings are patchy ground-glass opacities that generally affect several lobes, or even both lung fields, with peripheral predominance. These findings may allow us to differentiate Covid-19 pneumonia from bacterial pneumonias, where segmental or lobar involvement, with an alveolar pattern and a tendency to consolidation, is more frequent. Differentiating Covid-19 pneumonia from other viral or atypical pneumonias may be more difficult. However, other features, such as predominantly perihilar (central) involvement or the presence of pleural effusion, have been described and may help to tell these cases apart.

Another method used to detect Covid-19 pneumonia is computed tomography (CT). CT is usually more sensitive for detecting pneumonia in general, but it exposes the patient to more radiation, is more expensive, and does not necessarily help to distinguish Covid-19 from other pneumonias. In addition, the patient is exposed to a greater risk of hospital infection if the equipment is not cleaned properly. For these reasons, the American College of Radiology does not recommend CT as a first-line test in these patients, and suggests instead the use of portable radiology equipment, which is easier to clean and avoids contamination of radiology rooms.

Regardless of the symptoms (cough, fever) and chest X-ray findings, the diagnosis of SARS-CoV-2 infection is made by a laboratory technique called PCR, according to World Health Organization recommendations. That is, one cannot diagnose Covid-19 solely by interpreting the chest X-ray, with or without the aid of AI. However, the diagnostic test is often unavailable, or its results are delayed. For this reason, clinical and radiological detection of the patients most likely to have the disease can be very useful when deciding on the management and the diagnostic and therapeutic approach for a patient in an emergency setting. That is where we believe an AI system trained to detect suspicious cases of Covid-19 on chest radiography can be helpful to health professionals. That’s why we’ve worked very hard in the last few days to give doctors on the front line an additional tool, which we hope will be useful to them during the course of the pandemic. Below we tell you briefly how we did it.

System training

AI algorithms learn much as physicians do: by studying and analyzing many cases, they abstract patterns, identify the key findings that distinguish one disease from another, and thus predict or classify medical problems. For example, given a number of normal chest X-ray images and a set of X-rays of patients with pneumonia, doctors and AI can learn to distinguish them by their characteristics (presence of consolidation or spots on the X-ray, etc.). For this task, we obtained about 100 images of patients with confirmed Covid-19 and of other patients with similar pneumonias, as well as a control group without pneumonia, matching the age and gender distribution across groups. The age and gender distribution is important (each group should contain similar numbers of women and men in an equivalent age range), since otherwise the system can learn to tell the groups apart by characteristics unrelated to the presence or absence of Covid-19, such as the ossification patterns seen in minors but not in adults. Dr. Mercedes Serra assembled a complete dataset based on images obtained mainly from these sites:

  • https://www.sirm.org/category/senza-categoria/covid-19/
  • https://www.eurorad.org/
  • https://github.com/ieee8023/covid-chestxray-dataset
  • https://www.kaggle.com/kmader/pulmonary-chest-xray-abnormalities
  • https://www.kaggle.com/nih-chest-xrays/data
  • https://www.kaggle.com/c/rsna-pneumonia-detection-challenge
  • https://github.com/BIMCV-CSUSP/BIMCV-COVID-19
  • Entelai Repository
  • https://radiopaedia.org/cases

Once the image database that the system will learn from has been assembled, the system is presented with the images so that it can start to recognize their differences and eventually make a prediction every time it is presented with a new image. In this case, our AI was already trained to tell normal X-rays from abnormal ones, so the work was a matter of fine-tuning (to “learn” from scratch, the system usually needs thousands of images). Francisco Dorr is our data scientist who has been tirelessly training our AI to quickly “learn” to detect these cases using a type of neural network known as DenseNet121. The result provided by the AI is a percentage associated with each label. For example, Covid-19 Pneumonia 90%, Other Pneumonia 9%, Normal / Other findings 1% is a result that can be seen in a highly suspicious case. In the words of Francisco: “The percentages given as results are a measure of how confident the network is in making a prediction about a disease. It can range from 0 to 100. The higher the percentage, the more confident the network feels about the result it is giving. If in doubt, the percentages will be similar across all classes.”
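One common way for a classification network to turn its raw per-class scores into percentages that add up to 100 is the softmax function. The sketch below illustrates that idea only; whether Entelai's network computes its percentages exactly this way is an assumption, and the function name and raw scores are invented for the example.

```python
import math


def scores_to_percentages(scores):
    """Convert raw per-class scores into percentages that sum to 100 (softmax)."""
    # Subtract the maximum score for numerical stability.
    highest = max(scores.values())
    exps = {label: math.exp(s - highest) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: 100.0 * e / total for label, e in exps.items()}


# Hypothetical raw scores for one X-ray (invented values).
confident = scores_to_percentages(
    {"Covid-19 Pneumonia": 5.0, "Other Pneumonia": 2.7, "Normal / Other findings": 0.5}
)
# Hypothetical raw scores when the network is in doubt.
unsure = scores_to_percentages(
    {"Covid-19 Pneumonia": 1.1, "Other Pneumonia": 1.0, "Normal / Other findings": 0.9}
)

for name, result in (("confident", confident), ("unsure", unsure)):
    print(name, {label: f"{pct:.0f}%" for label, pct in result.items()})
```

When one raw score clearly dominates, nearly all of the percentage goes to that label. When the scores are close together, the percentages come out similar across classes, which matches the behavior described in the quote.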

Results

Initially, we divided the training data set (116 cases per category) and asked the AI to detect the suspected cases of Covid-19 pneumonia. Below are its results:

  • Sensitivity: 84%.
  • Specificity: 91%.
  • Positive predictive value: 83%.
  • AUROC: 0.93

We will comment on each of these results. Sensitivity is the percentage of people with the disease who test positive with the system. In this case, out of 100 people who are sick, the system would detect 84 and fail to detect 16. Specificity, on the other hand, is the percentage of people without the disease whom the system classifies as negative. In other words, out of 100 healthy or disease-free people, the system classifies 91 as healthy and incorrectly tells 9 that they have the disease. The positive predictive value, or PPV, is the probability of having the disease if the test result is positive. Finally, the AUROC, or area under the ROC curve, is a measure that summarizes the effectiveness of a classifier, with values above 0.70 considered good and values above 0.90 considered very good or excellent. This means that, for the set of X-rays used, Entelai Pic performed well in classifying probable Covid-19 pneumonias.
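The definitions above can be written out as a short calculation. This is a minimal sketch: the counts and scores are hypothetical, chosen only to keep the arithmetic easy to follow, and are not Entelai's data. The AUROC is computed as the probability that a randomly chosen sick person receives a higher score than a randomly chosen disease-free person, which is equivalent to the area under the ROC curve.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Sensitivity, specificity and PPV from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # sick people correctly flagged
        "specificity": tn / (tn + fp),  # disease-free people correctly cleared
        "ppv": tp / (tp + fp),          # flagged people who are really sick
    }


def auroc(positive_scores, negative_scores):
    """Probability that a random positive outranks a random negative (ties count half)."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical counts (not Entelai's data): 10 sick and 10 disease-free people.
metrics = confusion_metrics(tp=8, fn=2, fp=1, tn=9)
for name, value in metrics.items():
    print(f"{name}: {value:.0%}")

# Hypothetical model scores for sick and disease-free people.
sick = [0.9, 0.8, 0.7, 0.4]
healthy = [0.6, 0.3, 0.2, 0.1]
print(f"AUROC: {auroc(sick, healthy):.2f}")
```

Note that PPV, unlike sensitivity and specificity, depends on how many of the people tested actually have the disease, so it changes with the mix of cases in the data set.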

Now, what would happen if we exposed this system to a completely different set of X-rays (other countries, other equipment, other patients)? Within the initial data set, an independent part is set aside and used to evaluate the system once it is trained; this is the internal validation. Although these images are different from those used for training, they come from the same sources, so the test measures performance on the same equipment and the same population the network was trained with. External validation is then performed as a second, more demanding test, using a data set that is completely independent of the one used for training. This shows whether the results of the first test are robust and extrapolate to other equipment and populations. Validation with an external data set is fundamental to ensure the adequate performance of any AI or prediction system. Generally, and especially when training with few images as in this case, a decrease in performance is expected. Below are the results:

  • Sensitivity: 70%.
  • Specificity: 79%.
  • Positive predictive value: 67%.
  • AUROC: 0.74

The results are not as accurate because the system is less confident with images that are completely new or unlike those in its training. We therefore expect the system’s real-world performance to be closer to the external validation than to the optimistic numbers from its initial training. In any case, evidence suggests that under normal conditions the sensitivity of clinicians or emergency physicians in detecting pathologies on chest radiography ranges from 20% to 69% (not counting something new like Covid-19).

Today these professionals are the ones on the front line of this battle and we believe that the results of Entelai Pic Covid-19 could help them.

Scope and limitations

It is very important to bear in mind the limitations of this model, namely:

  • Number of images used in the training: about 100 images per category were used in this model for fine-tuning and prediction, and the rule of thumb is that the more images, the better the system performs. We are working against the clock to increase the number of images. We thank in advance, and call on, the physicians and health professionals who can contribute patient images to enrich this tool. As the weeks go by we will give the system more images so that it learns more and improves its performance.
  • System validation: this system was trained with images of adults mainly from China and Italy, so its performance is not necessarily equivalent on images of patients from other regions, or on images acquired with other equipment. That is why it is always important to run local tests and external validations with other data sets, as Entelai always does with its developments. Until these experiments are performed, the performance obtained in the initial training can be far from the one obtained in practice, which is one of the reasons why this tool is only for experimental use by medical professionals.
  • Selection biases: the images in this dataset were not collected with clear and specific criteria, so there may be biases that affect system performance. For example, it may be that only the most severe and conspicuous cases are uploaded, leaving out more moderate cases whose X-ray findings may differ. The algorithm would then be biased towards detecting only the severe cases and ignoring the mildest ones, which could lead to additional errors and lower detection rates.

Precisely because of these limitations, Entelai always carries out a triple quality control in its developments: internally (our data), externally (other data and the performance of other research groups), and by equipment and client (to ensure correct performance on each client’s equipment in the different countries where we operate). This is significantly more laborious and time-consuming, but gives doctors and patients unique assurance of the quality with which we work. Finally, we work with the various regulatory agencies to validate and approve the quality and safety of our developments.

Why, then, build and offer a tool that has not passed these quality controls?

This question was in our heads for many days and was the subject of internal debate. We are facing a global pandemic emergency and it is vital to speed up detection times to achieve quicker and better diagnoses. In addition, globally, there is a lack of materials and even of medical professionals who can provide quality care to patients at the peak of demand. Against this background, we believe it is essential to make a concrete contribution with AI tools that will help improve healthcare practice.

Is the emergency reason enough to relax these quality controls or is it better to wait until all the validations are in place and offer this help in 4 or 6 months?

We honestly don’t know the answer, and we don’t know whether there is a right option. We decided that medical users, once made aware of the tool’s limitations, should be the ones who finally “validate” the usefulness of having it now. Decision makers around the world are currently weighing emergency measures without prior experience or data, and we will surely all learn from this and be better prepared for a future pandemic. If doctors find it useful, and it helps to save even one life, the effort will have been worthwhile.

The Entelai Team (Mauricio Farez, Diego Fernández Slezak, Carlos Cicogna, Macarena Gonzalez, María Mercedes Serra, Hernán Chaves, Pablo Heide, Martín Elías Costa, Francisco Dorr, Joaquín Seia, Andrés Ramirez, Iván Donoso)

Access the page at https://covid.entelai.com