CORONA: THROUGH THE LENS OF DATA SCIENCE

Your contact at diconium

Amin Dadashi
data scientist

In times of uncertainty and crisis, numbers and observed behavior help us gain insight and avoid misunderstandings about new phenomena such as the novel Coronavirus (COVID-19). Mathematical models and data are reliable channels of scientific communication: they can tell us how the Coronavirus spreads and how we, as individuals, are responding to the crisis.

COVID-19 is nothing more serious than seasonal flu, or is it?

“The media is intentionally ignoring the recovered population and overstating the risks of the novel Coronavirus infection.” “Flu has a death rate comparable to COVID-19.” “Treated like a normal flu, it would go away with minimal damage.” All these claims seemed valid until each nation faced the facts. They are partially correct statements which, taken without knowledge of the true circumstances, might lead us to irreversible devastation.

Fortunately, data and statistics help us avoid the danger while scientists work on developing proper medication to gain full control over the situation. The numbers describing Coronavirus and flu infections might not look noticeably different at first sight. In an epidemic, however, it all comes down to maintaining an equilibrium: even slightly exceeding the safe threshold can disturb this equilibrium and trigger exponential growth and an outbreak.

Many factors determine the severity of an epidemic: how many susceptible people an infectious person is exposed to, the chance of contagion, the total number of susceptible people, the geographical distribution of the population, the incubation period, the time it takes for an infected person to leave the cycle through recovery or death and, of course, the diversity of possible carriers. Let’s look at how the SIR model (Susceptible, Infected, Recovered) predicts the spread pattern of the two supposedly similar infections, COVID-19 and seasonal flu. We assume that, on average, a person diagnosed with flu infects 1.3 other people and remains infectious for 8 days, whereas a Coronavirus patient passes the virus to 2.7 other people and can spread the disease for 14 days. Of course, the actual values may differ significantly under different circumstances, but the relative values should stay in the same range, and that is enough to make the point.
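One common way to write the SIR model is as three coupled differential equations. The symbols below are notation chosen for this sketch and do not come from the original article: N is the population size, β the transmission rate, γ the removal rate and D the average infectious period. Reading the 1.3 and 2.7 secondary infections as the basic reproduction number R_0 is an assumption.

```latex
\begin{aligned}
\frac{dS}{dt} &= -\beta \,\frac{S I}{N}, \\
\frac{dI}{dt} &= \beta \,\frac{S I}{N} - \gamma I, \\
\frac{dR}{dt} &= \gamma I,
\end{aligned}
\qquad
\gamma = \frac{1}{D}, \quad R_0 = \frac{\beta}{\gamma}
```

Under that reading, the flu corresponds to γ = 1/8 and β ≈ 0.16 per day, and the Coronavirus to γ = 1/14 and β ≈ 0.19 per day.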

Figure 1 - Modeling the spread of the seasonal flu (left) and the Coronavirus (right), in case of no precaution

The SIR model reveals how fast each epidemic spreads when it starts with one infected person in a society as large as Germany, with 80 million susceptible people, what percentage of the susceptible population is at risk, and how long it takes for the epidemic to end (figure 1). With no precautions in place, the Coronavirus infects almost 90% of the population, considerably more than the 65% infected by the seasonal flu. It also takes the Coronavirus half the time to find all its hosts. This makes it significantly more difficult to cope with the flood of sick people in hospitals. Another immediate consequence is a shortage of health care equipment and supplies as the industry cannot keep up with demand, or even just panic about a shortage, which draws people out of their safe zone and exposes them to danger in crowded supermarkets or pharmacies. The rapid escalation also makes it harder to identify infected people. According to estimates, between 291,000 and 646,000 people worldwide die from seasonal influenza-related respiratory illnesses each year*. These numbers hint at the potential risk of the Coronavirus infection; the positive prospect, however, is that we still have the opportunity to stop its escalation.
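The kind of comparison described above can be sketched with a few lines of Python. This is a minimal illustration and not the code behind figure 1: it assumes a basic SIR model with a fixed, fully mixed population, treats the 1.3 and 2.7 secondary infections as the basic reproduction number, and integrates with simple Euler steps. The function name `simulate_sir` and its parameters are hypothetical, and the resulting percentages depend on these assumptions, so they need not match the values quoted in the text.

```python
def simulate_sir(r0, infectious_days, population=80_000_000,
                 initial_infected=1, dt=0.1, max_days=5000):
    """Euler integration of a basic SIR model (illustrative sketch only).

    r0              -- assumed average number of people one case infects
    infectious_days -- assumed average infectious period in days
    """
    gamma = 1.0 / infectious_days      # removal rate per day
    beta = r0 * gamma                  # transmission rate per day
    s = float(population - initial_infected)
    i = float(initial_infected)
    t, peak_day, peak_infected = 0.0, 0.0, i

    # Run until fewer than half a person is infected (epidemic is over).
    while i >= 0.5 and t < max_days:
        new_infections = beta * s * i / population * dt
        new_removals = gamma * i * dt
        s -= new_infections
        i += new_infections - new_removals
        t += dt
        if i > peak_infected:
            peak_day, peak_infected = t, i

    return {
        "attack_rate": (population - s) / population,  # share ever infected
        "peak_day": peak_day,
        "peak_infected": peak_infected,
        "end_day": t,
    }


for name, r0, days in [("seasonal flu", 1.3, 8), ("Coronavirus", 2.7, 14)]:
    result = simulate_sir(r0, days)
    print(f"{name}: {result['attack_rate']:.0%} infected in total, "
          f"peak around day {result['peak_day']:.0f}, "
          f"over after about {result['end_day']:.0f} days")
```

A plain Euler loop keeps the sketch free of dependencies; a production model would more likely use a proper ODE solver and a richer compartment structure.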

When our destiny is guided by data!

Data tells the truth if interpreted carefully. As long as no reliable model of the COVID-19 epidemic exists, nothing can be foretold with certainty: we do not know how fast it spreads, what the final extent of the outbreak will be, or when it will disappear. However, we can learn from what is happening in the current infection hotspots, which tells us about the near future of the regions infected next.

Figure 2 - Number of infected people, normalized by population size, in Italy (IT), France (FR), Spain (ES), Germany (DE) and the US on the day on which a specific keyword was searched most often between February 15 and March 7, 2020.

Figure 2 is based on Google Trends data for the period February 15 to March 7, 2020, and shows the general behavior of people in the 5 most affected countries. The size of each circle reflects the number of infected people on the date on which the specific keyword reached its local maximum. For instance, the top right diagram shows that Germans felt the urge to learn about the Coronavirus only when they already had more infected cases than the Italians had at their search peak. The same is true for the term “sanitizer”, which probably means that the Italians were more aware of the consequences and the precautionary measures at the same stage of the infection spread.

Figure 3 - The date at which the number of infected surpassed 400 in each country

The dates on which the number of infected people passed 400 (figure 3) are another sign that Italians reacted relatively faster than the other four countries. Does this mean that we in Germany are just a few days behind Italy in the novel Coronavirus epidemic? As of March 17, we know that what could have been predicted 10 days earlier has already happened. With 800 cases on March 7, compared with 6000 in Italy, the Coronavirus pandemic in Germany seemed pretty much under control; by March 17, Germany had nearly 8000 infected cases.
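The case counts quoted above (about 800 in Germany on March 7 and nearly 8000 on March 17, against 6000 in Italy on March 7) allow a quick back-of-the-envelope check. The sketch below assumes purely exponential growth between the two dates and uses the rounded figures from the text; the function names are illustrative only.

```python
import math


def doubling_time(cases_start, cases_end, days):
    """Doubling time in days, assuming exponential growth between two counts."""
    growth_rate = math.log(cases_end / cases_start) / days
    return math.log(2) / growth_rate


def days_to_reach(cases_now, cases_target, cases_start, cases_end, days):
    """Days needed to grow from cases_now to cases_target at the observed rate."""
    growth_rate = math.log(cases_end / cases_start) / days
    return math.log(cases_target / cases_now) / growth_rate


# Germany: roughly 800 cases (March 7) to roughly 8000 cases (March 17).
print(round(doubling_time(800, 8000, 10), 1))             # 3.0
# Time for Germany's 800 cases to reach Italy's 6000 at that pace.
print(round(days_to_reach(800, 6000, 800, 8000, 10), 1))  # 8.8
```

A tenfold increase in 10 days corresponds to a doubling time of about 3 days, and at that pace Germany's March 7 count sits roughly nine days behind Italy's, which fits the "few days behind" reading.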

In a time of crisis, every action counts, and any relevant data can be a valuable source of insight and a key to survival if it is used cleverly and on time. Our destiny is determined by our actions; let the data guide them.

What is the way out?

Awareness is the door to immunity and data is its key. Models tell us that meeting fewer people and abiding by social distancing rules has a considerable effect on the SIR plot: the curve becomes wider, which buys time, and lower, which puts fewer people at risk. Rushing to supermarkets will guarantee our toilet paper supply for a month, but also the infection of more people. Buying time is the only way to overcome the crisis, keep people healthy and sustain the businesses that enable us to endure until a solution is found. It does not matter where you live or what your age or health condition is: you can still be infected and spread the virus. Almost every country is on a similar outbreak trajectory. So, do not underrate it until we obliterate it.
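The effect of meeting fewer people on the SIR curve can be illustrated with a small sketch. It assumes the same basic SIR model and the Coronavirus parameters from the text (2.7 secondary infections, 14 infectious days, 80 million people), and it adds a hypothetical `contact_reduction` factor that simply scales the transmission rate. The function name and the chosen reduction levels are illustrative, not measured values.

```python
def epidemic_peak(r0, infectious_days, contact_reduction=0.0,
                  population=80_000_000, dt=0.1, max_days=5000):
    """Return (peak_day, peak_share_infected) for a basic SIR epidemic.

    contact_reduction is a hypothetical knob: 0.25 means every person
    has 25% fewer infectious contacts than with no precautions.
    """
    gamma = 1.0 / infectious_days
    beta = r0 * gamma * (1.0 - contact_reduction)
    s, i = population - 1.0, 1.0       # start with one infected person
    t, peak_day, peak_infected = 0.0, 0.0, i

    # Euler steps until fewer than half a person remains infected.
    while i >= 0.5 and t < max_days:
        new_infections = beta * s * i / population * dt
        new_removals = gamma * i * dt
        s -= new_infections
        i += new_infections - new_removals
        t += dt
        if i > peak_infected:
            peak_day, peak_infected = t, i

    return peak_day, peak_infected / population


for reduction in (0.0, 0.25, 0.5):
    day, share = epidemic_peak(2.7, 14, contact_reduction=reduction)
    print(f"{reduction:.0%} fewer contacts: peak around day {day:.0f}, "
          f"{share:.1%} of the population infected at the same time")
```

In this simplified setting, each reduction in contacts pushes the peak later and makes it lower, which is the widening and flattening of the curve described above.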

References:

* Seasonal flu death estimate increases worldwide. CDC Newsroom. https://www.cdc.gov/media/releases/2017/p1213-flu-death-estimate.html. Published 2017. Accessed March 17, 2020.
