See original article for table and figures
We are living through terrible days and still cannot see the light at the end of the tunnel. Against this backdrop, Mauro Maltagliati reflects on which variable is best suited to understanding how the coronavirus pandemic is evolving in Italy, follows its evolution over time, and ventures a forecast.
Mauro Maltagliati, Covid-19: Let's try to predict what will happen
All of Italy now follows the coronavirus statistics with almost feverish interest, with the understandable aim of working out how the phenomenon will evolve and, above all, when the pandemic will end, at least in our country.
When you venture into forecasts, there are essentially two roads you can take. The first is perhaps logically a little more defensible: you use a "causal" model, in which the evolution of the phenomenon of interest (the spread of the virus) is predicted from other factors that favor it or slow it down, such as containment policies, the weather (will the arrival of the hot season help?), and progress in medicine.
However, the relationship between these "causal" factors and the spread of the virus is unclear, and how the factors themselves will evolve in the coming days and weeks is even less clear. In short: a difficult road.
A time series approach
Let's try a different route, one already taken by many, including on Neodemos.
We observe how the phenomenon has evolved in the recent past (say, 10-15 days), assume that the regularity seen so far will continue, and project the resulting values forward. It is a "black-box" approach: we do not investigate the causal mechanisms mentioned above, but simply follow the phenomenon and extrapolate it into the future. Obviously, this approach is defensible only until "shocks" of some kind occur, which in this case we hope would be for the better (we hesitate to say "positive"), such as an unexpected success of the confinement policy in Italy or the discovery of a cure.
So many numbers, so little clarity
However, we must choose a variable to analyze. Many are mentioned these days (infected, positive, dead, mortality rate, lethality rate, ...), but few of them mean what people commonly think they mean.
Take the number of infected people, for example. What we actually know is the number of "positives", that is, the infected among those who have been tested with a swab. However, only people strongly suspected of being infected are tested, and this distorts our view of both the level and the trend of the phenomenon. To really know the number of infected, or rather to estimate it correctly, you would have to swab a random sample of the population - an operation that is obviously not possible today (and, of course, I am not proposing it here).
The "recovered" are also hard to measure, since they are a fraction of the "positives" and not of the infected. Paradoxically, an increase in the number of recovered can even give a misleading picture: it is not necessarily good news because, for a given success rate of the therapy, the number of recovered rises as the number of patients rises.
As for the various rates, the measurement problems are even greater. For example, the mortality rate will only be known at the end of the epidemic (deaths from Covid-19 / residents). In the meantime, one could think of the lethality rate (deaths from Covid-19 / infected), which however is not easy to calculate correctly for two reasons:
1) as mentioned, we do not know the number of infected people, only the number of positives, who more or less correspond to the full-blown, and therefore more serious, cases. If we used the correct denominator for the lethality rate (the infected, not the positives), the lethality would probably turn out to be very low.
2) there is a time lag between the moment of infection and the moment of death (when death unfortunately occurs), and to calculate the true lethality we would need to know what will happen to the infected, that is, whether they die or recover. But we know neither how the number of infected is evolving, nor when each person became infected, nor how long (it differs from case to case) each infected person takes to recover or die, so the picture is considerably more complicated.
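The first point can be made concrete with a small sketch. All the numbers below are hypothetical and chosen only for illustration; in particular, `share_detected` (the fraction of infections that end up among the positives) is an assumption, not an estimate from the article or from any data source.

```python
def lethality(deaths, cases):
    """Deaths divided by whichever case count is used as the denominator."""
    return deaths / cases

# Hypothetical figures, for illustration only (not real data).
deaths = 300
positives = 4_000        # people who tested positive
share_detected = 0.2     # ASSUMPTION: only 20% of infections are detected

# If only a fraction of infections is detected, the infected outnumber the positives.
infected = positives / share_detected

naive = lethality(deaths, positives)     # denominator: positives
corrected = lethality(deaths, infected)  # denominator: estimated infected

print(f"lethality on positives: {naive:.1%}")
print(f"lethality on infected:  {corrected:.1%}")
```

The smaller the detected share, the further the rate computed on positives overstates the rate computed on all infected. The sketch ignores the time lag described in the second point.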
Let's talk about the dead
Once the other variables have been discarded, what remains is the sad but necessary (and in this case also useful) count of the dead. It is a less uncertain figure than many others, but it too needs some clarification. Not every country, in fact, counts deaths with Covid-19, as Italy does, rather than deaths from Covid-19. Yet this is the most reliable definition, since it is far from trivial to establish whether some individuals, perhaps suffering from other serious conditions, would have stayed alive (and for how long) had they not been infected with the coronavirus.
And in fact, from February 27 to March 10, that is, at the beginning of the epidemic in Italy, the number of deaths followed an exponential law almost perfectly (tab. 1):
The slope of the line estimated on the logarithmic transform tells us that, in the period considered, the number of new deaths each day was approximately 31% of the cumulative total recorded up to the previous day. But this regular evolution (linear in the logarithm, and therefore explosive in the original scale) must sooner or later stop being exponential and bend into a logistic curve. And luckily, that is what is actually happening.
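As a rough sketch of what fitting a line to the logarithm involves, the snippet below runs an ordinary least-squares regression of log(cumulative deaths) on the day number. The series is synthetic (a made-up starting value growing 31% a day over 13 days), not the Italian data, and the function name `fit_log_linear` is hypothetical. It also assumes that the 31% is read as a day-on-day growth rate, so that growth = exp(slope) - 1.

```python
import math

def fit_log_linear(days, cumulative):
    """Least-squares line through (day, log(cumulative)); returns (slope, intercept)."""
    logs = [math.log(y) for y in cumulative]
    n = len(days)
    mean_x = sum(days) / n
    mean_z = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxz = sum((x - mean_x) * (z - mean_z) for x, z in zip(days, logs))
    slope = sxz / sxx
    return slope, mean_z - slope * mean_x

# Synthetic series, NOT real data: starts at 10 and grows 31% per day.
days = list(range(13))
cumulative = [10 * 1.31 ** d for d in days]

slope, intercept = fit_log_linear(days, cumulative)
daily_growth = math.exp(slope) - 1  # share added each day relative to the previous total

print(f"slope of the log line: {slope:.3f}")
print(f"implied daily growth:  {daily_growth:.1%}")
```

On real counts the points would not lie exactly on the line, and how closely they do is what "almost perfectly" refers to.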
The estimate of the logistic curve and ... D-Day
If we add the data of the last few days we get figure 2. On this basis, statisticians like me can do their job and estimate the logistic curve of cumulative deaths. If, as in this case, the curve fits the past data well, one can also hazard a forecast for the future, and in particular for when the process will run its course (= end of the pandemic, at least in Italy).
Strictly speaking, a logistic curve never "ends": it tends toward a horizontal asymptote (the total number of deaths) without ever reaching it. But let's settle for estimating a day that we can conventionally call the "end of the pandemic": the day after which, in total, there will be fewer than 10 further deaths with coronavirus. Well, that day should be close: April 10, or thereabouts.
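The article does not say how the logistic curve was estimated. One simple possibility, sketched here under that caveat, is to write the curve as y(t) = K / (1 + exp(a - b t)), try a grid of candidate asymptotes K, and for each one fit a and b by regressing log(K/y - 1) on t. The conventional "end" is then the day after which fewer than 10 deaths remain, which has a closed form. The data are synthetic and every name and parameter value is hypothetical.

```python
import math

def logistic(t, K, a, b):
    """Cumulative deaths at day t; K is the asymptote (final total)."""
    return K / (1.0 + math.exp(a - b * t))

def linear_fit(xs, ys):
    """Least-squares line through (x, y); returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def fit_logistic(days, cumulative, k_candidates):
    """Grid search over K; for each K, log(K/y - 1) = a - b*t is linear in t."""
    best = None
    for K in k_candidates:
        if K <= max(cumulative):
            continue  # the asymptote must lie above every observed value
        z = [math.log(K / y - 1.0) for y in cumulative]
        slope, intercept = linear_fit(days, z)
        a, b = intercept, -slope
        sse = sum((logistic(t, K, a, b) - y) ** 2 for t, y in zip(days, cumulative))
        if best is None or sse < best[0]:
            best = (sse, K, a, b)
    return best[1], best[2], best[3]

def end_day(K, a, b, threshold=10.0):
    """Day after which fewer than `threshold` deaths remain: solves K - y(t) = threshold."""
    return (a + math.log((K - threshold) / threshold)) / b

# Synthetic, noise-free series (NOT real data), generated from known parameters.
days = list(range(21))
cumulative = [logistic(t, 5000, 6.0, 0.25) for t in days]

K_hat, a_hat, b_hat = fit_logistic(days, cumulative, range(1400, 10001, 100))
print(f"estimated total: {K_hat}, conventional end at day {end_day(K_hat, a_hat, b_hat):.1f}")
```

With real, noisy counts the estimated asymptote can shift considerably from one day to the next, especially before the curve has passed its inflection point, so the end date it implies should be read with corresponding caution.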
Of course, this is only an estimate: I may be wrong, and in a few days you will know by how much. In the meantime, if the estimate were perfect, this afternoon's data (March 19) should show a cumulative number of deaths of about 3,300, that is, about 330 more than yesterday (March 18).
We must not forget that, as new data arrive, the model adapts and the estimates adjust. Furthermore, this forecasting exercise assumes that we are all homogeneous, that is, all equally exposed to the risk of dying from this virus, whereas we know this is not so: some people are more resistant than others.
There are also factors that could introduce discontinuities, either by collapsing (the health system?) or by giving a push for the better (the confinement policy, whose effects on deaths have in practice not yet been seen, and the discovery of new treatments).
And there is the mystery of southern Italy, so far relatively spared by the epidemic: is it virgin land (and therefore liable to deteriorate rapidly) or protected land (thanks to the confinement measures)? The answer will come in a few days.