It seems to me that the way coronavirus numbers are being used in this phase of global hysteria does not help anyone understand the situation.
Animations and “infographics” about the spread of contagion, death counts or the speed at which the virus propagates are ubiquitous, but the criteria used to produce them are rarely disclosed, and one sometimes suspects that their authors lack a basic knowledge of how statistics work.
Let me pre-empt an (easy) objection: it is true, I am a jurist and not a statistician, so I am not qualified to speak with scientific competence on the subject.
Indeed, I do not intend to. I rely only on the mathematics I learned between high school and university, and on what I learned about statistics while collaborating on the Italian edition of Darrell Huff’s classic, How to Lie with Statistics, edited and translated by Giancarlo Livraghi (who, as a great advertising man, knew the subject perfectly) and by Prof. Riccardo Puglisi (who, as an economist, is equally well versed in it).
I do not offer “truth”, therefore, but only doubts in search of answers.
Firstly: lumping the various categories of the deceased together makes the sample unbalanced, and calculating the mortality rate on an undifferentiated population gives an unreliable result. To establish the death rate of the virus, one should at least distinguish between those who had other known pathologies whose effects the virus compounded, those who were already ill with something else without knowing it, and those whose particular condition favoured the spread of the virus (immunosuppression from hyperactivity, for example). This article goes in the right direction, even if the methodological problem of how to use statistics remains.
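The effect of an undifferentiated population can be shown with a minimal sketch. The two groups and all the figures below are invented purely for illustration and are not real epidemiological data; the sketch only shows how a single pooled rate can conceal very different rates in the subgroups.

```python
# Hypothetical figures, invented for illustration only (not real data).
# Each entry: group name -> (confirmed cases, deaths).
groups = {
    "no known prior condition": (800, 4),
    "known prior condition": (200, 36),
}


def fatality_rate(cases, deaths):
    """Deaths divided by cases, as a fraction."""
    return deaths / cases


# Pooled rate: every case and every death thrown into one pot.
total_cases = sum(cases for cases, _ in groups.values())
total_deaths = sum(deaths for _, deaths in groups.values())
pooled_rate = fatality_rate(total_cases, total_deaths)

# Stratified rates: one rate per group.
stratified_rates = {
    name: fatality_rate(cases, deaths) for name, (cases, deaths) in groups.items()
}

print(f"pooled: {pooled_rate:.1%}")
for name, rate in stratified_rates.items():
    print(f"{name}: {rate:.1%}")
```

With these made-up inputs the pooled figure is 4%, while the two groups sit at 0.5% and 18%: the single number describes neither of them.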
Secondly: it is one thing to analyse a statistically valid sample; it is another to analyse an unbalanced one. In other words: if I look for a football team’s supporters in the stand where its own fans sit, I obtain a clearly different result than if I use a sample built, depending on the level of the team, on a city-wide or national basis. Unbalanced samples can also be useful, but you need to be clear about the limits of the knowledge they generate.
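The football example can be simulated in a few lines. The population sizes and the 30% and 95% shares are assumptions chosen only to make the contrast visible; nothing here is measured data.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical populations (True = supports the team), invented for illustration.
city = [True] * 30_000 + [False] * 70_000    # 30% supporters city-wide
home_stand = [True] * 4_750 + [False] * 250  # 95% supporters in the team's stand


def supporter_share(population, sample_size):
    """Share of supporters in a simple random sample drawn from the population."""
    sample = rng.sample(population, sample_size)
    return sum(sample) / sample_size


stand_share = supporter_share(home_stand, 500)
city_share = supporter_share(city, 500)

print(f"sampled in the stand: {stand_share:.0%} supporters")
print(f"sampled city-wide:    {city_share:.0%} supporters")
```

Both samples have the same size and are drawn at random; only the place where they are drawn differs, and that alone is enough to change the answer.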
Thirdly (and consequently): simply converting the absolute numbers of deaths and infections in various countries into percentages, without applying any weights, is also methodologically wrong. To say, as Il Giornale does, that the mortality rate is 4% out of 3,858 cases invites an incorrect generalisation, because it rests only on the “raw” ratio between the number of deaths and the number of cases.
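One common way of “adopting weights” is direct standardisation: each country’s group-specific rates are re-weighted to a common reference distribution before being compared. The sketch below assumes two hypothetical countries, two age bands and an arbitrary reference split of 70/30; all figures are invented, and this is only one possible weighting scheme, not necessarily the one the comparison would require.

```python
# Hypothetical data, invented for illustration: (cases, deaths) per age band.
countries = {
    "A": {"under 60": (900, 9), "60 and over": (100, 10)},
    "B": {"under 60": (500, 5), "60 and over": (500, 50)},
}

# Assumed common reference distribution of cases across age bands.
reference_weights = {"under 60": 0.7, "60 and over": 0.3}


def crude_rate(bands):
    """Raw ratio of all deaths to all cases, ignoring age structure."""
    cases = sum(c for c, _ in bands.values())
    deaths = sum(d for _, d in bands.values())
    return deaths / cases


def standardised_rate(bands, weights):
    """Weighted average of band-specific rates, using the reference weights."""
    return sum(weights[band] * d / c for band, (c, d) in bands.items())


crude = {name: crude_rate(bands) for name, bands in countries.items()}
standardised = {
    name: standardised_rate(bands, reference_weights)
    for name, bands in countries.items()
}

for name in countries:
    print(f"{name}: crude {crude[name]:.1%}, standardised {standardised[name]:.1%}")
```

In this made-up case both countries have identical rates within each age band (1% and 10%), yet the crude percentages are 1.9% and 5.5% simply because the cases are distributed differently; once weighted to the same reference, both come out at 3.7%.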
Moreover, and in conclusion: as long as the numbers are not large enough to reach statistical significance, one should be very cautious about spreading them. If 7 out of 10 or 490,000 out of 700,000 people give a particular answer to a questionnaire, in both cases we can say that 70% of the respondents answered in a certain way. But (without prejudice to the need for a statistically valid sample) the two cases clearly have very different explanatory power. It would be useful to know, for example, whether the numbers used in a study like this are still too low to be statistically valid or not. In the first case, it would be “only” a freeze-frame of a video that is still playing; in the second, it would provide information of general value.
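The difference in explanatory power between 7 out of 10 and 490,000 out of 700,000 can be made concrete with a confidence interval. The sketch below uses the Wilson score interval at a 95% level, which is one standard choice among several, and it assumes that the respondents are a simple random sample, which is exactly the condition flagged above as not to be taken for granted.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.95):
    """Wilson score interval for a proportion, assuming a simple random sample."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


low_small, high_small = wilson_interval(7, 10)
low_large, high_large = wilson_interval(490_000, 700_000)

print(f"7 of 10:            {low_small:.1%} to {high_small:.1%}")
print(f"490,000 of 700,000: {low_large:.1%} to {high_large:.1%}")
```

Both samples report 70%, but for ten respondents the plausible range runs from roughly 40% to 89%, whereas for 700,000 it is only about a tenth of a percentage point either side of 70%.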
Rereading Darrell Huff’s book, therefore, might not be a bad idea.