In the issue of February 26, we gave an example to illustrate Simpson’s Paradox. The data in the table shows that 70 per cent of students from one school, St Giles, passed their SEC exams, while 60 per cent of students from another school, St James, were successful.

However, the same data shows that 53 per cent of St James’ boys passed their SEC compared with 43 per cent of St Giles’ boys, and 90 per cent of St James’ girls passed compared with 88 per cent of St Giles’ girls. Overall, St Giles has the better success rate, but when girls and boys are considered separately, St James does better in each category.
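To see how such figures can coexist, it may help to work through the arithmetic. The short Python sketch below uses hypothetical head counts, chosen only because they reproduce the percentages quoted above; they are not the figures from the original table, and the names `counts`, `pass_rate` and `summarise` are ours.

```python
# Hypothetical (entered, passed) counts, chosen only so that they
# reproduce the percentages quoted in the article. They are NOT the
# figures from the original table.
counts = {
    "St Giles": {"boys": (200, 86), "girls": (300, 264)},
    "St James": {"boys": (300, 159), "girls": (70, 63)},
}


def pass_rate(entered, passed):
    """Percentage of candidates who passed."""
    return 100 * passed / entered


def summarise(school):
    """Pass rate for boys, for girls, and for the school overall."""
    groups = counts[school]
    rates = {name: pass_rate(*pair) for name, pair in groups.items()}
    total_entered = sum(entered for entered, _ in groups.values())
    total_passed = sum(passed for _, passed in groups.values())
    rates["overall"] = pass_rate(total_entered, total_passed)
    return rates


for school in counts:
    rounded = {name: round(rate) for name, rate in summarise(school).items()}
    print(school, rounded)
```

With these assumed counts, most St Giles candidates are girls, who pass at a high rate in both schools, while most St James candidates are boys. The overall figure for each school is an average weighted by the number of boys and girls sitting the exam, which is how St Giles can come out ahead overall while trailing in each category.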

We also said that the situation can become even more complicated because the same data can be grouped under different categories that point to different conclusions. For example, the catchment area of both schools might be divided into urban and rural localities, and the same data, this time categorised by locality, could indicate that St Giles has the better track record with children from rural localities and with children from urban ones alike.

These unexpected ways in which numbers can behave should make those who present statistics, and those who base their decisions on them, very careful. Not presenting the categories underlying one’s figures can be misleading. But the opposite could equally well be true. What if we analyse the performance of the students in the two schools by eye colour, favourite dish or shoe size? Wouldn’t we be opening ourselves to spurious and misleading associations with the schools’ success rates?

The answers to these questions are not always easy. Sometimes the issue goes straight to the thorny question of when a numerical association indicates causation. In fact, one general rule statisticians go by is to avoid categorising the data into classes which cannot conceivably have a causal effect on the outcome being measured.

Simpson’s Paradox is only one instance of how the behaviour of data can be counter-intuitive. The best remedy is not to lose faith in statistics, but to make sure that one only believes conclusions reached by reputable statisticians and data analysts. Important data which can have a serious impact on the lives of many people, such as mortality rates from illnesses and their possible causes, or data pertaining to world climate change and its possible association with human behaviour, should be made publicly available so that all competent investigators can analyse it and publish their findings. This way, the ordinary citizen can make informed decisions.

These practices have already started. Public funding bodies are beginning to insist that the results of research supported by their funds should be published in open-access journals and, moreover, the actual data should be made publicly available.

The role of the internet is crucial to all this. Many of modern society’s ills are laid at the door of the internet. But the internet could be the cure, not by restricting its use but by exploiting its openness, which ensures that important data and information that can shape the lives of future generations are freely available to all.
