Statistics in journalism

June 21, 2018 Statistics Journalism

A few days ago, I saw an article on a local news site which claimed that alcohol consumption in New Zealand has become normalised, and as a result we're drinking more.

They even included this handy chart to give us an idea of how normalised it has become:

This set my statistician's sense tingling. Not only does the article look only at the absolute volume of alcohol consumed and ignore population change, it also implies that we have somehow avoided the decline in alcohol consumption among young people that many other developed countries have seen over the past decade. The UK is a good example.

Alcohol is certainly an issue for some individuals, and has a significant impact on health at the population level. But if we're going to use statistics to talk about trends in consumption, let's do it properly.
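To see why population change matters, here is a small sketch. The figures are hypothetical and chosen only to illustrate the arithmetic (they are not Stats NZ data): total volume rises while per capita consumption falls, because the population grows faster.

```python
# Hypothetical figures for illustration only; these are NOT Stats NZ data.
total_litres_then, population_then = 30.0e6, 4.0e6
total_litres_now, population_now = 31.5e6, 4.5e6

# Change in the headline number: total volume consumed.
total_change = total_litres_now / total_litres_then - 1

# Change in the number that matters for "are we drinking more?": litres per person.
per_capita_then = total_litres_then / population_then
per_capita_now = total_litres_now / population_now
per_capita_change = per_capita_now / per_capita_then - 1

print(f"total volume: {total_change:+.1%}")
print(f"per capita:   {per_capita_change:+.1%}")
```

In this made-up case the total goes up while the per-person figure goes down, so the two measures would support opposite headlines.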

First, let's take a look at alcohol available for consumption per capita. These figures are available quarterly from Stats NZ.

The first thing we notice is that this chart displays strong periodicity, or seasonality: the fourth-quarter figure is always higher than that of any other quarter. We can get a better idea of any trend by adding a four-quarter moving average, which averages each quarter with the three before it so that every point covers a full year.

That gives us some better clues on the direction of any trend. It seems to be broken into three parts:

A period of sharp decline from the beginning of our series to 1997.
Then a slow but steady increase from 1997 through to 2011.
And a slow and steady decrease from 2011 through to today.
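The four-quarter moving average used above is simple enough to spell out. This is a minimal sketch in plain Python using the first eight quarters of the series; the function name `moving_average` is my own, and I am assuming a trailing window rather than a centred one.

```python
# First eight quarters of the per capita series (June 1985 to March 1987).
quarterly = [2.594, 2.336, 3.385, 2.415, 2.731, 2.795, 3.341, 2.401]

def moving_average(values, window=4):
    """Trailing moving average: one value per complete window."""
    return [
        sum(values[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(values))
    ]

smoothed = moving_average(quarterly)

# Each window contains exactly one of each quarter, so the seasonal
# swing largely cancels and the smoothed values vary far less.
raw_spread = max(quarterly) - min(quarterly)
smoothed_spread = max(smoothed) - min(smoothed)

print([round(v, 3) for v in smoothed])
print(round(raw_spread, 3), round(smoothed_spread, 3))
```

With the pandas data loaded later in this post, `data[0].rolling(window=4).mean()` computes the same trailing average.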

A better (and more rigorous) method of identifying a trend is to use seasonal decomposition. Fortunately, the Python module statsmodels provides a useful function for this, seasonal_decompose. Let's start by importing the necessary modules, loading our data, and creating a datetime index.

import pandas as pd
from matplotlib import pyplot
from statsmodels.tsa.seasonal import seasonal_decompose

data = pd.DataFrame([2.594,2.336,3.385,2.415,2.731,2.795,3.341,2.401,2.383,2.326,3.223,2.188,2.309,2.44,3.139,2.185,2.424,2.441,3.144,2.203,2.285,2.554,3.274,2.166,2.264,2.343,3.085,2.109,2.187,2.315,3.166,1.987,2.162,2.176,2.963,2.16,2.204,2.21,2.943,2.088,2.162,2.203,2.834,2.036,2.157,1.919,2.695,1.815,2.022,1.898,2.949,1.901,2.054,1.945,2.772,1.91,1.949,2.06,2.953,1.877,2.047,2.075,2.906,1.856,2.051,2.048,2.795,1.98,1.999,2.222,2.924,1.925,2.036,2.118,2.829,2.112,1.983,2.249,2.801,2.063,2.209,2.152,2.89,2.015,2.114,2.237,2.973,2.07,2.164,2.125,2.827,2.116,2.324,2.019,3.028,2.014,2.144,2.014,3.089,2.245,2.129,2.135,3.099,2.124,2.325,2.314,2.705,2.013,2.299,2.094,2.795,2.154,2.27,2.004,2.755,1.973,2.219,2.12,2.757,1.995,1.992,2.13,2.578,2.055,2.032,2.205,2.634,2.006,2.033,2.016,2.791,1.941])
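# Quarter-end dates starting June 1985. pd.DatetimeIndex no longer accepts start/periods,
# so use pd.date_range; on pandas 2.2 and later the preferred alias is 'QE' rather than 'Q'.
data.index = pd.date_range(start='1985-06-30', periods=132, freq='Q')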

Now run and plot the decomposition.

pyplot.rcParams["figure.figsize"] = (10,10)
result = seasonal_decompose(data[0], model='multiplicative')
result.plot()
pyplot.show()

Now we see the trend, seasonal component, and residuals nicely broken out. Note that the y-axis scales differ between panels, and that I've used a multiplicative decomposition, which treats the seasonal swing as a proportion of the trend rather than as a fixed amount.
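For anyone curious what a multiplicative decomposition involves, here is a rough sketch of the classical approach in plain Python: estimate the trend with a centred moving average, divide it out, and average what is left by quarter. The series is synthetic, the seasonal factors are made up, and the helper `centred_ma` is my own name. It illustrates the idea and is not the statsmodels implementation.

```python
# Synthetic example: made-up seasonal factors (averaging 1) on a gentle upward trend.
true_factors = [0.90, 0.95, 1.25, 0.90]
level = [2.0 + 0.01 * i for i in range(16)]
observed = [level[i] * true_factors[i % 4] for i in range(16)]

def centred_ma(x, period=4):
    """Centred moving average for an even period; None where the window is incomplete."""
    half = period // 2
    out = [None] * len(x)
    for i in range(half, len(x) - half):
        window = x[i - half:i + half + 1]
        # The two end points get half weight so the window spans exactly one year.
        total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        out[i] = total / period
    return out

trend = centred_ma(observed)

# Divide out the trend, then average the ratios by position in the year.
ratios = [[] for _ in range(4)]
for i, t in enumerate(trend):
    if t is not None:
        ratios[i % 4].append(observed[i] / t)
factors = [sum(r) / len(r) for r in ratios]

# Rescale so the seasonal factors average to exactly 1.
mean_factor = sum(factors) / len(factors)
factors = [f / mean_factor for f in factors]

# Whatever is left after removing trend and seasonality is the residual.
resid = [
    observed[i] / (trend[i] * factors[i % 4])
    for i in range(len(observed))
    if trend[i] is not None
]

print([round(f, 3) for f in factors])
```

Because the synthetic series has no noise, the recovered factors land very close to the made-up ones and the residuals sit close to 1. On real data the residual panel shows how much is left unexplained.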

Let's focus now on the trend over the last few years - after all, that seems to be what the journalist was trying to report on. I will create a new dataframe only containing data from 2010 onwards, and perform the same seasonal decomposition.

data2 = data[data.index>='2010-03-31']
result = seasonal_decompose(data2[0], model='multiplicative')
result.plot()
pyplot.show()

I think the conclusion is clear. Since the peak in 2011, the trend has been one of decreasing alcohol consumption. Over this period there has been a decrease in per capita alcohol consumption of roughly 10%. That would make a different headline.
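As a quick sanity check on that figure, we can compare two four-quarter averages taken directly from the series. This is only a sketch: treating the four quarters to September 2011 as the peak is my own reading of the data, and a simple average is a cruder measure than the decomposition trend.

```python
# Values copied from the series above.
peak_year = [3.099, 2.124, 2.325, 2.314]    # quarters ending Dec 2010 to Sep 2011
latest_year = [2.033, 2.016, 2.791, 1.941]  # quarters ending Jun 2017 to Mar 2018

peak_avg = sum(peak_year) / len(peak_year)
latest_avg = sum(latest_year) / len(latest_year)

# Relative change in the average quarterly per capita figure.
change = latest_avg / peak_avg - 1
print(f"{change:.1%}")
```

This gives a fall of around 11%, in line with the rough 10% read off the trend.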
