Non-response – Dealing with it when Reporting (Article 2)

Author: Neil Higgs

Analytic considerations

The first, and perhaps the easiest, approach is to weight the study's sample on what are felt to be key demographics (or on other variables such as availability, degree of technophobia or other propensity-type measures).

Note that weighting rests on a major assumption: that the variables used in the weighting exercise are indeed predictive of views and behaviours on the subject matter at hand.

This is not a trivial assumption.
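As a minimal sketch of the simplest kind of weighting exercise, the code below uses cell (post-stratification) weighting on a single variable. This is one common technique among several, not necessarily the one any given study uses, and the group labels and population shares are hypothetical. Each respondent's weight is the group's population share divided by its share of the sample, which only corrects the results if the weighting variable really does predict the answers.

```python
from collections import Counter


def cell_weights(sample_groups, population_shares):
    """Return one weight per respondent so that each group's weighted share
    of the sample matches its (assumed) share of the population."""
    n = len(sample_groups)
    counts = Counter(sample_groups)
    # Weight for a group = population share / sample share.
    factor = {g: population_shares[g] / (counts[g] / n) for g in counts}
    return [factor[g] for g in sample_groups]


# Hypothetical sample: 6 women and 4 men, from a population assumed to be 50/50.
sample = ["F"] * 6 + ["M"] * 4
weights = cell_weights(sample, {"F": 0.5, "M": 0.5})
print(round(weights[0], 4), round(weights[-1], 4))  # 0.8333 1.25
print(round(sum(weights), 6))                        # 10.0
```

The weights sum to the number of interviews, so weighted totals stay on the same scale as the unweighted sample.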

If a weighting exercise has been conducted, it will affect the margins of error. This is addressed by calculating the sampling efficiency and what is called the effective sample size, which is either given by the software being used or needs an experienced sampling statistician to calculate.
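A sketch of one widely used calculation, Kish's approximation for the effective sample size, is given below; the weights are hypothetical, and specialist software may use more refined design-effect formulas. The effective sample size is (sum of weights)² divided by the sum of squared weights, and the sampling efficiency is that figure as a fraction of the number of interviews.

```python
def kish_effective_sample_size(weights):
    """Kish's approximation: n_eff = (sum of w)^2 / (sum of w^2)."""
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    return total * total / total_sq


def sampling_efficiency(weights):
    """Effective sample size as a fraction of the actual number of interviews."""
    return kish_effective_sample_size(weights) / len(weights)


# Hypothetical weights for 10 respondents: 6 weighted down, 4 weighted up.
weights = [5 / 6] * 6 + [1.25] * 4
print(round(kish_effective_sample_size(weights), 2))  # 9.6
print(round(sampling_efficiency(weights), 2))         # 0.96
```

In a margin-of-error calculation, n_eff then takes the place of n; for a proportion p at 95% confidence this is roughly 1,96·√(p(1 − p)/n_eff). Equal weights give an efficiency of 1; the more uneven the weights, the lower it falls.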

The effective sample size or the sampling efficiency must be stated wherever weights have been used. At the very least, the weights employed MUST be stated in any Technical Report on a study.

Ask yourself when last you saw this in a study that has been weighted. Next time, ask for it.

Some new ideas

It will generally be difficult, if not impossible, to quantify which of the many reasons outlined in our first article are in play. In what follows, therefore, we take a simpler approach, examining the likely effect of different levels of non-response on a survey’s findings[4]. However, we propose a different way of thinking about non-responders from the one commonly assumed.

Let n = the number of responders, this being the sample size usually quoted in survey results and press releases.

Let m = the number of non-responders, for any of the reasons discussed in our first article.

Let pn and pm be the proportions of responders and non-responders, respectively, giving a particular answer to a question: pn is the survey result; pm is the (unknown) proportion that non-responders would have given had they responded.

Let pt be the estimated true value in the population, taking into account both responders and non-responders.

Let r = the response rate.

Then, by definition:

$$p_t = r\,p_n + (1 - r)\,p_m, \qquad r = \frac{n}{n + m}$$

In this equation, whilst we know pn and r, we obviously do not know pm.
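The definition translates directly into code. The sketch below computes pt and the bias in the reported figure, pn − pt, which rearranges to (1 − r)(pn − pm); the bias therefore grows with both the non-response rate and the gap between the two groups. The values of pm and r in the example are hypothetical.

```python
def true_value(p_n, p_m, r):
    """Estimated population value p_t = r * p_n + (1 - r) * p_m.

    p_n: proportion among responders (the survey result)
    p_m: assumed proportion among non-responders
    r:   response rate, n / (n + m)
    """
    if not 0.0 <= r <= 1.0:
        raise ValueError("response rate r must lie in [0, 1]")
    return r * p_n + (1.0 - r) * p_m


def non_response_bias(p_n, p_m, r):
    """Bias of the reported result: p_n - p_t = (1 - r) * (p_n - p_m)."""
    return (1.0 - r) * (p_n - p_m)


# Hypothetical: p_n = 43%, assumed p_m = 50%, response rate 30%.
print(round(true_value(0.43, 0.50, 0.30), 3))         # 0.479
print(round(non_response_bias(0.43, 0.50, 0.30), 3))  # -0.049
```

Setting p_m equal to p_n makes the bias zero for any r, which is the assumption a study makes implicitly when it ignores non-response.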

There are a number of possible options to consider here.

Clearly, any estimate of this value will be a material help. For example, if we know the value of a key parameter among those who responded only after being called, say, four times, we might assume that the missing people are like them, simply too active or too often out, and assign that value to pm. This is not necessarily a poor decision, though it does ignore any other reason for non-response.

A better approach, if time and cost allow, is to keep trying to contact a portion of the non-responders with a very short questionnaire containing key demographics and one or two key questions, to establish how different these people actually are. A good example is the Census Post Enumeration Survey (PES), which emulates the Census in miniature to establish who was missed and how they compare with those who participated.

Cochran suggests one might assume that either every non-responder gives the response of interest or every non-responder gives the opposite: that is, pm is either 1 or 0. These extremes can be carried through to the confidence interval calculated for pt but, as he points out, the result is a “distressingly large confidence interval”; indeed, the two conditions cannot both hold.
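To see why the interval is so large, the sketch below computes only the range of point estimates for pt under the two extreme assumptions, before any sampling error is added; the function name and the 30% response rate are hypothetical. The width of the range is exactly 1 − r.

```python
def extreme_bounds(p_n, r):
    """Point-estimate range for p_t under the two extreme assumptions."""
    lower = r * p_n              # every non-responder answers "no": p_m = 0
    upper = r * p_n + (1.0 - r)  # every non-responder answers "yes": p_m = 1
    return lower, upper


lower, upper = extreme_bounds(0.43, 0.30)
print(round(lower, 3), round(upper, 3))  # 0.129 0.829
```

At a 30% response rate the range is 70 points wide even before sampling error widens it further.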

There is a further issue to consider: r is itself a random variable in that another study would yield a different value for r.

This appears to have escaped everyone’s attention.
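One rough way to gauge how much the randomness of r matters is sketched below. It assumes, purely for illustration, that each of the N people approached responds independently with the same probability, so that r has a binomial standard error, and it holds pn and pm fixed. Because the derivative of pt with respect to r is pn − pm, the delta method gives the extra standard error in pt due to r alone; all inputs are hypothetical.

```python
import math


def response_rate_se(r, N):
    """Binomial standard error of the response rate, ASSUMING each of the N
    people approached responds independently with the same probability."""
    return math.sqrt(r * (1.0 - r) / N)


def pt_se_from_r(p_n, p_m, r, N):
    """Delta-method approximation of the extra standard error in p_t caused by
    variation in r alone, holding p_n and p_m fixed: d(p_t)/dr = p_n - p_m."""
    return abs(p_n - p_m) * response_rate_se(r, N)


# Hypothetical: 1 000 people approached, 30% respond, p_n = 43%, p_m = 15%.
print(round(response_rate_se(0.30, 1000), 4))          # 0.0145
print(round(pt_se_from_r(0.43, 0.15, 0.30, 1000), 4))  # 0.0041
```

The contribution vanishes when pm = pn and grows with the gap between the two groups.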

These uncertainties, in turn, suggest that we need standard protocols for addressing them formally. Three protocol options are presented next.

Suggested protocols

  1. If there is any data at all that can be used to estimate pm, that will always be first prize, although the value adopted will need explicit substantiation in reporting, along with the value of r.
  2. There is some evidence that responders may be those who feel more strongly about a topic, whilst non-responders may be more phlegmatic. This suggests that some form of reversion to the mean might be a useful conceptual basis for choosing a value for pm. If this protocol is adopted, pm should be set either at a middle value such as 50% or somewhere between pn and 50%, perhaps at the simple average of pn and 50%. The wisdom of this or any other choice will be context-dependent, and whichever is adopted must be stated explicitly, with reasons. As a first recommendation for Protocol 2, we suggest setting pm halfway between pn and 50%:

     $$p_m = \frac{p_n + 50\%}{2}$$

  3. Alternatively, set pm to be 50%.

In the example used so far, where pn = 43%, Protocols 2 and 3 would set pm at 46,5% and 50% respectively.

For convenience, tables of pt against r and pm (or pn) are given for each of these two scenarios at the end of the third article in this series. Because they require interpolation, the tables give only a first approximation; it is better to use the equations directly.
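Evaluating the equations directly is straightforward. The sketch below applies Protocols 2 and 3 to the running example (pn = 43%) and prints pt for a range of response rates; the function and constant names are hypothetical, and the grid of r values is arbitrary.

```python
def p_t(p_n, p_m, r):
    """p_t = r * p_n + (1 - r) * p_m."""
    return r * p_n + (1.0 - r) * p_m


def protocol_2_pm(p_n):
    """Protocol 2: p_m halfway between p_n and 50%."""
    return (p_n + 0.5) / 2.0


PROTOCOL_3_PM = 0.5  # Protocol 3: p_m fixed at 50%

p_n = 0.43  # the running example
print(f"p_m: Protocol 2 = {protocol_2_pm(p_n):.3f}, Protocol 3 = {PROTOCOL_3_PM:.3f}")
print(" r    p_t (P2)  p_t (P3)")
for r in (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9):
    print(f"{r:.1f}   {p_t(p_n, protocol_2_pm(p_n), r):.3f}     "
          f"{p_t(p_n, PROTOCOL_3_PM, r):.3f}")
```

Any other value of r, or any other protocol for pm, can be substituted without interpolating.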

Whichever protocol is used, and whatever value is finally adopted for pm, must be stated in the final report along with the response rate, r. The reason should now be clear: otherwise we can have no idea of the possible effect of differences between non-responders and responders, or of the bias thereby introduced. This point is largely ignored in many surveys reported today.

Studies that omit to mention the response rate or any assumed value for pm are implicitly assuming that pm = pn.

On what basis?

By definition, the two groups are different for some reason. An attempt must be made, and reported, either to find that reason or to obtain further data from the m non-responders. Alternatively, the value of pm adopted, and the reason for adopting it, must be stated.


[4] This approach is mathematically the same as that proposed by William Cochran in his 1977 book Sampling Techniques, and repeated in Moser and Kalton’s Survey Methods in Social Investigation. Kish and Cochran between them effectively laid the foundations of sample design. It is surprising that their approach is not widely recognised and used as a matter of course today. The approach given here uses relative ratios rather than the absolute numbers of the original, but the idea is identical.

[5] Note that if one uses the complements of pn and pm (0,57 and 0,85 in the example), the same bias of six points emerges, in the opposite direction.