Non-response – Getting more technical (Article 3)

Author: Neil Higgs

Effect on Confidence Intervals

Estimating pm is subjective and open to challenge, but, ultimately, it is about estimating a bias rather than a variable error. One could argue a case for using varying estimates of pm to observe the effect on the estimate of pt, but, as Cochrane points out, pm cannot assume multiple values at the same time.

If we were indeed to examine how pt varies with different assumptions for pm, that might be termed the modelling of uncertainty, but it has nothing to do with traditional confidence intervals.

A recommended protocol, then, is to model such uncertainty via different assumptions for pm, treating this as a sensitivity test of how serious those assumptions are.

Sensitivity to those assumptions is driven by the magnitude of r, again underlining the critical importance of stating the response rate. If non-response is small, the whole issue is less serious – but if it is not, such modelling of uncertainty might be important.

What might affect a confidence interval (ci), however, is that r itself is a random variable – we would expect a different value of r if we were to repeat the study. This appears to be an area neglected by other authorities.

A confidence interval for r can be calculated in the usual way, as can that for pt. The interval for r is based on a sample size of (n + m); that for pt on n, as this would have been the sample size without non-response.

However, the two confidence intervals must essentially be combined in some way, which raises the question of the level of confidence to use: if we use a 95% confidence interval for both variables, the result will be unduly pessimistic, with a confidence level of 1 – α² = 0,9975, as the two results are independent. A better level might be α = 0,2, giving 1 – α² = 1 – 0,04 = 0,96 and 1,28 standard errors [6].
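These multipliers are easy to verify; the short Python sketch below (our own, using only the standard library) recovers both the 1,28 and the 1,22 figures:

```python
import math
from statistics import NormalDist

def z_for(conf):
    """Two-sided standard-error multiplier for a given confidence level."""
    return NormalDist().inv_cdf(1 - (1 - conf) / 2)

# An 80% interval for each variable uses the familiar 1,28 standard errors
z80 = z_for(0.80)

# For an exact combined 95% level we need 1 - alpha**2 = 0.95,
# i.e. alpha = sqrt(0.05) per variable: about 1,22 standard errors [6]
alpha = math.sqrt(0.05)
z_exact = z_for(1 - alpha)
```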

Suggested procedure

The procedure we propose calculates a new value for pt for each end-point value of r from its ci, and then obtains a ci for that value.

We work all this through via an example.

In Kanûsia, a mythical African country, a recent dipstick study of 400 people showed that 29% were pessimistic about the future. The study had a response rate of 67%. Hence, 600 people had been contacted but 200 were non-responders for whatever reason.

If we assume that the non-responders are ambivalent about the country’s future, we can indeed set pm to 50% (Protocol iii). Then –

pt = r × pr + (1 − r) × pm = 0,67 × 0,29 + 0,33 × 0,50 = 0,359 ≈ 36%

where pr is the observed proportion among responders.

A naive confidence interval on this number, based on a sample of 400, would be –

pt ± 1,96 √(pt(1 − pt)/400) = 0,36 ± 0,047

This yields 36% ± 5% or [31%;41%].
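The adjustment and its naive interval can be reproduced in a few lines of Python (a sketch in our own notation, with p_r and p_m standing for pr and pm):

```python
import math

n, m = 400, 200            # responders and non-responders in Kanûsia
r = n / (n + m)            # response rate, about 0,67
p_r = 0.29                 # observed proportion among responders
p_m = 0.50                 # assumed proportion among non-responders (Protocol iii)

# Adjusted overall estimate: responders and non-responders weighted by r
p_t = r * p_r + (1 - r) * p_m

# Naive 95% Wald interval on a sample of 400, treating r as fixed
half_width = 1.96 * math.sqrt(p_t * (1 - p_t) / n)
lower, upper = p_t - half_width, p_t + half_width
```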

(Note that conventional reporting would have quoted 29% ± 4,4% or about [25%;33%], substantially different.)

Next, we consider r as a random variable in its own right.

The 95% ci for r is –

r ± 1,96 √(r(1 − r)/(n + m)) = 0,67 ± 1,96 √(0,67 × 0,33/600) = 0,67 ± 0,038

This yields [63%;71%].

But we need to drop the 95% ci approach and use an 80% ci in each case, as noted earlier.

Then 80% cis for both pt and r give ±3% and ±2,5% respectively (using exact 95% limits and 1,22 standard errors instead, we have ±3% and ±2,4% respectively).

The new procedure then calculates a value for pt for each end-value of r from its ci, and then obtains a ci for that value. For now, we use exact 95% limits (1,22 standard errors) so that we can compare with the conventional results and the naïve result above.

Let the 80% ci for r be denoted by [rl | r | ru]80.

Then we calculate pt for each of [rl | r| ru], and then an 80% ci for each of these.

This gives r = [0,646|0,67|0,694] as lower, middle and upper estimates for r. We now calculate pt for these three values of r as 36,4%, 35,9% and 35,4%.

 A 95% ci for each of these gives ±3,1% in each case, to three decimal places.

The final interval is then the lower limit for pt at rl and the upper limit of pt at ru.

Our final interval is then [32,3;39,5] by subtracting 3,1% from 35,4% and adding 3,1% to 36,4%.
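The whole procedure is compact enough to script; the sketch below is our own notation, applies the 80% multiplier of 1,28 throughout, and reproduces the final interval to within rounding:

```python
import math

def nonresponse_interval(p_r, p_m, n, m, z=1.28):
    """Interval for pt when the response rate r is itself a random variable.

    p_r: observed proportion among responders
    p_m: assumed proportion among non-responders
    z:   standard-error multiplier (1,28 for an 80% interval per variable)
    """
    r = n / (n + m)
    se_r = math.sqrt(r * (1 - r) / (n + m))
    r_lo, r_hi = r - z * se_r, r + z * se_r

    # pt at each end-point of r's interval, each with its own interval on n
    bounds = []
    for rv in (r_lo, r_hi):
        pt = rv * p_r + (1 - rv) * p_m
        hw = z * math.sqrt(pt * (1 - pt) / n)
        bounds += [pt - hw, pt + hw]
    return min(bounds), max(bounds)

# Kanûsia example: 29% among 400 responders, 200 non-responders, pm = 50%
lo, hi = nonresponse_interval(p_r=0.29, p_m=0.50, n=400, m=200)
```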

To compare:

Conventional: [25%;33%]

Naïve:             [31%;41%], adjusting for r as a fixed value

Protocol iii:     [32%;40%], now allowing for r to be a random variable

This suggests that the Protocol iii approach, where we treat r as a random variable, has a small but noticeable additional effect, whereas the need to adjust for bias given non-response has a major effect. The slight narrowing of the Protocol iii interval comes from combining two independent calculations and adjusting for that independence (strictly a Šidák-type adjustment, though often loosely termed a Bonferroni adjustment).

We now show the effects of these decisions on the original sample with a result of 43% (n = 400), an assumed value of pm of 0,5 and r = 80%, so that m = 100.

Then –

pt = 0,8 × 0,43 + 0,2 × 0,50 = 0,444

The original confidence interval for pt is –

pt ± 1,96 √(pt(1 − pt)/400) = 0,444 ± 0,049

That yields (39%;49%), which we take as a benchmark (although a study that ignores the response rate would have reported this as 43% ± 5% or (38%;48%)).
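The same arithmetic for this second example, again as a short Python sketch in our own notation:

```python
import math

n, m = 400, 100                  # so the response rate is 80%
r = n / (n + m)
p_r, p_m = 0.43, 0.50

p_t = r * p_r + (1 - r) * p_m    # adjusted estimate, about 0,444
hw = 1.96 * math.sqrt(p_t * (1 - p_t) / n)

# For comparison: a study that ignores the response rate entirely
hw_ignored = 1.96 * math.sqrt(p_r * (1 - p_r) / n)
```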

The 95% ci for r is –

r ± 1,96 √(r(1 − r)/(n + m))

We shall take this as ±2% for the purpose of this example giving (78%;82%) for r.

If we use 80% cis, we obtain ±1,2%, whereas an exact 95% confidence interval would yield ±1% in round numbers.

To summarise, the procedure now adopted calculates a new value for pt for each end-value of r’s ci and then obtains a ci for that value assuming both pt and r are random variables, not just pt.

As a final point, this suggests that there might be benefit in adopting a relative rather than absolute approach to what is essentially a Bayesian situation.

This entails a further piece of algebra.

Let s be the relative gap between responders and non-responders –

s = (pm − pr)/pm

so that pt = pm(1 − rs).

The term (1 − rs) might be considered as the bias multiplier for different values of r and s. A table of these multipliers is given at the end of this article. (Here, with r = 0,8 and s = 0,14, it is 0,888, or 0,88 to two figures.)
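Reading s as the gap relative to the non-responders – that is, s = (pm − pr)/pm, so that pt = pm(1 − rs), which is our interpretation, chosen because it reproduces the 0,88 quoted – the multiplier is simple to tabulate (a Python sketch; the grid of r and s values is our own):

```python
def bias_multiplier(r, s):
    """Multiplier such that pt = pm * (1 - r*s), with s = (pm - pr) / pm."""
    return 1 - r * s

# Second example from the text: pr = 0,43, pm = 0,50, r = 0,8
pm, pr, r = 0.50, 0.43, 0.8
s = (pm - pr) / pm                     # the relative gap, here 0,14
mult = bias_multiplier(r, s)           # about 0,888, i.e. 0,88 to two figures

# A small grid of multipliers for different r and s values
table = {s_val: [round(bias_multiplier(r_val, s_val), 2)
                 for r_val in (0.5, 0.7, 0.9)]
         for s_val in (0.1, 0.2, 0.3)}
```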

Conclusions

Much effort is usually made to maximise response rates on the grounds of cost and validity.

But the final outcome is almost never reported, a departure from the halcyon days of the past, and an issue actively addressed by Cochrane et al.

Yet non-response rates have a potentially notable effect on the estimates derived from any survey – a serious bias can be introduced by not considering them; even worse is not bothering to report them at all.

Why is this? Time? Cost? Effort? All of the above. Is the issue even still relevant?

Yes, it is relevant. Surveys are still regularly conducted by an ever-increasing variety of data-collection methods. But some people are missed, and this renders the results biased in some way.

At the very least, the response rate must be reported.

Better, a likely value for pm should be given, with explicit reasons, and then taken to its ultimate conclusion to yield a new estimate of pt.

At best, we should be calculating a new ci for that new pt by considering r to be a random variable.

 

[6] If we require an exact 95% confidence level, we need 1,22 standard errors.