Probability-proportional-to-size sampling
by Arman Thakker
In some cases the sample designer has access to an "auxiliary variable" or
"size measure", believed to be correlated with the variable of interest, for each element in the
population. These data can be used to improve accuracy in sample design. One
option is to use the auxiliary variable as a basis for stratification, as
discussed above.
Another option is probability-proportional-to-size ('PPS') sampling, in which
the selection probability for each element is set to be proportional to its
size measure, up to a maximum of 1. In a simple PPS design, these selection
probabilities can then be used as the basis for Poisson sampling, in which each
element is included or excluded independently of the others. However, this has
the drawback that the sample size varies from draw to draw, and different
portions of the population may still be over- or under-represented due to
chance variation in selections. To address this problem, PPS may be combined
with a systematic approach.
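The simple PPS design with Poisson sampling can be sketched as follows. This is a minimal illustration, not a reference implementation: the function names, the four size measures, and the target of two selections are all hypothetical, and the cap at 1 is applied naively, so the expected sample size falls below the target whenever the cap binds.

```python
import random

def pps_inclusion_probabilities(sizes, expected_n):
    """Selection probability proportional to size, capped at 1.

    Assumption: a single naive cap. If any element is capped, the
    probabilities sum to less than expected_n.
    """
    total = sum(sizes)
    return [min(1.0, expected_n * s / total) for s in sizes]

def poisson_sample(probabilities, rng):
    """Include each element independently with its own probability."""
    return [i for i, p in enumerate(probabilities) if rng.random() < p]

# Hypothetical size measures for four elements.
sizes = [10, 20, 30, 40]
probs = pps_inclusion_probabilities(sizes, expected_n=2)

# Repeated draws show that the realized sample size is not fixed.
rng = random.Random(0)
draws = [poisson_sample(probs, rng) for _ in range(5)]
print(probs)
print([len(d) for d in draws])
```

The probabilities sum to the target of two, but each individual draw may contain anywhere from zero to four elements, which is the variable sample size described above.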
Example: Suppose we have six schools with populations of 150, 180, 200, 220,
260, and 490 students respectively (total 1500 students), and we want to use
student population as the basis for a PPS sample of size three. To do this, we
could allocate the first school numbers 1 to 150, the second school 151 to
330 (= 150 + 180), the third school 331 to 530, the fourth school 531 to 750,
the fifth school 751 to 1010, and the last school 1011 to 1500. We then
generate a random start between 1 and 500 (the sampling interval, equal to
1500/3) and step through the allocated numbers in increments of 500. If our
random start was 137, we would select the schools which have been allocated
numbers 137, 637, and 1137, i.e. the first, fourth, and sixth schools.
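The school example can be reproduced with a short sketch. The function name systematic_pps and its parameters are illustrative assumptions rather than part of any standard library; the sizes and the random start of 137 come from the example above.

```python
from bisect import bisect_left
from itertools import accumulate
import random

def systematic_pps(sizes, n, start):
    """Return 0-based indices of the units hit by start, start + interval, ...

    Units are laid end to end on a line of length sum(sizes); unit i owns
    the numbers just above its predecessor's cumulative total, up to and
    including its own. A unit larger than the interval can be hit twice.
    """
    cumulative = list(accumulate(sizes))
    interval = cumulative[-1] / n
    if not 0 < start <= interval:
        raise ValueError("start must lie in (0, interval]")
    selected = []
    for k in range(n):
        point = start + k * interval
        # First unit whose cumulative total reaches the selection point.
        index = bisect_left(cumulative, point)
        # Guard against floating-point overshoot at the very end.
        selected.append(min(index, len(sizes) - 1))
    return selected

schools = [150, 180, 200, 220, 260, 490]
picked = systematic_pps(schools, n=3, start=137)
print([i + 1 for i in picked])  # [1, 4, 6]

# In practice the start would be drawn at random from 1..500.
random_start = random.randint(1, 500)
random_pick = systematic_pps(schools, n=3, start=random_start)
```

Unlike the Poisson design, every draw here returns exactly three selections, and the fixed spacing of 500 spreads them across the cumulated list.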
The PPS approach can improve accuracy for a given sample size by concentrating
the sample on large elements that have the greatest impact on population
estimates. PPS sampling is commonly used for surveys of businesses, where
element size varies greatly and auxiliary information is often available. For
instance, a survey attempting to measure the number of guest-nights spent in
hotels might use each hotel's number of rooms as an auxiliary variable. In some
cases, an older measurement of the variable of interest can be used as an
auxiliary variable when attempting to produce more current estimates.