Paper 247-2009
Extreme Survey Weight Adjustment as a Component of Sample Balancing (a.k.a.
Raking)
David Izrael, Abt Associates Inc., Cambridge, MA
Michael P. Battaglia, Abt Associates Inc., Cambridge, MA
Martin R. Frankel, Baruch College, CUNY and Abt Associates Inc., Cambridge, MA
ABSTRACT
Raking is a widely used technique for developing survey weights. It assigns a weight value to each sampling unit
such that the weighted distribution of the sample is in very close agreement with two or more marginal control
variables. For example, in household surveys the control variables are typically sample design and sociodemographic variables. Raking is an iterative process that uses the sample design weight as the starting weight and
terminates when the convergence criterion is achieved. The resulting final weight may however exhibit considerable
variability, with some sampling units having extremely low or high weights relative to most of the other sampling units.
This leads to inflated sampling variances of the survey estimates. To combat this problem we developed and
incorporated into our popular IHB Raking macro two weight trimming procedures that are implemented during the
actual iterative process, allowing one to achieve convergence while controlling the highest and lowest weight values.
The new procedures work under SAS® v. 8.2 and higher and are intended for a medium or high skill level audience.
INTRODUCTION
A survey sample may cover segments of the target population in proportions that do not match the proportions of
those segments in the population itself. The differences may arise, for example, from sampling fluctuations, from
nonresponse, or because the sample design was not able to cover the entire population. In such situations one can
often improve the relation between the sample and the population by adjusting the sampling weights of the cases in
the sample so that the marginal totals of the adjusted weights on specified characteristics agree with the
corresponding totals for the population. This operation is known as raking ratio estimation (Deming 1943, Kalton
1983), raking, or sample-balancing, and the population totals are usually referred to as control totals. Raking may
reduce nonresponse and noncoverage biases. The initial sampling weights in the raking process, generally referred
to as the design or input weights, are often equal to the reciprocal of the probability of selection and may have
undergone some adjustments for unit nonresponse and noncoverage. The weights from the raking process,
sometimes referred to as the final weights or the raked weights, are used in estimation and analysis, and add to the
total population size.
Raking usually proceeds one variable at a time, applying a proportional adjustment to the weights of the cases that
belong to the same category of the control variable. Convergence of the raking algorithm has received considerable
attention in the statistical literature, especially in the context of iterative proportional fitting for log-linear models,
where the number of variables is at least three and the process begins with a different set of initial values in the fitted
table (often 1 in each cell). One simple definition of convergence requires that each marginal total of the raked
weights be within a specified tolerance of the corresponding control total. As noted above, in practice, when a
number of raking variables are involved, one must check for the possibility that the iterations do not converge (e.g.,
because of sparseness or some other feature in the full cross-classification of the sample). One can guard against
this possibility by setting an upper limit on the number of iterations. As elsewhere in data analysis, it is sensible to
examine the sample (including its joint distribution with respect to all the raking variables) before doing any raking.
For example, if the sample contains no cases in a category of one of the raking variables, it will be necessary to
revise the set of categories and their control totals (say, by combining categories).
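The one-variable-at-a-time adjustment and the tolerance check described above can be sketched in a few lines. The sketch is in Python rather than SAS and is not the IHB macro's code: the function name `rake`, its arguments, and the toy data are hypothetical. The 0.025 tolerance on percentage differences and the limit of 75 iterations are the values used in the macro calls shown later in this paper.

```python
def rake(weights, categories, controls, tol_pct=0.025, max_iter=75):
    """Rake weights to marginal control totals (illustrative sketch).

    weights    : input (design) weights, one per respondent
    categories : dict margin -> list of category labels, one per respondent
    controls   : dict margin -> dict category -> control total
    Returns (raked weights, iterations used, converged flag).
    """
    w = list(weights)
    for iteration in range(1, max_iter + 1):
        # One pass: proportional adjustment, one margin at a time.
        for margin, cats in categories.items():
            sums = {}
            for wk, c in zip(w, cats):
                sums[c] = sums.get(c, 0.0) + wk
            w = [wk * controls[margin][c] / sums[c] for wk, c in zip(w, cats)]
        # Convergence: largest gap between weighted and target percents.
        total = sum(w)
        worst = 0.0
        for margin, cats in categories.items():
            sums = {}
            for wk, c in zip(w, cats):
                sums[c] = sums.get(c, 0.0) + wk
            target_total = sum(controls[margin].values())
            for c, target in controls[margin].items():
                gap = abs(100 * sums.get(c, 0.0) / total - 100 * target / target_total)
                worst = max(worst, gap)
        if worst < tol_pct:
            return w, iteration, True
    return w, max_iter, False  # upper limit on iterations reached


# Toy data (hypothetical): four respondents, two margins.
categories = {"sex": ["M", "M", "M", "F"], "age": ["Y", "O", "Y", "O"]}
controls = {"sex": {"M": 60.0, "F": 40.0}, "age": {"Y": 50.0, "O": 50.0}}
raked, iterations, converged = rake([1.0] * 4, categories, controls)
```

The returned flag distinguishes the two termination reasons mentioned above: the tolerance was met, or the iteration limit was reached without convergence.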
Izrael et al. (2000) introduced a SAS macro for raking (sometimes referred to as the IHB raking macro) that combines
simplicity and versatility. More recently, the IHB raking macro was enhanced to increase its utility and diagnostics
(Izrael et al. 2004). The IHB SAS macro produces diagnostic output that contains the following information: number
of iterations, name of variable currently being raked on, name of BY-variable if there is one, and marginal control total
and calculated total weight for each level of the current raking variable, along with their difference and percentage
difference. At termination, the macro gives the iteration number at which termination occurred and the reason, which
is either that the tolerance has been met or that the process did not converge. The macro also writes diagnostics into
the SAS LOG from several of the checks that it makes.
We received many requests from SAS users for the IHB raking macro and consequently made it available at
http://www.abtassociates.com/Page.cfm?PageID=8600 (also see the technical paper at
http://www.abtassociates.com/attachments/raking_survey_data_2_JOS.pdf ).
One limitation of the IHB macro for raking is that it does not place any limits on the highest and lowest weight values.
In some situations the raking may converge but the resulting weights exhibit considerable variability as measured by
the ratio of the highest to lowest weight values, and by the design effect due to weighting, $1 + cv^2$, where cv is the
coefficient of variation of the weights. To enhance the usefulness of the IHB raking macro to SAS users, our goal
was therefore to develop a new SAS raking macro that uses weight trimming to reduce variability in the weights.
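As a small illustration of the design effect due to weighting, the Python sketch below computes 1 + cv² for a list of weights. The function name is hypothetical, and the use of the population (divide-by-n) standard deviation is an assumption; the paper does not state which form the macro uses.

```python
import statistics


def design_effect(weights):
    """Design effect due to weighting: 1 + cv^2 (sketch; population SD assumed)."""
    mean = statistics.fmean(weights)
    cv = statistics.pstdev(weights) / mean  # coefficient of variation
    return 1 + cv ** 2


equal = design_effect([1.0, 1.0, 1.0, 1.0])  # no variability in the weights
unequal = design_effect([1.0, 3.0])          # mean 2, SD 1, cv 0.5
```

With equal weights the design effect is 1; the more the weights vary, the larger the expected inflation of sampling variance.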
WEIGHT TRIMMING
Weight trimming refers to increasing the value of extremely low weights and decreasing the value of extremely high
weight values to reduce their impact on the variance of the estimates, especially for subgroup estimates. For
example, all weights that are less than X are increased to X, and all weights that are greater than Y are reduced to Y.
One consequence of trimming low and high weight values is that the weights of the entire sample will no longer add
to the population size. Although weight trimming is a separate topic from raking, the two are related in the
sense that weight trimming typically takes place at the last step in the weight calculations, which is often raking. The
objective of weight trimming is to reduce the mean squared error (MSE) of the key outcome estimates. By trimming
low and high weight values one generally lowers sampling variability but may incur some bias. The MSE will be lower
if the reduction in variance is large relative to the increase in bias arising from weight trimming.
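A minimal Python sketch of this simple form of trimming, using made-up weights and limits (X = 10 and Y = 200 are hypothetical), shows why the trimmed weights no longer add to the original total:

```python
def trim(weights, low, high):
    """Raise weights below `low` to `low` and reduce weights above `high` to `high`."""
    return [min(max(w, low), high) for w in weights]


weights = [2.0, 50.0, 60.0, 70.0, 400.0]      # hypothetical input weights
trimmed = trim(weights, low=10.0, high=200.0)

total_before = sum(weights)   # 582.0
total_after = sum(trimmed)    # 390.0: the population total is no longer reproduced
```

This loss of agreement with the population total is what motivates carrying out the trimming inside the raking iterations rather than after them.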
There are no strict rules or procedures either to define extreme weights or for trimming the weights. Different surveys
follow different rules and therefore in practice there are several procedures to trim extreme weights. Some common
procedures for trimming large weights include: 1) identifying any weight bigger than 4 or 5 times the mean weight as
an outlier weight and trimming that weight by making it equal to the limit, 2) identifying any weight bigger than the
median weight plus 5 or 6 times the inter-quartile range of the weights and trimming the weight by making it equal to
the limit, and 3) truncating weights above a certain percentile like 95 or 99 in the distribution of weights.
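The three common rules can be compared side by side. The Python sketch below computes the high-weight limit implied by each rule; the function name and default multipliers are hypothetical choices within the ranges quoted above, and the percentile definition is the default of Python's `statistics.quantiles`, which may differ slightly from other software.

```python
import statistics


def high_weight_limits(weights, mean_mult=5.0, iqr_mult=6.0, pct=95):
    """Upper trimming limits under three common rules (illustrative sketch)."""
    q = statistics.quantiles(weights, n=100)  # 99 cut points; q[p - 1] is the p-th percentile
    q1, q3 = q[24], q[74]
    return {
        "mean_rule": mean_mult * statistics.fmean(weights),
        "iqr_rule": statistics.median(weights) + iqr_mult * (q3 - q1),
        "percentile_rule": q[pct - 1],
    }


limits = high_weight_limits([float(v) for v in range(1, 101)])
```

Any weight above the chosen limit would then be set equal to that limit.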
We developed two alternative weight trimming methods. Both are implemented during the raking iterative process in
order to ensure that: 1) limits are placed on low and high weight values in the final weights, 2) the convergence
criteria are satisfied, and 3) the weights sum to the population total. The first method goes beyond the commonly used
procedures by allowing control over extreme weights, not only in terms of their relationship to the mean weight, but
also in terms of the magnitude of change from individual input weight values.
IGCV TRIMMING METHOD
The IGCV (Individual and Global Cap Value) method is based on the specification of global low and high weight cap
factors, and individual low and high weight cap values. The global low cap value (GLCV) equals the mean of the
input weights times a user-specified factor less than one. The global high cap value (GHCV) equals the mean of the
input weights times a user-specified factor greater than one. The individual low and high weight cap values (ILCV and
IHCV, respectively) are calculated separately for each respondent in the survey. The individual low cap value equals
the respondent’s input weight value times a factor less than one. The individual high cap value equals the
respondent’s input weight value times a factor greater than one. In rare situations it is possible that a respondent may
have an individual high cap value less than the global low weight cap value or an individual low weight cap value
greater than the global high weight cap value. When the former occurs the GLCV is used in weight trimming. When
the latter occurs the GHCV is used in weight trimming.
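The four cap values can be illustrated with a short Python sketch. The factor names a, b, c, d mirror the macro parameters A, B, C, D described later, but the function itself is hypothetical. In particular, how the global and individual caps are combined for an ordinary respondent is an assumption here (the tighter of the two is used on each side); the paper states explicitly only the two rare cases.

```python
def igcv_caps(input_weights, a=5.0, b=0.2, c=11.0, d=0.091):
    """Per-respondent (low, high) caps for the IGCV method (illustrative sketch)."""
    mean_w = sum(input_weights) / len(input_weights)
    glcv = d * mean_w  # global low cap value
    ghcv = c * mean_w  # global high cap value
    caps = []
    for w in input_weights:
        ilcv = b * w  # individual low cap value
        ihcv = a * w  # individual high cap value
        if ihcv < glcv:        # rare case stated in the text: GLCV is used
            low = high = glcv
        elif ilcv > ghcv:      # rare case stated in the text: GHCV is used
            low = high = ghcv
        else:                  # assumption: the tighter cap applies on each side
            low = max(ilcv, glcv)
            high = min(ihcv, ghcv)
        caps.append((low, high))
    return glcv, ghcv, caps


# Hypothetical input weights with mean 1000.
glcv, ghcv, caps = igcv_caps([10.0, 500.0, 1000.0, 2490.0])
```

In this toy example the first respondent has IHCV = 50, which is below GLCV = 91, so the GLCV is used for that respondent.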
The IGCV method is implemented at each iteration after the raking adjustment procedure is applied to each control
variable within that iteration. The weight trimming is implemented for up to 50 cycles (see example in Table 1). Let h
= iteration number, i = raking margin (i.e., control variable), j = category of variable i, k = respondent, l = weight
trimming cycle (step) for category j of variable i at iteration h, and m = weight adjustment cycle (step) after trimming
for weight trimming cycle l of category j of variable i at iteration h. At cycle l the program indicates how many
respondents had low weights that were increased and gives the sum of the weights before and after trimming for
those respondents. At cycle l the program also indicates how many respondents had high weights that were
decreased and gives the sum of the weights before and after trimming for those respondents.
For example, at iteration h = 1, control variable i = 1, control variable category j = 1, and trimming cycle l = 1, we calculate the sum of the weights $WT_{11kl}$ over the respondents k that had their weights trimmed. Call this total $X_{1111}$ (i.e., $X_{hijl}$). We then calculate

$$Y_{1111} = POP_{11} - X_{1111},$$

where $POP_{11}$ is the control total for category 1 of control variable 1. For weight adjustment cycle m = 1 of trimming cycle l = 1, we ratio-adjust the weights of the respondents that did not have their weights trimmed:

$$WT_{11k11} = WT_{11kl} \times \frac{Y_{1111}}{\sum_{k' \notin T} WT_{11k'l}},$$

where T denotes the set of respondents whose weights were trimmed. If the respondent had their weight trimmed, then $WT_{11k11} = WT_{11kl}$.
This is implemented for each category j of control variable 1. We then go to cycle l = 2 and determine whether any
respondents that did not have their weights trimmed at cycle l = 1 have weights that now exceed the trimming values.
We apply the weight trimming to those respondents. The cycling continues until no respondents in any
category of control variable 1 have their weights trimmed or a maximum of 50 cycles is reached. At that point the
procedure is applied to control variable 2.
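The trim-then-ratio-adjust cycle for a single category of a control variable can be sketched as follows. This is a Python illustration, not the macro's SAS code: the function and argument names are hypothetical, and it assumes that a weight, once trimmed, stays fixed in later cycles of the same category.

```python
def trim_and_redistribute(weights, low_caps, high_caps, control_total, max_cycles=50):
    """Trim weights to their caps, then ratio-adjust the untrimmed weights so the
    category still sums to its control total. Returns (weights, cycles used)."""
    w = list(weights)
    trimmed = [False] * len(w)
    for cycle in range(1, max_cycles + 1):
        newly_trimmed = 0
        for k, wk in enumerate(w):
            if trimmed[k]:
                continue
            capped = min(max(wk, low_caps[k]), high_caps[k])
            if capped != wk:
                w[k] = capped
                trimmed[k] = True
                newly_trimmed += 1
        if newly_trimmed == 0:
            return w, cycle  # nothing left to trim in this category
        x = sum(wk for wk, t in zip(w, trimmed) if t)                   # X: trimmed total
        untrimmed_sum = sum(wk for wk, t in zip(w, trimmed) if not t)
        if untrimmed_sum == 0:
            return w, cycle  # no untrimmed respondents left to absorb the difference
        y = control_total - x                                           # Y = POP - X
        w = [wk if t else wk * y / untrimmed_sum for wk, t in zip(w, trimmed)]
    return w, max_cycles


# Hypothetical category: control total 200, every respondent capped at [5, 60].
new_w, cycles = trim_and_redistribute(
    [10.0, 20.0, 30.0, 140.0], [5.0] * 4, [60.0] * 4, control_total=200.0
)
```

In this toy case the first cycle trims the weight of 140, the ratio adjustment then pushes another weight above its cap, a second cycle trims it, and the third cycle finds nothing left to trim.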
Table 1: Example of weight trimming cycles for a control variable during an iteration

Low Weights Increased
       Total        Total Sum   Number of    Sum of Weights   Sum of Weights  Total Weight Increase
Cycle  Respondents  of Weights  Respondents  Before Trimming  After Trimming  for Cases with IHCV < GLCV
1      21507        4952569.00  769          14371.15         16114.56        0.00
2      21507        4952569.00  0            0.00             0.00            0.00
3      21507        4952569.00  0            0.00             0.00            0.00
4      21507        4952569.00  0            0.00             0.00            0.00

High Weights Decreased
       Number of    Sum of Weights   Sum of Weights  Total Weight Decrease
Cycle  Respondents  Before Trimming  After Trimming  for Cases with ILCV > GHCV
1      127          228665.04        181973.93       0.00
2      41           47037.81         44531.67        0.00
3      2            5069.78          5066.10         0.00
4      0            0.00             0.00            0.00
As discussed below it is possible that the raking will not converge when the IGCV method is used. To reduce
computer run time we want to minimize the number of rakings that run for the maximum number of iterations (typically
set at 75) and do not converge. Our experience with the rakings that do not converge using the IGCV method is that
one of two situations generally occurs: 1) the Maximum Absolute Value of Difference in % reaches a level above
0.025% (our routine convergence criterion) and does not change (flat condition), and 2) the Maximum Absolute Value
of Difference in % fluctuates from iteration to iteration, getting slightly larger and then slightly smaller (oscillation
condition) at a level above 0.025%.
We therefore always let the IGCV raking run for at least 10 iterations. Starting at iteration 11 we
calculate the change in the Maximum Absolute Value of Difference in % compared to iteration 10. We do the same
for iterations 12 (i.e., iteration 12 compared to iteration 11), 13, 14, and 15, so we have five sequential change
measures. If these change measures decrease monotonically we allow the raking program to continue because it is
moving towards the convergence criterion of 0.025%. If the measure is flat or oscillating we stop the raking program. We
repeat this process for iterations 13 to 17, 14 to 18, etc. (i.e., 5 iterations at a time).
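One way to express this early-termination check is sketched below in Python. The function name and arguments are hypothetical, and the reading of "decrease monotonically" as "the Maximum Absolute Value of Difference in % falls at every one of the last five steps" is an assumption, not a statement of the macro's exact rule.

```python
def should_stop(history, start=15, window=5):
    """Decide whether to stop a raking run that shows no sign of converging.

    history[i] holds the Maximum Absolute Value of Difference in % after
    iteration i + 1. No check is made before iteration `start`.
    """
    if len(history) < max(start, window + 1):
        return False
    recent = history[-(window + 1):]  # last six values give five changes
    changes = [later - earlier for earlier, later in zip(recent, recent[1:])]
    # Continue only if the difference fell at every step; flat or
    # oscillating sequences trigger the stop.
    return not all(change < 0 for change in changes)


first_ten = [1.0, 0.5, 0.3, 0.2, 0.1, 0.08, 0.06, 0.05, 0.045, 0.04]
flat = first_ten + [0.04] * 5
oscillating = first_ten + [0.05, 0.04, 0.05, 0.04, 0.05]
improving = [2.0 / (i + 1) for i in range(15)]
```

Calling the function once per iteration on the growing history reproduces the sliding five-iteration window.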
Table 2 gives an example of the four user specified cap values. The cap values should be determined by a survey
statistician through an examination of the distribution of the input weights, the closeness of the design-weighted
sample to the marginal control totals, and the degree of control desired over increases in sampling variability. In our
example, we did some testing of alternative multipliers and determined that using the mean input weight times 11.0
as the global high weight cap factor allowed the raking to converge while preventing respondents from having weights
that were much higher than the mean weight. Although other approaches are possible, we maintained symmetry in
the global trimming by using the reciprocal of 11.0 to determine the global low weight cap value multiplier of 0.091.
The individual high weight cap value factor was set to the respondent’s input weight times 5.0 based on our desire to
control the increase in sampling variability from respondents ending up with weights that exceed their input weights
by a factor considerably higher than 5.0. Again, to maintain symmetry we specified the individual low weight cap
value factor at 0.20 = 1/5.0. If the raking converges one can consider attempting to reduce the degree of variability in
the final weights by using lower trimming factors (e.g., IHCV factor = 4.00). If the raking does not converge one can
increase the trimming factors (e.g., IHCV factor = 6.00).
Table 2: Example of the four user specified cap values for the IGCV method
Global low weight cap value factor: Mean input weight times 0.091
Global high weight cap value factor: Mean input weight times 11.00
Individual low weight cap value (ILCV) factor: Respondent's weight times 0.20
Individual high weight cap value (IHCV) factor: Respondent's weight times 5.00
MCV TRIMMING METHOD
The MCV (Margin Cap Value) method takes each margin (control variable) and independently ratio adjusts the input
weights so that the weighted sample totals are in exact agreement with the control totals. This process takes place
before the raking iterations start. For each survey respondent the program then looks across all the raking margins
and determines the minimum value of the ratio-adjusted input weight and the maximum value of the ratio-adjusted
input weight. This then determines the low weight and high weight cap values for each respondent during the raking
iterations. Table 3 gives an example of the user specified low and high weight cap values. The cap values should be
determined by a survey statistician through an examination of the distribution of the input weights and the closeness
of the design-weighted sample to the marginal control totals. In our example we examined the distribution of the
maximum control variable ratio-adjusted weights and wanted to limit further increases in sampling variability by only
allowing the weight to be increased by a small relative additional amount. Hence we selected a small multiplier of 2.5
as the individual high weight cap value factor. The same relative factor was used for the minimum control variable
ratio-adjusted weights by specifying 1/2.5 = 0.40.
Table 3: Example of two user specified cap values for the MCV method
Individual high weight cap value factor: Respondent's maximum control variable ratio-adjusted weight times 2.5
Individual low weight cap value factor: Respondent's minimum control variable ratio-adjusted weight times 0.40
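The derivation of the MCV cap values can be sketched in Python as follows. The function name and the toy data are hypothetical; the arguments maxtrimm and mintrimm mirror the macro parameters MAXTRIMM and MINTRIMM described later.

```python
def mcv_caps(input_weights, categories, controls, maxtrimm=2.5, mintrimm=0.40):
    """Per-respondent (low, high) caps for the MCV method (illustrative sketch).

    Each margin is ratio-adjusted independently to its control totals; the
    minimum and maximum adjusted weight across margins set the caps.
    """
    n = len(input_weights)
    lowest = [float("inf")] * n
    highest = [0.0] * n
    for margin, cats in categories.items():
        sums = {}
        for w, c in zip(input_weights, cats):
            sums[c] = sums.get(c, 0.0) + w
        for k, (w, c) in enumerate(zip(input_weights, cats)):
            adjusted = w * controls[margin][c] / sums[c]  # ratio-adjusted input weight
            lowest[k] = min(lowest[k], adjusted)
            highest[k] = max(highest[k], adjusted)
    return [(mintrimm * lo, maxtrimm * hi) for lo, hi in zip(lowest, highest)]


# Toy data (hypothetical): four respondents, two margins.
caps = mcv_caps(
    [10.0, 10.0, 20.0, 20.0],
    {"sex": ["M", "M", "F", "F"], "age": ["Y", "O", "Y", "O"]},
    {"sex": {"M": 45.0, "F": 55.0}, "age": {"Y": 40.0, "O": 60.0}},
)
```

Because the caps follow the largest and smallest single-margin adjustment for each respondent, a respondent in a heavily undersampled stratum automatically receives a higher cap.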
The early termination procedure that we developed for the IGCV method has also been applied to the MCV method.
Both weight trimming methods are available in the new SAS raking macro. We have generally used the IGCV
method because it allows for excellent control over extreme weight values through the use of global and individual
controls on the weights. However, in some situations the IGCV method can result in non-convergence of the raking
iterations. For example, if one has a disproportionate stratified sample design in which some strata have been
heavily oversampled and the stratum variable is used as a margin in the raking along with other control variables, the
imposition of global and individual cap values may cause the raking to never get close enough to the control totals for
the stratum variable and possibly other control variables to meet the convergence criterion. When this occurs, the
MCV method can be used. It circumvents this problem by specifying the individual low and high cap values for each
respondent by looking for the control margin that gives the lowest adjustment to the input weight of the respondent
and for the control margin that gives the highest adjustment to the input weight of the respondent. So for example, if
the highest input weight adjustment for a respondent results from their being in a sampling stratum that was heavily
undersampled, then their high weight cap value will be determined by that sampling stratum. This will in turn make it
much more likely that the raking iterations will converge. Our experience is that the MCV method generally trims the
weights of fewer respondents than the IGCV method.
%RAKE_AND_TRIMM SAS MACRO
We combined both trimming procedures, IGCV and MCV, along with the routine IHB raking macro (with no weight
trimming) in one macro, %rake_and_trimm, building it on the “chassis” of the IHB raking macro.
MACRO CALL
For routine IHB raking macro:
%rake_and_trimm
(
  inds=inputds,
  outds=b,
  inwt=_wt2new,
  freqlist=,
  outwt=RAKED_WGT,
  varlist=first_margin second_margin third_margin fourth_margin fifth_margin
          sixth_margin seventh_margin eighth_margin ninth_margin tenth_margin eleventh_margin,
  numvar=11,
  cntotal=100,
  trmprec=100,
  trmpct=0.025,   /* macro will terminate based on this criterion */
  numiter=75,
  prdiag=N,       /* N - condensed diagnostics, Y - full printout */
  namertf=,       /* name of rtf output */
  MethTrimm=      /* to run routine raking with no trimming MethTrimm must be blank */
);
For IGCV trimming method:
%rake_and_trimm
(
  inds=pa_input,
  outds=b,
  inwt=_wt2new,
  freqlist=,
  outwt=RAKED_TRIMMED_WGT,
  varlist=first_margin second_margin third_margin fourth_margin fifth_margin
          sixth_margin seventh_margin,
  numvar=7,
  cntotal=100,
  trmprec=1,
  trmpct=0.025,     /* macro will terminate based on this criterion */
  numiter=75,
  prdiag=N,         /* N - condensed diagnostics, Y - full printout */
  MethTrimm=IGCV,   /* method of trimming - IGCV or MCV */
  A=5,
  B=0.2,
  C=11.0,
  D=0.091,
  INOC=15           /* iteration from which to start checking for signs of non-convergence */
);
For MCV trimming method:
%rake_and_trimm
(
  inds=az_input,
  outds=b,
  inwt=_wt2new,
  freqlist=,
  outwt=RAKED_TRIMMED_WGT,
  varlist=first_margin second_margin third_margin fourth_margin fifth_margin
          sixth_margin seventh_margin,
  numvar=7,
  cntotal=100,
  trmprec=1,
  trmpct=0.025,    /* macro will terminate based on this criterion */
  numiter=75,
  prdiag=N,        /* N - condensed diagnostics, Y - full printout */
  MethTrimm=MCV,   /* method of trimming - IGCV or MCV */
  MAXTRIMM=2.5,
  MINTRIMM=0.40
);
MACRO PARAMETERS
For users who have executed our IHB raking macro, most of the macro parameters should look familiar. They are
described in detail in our previous presentations (Izrael et al. 2000, Izrael et al. 2004). Nonetheless, we mention
all of them here, concentrating on those that are specific to the two trimming methods.
Raking macro parameters:
inds      - name of input data set
outds     - name of output data set
inwt      - input raking weight being adjusted; if there is no weight, 1 is assigned
freqlist  - list of data sets with marginal control totals or percents
outwt     - name of raked and trimmed weight
varlist   - list of raking variables
numvar    - number of raking variables
cntotal   - general control total
trmprec   - termination criterion based on marginal totals
trmpct    - termination criterion based on marginal percents
numiter   - number of iterations, default is 75
prdiag    - print detailed diagnostics, default is N
Trimming macro parameters:
a) if IGCV method is executed:
MethTrimm = IGCV,   /* trimming method */
A = 5,              /* factor by which respondent's weight is multiplied to get IHCV */
B = 0.2,            /* factor by which respondent's weight is multiplied to get ILCV */
C = 11.0,           /* factor by which mean input weight is multiplied to get GHCV */
D = 0.091,          /* factor by which mean input weight is multiplied to get GLCV */
INOC = 15           /* iteration from which to start checking backward for signs of non-convergence */
As discussed above the user can change the above macro parameters A, B, C, and D.
b) if MCV method is executed:
MethTrimm = MCV,    /* trimming method */
MAXTRIMM = 2.5,     /* factor by which maximum control variable ratio-adjusted weight is multiplied */
MINTRIMM = 0.40     /* factor by which minimum control variable ratio-adjusted weight is multiplied */
As discussed above the user can change the above macro parameters MAXTRIMM and MINTRIMM.
c) if routine IHB macro with no trimming is executed:
MethTrimm = ,       /* blank - no trimming */
MACRO OUTPUT
To save time and paper we made the condensed output the default. For the IGCV method it includes:
a) lines with trimming parameters and input weight.
Sample size of completed interviews: 13231
Raking input weight adjusted to population total: _WT2NEW_ATPT
Mean value of raking input weight adjusted to population total: 732.57
Minimum value of raking input weight: 7.30
Maximum value of raking input weight: 5873.73
Coefficient of variation of raking input weight: 1.17
Global low weight cap value (GLCV): 66.66
Global low weight cap value factor: Mean input weight times 0.091
Global high weight cap value (GHCV): 8058.30
Global high weight cap value factor: Mean input weight times 11.0
Individual low weight cap value (ILCV) factor: Respondent's weight times 0.2
Individual high weight cap value (IHCV) factor: Respondent's weight times 5
Number of respondents who have an individual high weight cap value less than the global low weight cap value
(GLCV used in weight trimming): 15
Number of respondents who have an individual low weight cap value greater than the global high weight cap value
(GHCV used in weight trimming): 0
b) weighted distribution before the raking-trimming process for each margin, for example:
Weighted Distribution Prior To Raking. Iteration 0

Third Control  Input Weight    Target   Sum of Weights  % of Input  Target % of  Difference
Variable       Sum of Weights  Total    Difference      Weights     Weights      in %
Less than HS    716771.21      1367335  -650563.52       7.395      14.107       -6.712
HS Grad        3738589.33      3975259  -236669.38      38.571      41.013       -2.442
Some College   2099967.34      2041836    58131.18      21.666      21.066        0.600
College Grad   3137338.12      2308236   829101.72      32.368      23.814        8.554
c) termination message line:
**** Program terminated at iteration 5 because all current percents differ from target percents by less than 0.025 ****
d) weighted distribution after raking-trimming for each margin, for example:
Weighted Distribution After Raking

Third Control  Output Weight   Target   Sum of Weights  % of Output  Target % of  Difference
Variable       Sum of Weights  Total    Difference      Weights      Weights      in %
Less than HS   1367134.89      1367335  -199.84         14.105       14.107       -0.002
HS Grad        3975177.03      3975259   -81.68         41.012       41.013       -0.001
Some College   2041947.55      2041836   111.40         21.067       21.066        0.001
College Grad   2308406.53      2308236   170.13         23.816       23.814        0.002
e) some statistics on the raked and trimmed weight.

Iteration  Maximum Absolute Value  Coefficient of Variation of Weights
Number     of Difference in %      at the Completion of the Iteration
1          0.8948                  1.55282
2          0.2370                  1.56402
3          0.0803                  1.56544
4          0.0277                  1.56593
5          0.0113                  1.56612
Number of Respondents Who Had Their Weights Decreased by the Trimming: 758.
Number of Respondents Who Had Their Weights Increased by the Trimming: 4946.
Raking output weight: RAKED_TRIMMED_WGT

Weight             Mean    Min    Max      CV
_WT2NEW_ATPT       732.57   7.30  5873.73  1.171
RAKED_TRIMMED_WGT  732.57  66.66  8058.30  1.566
For MCV method the output includes:
a) lines with trimming parameters and input weight.
Sample size of completed interviews: 4733
Raking input weight adjusted to population total: _WT2NEW_ATPT
Mean value of raking input weight: 961.56
Minimum value of raking input weight: 11.78
Maximum value of raking input weight: 11093.49
Coefficient of variation of raking input weight: 1.27
Individual high weight cap value factor: Respondent's maximum control variable ratio-adjusted _WT2NEW_ATPT weight
times 2.5
Individual low weight cap value factor: Respondent's minimum control variable ratio-adjusted _WT2NEW_ATPT weight
times 0.40
b) distribution of respondents by the control variable that determined the minimum and maximum ratio-adjusted input
survey weight.

          Number of respondents where the  Number of respondents where the
Control   minimum ratio-adjusted           maximum ratio-adjusted
Variable  _WT2NEW_ATPT occurred for the    _WT2NEW_ATPT occurred for the
          control variable                 control variable
1         1645                             426
2         47                               336
3         503                              1246
4         265                              371
5         69                               343
6         230                              759
7         965                              299
c) the same kind of outputs IGCV has (described above).
The full output for both methods (and for the IHB raking macro) also includes the weighted distribution for each margin at
each iteration, for example:

Third Control  Current Sum  Target      Sum of Weights  Current % of  Target % of  Difference
Variable       of Weights   Total       Difference      Weights       Weights      in %
Less than HS    746380.30   1367334.73  -620954.43       7.700        14.107       -6.406
HS Grad        3627736.28   3975258.72  -347522.43      37.428        41.013       -3.585
Some College   2192220.44   2041836.16   150384.29      22.617        21.066        1.552
College Grad   3126328.97   2308236.40   818092.58      32.255        23.814        8.440
Total          9692666.00   9692666.00                  100.00        100.00
and the information on weight trimming at each iteration for each margin, for example (IGCV):

Low Weights Increased
       Total        Total Sum   Number of    Sum of Weights   Sum of Weights  Total Weight Increase
Cycle  Respondents  of Weights  Respondents  Before Trimming  After Trimming  for Cases with IHCV < GLCV
1      13231        9692666.00  1946         108160.21        129728.32       77.49
2      13231        9692666.00  2            132.87           133.33          0.00
3      13231        9692666.00  0            0.00             0.00            0.00

High Weights Decreased
       Number of    Sum of Weights   Sum of Weights  Total Weight Decrease
Cycle  Respondents  Before Trimming  After Trimming  for Cases with ILCV > GHCV
1      373          621609.16        514508.99       0.00
2      0            0.00             0.00            0.00
3      0            0.00             0.00            0.00
or for MCV:

                       Number of respondents
Low weight increased   12
High weight decreased  3
APPLICATION AND RESULTS
To illustrate the results of the new raking-trimming macro we set up a raking application with eleven control variables
using a sample of 5,263 respondents from a survey with a relatively low response rate. The mean input weight
equaled 852.7 and the population size equaled 4,487,760. The IGCV and MCV methods both converged using the
cap value factors given in Tables 2 and 3. For comparison, we also executed the original IHB raking macro without
any weight trimming. Table 4 gives the key results of the three rakings. The IGCV method trimmed the weights of 74
more respondents than the MCV method. The ratio of maximum to minimum weight for the three rakings is 1,898.1
when no trimming takes place, and is 1,589.1 for the MCV method. On the other hand, the IGCV method has a ratio
of only 120.2. The coefficients of variation of the weights give a similar result: the IHB and MCV cv’s have
approximately the same value, while the cv for IGCV is 10.2 percent lower than the IHB cv. The most important
measure is the design effect due to weighting. This measures the expected increase in sampling variability due to
unequal weights in comparison to a simple random sample of the 5,263 respondents. The smallest design effect
occurs for the IGCV method.
The design effect of the input (design) weights is 1.843. Therefore, the IGCV method has brought the sample into
agreement with the control totals for the eleven control variables, thus reducing the potential for nonresponse bias,
while increasing sampling variability by only 36.1 percent relative to the design-weighted sample.
Table 4: Results of the three rakings

             Number of respondents  Number of respondents                               Coefficient of       Design effect
             who had their weight   who had their weight   Minimum  Maximum   variation (cv) of    due to weighting
Method       decreased              increased              weight   weight    the weights          (1 + cv^2)
IHB macro    0                      0                      20.0     37,961.2  1.368                2.871
MCV method   2                      7                      20.0     31,782.6  1.339                2.793
IGCV method  30                     53                     78.0     9,376.4   1.228                2.508
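The summary figures quoted in the text can be reproduced from the Table 4 entries with a few lines of arithmetic. The Python sketch below is only a check on the reported numbers; the variable names are hypothetical, and the input design effect of 1.843 is the value stated in the text.

```python
# (minimum weight, maximum weight, cv) for each raking, taken from Table 4.
table4 = {
    "IHB": (20.0, 37961.2, 1.368),
    "MCV": (20.0, 31782.6, 1.339),
    "IGCV": (78.0, 9376.4, 1.228),
}

summary = {}
for method, (w_min, w_max, cv) in table4.items():
    summary[method] = {
        "max_to_min_ratio": round(w_max / w_min, 1),  # ratio of extreme weights
        "design_effect": round(1 + cv ** 2, 3),       # 1 + cv^2
    }

input_deff = 1.843  # design effect of the input (design) weights
# Percent increase in sampling variability of IGCV relative to the design weights.
igcv_increase_pct = round(100 * (summary["IGCV"]["design_effect"] / input_deff - 1), 1)
# Percent by which the IGCV cv is lower than the IHB cv.
cv_reduction_pct = round(100 * (1 - table4["IGCV"][2] / table4["IHB"][2]), 1)
```

These calculations give ratios of 1,898.1, 1,589.1, and 120.2, design effects of 2.871, 2.793, and 2.508, a 36.1 percent increase relative to the design-weighted sample, and a 10.2 percent lower cv for IGCV, matching the values reported above.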
We also calculated standard errors for 10 survey outcome variables. The mean standard error for the three methods
is shown in Table 5. The IGCV mean standard error is 7.9 percent lower than the mean standard error when no
weight trimming is used (IHB macro). The mean standard error for the three methods for the same 10 survey
outcome variables by three sample subgroups is also shown in Table 5. The IGCV mean standard error is 6.5
percent lower than the mean standard error when no weight trimming is used.
Table 5: Mean standard error for the three final weights

                                  Mean standard error
Method       Mean standard error  for three subgroups
IHB macro    1.179                3.510
MCV method   1.163                3.504
IGCV method  1.086                3.283
REFERENCES
Deming WE. (1943). Statistical Adjustment of Data. New York: Wiley.
Izrael D, Hoaglin DC, and Battaglia MP. (2000). A SAS Macro for Balancing a Weighted Sample. Proceedings of the
Twenty-Fifth Annual SAS Users Group International Conference, Cary, NC: SAS Institute Inc., pp. 1350-1355.
Izrael D, Hoaglin DC, and Battaglia MP. (2004). To Rake or Not To Rake Is Not the Question Anymore with the
Enhanced Raking Macro. May 2004 SUGI Conference, Montreal, Canada.
Kalton G. (1983). Compensating for Missing Survey Data. Survey Research Center, Institute for Social Research,
University of Michigan.
CONTACT INFORMATION
David Izrael
Abt Associates Inc.
55 Wheeler Street
Cambridge, MA 02138
617-349-2434
David_Izrael@abtassoc.com
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS
Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are
trademarks of their respective companies.