2014 VHA SURVEY OF VETERAN
ENROLLEES’ HEALTH AND RELIANCE
UPON VA
METHODOLOGICAL EXPERIMENTS AND NON-RESPONSE BIAS
ANALYSIS
FINAL REPORT
— Not for Distribution —
Submitted to:
Office of the Assistant Deputy Under Secretary for Health for Policy and Planning (ADUSH/PP)
Prepared by:
ICF International, Inc.
126 College Street
Burlington, Vermont 05401
November 24, 2014
TABLE OF CONTENTS
1. Background _________________________________________________________ 1
History of Survey _____________________________________________________ 1
History of Survey of Enrollees Bias Assessments ____________________________ 1
Summary of Methodological Experiments, 2006–2013 ____________________ 3
Experiments Conducted Prior to Introduction of Mixed-Mode Design _______ 3
Experiments Conducted Following Introduction of Mixed-Mode Design ______ 5
Overview of Methodological Experiments, 2014 _________________________ 6
2. Sampling and Weighting Design and Bias Evaluation ______________________ 7
Sampling ____________________________________________________________ 7
Sample Stratification and Allocation ____________________________________ 7
Frame Development _________________________________________________ 7
Sampling Process ___________________________________________________ 8
Weighting ___________________________________________________________ 8
Design Weight _____________________________________________________ 8
Non-Response Adjustment ____________________________________________ 9
Post-Stratification Adjustment ________________________________________ 10
Survey Outcomes ____________________________________________________ 11
Bias Assessment _____________________________________________________ 11
3. Experiment 1 – Impact of Survey Mode on Survey Estimates ___________ 16
Design ____________________________________________________________ 16
Results ___________________________________________________________ 17
Health Care Coverage, Health Care Access, and Health Status ___________ 17
Key Driver Questions ______________________________________________ 19
Survey Mode Effects within Strata ___________________________________ 20
Summary of Findings: Mode Effects Experiment ________________________ 23
4. Experiment 2 – Impact of Second Survey Mailing on Response Rates following
CATI Non-Working/Non-Response ______________________________________ 24
Design _____________________________________________________________ 24
Results _____________________________________________________________ 25
5. Experiment 3 – Impact of Second Survey Mailing on Response Rates as Part of
Mail Survey Protocol __________________________________________________ 27
Design _____________________________________________________________ 27
Results _____________________________________________________________ 28
6. Non-Response Bias Analysis __________________________________________ 30
1. Long-Term Service and Supports ______________________________________ 30
Respondents vs. Non-Respondents _____________________________________ 30
Web vs. Mail/CATI ________________________________________________ 31
2. Inpatient Treatment ________________________________________________ 34
Respondents vs. Non-Respondents ____________________________________ 34
Web vs. Mail/CATI ________________________________________________ 35
3. Outpatient Treatment _______________________________________________ 38
Respondents vs. Non-Respondents ____________________________________ 38
Web vs. Mail/CATI ________________________________________________ 38
4. VHA Pharmacy Services ____________________________________________ 42
Respondents vs. Non-Respondents ____________________________________ 42
Web vs. Mail/CATI ________________________________________________ 42
7. Discussion and Recommendations______________________________________ 45
Summary of Findings _________________________________________________ 45
Recommendations ____________________________________________________ 45
Appendix A – Utilization Measures_______________________________________ 47
Appendix B – Non-Response Propensity Score Quintiles _____________________ 48
LIST OF TABLES
Table 1. Non-Response Adjustment ________________________________________ 10
Table 2. Sampling Process Bias Assessment, Unweighted Estimates ______________ 14
Table 3. Sampling Process Bias Assessment, Design-Weighted Estimates __________ 15
Table 4. Survey Completes by Default Survey Mode and Response Channel ________ 17
Table 5. Comparison of Selected Coverage, Access, and Health Status Proportions by
Default Survey Mode, Weighted (w3) ______________________________________ 18
Table 6. Comparison of Selected Key Driver Means by Default Survey Mode, Weighted
(w3) _________________________________________________________________ 20
Table 7. SSM-P Experiment Follow-Up Mail Protocols: Long (Treatment) vs. Short
(Control) _____________________________________________________________ 24
Table 8. SSM-P Treatment Group Sizes by CATI Non-Response/Non-working Status 24
Table 9. Sampled Records and Survey Responses by Population, SSM-P Condition and
Response Channel ______________________________________________________ 25
Table 10. SSM-M Experiment Mail Protocols: Long (Treatment) vs. Short (Control) _ 27
Table 11. Sampled Records and Survey Responses by SSM-M Condition and Response
Channel ______________________________________________________________ 27
Table 12. Percentage of Enrollees Receiving Institutional Long-Term Care, by Stratum 32
Table 13. Percentage of Enrollees Receiving Non-Institutional Long-Term Care, by
Stratum ______________________________________________________________ 33
Table 14. Percentage of Enrollees Receiving Inpatient Treatment for MHSA, by Stratum
_____________________________________________________________________ 36
Table 15. Percentage of Enrollees Receiving Inpatient Treatment for neither Mental
Health nor Substance Abuse, by Stratum ____________________________________ 37
Table 16. Percentage of Enrollees Receiving Outpatient Treatment for MHSA, by
Stratum ______________________________________________________________ 39
Table 17. Percentage of Enrollees Receiving Outpatient Treatment for neither Mental
Health nor Substance Abuse, by Stratum ____________________________________ 41
Table 18. Percentage of Enrollees Receiving Prescription Drug Services by Stratum _ 43
Table 19. Distribution of Non-Response Propensity Score Model Categorical Predictors
for Combined-Sample Respondents by Propensity Score Quintiles________________ 48
Table 20. Distribution of Non-Response Propensity Score Model Continuous Predictors
for Combined-Sample Respondents by Propensity Score Quintiles________________ 49
LIST OF FIGURES
Figure 1. Total Survey Error Analysis of the 2014 Survey of Enrollees _____________ 2
Figure 2. Assignment of Sample to the 2014 Methodological Experiments __________ 6
Figure 3. Distribution of Estimated Bias for 7 Utilization Percentages across 31 Domains,
Unweighted (w0) and Design-Weighted (w1) ________________________________ 12
Figure 4. Distribution of Mode Effects for 19 Coverage, Access, and Health Status
Estimates across 14 Domains, Weighted (w3) ________________________________ 21
Figure 5. Distribution of Responses by Channel between SSM-M Experiment Conditions
_____________________________________________________________________ 28
Figure 6. Percentage of Enrollees Receiving Institutional Long-Term Care _________ 31
Figure 7. Percentage of Enrollees Receiving Non-Institutional Long-Term Care _____ 31
Figure 8. Percentage of Enrollees Receiving Inpatient Treatment for MHSA ________ 35
Figure 9. Percentage of Enrollees Receiving Inpatient Treatment for neither Mental
Health nor Substance Abuse ______________________________________________ 35
Figure 10. Percentage of Enrollees Receiving Outpatient Treatment for MHSA _____ 39
Figure 11. Percentage of Enrollees Receiving Outpatient Treatment for neither Mental
Health nor Substance Abuse ______________________________________________ 39
Figure 12. Percentage of Enrollees Receiving Prescription Drug Services __________ 42
1. BACKGROUND
History of Survey
The Department of Veterans Affairs (VA) administers the country’s largest, most comprehensive
integrated health care system. More than 8 million Veterans are enrolled in the VA system and seek
services ranging from specialty care to social support services to wellness maintenance. VA’s authority to
provide this care is regulated in part by the Veteran’s Health Care Eligibility Reform Act of 1996 (Public
Law 104-262). This law implements a priority-based enrollment system for Veterans and gives the
Veterans Health Administration (VHA) the ability to plan to meet the needs of enrolled Veterans.
Changing demographics, availability of other health care coverage, economic changes, and rising health
care costs can all impact a Veteran’s decision to turn to VHA for care. Understanding factors that impact
Veterans’ choice is critical to VA’s continuous preparation and ability to meet Veterans’ expectations.
The Survey of Enrollees was developed with core and supplemental groups of survey questions to gather
a variety of information used to determine the relationship between utilization patterns and the
demographic and socioeconomic characteristics of Veteran enrollees.
Survey of Enrollees data are used to develop health care budgets and to assist VA with its annual
enrollment decisions. These data also inform the VA Enrollee Health Care Projection Model (EHCPM).
Forecasts developed from this model are used for a number of purposes, such as budgeting, as well as
scenario-based policy and planning analyses.
VHA has conducted twelve cycles of the Survey of Enrollees (1999, 2000, 2002, 2003, 2005, 2007, 2008,
2010, 2011, 2012, 2013, and 2014). The 2014 survey methodology can be summarized as an English-only, 15- to 20-minute survey available via Computer-Assisted Telephone Interviewing (CATI), self-administered Paper and Pencil Interviewing (PAPI), or Computer-Assisted Web Interviewing (CAWI)
format, using a stratified sampling design to obtain 42,000 interviews.
ICF International, Inc. (ICF) has provided technical and data collection services to VHA in support of the
Survey of Enrollees since 2005. This methodology report pertains to the 2014 data collection period from
February 15 through June 30, 2014.
History of Survey of Enrollees Bias Assessments
Any information collection from the general public that is conducted or sponsored by a Federal agency
requires periodic Office of Management and Budget (OMB) clearance. As part of the Fiscal Year (FY)
2006 OMB clearance package, VHA was tasked with conducting a non-response bias assessment as well
as examining sampling frame quality. A non-response bias assessment investigates the extent to which
survey non-respondents differ from respondents in ways that may affect survey outcomes, while the
examination of sample frame quality assesses the extent to which the sampling frame adequately covers,
or includes all members in, the target population. In 2006, VHA and ICF met with OMB to discuss the
non-response analysis and agreed to develop methods to improve the survey program. OMB granted
clearance to VHA but required that VHA improve the design, starting with the 2007 survey. Since then,
the Survey of Enrollees has:
• Added a pre-survey notification letter sent from the Under Secretary for Health. The letter describes the survey’s purpose, explains that ICF is conducting the study on VHA’s behalf, and provides a number to call with questions or to complete the survey;
• For Veterans with missing phone numbers, added a customized letter with an inbound phone number to call to complete the survey;
• Experimented with reverse phone number look-up based on address information;
• Increased the maximum number of call attempts from six to seven; and
• Improved the weighting methodology by using a propensity score adjustment based on demographics and health care utilization administrative records, as well as a post-stratification adjustment to match a consistent set of demographic control totals.
Discussion of survey bias can be organized in terms of the Total Survey Error (TSE) framework (see
Figure 1).1 The TSE framework divides survey error into two major sources: errors of representation,
which are due to the systematic and random errors that influence which members of the population
respond to the survey; and errors of observation, which are due to the systematic and random errors that
influence the accuracy with which survey constructs are measured. Random error is reduced through the
use of large sample sizes, such as those used in the Survey of Enrollees. On the other hand, systematic
error, which is also referred to as bias, is a consistent deviation from the “true” score for a survey
outcome and is not mitigated by large sample sizes. This report focuses specifically on bias in the Survey
of Enrollees, both with respect to errors of observation and errors of representation.
Figure 1. Total Survey Error Analysis of the 2014 Survey of Enrollees
Biases in representation can arise from three major sources:
• Coverage error, due to systematic differences between enrollees included in vs. excluded from the sampling frame;
• Sampling error, due to a non-random selection mechanism or unadjusted disproportionate sampling; and
• Non-response error, due to respondents systematically differing from non-respondents with regard to survey outcomes.
1 Groves, Robert; Fowler, Floyd; Couper, Mick; Singer, Eleanor; Tourangeau, Roger. Survey Methodology. New York: Wiley; 2004.
Beginning in 2012, VHA introduced a mail mode to extend coverage to enrollees without a phone number
or with a non-working number, as well as a Web survey as an alternative to mail or telephone modes. The
inclusion of enrollees without a valid or working phone number in the sampling frame addressed the
undercoverage of these enrollees that existed prior to 2012.2 Beginning in 2013, the Methodological
Experiments Report also evaluated the possibility of bias due to sampling error to verify that the random
selection mechanism and subsequent design weights used to adjust for disproportionate sampling are
operating as expected (see Section 2. Sampling and Weighting Design and Bias Evaluation). Finally, bias
due to enrollee non-response continues to be evaluated by comparing responding and non-responding
enrollees using available frame variables (see Non-Response Bias Analysis).
Biases in observation are generally due to systematic measurement error, which can arise from a variety
of sources, such as question wording or item order. Since 2012, the most important potential source of
systematic measurement error in the Survey of Enrollees has been the use of multiple survey modes.
Although the introduction of multiple modes was needed to extend coverage to a large segment of the
enrollee population, doing so necessarily introduced the possibility of mode effects. Mode effects occur
when responses to survey items in one mode systematically differ from responses in other modes. Post-hoc analyses (in 2012) and a methodological experiment (in 2013) have been conducted to test for mode
effects by comparing responses in the mail and CATI modes. Although some statistically significant
differences have been observed, the magnitude of these differences is generally quite small. The
methodological experiment was conducted again in 2014, using random assignment to survey modes as in
2013 (see Section 3. Experiment 1 – Impact of Survey Mode on Survey Estimates).
This 2014 report addresses sources of potential bias in both representation and observation in the 2014 Survey of Enrollees (see Figure 1). Following the organization established in the 2013 report:
• Section 2 of this report evaluates the sampling and weighting processes to verify that they are unbiased.
• Sections 3, 4, and 5 report the results of the methodological experiments conducted as part of the 2014 Survey of Enrollees, including an experiment to evaluate measurement error introduced by the use of multiple survey modes and two experiments designed to reduce enrollee non-response.
• Section 6 evaluates the potential for non-response bias.
Summary of Methodological Experiments, 2006–2013
Since 2006, ICF has conducted a bias assessment and has evaluated the results of methodological
experiments designed to reduce bias.
Experiments Conducted Prior to Introduction of Mixed-Mode Design
In 2006, ICF used the 2005 data to examine the survey process and potential biases resulting from
missing or outdated contact information as well as survey non-response—including both the inability to
make contact and the effects of respondent refusals. The report, submitted to OMB, included several
recommendations to improve the research design.
One of the resulting recommendations was a propensity score weighting adjustment. This weighting
adjustment, also used in 2007 and 2008, corrects for differential non-response by health care utilization
and demographic information. To determine the adjustment, ICF:
2 A small possibility of coverage error remains due to the frame exclusion criteria VHA applies when extracting the sampling frame from the enrollee database. Specifically, enrollees lacking a valid mailing address, living outside the U.S. or Puerto Rico, or missing one of the stratification variables are currently excluded from the sampling frame.
• Used a probability model (described below) to estimate an enrollee’s individual propensity (or probability) of being in the respondent sample;
• Grouped enrollees into five equal-sized classes (or quintiles) with similar probabilities; and
• Weighted the respondents up to account for the non-respondents, using an independent adjustment for each class.
The assumption is that non-respondents would have given similar responses to the survey as the actual
respondents within the quintile in which they are grouped. The accuracy of this assumption depends on
the fit of the statistical model used to create these quintiles. The propensity score weighting adjustment
then reduces potential bias to the extent that non-respondents and respondents with similar response
probabilities are also similar with respect to the survey statistics of interest.
The 2007 Survey of Enrollees included several methodological experiments to gauge the impact of design
enhancements. These experiments included sending pre-survey notification letters to potential
respondents signed by the Under Secretary for Health and extending the maximum number of call
attempts from six to 10.
The results of these experiments are documented in the 2007 report, Supplementary Analysis and
Technical Assistance for the 2007 Annual Survey of Veteran Enrollees Health and Reliance on VA. The
response rate among the experimental treatment group (pre-survey notification letter and 10 call attempts)
more than doubled that of the control group (no pre-survey notification letters and six call attempts), at
43.3 percent vs. 21.4 percent, respectively. Based on the evidence, ICF recommended that VHA adopt
both of these design enhancements for the 2008 Survey of Enrollees. VHA approved sending pre-survey
notification letters and increasing the maximum call attempts to seven (concern for increased respondent
burden prevented an increase to 10).
Also during the 2007 Survey of Enrollees, enrollees were sampled only from a frame of enrollees with
telephone numbers. Enrollees without telephone numbers had no chance of selection—introducing a
potential source of coverage bias. The 2007 survey was therefore susceptible to two major forms of bias
affecting representation: coverage of enrollees with no chance of selection, and non-response bias among
enrollees who did not respond. For that reason, two separate propensity score adjustments were
developed: one for frame coverage and another for non-response.
In 2008, VHA approved a methodological experiment to improve sample frame coverage: utilizing
reverse telephone look-up directories that used respondent addresses to obtain valid telephone numbers
from a sample of 62,516 enrollees. This new process resulted in 59,426 potential respondents (95 percent
coverage of the test sample), and this group yielded 12,765 completed surveys.
Since the 2008 Survey of Enrollees, the survey sample has been selected from a frame of enrollees with
and without telephone numbers. Since the sample has been selected from this complete frame, coverage
bias has not been a concern. However, non-response due to a variety of sources, including invalid contact
information, has remained an issue. Some of these sources have been addressed through the addition of a
mail survey and a Web response channel; however, some sources of potential non-response bias remain.
Therefore, a single propensity score adjustment has been used to provide a general mechanism for
mitigating bias due to non-specific non-response.
The 2010 Survey of Enrollees followed a methodology similar to the 2008 survey—including a reverse
phone number look-up from a sample of 62,515 enrollees. Again, the results indicated that the address
matching improved contact information quality, resulting in 61,376 potential respondents (98 percent
coverage of the test sample). This experimental group yielded 16,851 completed surveys.
For 2011, the plan for the Survey of Enrollees also included reverse telephone look-ups. Unfortunately,
this service was not implemented because the address-matching vendor was not able to comply with the
project’s security requirements. However, the 2011 survey did include a tailored pre-survey notification
letter sent to enrollees with a known address but unknown telephone number, as listed in the database.
This letter asked the enrollee to call ICF to participate in the survey. This test yielded 244 interviews from
15,339 total enrollees without phone numbers. While relatively few, these respondents represent Veterans
who would not otherwise have been included in the results.
Experiments Conducted Following Introduction of Mixed-Mode Design
For 2012, two new survey modes were added to the existing telephone mode. The Survey of Enrollees
had been conducted strictly as a telephone interview since its inception in 1999. Enrollees with invalid
telephone numbers (e.g., missing or incorrect area code) or without a telephone were not included, and
this was a source of potential coverage bias. In 2012, VHA addressed this undercoverage by developing
an experimental mail survey that was sent to all enrollees without a valid telephone number. The mail
survey allowed respondents to complete the survey via paper-and-pencil; the mailed materials also
provided contact information if the Veteran wished to call ICF to complete a telephone interview and a
link to a Web survey option. In addition, ICF conducted a follow-up mailing for phone non-respondents.
Respondents in all modes also could request a mail survey at any point.
In addition to adding a mail survey, VHA offered an experimental Web option for the first time. Thirteen
percent of enrollees used the Web option instead of returning a mail survey or participating in a telephone
interview. Due to the cost savings on interviewer labor generated by the Web option, ICF recommended
that VHA continue offering this mode.
The experimental mail survey improved the response rate and reduced bias. Counting responses via all
four response channels (i.e., Web, mail, inbound CATI, and outbound CATI), the addition of a mail
component (mail survey, allowing mail requests, and mail follow-up) added 10,056 interviews.
While ICF recommended that the mail mode continue to be offered, a limitation was noted in the 2012
experimental design; specifically, the confounding of survey mode with sample type meant that
differences in survey responses between the survey modes could also be explained by pre-existing
differences between the populations choosing to respond in each mode. ICF therefore recommended a
randomized methodological experiment testing survey mode effects, which was conducted in 2013. This
experiment tested for survey mode effects by randomly assigning a subset of eligible sampled enrollees to
receive either the mail or CATI survey as their default mode of survey administration (i.e., the mode in
which enrollees would complete the survey unless they explicitly opted to complete in a different mode).
Results indicated that although survey mode (mail vs. CATI) does have a significant effect on some
survey responses, the magnitude of this effect is generally quite small. ICF thus recommended continuing
to administer the Survey of Enrollees in multiple modes, given the substantial increase in coverage this
design affords.
A second methodological experiment was conducted in 2013 to test the effect of a second survey mailing
(SSM) on response rates as part of the mail follow-up protocol for non-responding enrollees and non-working phone records. The results of this experiment indicated that, among enrollees who do not
respond to the CATI survey and enrollees with non-working numbers, a second survey mailing as part of
a mail follow-up protocol significantly improves response rates (by approximately seven percentage
points).
Overview of Methodological Experiments, 2014
In 2014, ICF conducted three methodological experiments to investigate bias due to survey mode and
enrollee non-response.
The first experiment, replicating a design first used in 2013, tested for survey mode effects by randomly
assigning a subset of eligible sampled enrollees to receive either the mail or CATI survey as their default
mode of survey administration (i.e., the mode in which enrollees would complete the survey unless they
explicitly opted to complete in a different mode). This experiment has the potential to reveal systematic
differences in survey responses due to survey mode (specifically, mail vs. CATI modes).
The second and third experiments tested the effects of “short” and “long” mail protocols on response rates
among different subpopulations. In general, the “short” mail protocol involved only one survey packet
mailing, whereas the “long” mail protocol involved two complete survey packet mailings.
The second experiment, also replicating a design first used in 2013, tested the effect of a second survey
mailing (Second Survey Mailing/Follow-up to Phone/CATI Protocol, hereafter referred to as SSM-P) on
response rates as part of the mail follow-up protocol used for CATI non-respondents and non-working
phone records.
The third experiment tested the effect of a second survey mailing (Second Survey Mailing/Follow-up to
Mail Protocol, hereafter referred to as SSM-M) on response rates as part of the mail survey protocol used
for enrollees with only a valid mailing address. These latter two experiments continue the OMB-required
research to improve response rates and to minimize non-response bias. Figure 2 illustrates how sample
was assigned to all three experiments conducted in 2014.
Figure 2. Assignment of Sample to the 2014 Methodological Experiments
2. SAMPLING AND WEIGHTING DESIGN AND BIAS
EVALUATION
This section briefly presents the sampling protocol and corresponding weighting plan of the 2014 Survey
of Enrollees (a detailed description of the methodology can be found in VHA Survey of Veteran Enrollees’
Health and Reliance Upon VA Methodology Report 2014). Afterward, the bias component of the total
mean squared error that can be attributed to the sampling and weighting processes is evaluated.
Sampling
Sample Stratification and Allocation
The 2014 sampling design modifies a basic framework designed to support estimates by Veterans
Integrated Service Network (VISN)3 (21 levels) and priority group4 (eight levels) with additional
stratification and oversampling by gender. In addition to this “Main” sample, an independent
“Supplemental” sample was drawn of enrollees identifying as Hispanic/Latino. These modifications to the
2013 sampling design were made to increase data utility for these two emerging Veteran populations (i.e.,
female Veterans and Hispanic/Latino Veterans).
For the Main sample, each of the 21 VISNs was allocated 1,875 interviews as follows:
1. First, minimum sample sizes were allocated to each priority group:
o 50 for Priority Group 7;
o 150 for Priority Groups 4 and 6;
o 250 for Priority Groups 1, 2, 3, and 5; and
o 400 for Priority Group 8.
2. Second, 125 interviews were proportionally allocated to the largest priority groups within the
VISN.
Within each of the 168 VISN × priority group strata, women were oversampled by weighting their population share at twice that of men. For example, if 10% of the stratum are women, the sample allocation would be 2×10% / (2×10% + 90%) = 18% women and 82% men.
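To make the allocation concrete, the short Python sketch below applies this oversampling rule; the function name and the 250-interview stratum are hypothetical, chosen only to illustrate the arithmetic.

```python
def oversampled_allocation(n_stratum: int, pct_female: float, factor: float = 2.0):
    """Split a stratum's interview allocation, weighting women's population
    share by `factor` (2.0 in the 2014 design) before normalizing."""
    pct_male = 1.0 - pct_female
    share_female = factor * pct_female / (factor * pct_female + pct_male)
    n_female = round(n_stratum * share_female)
    return n_female, n_stratum - n_female

# A stratum that is 10% female and allocated 250 interviews yields
# 2*0.10 / (2*0.10 + 0.90) = ~18% women: 45 women and 205 men.
print(oversampled_allocation(250, 0.10))  # (45, 205)
```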
The Supplemental sample (Hispanic/Latino) was allocated 2,625 interviews. The sample was stratified
by VISN and sample was allocated in proportion to the number of Hispanics flagged on the frame. The
sample selection was simple random sample drawn from each VISN’s population of Hispanic/Latino
enrollees.
Frame Development
VHA provided a random stratified sample of 418,832 records from its enrollee database as follows:
• VHA extracted the entire universe of enrollees who were listed as of September 30, 2013; this list included Veterans enrolled in VA health care and living in both institutionalized and non-institutionalized settings.
• VHA then eliminated all records meeting one or more of the following criteria:
o Lacking a valid address;
o Not living in the U.S. or Puerto Rico; or
o Missing one of the stratification variables listed in the next bullet.
• Remaining was a final file of 8,486,965 enrollees to be stratified by VISN, priority group, and gender, from which the 2014 Main sample was drawn. The 2014 Supplemental sample was also drawn from this file after filtering to include only enrollees positively identifying as Hispanic/Latino.

3 VISN is the geographic health care administration region to which each Veteran is assigned.
4 Priority group is the patient priority group to which a Veteran was assigned at enrollment. Priority groups help VA provide health care services relative to annual funding.
Sampling Process
ICF then randomly selected a subsample of these records to meet the target sample sizes in each stratum.
ICF released records into the study as needed, using a random selection algorithm. To do so, ICF
monitored the number of completed interviews during fielding. ICF then compared the estimated sample
yield (that is, the number of completed interviews predicted from the sample at a given point in the study)
to the target number required by the sampling plan. To match actual to planned performance, enrollee
records were drawn and released into the study for calling/mailing randomly from the final, stratified set
of records provided by VHA.
A total of 140,698 enrollees were sampled to meet these sample size requirements in all strata of the Main
sample, and a total of 11,456 enrollees were sampled to meet these sample size requirements in the
Supplemental sample (with an overlap of 471 enrollees).
Following data collection, ICF evaluated the Main and Supplemental samples to determine whether or not
they should be combined into a single analytic dataset. Because the sample size increase gained by
combining the two samples outweighed the loss of precision due to increased weighting variance, it was
decided to combine the two samples (a more detailed description of this analysis can be found in VHA
Survey of Veteran Enrollees’ Health and Reliance Upon VA Methodology Report 2014). The evaluation
of the weighting process will therefore focus only on the combined sample.
Weighting
The analysis weight is a product of three components:
1. A design weight that adjusts for differential selection probabilities across sampling strata and
accounts for the increased probability of selection of Hispanic/Latino enrollees in the combined
sample;
2. A non-response adjustment that compensates for differential response patterns across enrollee
subgroups; and
3. A post-stratification adjustment that aligns weighted totals with population control totals along a
set of key demographic dimensions.
Design Weight
The design weight adjusts for differential selection probabilities and accounts for overlap created by
combining the Main and Supplemental samples. The Main sample was selected from the complete survey
frame independently in each of the strata, which had been defined by VISN, priority, and gender. The
Hispanic sample was selected from the filtered survey frame as a simple random sample (i.e., where the
frame serves as the single stratum). The probability of selection for enrollees in the Main and
Supplemental samples in the $h$th stratum is then calculated equivalently as

$\pi_h = n_h / N_h$

where:

$\pi_h$ = the probability of selection for each enrollee in the $h$th stratum
$n_h$ = the number of enrollees sampled in the $h$th stratum
$N_h$ = the total number of enrollees in the $h$th stratum

The inverse of these selection probabilities is the design weight, $w_1 = 1/\pi_h$, which is calculated for all sampled enrollees in both the Main and Supplemental samples.

In the combined sample, Hispanic/Latino enrollees received a probability of selection in both the Main and Supplemental samples. The selection probability of the combined-sample design weight was computed to account for this. Specifically, if $\pi_{Si}$ represents the probability of drawing the $i$th enrollee from the Supplemental (Hispanic/Latino) frame, and $\pi_{Mi}$ represents the probability of drawing the same $i$th enrollee from the Main sample frame, then the correct selection probability for the $i$th enrollee in the combined sample is given by $\pi_i = \pi_{Mi} + \pi_{Si} - \pi_{Mi}\,\pi_{Si}$, and the combined-sample design weight is taken as $w_{1i} = 1/\pi_i$. This is the delivered design weight that was used as the basis for the following non-response and post-stratification adjustments.
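The sketch below illustrates the combined-frame calculation in Python; the column names (pi_main, pi_supp) and the example probabilities are hypothetical, not taken from the production weighting code.

```python
import pandas as pd

def combined_design_weight(sample: pd.DataFrame) -> pd.Series:
    """Combined-sample design weight w1 = 1/pi.

    pi_main and pi_supp are each enrollee's selection probabilities from
    the Main and Supplemental frames (pi_supp = 0 for enrollees not on the
    Hispanic/Latino frame). For two independent draws,
    pi = pi_M + pi_S - pi_M * pi_S.
    """
    pi = sample["pi_main"] + sample["pi_supp"] - sample["pi_main"] * sample["pi_supp"]
    return 1.0 / pi

# Example: a 1-in-500 chance in the Main sample and a 1-in-200 chance in
# the Supplemental sample combine to pi = 0.00699, so w1 is about 143.1.
df = pd.DataFrame({"pi_main": [1 / 500], "pi_supp": [1 / 200]})
print(combined_design_weight(df))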
Non-Response Adjustment
To calculate the non-response adjustment, each sampled enrollee was classified into a non-response
category (y) based on whether the attempted interview was complete or incomplete:
y = 0 if the attempted interview is incomplete
y = 1 if the attempted interview is complete

Using logistic regression, ICF estimated the probability that an enrollee completed the interview given his or her characteristics:

$\Pr(y = 1 \mid \mathbf{x}) = \dfrac{e^{\mathbf{x}\boldsymbol\beta}}{1 + e^{\mathbf{x}\boldsymbol\beta}}$

where $\mathbf{x}$ is a matrix of sampled enrollees and each enrollee has a set of $p$ covariates, $\mathbf{x}_i = (1, x_{1i}, \ldots, x_{pi})$ for enrollee $i$. This set of covariates was used as explanatory (or predictor) variables, and $\boldsymbol\beta = (\beta_0, \beta_1, \ldots, \beta_p)$ was a set of regression coefficients, or parameters.
The predictor variables included:
• The sample design variables (VISN, priority status, gender, and Hispanic/Latino);
• Design variables previously used for sample stratification (OEF/OIF/OND status, and enrollee type: Pre- vs. Post-enrollees);
• Seven administrative health measures (listed below);
• Demographic variables (age, urban/rural address);
• Telephone number status (valid, not valid); and
• A flag identifying whether multiple enrollees use the same telephone number.
VHA provided a file based on administrative records; the file indicated whether an enrollee had utilized
any of the following VHA services in the previous year (the file did not indicate the frequency of use or
amount paid for any of these benefits):
1. Received long-term care benefits,
a. Institutional
b. Non-institutional
2. Inpatient treatment,
a. Mental health or substance abuse
b. Non-mental health and non-substance abuse
3. Outpatient treatment,
a. Mental health or substance abuse
b. Non-mental health and non-substance abuse
4. VHA pharmacy services.
The utilization indicators have been used for weighting since the 2007 survey. From 2007–2010, the
indicators were sourced from VHA workload files based on bed section and clinic stop. This
categorization indicates where a Veteran received care. For the 2011 and 2012 surveys, the indicators were
based on service utilization from Health Service Categories (HSCs), indicating what care a Veteran
received. A second change was to include institutional and non-institutional long-term care indicators as
compared to 2007–2010, when a single measure of home health service was used.
The outcome of the model is the propensity score, the estimated probability that the enrollee is in the final
sample of respondents given their characteristics (as defined by the list of predictor variables above).
After estimating each sampled enrollee’s probability of completing an interview based on the predictor
variables, respondents and non-respondents were grouped into quintiles based on their propensity score.
Within each quintile, respondents were ratio-adjusted to account for non-respondents. The first quintile
represents the enrollees with the lowest propensity scores; this means that these enrollees are less likely to
be in the final sample—thus, they receive the largest weights. The last quintile represents the enrollees
with the highest propensity scores; this means that these enrollees are more likely to be in the final sample
of respondents—thus, they receive the smallest weights. See Appendix B – Non-Response Propensity
Score Quintiles for distributions of propensity score predictors for respondents by propensity score
quintiles.
Table 1. Non-Response Adjustment

Percentile    | Response | Non-Response | Non-Response Adjustment (NR)
0 – <20th     |  211,860 |    1,485,526 | 8.01
20th – <40th  |  393,666 |    1,302,485 | 4.31
40th – <60th  |  506,344 |    1,192,120 | 3.35
60th – <80th  |  671,644 |    1,025,906 | 2.53
80th – <100th |  804,079 |      893,335 | 2.11
To calculate the non-response adjusted weights, each respondent’s design weight ($w_1$) was multiplied by the adjustment factor ($NR$) from the quintile where he or she fell: $w_2 = w_1 \times NR$.
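A minimal sketch of this propensity-quintile adjustment is shown below, using scikit-learn and pandas; the column names are hypothetical, and whether the ratio adjustment uses design-weighted or unweighted totals is an assumption here (design-weighted totals are shown).

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

def nonresponse_adjust(sample: pd.DataFrame, predictors: list) -> pd.Series:
    """Quintile-based propensity non-response adjustment: w2 = w1 * NR.

    `sample` has one row per sampled enrollee, with design weight `w1`
    and a 0/1 `complete` indicator; `predictors` lists model covariates.
    """
    # 1. Model each enrollee's probability of completing an interview.
    model = LogisticRegression(max_iter=1000)
    model.fit(sample[predictors], sample["complete"])
    sample["propensity"] = model.predict_proba(sample[predictors])[:, 1]

    # 2. Group all sampled enrollees into propensity score quintiles.
    sample["quintile"] = pd.qcut(sample["propensity"], 5, labels=False)

    # 3. Within each quintile, ratio-adjust respondents' weights so they
    #    also carry the non-respondents: NR = total w1 / respondent w1.
    total_w = sample.groupby("quintile")["w1"].sum()
    resp_w = sample.loc[sample["complete"] == 1].groupby("quintile")["w1"].sum()
    nr = total_w / resp_w

    w2 = sample["w1"] * sample["quintile"].map(nr)
    return w2.where(sample["complete"] == 1)  # defined for respondents only
```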
Post-Stratification Adjustment
Because the 2014 sample design departed from the design used in previous years, a post-stratification
adjustment was included as part of the weighting to promote comparability. The primary motivation for
the post-stratification adjustment is to ensure that the distribution of the weighted sample matches the
distribution of the enrollee population across a stable set of dimensions, such as age and gender. Because
these post-stratification dimensions are independent of the dimensions used to define sampling strata in a
given year, the post-stratification adjustment facilitates flexibility in the sampling design while preserving
comparability across years.
Unlike previous years, the 2014 sample stratification did not include OEF/OIF/OND status and pre/post-enrollee status. Including these dimensions in the post-stratification adjustment restores comparability to
previous years.
Finally, as the enrollee age distribution is related to both of these sets of variables, as well as to reliance
measures, age was included in the post-stratification. Enrollee age was categorized into seven levels:
under 35; 35-44; 45-54; 55-64; 65-74; 75-84; and 85+.
The dimensions used for post-stratification in 2014 were as follows:
• Age × gender (14 levels),
• Hispanic/Latino status (two levels),
• Priority × VISN (168 levels),
• OEF/OIF/OND status (two levels), and
• Pre/Post-enrollee status (two levels).
The post-stratification adjustment was implemented via a raking, or iterative proportional fitting, algorithm. During each iteration, the non-response-adjusted weight ($w_2$) was ratio-adjusted to match population totals along each of the above post-stratification dimensions in turn. This iterative process continues until the weighted totals match population totals along all dimensions within a specified tolerance (in this case, by less than 1.00). For the 2014 combined sample, convergence was achieved after 15 iterations, indicating a stable adjustment. The post-stratification adjustment increased the coefficient of variation of the weights (a measure of the weighting variability) from 0.78 to 0.83, indicating that only a small increase in variance was required to achieve this bias reduction. The post-stratified weight ($w_3$) was delivered with the weighted data and should be used as the analytic weight when generating population estimates.
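The raking step can be sketched as follows; this is a bare-bones illustration of iterative proportional fitting under the convergence rule described above, not ICF’s production algorithm, and the function and argument names are hypothetical.

```python
import pandas as pd

def rake(df: pd.DataFrame, w2: str, controls: dict, tol: float = 1.0,
         max_iter: int = 100) -> pd.Series:
    """Rake survey weights to population control totals.

    `controls` maps each post-stratification variable (a column of `df`)
    to a pandas Series of population totals indexed by that variable's
    levels. Iterates until every margin matches within `tol`.
    """
    w = df[w2].astype(float).copy()
    for _ in range(max_iter):
        # Ratio-adjust the weights to each dimension's totals in turn.
        for var, targets in controls.items():
            cell_totals = w.groupby(df[var]).sum()
            w = w * df[var].map(targets / cell_totals)
        # After a full pass, check convergence on every dimension.
        gaps = [(w.groupby(df[var]).sum() - targets).abs().max()
                for var, targets in controls.items()]
        if max(gaps) < tol:
            break
    return w  # the post-stratified weight, w3
```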
Survey Outcomes
Of the 418,832 records supplied by VHA, 151,683 were released into the study, resulting in 42,324
completed interviews. For the CATI treatment, 36,393 interviews were obtained with an American
Association for Public Opinion Research (AAPOR) response rate (RR1) of 34 percent.5 For the mail
treatment, 5,931 interviews were obtained for an AAPOR response rate of 40 percent.

5 Documentation for these response rates is available at http://www.aapor.org/AM/Template.cfm?Section=Standard_Definitions2&Template=/CM/ContentDisplay.cfm&ContentID=3156
Bias Assessment
The Survey of Enrollees differs from most population-based surveys in that a considerable amount of
information about the population under study is available. Specifically, seven measures of health care
utilization, along with basic demographics, are present on the sampling frame, or “Universe File,” for all
enrollees. This allows us to compute the total mean squared error (MSE) and its components—bias and
variance—for estimates of service utilization rates under different sampling and weighting schemes.
Using a resampling methodology, 400 Main and Supplemental replicate samples were drawn using the
current stratification and allocation scheme. Specifically, to simulate the 2014 sampling design, each
replicate involved drawing an independent Main and Supplemental sample and then combining the two
samples using the combined design weight ($w_1$) described above. As non-response was not simulated, the non-response weight ($w_2$) and post-stratified weight ($w_3$) were not computed. For each sample replicate, each of the seven service utilization percentages ($\hat{p}$) was computed. For each service, averaging the estimated utilization percentage across the $R$ replicates approximates the expected value ($E(\hat{p})$) of the utilization measure produced by the sampling process:

$E(\hat{p}) = \frac{1}{R}\sum_{r=1}^{R} \hat{p}_r$

where $R$ is the number of sample replicates and $\hat{p}_r$ is the utilization measure from the $r$th sample replicate for a given service. Since the true value ($P$) for each utilization measure can be computed from the sampling frame delivered by VHA, the bias in the estimate of each utilization measure can be estimated as the difference between the expected value produced by the resampling procedure and the true value:

$\widehat{\mathrm{Bias}}(\hat{p}) = E(\hat{p}) - P$
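The resampling logic can be summarized in a short sketch; `draw_replicate` is a hypothetical stand-in for the stratified Main + Supplemental selection described above.

```python
import numpy as np

def estimate_sampling_bias(frame, measure, draw_replicate, n_reps=400):
    """Monte Carlo estimate of sampling bias for one utilization measure.

    `frame` is the Universe File (a DataFrame with a 0/1 flag `measure`);
    `draw_replicate(frame)` returns one combined replicate sample carrying
    its combined design weight column `w1`.
    """
    estimates = []
    for _ in range(n_reps):
        rep = draw_replicate(frame)
        # Design-weighted utilization percentage for this replicate.
        estimates.append(100 * np.average(rep[measure], weights=rep["w1"]))
    expected = np.mean(estimates)              # E(p-hat) across replicates
    true_value = 100 * frame[measure].mean()   # P, known from the frame
    return expected - true_value               # Bias(p-hat) = E(p-hat) - P
```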
Bias estimates were computed for both unweighted and design-weighted data for the seven utilization
measures, overall and by stratification variable categories (i.e., VISN, priority group, and gender, yielding
31 separate domains). While simple random samples were drawn within each cell defined by the crossing
of all stratification variables, the disproportionate allocation means that within an overall category of a
stratification variable (e.g., Priority Group 1), the sampling process did not yield a simple random sample.
Because disproportionate stratified samples are not design-unbiased, some bias in the unweighted
estimates is therefore expected. This expectation is confirmed in Table 2, which displays both the
unweighted estimated percentage ($\hat{p}$) and the estimated bias ($\widehat{\mathrm{Bias}}(\hat{p})$) for each of the seven utilization
measures. Negative values for bias indicate that the sample design underestimates the true value, whereas
positive values indicate that the sample design overestimates the true value.
The unweighted sampling bias ranged from -3.56 percentage points to 5.48 percentage points across all
seven measures and 31 domains, with a median of 0.20 and an interquartile range of 0.70. Fifty-nine of
the 223 total domain bias estimates exceeded 1.00 percentage points. Figure 3 displays the overall
distribution of the unweighted bias estimates in red.
Figure 3. Distribution of Estimated Bias for 7 Utilization Percentages across 31 Domains, Unweighted
(w0) and Design-Weighted (w1)
Note that the larger biases were not evenly scattered across subgroups and measures; this indicates that utilization rates are correlated with the characteristics used to define sampling strata, and underscores the need for weighting to reduce this bias in representation.
The design weight ($w_1$), computed as the inverse of combined-sample selection probabilities, compensates for the disproportionate sample allocation. The design-weighted sampling bias, distributed across estimates and stratification variables as depicted in blue in Figure 3, is negligible. The bias ranged from -0.07 percentage points to 0.05 percentage points, with a median of 0.002 percentage points and an interquartile range of 0.02 percentage points. None of the design-weighted utilization measures in any stratification domain produced an expected bias above 1.00 percentage points. Table 3 provides the design-weighted estimated percentage ($\hat{p}$) and the estimated bias ($\widehat{\mathrm{Bias}}(\hat{p})$) for each of the seven utilization measures.
Overall, then, with proper weighting, the sampling process exhibits minimal bias and is performing as expected.
The next sections examine the potential for bias due to other components of the survey process,
particularly bias due to mode effects and to non-response.
Table 2. Sampling Process Bias Assessment, Unweighted Estimates
(For each utilization measure, "Pct." is the estimated percentage and "Bias" is the estimated bias, in percentage points.)

Stratum | Level | Inpatient MHSA Pct. | Bias | Inpatient Non-MHSA Pct. | Bias | Institutional Long-term Care Pct. | Bias | Non-Institutional Long-term Care Pct. | Bias | Outpatient MHSA Pct. | Bias | Outpatient Non-MHSA Pct. | Bias | Prescription Drug Pct. | Bias
Overall | 1 | 1.45 | 0.31 | 4.87 | 0.41 | 0.64 | 0.12 | 4.30 | 0.70 | 17.37 | 1.03 | 61.53 | -0.67 | 53.35 | -0.86
VISN | 1 | 1.89 | 0.47 | 4.08 | 0.43 | 0.67 | 0.05 | 4.04 | 0.73 | 17.42 | 1.36 | 62.71 | -0.51 | 52.58 | -0.54
VISN | 2 | 1.53 | 0.34 | 5.54 | 0.86 | 0.67 | 0.16 | 5.17 | 1.05 | 15.34 | 1.86 | 60.27 | 1.64 | 51.56 | 1.27
VISN | 3 | 1.55 | 0.44 | 4.07 | 0.85 | 0.61 | 0.12 | 5.58 | 1.06 | 17.31 | 3.38 | 53.31 | 4.04 | 45.71 | 3.50
VISN | 4 | 1.48 | 0.31 | 4.16 | 0.55 | 0.81 | 0.17 | 4.31 | 0.89 | 16.83 | 2.31 | 62.79 | 0.99 | 53.42 | 0.79
VISN | 5 | 1.45 | 0.24 | 4.25 | 0.39 | 0.87 | 0.17 | 4.82 | 0.90 | 14.72 | -0.01 | 49.23 | -2.11 | 41.05 | -1.93
VISN | 6 | 1.31 | 0.25 | 4.35 | 0.20 | 0.44 | 0.05 | 4.39 | 0.55 | 16.48 | 0.08 | 60.47 | -1.45 | 53.81 | -1.54
VISN | 7 | 1.10 | 0.01 | 3.95 | -0.11 | 0.27 | -0.02 | 2.90 | 0.14 | 16.83 | -1.19 | 59.44 | -2.36 | 52.61 | -2.49
VISN | 8 | 1.68 | 0.48 | 6.21 | 0.79 | 0.43 | 0.05 | 5.44 | 1.14 | 22.54 | 4.49 | 73.49 | 4.95 | 64.64 | 5.48
VISN | 9 | 1.69 | 0.41 | 5.70 | 0.22 | 0.44 | 0.05 | 4.13 | 0.38 | 17.43 | 0.13 | 63.77 | -1.22 | 56.32 | -1.46
VISN | 10 | 1.44 | 0.19 | 5.31 | 0.32 | 0.77 | 0.12 | 6.14 | 0.81 | 19.51 | 1.37 | 64.32 | 0.19 | 56.33 | 0.05
VISN | 11 | 1.31 | 0.26 | 4.17 | 0.37 | 0.59 | 0.09 | 4.96 | 0.56 | 16.00 | 1.01 | 63.85 | 0.03 | 55.92 | -0.17
VISN | 12 | 2.07 | 0.70 | 6.39 | 1.15 | 1.10 | 0.27 | 4.83 | 1.15 | 18.86 | 3.12 | 66.02 | 1.54 | 58.52 | 1.26
VISN | 15 | 1.70 | 0.42 | 5.40 | 0.46 | 0.65 | 0.10 | 3.82 | 0.51 | 16.80 | 0.79 | 63.69 | -0.89 | 56.13 | -1.08
VISN | 16 | 1.38 | 0.21 | 4.73 | 0.04 | 0.41 | 0.01 | 3.68 | 0.10 | 17.71 | -0.16 | 62.27 | -1.52 | 55.57 | -1.99
VISN | 17 | 1.37 | 0.17 | 4.12 | 0.18 | 0.42 | 0.00 | 3.17 | 0.30 | 17.06 | -0.34 | 59.32 | -1.03 | 52.40 | -1.17
VISN | 18 | 1.38 | 0.33 | 5.59 | 0.64 | 0.76 | 0.13 | 4.04 | 0.74 | 17.83 | 1.34 | 64.41 | 1.92 | 56.16 | 1.66
VISN | 19 | 1.37 | 0.30 | 4.64 | 0.42 | 0.73 | 0.13 | 4.48 | 0.70 | 16.61 | 1.04 | 60.98 | 0.05 | 51.83 | 0.20
VISN | 20 | 1.26 | 0.17 | 4.56 | 0.21 | 0.53 | 0.04 | 2.90 | 0.30 | 15.42 | -0.07 | 60.19 | -1.35 | 51.78 | -1.37
VISN | 21 | 1.10 | 0.22 | 5.54 | 0.54 | 0.84 | 0.14 | 3.92 | 0.54 | 17.33 | 0.70 | 61.14 | 0.33 | 51.91 | 0.05
VISN | 22 | 1.18 | 0.23 | 4.75 | 0.27 | 0.61 | 0.06 | 2.72 | 0.22 | 19.03 | 1.70 | 57.68 | 1.48 | 48.66 | 1.16
VISN | 23 | 1.30 | 0.31 | 4.70 | 0.56 | 0.92 | 0.15 | 4.79 | 0.69 | 14.45 | 1.28 | 66.18 | -1.52 | 56.79 | -1.00
Priority Group | 1 | 2.38 | 0.26 | 7.66 | 0.03 | 1.34 | -0.02 | 7.06 | 0.32 | 39.58 | 3.05 | 83.41 | 0.71 | 76.69 | 0.79
Priority Group | 2 | 0.93 | 0.04 | 3.46 | 0.03 | 0.25 | 0.00 | 2.84 | 0.12 | 18.52 | 1.36 | 65.64 | 0.02 | 54.57 | 0.25
Priority Group | 3 | 0.79 | 0.03 | 2.99 | -0.03 | 0.23 | -0.01 | 2.42 | 0.04 | 12.27 | 0.99 | 57.65 | -1.11 | 46.05 | -0.71
Priority Group | 4 | 6.66 | 0.23 | 16.58 | 0.20 | 3.34 | 0.23 | 17.56 | 0.67 | 30.55 | 0.72 | 77.89 | -0.31 | 73.77 | -0.40
Priority Group | 5 | 1.41 | 0.03 | 5.77 | -0.06 | 0.42 | 0.00 | 3.80 | 0.12 | 17.32 | 1.18 | 62.52 | 0.81 | 56.92 | 0.45
Priority Group | 6 | 0.30 | 0.01 | 1.05 | -0.03 | 0.04 | 0.00 | 0.71 | 0.00 | 8.96 | 0.80 | 41.46 | -1.65 | 30.27 | -1.42
Priority Group | 7 | 0.59 | 0.04 | 3.98 | 0.07 | 0.33 | -0.04 | 3.43 | -0.05 | 10.60 | 1.12 | 75.16 | -0.39 | 62.39 | 0.22
Priority Group | 8 | 0.17 | 0.01 | 1.49 | 0.02 | 0.09 | 0.00 | 1.48 | 0.00 | 4.65 | 0.43 | 47.59 | -1.85 | 40.37 | -1.88
Gender | F | 1.36 | 0.02 | 3.94 | -0.15 | 0.35 | 0.05 | 3.57 | 0.21 | 21.23 | -2.20 | 56.66 | -3.13 | 48.76 | -3.56
Gender | M | 1.47 | 0.35 | 5.08 | 0.59 | 0.70 | 0.16 | 4.47 | 0.84 | 16.51 | 0.72 | 62.62 | 0.23 | 54.37 | 0.01
Table 3. Sampling Process Bias Assessment, Design-Weighted Estimates
(For each utilization measure, "Pct." is the estimated percentage and "Bias" is the estimated bias, in percentage points.)

Stratum | Level | Inpatient MHSA Pct. | Bias | Inpatient Non-MHSA Pct. | Bias | Institutional Long-term Care Pct. | Bias | Non-Institutional Long-term Care Pct. | Bias | Outpatient MHSA Pct. | Bias | Outpatient Non-MHSA Pct. | Bias | Prescription Drug Pct. | Bias
Overall | 1 | 1.14 | 0.00 | 4.46 | 0.00 | 0.52 | 0.00 | 3.61 | 0.00 | 16.34 | -0.01 | 62.20 | -0.01 | 54.20 | -0.01
VISN | 1 | 1.44 | 0.01 | 3.66 | 0.01 | 0.63 | 0.00 | 3.32 | 0.01 | 16.04 | -0.02 | 63.20 | -0.01 | 53.11 | -0.01
VISN | 2 | 1.19 | 0.00 | 4.67 | -0.01 | 0.51 | 0.01 | 4.10 | -0.02 | 13.51 | 0.04 | 58.59 | -0.04 | 50.24 | -0.04
VISN | 3 | 1.11 | 0.00 | 3.23 | 0.01 | 0.49 | 0.00 | 4.52 | 0.00 | 13.92 | 0.00 | 49.25 | -0.02 | 42.20 | -0.01
VISN | 4 | 1.16 | -0.01 | 3.61 | 0.01 | 0.63 | 0.00 | 3.42 | -0.01 | 14.47 | -0.04 | 61.83 | 0.03 | 52.63 | 0.01
VISN | 5 | 1.20 | -0.01 | 3.86 | 0.00 | 0.71 | 0.00 | 3.91 | -0.01 | 14.73 | 0.00 | 51.32 | -0.02 | 42.93 | -0.05
VISN | 6 | 1.06 | 0.00 | 4.16 | 0.01 | 0.40 | 0.00 | 3.85 | 0.00 | 16.40 | 0.00 | 61.89 | -0.03 | 55.33 | -0.01
VISN | 7 | 1.09 | 0.00 | 4.05 | -0.01 | 0.28 | -0.01 | 2.76 | -0.01 | 18.05 | 0.03 | 61.81 | 0.02 | 55.15 | 0.04
VISN | 8 | 1.20 | 0.00 | 5.41 | -0.01 | 0.38 | 0.00 | 4.31 | 0.01 | 18.02 | -0.03 | 68.51 | -0.03 | 59.12 | -0.04
VISN | 9 | 1.29 | 0.01 | 5.46 | -0.02 | 0.39 | 0.00 | 3.77 | 0.02 | 17.27 | -0.02 | 65.03 | 0.03 | 57.82 | 0.05
VISN | 10 | 1.24 | -0.01 | 4.99 | 0.00 | 0.65 | 0.00 | 5.32 | 0.00 | 18.14 | 0.00 | 64.11 | -0.02 | 56.26 | -0.01
VISN | 11 | 1.04 | 0.00 | 3.79 | -0.01 | 0.50 | 0.00 | 4.39 | 0.00 | 15.00 | 0.01 | 63.83 | 0.01 | 56.10 | 0.01
VISN | 12 | 1.36 | 0.00 | 5.23 | -0.01 | 0.83 | -0.01 | 3.68 | 0.00 | 15.76 | 0.02 | 64.50 | 0.03 | 57.31 | 0.04
VISN | 15 | 1.29 | 0.02 | 4.93 | 0.00 | 0.55 | 0.00 | 3.28 | -0.02 | 16.03 | 0.01 | 64.58 | 0.00 | 57.18 | -0.03
VISN | 16 | 1.18 | 0.01 | 4.70 | 0.01 | 0.40 | 0.00 | 3.56 | -0.02 | 17.85 | -0.01 | 63.76 | -0.03 | 57.54 | -0.02
VISN | 17 | 1.19 | -0.01 | 3.93 | -0.01 | 0.42 | 0.00 | 2.86 | -0.01 | 17.39 | -0.01 | 60.32 | -0.03 | 53.53 | -0.03
VISN | 18 | 1.05 | 0.00 | 4.94 | -0.01 | 0.63 | 0.01 | 3.34 | 0.03 | 16.48 | -0.01 | 62.54 | 0.05 | 54.54 | 0.05
VISN | 19 | 1.07 | 0.01 | 4.21 | -0.01 | 0.60 | 0.00 | 3.79 | 0.00 | 15.60 | 0.03 | 60.93 | 0.00 | 51.59 | -0.03
VISN | 20 | 1.09 | 0.00 | 4.36 | 0.00 | 0.48 | -0.01 | 2.63 | 0.02 | 15.46 | -0.04 | 61.50 | -0.03 | 53.08 | -0.07
VISN | 21 | 0.87 | -0.01 | 5.01 | 0.00 | 0.70 | 0.00 | 3.37 | -0.01 | 16.61 | -0.02 | 60.80 | -0.02 | 51.81 | -0.05
VISN | 22 | 0.96 | 0.01 | 4.48 | 0.00 | 0.55 | 0.00 | 2.51 | 0.00 | 17.35 | 0.02 | 56.19 | 0.00 | 47.52 | 0.01
VISN | 23 | 0.99 | 0.00 | 4.13 | -0.01 | 0.76 | 0.00 | 4.09 | 0.00 | 13.17 | 0.00 | 67.69 | -0.01 | 57.81 | 0.02
Priority Group | 1 | 2.11 | 0.00 | 7.63 | -0.01 | 1.36 | 0.00 | 6.74 | 0.01 | 36.52 | -0.01 | 82.68 | -0.01 | 75.89 | -0.01
Priority Group | 2 | 0.89 | 0.00 | 3.44 | 0.01 | 0.26 | 0.00 | 2.72 | 0.01 | 17.17 | 0.00 | 65.64 | 0.02 | 54.34 | 0.02
Priority Group | 3 | 0.76 | 0.00 | 3.01 | -0.01 | 0.24 | 0.00 | 2.38 | 0.00 | 11.26 | -0.03 | 58.75 | -0.01 | 46.72 | -0.03
Priority Group | 4 | 6.46 | 0.03 | 16.39 | 0.01 | 3.12 | 0.01 | 16.89 | 0.00 | 29.85 | 0.02 | 78.17 | -0.03 | 74.17 | 0.00
Priority Group | 5 | 1.38 | 0.00 | 5.82 | -0.01 | 0.41 | 0.00 | 3.68 | 0.00 | 16.13 | 0.00 | 61.70 | 0.00 | 56.48 | 0.02
Priority Group | 6 | 0.29 | 0.00 | 1.08 | 0.00 | 0.04 | 0.00 | 0.71 | 0.00 | 8.15 | -0.01 | 43.06 | -0.06 | 31.65 | -0.03
Priority Group | 7 | 0.55 | 0.00 | 3.92 | 0.01 | 0.36 | 0.00 | 3.45 | -0.03 | 9.41 | -0.07 | 75.59 | 0.04 | 62.17 | 0.00
Priority Group | 8 | 0.16 | 0.00 | 1.48 | 0.00 | 0.09 | 0.00 | 1.48 | 0.00 | 4.23 | 0.01 | 49.44 | 0.00 | 42.24 | -0.01
Gender | F | 1.35 | 0.00 | 4.09 | 0.00 | 0.30 | 0.00 | 3.36 | 0.00 | 23.40 | -0.03 | 59.75 | -0.03 | 52.32 | 0.00
Gender | M | 1.13 | 0.00 | 4.48 | 0.00 | 0.54 | 0.00 | 3.63 | 0.00 | 15.79 | 0.00 | 62.39 | 0.00 | 54.35 | -0.01
3. EXPERIMENT 1 – IMPACT OF SURVEY MODE ON
SURVEY ESTIMATES
In 2013, a Mode Effects (ME) experiment tested for survey mode effects by randomly assigning enrollees
for whom both a phone number and mailing address were available to one of two modes (CATI vs. mail)
as the default mode of survey administration. By holding population characteristics constant in this way,
any potential effects of mode on survey outcomes (including response rates and survey estimates) can be
identified. Although survey mode was found to have a significant effect on some survey responses, the
magnitude of these effects was generally quite small. To support trending and further increase confidence
that survey mode effects are minimal, ICF recommended replicating this experiment in 2014.
Design
Given similar parameters in 2014, the power analyses conducted in 2013 were again used to determine
the sample sizes required to achieve sufficient power for detecting two-way mode × stratum interactions
at a 95 percent confidence level.6 The resulting recommendation was that 7,948 records be assigned to
each of the mail and CATI protocols. Ultimately, 8,000 records were assigned to receive the mail
protocol.
Because the treatment of records explicitly assigned to the CATI protocol (n = 8,000) is functionally
identical to the treatment of records eligible for the experiment (i.e., having valid mail and phone contact
information) but not explicitly included in it, the size of the sample assigned to receive the CATI protocol
was effectively 8,000 + 376,067 = 384,067.7 Power analyses based on expected response rates showed
that these sample sizes would be sufficient to detect two-tailed differences in proportions between the
experimental treatment groups of at least three percentage points with greater than 90 percent power, and
to detect mean differences of at least 0.71 units with 80 percent power. The actual number of eligible
completed surveys received from the two treatment groups (mail n = 2,808, CATI n = 25,000) resulted in
over 80 percent power to detect two-tailed differences in proportions of at least three percentage points
and 80 percent power to detect two-tailed mean differences of at least 0.78 units.8
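As an illustration, the statsmodels sketch below reproduces the flavor of this calculation for the worst-case three-point difference (50 percent vs. 53 percent) at the realized sample sizes; the exact assumptions behind the reported power figures are not documented here, so treat this as an approximation.

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

# Worst-case three-point gap in proportions (variance is maximal near 50%).
effect = proportion_effectsize(0.50, 0.53)

power = NormalIndPower().power(
    effect_size=effect,
    nobs1=2808,            # completed mail surveys
    alpha=0.05,
    ratio=25000 / 2808,    # completed CATI surveys relative to mail
    alternative="two-sided",
)
print(f"approximate power: {power:.2f}")  # roughly 0.8 or higher
```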
It is important to note that this experiment manipulated the default mode of survey administration rather
than mode of survey completion. That is, enrollees in the ME experiment received one of two versions of
the pre-survey notification letter: one version indicated that the enrollee would soon receive a paper copy
of the survey in the mail, and the other version indicated that the enrollee would soon receive a call to
complete the survey over the phone.
The pre-survey notification letters were identical in all other respects and included a URL to complete the
Web survey online, as well as a phone number that the enrollee could call to complete the survey over the
phone at their convenience or to request a mailed survey (as applicable). Consequently, a sampled
6 A response rate of 39 percent (observed in the 2012 Survey of Enrollees for both phone and mail) was assumed when allocating sample for this experiment; in 2013, a response rate of 40 percent was observed for mail and a response rate of 32 percent was observed for phone.
7 The two treatment groups were not equal in size due to cost considerations. Of all enrollees sampled for this survey, 26,765 had
a valid mailing address but not a valid phone number, making them ineligible for inclusion in this experiment; all other sampled
enrollees had both types of contact information.
8 The actual power of a test depends on the specific proportions being tested; proportions lower or higher than 50 percent will have less variance, and tests will therefore be more powerful than the worst-case scenario described here. Power will also be affected by item-missing data; the numbers reported here assume no missing data. In addition, only enrollees who completed the survey in their assigned mode were eligible for analysis.
enrollee assigned to either treatment still had the choice to respond in any of the three modes offered in
2014. (The final section of this report assesses differences in response patterns due to mode of
completion.)
This design choice was made to increase the response rate at the cost of complete experimental control
over mode of survey response. The result is that self-selection into response mode, or “response channel,”
presents a threat to the randomization of the experimental design: Enrollees ultimately chose the mode of
completion they preferred, regardless of the mode to which they were nominally assigned. This threat to
experimental control is a limitation of the current design and, as in 2013, was considered acceptable to
prevent the experiment from negatively impacting overall response rates.
The majority of respondents in the ME experiment, however, completed the survey in the default mode.
Specifically, 82 percent of responding enrollees assigned to the mail mode completed a mail survey and
69 percent of respondents assigned to the CATI mode completed a CATI survey. Notably, the latter figure
is lower than the 2013 rate (75%), indicating that in 2014, enrollees assigned to the CATI protocol
became more likely to choose alternative modes for responding (i.e., mail or Web). In particular,
compared to 2013, the use of the mail mode by this group rose from 12 percent to 16 percent, while the
use of the Web mode rose from 13 percent to 15 percent.
Table 4 shows counts of completed surveys in the ME experiment broken out by default mode (i.e., the
mode randomly assigned) and response channel (i.e., the mode ultimately used for response). Note that
for the purposes of the experimental analyses, enrollees were grouped into the treatment (mail) vs. control (CATI) conditions based on whether they completed the survey in their assigned default mode (regardless of the specific response channel within that mode). These groups are shown in bold in
Table 4. “Outbound CATI” refers to enrollees who were called by an interviewer, whereas “inbound
CATI” refers to enrollees who called in to complete an interview. Note that comparisons between Web
mode respondents and CATI/mail respondents are discussed later in this report.
Table 4. Survey Completes by Default Survey Mode and Response Channel

| Default Mode | Outbound CATI* | Inbound CATI* | Mail (Default) | Mail Request | Web |
|---|---|---|---|---|---|
| Mail (Treatment) | 389 | 58 | **2,808** | N/A | 186 |
| CATI (Control) | **23,402** | **1,568** | N/A | 5,916 | 5,507 |

Note: Bolded cells indicate groups included in the ME experimental analyses, due to completing the survey in the randomly assigned mode. An additional 2,490 responding enrollees not shown in this table did not have a valid phone number, making them ineligible for inclusion in this experiment.
* The completed interviews included in the Default Mode: CATI, Response Channel: CATI groups also include completes from enrollees who requested a mail survey, did not return it, and then completed a CATI non-response follow-up survey.
Results
Health Care Coverage, Health Care Access, and Health Status
Table 5 compares the effect of default survey mode (mail vs. CATI) on selected population estimates of coverage, access, and health status. Estimates were weighted using the non-response-adjusted and post-stratified analytic weight (W3 on the data file). For each measure, the significance level (p) for the Rao-Scott chi-square test is reported for the comparison of mail vs. CATI estimates. Significant differences (p < .05) are flagged with an asterisk, and those that replicate significant effects from the 2013 ME experiment are indicated by “Rep2013”.
Assuming minimal effects of self-selection into mode, these results suggest that survey mode does have
some influence on how enrollees respond to survey items and/or some association with who chooses to
respond. These mode effects were not dramatic, however, with nearly all effects creating a difference of
less than five percentage points.
Of the 19 outcomes tested, 11 showed statistically significant mode effects in the 2014 survey. Nine of
these significant effects replicated significant effects from the 2013 ME experiment. Of these replicated
effects, the maximum difference between mail and CATI estimates was 7.50 percentage points, with a
mean absolute difference of 3.91 percentage points. The following summary of findings focuses on the
statistically significant mode effects that replicated this year, as these effects have the strongest evidence
of being systematic.
The mail survey produced a higher estimate of the proportion of enrollees covered by Medicare, but a
lower estimate of the proportion of enrollees covered by Medicaid for some health care. The mail survey
produced a higher estimate of enrollees who use VA services to meet “none” of their health care needs,
whereas the CATI survey produced a higher estimate of enrollees who use VA services to meet “most” of
their health care needs.
The CATI survey (compared to the mail survey) produced higher estimates of enrollees being in
“excellent” or “poor” health, but lower estimates of enrollees being in “good” health. This pattern may
suggest that the CATI survey promotes more use of the extreme ends of response scales compared to the
mail survey, which is consistent with previous findings in mixed-mode research.9
Only one mode effect with regard to employment status replicated; the CATI survey produced a higher
estimate of unemployed enrollees (“unemployed, looking for work, or laid off”) compared to the mail
survey. This effect might be explained by the greater ease with which phone contacts are made with
unemployed individuals.
Table 5. Comparison of Selected Coverage, Access, and Health Status Proportions by Default Survey Mode, Weighted (w3)

| Survey Item | Response | Overall (%) | Mail (%) | CATI (%) | p |
|---|---|---|---|---|---|
| Medicare coverage | 1- Yes | 51.9 (51.1, 52.6) | 58.6 (56.2, 61.0) | 51.1 (50.3, 51.9) | <.001* Rep2013 |
| Medicaid coverage for some health care | 1- Yes | 8.3 (7.9, 8.7) | 7.1 (5.9, 8.2) | 8.5 (8.1, 8.9) | .034* Rep2013 |
| Coverage by another individual or group health plan | 1- Yes | 27.0 (26.3, 27.7) | 27.7 (25.5, 29.8) | 26.9 (26.2, 27.7) | .532 |
| Use VA services to meet... | 1- All of my health care needs | 33.5 (32.8, 34.2) | 33.1 (30.9, 35.3) | 33.6 (32.8, 34.3) | .712 |
| | 2- Most of my health care needs | 17.1 (16.6, 17.7) | 14.6 (13.0, 16.1) | 17.4 (16.8, 18.0) | .002* Rep2013 |
| | 3- Some of my health care needs | 27.1 (26.4, 27.8) | 27.0 (25.0, 28.9) | 27.1 (26.4, 27.8) | .900 |
| | 4- None of my health care needs | 17.2 (16.6, 17.8) | 21.8 (19.7, 23.8) | 16.7 (16.1, 17.4) | <.001* Rep2013 |
| | 5- I have no health care needs | 5.0 (4.6, 5.4) | 3.6 (2.4, 4.7) | 5.2 (4.8, 5.6) | .030* |
| Self-reported general health | 1- Excellent | 10.9 (10.4, 11.4) | 9.1 (7.6, 10.6) | 11.1 (10.6, 11.6) | .021* Rep2013 |
| | 2- Very good | 23.7 (23.0, 24.3) | 24.4 (22.3, 26.4) | 23.6 (22.9, 24.3) | .462 |
| | 3- Good | 30.9 (30.2, 31.6) | 36.6 (34.4, 38.9) | 30.2 (29.5, 31.0) | <.001* Rep2013 |
| | 4- Fair | 23.1 (22.5, 23.8) | 21.4 (19.6, 23.3) | 23.3 (22.7, 24.0) | .059 |
| | 5- Poor | 11.4 (10.9, 11.9) | 8.5 (7.2, 9.9) | 11.7 (11.2, 12.3) | <.001* Rep2013 |
| Employment status | 1- Employed full-time | 23.3 (22.6, 24.0) | 19.8 (17.7, 21.9) | 23.7 (22.9, 24.5) | .001* |
| | 2- Self-employed full-time | 2.9 (2.6, 3.1) | 3.2 (2.2, 4.2) | 2.8 (2.5, 3.1) | .478 |
| | 3- Employed part-time | 5.4 (5.0, 5.8) | 5.9 (4.8, 7.0) | 5.4 (5.0, 5.8) | .364 |
| | 4- Self-employed part-time | 2.5 (2.2, 2.7) | 3.1 (2.2, 4.0) | 2.4 (2.1, 2.7) | .132 |
| | 5- Unemployed, looking for work, or laid off | 7.2 (6.7, 7.6) | 4.6 (3.4, 5.8) | 7.5 (7.0, 7.9) | <.001* Rep2013 |
| | 6- Currently not employed: either retired, a homemaker, student, etc. | 58.8 (58.0, 59.5) | 63.5 (61.0, 65.9) | 58.2 (57.4, 59.1) | <.001* |

Note: 95 percent confidence intervals for estimated proportions are given in parentheses.
Note: Rao-Scott chi-square tests of association were used to compare proportions for each response between ME treatment groups (mail vs. CATI).
*A p-value less than .05 indicates a statistically significant association between survey mode and the enrollee characteristic indicated by that response. “Rep2013” indicates a replicated finding from the 2013 ME experiment.

9 Dillman, D., Smyth, J., & Christian, L. M. (2009). Internet, Mail and Mixed-Mode Surveys: The Tailored Design Method (3rd ed.). Hoboken, NJ: Wiley.
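For readers unfamiliar with the Rao-Scott chi-square tests used throughout Table 5, the first-order correction deflates the ordinary Pearson chi-square statistic by a design-effect factor before referring it to the usual chi-square distribution, which approximately accounts for the weighted, stratified design. A minimal sketch, assuming a known mean design effect and illustrative counts rather than the report's actual survey-design computation:

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

def rao_scott_chisq(table, mean_deff):
    """First-order Rao-Scott correction: deflate the Pearson chi-square
    statistic by the mean design effect, then use the usual reference."""
    stat, _, df, _ = chi2_contingency(np.asarray(table))
    adjusted = stat / mean_deff
    return adjusted, chi2.sf(adjusted, df)

# Illustrative mode-by-response table of estimated counts; deff = 1.4 is assumed
stat, p = rao_scott_chisq([[1646, 1162], [12785, 12215]], mean_deff=1.4)
print(f"adjusted X2 = {stat:.1f}, p = {p:.4g}")
```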
Key Driver Questions
For key driver questions, respondents were presented with a series of statements and asked whether they: 1) completely agreed, 2) agreed, 3) neither agreed nor disagreed, 4) disagreed, or 5) completely disagreed.
Mean responses to these items are presented in Table 6. Lower values (minimum = 1.00) indicate stronger
agreement with the statement, whereas higher values (maximum = 5.00) indicate stronger disagreement
with the statement.
As above, this summary focuses on significant effects from 2013 that were replicated in the 2014 ME experiment. Five of the six effects tested were replicated, although as in 2013, the magnitude of these differences was not dramatic; of the significant effects in Table 6, the mean absolute difference between estimates by mode was 0.17 points on the five-point rating scale, with a maximum difference of 0.21 points.
Replicating the 2013 findings, the mail survey produced more positive opinions about VA than the CATI
survey, with the ease of getting to a local VA facility showing the largest difference. The exception to this
pattern was that the CATI survey produced higher estimates of how well enrollees understand how their
VA health benefits work. Social desirability may explain this difference, as enrollees may be more
concerned about appearing competent when being interviewed.
Table 6. Comparison of Selected Key Driver Means by Default Survey Mode, Weighted (w3)

| Survey Item | Overall | Mail | CATI | p |
|---|---|---|---|---|
| d11c: VA offers Veterans like me the best value for our health care dollar | 2.16 (2.15, 2.18) | 2.01 (1.97, 2.06) | 2.18 (2.17, 2.20) | <.0001* Rep2013 |
| d12b: Veterans like me who use VA are satisfied with the health care they receive | 2.26 (2.25, 2.28) | 2.14 (2.09, 2.19) | 2.28 (2.26, 2.29) | <.0001* Rep2013 |
| d13b: Veterans like me can get in and out of an appointment at VA in a reasonable time | 2.36 (2.34, 2.38) | 2.24 (2.19, 2.29) | 2.37 (2.36, 2.39) | <.0001* Rep2013 |
| d14d: I understand how my VA health benefits work | 2.42 (2.40, 2.44) | 2.59 (2.54, 2.65) | 2.40 (2.38, 2.42) | <.0001* Rep2013 |
| d15f: It is easy to get to my local VA facility | 2.22 (2.20, 2.23) | 2.03 (1.98, 2.07) | 2.24 (2.22, 2.26) | <.0001* Rep2013 |
| d16c: I would only use VA if I did not have access to any other source of health care | 2.91 (2.89, 2.93) | 2.97 (2.90, 3.03) | 2.90 (2.89, 2.92) | <.0001* |

Note: 95 percent confidence intervals for estimated means are given in parentheses.
Note: Independent-samples t-tests were used to compare means for each response between ME treatment groups (mail vs. CATI).
*A p-value less than .05 indicates a statistically significant association between survey mode and the enrollee characteristic indicated by that response. “Rep2013” indicates a replicated finding from the 2013 ME experiment.
Survey Mode Effects within Strata
To explore the effects of survey mode on responses in more detail, the two ME treatment groups (mail vs.
CATI) were compared within sampling strata (i.e., VISN, priority group, gender, and Hispanic identity).
To simplify analyses and conserve statistical power, the 21 VISNs were collapsed into four groups
according to VA area office boundaries (i.e., East, Central, South, and West).10 The eight priority groups
were collapsed into two levels (1-4 = high priority, 5-8 = low priority).
These analyses, which are equivalent to the decomposition of mode × stratum interactions, highlight
outcomes where significant mode effects are observed in one level of a stratum (e.g., gender = Female)
but not in another level of that stratum (e.g., gender = Male). Significant mode effects that are consistent
across stratum levels are not discussed, since these are equivalent to main effects and are reflected in the
discussion of the overall estimates above. Furthermore, only effects that replicated findings from the 2013
ME experiment are discussed, as these effects have the strongest evidence of being systematic. The
outcome variables analyzed in this section are the same as shown in Table 5 (coverage, access, and health
status proportions) and Table 6 (key driver questions).
10 See http://www2.va.gov/directory/guide/division_flsh.asp?dnum=3
To provide a visual summary of the magnitude of the mode effects observed across domains in 2014,
Figure 4 displays the distribution of mode effects (i.e., the estimated outcome percentage from the mail
survey minus the estimated outcome percentage from the CATI survey) for the 19 weighted coverage,
access, and health status estimates across 14 domains (four VISN regions, high vs. low priority groups,
gender, Hispanic identity, OEF/OIF/OND status, and pre- vs. post-enrollee status). The mode effects
ranged from -7.80 percentage points to 12.50 percentage points across all measures and domains, with a
median effect of -0.30 and an interquartile range of 4.08. Forty-six of the 266 estimated mode effects
exceeded ±5.00 percentage points.
Figure 4. Distribution of Mode Effects for 19 Coverage, Access, and Health Status Estimates across 14
Domains, Weighted (w3)
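The distributional summary above is straightforward to reproduce once the 19 × 14 grid of mail and CATI estimates is assembled. A minimal sketch, with placeholder arrays standing in for the actual estimates:

```python
import numpy as np

# Placeholder arrays: one entry per outcome x domain cell (19 x 14 = 266 in the report)
rng = np.random.default_rng(42)
mail_pct = rng.uniform(0, 60, size=(19, 14))
cati_pct = mail_pct + rng.normal(0, 3, size=(19, 14))

effects = (mail_pct - cati_pct).ravel()   # mode effect = mail % minus CATI %
q1, med, q3 = np.percentile(effects, [25, 50, 75])
print(f"median = {med:.2f}, IQR = {q3 - q1:.2f}")
print(f"|effect| > 5 pts: {(np.abs(effects) > 5).sum()} of {effects.size}")
```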
VISN Groups
As in 2013, some variability in mode effects was observed across the four geographic regions (East,
Central, South, and West). The following list summarizes the replicated regional mode effects (mail vs.
CATI) for the 19 measures of health care coverage, access, and health status:
- In the East region, the mail survey produced a higher estimate of enrollees having part-time employment.
- In the Central region, the mail survey produced a higher estimate of Medicare coverage; a higher estimate of enrollees who use VA services to meet “none” of their health care needs; and a lower estimate of enrollees with “poor” health.
- In the South region, the mail survey produced a lower estimate of “poor” general health.
- In the West region, the mail survey produced a higher estimate of Medicare coverage and a higher estimate of enrollees who use VA services to meet “none” of their health care needs.
None of the mode effects on key driver questions reported above varied by region.
Priority Groups
Two survey mode × priority group (high vs. low) effects were replicated. The mail survey (compared to
the CATI survey) produced a lower estimate of Medicaid coverage among low-priority enrollees, whereas
there was no mode effect for high-priority enrollees. The mail survey also produced a lower estimate of
enrollees using VA services for “most” of their health care needs among low-priority enrollees, whereas
there was no mode effect for high-priority enrollees. None of the mode effects on key driver questions
reported above differed between priority groups.
Gender
Because there are many more men than women in the responding sample, statistical tests of mode effects
among the male respondents have much greater power than tests among the female respondents. This
difference in power leads to a higher probability of achieving statistical significance among the former
subgroup even if a mode effect of the same magnitude exists in both populations. To focus on the more
robust interactions between survey mode and enrollee gender, only mode effects meeting two criteria are discussed here: a) significance was achieved in only one subgroup, and b) the absolute difference in the magnitude of the mode effects between subgroups was at least five percentage points.
Looking first at the measures of health care coverage, access, and health status, the mail survey
(compared to the CATI survey) produced a higher estimate of Medicare coverage among men, whereas
there was no significant mode effect for Medicare coverage among women. The mail survey also
produced a higher estimate of “good” general health among men, whereas there was no significant mode
effect for this outcome among women. Finally, the mail survey produced a lower estimate of full-time
employment among men, whereas there was no significant mode effect for this outcome among women.
None of the mode effects on key driver questions reported above varied by enrollee gender.
Hispanic/Latino Ethnicity
Because only a small proportion of the responding sample identifies as Hispanic/Latino, comparing mode effects between enrollees identifying as Hispanic/Latino and those not identifying as such raises the same issue of asymmetric statistical power noted with regard to enrollee gender. The same criteria used to identify the more robust interactions between survey mode and Hispanic/Latino ethnicity are applied here.
Looking first at the measures of health care coverage, access, and health status, the mail survey
(compared to the CATI survey) produced a higher estimate of Hispanic/Latino enrollees who use VA
services to meet “some” of their health care needs, whereas there was no significant difference for this
outcome among non-Hispanic/Latino enrollees. In addition, the mail survey produced a higher estimate of
non-Hispanic/Latino enrollees who use VA services to meet “none” of their health care needs, whereas
there was no significant difference for this outcome among Hispanic/Latino enrollees.
One difference in mode effects on the key driver questions was observed: Among Hispanic/Latino
enrollees, the CATI survey (compared to the mail survey) created stronger agreement with the statement
“I would only use VA if I did not have access to any other sources of health care,” whereas there was no
significant difference for this outcome among non-Hispanic/Latino enrollees.
Summary of Findings: Mode Effects Experiment
In 2012, the first year with a mail mode, an analysis of mode effects comparing enrollees responding via
CATI vs. mail indicated some differences between groups. However, because enrollees were not
randomly assigned to response channels, potential mode effects were confounded with pre-existing
differences between the populations of enrollees who preferred to respond by CATI vs. mail. To
disentangle mode effects from population differences, ICF recommended conducting a methodological
experiment to randomly assign enrollees to survey modes.
Two randomized mode effects experiments have now been conducted as part of the 2013 and 2014
surveys, and the findings have been consistent: Although there are some significant differences between
survey modes on key survey outcomes, the magnitude of these differences is generally small. Moreover,
only 14 of the 25 overall outcomes tested (collapsing across strata) produced mode effects that replicated
across years. This suggests that the mode effects observed in any given year are often not systematic.
The magnitudes of the effects that did replicate were acceptably small and do not present a substantive
threat of bias to survey estimates. With regard to measures of health care coverage, access, and health
status, the mean absolute difference between mail and CATI estimates was 3.91 percentage points. With
regard to the key driver questions, the mean absolute difference between mail and CATI estimates was
0.17 points on the five-point rating scale.
Replicated overall mode effects indicated small differences between survey modes in estimates of
enrollees covered by Medicare and Medicaid, as well as differences in estimates of general health. In
addition, the mail survey appears to generate slightly more positive opinions of VA services compared to
the CATI survey.
At the level of individual sampling strata, there were few replicated survey mode × stratum effects among
the many that were tested. Although some effects were consistent across years, these findings were
scattered across domains and outcomes, giving no indication that one mode is biasing responses in a
particular direction.
Without having access to “true” values for the measures evaluated here, it is impossible to know if the
mail or CATI mode (or both) is introducing measurement bias when mode effects are detected. Thus, we
have no reason to assume that one mode is more accurate than the other. The current evidence justifies the
recommendation to continue encouraging response in all modes, as the mixed-mode design provides a
substantial reduction in undercoverage without substantially increasing measurement error due to mode
effects.
4. EXPERIMENT 2 – IMPACT OF SECOND SURVEY
MAILING ON RESPONSE RATES FOLLOWING
CATI NON-WORKING/NON-RESPONSE
The Second Survey Mailing/Follow-Up to Phone/CATI Protocol (SSM-P) experiment tested the effect on
response rates of mailing one vs. two surveys as part of the CATI survey non-response/non-working
number follow-up protocols. The follow-up protocols being compared in this experiment are shown in
Table 7. The key difference is that in the long protocol, a second complete survey is mailed following
non-response to the first follow-up mail survey. This experiment is a replication of the “Second Survey
Mailing” experiment reported in the 2013 Methodological Experiments Report.
Table 7. SSM-P Experiment Follow-Up Mail Protocols: Long (Treatment) vs. Short (Control)

| Long Protocol (Treatment) | Short Protocol (Control) |
|---|---|
| 1. Pre-survey Notification Letter | 1. Pre-survey Notification Letter |
| 2. 1st Survey Packet Mailing | 2. 1st Survey Packet Mailing |
| 3. Reminder Postcard | 3. Reminder Postcard |
| 4. 2nd Survey Packet Mailing | |
Design
Enrollees became eligible for this experiment when their phone numbers were determined to be non-working or they were determined to be non-respondents to the CATI survey. Table 8 shows how these records were randomly assigned to the SSM-P treatment conditions. Only a subsample of telephone non-respondents from the first wave of the sample release was entered into this experiment to receive any mail follow-up. The entire first-wave sample of dialed phone records determined to be non-working received a mail follow-up protocol (either via “explicit” assignment to the short vs. long protocols following the power analyses described below, or via “implicit” assignment to the short protocol for the balance of the first-wave non-working sample).
Table 8. SSM-P Treatment Group Sizes by CATI Non-Response/Non-Working Status

| Status | Long Protocol (Treatment) | Short Protocol (Control) |
|---|---|---|
| Non-Response | 1,750 | 1,750 |
| Non-Working (Explicit Assignment) | 2,000 | 1,200 |
| Non-Working (Implicit Assignment) | N/A | 8,763 |
| Total | 3,750 | 11,713 |
Sample sizes were determined by the decision to conduct an exact replication of the 2013 experiment. Power analyses show that for the non-working records, these sample sizes are sufficient to detect a one-sided difference in response rates (assuming a 15.4 percent response rate to the short protocol, as observed in the 2013 experiment) of at least three percentage points with over 80 percent power. Similarly, for the non-response records, these sample sizes are sufficient to detect a one-sided difference in response rates of at least three percentage points with nearly 80 percent power. Combining non-response and non-working records yields over 80 percent power for detecting a one-sided difference in response rates of at least two percentage points.
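As with the ME experiment, these one-sided figures follow from a normal approximation. A minimal sketch, assuming the full short-protocol group serves as the comparison for the non-working records (the exact group sizes used in the original power analyses are not restated in the report):

```python
from scipy.stats import norm

def one_sided_power(p_ctrl, lift, n_treat, n_ctrl, alpha=0.05):
    """Approximate power of a one-sided z-test that the treatment
    response rate exceeds the control rate by `lift`."""
    p_treat = p_ctrl + lift
    se = (p_ctrl * (1 - p_ctrl) / n_ctrl + p_treat * (1 - p_treat) / n_treat) ** 0.5
    return norm.cdf(lift / se - norm.ppf(1 - alpha))

# Non-working records: 15.4% control rate (2013), 3-point lift, group sizes from Table 8/9
print(one_sided_power(0.154, 0.03, 2_000, 9_963))  # well above 0.80
```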
It is important to note that enrollees who entered into this experiment were allowed to complete the survey in any of the three available modes (CATI, mail, or Web). As shown in Table 9, the majority of enrollees entered into the SSM-P experiment who ultimately responded did so using the mail mode, although a small number in each treatment group also completed surveys in the other two modes. For the purposes of analysis, all responses are counted toward the total response rate for each SSM-P condition regardless of the response channel, since the outcome of interest in this experiment is overall response rate improvement due to changes in follow-up protocol.
Table 9. Sampled Records and Survey Responses by Population, SSM-P Condition, and Response Channel

Population: Non-Working Phone Records

| SSM-P Condition | Sample | CATI† | Mail | Web | Total | RR1 | RR1 Change |
|---|---|---|---|---|---|---|---|
| Long Protocol (Treatment) | 2,000 | 3 | 526 | 7 | 536 | 26.8% | +7.8 pts* |
| Short Protocol (Control) | 9,963 | 20 | 1,841 | 35 | 1,896 | 19.0% | |

Population: CATI Non-Respondents

| SSM-P Condition | Sample | CATI† | Mail | Web | Total | RR1 | RR1 Change |
|---|---|---|---|---|---|---|---|
| Long Protocol (Treatment) | 1,750 | 1 | 459 | 2 | 462 | 26.4% | +4.2 pts* |
| Short Protocol (Control) | 1,750 | 1 | 385 | 2 | 388 | 22.2% | |

Overall (Combined Populations)

| SSM-P Condition | Sample | CATI† | Mail | Web | Total | RR1 | RR1 Change |
|---|---|---|---|---|---|---|---|
| Long Protocol (Treatment) | 3,750 | 4 | 985 | 9 | 998 | 26.6% | +7.1 pts* |
| Short Protocol (Control) | 11,713 | 21 | 2,226 | 37 | 2,284 | 19.5% | |

*Difference is significant, p < .05.
† Inbound CATI
Total response rates (i.e., combining CATI, mail, and Web completes) for the SSM-P experiment were
computed following AAPOR standards, specifically formula AAPOR RR1, which divides the number of
completed interviews by the total number of attempted interviews.11 The random assignment of records to
experimental groups ensures that the expected distribution of outcome dispositions between groups is
balanced, so that any differences in the number of completed interviews can be attributed to the
experimental treatment (i.e., the second survey mailing).
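Because RR1 is simply completed interviews divided by attempted cases, the rates in Table 9 can be verified directly; a minimal sketch:

```python
def aapor_rr1(completes: int, attempted: int) -> float:
    """AAPOR RR1: completed interviews divided by all attempted cases."""
    return completes / attempted

# Non-working phone records, long vs. short protocol (Table 9)
long_rr = aapor_rr1(536, 2_000)     # 26.8%
short_rr = aapor_rr1(1_896, 9_963)  # 19.0%
print(f"{long_rr:.1%} vs. {short_rr:.1%} (+{100 * (long_rr - short_rr):.1f} pts)")
```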
Results
The results of the 2014 SSM-P experiment replicated the findings of the 2013 experiment in all key
respects. Specifically, the long mail protocol used to follow up with phone non-working records and
phone non-respondents significantly increased total response rates compared to the short protocol in both
populations, as well as the overall (combined) CATI follow-up population.
As shown in Table 9, when looking at the population of non-working phone records, the total response
rate was significantly12 higher in the long protocol (536/2,000 = 26.8 percent) compared to the short
protocol (1,896/9,963 = 19.0 percent), leading to a response rate improvement in this population of 7.8
percentage points. In 2013, a response rate improvement of 6.8 percentage points was observed due to the
use of the long protocol in this population.
When looking at the population of phone non-response records, the total response rate was significantly13
higher in the long protocol (462/1,750 = 26.4 percent) compared to the short protocol (388/1,750 = 22.2
percent), leading to a response rate improvement in this population of 4.2 percentage points. In 2013, a
11 AAPOR RR1 is equivalent to the simplified, lower-bound RR3 computation used to analyze the 2013 version of this experiment, so response rates are directly comparable across replications. Documentation for response rate calculations is available at http://www.aapor.org/AM/Template.cfm?Section=Standard_Definitions2&Template=/CM/ContentDisplay.cfm&ContentID=3156
12 Rao-Scott χ2(1) = 62.44, p < .0001
13 Rao-Scott χ2(1) = 8.74, p < .01
response rate improvement of 7.2 percentage points was observed due to the use of the long protocol in
this population.
Finally, when looking at the overall CATI follow-up population (combining the phone non-working and
non-response records), the total response rate was significantly14 higher in the long protocol (998/3,750 =
26.6 percent) compared to the short protocol (2,284/11,713 = 19.5 percent), leading to a response rate
improvement in the overall population of 7.1 percentage points. In 2013, a response rate improvement of
8.0 percentage points was observed due to the use of the long protocol in the overall population.
These results indicate that, among enrollees who do not respond to the CATI survey and enrollees with
non-working numbers, a second survey mailing as part of a mail follow-up protocol significantly
improves response rates. The replication of these findings across two years of the survey provides strong
evidence that they are systematic. The only deviation from the 2013 results was that, in 2014, the
response rate improvement due to the long protocol was higher among the phone non-working population
than among the non-response population (+7.8 vs. +4.2 percentage points, respectively), whereas the
opposite pattern was observed in 2013 (+6.8 vs. +7.2 percentage points, respectively). This likely reflects
random variation across years and does not change the overall recommendation to use a second survey
mailing as part of the CATI follow-up protocol to significantly increase response rates in both of these
populations.
14 Rao-Scott χ2(1) = 85.97, p < .0001
5. EXPERIMENT 3 – IMPACT OF SECOND SURVEY
MAILING ON RESPONSE RATES AS PART OF MAIL
SURVEY PROTOCOL
The Second Survey Mailing/Mail Protocol (SSM-M) experiment tested the effect on response rates of
mailing one vs. two surveys as part of the mail survey protocol. The survey protocols being compared in
this experiment are shown in Table 10. The key difference is that in the long protocol, a second complete
survey is mailed to non-respondents two weeks after the first mail survey is sent out.
Table 10. SSM-M Experiment Mail Protocols: Long (Treatment) vs. Short (Control)

| Long Protocol (Treatment) | Short Protocol (Control) |
|---|---|
| 1. Pre-survey Notification Letter | 1. Pre-survey Notification Letter |
| 2. 1st Survey Packet Mailing | 2. 1st Survey Packet Mailing |
| 3. Reminder Postcard | 3. Reminder Postcard |
| 4. 2nd Survey Packet Mailing | 4. Telephone Follow-Up |
| 5. Telephone Follow-Up | |
Design
A subsample (n = 5,813) of the 8,000 enrollees who were randomly assigned to receive the mail protocol as part of the ME experiment was entered into the SSM-M experiment (see Figure 2). Specifically, 2,907
enrollees were assigned the long mail protocol (treatment group) and 2,906 enrollees were assigned the
short mail protocol (control group). Power analyses show that these sample sizes are sufficient to detect a
one-sided difference in response rates (assuming a 40 percent response rate to the long protocol, as
observed with the 2013 mail survey) of at least three percentage points with nearly 80 percent power.
As with the SSM-P experiment, it is important to note that enrollees entered into the SSM-M experiment
were allowed to complete the survey in any of the three available modes (CATI, mail, or Web). As shown
in Table 11, the majority of enrollees entered into the SSM-M experiment and who ultimately responded
did so in the mail mode, although a small number in each treatment group also completed interviews in
the other two modes. For the purposes of analysis, all responses are counted toward the total response rate
for each SSM-M condition regardless of the response channel, since the outcome of interest in this
experiment is overall response rate improvement due to changes in survey protocol.
Table 11. Sampled Records and Survey Responses by SSM-M Condition and Response Channel

| SSM-M Condition | Sample | CATI† | Mail | Web | Total | RR1 | RR1 Change |
|---|---|---|---|---|---|---|---|
| Long Protocol (Treatment) | 2,907 | 23 | 825 | 28 | 876 | 30.1% | -0.4 pts |
| Short Protocol (Control) | 2,906 | 362 | 501 | 24 | 887 | 30.5% | |

† Inbound CATI
Total response rates (i.e., combining CATI, mail, and Web completes) for the SSM-M experiment were computed following AAPOR standards, specifically formula AAPOR RR1, which divides the number of completed interviews by the total number of attempted interviews.15 The random assignment of records to experimental groups ensures that the expected distribution of outcome dispositions between groups is balanced, so that any differences in the number of completed interviews can be attributed to the experimental treatment (i.e., the second survey mailing).

15 Documentation for response rate calculations is available at http://www.aapor.org/AM/Template.cfm?Section=Standard_Definitions2&Template=/CM/ContentDisplay.cfm&ContentID=3156
Results
As shown in Table 11, the difference in total response rates between the two SSM-M conditions was not
significant.16 Both the short and long mail protocols produced a total response rate of just over 30 percent.
This finding indicates that sending a second survey mailing following non-response to the first mailing
does not produce an advantage in total response rates given similar subsequent follow-up procedures (in
this case, CATI follow-ups).
Although the total response rate did not differ between conditions, the distribution of response channels
used by respondents was significantly different.17 As shown in Figure 5, 94 percent of respondents in the
long protocol used the mail response channel, compared to only 57 percent of respondents in the short
protocol. Assuming equal rates of response to the first survey mailing (due to random assignment to
conditions), and given the equivalent total response rates in Table 11, this finding indicates that there is a
fixed number of first-mailing non-respondents who can be converted into respondents through subsequent
follow-up effort. The mode used to convert these non-respondents, however, appears not to matter: If the
next follow-up attempt is made in the mail mode (as in the long protocol), first-mailing non-respondents
will choose to respond via mail; on the other hand, if the next follow-up attempt is made via phone (as in
the short protocol), first-mailing non-respondents will choose to respond in that mode.
Figure 5. Distribution of Responses by Channel between SSM-M Experiment Conditions
Based on these findings, and assuming that a second survey mailing has a lower cost than a phone follow-up, an initial recommendation can be made to employ the long mail protocol to reduce survey administration costs without negatively impacting response rates. In fact, if the second survey were the sole follow-up in the mail protocol, and phone follow-ups were eliminated, this experiment suggests that only 2.6 percent (23 / 876) of first-mailing non-respondents would fail to be converted.18

16 Rao-Scott χ2(1) = 0.11, p = .744
17 Rao-Scott χ2(2) = 416.78, p < .0001
Given ICF’s recommendation to increase the use of the mail mode in the Survey of Enrollees going
forward, this experiment warrants replication to ensure that the current findings are systematic before
reducing or eliminating phone follow-ups in the mail protocol.
18 Of the 23 phone-channel respondents in the long protocol, 16 completed via outbound CATI and seven completed via inbound CATI. Of the 362 phone-channel respondents in the short protocol, 339 completed via outbound CATI and 23 completed via inbound CATI.
6. NON-RESPONSE BIAS ANALYSIS
Non-response bias can arise when the propensity to respond to a survey is correlated with survey
outcomes. In such cases, respondents and non-respondents will be systematically different in ways that
bias survey estimates. Non-response bias is typically analyzed using auxiliary variables on the sampling
frame that are available for both respondents and non-respondents. In most cases, the information
available from these auxiliary variables is limited; however, for the SoE, the sampling frame contains
considerable administrative data about the enrollee population. This information makes it possible to
estimate non-response biases with respect to enrollees’ use of various VHA services described below.
This section of the report compares the utilization rate between responding and non-responding enrollees
for each of these VHA services, referred to as HSCs (for details on the utilization indicators, see
Appendix 1).19 These analyses can reveal subgroups of enrollees who are less likely to respond to the
survey, and may therefore benefit from more targeted survey administration efforts. For these analyses,
the data are weighted to account for the differential sampling probabilities in each of the sampling strata
without adjusting for non-response (i.e., using the design weight W1 on the survey data file).
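A minimal sketch of the respondent vs. non-respondent comparison, using toy records and illustrative column names (the actual analysis uses the full frame and the W1 design weight as described above):

```python
import pandas as pd

# Toy records: design weight (W1), respondent flag, and a frame-based
# utilization indicator; column names are illustrative, not the actual file layout.
df = pd.DataFrame({
    "w1":        [1.2, 1.2, 3.5, 3.5, 0.8, 0.8],
    "responded": [1, 0, 1, 0, 1, 0],
    "uses_ltss": [0, 1, 1, 0, 0, 0],
})

for flag, grp in df.groupby("responded"):
    rate = (grp["uses_ltss"] * grp["w1"]).sum() / grp["w1"].sum()
    label = "respondents" if flag else "non-respondents"
    print(f"{label}: {rate:.1%}")  # design-weighted utilization rate
```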
In addition, this section of the report compares utilization rates between enrollees responding via Web and those responding via mail or CATI. Because assignment to the Web mode was not part of the 2014 Mode Effects experiment, potential differences between enrollees choosing to respond via the Web survey are examined here. For these analyses, the data are weighted using the final analysis weight (W3 on the survey data file), which accounts for sampling probabilities in each of the sampling strata as well as non-response and post-stratification adjustments. Because the Web mode is offered to all enrollees, utilization differences between those who responded by Web and those who responded by CATI or mail could be due to mode effects or to population differences between enrollees more likely to respond through the Web.
Past analyses have examined non-response bias for stratification variables: OEF/OIF/OND, VISN,
priority group, and enrollee type (pre/post). We continue to calculate the non-response bias for these
variables and also include gender (a stratification variable in 2014) and Hispanic ethnicity (oversampled
in 2014).
1. Long-Term Service and Supports
A small proportion of the enrollee population receives long-term services and supports (LTSS): 0.52 percent receive institutional long-term care, and 3.61 percent receive non-institutional long-term care.
Respondents vs. Non-Respondents
A significantly lower proportion of respondents (0.38 percent) than non-respondents (0.59 percent) receives institutional long-term care, whereas a significantly higher proportion of respondents (4.37 percent) than non-respondents (3.23 percent) receives non-institutional long-term care. Across subgroups, the pattern is generally consistent, with respondents having a lower institutional LTSS utilization rate and a higher non-institutional LTSS utilization rate. Consistent with the overall pattern, responding enrollees are lower for institutional care, 2.55 percent (p = .127), and higher for non-institutional care (p = .972).
19 Health Service Categories (HSCs) are defined as the category of care a Veteran received (Inpatient: medical, surgical, psychiatric, substance abuse, skilled nursing/extended care facility; Ambulatory care: allergy immunotherapy, allergy testing, anesthesia, cardiovascular, chiropractic, consultations, emergency room visits, hearing/speech exams, immunizations, miscellaneous medical, office/home/urgent care visits, outpatient psychiatric, outpatient substance abuse, pathology, physical exams, physical medicine, radiology, surgery, therapeutic injections, vision exams).
Comparisons to population proportions indicate that the survey respondents under-represent the
population of enrollees receiving institutional long-term care (0.52 percent of the population vs. 0.38
percent of respondents) but over-represent enrollees receiving non-institutional long-term care (3.61
percent of the population vs. 4.37 percent of respondents). After response propensity score weighting and
raking, the overall LTSS utilization rate for respondents is 0.55 percent for institutional and 3.69 percent
for non-institutional, not significantly different from the population values.
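For context, a response-propensity adjustment of this kind models each sampled enrollee's probability of responding from frame covariates and inflates respondents' design weights by the inverse of that probability; raking then aligns the weighted margins to population totals. A minimal sketch with synthetic data and an off-the-shelf logistic model, not the report's actual model specification:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-ins: frame covariates (e.g., age, priority, region dummies),
# an observed response indicator, and design weights.
rng = np.random.default_rng(0)
X = rng.normal(size=(1_000, 3))
responded = rng.random(1_000) < 0.4
w1 = np.ones(1_000)

# Estimate each case's response propensity, then adjust respondents' weights.
prop = LogisticRegression().fit(X, responded).predict_proba(X)[:, 1]
w2 = np.where(responded, w1 / prop, 0.0)
# In practice, w2 would then be raked so weighted margins match population totals.
```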
Web vs. Mail/CATI
Overall, enrollees responding via the Web had significantly lower utilization rates for institutional (0.29
percent) and non-institutional (2.69 percent) long-term care compared to enrollees responding via
mail/CATI (0.59 percent and 3.84 percent, respectively; see Figure 6 and Figure 7). This pattern was also
consistent across strata, indicating that the Web mode is, in general, less likely to be used by enrollees
receiving long-term care compared to the mail and CATI modes. For both of these HSC indicators, the
estimated proportions among mail and CATI respondents were closer to the population values than were
the proportions among Web respondents.
Figure 6. Percentage of Enrollees Receiving Institutional Long-Term Care
Figure 7. Percentage of Enrollees Receiving Non-Institutional Long-Term Care
Table 12. Percentage of Enrollees Receiving Institutional Long-Term Care, by Stratum

| Stratum | Level | Population % | Non-Responding Enrollees % | Responding Enrollees % | Sig. | Weighted % | Sig. | Mail/CATI % | Web % | Sig. |
|---|---|---|---|---|---|---|---|---|---|---|
| Overall | - | 0.52 | 0.59 | 0.38 | <.0001 | 0.55 | 0.62978 | 0.59 | 0.29 | 0.0337 |
| Hispanic | N | 0.73 | 0.89 | 0.46 | <.0001 | 0.73 | 0.98348 | 0.78 | 0.41 | 0.0770 |
| Hispanic | Y | 0.50 | 0.49 | 0.28 | 0.0688 | 0.40 | 0.37509 | 0.45 | . | |
| Hispanic | Unk | 0.10 | 0.07 | 0.13 | 0.0929 | 0.14 | 0.41664 | 0.15 | 0.07 | 0.3041 |
| Gender | F | 0.30 | 0.23 | 0.31 | 0.3249 | 0.35 | 0.59062 | 0.38 | 0.16 | 0.2757 |
| Gender | M | 0.54 | 0.62 | 0.39 | <.0001 | 0.56 | 0.67532 | 0.60 | 0.30 | 0.0435 |
| OEF/OIF/OND | N | 0.59 | 0.69 | 0.41 | <.0001 | 0.63 | 0.57286 | 0.67 | 0.33 | 0.0322 |
| OEF/OIF/OND | Y | 0.05 | 0.04 | 0.02 | 0.5937 | 0.02 | 0.01891 | 0.02 | . | |
| VISN | 1 | 0.62 | 0.70 | 0.37 | 0.1773 | 0.61 | 0.95994 | 0.70 | . | |
| VISN | 2 | 0.51 | 0.71 | 0.29 | 0.0359 | 0.40 | 0.45046 | 0.42 | 0.30 | 0.6691 |
| VISN | 3 | 0.50 | 0.67 | 0.19 | 0.0126 | 0.36 | 0.52405 | 0.41 | 0.11 | 0.2150 |
| VISN | 4 | 0.63 | 0.82 | 0.58 | 0.3061 | 0.82 | 0.40664 | 0.90 | 0.29 | 0.2542 |
| VISN | 5 | 0.71 | 0.37 | 0.59 | 0.2832 | 0.96 | 0.46730 | 1.09 | 0.30 | 0.1977 |
| VISN | 6 | 0.39 | 0.40 | 0.33 | 0.7322 | 0.41 | 0.94944 | 0.46 | . | |
| VISN | 7 | 0.28 | 0.44 | 0.02 | <.0001 | 0.04 | 0.00000 | 0.04 | . | |
| VISN | 8 | 0.38 | 0.40 | 0.17 | 0.1430 | 0.24 | 0.24258 | 0.19 | 0.53 | 0.3563 |
| VISN | 9 | 0.39 | 0.44 | 0.36 | 0.7150 | 0.43 | 0.83178 | 0.48 | . | |
| VISN | 10 | 0.65 | 0.78 | 0.58 | 0.4421 | 0.78 | 0.57384 | 0.89 | . | |
| VISN | 11 | 0.50 | 0.74 | 0.33 | 0.0482 | 0.42 | 0.60178 | 0.48 | . | |
| VISN | 12 | 0.83 | 1.05 | 0.65 | 0.1584 | 0.94 | 0.69536 | 1.04 | 0.42 | 0.3707 |
| VISN | 15 | 0.55 | 0.54 | 0.58 | 0.8894 | 1.01 | 0.18912 | 1.13 | . | |
| VISN | 16 | 0.40 | 0.43 | 0.05 | <.0001 | 0.10 | 0.00000 | 0.11 | . | |
| VISN | 17 | 0.42 | 0.30 | 0.09 | 0.0233 | 0.14 | 0.00000 | 0.13 | 0.19 | 0.7453 |
| VISN | 18 | 0.63 | 0.56 | 0.70 | 0.5602 | 1.26 | 0.14274 | 1.41 | 0.40 | 0.0880 |
| VISN | 19 | 0.60 | 0.77 | 0.25 | 0.0213 | 0.29 | 0.03047 | 0.30 | 0.27 | 0.9349 |
| VISN | 20 | 0.48 | 0.77 | 0.63 | 0.6397 | 0.75 | 0.30552 | 0.67 | 1.19 | 0.4179 |
| VISN | 21 | 0.70 | 0.98 | 0.79 | 0.4825 | 0.97 | 0.27957 | 1.11 | 0.30 | 0.0600 |
| VISN | 22 | 0.55 | 0.54 | 0.29 | 0.2168 | 0.51 | 0.87301 | 0.42 | 0.97 | 0.4615 |
| VISN | 23 | 0.76 | 0.49 | 0.73 | 0.3045 | 1.08 | 0.34890 | 1.25 | . | |
| Priority Group | 1 | 1.37 | 1.87 | 0.74 | <.0001 | 1.12 | 0.15752 | 1.28 | 0.24 | 0.0119 |
| Priority Group | 2 | 0.26 | 0.18 | 0.14 | 0.5415 | 0.14 | 0.02192 | 0.13 | 0.19 | 0.6031 |
| Priority Group | 3 | 0.24 | 0.25 | 0.23 | 0.8119 | 0.25 | 0.83976 | 0.29 | 0.08 | 0.0865 |
| Priority Group | 4 | 3.11 | 3.11 | 2.55 | 0.1272 | 4.36 | 0.01141 | 4.13 | 8.17 | 0.0544 |
| Priority Group | 5 | 0.41 | 0.39 | 0.46 | 0.4124 | 0.63 | 0.06752 | 0.65 | 0.45 | 0.7130 |
| Priority Group | 6 | 0.04 | 0.02 | 0.05 | 0.2827 | 0.07 | 0.60856 | 0.08 | . | |
| Priority Group | 7 | 0.37 | 0.11 | 0.27 | 0.2743 | 0.34 | 0.89611 | 0.32 | 0.45 | 0.7833 |
| Priority Group | 8 | 0.09 | 0.09 | 0.06 | 0.3564 | 0.07 | 0.62248 | 0.05 | 0.18 | 0.2174 |
| Pre/Post-Enrollee | POST | 0.32 | 0.32 | 0.28 | 0.2687 | 0.37 | 0.19881 | 0.39 | 0.25 | 0.2796 |
| Pre/Post-Enrollee | PRE | 1.30 | 1.62 | 0.78 | <.0001 | 1.20 | 0.53532 | 1.29 | 0.47 | 0.0489 |
Table 13. Percentage of Enrollees Receiving Non-Institutional Long-Term Care, by Stratum

| Stratum | Level | Population % | Non-Responding Enrollees % | Responding Enrollees % | Sig. | Weighted % | Sig. | Mail/CATI % | Web % | Sig. |
|---|---|---|---|---|---|---|---|---|---|---|
| Overall | - | 3.61 | 3.23 | 4.37 | <.0001 | 3.69 | 0.45740 | 3.84 | 2.69 | <.0001 |
| Hispanic | N | 4.91 | 4.61 | 5.33 | <.0001 | 4.87 | 0.77290 | 5.05 | 3.66 | 0.0008 |
| Hispanic | Y | 4.59 | 4.24 | 5.04 | 0.0385 | 4.29 | 0.37097 | 4.17 | 5.22 | 0.3526 |
| Hispanic | Unk | 0.77 | 0.63 | 1.18 | <.0001 | 0.80 | 0.70268 | 0.87 | 0.42 | 0.0507 |
| Gender | F | 3.36 | 2.70 | 4.64 | <.0001 | 3.83 | 0.15171 | 3.89 | 3.43 | 0.5812 |
| Gender | M | 3.63 | 3.27 | 4.36 | <.0001 | 3.67 | 0.66899 | 3.84 | 2.63 | <.0001 |
| OEF/OIF/OND | N | 3.95 | 3.60 | 4.58 | <.0001 | 4.00 | 0.65029 | 4.19 | 2.81 | <.0001 |
| OEF/OIF/OND | Y | 1.33 | 1.17 | 2.05 | 0.0008 | 1.57 | 0.29017 | 1.53 | 1.84 | 0.5999 |
| VISN | 1 | 3.31 | 3.33 | 3.47 | 0.8049 | 3.19 | 0.76957 | 3.50 | 1.17 | 0.0662 |
| VISN | 2 | 4.12 | 3.73 | 5.42 | 0.0032 | 4.86 | 0.14568 | 4.68 | 5.73 | 0.4857 |
| VISN | 3 | 4.52 | 3.69 | 5.27 | 0.0034 | 4.01 | 0.21417 | 4.10 | 3.52 | 0.6154 |
| VISN | 4 | 3.43 | 3.16 | 3.62 | 0.3664 | 3.47 | 0.91666 | 3.64 | 2.32 | 0.2501 |
| VISN | 5 | 3.92 | 3.55 | 5.24 | 0.0044 | 3.99 | 0.87934 | 4.55 | 1.08 | 0.0015 |
| VISN | 6 | 3.85 | 3.76 | 4.92 | 0.0693 | 3.92 | 0.87785 | 4.15 | 2.29 | 0.1986 |
| VISN | 7 | 2.77 | 2.59 | 4.64 | 0.0003 | 3.70 | 0.05435 | 3.85 | 2.55 | 0.3274 |
| VISN | 8 | 4.30 | 3.59 | 3.98 | 0.4333 | 3.66 | 0.11181 | 3.88 | 2.34 | 0.1116 |
| VISN | 9 | 3.75 | 3.16 | 4.19 | 0.0794 | 3.34 | 0.32442 | 3.37 | 2.99 | 0.7730 |
| VISN | 10 | 5.32 | 4.60 | 6.77 | 0.0010 | 5.62 | 0.57855 | 5.92 | 3.48 | 0.1156 |
| VISN | 11 | 4.39 | 3.91 | 4.93 | 0.0905 | 4.20 | 0.67359 | 4.15 | 4.54 | 0.7907 |
| VISN | 12 | 3.68 | 3.23 | 3.38 | 0.7761 | 3.20 | 0.24647 | 3.38 | 2.16 | 0.2024 |
| VISN | 15 | 3.30 | 2.56 | 3.29 | 0.1518 | 2.72 | 0.14737 | 2.86 | 1.58 | 0.2747 |
| VISN | 16 | 3.58 | 3.33 | 4.46 | 0.0560 | 3.87 | 0.53730 | 3.94 | 3.32 | 0.6586 |
| VISN | 17 | 2.87 | 2.53 | 3.90 | 0.0118 | 3.19 | 0.46917 | 3.22 | 3.03 | 0.8703 |
| VISN | 18 | 3.31 | 3.22 | 5.03 | 0.0013 | 4.03 | 0.09135 | 4.16 | 3.30 | 0.5351 |
| VISN | 19 | 3.78 | 3.57 | 4.16 | 0.3024 | 3.38 | 0.34136 | 3.54 | 2.44 | 0.3256 |
| VISN | 20 | 2.60 | 2.27 | 3.48 | 0.0197 | 3.00 | 0.40888 | 3.10 | 2.49 | 0.5918 |
| VISN | 21 | 3.38 | 2.92 | 4.09 | 0.0244 | 3.26 | 0.73815 | 3.52 | 1.92 | 0.1040 |
| VISN | 22 | 2.50 | 2.09 | 3.71 | 0.0002 | 2.75 | 0.50451 | 2.75 | 2.74 | 0.9933 |
| VISN | 23 | 4.10 | 3.97 | 5.29 | 0.0354 | 4.90 | 0.12319 | 5.28 | 2.36 | 0.0396 |
| Priority Group | 1 | 6.74 | 6.25 | 7.62 | 0.0013 | 7.06 | 0.36973 | 7.38 | 5.25 | 0.0162 |
| Priority Group | 2 | 2.72 | 2.47 | 3.30 | 0.0040 | 2.52 | 0.33843 | 2.60 | 2.14 | 0.3639 |
| Priority Group | 3 | 2.37 | 2.00 | 2.80 | 0.0024 | 2.13 | 0.19168 | 2.21 | 1.69 | 0.3257 |
| Priority Group | 4 | 16.89 | 16.34 | 17.76 | 0.0911 | 17.15 | 0.73687 | 17.40 | 12.98 | 0.1244 |
| Priority Group | 5 | 3.69 | 3.17 | 5.00 | <.0001 | 4.00 | 0.19653 | 4.01 | 3.92 | 0.9213 |
| Priority Group | 6 | 0.71 | 0.58 | 1.27 | 0.0002 | 0.92 | 0.19912 | 0.87 | 1.14 | 0.4862 |
| Priority Group | 7 | 3.48 | 3.44 | 3.31 | 0.8551 | 3.21 | 0.62434 | 3.29 | 2.64 | 0.6813 |
| Priority Group | 8 | 1.48 | 1.33 | 1.77 | 0.0055 | 1.37 | 0.31760 | 1.47 | 0.77 | 0.0265 |
| Pre/Post-Enrollee | POST | 2.84 | 2.52 | 3.61 | <.0001 | 2.98 | 0.17361 | 3.10 | 2.26 | 0.0038 |
| Pre/Post-Enrollee | PRE | 6.54 | 5.98 | 7.24 | 0.0006 | 6.36 | 0.52644 | 6.56 | 4.77 | 0.0340 |
2. Inpatient Treatment
A small proportion of the enrollee population (1.14 percent) receives inpatient treatment related to mental health or substance abuse (MHSA), and 4.46 percent receives inpatient treatment for other reasons (non-MHSA).
Respondents vs. Non-Respondents
A significantly lower proportion of respondents receives MHSA inpatient treatment (0.79 percent) compared to non-respondents (1.30 percent), whereas a significantly higher proportion of respondents receives non-MHSA inpatient treatment (5.03 percent) compared to non-respondents (4.21 percent). These differences are consistent across strata and indicate that enrollees who respond to the survey (compared to non-respondents) tend to have a lower utilization rate for MHSA inpatient treatment, but a higher utilization rate for non-MHSA inpatient treatment. After adjusting for age, the response differences still exist: those receiving MHSA inpatient treatment are less likely to respond, and those receiving non-MHSA inpatient treatment are more likely to respond.
Comparison to population proportions indicates that the survey respondents under-represent the population of enrollees receiving MHSA inpatient treatment (1.14 percent of the population vs. 0.79 percent of respondents) but over-represent enrollees receiving non-MHSA inpatient treatment (4.46 percent of the population vs. 5.03 percent of respondents). Overall, the response propensity score model and raking adjustments reduce bias such that weighted utilization is not significantly different from the population: 1.17 percent for MHSA inpatient and 4.40 percent for non-MHSA inpatient. However, for females, the model increases the bias for MHSA inpatient treatment. This occurs because the overall results underestimate MHSA inpatient treatment, and the non-response adjustment compensates by increasing the weights of respondents who have utilized MHSA inpatient care. Female respondents, however, have higher MHSA inpatient utilization than non-respondents, the opposite of males.20 Since the non-response adjustment increases the weights of those who utilized MHSA inpatient care overall, the females who utilized these services also receive increased weight, which causes the bias to increase. Note that this is the only indicator where this effect occurs for females; the non-response model reduces bias for all other indicators.
A similar effect occurs for Hispanics on the non-MHSA inpatient indicator. In this case, respondents overall overestimate the population rate of non-MHSA inpatient treatment, so the non-response model decreases the weights of respondents who have utilized non-MHSA inpatient care. Hispanic respondents only slightly overestimate the population rate, but the non-response adjustment overcompensates for this overestimation, so the final weighted result underestimates the population.
Web vs. Mail/CATI
Overall, Web respondents had significantly lower utilization rates for MHSA-related (0.62 percent; see
Figure 8) and non-MHSA-related (2.65 percent; see Figure 9) inpatient treatment compared to mail and
CATI respondents (1.25 percent and 4.68 percent, respectively). This pattern was also consistent across
strata, indicating that the Web mode is, in general, less likely to be used by enrollees receiving inpatient
treatment compared to the mail/CATI modes. For both HSC indicators, the estimated proportions among
mail/CATI respondents were substantially closer to the population values than among Web respondents.
Figure 8. Percentage of Enrollees Receiving Inpatient Treatment for MHSA
Figure 9. Percentage of Enrollees Receiving Inpatient Treatment for neither Mental Health nor Substance Abuse

20 Note that MHSA inpatient treatment is not a significant predictor of response among females when adjusting for age. However, it is a significant predictor of response among males even after adjusting for age.
Table 14. Percentage of Enrollees Receiving Inpatient Treatment for MHSA, by Stratum

| Stratum | Level | Population % | Non-Responding Enrollees % | Responding Enrollees % | Sig. | Weighted % | Sig. | Mail/CATI % | Web % | Sig. |
|---|---|---|---|---|---|---|---|---|---|---|
| Overall | - | 1.14 | 1.30 | 0.79 | <.0001 | 1.17 | 0.73571 | 1.25 | 0.62 | 0.0032 |
| Hispanic | N | 1.58 | 1.92 | 0.94 | <.0001 | 1.49 | 0.42868 | 1.60 | 0.78 | 0.0086 |
| Hispanic | Y | 1.73 | 1.92 | 1.42 | 0.0526 | 2.22 | 0.12989 | 2.28 | 1.79 | 0.6128 |
| Hispanic | Unk | 0.15 | 0.11 | 0.21 | 0.0522 | 0.24 | 0.21527 | 0.26 | 0.17 | 0.5868 |
| Gender | F | 1.35 | 1.22 | 1.42 | 0.3708 | 1.94 | 0.06257 | 2.07 | 1.18 | 0.1971 |
| Gender | M | 1.12 | 1.31 | 0.75 | <.0001 | 1.11 | 0.81246 | 1.19 | 0.58 | 0.0067 |
| OEF/OIF/OND | N | 1.09 | 1.29 | 0.74 | <.0001 | 1.12 | 0.66222 | 1.23 | 0.47 | 0.0005 |
| OEF/OIF/OND | Y | 1.49 | 1.36 | 1.35 | 0.9709 | 1.45 | 0.88895 | 1.41 | 1.76 | 0.6156 |
| VISN | 1 | 1.43 | 1.46 | 0.92 | 0.0947 | 1.46 | 0.93032 | 1.69 | . | |
| VISN | 2 | 1.19 | 1.37 | 0.89 | 0.1400 | 1.55 | 0.40973 | 1.51 | 1.76 | 0.8354 |
| VISN | 3 | 1.11 | 1.31 | 1.06 | 0.4122 | 1.62 | 0.17350 | 1.92 | . | |
| VISN | 4 | 1.17 | 1.21 | 0.56 | 0.0190 | 0.72 | 0.02557 | 0.83 | . | |
| VISN | 5 | 1.20 | 1.45 | 0.84 | 0.0786 | 1.54 | 0.43577 | 1.55 | 1.53 | 0.9890 |
| VISN | 6 | 1.06 | 1.41 | 0.91 | 0.1619 | 1.28 | 0.54349 | 1.44 | 0.12 | 0.0027 |
| VISN | 7 | 1.09 | 1.16 | 1.61 | 0.2457 | 2.01 | 0.04626 | 2.01 | 1.94 | 0.9612 |
| VISN | 8 | 1.21 | 1.53 | 0.84 | 0.0304 | 1.10 | 0.69877 | 1.21 | 0.45 | 0.0989 |
| VISN | 9 | 1.29 | 1.91 | 0.66 | 0.0011 | 0.80 | 0.03953 | 0.88 | . | |
| VISN | 10 | 1.26 | 1.39 | 0.64 | 0.0167 | 1.12 | 0.70781 | 1.02 | 1.87 | 0.5532 |
| VISN | 11 | 1.05 | 1.26 | 0.48 | 0.0092 | 0.74 | 0.20441 | 0.74 | 0.74 | 0.9962 |
| VISN | 12 | 1.37 | 1.62 | 0.64 | 0.0010 | 1.25 | 0.73777 | 1.46 | 0.04 | <.0001 |
| VISN | 15 | 1.28 | 1.47 | 0.96 | 0.1304 | 1.35 | 0.82810 | 1.42 | 0.70 | 0.4831 |
| VISN | 16 | 1.17 | 1.11 | 0.89 | 0.4497 | 1.58 | 0.31490 | 1.59 | 1.45 | 0.8890 |
| VISN | 17 | 1.20 | 1.15 | 1.10 | 0.8823 | 1.48 | 0.46195 | 1.56 | 0.95 | 0.5128 |
| VISN | 18 | 1.05 | 1.25 | 0.56 | 0.0081 | 0.81 | 0.22310 | 0.83 | 0.69 | 0.8469 |
| VISN | 19 | 1.07 | 1.33 | 0.39 | 0.0018 | 0.54 | 0.00814 | 0.63 | . | |
| VISN | 20 | 1.09 | 1.54 | 0.31 | 0.0002 | 0.42 | 0.00004 | 0.51 | . | |
| VISN | 21 | 0.88 | 0.83 | 0.49 | 0.1589 | 0.73 | 0.50333 | 0.67 | 1.02 | 0.5898 |
| VISN | 22 | 0.94 | 0.91 | 0.45 | 0.0270 | 0.79 | 0.57687 | 0.89 | 0.29 | 0.1325 |
| VISN | 23 | 0.99 | 0.96 | 0.86 | 0.7158 | 1.34 | 0.31998 | 1.54 | . | |
| Priority Group | 1 | 2.12 | 2.41 | 1.57 | 0.0007 | 2.48 | 0.18359 | 2.61 | 1.73 | 0.2197 |
| Priority Group | 2 | 0.89 | 1.12 | 0.58 | 0.0032 | 0.80 | 0.59528 | 0.88 | 0.39 | 0.3123 |
| Priority Group | 3 | 0.76 | 0.90 | 0.55 | 0.0272 | 0.73 | 0.87626 | 0.84 | 0.16 | 0.0664 |
| Priority Group | 4 | 6.43 | 7.56 | 3.77 | <.0001 | 6.69 | 0.69162 | 6.86 | 3.82 | 0.2876 |
| Priority Group | 5 | 1.38 | 1.54 | 0.95 | 0.0012 | 1.32 | 0.76744 | 1.37 | 0.79 | 0.2818 |
| Priority Group | 6 | 0.29 | 0.27 | 0.04 | 0.0072 | 0.09 | 0.01105 | 0.09 | 0.08 | 0.8840 |
| Priority Group | 7 | 0.56 | 0.46 | 0.18 | 0.2056 | 0.37 | 0.47998 | 0.42 | . | |
| Priority Group | 8 | 0.16 | 0.19 | 0.10 | 0.1161 | 0.13 | 0.40476 | 0.13 | 0.08 | 0.5909 |
| Pre/Post-Enrollee | POST | 0.92 | 1.06 | 0.67 | <.0001 | 0.96 | 0.61884 | 1.04 | 0.49 | 0.0064 |
| Pre/Post-Enrollee | PRE | 1.97 | 2.23 | 1.23 | <.0001 | 1.94 | 0.89170 | 2.02 | 1.26 | 0.2818 |
Table 15. Percentage of Enrollees Receiving Inpatient Treatment for neither Mental Health nor Substance Abuse, by Stratum

| Stratum | Level | Population % | Non-Responding Enrollees % | Responding Enrollees % | Sig. | Weighted % | Sig. | Mail/CATI % | Web % | Sig. |
|---|---|---|---|---|---|---|---|---|---|---|
| Overall | - | 4.46 | 4.21 | 5.03 | <.0001 | 4.40 | 0.63029 | 4.68 | 2.65 | <.0001 |
| Hispanic | N | 6.15 | 6.15 | 6.11 | 0.8148 | 5.82 | 0.04418 | 6.17 | 3.53 | <.0001 |
| Hispanic | Y | 5.75 | 5.45 | 5.88 | 0.3149 | 5.04 | 0.04826 | 5.20 | 3.87 | 0.2165 |
| Hispanic | Unk | 0.76 | 0.59 | 1.40 | <.0001 | 0.96 | 0.05392 | 1.01 | 0.70 | 0.2757 |
| Gender | F | 4.09 | 3.68 | 4.70 | 0.0022 | 4.20 | 0.73790 | 4.35 | 3.31 | 0.1891 |
| Gender | M | 4.49 | 4.26 | 5.05 | <.0001 | 4.42 | 0.57733 | 4.70 | 2.60 | <.0001 |
| OEF/OIF/OND | N | 4.91 | 4.76 | 5.33 | 0.0004 | 4.87 | 0.73517 | 5.19 | 2.84 | <.0001 |
| OEF/OIF/OND | Y | 1.45 | 1.20 | 1.64 | 0.0728 | 1.32 | 0.51464 | 1.33 | 1.27 | 0.9087 |
| VISN | 1 | 3.65 | 3.51 | 4.31 | 0.1607 | 3.96 | 0.52976 | 4.34 | 1.53 | 0.0417 |
| VISN | 2 | 4.68 | 4.03 | 5.19 | 0.0506 | 4.73 | 0.91871 | 4.90 | 3.90 | 0.5300 |
| VISN | 3 | 3.22 | 3.05 | 3.99 | 0.0562 | 3.74 | 0.27529 | 4.15 | 1.41 | 0.0073 |
| VISN | 4 | 3.60 | 3.92 | 4.36 | 0.4561 | 4.02 | 0.38524 | 4.24 | 2.49 | 0.2993 |
| VISN | 5 | 3.86 | 3.37 | 3.69 | 0.5592 | 3.46 | 0.40397 | 3.98 | 0.74 | 0.0093 |
| VISN | 6 | 4.15 | 3.94 | 4.14 | 0.7382 | 3.52 | 0.15948 | 3.78 | 1.59 | 0.1901 |
| VISN | 7 | 4.06 | 3.52 | 5.47 | 0.0029 | 4.85 | 0.17609 | 5.14 | 2.61 | 0.1319 |
| VISN | 8 | 5.42 | 5.17 | 5.11 | 0.9268 | 4.77 | 0.18025 | 5.07 | 2.93 | 0.1678 |
| VISN | 9 | 5.48 | 4.98 | 6.54 | 0.0314 | 5.49 | 0.97950 | 5.80 | 2.65 | 0.0621 |
| VISN | 10 | 4.99 | 4.88 | 5.75 | 0.1972 | 5.27 | 0.61998 | 5.44 | 4.04 | 0.5657 |
| VISN | 11 | 3.80 | 3.78 | 4.52 | 0.2216 | 3.72 | 0.85363 | 3.91 | 2.40 | 0.2569 |
| VISN | 12 | 5.24 | 5.17 | 5.10 | 0.9202 | 4.42 | 0.08392 | 4.97 | 1.26 | 0.0042 |
| VISN | 15 | 4.93 | 4.93 | 5.59 | 0.3347 | 5.26 | 0.57271 | 5.57 | 2.67 | 0.0630 |
| VISN | 16 | 4.68 | 4.53 | 4.56 | 0.9687 | 3.71 | 0.02948 | 3.99 | 1.53 | 0.0701 |
| VISN | 17 | 3.94 | 3.79 | 4.28 | 0.4308 | 3.54 | 0.38400 | 3.66 | 2.79 | 0.5128 |
| VISN | 18 | 4.95 | 4.72 | 6.48 | 0.0075 | 5.91 | 0.11767 | 6.42 | 2.97 | 0.0116 |
| VISN | 19 | 4.22 | 4.42 | 4.57 | 0.8022 | 3.31 | 0.01835 | 3.50 | 2.19 | 0.1748 |
| VISN | 20 | 4.35 | 4.05 | 5.58 | 0.0273 | 4.53 | 0.72867 | 4.29 | 5.75 | 0.2814 |
| VISN | 21 | 5.00 | 4.72 | 5.92 | 0.0686 | 5.46 | 0.42107 | 5.77 | 3.86 | 0.1736 |
| VISN | 22 | 4.48 | 3.90 | 4.59 | 0.2081 | 4.06 | 0.38570 | 4.24 | 3.10 | 0.4407 |
| VISN | 23 | 4.14 | 3.47 | 5.17 | 0.0048 | 4.92 | 0.13952 | 5.30 | 2.40 | 0.0335 |
| Priority Group | 1 | 7.63 | 7.98 | 7.87 | 0.8120 | 7.40 | 0.53179 | 8.01 | 4.06 | <.0001 |
| Priority Group | 2 | 3.43 | 3.02 | 3.65 | 0.0535 | 2.93 | 0.03377 | 3.03 | 2.38 | 0.3366 |
| Priority Group | 3 | 3.02 | 2.66 | 3.54 | 0.0030 | 2.81 | 0.37413 | 3.15 | 1.02 | <.0001 |
| Priority Group | 4 | 16.38 | 16.30 | 16.27 | 0.9698 | 17.67 | 0.10869 | 17.38 | 22.42 | 0.1547 |
| Priority Group | 5 | 5.82 | 5.23 | 7.33 | <.0001 | 6.11 | 0.35101 | 6.18 | 5.28 | 0.4599 |
| Priority Group | 6 | 1.08 | 0.88 | 1.50 | 0.0051 | 1.01 | 0.66103 | 0.97 | 1.20 | 0.5852 |
| Priority Group | 7 | 3.91 | 4.17 | 4.30 | 0.8739 | 3.84 | 0.90376 | 3.66 | 5.03 | 0.3993 |
| Priority Group | 8 | 1.47 | 1.26 | 1.65 | 0.0122 | 1.33 | 0.20983 | 1.45 | 0.60 | 0.0207 |
| Pre/Post-Enrollee | POST | 3.46 | 3.17 | 4.24 | <.0001 | 3.57 | 0.33966 | 3.77 | 2.35 | <.0001 |
| Pre/Post-Enrollee | PRE | 8.25 | 8.29 | 7.98 | 0.4714 | 7.57 | 0.03862 | 8.01 | 4.08 | 0.0001 |
3. Outpatient Treatment
Compared to inpatient treatment, larger proportions of the enrollee population utilize outpatient services:
16.34 percent of enrollees use outpatient treatment for MHSA and 62.21 percent use outpatient treatment
for other reasons (non-MHSA).
Respondents vs. Non-Respondents
A significantly higher proportion of respondents receives MHSA outpatient treatment (17.17 percent)
compared to non-respondents (16.10 percent), and a significantly higher proportion of respondents also
receives non-MHSA outpatient treatment (76.22 percent) compared to non-respondents (56.26 percent).
These differences are consistent across strata and indicate that enrollees who respond to the survey
(compared to non-respondents) tend to have higher utilization rates for both MHSA and non-MHSA
outpatient treatment. This pattern differs from that observed for inpatient treatment, where survey
respondents tended to have lower utilization rates for MHSA inpatient treatment.
Comparison to population proportions indicates that the survey respondents over-represent the population
of enrollees receiving MHSA outpatient treatment (16.34 percent of the population vs. 17.17 percent of
respondents) and more substantially over-represent enrollees receiving non-MHSA outpatient treatment
(62.21 percent of the population vs. 76.22 percent of respondents). After the response propensity score
adjustment, the estimate of enrollee utilization for outpatient treatment is no longer significantly different
from the population for MHSA (16.66 percent) and non-MHSA (62.09 percent).
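For readers who want to see the mechanics, the following is a minimal sketch of the kind of quintile-based response-propensity adjustment applied here, assuming a logistic propensity model; the column names, predictor list, and simulated values are illustrative, not the survey's actual variables.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical sample file: administrative predictors, a design weight
# `dw`, and a 0/1 response flag. All names and values are illustrative.
rng = np.random.default_rng(0)
n = 5000
frame = pd.DataFrame({
    "age": rng.integers(25, 90, n),
    "has_phone": rng.integers(0, 2, n),
    "rx_user": rng.integers(0, 2, n),
    "dw": rng.uniform(1.0, 50.0, n),
})
eta = -3.0 + 0.03 * frame["age"] + 0.8 * frame["has_phone"]
frame["responded"] = (rng.random(n) < 1.0 / (1.0 + np.exp(-eta))).astype(int)

# 1. Model response propensity from administrative data.
predictors = ["age", "has_phone", "rx_user"]
model = LogisticRegression(max_iter=1000).fit(frame[predictors], frame["responded"])
frame["p_respond"] = model.predict_proba(frame[predictors])[:, 1]

# 2. Cut the sample into propensity quintiles.
frame["quintile"] = pd.qcut(frame["p_respond"], 5, labels=False)

# 3. Within each quintile, inflate respondents' design weights by the
#    inverse of the weighted response rate, so respondents stand in for
#    the quintile's non-respondents.
rate = (frame.groupby("quintile")
             .apply(lambda g: g.loc[g["responded"] == 1, "dw"].sum() / g["dw"].sum()))
frame["nr_weight"] = frame["responded"] * frame["dw"] / frame["quintile"].map(rate)
```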
Web vs. Mail/CATI
Overall, enrollees responding via the Web had significantly lower utilization rates for MHSA-related
(12.62 percent) and non-MHSA-related (57.63 percent) outpatient treatment compared to enrollees
responding via mail and CATI (17.29 percent and 62.78 percent, respectively). This pattern was also
consistent across strata, indicating that the Web mode is, in general, less likely to be used by enrollees
receiving outpatient treatment compared to the mail and CATI modes.
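The report does not state which test underlies the Sig. columns in the tables below. As a rough illustration only, a standard two-proportion z-test on hypothetical counts of this size reproduces the kind of Web vs. mail/CATI comparison being made.

```python
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical (illustrative) counts of MHSA outpatient users by mode,
# chosen to approximate the reported 17.29% vs. 12.62% rates.
mail_cati_users, mail_cati_n = 3_458, 20_000   # 17.29%
web_users, web_n = 505, 4_000                  # ~12.62%

z, p = proportions_ztest(count=[mail_cati_users, web_users],
                         nobs=[mail_cati_n, web_n])
print(f"z = {z:.2f}, p = {p:.4g}")  # small p => the modes differ significantly
```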
Figure 10. Percentage of Enrollees Receiving Outpatient Treatment for MHSA
Figure 11. Percentage of Enrollees Receiving Outpatient Treatment for neither Mental Health nor
Substance Abuse
Table 16. Percentage of Enrollees Receiving Outpatient Treatment for MHSA, by Stratum
| Stratum | Level | Population % | Non-Responding Enrollees % | Responding Enrollees % | Sig. | Weighted % | Sig. | Mail/CATI % | Web % | Sig. |
|---|---|---|---|---|---|---|---|---|---|---|
| Overall | - | 16.34 | 16.10 | 17.17 | <.0001 | 16.67 | 0.15171 | 17.29 | 12.62 | <.0001 |
| Hispanic | N | 21.84 | 22.59 | 20.42 | <.0001 | 21.32 | 0.08572 | 21.95 | 17.08 | <.0001 |
| Hispanic | Y | 27.44 | 26.88 | 31.10 | <.0001 | 30.57 | 0.00024 | 31.18 | 25.91 | 0.0407 |
| Hispanic | Unk | 3.40 | 3.13 | 4.64 | <.0001 | 3.64 | 0.28097 | 3.93 | 1.96 | 0.0003 |
| Gender | F | 23.43 | 22.15 | 28.66 | <.0001 | 25.36 | 0.00542 | 26.36 | 19.32 | 0.0003 |
| Gender | M | 15.79 | 15.58 | 16.49 | 0.0010 | 15.99 | 0.40446 | 16.59 | 12.07 | <.0001 |
| OEF/OIF/OND | N | 15.37 | 15.17 | 16.03 | 0.0014 | 15.34 | 0.88364 | 15.93 | 11.56 | <.0001 |
| OEF/OIF/OND | Y | 22.78 | 21.23 | 30.22 | <.0001 | 25.45 | 0.00250 | 26.20 | 20.23 | 0.0159 |
| VISN | 1 | 16.06 | 15.88 | 15.60 | 0.7991 | 15.98 | 0.93363 | 16.45 | 12.95 | 0.1770 |
| VISN | 2 | 13.48 | 12.22 | 13.38 | 0.2226 | 12.96 | 0.53392 | 13.43 | 10.65 | 0.2214 |
| VISN | 3 | 13.93 | 13.62 | 16.05 | 0.0097 | 14.97 | 0.19051 | 15.69 | 10.94 | 0.0342 |
| VISN | 4 | 14.51 | 14.14 | 14.42 | 0.7850 | 14.14 | 0.65920 | 14.36 | 12.69 | 0.5109 |
| VISN | 5 | 14.73 | 13.82 | 16.07 | 0.0396 | 14.49 | 0.80272 | 15.28 | 10.39 | 0.0565 |
| VISN | 6 | 16.39 | 16.50 | 16.23 | 0.8211 | 15.25 | 0.24828 | 15.97 | 10.11 | 0.0389 |
| VISN | 7 | 18.02 | 18.13 | 21.29 | 0.0152 | 19.19 | 0.25959 | 19.87 | 13.94 | 0.0910 |
| VISN | 8 | 18.05 | 18.32 | 17.96 | 0.7337 | 19.13 | 0.27008 | 19.85 | 14.80 | 0.0516 |
| VISN | 9 | 17.30 | 16.92 | 18.12 | 0.3227 | 17.04 | 0.79344 | 17.12 | 16.25 | 0.7980 |
| VISN | 10 | 18.14 | 18.74 | 19.19 | 0.7005 | 18.69 | 0.59840 | 18.86 | 17.40 | 0.6589 |
| VISN | 11 | 14.99 | 15.11 | 14.93 | 0.8636 | 14.90 | 0.91550 | 15.37 | 11.65 | 0.1731 |
| VISN | 12 | 15.73 | 15.99 | 15.45 | 0.6065 | 16.68 | 0.34860 | 17.77 | 10.48 | 0.0111 |
| VISN | 15 | 16.01 | 14.55 | 18.33 | 0.0008 | 17.72 | 0.09282 | 18.59 | 10.22 | 0.0053 |
| VISN | 16 | 17.86 | 17.58 | 19.87 | 0.0584 | 19.08 | 0.25015 | 19.51 | 15.78 | 0.2503 |
| VISN | 17 | 17.40 | 15.47 | 20.28 | <.0001 | 18.40 | 0.32524 | 18.64 | 16.83 | 0.5578 |
| VISN | 18 | 16.49 | 16.18 | 16.98 | 0.4641 | 15.86 | 0.47872 | 16.85 | 10.12 | 0.0087 |
| VISN | 19 | 15.57 | 15.23 | 14.94 | 0.7836 | 13.99 | 0.07405 | 14.58 | 10.45 | 0.1562 |
| VISN | 20 | 15.49 | 16.09 | 16.08 | 0.9906 | 15.50 | 0.99570 | 16.86 | 8.50 | 0.0003 |
| VISN | 21 | 16.63 | 16.51 | 17.64 | 0.3125 | 16.89 | 0.79329 | 17.44 | 14.08 | 0.2307 |
| VISN | 22 | 17.34 | 16.77 | 19.65 | 0.0059 | 18.85 | 0.13282 | 19.76 | 14.03 | 0.0225 |
| VISN | 23 | 13.17 | 13.24 | 12.41 | 0.4179 | 12.11 | 0.17500 | 12.66 | 8.52 | 0.0819 |
| Priority Group | 1 | 36.53 | 37.08 | 36.33 | 0.3682 | 36.26 | 0.70746 | 37.90 | 27.26 | <.0001 |
| Priority Group | 2 | 17.17 | 16.74 | 17.07 | 0.6167 | 16.22 | 0.09834 | 16.92 | 12.65 | 0.0039 |
| Priority Group | 3 | 11.28 | 11.50 | 12.13 | 0.2581 | 11.52 | 0.62370 | 12.20 | 7.92 | 0.0008 |
| Priority Group | 4 | 29.83 | 30.00 | 28.47 | 0.1381 | 31.77 | 0.05033 | 31.87 | 30.13 | 0.6704 |
| Priority Group | 5 | 16.13 | 15.98 | 17.39 | 0.0149 | 17.96 | 0.00139 | 17.97 | 17.86 | 0.9542 |
| Priority Group | 6 | 8.16 | 8.25 | 8.60 | 0.5659 | 8.32 | 0.76587 | 8.96 | 5.22 | 0.0062 |
| Priority Group | 7 | 9.48 | 9.08 | 8.23 | 0.4443 | 8.55 | 0.30675 | 8.68 | 7.71 | 0.7220 |
| Priority Group | 8 | 4.22 | 4.00 | 4.17 | 0.5036 | 3.96 | 0.26013 | 4.26 | 2.26 | 0.0010 |
| Pre/Post-Enrollee | POST | 14.47 | 14.30 | 15.18 | 0.0017 | 14.91 | 0.07636 | 15.51 | 11.26 | <.0001 |
| Pre/Post-Enrollee | PRE | 23.44 | 23.14 | 24.62 | 0.0242 | 23.31 | 0.80858 | 23.85 | 19.10 | 0.0078 |
Table 17. Percentage of Enrollees Receiving Outpatient Treatment for neither Mental Health nor Substance Abuse, by Stratum

| Stratum | Level | Population % | Non-Responding Enrollees % | Responding Enrollees % | Sig. | Weighted % | Sig. | Mail/CATI % | Web % | Sig. |
|---|---|---|---|---|---|---|---|---|---|---|
| Overall | - | 62.21 | 56.26 | 76.22 | <.0001 | 62.09 | 0.71169 | 62.78 | 57.63 | <.0001 |
| Hispanic | N | 80.20 | 76.08 | 87.66 | <.0001 | 77.43 | 0.00000 | 77.93 | 74.07 | 0.0004 |
| Hispanic | Y | 76.40 | 73.96 | 85.12 | <.0001 | 74.78 | 0.07710 | 75.06 | 72.67 | 0.3902 |
| Hispanic | Unk | 22.88 | 18.64 | 37.67 | <.0001 | 24.07 | 0.01250 | 24.39 | 22.22 | 0.0950 |
| Gender | F | 59.79 | 55.06 | 76.26 | <.0001 | 61.84 | 0.01379 | 62.56 | 57.50 | 0.0248 |
| Gender | M | 62.39 | 56.36 | 76.22 | <.0001 | 62.10 | 0.40058 | 62.80 | 57.64 | <.0001 |
| OEF/OIF/OND | N | 63.87 | 57.85 | 77.14 | <.0001 | 63.84 | 0.94257 | 64.64 | 58.82 | <.0001 |
| OEF/OIF/OND | Y | 51.17 | 47.50 | 65.68 | <.0001 | 50.41 | 0.46674 | 50.59 | 49.23 | 0.6490 |
| VISN | 1 | 63.21 | 58.69 | 76.86 | <.0001 | 63.80 | 0.67257 | 64.87 | 56.90 | 0.0483 |
| VISN | 2 | 58.63 | 51.25 | 73.73 | <.0001 | 59.29 | 0.64495 | 59.88 | 56.39 | 0.3661 |
| VISN | 3 | 49.27 | 44.55 | 63.93 | <.0001 | 50.50 | 0.34562 | 52.36 | 40.11 | 0.0009 |
| VISN | 4 | 61.80 | 55.70 | 76.72 | <.0001 | 62.38 | 0.69415 | 63.42 | 55.47 | 0.0701 |
| VISN | 5 | 51.34 | 47.16 | 66.61 | <.0001 | 50.52 | 0.55389 | 53.52 | 34.87 | <.0001 |
| VISN | 6 | 61.92 | 57.13 | 72.85 | <.0001 | 57.94 | 0.00669 | 59.34 | 47.87 | 0.0091 |
| VISN | 7 | 61.80 | 56.25 | 76.73 | <.0001 | 62.88 | 0.44918 | 64.03 | 54.07 | 0.0257 |
| VISN | 8 | 68.54 | 62.86 | 81.53 | <.0001 | 70.08 | 0.25469 | 69.93 | 71.00 | 0.7680 |
| VISN | 9 | 65.00 | 58.57 | 78.86 | <.0001 | 65.40 | 0.77906 | 65.63 | 63.30 | 0.6227 |
| VISN | 10 | 64.13 | 57.85 | 78.50 | <.0001 | 65.56 | 0.32221 | 65.59 | 65.33 | 0.9539 |
| VISN | 11 | 63.82 | 57.98 | 77.38 | <.0001 | 63.47 | 0.81019 | 63.97 | 60.04 | 0.3464 |
| VISN | 12 | 64.47 | 58.28 | 77.95 | <.0001 | 64.60 | 0.93095 | 65.05 | 61.99 | 0.4568 |
| VISN | 15 | 64.58 | 57.80 | 79.58 | <.0001 | 65.97 | 0.34143 | 66.40 | 62.33 | 0.3688 |
| VISN | 16 | 63.79 | 57.92 | 76.53 | <.0001 | 62.35 | 0.31544 | 63.11 | 56.46 | 0.1208 |
| VISN | 17 | 60.35 | 55.38 | 76.70 | <.0001 | 61.70 | 0.33699 | 62.51 | 56.59 | 0.1491 |
| VISN | 18 | 62.49 | 57.07 | 76.48 | <.0001 | 62.49 | 0.99901 | 63.37 | 57.38 | 0.1396 |
| VISN | 19 | 60.93 | 54.85 | 73.18 | <.0001 | 56.94 | 0.00589 | 57.45 | 53.87 | 0.3769 |
| VISN | 20 | 61.54 | 52.87 | 73.94 | <.0001 | 60.32 | 0.39462 | 60.99 | 56.87 | 0.2673 |
| VISN | 21 | 60.81 | 56.00 | 76.12 | <.0001 | 61.77 | 0.49671 | 61.06 | 65.42 | 0.2522 |
| VISN | 22 | 56.19 | 51.72 | 69.53 | <.0001 | 55.36 | 0.54803 | 55.53 | 54.49 | 0.7822 |
| VISN | 23 | 67.70 | 59.18 | 80.22 | <.0001 | 66.70 | 0.48704 | 66.98 | 64.87 | 0.6000 |
| Priority Group | 1 | 82.70 | 79.79 | 88.50 | <.0001 | 79.76 | 0.00014 | 81.18 | 71.99 | <.0001 |
| Priority Group | 2 | 65.62 | 60.91 | 75.90 | <.0001 | 62.33 | 0.00021 | 63.69 | 55.32 | 0.0003 |
| Priority Group | 3 | 58.76 | 53.04 | 73.02 | <.0001 | 57.38 | 0.10769 | 58.18 | 53.15 | 0.0238 |
| Priority Group | 4 | 78.20 | 74.22 | 90.04 | <.0001 | 83.60 | 0.00000 | 83.86 | 79.25 | 0.2215 |
| Priority Group | 5 | 61.71 | 55.84 | 79.56 | <.0001 | 65.83 | 0.00000 | 65.59 | 68.42 | 0.3129 |
| Priority Group | 6 | 43.12 | 37.13 | 61.67 | <.0001 | 42.48 | 0.53321 | 41.18 | 48.87 | 0.0031 |
| Priority Group | 7 | 75.55 | 70.27 | 80.69 | <.0001 | 72.56 | 0.05778 | 72.15 | 75.31 | 0.4551 |
| Priority Group | 8 | 49.44 | 41.85 | 65.50 | <.0001 | 49.00 | 0.47833 | 49.95 | 43.58 | 0.0002 |
| Pre/Post-Enrollee | POST | 59.80 | 53.74 | 74.29 | <.0001 | 59.58 | 0.55104 | 60.17 | 56.01 | <.0001 |
| Pre/Post-Enrollee | PRE | 71.34 | 66.09 | 83.43 | <.0001 | 71.60 | 0.71026 | 72.39 | 65.34 | 0.0008 |
4. VHA Pharmacy Services
A substantial proportion (54.21 percent) of the enrollee population receives prescription drug services.
Respondents vs. Non-Respondents
A significantly higher proportion of respondents receives prescription drug services (66.79 percent)
compared to non-respondents (48.91 percent). This relatively large difference is consistent across all
strata and indicates that enrollees who respond to the survey (compared to non-respondents) tend to have
higher utilization rates for prescription drug services.
Comparison to the population proportion indicates that survey respondents substantially over-represent
enrollees receiving prescription drug services (54.21 percent of the population vs. 66.79 percent of
respondents). The propensity score model and raking adjustment reduce the weighted utilization estimate
to 54.27 percent.
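As a sketch of the raking step mentioned above (the report does not publish its raking dimensions, so the margins and column names below are purely illustrative), iterative proportional fitting rescales the propensity-adjusted weights until the weighted margins match population totals:

```python
import pandas as pd

def rake(resp, margins, weight_col="w", tol=1e-6, max_iter=100):
    """Iterative proportional fitting: rescale weights until weighted
    margins match the population totals given in `margins`."""
    w = resp[weight_col].astype(float).copy()
    for _ in range(max_iter):
        max_shift = 0.0
        for col, targets in margins.items():
            current = w.groupby(resp[col]).sum()  # weighted margin now
            factors = resp[col].map({k: targets[k] / current[k] for k in targets})
            max_shift = max(max_shift, (factors - 1.0).abs().max())
            w = w * factors
        if max_shift < tol:  # converged: all margins match targets
            break
    return w

# Illustrative respondents and hypothetical population margins.
resp = pd.DataFrame({
    "age_group": ["<65", "<65", "65+", "65+", "65+"],
    "gender":    ["M",   "F",   "M",   "M",   "F"],
    "w":         [10.0,  12.0,  8.0,   9.0,   11.0],
})
margins = {"age_group": {"<65": 30.0, "65+": 20.0},
           "gender":    {"M": 35.0, "F": 15.0}}
resp["raked_w"] = rake(resp, margins)
```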
Web vs. Mail/CATI
Overall, enrollees responding via the Web had significantly lower utilization rates for prescription drug
services (47.07 percent) compared to enrollees responding via mail and CATI (55.40 percent). This
pattern was also consistent across strata, indicating that the Web mode is, in general, less likely to be used
by enrollees receiving prescription drug services than are the mail and CATI modes. This effect persists
after adjusting for age.
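The report does not describe its age adjustment in detail; one standard way to check that the mode gap survives such an adjustment is a logistic regression of pharmacy use on mode with age as a covariate, sketched here on simulated (illustrative) data:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated respondents: all values illustrative, not survey data.
rng = np.random.default_rng(1)
n = 2000
resp = pd.DataFrame({
    "mode": rng.choice(["mail_cati", "web"], size=n, p=[0.85, 0.15]),
    "age": rng.integers(25, 90, size=n),
})
eta = -2.0 + 0.04 * resp["age"] - 0.4 * (resp["mode"] == "web")
resp["rx_use"] = (rng.random(n) < 1.0 / (1.0 + np.exp(-eta))).astype(int)

# If the Web coefficient remains significant with age in the model, the
# mode difference is not explained by age composition alone.
fit = smf.logit("rx_use ~ C(mode, Treatment('mail_cati')) + age", data=resp).fit()
print(fit.summary())
```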
Figure 12. Percentage of Enrollees Receiving Prescription Drug Services
Table 18. Percentage of Enrollees Receiving Prescription Drug Services, by Stratum

| Stratum | Level | Population % | Non-Responding Enrollees % | Responding Enrollees % | Sig. | Weighted % | Sig. | Mail/CATI % | Web % | Sig. |
|---|---|---|---|---|---|---|---|---|---|---|
| Overall | - | 54.21 | 48.91 | 66.79 | <.0001 | 54.27 | 0.84240 | 55.40 | 47.07 | <.0001 |
| Hispanic | N | 71.34 | 67.59 | 78.17 | <.0001 | 69.01 | 0.00000 | 70.07 | 61.93 | <.0001 |
| Hispanic | Y | 67.84 | 65.17 | 77.61 | <.0001 | 66.90 | 0.32329 | 67.66 | 61.17 | 0.0227 |
| Hispanic | Unk | 16.76 | 13.50 | 28.14 | <.0001 | 17.69 | 0.02373 | 18.16 | 14.99 | 0.0049 |
| Gender | F | 52.32 | 47.62 | 67.48 | <.0001 | 54.22 | 0.01974 | 55.12 | 48.83 | 0.0050 |
| Gender | M | 54.36 | 49.02 | 66.74 | <.0001 | 54.28 | 0.80910 | 55.42 | 46.92 | <.0001 |
| OEF/OIF/OND | N | 56.37 | 51.11 | 67.93 | <.0001 | 56.29 | 0.80057 | 57.51 | 48.51 | <.0001 |
| OEF/OIF/OND | Y | 39.87 | 36.84 | 53.69 | <.0001 | 40.90 | 0.29524 | 41.49 | 36.83 | 0.1028 |
| VISN | 1 | 53.12 | 48.11 | 65.45 | <.0001 | 54.04 | 0.49188 | 55.42 | 45.17 | 0.0078 |
| VISN | 2 | 50.28 | 43.53 | 63.26 | <.0001 | 50.61 | 0.81245 | 52.14 | 43.09 | 0.0120 |
| VISN | 3 | 42.21 | 38.20 | 55.02 | <.0001 | 43.36 | 0.35783 | 45.40 | 31.93 | 0.0001 |
| VISN | 4 | 52.62 | 47.59 | 64.60 | <.0001 | 52.20 | 0.76675 | 53.79 | 41.56 | 0.0027 |
| VISN | 5 | 42.98 | 39.82 | 55.86 | <.0001 | 42.10 | 0.50221 | 45.07 | 26.58 | <.0001 |
| VISN | 6 | 55.34 | 50.98 | 64.92 | <.0001 | 52.34 | 0.03369 | 54.48 | 36.96 | <.0001 |
| VISN | 7 | 55.11 | 50.61 | 68.42 | <.0001 | 55.77 | 0.63235 | 57.24 | 44.53 | 0.0041 |
| VISN | 8 | 59.16 | 54.19 | 70.05 | <.0001 | 60.27 | 0.40178 | 60.43 | 59.33 | 0.7587 |
| VISN | 9 | 57.77 | 52.22 | 68.77 | <.0001 | 57.03 | 0.59914 | 57.48 | 52.91 | 0.3264 |
| VISN | 10 | 56.28 | 50.53 | 69.18 | <.0001 | 57.68 | 0.31550 | 57.68 | 57.73 | 0.9906 |
| VISN | 11 | 56.09 | 51.78 | 68.14 | <.0001 | 55.72 | 0.79476 | 56.70 | 48.97 | 0.0546 |
| VISN | 12 | 57.26 | 50.91 | 69.21 | <.0001 | 57.14 | 0.92881 | 58.10 | 51.66 | 0.0996 |
| VISN | 15 | 57.21 | 51.58 | 70.56 | <.0001 | 58.71 | 0.29116 | 60.06 | 47.17 | 0.0025 |
| VISN | 16 | 57.56 | 51.81 | 70.10 | <.0001 | 57.02 | 0.70101 | 57.99 | 49.53 | 0.0450 |
| VISN | 17 | 53.57 | 48.33 | 67.49 | <.0001 | 54.27 | 0.61199 | 55.50 | 46.45 | 0.0240 |
| VISN | 18 | 54.49 | 50.14 | 69.05 | <.0001 | 55.83 | 0.32457 | 57.23 | 47.72 | 0.0145 |
| VISN | 19 | 51.63 | 46.33 | 63.19 | <.0001 | 49.27 | 0.08557 | 50.39 | 42.62 | 0.0401 |
| VISN | 20 | 53.15 | 46.15 | 64.43 | <.0001 | 52.36 | 0.56952 | 53.36 | 47.20 | 0.0895 |
| VISN | 21 | 51.86 | 47.07 | 65.90 | <.0001 | 53.38 | 0.26351 | 53.25 | 54.07 | 0.8230 |
| VISN | 22 | 47.50 | 43.39 | 59.32 | <.0001 | 47.20 | 0.81681 | 47.81 | 43.98 | 0.2820 |
| VISN | 23 | 57.79 | 50.45 | 70.24 | <.0001 | 58.32 | 0.70129 | 59.15 | 52.86 | 0.1072 |
| Priority Group | 1 | 75.90 | 73.32 | 81.64 | <.0001 | 73.76 | 0.00566 | 75.60 | 63.68 | <.0001 |
| Priority Group | 2 | 54.32 | 50.66 | 62.80 | <.0001 | 51.41 | 0.00061 | 52.92 | 43.70 | <.0001 |
| Priority Group | 3 | 46.75 | 42.19 | 59.32 | <.0001 | 46.36 | 0.62366 | 47.70 | 39.25 | <.0001 |
| Priority Group | 4 | 74.17 | 70.63 | 86.10 | <.0001 | 80.38 | 0.00000 | 80.86 | 72.36 | 0.0293 |
| Priority Group | 5 | 56.46 | 50.72 | 73.20 | <.0001 | 59.93 | 0.00001 | 59.86 | 60.74 | 0.7488 |
| Priority Group | 6 | 31.68 | 26.59 | 46.62 | <.0001 | 31.44 | 0.78227 | 31.03 | 33.47 | 0.2726 |
| Priority Group | 7 | 62.17 | 57.01 | 66.28 | <.0001 | 59.67 | 0.12246 | 59.75 | 59.13 | 0.8905 |
| Priority Group | 8 | 42.25 | 35.58 | 56.15 | <.0001 | 41.65 | 0.29685 | 43.00 | 33.91 | <.0001 |
| Pre/Post-Enrollee | POST | 50.83 | 45.55 | 63.73 | <.0001 | 50.82 | 0.98469 | 51.82 | 44.76 | <.0001 |
| Pre/Post-Enrollee | PRE | 67.04 | 62.04 | 78.19 | <.0001 | 67.36 | 0.63848 | 68.55 | 58.00 | <.0001 |
7. DISCUSSION AND RECOMMENDATIONS
This is the seventh report in the Experimental Methods Series. The approach taken in the current report
was to provide a comprehensive account of TSE across all aspects of the Survey of Enrollees (see Figure
1). The TSE framework divides survey error into two major sources: errors of representation, which are
due to the systematic and random errors that influence which members of the population respond to the
survey; and errors of observation, which are due to the systematic and random errors that influence the
accuracy with which survey constructs are measured.
Summary of Findings
Across all areas investigated in the current report, we found evidence of low or no bias, and no evidence
of major bias in any TSE domain. Where bias was detected, the survey weights were shown to effectively
reduce it in population estimates.
- The evaluation of potential bias in the sampling and weighting design revealed that the
disproportionate stratified sampling plan introduces representation bias in the unweighted sample
(as expected), but the design weights eliminate this bias.
- The ME experiment shows that the substantial reduction in potential coverage bias due to the
introduction of a mail mode has been achieved with only minor increases in measurement error
due to mode effects. The replication of conclusions drawn in 2013 greatly increases confidence
that any mode effects due to the Survey of Enrollees’ mixed-mode design are of small
magnitude and do not threaten substantive conclusions drawn from the data.
- The SSM-P experiment found that a “long” mail follow-up protocol, with two survey
mailings, significantly increased response rates among both CATI survey non-respondents and
enrollees with non-working phone numbers, compared to a “short” protocol with only one survey
mailing. The successful replication of the 2013 experiment findings indicates that this response
rate increase is systematic, and the long protocol can be recommended for decreasing the
potential for non-response bias.
- The SSM-M experiment found that, although a “long” mail protocol with two survey mailings did
not increase total response rates compared to a “short” protocol with one survey mailing, the
distributions of response channels used by converted non-respondents during follow-up did differ.
Specifically, converted first-mailing non-respondents tended to respond through whichever
channel was used for the initial non-response follow-up (i.e., mail in the long protocol,
or phone in the short protocol). This finding suggests a potential cost savings from favoring mail
follow-ups over CATI follow-ups in the mail survey protocol.
- The non-response bias analysis showed that, as in past years, although there were some
differences between respondents and non-respondents with respect to health service utilization
indicators, these differences were not of large magnitude and were in nearly all cases eliminated
by the response propensity score and raking weight adjustments.
Thus, the general conclusions of this report are that the Survey of Enrollees is representative of the target
population and that the survey instrument is accurately measuring the outcomes of interest.
Recommendations
Recommendations stemming from prior annual analyses are listed below (parenthetical notes indicate
whether, and when, each recommendation was implemented):
- Use propensity score weighting based on utilization of administrative records (Full adoption);
- Send a pre-survey notification letter to Veterans prior to calling (Full adoption);
- Increase the call attempts from six to seven (Full adoption);
- Use address information to locate and update telephone numbers via database look-ups (Mixed
adoption: full adoption in 2008 and 2010; not implemented in 2011 due to security and privacy
concerns; implemented sparingly in 2012, 2013, and 2014 for seven-digit telephone numbers
and invalid area codes);
- Add a mail survey (Partial adoption as described in the current report); and
- Add a Web survey (Full adoption).
Based on the current analyses, we make the following additional recommendations:
1. Continue to offer the mail and CATI modes. The 2013 and 2014 mode effects experiments
revealed some replicable differences in survey responses between modes, but these differences
are quite small. Given that we cannot know which mode provides more accurate results (i.e.,
which comes closer to the “true score” for a given outcome), and given that the differences
between modes were minor, the guaranteed benefit of reducing the potential for coverage bias by
including all modes outweighs the introduction of small amounts of measurement error due to the
use of a mixed-mode design.
2. Implement the second survey mailing as part of CATI survey follow-up. As part of the CATI
non-response/non-working follow-up protocol, the second survey mailing raised total response
rates by eight percentage points over a single follow-up mailing in 2013, and by seven percentage
points in 2014. Based on this evidence of a replicable effect, we recommend broader adoption of
the “long” mail protocol for following up with CATI non-respondents and non-working numbers.
However, the cost implications need to be considered in more detail when deciding how to scale
this protocol modification. In particular, the cost of a response generated from a second survey
mailing must be compared to the cost of a response from sampling another record.
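A back-of-the-envelope way to frame that comparison is cost per completed survey under each option; all dollar figures below are hypothetical placeholders, and only the roughly seven-point response-rate gain comes from the 2014 experiment.

```python
# Cost per completed survey: second mailing vs. a fresh sample record.
second_mailing_cost = 4.50   # per packet: print + postage (assumed)
second_mailing_gain = 0.07   # added completes per packet, ~7 points (2014 result)
fresh_record_cost = 12.00    # full protocol cost per newly sampled record (assumed)
fresh_record_rr = 0.35       # overall response rate for a new record (assumed)

print(f"second mailing: ${second_mailing_cost / second_mailing_gain:.2f} per complete")
print(f"fresh sample:   ${fresh_record_cost / fresh_record_rr:.2f} per complete")
```

Under these illustrative inputs the fresh sample is cheaper per complete, which is exactly why the actual costs should be measured before scaling the long protocol.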
3. Replicate the SSM-M experiment. The SSM-M experiment conducted in 2014 found that,
although a second survey mailing as part of the mail protocol did not increase total response rates
compared to a single mailing, it did lead to a significant increase in the use of the mail survey by
first-mailing non-respondents. Given ICF’s recommendation to increase the use of the mail mode
going forward, this experiment warrants replication to ensure that the current findings are
systematic before reducing or eliminating CATI follow-ups in the mail protocol.
4. Continue to offer a Web response channel. There is some evidence that the population
choosing to respond via the Web differs from the population responding via mail/CATI.
Specifically, Web respondents had lower utilization of VA health care services; continuing
to offer the Web option will increase coverage of this group.
5. Continue to investigate the potential for coverage bias. Coverage bias arises when differences
between enrollees included in vs. excluded from the sampling frame are associated with survey
outcomes. Although the potential for coverage bias in the Survey of Enrollees has been greatly
reduced by the introduction of the mail mode to cover enrollees without a phone number on
record, the current frame development procedures still leave a small window for coverage bias
due to the use of particular criteria that exclude enrollees from the sampling frame. Of particular
concern is the exclusion from the sampling frame of enrollees in the VHA database who do not
have a valid address on record. Through intensive efforts to match contact information to a
sample of these currently excluded enrollees and to obtain completed interviews with them, a
cost-benefit analysis could then determine the extent to which the exclusion of these enrollees
introduces coverage bias, and whether extending coverage to them warrants the increased cost
of doing so.
APPENDIX A – UTILIZATION MEASURES
Utilization indicators based on administrative records are provided for the following services in the
previous year:
- Institutional and non-institutional long-term care benefits,
- Inpatient and outpatient treatment services, both for MHSA and non-MHSA issues, and
- Prescription drug benefits.
Based on administrative records, these measures indicate whether an enrollee had utilized any of the
following services in the previous year (the file did not indicate the frequency of use or amount paid for
any of these benefits):
1. Received long-term care services,
a. Institutional
b. Non-institutional
2. Received Inpatient treatment,
a. MHSA
b. Non-MHSA
3. Received Outpatient treatment,
a. MHSA
b. Non-MHSA
4. Received VHA pharmacy services.
Since 2007, these utilization indicators have been used in the weighting process, for bias assessment, and
for assessing sample design performance.
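A minimal sketch of deriving such dichotomous indicators from an administrative encounter file follows; the table layout and column names here are hypothetical, not the VHA file's actual structure.

```python
import pandas as pd

# Hypothetical prior-year encounter records: one row per service contact.
encounters = pd.DataFrame({
    "enrollee_id": [1, 1, 2, 3],
    "service": ["outpatient_mhsa", "pharmacy", "inpatient_non_mhsa",
                "ltc_institutional"],
})

# 1 = any use of the service in the previous year; frequency of use and
# amount paid are deliberately not captured, matching the file described above.
indicators = (pd.crosstab(encounters["enrollee_id"], encounters["service"])
                .gt(0).astype(int))
print(indicators)
```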
From 2007–2010, the indicators were derived from VHA workload files organized by bed section and
clinic stop; this categorization indicated where a Veteran received care. For the 2011 survey, the
indicators were based on service utilization from HSCs, a categorization that indicates what care a
Veteran received. A second change made in 2011 separated long-term care into institutional and
non-institutional services; from 2007–2010, the indicator had been a single measure of home health
service.
APPENDIX B – NON-RESPONSE PROPENSITY SCORE QUINTILES
The following tables show the distribution of non-response propensity score model predictors for
combined-sample respondents by the propensity score quintiles used to compute the non-response
adjustment. For categorical variables (all dichotomous), percentages are reported, and for continuous
variables, means are reported.
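The tabulations below can be reproduced mechanically once each respondent carries a propensity-quintile assignment; a sketch follows, with illustrative column names (not the survey's actual variables).

```python
import pandas as pd

def quintile_profile(resp, predictor, quintile_col="quintile"):
    """Tables 19-20 style summary: for a 0/1 predictor, the percentage
    distribution of the predictor's 'yes' group across quintiles; for a
    continuous predictor, the mean within each quintile."""
    col = resp[predictor]
    if col.dropna().isin([0, 1]).all():  # dichotomous predictor
        yes = resp.loc[col == 1, quintile_col]
        return (yes.value_counts(normalize=True).sort_index() * 100).round(1)
    return resp.groupby(quintile_col)[predictor].mean().round(1)

# Illustrative calls, assuming `resp` has `quintile`, `male`, and `age`:
# quintile_profile(resp, "male")   -> row percentages, as in Table 19
# quintile_profile(resp, "age")    -> means by quintile, as in Table 20
```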
Table 19. Distribution of Non-Response Propensity Score Model Categorical Predictors for Combined-Sample Respondents by Propensity Score Quintiles

| Predictor | 1st Quintile | 2nd Quintile | 3rd Quintile | 4th Quintile | 5th Quintile |
|---|---|---|---|---|---|
| VISN1 | 9.8% | 18.4% | 20.4% | 29.2% | 22.2% |
| VISN2 | 11.1% | 20.8% | 20.4% | 30.6% | 17.0% |
| VISN3 | 24.6% | 26.6% | 22.9% | 20.4% | 5.6% |
| VISN4 | 9.5% | 17.8% | 17.8% | 32.5% | 22.3% |
| VISN5 | 24.9% | 24.3% | 22.6% | 19.0% | 9.2% |
| VISN6 | 9.0% | 18.8% | 19.9% | 24.0% | 28.2% |
| VISN7 | 8.4% | 17.1% | 22.2% | 25.8% | 26.4% |
| VISN8 | 11.1% | 22.2% | 27.4% | 25.4% | 13.9% |
| VISN9 | 6.7% | 13.9% | 19.4% | 24.6% | 35.3% |
| VISN10 | 7.5% | 17.0% | 21.1% | 27.6% | 26.8% |
| VISN11 | 7.7% | 15.3% | 20.1% | 24.1% | 32.9% |
| VISN12 | 9.3% | 16.1% | 18.8% | 25.0% | 30.7% |
| VISN15 | 6.4% | 12.6% | 17.1% | 20.3% | 43.7% |
| VISN16 | 9.5% | 16.5% | 23.2% | 25.1% | 25.8% |
| VISN17 | 13.3% | 20.8% | 24.9% | 25.6% | 15.5% |
| VISN18 | 8.8% | 14.6% | 22.1% | 25.8% | 28.8% |
| VISN19 | 9.6% | 17.0% | 20.3% | 22.1% | 30.9% |
| VISN20 | 6.3% | 13.5% | 21.7% | 21.2% | 37.2% |
| VISN21 | 11.4% | 20.2% | 20.6% | 26.9% | 20.8% |
| VISN22 | 24.2% | 24.8% | 25.3% | 20.4% | 5.3% |
| VISN23 | 5.4% | 9.0% | 16.2% | 18.2% | 51.2% |
| Priority Group 1 | 5.9% | 13.0% | 21.0% | 30.3% | 29.7% |
| Priority Group 2 | 9.8% | 18.0% | 20.7% | 24.1% | 27.5% |
| Priority Group 3 | 12.5% | 18.5% | 19.8% | 23.7% | 25.5% |
| Priority Group 4 | 12.9% | 26.0% | 31.9% | 23.6% | 5.6% |
| Priority Group 5 | 14.8% | 18.0% | 29.3% | 25.9% | 12.0% |
| Priority Group 6 | 21.0% | 22.7% | 13.6% | 22.0% | 20.6% |
| Priority Group 7 | 1.6% | 7.3% | 13.8% | 20.0% | 57.3% |
| Priority Group 8 | 10.0% | 18.7% | 16.9% | 21.6% | 33.0% |
| Male (vs. Female) | 9.6% | 16.9% | 20.3% | 25.2% | 28.0% |
| Has phone | 11.1% | 17.4% | 21.3% | 25.1% | 25.1% |
| Patient (Sep13 Enrollment) | 3.5% | 11.1% | 22.0% | 31.1% | 32.3% |
| Hispanic/Latino | 24.1% | 26.6% | 29.4% | 16.5% | 3.4% |
| Pre-Enrollee (vs. Post-Enrollee) | 9.4% | 18.1% | 26.0% | 28.7% | 17.9% |
| OEF/OIF/OND Yes (vs. No) | 42.8% | 30.4% | 20.3% | 5.8% | 0.6% |
| Urban | 14.8% | 21.8% | 23.5% | 24.9% | 14.9% |
| Rural | 5.5% | 12.1% | 17.7% | 24.1% | 40.7% |
| Highly Rural | 3.1% | 6.2% | 16.4% | 15.3% | 59.0% |
| Received long-term care services, Institutional | 44.8% | 35.2% | 16.2% | 3.8% | . |
| Received long-term care services, Non-Institutional | 4.8% | 13.4% | 25.7% | 31.1% | 25.0% |
| Received Inpatient treatment, MHSA | 44.8% | 35.1% | 17.8% | 2.0% | 0.2% |
| Received Inpatient treatment, Non-MHSA | 7.8% | 16.7% | 26.1% | 27.8% | 21.7% |
| Received Outpatient treatment, MHSA | 7.8% | 21.6% | 29.8% | 26.4% | 14.4% |
| Received Outpatient treatment, Non-MHSA | 2.8% | 10.7% | 21.9% | 31.6% | 32.9% |
| Received VHA pharmacy services | 3.0% | 10.6% | 21.8% | 30.6% | 34.0% |
Table 20. Distribution of Non-Response Propensity Score Model Continuous Predictors for Combined-Sample Respondents by Propensity Score Quintiles

| Predictor | 1st Quintile | 2nd Quintile | 3rd Quintile | 4th Quintile | 5th Quintile |
|---|---|---|---|---|---|
| Age | 45.8 | 58.0 | 61.7 | 67.3 | 74.8 |