Supporting Statement B for
EVALUATION OF NIAID’S HIV VACCINE RESEARCH EDUCATION INITIATIVE
HIGHLY IMPACTED POPULATION SURVEY
(NIAID)
August 4, 2010
Project Officer:
Katharine Kripke, Ph.D.
Assistant Director, Vaccine Research Program
Division of AIDS, NIAID, NIH, DHHS
6700 B Rockledge Drive, Room 5144
Bethesda, MD 20892
Telephone: 301-594-2512
Fax: 301-402-3684
E-mail: kripkek@niaid.nih.gov
Supporting Statement Section B
Table of Contents
B.1. Respondent Universe and Sampling Methods
B.2. Procedures for the Collection of Information
B.3. Methods to Maximize Response Rates and Deal with Non-Response
B.4. Test of Procedures or Methods to Be Undertaken
B.5. Individuals Consulted on Statistical Aspects and Individuals Collecting and/or Analyzing Data
Collections of Information Employing Statistical Methods
The proposed survey will be conducted with adults living in the United States, with particular focus on specific populations that are highly impacted by HIV/AIDS—African Americans, Hispanic/Latinos, and MSM. A randomized stratified sample of residential addresses will form the basis of a General Population sample; sample augments from African Americans, Hispanic/Latinos, and MSM will be created to improve the precision of the estimates associated with these groups. Data will primarily be collected by means of a telephone or an online survey.
1.1 Respondent Universe and Survey Objectives
The populations of interest for this study are all English or Spanish-speaking adults 18 or older residing in the 50 United States, as well as specific subpopulations known to be highly impacted by HIV/AIDS:
(1) African Americans (AA)
(2) Hispanic/Latinos (H/L)
(3) Men who have sex with men (MSM).
Table B.1-1 provides information from U.S. census estimates on the expected population distribution for African American and Hispanic/Latinos within the U.S. population.1 Estimates of MSM are more difficult to obtain, but the most oft-cited reference is a 1994 book from Michael et al. They place the number of men who have had sex with men since the age of 18 at about 5 percent of the U.S. adult male population.2
Table B.1-1 2008 Census Estimates of the Adult Population
| | Estimate | Proportion of General U.S. Population |
| U.S. Population | 221,419,638 | 100% |
| African American | 27,836,291 | 12.6% |
| Hispanic/Latino | 30,851,076 | 13.9% |
The primary objective for this survey is to provide estimates regarding knowledge, attitudes, and beliefs for the general population and for each of the three highly impacted populations. For the African American and Hispanic/Latino populations, the survey will provide point estimates with a sampling error of +/- 3 percentage points at the 90 percent confidence level.
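The precision target above can be translated into a required effective sample size. The following is a minimal sketch, assuming a simple random sample, the normal approximation for a proportion, and the most conservative proportion of 0.5; the function name `required_sample_size` is illustrative and does not come from the study protocol.

```python
import math
from statistics import NormalDist


def required_sample_size(margin, confidence, p=0.5):
    """Effective sample size needed to estimate a proportion p
    within +/- margin at the given two-sided confidence level."""
    # Two-sided critical value, e.g. about 1.645 for 90 percent confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


# Target stated for the African American and Hispanic/Latino estimates:
# +/- 3 percentage points at the 90 percent confidence level.
n_needed = required_sample_size(margin=0.03, confidence=0.90)
print(n_needed)
```

Under these assumptions the result is close to the effective sample size of about 750 cited for the African American and Hispanic/Latino estimates.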
The study will involve four samples. The first, the General Population Sample, will be a random stratified sample of U.S. households. In order to provide estimates at the level specified, additional cases will be provided through three augment samples: (1) the African American Augment; (2) the Hispanic/Latino Augment; and (3) the MSM Augment. The samples are drawn from two different survey frames, as described below.
1.2 Survey Frames
The survey relies on an Address Based Sampling (ABS) frame for the general population, African American augment, and Hispanic/Latino augment samples. These samples will be drawn from an ABS frame based on the U.S. Postal Service’s Delivery Sequence File, a list of residential addresses. ABS use has grown considerably in recent years as an alternative to the more traditional random digit dial (RDD) approaches that have been in use for the past 30 years. The growing popularity of ABS frames is related to the increasing problems associated with the exclusion of cell phone listings from RDD samples and the avoidance of such problems when employing ABS frames.
The proportion of Americans who rely solely or mostly on a cell phone has been growing steadily over the past decade, while the proportion of persons who regularly use a household landline phone to accept their incoming calls has been declining. Since RDD samples are developed exclusively from the area codes and telephone exchanges applicable to household-based landline phone numbers, an increasing proportion of households are, therefore, not being reached through this sampling approach.
The National Center for Health Statistics (NCHS) reported that in 2008 “cell phone-only” households missed by RDD landline samples comprised about 18.4 percent of all U.S. households, more than double the level of just 3 years earlier.3 In addition, over the past 2 years, NCHS has also begun tracking another related threat derived from the expanded use of cell phones. This relates to the growing proportion of U.S. households that have both landline telephones and cell phones but that report that they receive all or nearly all of their incoming calls on their cell phones. NCHS estimates that such households, which are referred to as “cell phone-mostly” households, now constitute about 14.4 percent of all U.S. households.
The demographics of the people at risk of being excluded from RDD sample frames show that they have distinctive characteristics, so their exclusion reduces the representation of certain population subgroups, especially younger adults. Thus, the exclusion of cell phone-only and cell phone-mostly households from RDD landline samples introduces the potential for non-coverage bias into surveys developed exclusively from this frame.
This coverage bias is minimized when developing household samples from an ABS frame. With over 99 percent coverage, one of the most compelling attributes of ABS is that it provides samples with virtually zero selection bias. This is because ABS samples are drawn through a random selection of residential addresses from the U.S. Postal Service’s Delivery Sequence File. ABS samples include nearly all types of households, including those with regular city style address listings, those with P.O. Box listings, as well as those with drop-point listings, which include multiple residential units with a single street address.
Samples of households developed from an ABS frame can be used effectively for surveys like this one, which attempts to employ telephone interviews as one mode of data collection, because they capture each of the various types of telephone households, including landline-only telephone households, cell phone-only households, and cell phone-mostly households. Implementing telephone surveys from an ABS sample frame allows for the use of multiple modes of approach to the household for most households. Persons at addresses with matching landline telephone numbers (estimated at about 60 percent of the sample) can be recruited through both telephone calls and mailings, while persons at addresses without matching telephone numbers must be recruited through mailings alone. Additional details about data collection procedures may be found in Section B2.
Knowledge Networks (KN). To augment MSM cases found in the course of collecting data from the ABS sample, we propose to approach adult gay/bisexual males who were identified after enrolling in KN’s KnowledgePanel®. KN provides access to a national online panel of 50,000 U.S. residents, age 18 and older, that is based on a national, probability-based representative sample. The panel size fluctuates slightly because of the addition of new panelists from the ongoing recruitment that compensates for attrition. Individuals voluntarily stay on the panel for an average of 2 years and the panel is replenished by ongoing recruitment based on quarterly samples. Up until about one year ago, the panel’s recruitment sample was drawn using a national RDD frame. That recruitment is now based on an ABS sample frame to resolve coverage issues related to cell phone use as described above. Currently, about one-third of KnowledgePanel members are derived from ABS recruitment. This proportion is growing over time as former RDD-recruited members leave the panel.
By providing laptop computers and Internet access to recruited households who would not otherwise be able to respond online, KN addresses the issue of the so-called “digital divide,” which prevents some 30 percent of U.S. households from taking part in online panels. By recruiting individuals through invitation from a random, representative sample, where each sampled person has a known statistical probability of selection, KN minimizes the effects of “professional respondents” who self-select into the “opt-in” non-probability-based online panels. The concern is that a sample composed entirely of persons who are interested in participating in opt-in online surveys will be biased when compared to a representative, random sample of persons, some of whom are not interested and would need to be heavily recruited and some of whom would not participate at all. In an opt-in sample, the number and characteristics of reluctant responders are unknown, and their responses are impossible to estimate. In the KN panel, the number and characteristics of non-responders are known and can be investigated and estimated. Demographic information was collected from all RDD-contacted households whether or not they joined the panel. Ancillary data attached to the ABS addresses provide information to assess non-response bias for households recruited through the mail.
The un-weighted demographic composition of KnowledgePanel compares very well to that of the general population. Weighted KnowledgePanel membership closely tracks a wide range of benchmarks from the Current Population Survey (CPS), published jointly by the U.S. Census Bureau and the Bureau of Labor Statistics. Attachment E, comparing KnowledgePanel adult members to the December 2008 Current Population Survey (CPS), shows this close approximation with CPS benchmarks.
The use of samples from KnowledgePanel for studies of public and health policy is not new. Founded in 1998, KN conducts a wide range of research spanning the fields of public policy, health policy and services, epidemiology, environmental protection, political science, sociology, and social psychology. Researchers in these and other fields have conducted text-based and multimedia surveys using the Web-enabled KnowledgePanel because it is based on a probability sample of U.S. households designed to be representative of the U.S. population. KN customers include a wide range of academic researchers from institutions such as Stanford University, Duke University, Harvard University, and New York University, as well as government researchers at the FDA, NOAA, EPA, USDA, CDC, and other Federal agencies.
Recruitment rates for KnowledgePanel are in the 14–20 percent range (depending on recruitment mode). Specific survey completion rates for recent studies with a minimal field period and one e-mail follow-up reminder range from 60 to 80 percent. Thus, overall panel response rates (not to be confused with cross-sectional survey, non-panel response rates) have been in the 8–16 percent range. We expect to improve upon past survey response rates through extending the field period, sending a second reminder e-mail, and also implementing two reminder telephone calls. (See Section B.3. Methods to Maximize Response Rates and Deal with Non-Response.)
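The overall panel response rate quoted above is the product of the recruitment rate and the survey completion rate. A minimal sketch of that arithmetic follows; the function name `cumulative_response_rate` is illustrative, and the calculation assumes the two stages are simply multiplied, with no further attrition adjustment.

```python
def cumulative_response_rate(recruitment_rate, completion_rate):
    """Overall panel response rate as the product of the two stages."""
    return recruitment_rate * completion_rate


# Low end: 14 percent recruitment, 60 percent survey completion.
low = cumulative_response_rate(0.14, 0.60)
# High end: 20 percent recruitment, 80 percent survey completion.
high = cumulative_response_rate(0.20, 0.80)

print(f"{low:.1%} to {high:.1%}")
```

The two endpoints reproduce the 8 to 16 percent range stated in the text.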
In addition, non-response analysis will be conducted at the end of the data collection period. Substantial information about survey non-responders is available from KN so no additional data collection will be required from non-responders. This is because KN maintains a broad descriptive “profile” for each KnowledgePanel member. This profile is annually updated and covers a large number of topics from demographic data to health behaviors. Sampled MSM panel members who do not respond to the survey can be compared to MSM respondents using these data.
Alternative methods of sampling MSM cases in the general population using either RDD or ABS methods were considered, but are not recommended given multiple drawbacks, as follows. Given that the incidence of MSM in the general population is only about 5 percent,4 there is no method for selecting a national sample of MSM that is both bias-free and cost-effective. One option, screening the general population for sexual orientation by mail or telephone, is problematic because information provided on such short screening instruments is likely to be biased. Furthermore, screening the entire general population is cost-prohibitive. Of course, the most expensive and inefficient option is collecting data from all randomly sampled U.S. households and then discarding the more than 90 percent of responses that are not from MSM. We do not recommend that strategy for obvious reasons.
An alternative used in the past is to sample MSM from enclaves that have a high concentration of gays and bisexuals. Most large cities have such enclaves where the incidence of the MSM population rises into the 20–30 percent range. However, such enclaves are very small islands in the large ocean of America; hence, only a small portion of MSM live within them. Thus, the men in these areas are likely to be different from MSM residing outside of the enclaves. We think this is particularly true with regard to attitudes about HIV/AIDS, given that the enclaves were devastated by the epidemic in the 1980s and 1990s. Thus, enclave sampling of MSM will not meet the needs of the NHVREI program and will not be used in this survey. In contrast, note that the MSM in the KN panel are geographically diverse, as shown in Attachment F, and seem more likely to provide the data needed by NIAID.
A disproportionate sampling alternative, where all parts of the country are sampled but the enclaves are over-sampled, is also problematic since the cost of screening outside the enclaves is very high and the cases from the enclaves have to be weighted down considerably in order to keep design effects within reasonable bounds.
This section discusses sampling procedures for each of the four samples, as well as data collection procedures.
2.1 Sampling Procedures
As indicated earlier, the full target population for this study consists of adults ages 18 and older living in the 50 United States. The U.S. Postal Service’s Delivery Sequence File will be used as the Address-Based Sampling frame. The key advantage of this frame is that it allows sampling of almost all U.S. households: an estimated 99 percent of U.S. households are covered.
Data will be collected in two waves. In the first wave, we will release approximately two-thirds of the sample we anticipate needing for the entire study. After we have been in the field about 10 weeks, we will evaluate response rates and target sample sizes. The remaining sample, just enough to complete the survey, will be released soon thereafter for each stratum and each sample.
The African American and Hispanic/Latino population estimates will require an effective sample size of 750 cases to provide point estimates with a margin of error of +/- 3 percentage points at the 90 percent confidence level. However, weighting often increases variances of survey estimates. The inflation due to weighting, which is commonly referred to as design effect, can be approximated by:
$$\delta \approx 1 + \frac{\sum_{i=1}^{n} (W_i - \bar{W})^2 / (n - 1)}{\bar{W}^2}$$
where $W_i$ represents the final weight of the $i$th respondent, $\bar{W}$ is the mean of the final weights, and $n$ is the number of respondents. Consequently, the effective sample size will decrease as the variability in applied weights increases. While the resulting design effect for a particular survey estimate will not be known until the final survey weights have been computed, effective stratification and mindful oversampling will help maintain a control on this effect. It is our expectation that, for this study, the overall design effect for most survey estimates will be less than 2, and the target sample sizes listed below are based on this assumption.
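The relationship between weight variability and effective sample size can be illustrated with a short calculation. The sketch below assumes the approximation takes the common form of one plus the squared coefficient of variation of the weights; the weights shown are hypothetical and are not study data, and the function names are illustrative.

```python
from statistics import mean, variance


def design_effect(weights):
    """Approximate design effect due to weighting:
    1 + (sample variance of weights) / (mean weight squared)."""
    w_bar = mean(weights)
    return 1 + variance(weights) / (w_bar ** 2)


def effective_sample_size(weights):
    """Nominal sample size deflated by the weighting design effect."""
    return len(weights) / design_effect(weights)


# Hypothetical weights: equal weights give no inflation at all.
equal_weights = [1.0] * 900
# Hypothetical weights: half the cases weighted down, half weighted up.
unequal_weights = [0.5] * 450 + [1.5] * 450

print(design_effect(equal_weights), effective_sample_size(equal_weights))
print(design_effect(unequal_weights), effective_sample_size(unequal_weights))
```

With equal weights the design effect is 1 and the effective sample size equals the number of completed cases; as the weights spread out, the design effect rises and the effective sample size falls.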
Table B.2-1 Summary of Sampling Strategies for the Highly Impacted Population Survey
| Sample | Sample Frame | Target Number of Completed Surveys |
| General U.S. Population | Stratified random sample from ABS | 1,000 |
| African American Augment | ABS sample from neighborhoods with high AA density (yields cases for both the AA and Hispanic/Latino estimate) | 801* |
| Hispanic/Latino Augment | ABS sample from neighborhoods with high Hispanic density (yields cases for both the AA and Hispanic/Latino estimate) | 801* |
| Gay/Bisexual Men Augment | KnowledgePanel (Knowledge Networks) | 450* |
* Note that all three augment samples contain African American, Hispanic/Latino, and MSM cases.
In the following sections we describe procedures for selecting respondents. Different strategies are employed to create the four different samples: (1) General Population; (2) African American Augment; (3) Hispanic/Latino Augment; and (4) MSM Augment. The strategies are summarized in Table B.2-1.
2.1.1 General U.S. Population.
We propose surveying 1,000 adults age 18 or older living in the United States in English and Spanish as part of the general adult cross-section sample. As a first step, a random stratified sample of U.S. household addresses will be drawn from an ABS sample frame. The sample will be stratified by Census Region. This sample will then be matched against available telephone directory and commercial telephone-matching services to identify households for which a telephone number can be found. We estimate that about 60 percent of all address listings nationwide will likely yield a telephone match, while the remainder will not. The procedures for selection and for collecting data will differ according to whether the telephone number has been matched.
Matched sample. Potential respondents from each household with a telephone number will have an opportunity to participate through a telephone interview or an online survey. Sampling procedures differ slightly between these two modes.
Consistent with established RDD procedures for randomly selecting persons within a household, if a person who answers the telephone reports that more than one adult resides in the household, the person who is the target of the survey will be randomly selected. We will employ a method that minimizes the use of intrusive questions that can negatively affect survey participation. The procedure begins by asking the individual how many adults reside within the household and takes advantage of the fact that approximately three-quarters of all households have only one or two adults in residence. In households where only one adult resides, no respondent selection procedure is required, and interviews are attempted with that adult. In households where two adults were found to reside, the computer-assisted telephone interviewing (CATI) software will be programmed to randomly select either the initial adult contacted or the household’s other adult. In households where three or more adults reside, the CATI algorithm determines whether the adult being screened should be the randomly selected respondent after first giving each adult an equal chance of being selected. If that adult is not selected, then the “most recent birthday” method is applied to identify which of the other household adults should be the selected respondent. Once an adult is selected, repeated attempts are made to reach and complete an interview with that selected individual.
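The within-household selection logic described above can be sketched as follows. This is an illustration only, not the contractor's CATI program: the function name, the return labels, and the use of Python's `random` module are assumptions made for the example.

```python
import random


def select_respondent(num_adults, rng=random):
    """Return which adult in the household is the selected respondent.

    'screened_adult' is the adult who answered the screening questions.
    """
    if num_adults < 1:
        raise ValueError("a household must contain at least one adult")
    if num_adults == 1:
        # Only one adult: no selection procedure is needed.
        return "screened_adult"
    if num_adults == 2:
        # Choose at random between the screened adult and the other adult.
        return rng.choice(["screened_adult", "other_adult"])
    # Three or more adults: the screened adult is kept with probability
    # 1/num_adults, so every adult has an equal chance of selection.
    if rng.random() < 1.0 / num_adults:
        return "screened_adult"
    # Otherwise fall back to the most recent birthday among the others.
    return "most_recent_birthday_among_others"


# Example: simulate the outcome for a three-adult household.
print(select_respondent(3, random.Random(2010)))
```

The "most recent birthday" step avoids asking the screened adult to enumerate every household member, which is the intrusive questioning the procedure is designed to minimize.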
The a priori random selection procedure within households is implemented only for the outgoing telephone interviews, which are expected to comprise the majority of completes. For other data collection procedures, we propose to implement a post hoc procedure to investigate bias related to lack of randomization.
Once a household is selected by random sample, persons within both matched and unmatched samples who complete surveys online will not be randomly selected, because of feasibility concerns and the decrease in response rate that would likely occur. For example, it would be difficult to set up an online system to ensure that someone different from the initial online respondent signs on and completes the survey if the initial respondent were not selected. We intend to collect information from everyone willing to respond online, including information on household composition. Using the same randomization algorithm implemented for the matched telephone survey, each online respondent will be assigned a flag indicating whether they would or would not have been randomly selected. Post hoc analyses will be conducted to determine whether cases that are not in the randomized sample would introduce bias. If bias were shown to exist, study estimates would be limited to respondents flagged as randomly selected. Because we expect that bias will be negligible and that instituting an online randomization strategy could compromise the validity of the data and result in the loss of data from all households where the initial respondent is not selected, we prefer the post hoc analysis strategy.
Unmatched Sample. Households for which a matching telephone number cannot be found will be sent multiple recruitment mailings printed in both English and Spanish. Persons from the unmatched sample will respond to the survey online or will set up an appointment for a telephone interview. Both English and Spanish versions will be available. We do not propose to make use of random household selection among the unmatched sample. We do expect to implement the post hoc analysis scheme for identifying bias discussed in the previous paragraph.
2.1.2 African American Augment Sample
The African American augment sample is one of three sources of cases for African American estimates. In order to obtain a margin of error of three percentage points at the 90 percent confidence level, an effective sample size5 of about 750 African Americans after statistical weighting is needed, assuming no design effect. The three sources of African American respondents are:
African American adults identified and surveyed as part of the overall general U.S. sample,
An augment sample of African Americans identified from a sample of households in areas with dense African American population; and
African American adults identified from the augment sample of households in high-density Hispanic/Latino areas.
Since African American adults now comprise about 12 percent of the U.S. adult population, we expect that our random sample of 1,000 U.S. adults will retrieve interviews with approximately this proportion, yielding slightly more than 100 African Americans nationwide.6
We will draw an augment sample of African Americans from an ABS sample of U.S. households that is limited to census blocks in which 40 percent or more of the population is black or African American. It is estimated that over 55 percent of the African American population live in communities that are more than 40 percent African American. We plan to complete an additional 801 interviews with African Americans nationwide from this augment sample.
African American respondents will also be obtained from a Hispanic/Latino augment sample. Based on population data, we expect that out of 900 interviews completed from the high-density Hispanic/Latino sample, 99 interviews will be completed with African American adults. Persons other than those with a Hispanic/Latino or African American background will be screened out.
This will bring the total number of interviews completed among African Americans in the two augment samples to about 900. However, because the augment samples are not national in scope, some post-survey statistical weighting will be required before combining these cases with the other cases in the general population sample. This statistical adjustment will have the effect of reducing slightly the augment samples’ effective sample size. After such weighting we estimate that the African American respondents from both augment samples, when combined with the other 100 African American interviews from the general U.S. sample, will yield survey results with an effective sample size large enough to meet the desired +/- 3 percentage point margin of error at the 90 percent confidence level with an anticipated design effect less than 2.0.
The methods used to develop the African American augment sample will be similar to those developed for the U.S. general population sample. After sampling from an ABS frame of selected census blocks, address listings will first be matched against available telephone directory listings. The same protocol of mailings for matched and unmatched cases used for the general population will be used for the augment sample. Matched households that receive an initial letter and do not respond to the survey online will be called by telephone. Households without a matching telephone number will be mailed multiple letters printed in English and Spanish instructing recipients that the survey can be completed either online or by telephone.
Because the goal of the augment sample is to select only eligible African American or Hispanic/Latino adults, household spokespersons will be asked several screener questions before initiation of the extended survey. Those calling in by telephone or who attempt to complete the survey online will be asked their racial background and ethnicity, and the survey will continue with those identifying themselves as black, African American, or Hispanic/Latino. Those mailing back a response card to set an appointment for a call will also be asked to indicate their racial and ethnic background on the response card (along with several other demographic questions), with callbacks only attempted with those who indicated that they were black, African American, or Hispanic/Latino.
2.1.3 Hispanic/Latino Augment Sample
The Hispanic/Latino augment sample is one of three sources of cases for Hispanic/Latino estimates. To obtain a margin of error of three percentage points at the 90 percent confidence level, we estimate that an effective sample size of about 750 Hispanic/Latinos is needed, assuming no design effect. We propose to develop the Hispanic/Latino sample from three sources:
Hispanic/Latino adults identified and surveyed as part of the overall general U.S. sample;
Hispanic/Latino adults identified from an augment sample of households in high-density African American areas, and
An augment sample of Hispanic/Latinos identified from a sample of households in areas with dense Hispanic/Latino population.
The expected number of completed interviews that would be captured from each of these sample sources is outlined below.
Since Hispanic/Latinos comprise about 14 percent of the U.S. adult population, we expect that the adult public sample will retrieve approximately this same proportion, thereby yielding a random sample of about 140 Hispanic/Latino adults nationwide.
Additional Hispanic/Latino cases will be obtained from the African American augment sample. According to population data, the proportion of Hispanic/Latino adults living in high-density African American areas (i.e., 40 percent or more African American) is slightly lower than the national average. Again, we will only interview African American, Black, and Hispanic/Latinos from the augment samples. We estimate that out of 900 interviews completed from the African American augment, 99 Latino adults will be reached and interviewed in addition to the 801 African Americans.
In addition, we will draw a second ABS augment sample of U.S. households, this one from census blocks in which at least 40% of the population is Hispanic/Latino. We expect to complete an additional 801 interviews with Hispanic/Latino adults from this sample, so that the total number of interviews conducted with Hispanic/Latino adults from the augment samples will be about 900.
Because cases from the Hispanic/Latino augment sample are not randomly distributed across the U.S., some post-survey statistical weighting will be required before combining cases in a single analytic file. After such weighting, we estimate that the sample of 900 interviews from the augment sample and 140 from the general U.S. sample will produce survey results with an effective sample size large enough to meet the desired +/- 3 percentage point margin of error at the 90 percent confidence level with an anticipated design effect less than 2.0.7
The methods used to develop random samples of Hispanic/Latino adults from the Hispanic/Latino sample augment will be similar to those described for the U.S. adult sample and the African American sample augment. After being sampled from an ABS frame of high density Hispanic/Latino census blocks, address listings will first be matched against available telephone directory listings. The same protocol of mailings for matched and unmatched cases used for the general population will be used for the augment sample. Households will be called by telephone whenever a match is found.
Because the primary goal of the augment sample is to find only eligible Hispanic/Latinos, respondents in matched households will be screened for their racial background and ethnicity, and the survey will only continue with those identifying themselves as Hispanic/Latino or African American. Potential respondents mailing back a response card will be asked to indicate their racial and ethnic background on the card (along with several other demographic questions), with callbacks only attempted with those who indicated that they are Hispanic/Latino or African American.
2.1.4 MSM Adult Augment Sample
Developing a nationwide sample of adult MSM presents significant challenges. The following is a description of how we propose to implement this portion of the survey.
First, we estimate that about one-half of interviews completed with the adult public sample survey of 1,000 will be conducted with men, and that about 5 percent of these males will report (in the survey) having had sex with other men. This would yield a sample of about 25 adult MSM. Using the same logic, we expect approximately 23 additional MSM will come from the African American augment and 23 from the Hispanic/Latino augment sample.
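The expected MSM yields in the paragraph above follow from a simple product of completed interviews, the share of men, and the assumed MSM incidence. The sketch below reproduces that arithmetic; the function name is illustrative, and the figure of 900 completed interviews per ABS augment is taken from the augment sample descriptions earlier in this section.

```python
def expected_msm_cases(completed_interviews, share_male=0.5, msm_rate=0.05):
    """Expected number of MSM among completed interviews, assuming
    half of respondents are men and about 5 percent of men are MSM."""
    return completed_interviews * share_male * msm_rate


# General population sample of 1,000 completed interviews.
general = expected_msm_cases(1000)
# Each ABS augment sample of about 900 completed interviews.
per_augment = expected_msm_cases(900)

print(general, per_augment)
```

The augment figure works out to 22.5, which the text rounds up to approximately 23 cases per augment sample.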
To increase the number of cases in the study, we propose to survey adult gay/bisexual males enrolled in KnowledgePanel, a nationally representative, probability-based panel developed by KN through a dual recruitment strategy of RDD and ABS. MSM will be drawn from those panelists who selected the “gay” or “bisexual” response option to “Do you consider yourself to be …”8 Like most general population surveys/panels, KN does not ask about the gender of sex partners and, hence, cannot pre-identify MSM who do not self-identify as gay or bisexual. This issue is discussed in some detail below.
KnowledgePanel contains 658 self-identified gay/bisexual men; from this total, we expect to obtain about 450 completed interviews for the study. Like the other augment samples, cases from the KnowledgePanel augment will be weighted before combining them with cases in the other samples (i.e., general population, African American augment, and Hispanic/Latino augment). Further information about this weighting process may be found in Section B2.3. Altogether, we expect to obtain about 521 cases, with an effective sample size that is expected to produce estimates with a margin of error no larger than +/- 5.1 percentage points at the 90 percent confidence level.
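The stated margin of error for the MSM estimate can be checked with the usual normal approximation for a proportion. The following is a sketch under stated assumptions: a design effect of exactly 2.0 (the upper bound anticipated in Section 2.1), a proportion of 0.5, and an illustrative function name `margin_of_error`.

```python
import math
from statistics import NormalDist


def margin_of_error(n, deff=1.0, confidence=0.90, p=0.5):
    """Half-width of the confidence interval for a proportion,
    using the effective sample size n / deff."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n_effective = n / deff
    return z * math.sqrt(p * (1 - p) / n_effective)


# About 521 MSM cases in total, with an assumed design effect of 2.0.
msm_moe = margin_of_error(521, deff=2.0)
print(f"{msm_moe:.1%}")
```

Under these assumptions the result is consistent with the figure of about 5.1 percentage points given above; a smaller design effect would give a tighter margin.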
2.1.5 Summary of Sampling Plan
Table B.2-2 displays a summary of the sampling plan for the HIP survey. The first two columns display the name of the sample and the methods used to derive the sample. The next four columns display the number of cases from each sample that will be used to compute the general population, African American, Hispanic/Latino, and MSM estimates. The total number of completed surveys is expected to be 3,250. Adding column totals across the bottom row leads to a higher number, since the same cases contribute to more than one estimate. For example, individuals in the General Population Sample contribute to all three highly impacted population estimates, and the African American and Hispanic/Latino augment samples contribute to the MSM estimate.
Notably, the general population, Hispanic/Latino, and African American samples have been developed in accordance with well-established sampling strategies. Estimates from these sources can be viewed with somewhat more confidence than estimates from the MSM sample based in part on a probability-based, online panel — a newer methodology. As explained above, robust estimates for MSM, a rare group, are more difficult to obtain. The strategy described in this document will, we believe, provide the best available data given reasonable budget considerations and the purpose for which the data are to be used.
| Table B.2-2 Source of Cases for Each of the HIP Estimates |
| Sample | Method | Number of Completed Cases Contributing to Estimate | | | |
| | | General U.S. Pop | African American | Hispanic/Latino | MSM |
| General Population | All households in ABS frame | 1,000 | 100* | 140* | 25* |
| African American Augment | High Density AA Augment from ABS frame | 0 | 801 | 99 | 23* |
| Hispanic/Latino Augment | High Density Hispanic/Latino Augment from ABS frame | 0 | 99 | 801 | 23* |
| Augment MSM | KN panel of Gay/Bisexual Men | 0 | 0 | 0 | 450 |
| Total | | 1,000 | 1,000 | 1,040 | 521 |
* These cases provide information for more than one estimate.
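The arithmetic behind the table can be checked with a short sketch. The cell values are copied from Table B.2-2; the dictionary keys and the count of unique completes per sample are illustrative assumptions (each augment's unique completes are taken to be its African American plus Hispanic/Latino cases, with MSM cases counted as a subset).

```python
# Completed cases contributing to each estimate, copied from Table B.2-2.
table = {
    "General Population":       {"gen_pop": 1000, "aa": 100, "hl": 140, "msm": 25},
    "African American Augment": {"gen_pop": 0,    "aa": 801, "hl": 99,  "msm": 23},
    "Hispanic/Latino Augment":  {"gen_pop": 0,    "aa": 99,  "hl": 801, "msm": 23},
    "Augment MSM":              {"gen_pop": 0,    "aa": 0,   "hl": 0,   "msm": 450},
}

# Unique completed surveys per sample (assumption: starred cells are subsets
# of cases already counted elsewhere in the same row).
unique_completes = {
    "General Population": 1000,
    "African American Augment": 801 + 99,
    "Hispanic/Latino Augment": 99 + 801,
    "Augment MSM": 450,
}

columns = ["gen_pop", "aa", "hl", "msm"]
column_totals = {c: sum(row[c] for row in table.values()) for c in columns}
total_contributions = sum(column_totals.values())
total_unique = sum(unique_completes.values())

print(column_totals)
print("sum of column totals:", total_contributions)
print("unique completed surveys:", total_unique)
```

The sum of the column totals exceeds the number of unique completed surveys because the starred cases are counted under more than one estimate.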
2.2 Initial Sample Size
Sample size and sample yield estimates are shown in Table B.2-3. The estimates are based on various data sources, contractor experience with similar studies, and our plan for substantial follow-up with non-responders.
| Table B.2-3 Sample Yields |
| | # in Sample | # Screened | # Eligible a | # Complete |
| Matched b | | | | |
| General Population | 2,000 | 790 | 750 | 600 |
| African American Augment | 2,250 | 900 | 675 | 540 (481 AA/59 Hispanic/Latino) |
| Hispanic Augment | 2,250 | 900 | 675 | 540 |
| Unmatched c | | | | |
| General Population | 3,509 | 702 | 667 | 400 |
| African American Augment | 4,000 | 800 | 600 | 360 (320 AA/40 Hispanic/Latino) |
| Hispanic Augment | 4,000 | 800 | 600 | 360 |
| MSM Panels d | | | | |
| KnowledgePanel | 643 | 643 | 643 | 450 |
| Total | 18,652 | 5,535 | 4,610 | 3,250 |
a. Eligibility assumptions are based on U.S. Census data and information from Marketing Systems Group, our sample vendor.
b. Calculations for matched households assume (1) a 40 percent response rate using AAPOR Response Rate 3; (2) approximately 20 percent of the initial sample will include out-of-scope telephone numbers; (3) an 80 percent cooperation rate among identified eligibles.
c. Calculations for unmatched households assume (1) 20 percent of households are screened, either by returning their response card or going online; (2) 60 percent of eligible respondents that return a response card will complete the survey; (3) about 40 percent will be unreachable in seven attempts.
d. KnowledgePanel MSM are pre-identified from “profile” data in order to be selected for the MSM sample. Estimated completion rates have been provided by Knowledge Networks based on previous experience with their panel surveys.
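The yield figures in Table B.2-3 can be approximated as a simple three-stage funnel. In the sketch below, the screening, eligibility, and completion rates are back-calculated from the table rows themselves (for example, 900/2,250 = 40 percent screened and 675/900 = 75 percent eligible), so they are inferences for illustration rather than rates stated in the footnotes; the function name is hypothetical.

```python
def funnel(n_sampled, screen_rate, eligible_rate, complete_rate):
    """Apply screening, eligibility, and completion rates in sequence."""
    screened = round(n_sampled * screen_rate)
    eligible = round(screened * eligible_rate)
    complete = round(eligible * complete_rate)
    return screened, eligible, complete


# Rates below are inferred from the table rows, not quoted from the plan.
matched_aa = funnel(2250, screen_rate=0.40, eligible_rate=0.75, complete_rate=0.80)
unmatched_aa = funnel(4000, screen_rate=0.20, eligible_rate=0.75, complete_rate=0.60)
unmatched_gp = funnel(3509, screen_rate=0.20, eligible_rate=0.95, complete_rate=0.60)

for label, row in [("Matched AA augment", matched_aa),
                   ("Unmatched AA augment", unmatched_aa),
                   ("Unmatched general population", unmatched_gp)]:
    print(label, row)
```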
2.3 Weighting
Virtually all survey data are weighted before they can be used to produce reliable estimates of population parameters. While reflecting the selection probabilities of sampled units, weighting also attempts to compensate for practical limitations of survey sampling, such as differential non-response and under-coverage. Furthermore, by taking advantage of auxiliary information about the target population, weighting can render the sample more representative of the target universe. The weighting process for this survey includes the following major steps:
Calculation of Design Weights will be carried out in the first step to reflect the design-imposed disproportional allocation of the sample. Here, base weights will be calculated as reciprocal of the selection probabilities. This will be necessary because the needed sample will be selected from different strata, with varying selection probabilities in each stratum to increase the efficiency (hit rates) for minority subgroups of interest. Specifically, at this step adjustments will be made to compensate for over-sampling of Hispanic/Latino and African American respondents.
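A minimal sketch of the base-weight step, assuming a stratified design with equal-probability selection within each stratum. The stratum names and frame counts are hypothetical and are not taken from the sampling frame described in this document.

```python
def base_weights(frame_counts, sample_counts):
    """Base weight per stratum: reciprocal of the selection probability n_h / N_h."""
    weights = {}
    for stratum, n_frame in frame_counts.items():
        n_sample = sample_counts[stratum]
        # 1 / (n_sample / n_frame), written as a single division
        weights[stratum] = n_frame / n_sample
    return weights


# Hypothetical frame and sample sizes: the high-density stratum is
# oversampled, so its cases receive a smaller base weight.
frame_counts = {"high_density_aa": 200_000, "balance": 1_000_000}
sample_counts = {"high_density_aa": 2_000, "balance": 2_500}

weights = base_weights(frame_counts, sample_counts)
print(weights)

# Weighted sample counts recover the frame totals in each stratum.
recovered = {s: weights[s] * sample_counts[s] for s in frame_counts}
print(recovered)
```

Because the oversampled stratum has a higher selection probability, each of its cases represents fewer addresses, which is how the design weight compensates for the oversampling.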
Adjustments for Non-response and Under-coverage are an essential part of survey weight calculations. For this purpose, the design weights will be adjusted for non-response, guided by the findings of the planned non-response bias analysis procedures. Subsequently, the non-response-adjusted weights will be adjusted one last time so that the aggregated final weights match reported counts for the eligible population with respect to the available demographics. In this step, an iterative proportional fitting (raking) procedure will be used to simultaneously adjust the multiplicity-adjusted design weights to the counts of eligible adults, which will be obtained from the Current Population Survey (CPS) or the American Community Survey (ACS).
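The raking step can be illustrated with a small, self-contained sketch. The records, variable names, and target totals below are hypothetical and are not CPS or ACS figures; the code also assumes every target category appears at least once in the sample, since an empty category would cause a division by zero.

```python
def weighted_totals(records, weights, var):
    """Sum of weights within each category of one variable."""
    totals = {}
    for rec, w in zip(records, weights):
        totals[rec[var]] = totals.get(rec[var], 0.0) + w
    return totals


def rake(records, weights, margins, max_iter=100, tol=1e-10):
    """Iterative proportional fitting: scale weights to match each margin in turn."""
    w = list(weights)
    for _ in range(max_iter):
        largest_adjustment = 0.0
        for var, targets in margins.items():
            current = weighted_totals(records, w, var)
            for i, rec in enumerate(records):
                factor = targets[rec[var]] / current[rec[var]]
                w[i] *= factor
                largest_adjustment = max(largest_adjustment, abs(factor - 1.0))
        if largest_adjustment < tol:  # all margins already matched
            break
    return w


# Hypothetical respondents and control totals.
records = [
    {"sex": "M", "age": "18-44"},
    {"sex": "M", "age": "45+"},
    {"sex": "F", "age": "18-44"},
    {"sex": "F", "age": "45+"},
]
margins = {
    "sex": {"M": 48.0, "F": 52.0},
    "age": {"18-44": 55.0, "45+": 45.0},
}

raked = rake(records, [1.0] * len(records), margins)
print(weighted_totals(records, raked, "sex"))
print(weighted_totals(records, raked, "age"))
```

Each pass rescales the weights so that one margin matches its target; cycling through the margins repeatedly brings all of them into agreement at once.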
Since the needed sample for this study will be selected from two frames, an address-based main frame (CDSF) and a supplementary sub-frame from the Knowledge Networks (KN) nationwide online panel, the final (analysis) weights for the pooled MSM cross-sectional and panel samples will be produced using a special weighting methodology developed by Fahimi (1994)9. The key steps for this methodology are briefly outlined below.
Since the CDSF frame will be used to provide a nationally representative sample, upon computation of the final weights for this cross-sectional sample, the weighted distribution of the MSM respondents will be used to produce unbiased descriptions of the corresponding subset of the US adults.
Non-response-adjusted weights from the MSM panel sample will be weighted to the above empirical distribution of MSM developed from the cross-sectional sample. Before this step, the panel sample will be independently post-stratified to be representative of the MSM population and adjusted with base weights that address elements of KN’s panel recruitment sample design, recruitment selection probability, and recruitment non-response.
Poststratified weights of MSM from the cross-sectional sample and the panel sample will be normalized with respect to their effective sample sizes.
Using the above weights, the pooled sample will be poststratified to the empirical distribution of MSM obtained from the cross-sectional sample one last time to create the final weights.
In effect, any weighted estimate for MSM (such as a sample mean) produced from the pooled sample will be a composite estimate of the following form:

$$\bar{x} = \lambda\,\bar{x}_1 + (1 - \lambda)\,\bar{x}_2$$

In the above, $\bar{x}_1$ and $\bar{x}_2$ represent estimates obtained from the cross-sectional and panel samples, respectively, and $\lambda$ reflects the optimal composition factor estimated by:

$$\hat{\lambda} = \frac{n_1/\delta_1}{n_1/\delta_1 + n_2/\delta_2}$$

In the above, $\delta_1$ and $\delta_2$ represent the associated design effects based on samples of size $n_1$ and $n_2$.
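The composite estimate described above can be sketched as follows, assuming the common form in which each sample's estimate is weighted in proportion to its effective sample size (nominal size divided by design effect). The sample sizes of 71 and 450 follow the MSM counts in Table B.2-2 (25 + 23 + 23 cross-sectional cases and 450 panel cases); the design effects and the two input estimates are hypothetical values chosen only for illustration.

```python
def composition_factor(n1, deff1, n2, deff2):
    """Share of the composite given to sample 1, based on effective sample sizes."""
    eff1 = n1 / deff1
    eff2 = n2 / deff2
    return eff1 / (eff1 + eff2)


def composite_estimate(est1, est2, lam):
    """Weighted combination of the cross-sectional (est1) and panel (est2) estimates."""
    return lam * est1 + (1.0 - lam) * est2


# n1, n2 from Table B.2-2; design effects and estimates are hypothetical.
lam = composition_factor(n1=71, deff1=2.0, n2=450, deff2=1.5)
pooled = composite_estimate(est1=0.40, est2=0.50, lam=lam)
print(f"lambda = {lam:.3f}, composite estimate = {pooled:.3f}")
```

With a much larger effective sample size, the panel sample dominates the composite, so the pooled estimate lies close to the panel estimate.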
Variance Estimation for Weighted Data from Complex Surveys. Survey estimates can only be interpreted properly in light of their associated sampling errors. Since weighting often increases variances of estimates, use of standard variance calculation formulae with weighted data can result in misleading statistical inferences. With weighted data, two general approaches for variance estimation can be distinguished. One is Taylor Series linearization, in which a nonlinear estimator is approximated by a linear one, and then the variance of this linear proxy is estimated using standard variance estimation methods. The second method of variance estimation is replication, in which several estimates of the population parameters under the study are generated from different, yet comparable parts of the original sample. The variability of the resulting estimates is then used to estimate the variance of the parameters of interest using one of several replication techniques, such as Balanced Repeated Replication (BRR) and Jackknife. Given that for this study data analyses will be carried out using the SAS system, the linearization technique will be used for variance estimation via survey procedures of SAS.
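To make the linearization idea concrete, the sketch below estimates the variance of a weighted mean by treating it as a ratio of weighted totals. It is a simplified illustration that assumes a single stratum with each respondent as its own sampling unit drawn with replacement; it does not reproduce the SAS survey procedures, which also account for strata and clusters. The data, weights, and function names are hypothetical. Kish's approximate design effect from unequal weighting is included for comparison.

```python
def weighted_mean(y, w):
    return sum(wi * yi for yi, wi in zip(y, w)) / sum(w)


def linearized_variance_of_mean(y, w):
    """Taylor-series variance of a weighted mean (single stratum, no clustering)."""
    n = len(y)
    total_w = sum(w)
    ybar = weighted_mean(y, w)
    # Linearized values z_i = w_i * (y_i - ybar) / sum(w); they sum to zero.
    z = [wi * (yi - ybar) / total_w for yi, wi in zip(y, w)]
    return n / (n - 1) * sum(zi ** 2 for zi in z)


def kish_design_effect(w):
    """Approximate design effect due to unequal weights: n * sum(w^2) / (sum w)^2."""
    n = len(w)
    return n * sum(wi ** 2 for wi in w) / sum(w) ** 2


# Hypothetical binary responses and final weights.
y = [0, 1, 1, 0, 1, 0]
w = [1.0, 2.5, 1.0, 0.8, 1.7, 1.0]

print("weighted mean:", weighted_mean(y, w))
print("linearized variance:", linearized_variance_of_mean(y, w))
print("Kish design effect:", kish_design_effect(w))
```

With equal weights the linearized formula reduces to the familiar s²/n, and the Kish design effect equals 1; unequal weights push both above their unweighted values.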
2.4 Survey Procedures
The Highly Impacted Population (HIP) survey will be conducted using both telephone and online data collection methods, with an experimental test of a hard copy questionnaire for households with no known telephone number. Data collection procedures for the HIP survey involve the management of three different data collection processes for three different groups: (1) ABS Matched Sample, (2) ABS Unmatched Sample, and (3) KnowledgePanel (KP).
For the ABS Matched Sample, a telephone number is located that is matched to the address. This allows us to make telephone calls directly to the household. For the ABS Unmatched Sample, there is no telephone number available. Data collection for this group will occur online, by telephone if the respondent sends in a card or leaves a telephone message requesting an interview, or by questionnaire if the household is part of an experimental sample. For KnowledgePanel cases, contact with participants is handled through KN panel relations staff, and e-mails or reminder calls are sent to respondents who have self-identified as gay or bisexual. Regardless of group, all of the recruitment materials will convey the information in both English and Spanish, and respondents can choose to complete the survey by telephone or online in English or Spanish as well.
ABS Matched Sample. Households with matched telephone numbers will receive an initial recruitment mailing with a $2 cash incentive, multiple telephone calls, and a postcard reminder. These and other recruitment materials may be found in Attachment D.
The initial recruitment mailing will contain a letter asking the reader to participate in an important survey conducted for NIH. This letter will be printed on NIH letterhead and will carry an NIH insignia on an outer envelope developed in accordance with standards specified by the NIAID communications office. The letter will include a website address, a unique identifier, and a toll-free number to call for additional information. In order to gain attention, the envelope will be oversized and printed in a bright color. Respondents will need to type in the website address and enter the unique identifier to participate in the survey.
Starting seven days after the letter is sent, we will make up to five attempts to reach and screen non-responding households by phone. Additional calls will be made to complete interviews with respondents we have identified as eligible.
A reminder postcard will be sent approximately 2 weeks later to all households that have not yet completed an interview either online or by telephone. A postcard’s advantage is that it does not require the opening of an envelope to get its message across to the potential respondent. The postcards remind respondents that we have been trying to reach them by telephone and again offer the option of completing the survey online.
After a waiting period of a few days to allow some people to complete the survey at the website, we will resume dialing. In total, we will make up to 10 attempts to screen a household and up to an additional 7 attempts to complete an interview with identified eligible respondents.
Telephone numbers that are matched to addresses come from databases of listed phone numbers. Therefore, virtually all telephone numbers for households in the Matched Sample will be landline numbers. Cell-only or cell-mostly households will be captured in the Unmatched Sample. Initial telephone contact attempts will be made during the afternoon and early evening hours on weekdays and throughout the day on weekends to maximize the chances of including both working and non-working adults. Callbacks will be made at different times and on different days to increase the probability of finding qualified adults available for the interview. As contact efforts unfold, appointments for callbacks will be made for the convenience of all potential study respondents regardless of which part of the sample they come from. Follow-up phone calls will be made to alternate phone numbers (e.g., a cell phone) as requested by an individual respondent.
For those telephone numbers where we repeatedly encounter answering machines, we will leave a message on the answering machine after the fourth attempt, explaining the purpose of the call and our desire to include that household or respondent in the survey. A second message will be left after the seventh attempt to the listing. Subsequent messages will be left as appropriate given the call history for the listing. Each message will reference the availability of the toll-free 800 number that potential respondents can call to contact the contractor directly to complete the survey, as well as the website at which the survey can be completed online. The contractor has developed a number of techniques to maximize telephone survey response, including refusal conversion techniques, and these techniques are described in detail in Section B.3.
ABS Unmatched Sample. Households without matched phone numbers will receive the same initial recruitment mailing as the Matched Sample asking respondents to complete the survey online. The letter will include a $2 incentive along with the website and unique identifier.
A reminder postcard will be sent to non-responders about 10 days after the first recruitment mailing is sent. The postcard will again provide the website address and a unique identifier.
A third mailing, consisting of a follow-up letter, will be sent two weeks later to households that have not yet completed the survey online. The letter will look similar to the first letter, except that the language will be a bit more urgent, and it will include a request for potential respondents to provide information by telephone. This letter will include a tear-off appointment card at the bottom and a business reply envelope to encourage response among those who are not inclined to complete the survey online. No incentive will be included. The appointment card will ask for the respondent’s name, telephone number, and whether the number is a cell phone, home phone, or business number. It will also ask for the preferred days and times for us to contact them and will gather data on gender, age, and race/ethnic identification. Once the contractor has received an appointment card, eligible respondents will receive up to seven calls in order to get a completed interview.
At the end of the data collection period, as an experiment, a random sample of 2,500 households who have not yet responded will be drawn from the unmatched sample, and they will receive a mailing with a hard copy questionnaire. The purpose of the experiment is to determine whether response rates for the unmatched subsample of an ABS can be boosted using a hard copy questionnaire.
KnowledgePanel. Sampled panel members will first receive an initial e-mail invitation that is similar to the invitation letter from NIH. The e-mail will include information on how the data are being used, and where to go for further information.
If there is no response, two reminder e-mails will be sent. Finally, 6 weeks after the start of the field period, up to two reminder calls will be made to panel members who have not responded.
Non-response Data Collection. In order to provide information about non-responders, data will be collected from two sources. We will attempt to gather information for use in non-response analysis from anyone in the matched sample who refuses participation on the telephone. In addition, at the end of all data collection efforts, we will send a mailing to a sample of 3,000 non-responders in the ABS samples from whom we were unable to collect non-response data over the phone. The mailing will include a $2 non-contingent incentive, along with a prepaid postcard asking the respondent to provide information about gender, age, race, ethnicity, and whether they have been close to anyone with HIV/AIDS. These data will be used to calculate non-response adjustments.
Because extensive demographic data on the non-responders from the KP are available from Knowledge Networks, no non-response data collection is needed from KP members who fail to respond to these various efforts.
For those ABS sampled persons who start the online survey but fail to complete the Web questionnaire, three e-mail reminder followups will be generated to those who provide their e-mail address when they sign into the site. Those going to the site will be encouraged to provide their e-mail address as a way of helping them should they be “unable to complete the survey.”
Attachment D contains all supporting documentation for the procedures described in this section, including the response card, invitation letter, reminder postcard and letter, initial phone call scripts, cold call refusal conversion scripts, and call with concern scripts.
Table B.2-4 summarizes the data collection process for the three groups and describes the use of the different letters, postcards, and scripts.
| Table B.2-4 Data Collection Process with Respondent Contacts |
| ABS Matched Sample (Addresses AND Phone Numbers) | ABS Unmatched Sample (Address with NO Phone Numbers) | KnowledgePanel (Vendor manages e-mail contact) |
| Send Invitation Letters with Online Recruitment | Send Invitation Letters with Online Recruitment | Send Introductory E-mail |
| Online data collection throughout field period (see Survey Instrument) | Online data collection throughout field period (see Survey Instrument) | Online data collection throughout field period (see Survey Instrument) |
| Start calling sample with known phone number using Screener and Survey Instrument. Voicemail Messages left. | Mail Reminder Postcard to non-respondents. | |
| Mail Reminder Postcard to non-respondents. | Online data collection. | Send Follow-up E-mail |
| Calls made; Voicemail Messages left, online data collection. | Mail Second Invitation Letter with Response Card to non-respondents. Incoming cards will be screened and entered into CATI. | |
| | Calls to households returning cards made according to Screener and Survey Instrument. Voicemail Messages left. | Send Follow-up E-mail |
| Final calls made | Questionnaire mailing to random sample of 2,500 cases. | Two Reminder Calls |
| END DATA COLLECTION | END DATA COLLECTION | END DATA COLLECTION |
| NON-RESPONSE ADJUSTMENT MAILING | NON-RESPONSE ADJUSTMENT MAILING | ANALYSIS OF DEMOGRAPHIC CHARACTERISTICS OF NON-RESPONDERS |
Though the target response rate for surveys is 80 percent, previous contractor experience with telephone studies indicates that a response rate of about 30 percent is more likely. High response rates minimize selection bias in survey findings, so several procedures will be implemented to maximize the response rate.
Multiple modes will be used to collect data for the study, starting with the online survey, moving to telephone interviews where possible, and ending with hard copy questionnaires for an experimental sample of addresses without a matched telephone number. Recent studies suggest that allowing people to respond using different modes does increase the response rate,10 although there is evidence that allowing a choice of modes in the initial contact can nullify these gains. Sequential presentation of modes is recommended.11
Survey response rates are more robust when the research topic is salient to the respondent, when the questionnaire has been designed for maximum ease of administration, when multiple data collection modes are implemented, when the field period is extended, and when the data collection protocol is tailored through a variety of incentives and accommodations to acknowledge respondents’ cooperation and contribution. The presentation of the survey is also important, so that respondents can differentiate it from other mail and research requests.
The invitation letters, cards, and e-mails will indicate that the study is sponsored by NIAID, a prestigious NIH institute known to be at the forefront of HIV/AIDS research. Association with NIAID is expected to improve response rates. The envelope will be colorful and oversized, so that it stands out from all other correspondence. Letters will be sent on NIH letterhead and will be signed by the NIAID Project Officer. Telephone scripts include the NIH name, mention of the incentive, and a place to call.
The initial letter will include a $2 incentive as a gesture of goodwill that will gain the attention of potential respondents. Recent methodological studies with large telephone samples indicate significant improvement in response rates related to small, noncontingent incentives.12,13,14 Payments made after completion of the survey appear to be less effective in improving response rates.15,16
Mailings will include a website address where the respondent can complete the survey online. Online administration of the survey is expected to greatly increase the ease of data collection for persons who are computer literate.
It has been shown that response rates improve with the number of contacts between researchers and potential respondents.11 Members of the matched sample will be provided with multiple opportunities to be interviewed by telephone after initial non-response to the online survey. Potential respondents will be called on multiple occasions, and will be able to call in for an interview or return a card indicating a convenient time for an interviewer to call. In addition, messages left on answering machines will provide a reminder about the online survey.
Households in the unmatched sample will be sent multiple mailings requesting that they respond online or that they call in for an interview. A random sample of unmatched non-responders will be sent hard copy questionnaires to test whether this method increases response rate.
In the matched sample, follow-up calls will be synchronized with reminder mailings to maximize response rates. Calls will be made at different times and on different days to increase the probability of finding qualified adults to complete the survey, and where possible, appointments will be made at dates and times specified by the respondent to maximize respondent convenience and cooperation. Telephone messages that give the website address for the online survey will be left for non-responders.
A random sample of non-responders within the unmatched sample will be sent a questionnaire in order to test whether inclusion of another mode improves response rates among these reluctant responders.
Reminder emails will be sent to those who start the online survey but fail to complete it. Three e-mail reminder followups will be generated to those who provide their e-mail address when they sign into the site. Those going to the site will be encouraged to provide their e-mail address as a way of helping them “should they be unable to complete the survey.”
Refusal conversion will be conducted for all matched cases. Refusal conversion has become a critical component of successful telephone survey efforts, because response rates for virtually all telephone surveys of the general public in the United States have declined significantly over the past decade. The following are procedures that we will employ to increase the rate of response on both the telephone and online surveys.
Most telephone refusals occur at the onset of the interview attempt. Although some are not preventable, our experience is that a certain proportion can be persuaded to cooperate. Training will be provided on how to adjust a refusal script according to the mood of the person making the refusal. For example, those who appear to be initially uninterested will be approached differently from those who are diffident and lacking confidence. This procedure, coupled with continued training on how to make initial contact and the emphasis on the importance of minimizing refusals, has proven effective in holding down the rate of initial refusals.
When refusals are encountered, procedures will be established for interviewers to record their impressions of the respondent’s reason for refusing and any other information that may be relevant in helping to gain a completed interview in a subsequent attempt, including the name of the interviewer who obtained the refusal. Initial refusals other than those adamant about not being called again (“hard refusals”) will be called again. Under this approach, even though the household resulted in an initial refusal, a second “cold call” attempt will be made to complete the interview as if the initial refusal had never occurred. Such calls are typically made at different times of day and on different days of the week than when the initial refusal occurred. It has been our experience that this form of “cold call” refusal conversion is successful in converting about 10 percent of all initial refusers.
For those households that continue to refuse, or for those where a respondent began the survey but broke off (but again excluding “hard refusals” adamant about not being called again), a specially trained team of refusal conversion interviewers will approach these households a third time, using a “call with concern” procedure. In these calls, interviewers review the recorded details of the prior refusals for information that might help them convert the refusal. If all attempts fail, refusal conversion interviewers will attempt to gather information on year of birth, gender, race, ethnicity, and whether someone close to them has been infected with HIV/AIDS. This information will be used for non-response analysis.
A number of techniques will be utilized to maximize survey response among non-English speakers in the telephone survey. One procedure is to make the Spanish language version available to interviewers at the onset of data collection. Our subcontractor’s (Field Research Corporation) CATI system is designed to enable an interviewer to seamlessly switch between the English and Spanish language versions of the questionnaire during the call, enabling all bilingual interviewers to make the initial household approach in either language. Thus, by employing a large number of bilingual interviewers fluent in both English and Spanish, many of these initial contacts can be handled without callbacks and can be converted immediately into completed interviews during the initial call.
Another procedure that has proven to have a positive impact on improving response rates when calling non-English language households involves the management of the sample. The sample management protocols assign in-language callbacks to interviewers fluent in each language, enabling the prompt scheduling of callbacks to households that require a non-English language interviewer. And since all interviews in all languages are conducted in-house from the survey data collection subcontractor’s (Field Research Corporation) own central location interviewing call centers, we are able to maintain maximum control over the management of the non-English samples and their assignment to appropriate interviewers.
Consistent with the response rate calculations approved by the American Association for Public Opinion Research (AAPOR), response rates for this study will be calculated as follows:
Response Rate = Number of Completed Surveys / (Number of Completed Surveys + Number of Non-respondents)
When constructing the survey instrument, items used previously in other surveys by other NIH Institutes and Centers or organizations were carefully evaluated for inclusion. The survey instrument was tested through cognitive interviews with nine respondents similar to those who will be surveyed. In response to their comments, questions were revised, dropped, or combined; response categories were added to several items; and several small wording changes were made.
A pre-test of the online program and procedures was conducted with nine individuals to verify procedures and the fidelity of the instrument.
In order to test whether hard copy questionnaire mailings might significantly improve response rates for the unmatched sample, questionnaires will be mailed to a random sample of 2,500 non-responders in that group near the end of the data collection process.
The contractor analyzing information for the NHVREI will be NOVA Research Company (NOVA). NOVA will subcontract data collection to Field Research. Responsibility for collecting and analyzing information obtained through the methodologies described above will rest with NOVA. All data collection and analysis will be performed in compliance with OMB, Privacy Act, and Protection of Human Subjects requirements.
1 AA, H/L, and Gen Pop data accessed on August 3, 2009, at the U.S. Census http://www.census.gov/popest/national/asrh/.
2 Michael, R., et al., Sex in America: A definitive survey 1994, Boston: Little, Brown.
3 Blumberg, S.J. and J.V. Luke (2009) Wireless Substitution: Early Release of Estimates From the National Health Interview Survey, July-December 2008. http://www.cdc.gov/nchs/data/nhis/earlyrelease/wireless200905.htm. Accessed August 2, 2009
4 General population surveys with questions on sexual orientation often find a larger proportion, especially if they make use of T-ACASI interviewing methods, but the contractor has been involved with all the other stand-alone LGBT surveys that involve screening for sexual orientation, and, in this survey setting, the proportion is a smaller one.
5 The effective sample size is the sample size after weighting.
6 We are assuming that African Americans will respond at a lower rate than other respondents.
7 Our two augment sample frames will be constructed to be mutually exclusive of one another. This way, proper selection probabilities will be available for each record, and all selection biases will be completely eliminated.
8 The full set of response options to that question consists of the following: heterosexual or straight, gay, lesbian, bisexual, or other (please specify).
9 Fahimi, M. (1994). “Post-stratification of Pooled Survey Data.” Proceedings of the American Statistical Association, Survey Research Methods Section, Toronto, Canada.
	
10 Link, MW, Mokdad, A. Are web and mail modes feasible options for the Behavioral Risk Factor Surveillance System? In Eighth Conference on Health Survey Research Methods. Cohen SB, Lepkowski JM, eds. Hyattsville, MD: National Center for Health Statistics. 2004.
11 Dillman, D.A., Internet, Mail, and Mixed Mode Surveys: The Tailored Design Approach. Third ed. 2009, Hoboken, NJ: John Wiley and Sons.
12 California Health Interview Survey. CHIS 2007 Methodology Series: Report 4 – Response Rates. Los Angeles, CA: UCLA Center for Health Policy Research, 2009. Accessed at http://www.chis.ucla.edu/pdf/CHIS2007_method4.pdf on April 1, 2010.
13 Health Information National Trends Survey (HINTS) 2007 FINAL REPORT. Report submitted to National Cancer Institute. Rockville, MD: Westat. Accessed at http://hints.cancer.gov/docs/HINTS2007FinalReport.pdf on April 1, 2010.
14 Brick, JM.; Hagedorn, MC; Montaquila, J; Roth, SB, & Chapman, C. Impact of Monetary Incentives and Mailing Procedures: An Experiment in a Federally Sponsored Telephone Survey (NCES 2006-066). 2006. U.S. Department of Education. Washington, DC: National Center for Education Statistics. Accessed at http://www.eric.ed.gov:80/ERICDocs/data/ericdocs2sql/content_storage_01/0000019b/80/29/df/37.pdf on March 30, 2010.
15 Health Information National Trends Survey (HINTS) 2005 FINAL REPORT. Report submitted to National Cancer Institute. Rockville, MD: Westat. Accessed at http://hints.cancer.gov/docs/HINTS_2005_Final_Report.pdf on March 30, 2010.
16 Carlson, BL, CyBulski, K, & Barson, T. Which Incentives Work Best for Respondents in Today’s RDD Surveys? Presented at the Joint Statistical Meeting, 2008, Denver, Colorado. Accessed at http://www.amstat.org/sections/srms/proceedings/y2008/Files/301131.pdf on March 30, 2010.
	 
		
	
| File Type | application/msword | 
| File Title | Supporting Statement for | 
| Author | CMcLeod | 
| Last Modified By | Nolen Morton | 
| File Modified | 2010-08-06 | 
| File Created | 2010-08-04 |