Sample Weighting

Attachment D Sampling_Weighting.docx

Federal Statistical System Public Opinion Survey

OMB: 0607-0969


Attachment D:

Sample Design and Weighting Procedures

The sample for the Federal Statistical System (FSS) Public Opinion survey, conducted by Gallup for the Census Bureau, is generated by sub-sampling the sample for the Gallup/Healthways Nightly 1,000 (also known as the G1K) survey. The G1K survey is a nationally representative study assessing the well-being of adults aged 18 and older. It completes roughly 1,000 interviews a night, 350 nights per year. In addition to items focused on emotional and physical well-being, the study can also serve as an omnibus vehicle for asking additional questions of interest. For the FSS Public Opinion survey, the Census Bureau added relevant questions to a subsample of roughly 200 interviews per night to better understand Americans’ knowledge of and attitudes toward the FSS, as a means to improve the agency’s overall survey response rates and data quality.

The sample for the G1K survey employs a two-stage procedure to achieve a random, representative sample of adults, 18 years of age or older. The design used by Gallup is a dual-frame design consisting of (i) a sample of listed landline numbers and (ii) a sample of cell phone numbers drawn from the telephone exchanges (dedicated exchanges) that are set aside for cellular providers. The sample for the G1K survey is obtained from Survey Sampling Inc. (SSI).

The sample of listed landline numbers is a simple random sample drawn from the universe of all listed landline telephone numbers at the national (50 states plus DC) level. The cell phone sample is also drawn as a simple random sample from all dedicated exchanges at the national level. The sample allocation for both samples (landline and cell) is therefore expected to be proportional across geographic regions (across the four census regions, for example). Landline numbers account for about 60 percent of the total daily G1K sample, while the rest (about 40 percent) are cell phone numbers. For sample release, random subsamples (replicate samples) are formed and released sequentially based on the progress of interviewing. The goal is to release an optimum amount of sample each time to achieve the maximum possible response rate while completing the targeted number of interviews on a daily basis.
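The replicate release described above can be sketched as follows. This is a minimal illustration, not Gallup's actual procedure: the function names, the number of replicates, and the `count_completes` callback (a stand-in for the interviewing outcome of a released replicate) are all assumptions introduced here.

```python
import random

def form_replicates(phone_numbers, n_replicates, seed=None):
    """Shuffle the sample and deal it into n_replicates random subsamples."""
    rng = random.Random(seed)
    shuffled = list(phone_numbers)
    rng.shuffle(shuffled)
    # Every n-th element of a shuffled list forms a random subset of the sample.
    return [shuffled[i::n_replicates] for i in range(n_replicates)]

def release_sequentially(replicates, count_completes, target):
    """Release replicates one at a time until the target number of completes is met.

    count_completes is a hypothetical callback that returns the number of
    completed interviews obtained from one released replicate.
    """
    completes, released = 0, 0
    for replicate in replicates:
        if completes >= target:
            break
        completes += count_completes(replicate)
        released += 1
    return released, completes

sample = list(range(10_000))  # stand-ins for telephone numbers
replicates = form_replicates(sample, n_replicates=20, seed=1)

# Illustrative assumption only: one complete per ten numbers released.
released, completes = release_sequentially(
    replicates, lambda rep: len(rep) // 10, target=300
)
print(released, completes)  # 6 300
```

Because each replicate is itself a random subsample, stopping after any number of replicates still leaves a random sample of the full draw, which is what allows the release to follow interviewing progress.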

Note that Gallup uses only listed landline numbers, excluding unlisted landlines from the landline sampling frame for the G1K survey. Guterbock et al. (2011), in “Who needs RDD? Combining directory listings with cell phone exchanges for an alternative telephone sampling frame” (Social Science Research 40: 860-872), examine the feasibility of combining an EWP (electronic white pages) sample with cell phone RDD (random digit dialing), eliminating the ordinary RDD component from the sampling frame. This method fails to cover only one segment of the telephone population: unlisted landline households that have no cell phone. The authors analyzed data from the 2006 National Health Interview Survey (NHIS) to estimate the size of this segment and its demographic profile, and used NHIS trend data to assess how the resulting biases were changing. They found that the proposed “listed + cell” sampling frame produced relatively small bias compared to “RDD + cell”. They also found that the portion of the telephone universe excluded by the “listed + cell” design was steadily shrinking, so its bias relative to the “RDD + cell” design was decreasing over time. Overall, the “listed + cell” design turned out to be a useful alternative. Based on these findings and on data from Gallup’s internal studies, the “listed + cell” design was considered optimal for G1K sampling to increase overall sampling efficiency, and it was implemented in late 2011.


The second stage of selection (within-household selection) occurs at the household level for those selected from the landline frame. Once a landline telephone number is selected for inclusion, one person age 18 or older living in that household is randomly selected to participate. Within-household selection is made using the most recent birthday method, which produces a random selection of household members and is considered much less intrusive than the Kish grid method, which requires enumeration of all household members in order to select a respondent. Once a person is selected for inclusion in the study, no substitution of the respondent is allowed within the household. Those selected from the cell phone frame are presumed to be the only user of the cell phone, and hence no within-household selection is attempted for those respondents.

Finally, about 20 percent (1 out of 5) of the G1K sample cases are chosen at random for the Census FSS study. This randomization is carried out prior to the release of the G1K sample, and cases selected for the Census study are marked accordingly. During the data collection phase, the cases sampled for the Census study receive the questions for the FSS Public Opinion study. Following this scheme, roughly 200 surveys are expected to be completed daily for the Census study.
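The 1-in-5 selection can be illustrated with a short sketch. The source does not specify the exact randomization procedure, so this example assumes a simple random sample without replacement of a fixed fraction of cases; the function name, case identifiers, and seed are hypothetical.

```python
import random

def flag_fss_cases(case_ids, fraction=0.2, seed=None):
    """Return the set of case IDs randomly selected for the FSS questions."""
    rng = random.Random(seed)
    case_ids = list(case_ids)
    n_select = round(len(case_ids) * fraction)
    # Simple random sampling without replacement (an assumption of this sketch).
    return set(rng.sample(case_ids, n_select))

# Hypothetical G1K sample file prior to release.
g1k_sample = [f"case{i:05d}" for i in range(5000)]
fss_cases = flag_fss_cases(g1k_sample, fraction=0.2, seed=42)

# Each case carries a marker so the interview can route FSS questions to it.
marked = {case_id: (case_id in fss_cases) for case_id in g1k_sample}
print(len(fss_cases))  # 1000
```

Marking cases before release, rather than at interview time, keeps the selection independent of who actually responds.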

Weighting of Sample Data

Sampling weights are generated for the G1K survey to minimize bias in sample-based estimates. The weights are necessary to correct for unequal selection probabilities and to adjust for the effects of nonresponse and under-coverage. A sampling weight is attached to each completed survey record, and the final weight assigned to any case is the product of the weights generated at the successive stages of the weighting process.


Gallup computes selection weights for the Gallup Daily tracking survey (G1K) to compensate for disproportionalities in probabilities of selection based on household size (because Gallup interviews only one adult per landline household) and telephone status (landline only, cell phone only, and dual users that are either cell mostly or not cell mostly, by listed/unlisted landline status).


  1. The weighting factor assigned to correct for within household sampling is equal to the number of adult members living in the sampled household. In order to avoid extreme weights, this number is truncated at 3.


  2. At the next step, Gallup uses the latest available estimates from the National Health Interview Survey (NHIS), conducted by the National Center for Health Statistics, to adjust the weights to target proportions by telephone status (landline only, cell only, and four categories of dual users: cell phone mostly and not cell mostly, each by listed/unlisted landline status). Gallup uses internal data from list-assisted landline surveys to determine the target proportions for the dual-user categories by listed/unlisted landline status.



  3. Next, Gallup computes post-stratification weights based on targets from the 2011 Current Population Survey (CPS) Annual Social and Economic Supplement. An iterative proportional fitting (i.e., raking) algorithm is used to carry out the post-stratification adjustments so that the final weighted data match national targets for the following cross-classified cells:


  1. Census region by age category (18-29, 30-49, 50-64, 65+) by gender

  2. Education (less than high school, high school graduate, some college, college graduate) by age category

  3. Race (black/non-black) by gender

  4. Ethnicity (Hispanic/non-Hispanic) by gender
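The three weighting steps above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not Gallup's production code: the function names, the six example records, and all target proportions are made up for demonstration (they are not NHIS or CPS figures). The sketch also assumes that cell phone cases receive a within-household factor of 1, and it rakes on two simple margins rather than the cross-classified cells listed above.

```python
from collections import defaultdict

def household_factor(n_adults, frame, cap=3):
    """Step 1: number of adults in the household, truncated at `cap`.

    Assumption: cell phone cases get a factor of 1 because no
    within-household selection is made for them.
    """
    if frame == "cell":
        return 1.0
    return float(min(max(n_adults, 1), cap))

def adjust_to_targets(records, weights, key, targets):
    """Scale weights so the weighted share of each category of `key`
    equals its target proportion (a single post-stratification step)."""
    total = sum(weights)
    by_cat = defaultdict(float)
    for rec, w in zip(records, weights):
        by_cat[rec[key]] += w
    return [w * targets[rec[key]] * total / by_cat[rec[key]]
            for rec, w in zip(records, weights)]

def rake(records, weights, margins, max_iter=100, tol=1e-10):
    """Step 3: iterative proportional fitting. Cycle through the margins,
    adjusting to each in turn, until the weights stop changing."""
    for _ in range(max_iter):
        previous = weights
        for key, targets in margins.items():
            weights = adjust_to_targets(records, weights, key, targets)
        if max(abs(a - b) for a, b in zip(weights, previous)) < tol:
            break
    return weights

def weighted_share(records, weights, key, category):
    """Weighted proportion of cases falling in `category` of `key`."""
    hit = sum(w for r, w in zip(records, weights) if r[key] == category)
    return hit / sum(weights)

# Hypothetical completed interviews.
records = [
    {"frame": "landline", "adults": 2, "phone": "dual", "sex": "F", "age": "18-49"},
    {"frame": "landline", "adults": 5, "phone": "landline_only", "sex": "M", "age": "50+"},
    {"frame": "cell", "adults": 1, "phone": "cell_only", "sex": "M", "age": "18-49"},
    {"frame": "cell", "adults": 3, "phone": "dual", "sex": "F", "age": "50+"},
    {"frame": "landline", "adults": 1, "phone": "dual", "sex": "M", "age": "18-49"},
    {"frame": "cell", "adults": 2, "phone": "cell_only", "sex": "F", "age": "18-49"},
]

# Illustrative targets only; real targets would come from NHIS and CPS.
phone_targets = {"landline_only": 0.1, "cell_only": 0.4, "dual": 0.5}
margins = {
    "sex": {"F": 0.52, "M": 0.48},
    "age": {"18-49": 0.6, "50+": 0.4},
}

base = [household_factor(r["adults"], r["frame"]) for r in records]  # step 1
phone_adj = adjust_to_targets(records, base, "phone", phone_targets)  # step 2
final = rake(records, phone_adj, margins)                             # step 3

print(round(weighted_share(records, final, "sex", "F"), 4))
print(round(weighted_share(records, final, "age", "18-49"), 4))
```

A cross-classified cell such as age category by gender could be handled in this sketch by storing a combined category (for example `"F|18-49"`) on each record and supplying targets for it. Note that here the telephone-status shares are not re-imposed during raking, so they can drift from their targets; the source does not say whether telephone status is also included among the raking margins.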

Once the raking algorithm converges, a trimming rule is applied: weights are capped at a maximum of 3 and a minimum of 0.25. These cutoffs generally fall between the 5% and 10% points of each tail of the distribution of weights. After the weights are finalized for all completed surveys, they are normalized so that the sum of weights across all completed G1K surveys matches the total number of completed surveys (roughly 1,000 per day). Finally, the roughly 200 nightly interviews marked for the FSS study conducted for the Census Bureau simply inherit the weights assigned to these cases by the overall G1K weighting process described above.
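The trimming and normalization steps can be sketched as follows. This is a minimal illustration that assumes the raked weights are already on a scale with a mean near 1 before trimming; the function name and the example weights are hypothetical.

```python
def trim_and_normalize(weights, low=0.25, high=3.0):
    """Clip weights to [low, high], then rescale so they sum to the
    number of completed surveys."""
    trimmed = [min(max(w, low), high) for w in weights]
    scale = len(trimmed) / sum(trimmed)
    return [w * scale for w in trimmed]

# Hypothetical raked weights for five completed surveys.
raked = [0.1, 0.5, 1.0, 1.4, 5.0]
final = trim_and_normalize(raked)

print(round(sum(final), 6))  # 5.0, the number of completed surveys
```

Because normalization rescales every weight by the same factor, the ratios between trimmed weights are preserved, although individual normalized weights can end up slightly outside the 0.25 to 3 range.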

References:

Guterbock, Thomas M., Abdoulaye Diop, James M. Ellis, John Lee Holmes, and Kien Trung Le. 2011. “Who needs RDD? Combining directory listings with cell phone exchanges for an alternative telephone sampling frame.” Social Science Research 40: 860-872.

File Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document
Author: Chattopadhyay, Manas
File Created: 2021-01-31
