Consumer Expenditure Surveys
1220-0050
July 2018
Supporting Statement
B. Collection of Information Employing Statistical Methods
1. Sampling Method
The Consumer Expenditure (CE) Survey is a nationwide household survey conducted by the U.S. Bureau of Labor Statistics to find out how Americans spend their money. The CE Survey actually consists of two sub-surveys, a Quarterly Interview survey (CEQ), and a two-week Diary survey (CED). The Interview survey collects detailed expenditure data on large expenditures such as property, automobiles, and major appliances, as well as on recurring expenditures such as rent, utilities, and insurance premiums. By contrast, the Diary survey collects detailed expenditure data on small, frequently purchased items such as food and apparel. The data from the two surveys are then combined to provide a complete picture of consumer expenditures in the United States.
The data for both surveys are collected from a representative sample of households around the country. Both surveys have the same sample design, which is a two-stage sampling process. In the first stage a representative sample of counties from around the United States is selected for the survey. In the second stage a representative sample of households is selected from those counties. This two-stage process is designed to generate a sample of households in which every demographic group and every wealth level is well-represented in the survey. The rest of this section describes these two sampling stages in more detail.
Primary Sampling Units (PSUs)
In the first stage of sampling, all 3,143 counties or county equivalents in the United States are partitioned into small clusters called “primary sampling units” (PSUs), and a representative sample of 91 PSUs is randomly selected for the survey. The clusters are the “core-based statistical areas” defined by the Office of Management and Budget (OMB), and they range in size from 1 to 29 counties, with an average size of 5 counties. The same sample of 91 PSUs is used in both the CEQ and CED surveys. The 91 PSUs fall into three categories:
PSU “size class” | Number of PSUs | Description |
S | 23 | Large Metropolitan Core Based Statistical Areas (self-representing PSUs) |
N | 52 | Small Metropolitan Core Based Statistical Areas and Micropolitan Core Based Statistical Areas (non-self-representing “urban” PSUs) |
R | 16 | Non-Core Based Statistical Areas (non-self-representing “rural” PSUs) |
The BLS selected these PSUs from a stratified sample design in which all 23 “S” PSUs were selected for the survey with certainty, while all the non-self-representing PSUs (the N and R PSUs) were stratified into 68 strata using a 4-variable model whose independent variables were latitude, longitude, median household income, and median household property value. Then one PSU was randomly selected from each stratum with its probability of selection being proportional to its population.
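The probability-proportional-to-population draw described above can be illustrated with a minimal Python sketch. It is not the BLS selection program: the PSU names, the populations, and the function names `select_psu_pps` and `selection_probabilities` are hypothetical, and the sketch covers only the final step of picking one PSU from a single stratum that has already been formed.

```python
import random

def select_psu_pps(stratum, rng):
    """Pick one PSU from a stratum with probability proportional to population.

    `stratum` is a list of (psu_name, population) pairs (hypothetical data).
    """
    names = [name for name, _ in stratum]
    populations = [pop for _, pop in stratum]
    # random.choices uses the populations as relative weights;
    # with k=1 it makes a single proportional-to-size draw.
    return rng.choices(names, weights=populations, k=1)[0]

def selection_probabilities(stratum):
    """Return each PSU's probability of selection within its stratum."""
    total = sum(pop for _, pop in stratum)
    return {name: pop / total for name, pop in stratum}

# Hypothetical stratum of three non-self-representing PSUs.
example_stratum = [("PSU-A", 600_000), ("PSU-B", 300_000), ("PSU-C", 100_000)]
rng = random.Random(2018)  # fixed seed so the illustration is repeatable
chosen = select_psu_pps(example_stratum, rng)
probs = selection_probabilities(example_stratum)
print(chosen, probs)
```

In this illustration the largest PSU has a 60 percent chance of being the one selected to represent its stratum, and the smallest has a 10 percent chance.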
One of CE’s major customers is the Consumer Price Index (CPI) which is an urban survey that uses CE’s data for its expenditure weights. All 91 PSUs are used by the CE survey, but only the 75 urban PSUs (the 23 “S” PSUs and the 52 “N” PSUs) are used by the CPI.
Sampling Households Within PSUs
After the sample of PSUs is selected, a sample of households is selected from the civilian non-institutional portion of their populations. This includes people living in houses, condominiums, and apartments, as well as people living in group quarters such as college dormitories or boarding houses. However, it excludes the non-civilian and institutional portions of the population, such as military personnel living on base, nursing home residents, and prison inmates.
Addresses for the CEQ and CED surveys are selected from two sampling frames maintained by the Census Bureau: the Unit frame and the Group Quarters (GQ) frame. Both frames are derived from the Master Address File (MAF), which is basically a list of all residential addresses identified in the 2010 census and is updated twice per year with information from the U.S. Postal Service. The Unit frame is the larger of the two frames and it contains both existing housing units and newly constructed housing units. It has approximately 99% of the MAF’s civilian non-institutional addresses and is updated twice per year. The GQ frame is also derived from the MAF but it is much smaller; it has the remaining 1% of the MAF’s civilian non-institutional addresses and is updated every three years.
In each PSU, a “systematic sample” of households is selected from the two frames. The first step in the selection process is sorting the households by variables that are correlated with their expenditures. The purpose of this is to ensure that households of every wealth level are well-represented in the sample. In this systematic sampling process the first household in the sample is selected from the sorted list using a random number generator. Then after the initial household is selected every k-th household down the list is selected where “k” is the PSU’s sampling interval. The Unit and GQ frames have different sorting variables, but they have the same sampling interval.
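The systematic selection described above can be sketched in a few lines of Python. This is a simplified illustration rather than the Census Bureau's procedure: it assumes an integer sampling interval and a random start within the first interval, and the function name `systematic_sample` and the address labels are hypothetical.

```python
import random

def systematic_sample(sorted_units, interval, rng):
    """Select every `interval`-th unit from an already sorted list.

    The starting position is drawn at random from the first `interval`
    positions (an assumption of this sketch).
    """
    if interval < 1:
        raise ValueError("interval must be a positive integer")
    start = rng.randrange(interval)  # random start in 0 .. interval-1
    return sorted_units[start::interval]

# Hypothetical frame of 100 addresses, already sorted by the stratification variables.
frame = [f"address-{i:03d}" for i in range(100)]
sample = systematic_sample(frame, interval=10, rng=random.Random(7))
print(len(sample), sample[:3])
```

Because the list is sorted by variables correlated with expenditures before the selection, taking every k-th address spreads the sample across the full range of those variables.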
Table 1 below shows how the households are sorted in the Unit frame. It has codes ranging from 10 to 99 with the lower codes being for low-wealth households, and the higher codes being for high-wealth households. For the Unit frame, the sorting or “stratification” variable is created from the number of occupants in each household, their housing tenure (owner/renter), and the market value of their homes (for owners) or the rental value of their apartment or home (for renters). These variables are used because they are correlated with expenditures: households with more people tend to be wealthier than those with fewer people; homeowners tend to be wealthier than renters; and people living in high-price housing units tend to be wealthier than people living in low-price housing units.
All the renters are at one end of the stratification and all the owners are at the other end of the stratification. The renters and owners are further subdivided into quartiles based on monthly rental and property values in order to ensure that households of every wealth level are well represented in the survey. Vacant housing units are put in the middle column for the number of household occupants because although they were vacant at the time of the decennial census, when CE’s field representatives visit them most will be occupied and they could be in any of the four non-zero categories. Thus the middle column is their “expected” location.
Table 1. CE Unit Frame Stratification Code Values (columns give the number of occupants)
Renter/Owner Quartile | 1 person | 2 persons | Vacant | 3 persons | 4+ persons |
Renters 1st Quartile | 10 | 11 | 12 | 13 | 14 |
Renters 2nd Quartile | 25 | 24 | 23 | 22 | 21 |
Renters 3rd Quartile | 30 | 31 | 32 | 33 | 34 |
Renters 4th Quartile | 45 | 44 | 43 | 42 | 41 |
Owners 1st Quartile | 50 | 51 | 52 | 53 | 54 |
Owners 2nd Quartile | 65 | 64 | 63 | 62 | 61 |
Owners 3rd Quartile | 70 | 71 | 72 | 73 | 74 |
Owners 4th Quartile | 85 | 84 | 83 | 82 | 81 |
Other | | | 99 | | |
To draw a systematic sample in the Unit frame, the addresses are sorted first by PSU, then by State FIPS code, County FIPS code, the CE stratification variable described above, Census Tract code, Census Block code, Street name, Street number, and MAFID code.
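The Table 1 lookup and the Unit frame sort order can be illustrated with the following Python sketch. The code values are copied from Table 1; everything else is an assumption made for illustration, including the function names, the dictionary field names, and the choice to assign code 99 to any unit that does not match a row and column of the table (the text does not define what falls under “Other”).

```python
# Column order of Table 1 (number of occupants).
OCCUPANT_COLUMNS = ["1 person", "2 persons", "Vacant", "3 persons", "4+ persons"]

# Code values copied from Table 1, keyed by (tenure, quartile).
STRAT_CODES = {
    ("Renters", 1): [10, 11, 12, 13, 14],
    ("Renters", 2): [25, 24, 23, 22, 21],
    ("Renters", 3): [30, 31, 32, 33, 34],
    ("Renters", 4): [45, 44, 43, 42, 41],
    ("Owners", 1): [50, 51, 52, 53, 54],
    ("Owners", 2): [65, 64, 63, 62, 61],
    ("Owners", 3): [70, 71, 72, 73, 74],
    ("Owners", 4): [85, 84, 83, 82, 81],
}
OTHER_CODE = 99  # assumption: used here for anything outside the table

def stratification_code(tenure, quartile, occupants):
    """Look up the Table 1 code for a housing unit."""
    row = STRAT_CODES.get((tenure, quartile))
    if row is None or occupants not in OCCUPANT_COLUMNS:
        return OTHER_CODE
    return row[OCCUPANT_COLUMNS.index(occupants)]

def unit_frame_sort_key(address):
    """Sort key following the Unit frame order given in the text.

    The field names are hypothetical.
    """
    return (
        address["psu"], address["state_fips"], address["county_fips"],
        address["strat_code"], address["tract"], address["block"],
        address["street_name"], address["street_number"], address["mafid"],
    )

# Two hypothetical addresses in the same county, differing in stratification code.
addresses = [
    {"psu": "S12A", "state_fips": "36", "county_fips": "061",
     "strat_code": stratification_code("Owners", 4, "1 person"),
     "tract": "000100", "block": "1000", "street_name": "MAIN ST",
     "street_number": 12, "mafid": 2},
    {"psu": "S12A", "state_fips": "36", "county_fips": "061",
     "strat_code": stratification_code("Renters", 2, "Vacant"),
     "tract": "000200", "block": "2000", "street_name": "ELM ST",
     "street_number": 5, "mafid": 1},
]
ordered = sorted(addresses, key=unit_frame_sort_key)
print([a["strat_code"] for a in ordered])
```

Within each quartile the codes alternate between ascending and descending across the occupant columns, so adjacent rows of the sorted list stay similar in household size as the sort moves from one quartile to the next.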
To draw a systematic sample in the GQ frame, the addresses are sorted first by PSU, then by State FIPS code, County FIPS code, Census Tract code, CHPCT (the percent of people in the tract living in college housing), and Census Block code. CHPCT is used because people living in college housing are very different from the rest of the people in the GQ frame, so using it as a stratification variable helps produce a more representative sample.
For more information on the sample design in general, please see the paper by Susan King on “Selecting a Sample of Households for the Consumer Expenditure Survey” (Attachment P); or the paper by Danielle Neiman et al., “Review of the 2010 Sample Redesign of the Consumer Expenditure Survey” (Attachment V). For more information on the geographic portion of CE’s sample design, please see the memorandum from Jay Ryan to Richard Schwartz on “PSUs for the Consumer Expenditure Survey’s 2010 Census-Based Sample Design,” December 18, 2012 (Attachment T).
Consumer Units
A consumer unit (CU) is the unit from which the CE seeks expenditure reports. It is basically the same thing as a “household,” although there are some technical differences. Technically a CU consists of 1) all members of a housing unit who are related by blood, marriage, adoption, or some other legal arrangement such as foster children; 2) two or more unrelated people living together who pool their incomes to make joint expenditure decisions; 3) a single person sharing a housing unit with unrelated people but who is financially independent of them; or 4) a person living alone.1 Approximately 99 percent of all occupied housing units are occupied by one CU, and there are approximately 130 million CUs in the United States. The following table shows the estimated number of CUs in all 91 strata from which CE’s sample of 91 PSUs was selected.2
Estimated Number of CUs in CE’s 91 Strata
Stratum Code | Estimated Number of CUs in the Stratum |
S11A | 1,916,829 |
S12A | 8,239,029 |
S12B | 2,511,760 |
S23A | 3,983,681 |
S23B | 1,808,974 |
S24A | 1,410,066 |
S24B | 1,173,786 |
S35A | 2,373,185 |
S35B | 2,343,038 |
S35C | 2,226,023 |
S35D | 1,171,909 |
S35E | 1,141,275 |
S37A | 2,705,813 |
S37B | 2,492,843 |
S48A | 1,765,452 |
S48B | 1,070,955 |
S49A | 5,401,694 |
S49B | 1,825,454 |
S49C | 1,778,910 |
S49D | 1,448,362 |
S49E | 1,303,309 |
S49F | 572,767 |
S49G | 220,279 |
N11B | 2,107,733 |
N11C | 1,782,731 |
N12C | 1,711,973 |
N12D | 1,466,621 |
N12E | 1,652,789 |
N12F | 1,499,951 |
N23C | 1,429,854 |
N23D | 1,371,790 |
N23E | 1,582,553 |
N23F | 1,371,175 |
N23G | 1,652,369 |
N23H | 1,646,840 |
N23I | 1,576,918 |
N23J | 1,443,122 |
N24C | 1,252,236 |
N24D | 1,196,973 |
N24E | 1,384,575 |
N24F | 1,241,240 |
N35F | 1,277,976 |
N35G | 1,112,833 |
N35H | 1,274,905 |
N35I | 1,073,353 |
N35J | 1,302,974 |
N35K | 1,110,367 |
N35L | 1,301,557 |
N35M | 1,081,592 |
N35N | 1,226,603 |
N35O | 1,152,152 |
N35P | 1,305,536 |
N35Q | 1,079,215 |
N36A | 1,065,120 |
N36B | 1,045,744 |
N36C | 1,103,424 |
N36D | 1,179,553 |
N36E | 1,073,872 |
N36F | 1,009,410 |
N37C | 1,025,739 |
N37D | 1,184,416 |
N37E | 1,071,009 |
N37F | 1,029,420 |
N37G | 1,086,768 |
N37H | 1,160,487 |
N37I | 1,103,594 |
N37J | 1,200,835 |
N48C | 1,359,161 |
N48D | 1,568,137 |
N48E | 1,617,161 |
N48F | 1,350,234 |
N49H | 2,193,028 |
N49I | 2,174,208 |
N49J | 1,946,697 |
N49K | 1,837,364 |
R11D | 274,844 |
R12G | 347,740 |
R23K | 676,088 |
R23L | 569,043 |
R24G | 773,937 |
R24H | 651,715 |
R35R | 649,702 |
R35S | 780,518 |
R36G | 660,108 |
R36H | 592,418 |
R37K | 553,860 |
R37L | 668,619 |
R48G | 202,807 |
R48H | 168,146 |
R48I | 188,377 |
R49L | 300,802 |
Total | 130,000,000 |
Response Rates
The table below shows the expected annual sample sizes and response rates for the CEQ and CED surveys in 2019-2021.
Each year the CEQ’s sample will have approximately 48,000 addresses. Of those addresses, 83% are expected to be occupied housing units, and the other 17% are expected to be “Type B/C” noninterviews, which are addresses that are not occupied housing units (they are nonexistent, nonresidential, vacant, demolished, etc.). Of the occupied housing units, 59% are expected to complete an interview, and the other 41% are expected to be “Type A” noninterviews, which are occupied housing units that do not participate in the survey. This is expected to yield approximately 23,508 completed interviews per year.
Similarly, each year the CED’s sample will have approximately 12,000 addresses, of which 83% are expected to be occupied housing units, and the other 17% are expected to be “Type B/C” noninterviews. Of the occupied housing units, 55% are expected to complete their diaries, and the other 45% are expected to be “Type A” noninterviews. This is expected to yield approximately 10,960 (= 5,480 × 2) weekly diaries per year.
Category | Quarterly Interview | Diary |
Total Sample Size (addresses) | 48,000 | 12,000 |
Type B and C Noninterviews (vacant, demolished, etc.) | | |
Number | 8,160 | 2,040 |
Percent of Total Sample | 17.0 | 17.0 |
Eligible Units (occupied housing units) | | |
Number | 39,840 | 9,960 |
Percent of Total Sample | 83.0 | 83.0 |
Type A Noninterviews | | |
Number | 16,334 | 4,482 |
Percent of Eligible Units | 41.0 | 45.0 |
Completed Interviews | | |
Number | 23,508 | 5,478 |
Percent of Eligible Units (Response Rate) | 59.0 | 55.0 |
The response rates shown above are the CEQ’s and CED’s actual response rates over the past five years (2013-2017) minus 5 percentage points. Response rates have been decreasing over time, so the 5-year historical response rates are reduced by 5 percentage points to account for the downward trend.
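The arithmetic behind the expected counts can be reproduced with a short Python sketch that uses the sample sizes and rates stated in this section. The function name `expected_counts` and the rounding to whole units are assumptions of the sketch. The rounded products agree with the tabulated figures to within a few units, and any small gap presumably reflects rounding in the source figures.

```python
def expected_counts(addresses, eligible_rate, response_rate):
    """Split a sample of addresses into the categories used in the table."""
    eligible = round(addresses * eligible_rate)   # occupied housing units
    type_bc = addresses - eligible                # not occupied housing units
    completed = round(eligible * response_rate)   # completed interviews
    type_a = eligible - completed                 # occupied but not participating
    return {
        "addresses": addresses,
        "type_bc": type_bc,
        "eligible": eligible,
        "type_a": type_a,
        "completed": completed,
    }

ceq = expected_counts(48_000, eligible_rate=0.83, response_rate=0.59)
ced = expected_counts(12_000, eligible_rate=0.83, response_rate=0.55)
print(ceq)
print(ced)
# Each CED household is asked to keep two weekly diaries.
print(ced["completed"] * 2)
```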
Since 2015 the CEQ and CED have drawn their samples of addresses from a new sampling frame called the Master Address File (MAF), which is basically a list of all addresses from the 2010 census that is updated twice per year with information from the U.S. Postal Service’s Delivery Sequence File. The MAF is a higher quality sampling frame than the sampling frames used before 2015, as demonstrated by the fact that the Type B/C rate is now about three percentage points lower than it was under the old frames (17% vs. 20%).
In 2008 CE staff conducted a nonresponse bias study to determine whether the missing data from nonrespondents generated any bias in the CEQ’s published estimates. Their study was undertaken in response to an OMB directive. Results from four individual studies were synthesized, and they concluded that no bias was generated in spite of the fact that CE’s data are not “missing completely at random (MCAR).” As they said, “the results from these four studies provide a counterexample to the commonly held belief that if a survey’s data are not missing completely at random then its estimates are subject to nonresponse bias.” In other words, CE’s nonresponse weighting adjustments are working well.
For more information on the calculation of response rates, see the memorandum from Sharon Krieger to David Swanson on “Response Rates in the Consumer Expenditure Survey” (2016) (Attachment Q). For more information on the nonresponse bias studies, see “Assessing Nonresponse Bias in the Consumer Expenditure Interview Survey” (Attachment R).
2. Collection Methods
Field representatives from the U.S. Census Bureau, under contract with BLS, personally visit the households in the CEQ’s and CED’s samples to collect the data. Prior to the first household visit, respondents are sent an advance letter informing them that they have been selected for the survey and asking them for their cooperation. For subsequent household visits in the CEQ survey, respondents are sent an advance letter reminding them that it has been 3 months since they last participated in the survey and asking for their cooperation again.
Field representatives visit each household in the CEQ’s sample every 3 months for 4 consecutive quarters to collect information on the expenditures they made during the previous 3 months. The field representatives enter the household’s responses into a laptop computer. After participating in the survey for 4 quarters, the household is dropped from the survey and replaced by another household. The households in the CEQ survey are on a rotating schedule with approximately one-fourth of the households in the sample being new to the survey each quarter.
For the CED survey, field representatives visit each household in the sample two times to collect information on the expenditures they make during a 2-week period. On the first visit the field representatives introduce themselves, explain the survey, and leave two weekly diaries, one for each week of the survey period. The household members are asked to record all their expenditures over the 2-week period in those diaries. On the second visit, the field representatives pick up the two diaries and thank the household for participating in the survey. After participating in the survey for two weeks, the household is dropped from the survey and replaced by another household.
After completing the second week of the CED survey and the fourth quarter of the CEQ survey, the households are sent a Thank You letter and a certificate of appreciation for their participation in the survey.
Estimation
The estimation procedure for both the CEQ and CED follows well-established statistical principles. The final weight for each sample CU is the product of its base weight (which is the inverse of the CU’s probability of selection); an adjustment factor to account for noninterviews; and a calibration adjustment factor that post-stratifies the weights to account for population undercoverage. A typical base weight for a CU in the CEQ is approximately 10,000, which means it represents 10,000 CUs – itself plus 9,999 other CUs that were not selected for the survey. A typical final weight is approximately 18,000, which means it represents 18,000 CUs – itself plus 17,999 other CUs that were not selected for the survey and/or did not participate in the survey.
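The weight calculation can be written out as a minimal Python sketch. Only the typical base weight of 10,000 and the typical final weight of 18,000 come from the text. The two adjustment factors (1.5 and 1.2) are hypothetical values chosen so that their product is 1.8, and the function name `final_weight` is likewise illustrative.

```python
def final_weight(selection_probability, noninterview_factor, calibration_factor):
    """Final weight = base weight x noninterview adjustment x calibration adjustment."""
    base_weight = 1.0 / selection_probability  # inverse of the probability of selection
    return base_weight * noninterview_factor * calibration_factor

# A CU selected with probability 1 in 10,000 has a base weight of about 10,000.
# The two factors below are hypothetical; only their product (1.8) is implied
# by the typical base and final weights quoted in the text.
weight = final_weight(
    selection_probability=1 / 10_000,
    noninterview_factor=1.5,
    calibration_factor=1.2,
)
print(round(weight))
```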
For additional information on CE’s sample design and estimation methodology, please refer to “Chapter 16, Consumer Expenditures and Income” in the BLS Handbook of Methods (Attachment S); Jay Ryan’s memorandum to Richard Schwartz on “PSUs for the Consumer Expenditure Survey’s 2010 Census-Based Sample Design,” December 18, 2012 (Attachment T); and Ruth Ann Killion’s memorandum to Jay Ryan on “Consumer Expenditure Surveys Sample Allocation for Interview Year 2016,” February 11, 2015 (Attachment U).
3. Methods to Maximize Response Rates
Keeping the CEQ’s and CED’s response rates as high as possible requires special efforts, particularly from the Census Bureau’s field staff. Every refusal case is sent a letter trying to persuade it to participate in the survey, and then a program supervisor, supervisory field representative, or senior interviewer is assigned to the case for follow-up “refusal conversion” efforts. Of course refusal conversion efforts take time and cost money, so regional office staff try to decide which cases to work on and how much effort to put into them based on cost-effectiveness considerations.
Special computer processing techniques are also used in the CEQ to reduce respondent burden, which in turn helps keep response rates up. For example, some data collected in one interview are carried forward to subsequent interviews, such as data on household members and their personal characteristics, along with data on their properties, mortgages, vehicles, and insurance policies. Minimizing respondent burden, including interview length, is an important part of the effort to keep response rates up.
When field staff still cannot convert noninterviews to interviews, the estimation process has a noninterview adjustment to account for them. As mentioned above, every CU in the sample has a base weight equal to the number of CUs in the population it represents. In this process the respondent CUs have their weights increased to account for the nonrespondent CUs. In particular, the total sample of CUs is partitioned into 192 subsets based on their region, CU size, income, and number of contact attempts.3 Then within each subset the base weights of the respondents are increased by multiplying them by a factor equal to the sum of the base weights for all CUs (both respondents and nonrespondents) divided by the sum of the base weights from just the respondent CUs. This makes the final weights of the respondents add up to the total number of CUs in the population.
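The noninterview adjustment described above can be sketched in Python as follows. This is an illustration of the weighting-cell arithmetic only: the cell labels, the weights, and the function name `nonresponse_adjust` are hypothetical, and the sketch uses two cells in place of the 192 subsets used in production.

```python
from collections import defaultdict

def nonresponse_adjust(units):
    """Inflate respondents' base weights within each adjustment cell.

    `units` is a list of dicts with keys 'cell', 'base_weight', and 'responded'.
    Returns the respondents only, each with an added 'adjusted_weight'.
    """
    total = defaultdict(float)      # sum of base weights, all CUs in the cell
    responding = defaultdict(float)  # sum of base weights, respondents only
    for u in units:
        total[u["cell"]] += u["base_weight"]
        if u["responded"]:
            responding[u["cell"]] += u["base_weight"]

    adjusted = []
    for u in units:
        if u["responded"]:
            factor = total[u["cell"]] / responding[u["cell"]]
            adjusted.append({**u, "adjusted_weight": u["base_weight"] * factor})
    return adjusted

# Hypothetical sample of five CUs in two adjustment cells.
sample_cus = [
    {"cell": "A", "base_weight": 10_000.0, "responded": True},
    {"cell": "A", "base_weight": 10_000.0, "responded": False},
    {"cell": "B", "base_weight": 9_000.0, "responded": True},
    {"cell": "B", "base_weight": 9_000.0, "responded": True},
    {"cell": "B", "base_weight": 9_000.0, "responded": False},
]
respondents = nonresponse_adjust(sample_cus)
print([r["adjusted_weight"] for r in respondents])
print(sum(r["adjusted_weight"] for r in respondents))
```

In this illustration the adjusted weights of the three respondents sum to the same total as the base weights of all five sampled CUs, which is the property the adjustment is designed to preserve.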
4. Testing Plans
Subject to resource availability, CE plans to conduct the following study (prior to the expiration of the clearance). A full package will be submitted for the proposed study should funding and resources become available.
Large-Scale Online Diary Feasibility Test | Diary (Redesign) | The purpose of this project is to field the redesigned online diary test, incorporating lessons learned from the web and individual diaries test, the proof-of-concept test, cognitive lab studies, and additional research. The test will use a sample large enough to detect significant differences between the test and the production control. Findings from this test will be used in deciding on and planning for the implementation of online diaries in production as part of a phased-in implementation of the CE redesign plan. |
5. Statistical Contacts
The Census Bureau will collect the data. Within the Census Bureau, you may consult the following individuals regarding their area of expertise for further information.
Sample Design: | Stephen Ash | (301) 763-4294 |
Data Collection: | Jennifer Epps | (301) 763-5342 |
1 Unrelated people who share a housing unit are considered to be separate CUs if they are responsible for paying their own expenses in at least two of these three categories: shelter, food, and all other expenses. Likewise college students living away from home are considered to be separate CUs from their parents if they are responsible for paying their own expenses in at least two of these three categories.
2 The number of CUs comes from combining information about the total number of housing units in the Census Bureau’s sampling frames (i.e., the MAF) with the observations made by CE’s field representatives of the number of CUs living in those housing units. The average number of CUs per occupied housing unit is approximately 1.015. The number of CUs per stratum shown in the table above comes from allocating the nationwide total of 130 million CUs by each stratum’s proportion of the nationwide population in the 2010 census.
3 There are 4 regions of the country, 4 CU size classes, 3 income classes, and 4 contact attempt classes, making 192 = 4 × 4 × 3 × 4 subsets into which the sample is partitioned. For nonrespondents the number of people in the CU is obtained from data collected in previous interviews or from talking to their neighbors. For all CUs (both respondents and nonrespondents) their income is estimated from a publicly available database from the IRS which has the average household income by ZIP code. In the nonresponse adjustment process every CU is assumed to have its ZIP code’s average income value.
Author | FRIEDLANDER_M |
File Created | 2021-01-20 |