October 2016
The 2018 revision of the Consumer Price Index geographic sample
The Consumer Price Index (CPI) program updates its
sample of geographic areas on the basis of the most
recent decennial census, to ensure that the sample
accurately reflects shifts in the U.S. population. This
article describes CPI’s latest area-sample redesign, which
will be used with the introduction of 2018 price indexes.
UPDATE: SEPTEMBER 14, 2017
The new area design implementation plan to introduce new primary sampling units (PSUs) in four waves over a 4-year period beginning in January 2018 has been modified as follows:
• The introduction of new PSUs in waves 2–4 has been delayed by 1 year.
• Three new PSUs in wave 2 will continue to be “proxied” by a dropping PSU; these three PSUs will now be imputed for the first 2 years of the new area design.
In addition, BLS will continue to publish monthly region-size class indexes for A- and B/C-sized cities in the four census regions.
Steven P. Paben
paben.steven@bls.gov
Steven P. Paben is the Chief of the Division of Price Statistical Methods, Office of Prices and Living Conditions, U.S. Bureau of Labor Statistics.
William H. Johnson
johnson.bill@bls.gov
William H. Johnson is a mathematical statistician in the Division of Price Statistical Methods, Office of Prices and Living Conditions, U.S. Bureau of Labor Statistics.
John F. Schilp
schilp.john@bls.gov
John F. Schilp is a mathematical statistician in the Division of Price Statistical Methods, Office of Prices and Living Conditions, U.S. Bureau of Labor Statistics.
U.S. BUREAU OF LABOR STATISTICS
MONTHLY LABOR REVIEW
The CPI sample-design process involves multiple stages. In the first stage, a sample of geographic areas is selected. In subsequent stages, a sample of outlets in which area residents make retail purchases, a sample of specific retail goods and services, and a sample of
residential housing units are selected. While these latter samples are rotated on a regular basis, the geographic
sample has traditionally been rotated once every 10 years. The 2018 area revision will mark the geographic
sample’s first rotation since 1998.1
Historically, a new area sample had been selected and implemented after each decennial census. This selection
was done jointly with the Consumer Expenditure Survey (CE) program, from which CPI obtains expenditure
weights for price indexes and selection probabilities for goods and services. Because all new geographic areas
were rotated into the CPI all at once, field collection had to continue in old sampling areas while survey
operations were starting up in new areas. This practice typically caused a spike in field collection costs, which
had to be covered through a special funding initiative. Effective with the 2018 geographic redesign, CPI will
rotate its sample to new geographic areas on a continuous basis, over a period of consecutive years, until all
new areas have been brought into the sample.2 This approach is expected to be more cost effective.
In general, the process used to select the 2018 area sample is similar to the process used in the 1998 area
revision.3 The basic steps in both cases include the following:
• Determine sample classification variables
• Construct primary sampling units (PSUs)
• Determine the number of sampled PSUs
• Determine stratification variables
• Allocate sample4 and assign PSUs to strata
• Select a sample of PSUs
Despite these similarities, the new process has introduced some notable methodological changes within each of
the basic steps. First, the sample classification structure has been changed. The 1998 design classified nonself-representing areas
into four Census regions by two size classes for a total of eight groups; the 2018 design classifies these areas
into nine Census divisions.5 Second, the area definitions of PSUs have been updated to reflect the most recent
area definitions from the Office of Management and Budget (OMB).6 Third, in the new design, the number of sampled
PSUs in the CPI has been reduced from 87 to 75. Finally, changes were made to the stratification variables and
the sampling process for selecting PSUs. The purpose of this article is to provide a more detailed explanation of
the aforementioned methodological changes in the 2018 area revision and to describe the plan for rotating
PSUs to the new area sample over a 4-year transition period.
Determine sample classification variables
In the CPI, geographic sample variables represent one dimension of the overall index classification structure. In
the current area design, the urban portion of the United States is divided into 38 geographic areas, called index
areas. In addition, the set of all goods and services purchased by consumers is divided into 211 categories,
called item strata. Combining these two dimensions results in 8,018 (38 × 211) item–area combinations, or basic
cells. Resource constraints can limit the size of the sample in each of these basic cells and lead to a small
sample measurement bias.7 Previous research performed by the U.S. Bureau of Labor Statistics (BLS) found
that, because of a deficient sample size in some basic item–area cells, the all-items CPI for All Urban
Consumers (CPI-U) exhibited a finite sample bias of 0.2 to 0.3 percent per year. Since the magnitude of the bias
is inversely related to sample size, an increase in the number of price quotes per item–area cell would
proportionally reduce the bias in the sample mean for each cell and, in turn, lower the finite sample bias for the
overall index.8 For the new area design, CPI made a conscious effort to partially address this issue by reducing
the number of index areas, thereby increasing the average number of price quotes per basic item–area cell.
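Why do more quotes per cell shrink the bias? One textbook mechanism illustrates it, though it is not necessarily the model used in the BLS research. Take an index estimated as the exponential of the mean of n log price relatives, and assume, purely for illustration, that those log relatives are independent normal draws with variance σ². The estimate then overstates its target by a factor of exp(σ²/(2n)), so its relative bias is roughly σ²/(2n). The function name and the variance value below are hypothetical.

```python
import math


def relative_bias(sigma2: float, n: int) -> float:
    """Relative bias of exp(mean log relative) as an estimate of exp(mu).

    Illustrative assumption: the n log price relatives in a cell are i.i.d.
    normal with mean mu and variance sigma2. Their sample mean is then normal
    with variance sigma2 / n, so E[exp(mean)] = exp(mu + sigma2 / (2 * n)).
    """
    return math.exp(sigma2 / (2 * n)) - 1.0


if __name__ == "__main__":
    sigma2 = 0.01  # hypothetical within-cell variance of log price relatives
    for n in (2, 4, 8, 16):
        print(f"{n:2d} quotes per cell -> relative bias {relative_bias(sigma2, n):.6f}")
```

In this setup, doubling the number of quotes per cell roughly halves the bias, which matches the inverse relationship described above.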
In the 1998 sample design, areas were first classified by location, on the basis of one of four Census regions:
Northeast, Midwest, South, and West. Then, each area was classified into one of three population-size classes:
self-representing areas (A-size), medium nonself-representing areas (B-size), and small nonself-representing
areas (C-size).9 In the 2018 sample design, areas were first classified by location, into one of nine Census
divisions: New England, Middle Atlantic, East North Central, West North Central, South Atlantic, East South
Central, West South Central, Mountain, and Pacific. The Census divisions represent a further breakdown of
Census regions. (See figure 1.) In addition, each area was classified into one of two population-size classes—
self-representing or nonself-representing—with the use of the size cutoff described later in the article, in the
section on determining the number of sampled PSUs.
The main impetus for using the nine Census divisions instead of the Census regions and size classes from the
1998 sample design was to create and support indexes that are more locally defined. To maintain
approximately the same number of classification groups for nonself-representing areas, CPI combined the
B- and C-size classes. The proportion of medium- and small-size areas within a Census division was determined
through a process called controlled selection (see section on sample selection). A BLS study found that the use
of by-division indexes had little effect on estimates for the U.S. all-items index, regional CPI indexes, and
12-month standard errors.10
Given the change in classification for the nonself-representing areas (from Census region and size class to
Census division), the only way to reduce the number of index areas was to decrease the existing number (31) of
self-representing areas.11
Construct primary sampling units
After each decennial census, OMB releases a new set of definitions for statistical areas. The current definitions
assign counties surrounding an urban core area to geographic entities called Core-Based Statistical Areas
(CBSAs). The assignment is based on each county’s degree of economic and social integration (as measured
by commuting patterns) with the urban core. There are two types of CBSAs: metropolitan and micropolitan. A
metropolitan CBSA has an urban core of 50,000 or more people, and a micropolitan CBSA has an urban core
of at least 10,000 but fewer than 50,000 people. CBSAs may cross state borders. In addition, OMB defines Combined Statistical
Areas (CSAs), which are combinations of two or more CBSAs.
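The metropolitan/micropolitan distinction is a threshold rule on the size of a CBSA's urban core. The sketch below applies OMB's published thresholds: an urban core of at least 50,000 people makes a metropolitan CBSA, and a core of at least 10,000 but fewer than 50,000 makes a micropolitan CBSA. It then shows how the coverage shares of an area frame could be tallied. The record type, field names and example areas are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Area:
    name: str              # hypothetical label
    core_population: int   # population of the area's urban core
    total_population: int  # population of all member counties


def cbsa_type(core_population: int) -> str:
    """Classify an area by the population of its urban core."""
    if core_population >= 50_000:
        return "metropolitan"
    if core_population >= 10_000:
        return "micropolitan"
    return "outside CBSAs"


def coverage_shares(areas: list, national_population: int) -> dict:
    """Share of the national population living in each type of area."""
    totals: dict = {}
    for area in areas:
        kind = cbsa_type(area.core_population)
        totals[kind] = totals.get(kind, 0) + area.total_population
    return {kind: pop / national_population for kind, pop in totals.items()}


if __name__ == "__main__":
    frame = [
        Area("Hypothetical Metro", 900_000, 1_200_000),
        Area("Hypothetical Micro", 25_000, 40_000),
        Area("Hypothetical Town", 4_000, 4_000),
    ]
    print(coverage_shares(frame, 1_244_000))
```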
In the 1998 area sample design, the CPI program distinguished among A-, B-, and C-sized PSUs. The B-sized
PSUs were Metropolitan Statistical Areas (MSAs), defined by OMB in 1993; the C-sized PSUs were urban parts
of non-MSA areas; and the A-sized PSUs were an MSA mixture, in which some MSAs were combined to
maintain continuity with area definitions from the 1987 CPI geographic revision. Because CBSAs are the
conceptual successor of earlier metropolitan area definitions, using the metropolitan and micropolitan CBSA
definitions for nonself-representing areas was a natural choice. However, there was a question whether the CSA
definitions should be used for self-representing PSUs. In some cases, the A-sized PSUs in the 1998 sample
more closely resembled CSA definitions; in other cases, they more closely resembled the new metropolitan
CBSA definitions. The problem with CSA definitions is that they often create a very large geographic area in
which CPI has to conduct field operations and collect prices. For this reason, CPI decided to strictly adhere to
the new metropolitan CBSA definitions for self-representing PSUs.
Currently, BLS publishes the CPI-U, which covers approximately 87 percent of the U.S. population. With the
introduction of the CBSA concept to the CPI, the CPI-U coverage will increase to 94 percent of the U.S.
population reflected in the 2010 census.12 The area sample frame will comprise 381 metropolitan CBSAs,
representing approximately 85 percent of the population, and 536 micropolitan CBSAs, representing
approximately 9 percent of the population.
Determine the number of sampled PSUs
For the area sample, CPI has traditionally selected one PSU per stratum. The number of strata determines the
total number of PSUs in the sample. Specifying the number of strata depends on a variety of factors, including
that number’s expected overall impact on the accuracy of the U.S. all-items CPI and the total budget available
for data collection. With respect to accuracy, special consideration is given to the expected impact on sampling
variance and the expected impact on bias.
Currently, the CPI has 87 urban PSUs, whereas the CE has 91 PSUs (75 urban and 16 rural).13 With the 2018
area revision, the CPI program will reduce the total number of PSUs in the CPI to 75. This reduction will
eventually allow CPI to collect prices in the same set of PSUs as that used by the CE program to collect
expenditure information.14 In addition, the reduction will lower the overall cost of implementing the new sample,
because of an expected increase in the percentage of overlapping areas and a decrease in the number of new
areas. Most importantly, the change will increase the average number of price quotes per index area and,
therefore, help address the small sample bias created by basic item–area indexes. Of course, maintaining the
same number of total quotes in the CPI and reducing the number of PSUs would increase the standard error of
CPI estimates, because of the loss of information in the PSU component of variance. However, using variance
models, CPI estimated that the 6-month standard error of the U.S. all-items CPI would see a modest increase of
2 to 5 percent.15 These estimates were based on the assumption that the total number of quotes in the CPI
would be maintained and that the number of self-representing PSUs would be reduced from 31 to 23.
To determine the ideal number of self-representing PSUs in the new area sample with 75 PSUs, the CPI
program again used variance models to simulate 6-month variance estimates in the U.S. all-items CPI for
different population-size cutoffs. The simulation showed that, for any population cutoff between 2.0 and 3.0
million, the range of the modeled 6-month standard errors was extremely narrow. (The largest simulated
difference was around 1 percent, which is the difference between, say, a standard error of .0657 and .0650.)
This narrow range gave CPI some flexibility in determining the exact population cutoff. The cutoff was ultimately
set at 2.5 million, which resulted in 23 self-representing PSUs. These PSUs include 21 units whose population is
greater than 2.5 million and 2 additional units—Anchorage, AK, and Honolulu, HI. Anchorage represents all
CBSAs in Alaska, and Honolulu represents all CBSAs in Hawaii. These CBSAs are unique because the
locations of both states make price change in their markets geographically isolated from that in other markets.
For this reason, the CBSAs in Alaska and Hawaii are treated as separate geographic strata.
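The self-representing rule described above, a population cutoff plus two geographically isolated exceptions, can be sketched as follows. Only the 2.5 million cutoff and the Anchorage and Honolulu exceptions come from the text. The PSU labels and population figures are hypothetical placeholders, not census counts.

```python
CUTOFF = 2_500_000  # population cutoff adopted for the 2018 design
# Self-representing regardless of size: each stands for a geographically
# isolated state (Alaska, Hawaii) and forms its own stratum.
ISOLATED = frozenset({"Anchorage, AK", "Honolulu, HI"})


def self_representing(populations: dict, cutoff: int = CUTOFF,
                      isolated: frozenset = ISOLATED) -> list:
    """Return the PSUs that represent only themselves, sorted by name.

    A PSU qualifies if its population is greater than the cutoff or if it
    is one of the isolated PSUs.
    """
    above = {name for name, pop in populations.items() if pop > cutoff}
    return sorted(above | set(isolated))


if __name__ == "__main__":
    example = {  # hypothetical populations
        "Metro A": 6_000_000,
        "Metro B": 2_400_000,
        "Metro C": 2_500_000,  # exactly at the cutoff, so not "greater than"
    }
    print(self_representing(example))  # ['Anchorage, AK', 'Honolulu, HI', 'Metro A']
```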
With 23 self-representing PSUs and nine Census divisions, the new area design will yield 6,752 basic indexes
(32 index areas by 211 item strata) for the U.S. all-items CPI. This reduction (approximately 16 percent) in the
number of basic indexes will help address the small sample bias in index estimates.
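The cell arithmetic behind these figures is easy to check. The sketch below assumes, as in the variance modeling described earlier, that the total number of quotes stays fixed. Under that assumption, the average number of quotes per basic cell scales inversely with the number of cells.

```python
ITEM_STRATA = 211


def basic_cells(index_areas: int, item_strata: int = ITEM_STRATA) -> int:
    """Number of item-area basic cells."""
    return index_areas * item_strata


old_cells = basic_cells(38)          # current design: 38 index areas
new_cells = basic_cells(23 + 9)      # 23 self-representing PSUs + 9 divisions
reduction = 1 - new_cells / old_cells
quotes_per_cell_factor = old_cells / new_cells  # assumes total quotes held fixed

print(old_cells, new_cells)          # 8018 6752
print(f"{reduction:.1%}")            # 15.8%
print(quotes_per_cell_factor)        # 1.1875
```

Cells carry about 19 percent more quotes on average. If bias is inversely proportional to cell sample size, per-cell bias would fall to about 84 percent (6,752/8,018) of its former level under these assumptions.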
Determine stratification variables
The goal of area stratification is to reduce the overall sampling variance in the CPI. This is achieved by grouping
nonself-representing PSUs whose characteristics are similar and highly correlated with price change and
consumption behavior. In the 1998 sample design, four independent variables were used for stratifying the
nonself-representing PSUs: normalized (centered and scaled by the range) longitude, the square of normalized
longitude, normalized latitude, and percent urban. Instead of simply repeating the stratification from the previous
area revision, CPI reanalyzed the stratification process from scratch. To determine the best possible area
stratification, the effort involved not only the reassessment of geographic variables, such as latitude and
longitude, but also an analysis of potential demographic variables. The decision to investigate demographic
variables (besides percent urban) was highly influenced by the introduction of the American Community Survey
(ACS), which replaced the decennial census long-form survey.16 The ACS introduced rolling 3-year estimates
that would cover every community with a population greater than 20,000 and 5-year estimates that would cover
every community in the nation.17 Previously, the decennial census provided only a snapshot of the demographic
variables.
The ACS presented 52 main topics of social characteristics in its 3-year estimates for 2005–07. It would have
been difficult to investigate all of these variables when modeling CPI percent-change estimates. Therefore, the
first task was to limit the set of ACS variables to a manageable list, and the second task was to aggregate these
statistics to the CBSA level. This approach produced 23 variables that were used in the first stage of modeling.
The first task involved eliminating variables that were thought to have little explanatory power in the CPI. The
guiding philosophy was to take ACS variables—such as race, educational attainment, and property statistics—
that could affect price change. Housing variables were selected because indexes for shelter make up a large
proportion of the CPI. The second task involved aggregating the county-level ACS data to the CBSA level. If
median statistics for all member counties were available, they were averaged. Examples of such averages
include the average median property value and the average median income for a CBSA. Other CBSA statistics,
expressed as an average or a percentage, were calculated with the use of a weighted average based on the
requisite county’s population. These calculations yielded statistics such as the median household property value
and the percentage of the CBSA population that is Native American.
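The two aggregation rules can be sketched as follows. The text says only that county medians "were averaged," so the sketch assumes a simple unweighted mean for them. Percentages use a population-weighted mean, as the text describes. The County record and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class County:
    population: int
    median_income: float         # county median household income
    pct_native_american: float   # percent of county residents


def aggregate_to_cbsa(counties: list) -> dict:
    """Combine county-level ACS statistics into CBSA-level statistics."""
    # Medians cannot be pooled exactly without microdata, so the county
    # medians are simply averaged (assumed unweighted here).
    avg_median_income = sum(c.median_income for c in counties) / len(counties)
    # Percentages are weighted by each county's population.
    total_pop = sum(c.population for c in counties)
    pct_native = sum(c.pct_native_american * c.population for c in counties) / total_pop
    return {"average_median_income": avg_median_income,
            "pct_native_american": pct_native}


if __name__ == "__main__":
    cbsa = [County(100_000, 50_000.0, 2.0), County(300_000, 70_000.0, 1.0)]
    print(aggregate_to_cbsa(cbsa))
    # {'average_median_income': 60000.0, 'pct_native_american': 1.25}
```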
The final list of housing and demographic variables considered for potential inclusion in the stratification model
was as follows:
• Housing—average median household property value, average number of vehicles per household, percent
of family households, percent of occupied housing units, percent of owner housing units, and housing
units per square mile
• Population density—population per square mile and percent urban
• Age—percent of people in their twenties and percent of people ages 35 to 44
• Race and gender—percent male, percent African American, percent Native American, and percent Asian
• Income—percent in poverty, average median household income, and average total median earnings
• Education18—percent with less than 9th-grade education, percent with 9th- to 12th-grade education,
percent with a high school diploma, percent with some college, percent with an associate’s degree,
percent with a college degree, and percent with a graduate degree
To model these stratification variables, CPI developed a series of nonoverlapping all-items price relatives19 for
each PSU in the current area sample, using the same timeframe as that for the ACS demographics. The ACS
variables investigated spanned the period 2005–07 and were released in December 2008. Unofficial price
relatives for B- and C-sized PSUs were produced to serve as responses in the modeling procedure and used in
conjunction with the existing price relatives for A-sized PSUs. The 12-month (December to December) price
relatives used were for 2004–05, 2005–06, 2006–07, and 2007–08. This approach provided four responses for
each PSU, along with a set of covariates, to be used in a repeated-measures modeling procedure.
A backward-elimination process was used to limit the set of variables to those with small p-values (p < .05).
Anchorage and Honolulu could be included because the initial model did not contain longitude and latitude.
Being obvious geographic outliers, these two areas were later removed from models that included latitude and
longitude. The regression variables, along with their resulting p-values (Pr > F), were treated as “effects.” It is
customary to first remove the effect with the highest p-value, rerun the model (which slightly changes the remaining p-values), and then repeat the steps for the effect with the next-highest p-value. This backward-elimination
process of removing one variable at a time continues until all remaining effects have small p-values. CPI used
this process to arrive at the final model of all significant variables. (See table 1.)
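The elimination loop just described can be written generically. In the sketch below, the model-fitting step is a caller-supplied function that returns a p-value for each effect; the repeated-measures fit used by BLS is not reproduced. The stub fit and all of its p-values are hypothetical. They exist only to show how refitting can shift the remaining p-values.

```python
from typing import Callable, Dict, List, Tuple

# A fitting function takes the current list of effects and returns their p-values.
FitFunction = Callable[[List[str]], Dict[str, float]]


def backward_eliminate(effects: List[str], fit: FitFunction,
                       alpha: float = 0.05) -> Tuple[List[str], Dict[str, float]]:
    """Drop the least significant effect, refit, and repeat until all p < alpha."""
    current = list(effects)
    while current:
        p_values = fit(current)
        worst = max(current, key=lambda e: p_values[e])
        if p_values[worst] < alpha:
            return current, p_values
        current.remove(worst)  # the next fit will change the remaining p-values
    return current, {}


def stub_fit(effects: List[str]) -> Dict[str, float]:
    """Hypothetical stand-in for a real model fit."""
    p = {"period": 0.0001, "region": 0.0001, "households": 0.0007,
         "property_value": 0.0011, "pct_male": 0.40, "pct_asian": 0.20}
    result = {e: p[e] for e in effects}
    # Mimic refitting: pct_asian's p-value shifts once pct_male is removed.
    if "pct_asian" in effects and "pct_male" not in effects:
        result["pct_asian"] = 0.08
    return result


if __name__ == "__main__":
    start = ["period", "region", "households", "property_value", "pct_male", "pct_asian"]
    kept, _ = backward_eliminate(start, stub_fit)
    print(kept)  # ['period', 'region', 'households', 'property_value']
```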
Table 1. Initial demographic model based on 87 current CPI PSUs
Effect                        Numerator DF   Denominator DF   F-value   p-value
Period                              1              81          173.42    <.0001
Census region(1)                    3              81           10.27    <.0001
Number of households                1              81           12.51     .0007
Household property value            1              81           11.51     .0011
Notes:
(1) Census region was used in lieu of Census division for two reasons. First, the current sample design was not intended to support division estimates. Second, the determination of the stratification model was completed before it was decided to implement the Census divisions.
Source: U.S. Bureau of Labor Statistics.
To determine the predictive power of a particular model, CPI used linear regression to examine each 1-year all-items price relative in the 2005–08 timespan. The first linear model used the 1-year price relative for December
2004 to December 2005 as the dependent variable, the second used the 1-year price relative for January 2005
to January 2006, the third for February 2005 to February 2006, and so forth, until every 1-year price relative was
used. These linear models produced a distribution of R-squared statistics. The mean of this distribution was
calculated to describe the predictive power of each model. This process was repeated for lower-level 1-year
price indexes for energy, local services, food and beverages, and local shelter. The backward-elimination
process, described earlier, was implemented for each of these subindexes. The R-squared statistic was again
calculated to determine the predictive quality of the final set of variables for each subindex. However, none of
these variables proved to be highly predictive, as they all had R-squared statistics smaller than 0.3. Models also
were evaluated for each of the four Census regions, again with little success. In addition to the variables from
the ACS, the following stratification variables from the previous area revision were considered for each PSU in
the current sample: longitude, latitude, longitude squared, latitude squared, and percent urban. However, these
variables also had very little predictive power and were not always significant in the all-items model.
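The rolling evaluation described above fits one cross-sectional regression per 12-month window and summarizes the results by their mean R-squared. It can be sketched as follows. For brevity the sketch uses a single covariate, in which case R-squared equals the squared correlation; the article's models used several covariates. The data layout and names are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple


def r_squared(x: List[float], y: List[float]) -> float:
    """R-squared of a simple linear regression of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def mean_rolling_r_squared(levels: Dict[str, List[float]],
                           covariate: Dict[str, float],
                           horizon: int = 12) -> Tuple[float, List[float]]:
    """Regress every 12-month price relative (Dec-Dec, Jan-Jan, ...) across PSUs.

    `levels` maps each PSU to its monthly index levels, oldest first.
    Returns the mean R-squared and the R-squared of each window.
    """
    psus = sorted(levels)
    n_months = min(len(levels[p]) for p in psus)
    x = [covariate[p] for p in psus]
    scores = []
    for start in range(n_months - horizon):
        y = [levels[p][start + horizon] / levels[p][start] for p in psus]
        scores.append(r_squared(x, y))
    return mean(scores), scores
```

With monthly levels from December 2004 through December 2008 (49 months), this yields 37 overlapping 12-month windows, from December 2004–December 2005 through December 2007–December 2008.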
Once several final models were derived, the PSUs included in these models were stratified with an “equal
population” constraint, under which each stratum would have a population within 10 percent of the mean of all
strata in the index area. Because this constraint conflicted with the variable for number of households, which is
highly correlated with population, that variable was excluded from the models.
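The equal-population constraint is a simple check on a candidate stratification. The sketch below applies it with a 10 percent tolerance around the mean, as stated above, to hypothetical stratum populations.

```python
def meets_equal_population(stratum_pops: list, tolerance: float = 0.10) -> bool:
    """True if every stratum's population is within `tolerance` of the mean."""
    avg = sum(stratum_pops) / len(stratum_pops)
    return all(abs(pop - avg) <= tolerance * avg for pop in stratum_pops)


print(meets_equal_population([1_000_000, 1_050_000, 950_000]))  # True
print(meets_equal_population([1_000_000, 1_300_000, 700_000]))  # False
```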
In addition, since the area sample design is also intended to support the CE, ACS variables were investigated
for correlation with expenditure estimates. Average median household income and average median property
value were, by far, the best demographic predictors of consumer expenditures (the two-variable model
investigated for the 2005–08 timeframe had an R-squared statistic of 0.65).
Finally, three separate stratification models were investigated: a seven-variable model, a six-variable model, and
a four-variable model. The seven-variable model included percent urban, income, property value, longitude,
latitude, longitude squared, and latitude squared. The six-variable model contained the same variables, except
for percent urban, and the four-variable model excluded longitude squared, latitude squared, and percent urban.
Ultimately, the four-variable model (longitude, latitude, median property value, and median household income)
was selected, because longitude squared and latitude squared added little predictive value. None of the four
variables in the model turned out to be an influential predictor of CPI price change over time and across regions.
Table 2 shows the model’s predictive accuracy (R-squared) over time for each Census region and subcategory
investigated.
Table 2. R-squared for Census regions and various subcategories, final model
                                        Census region
Index                  Total   Northeast   Midwest   South   West
All items               .158      .311       .224     .160    .330
Energy                  .216      .529       .162     .261    .483
Food and beverages      .164      .338       .174     .153    .392
Housing                 .276      .469       .206     .236    .407
Local services          .076      .201       .098     .207    .173
Source: U.S. Bureau of Labor Statistics.
The final stratification model of longitude, latitude, median property value, and median household income is a
compromise between having significant predictors for the CE and retaining geographic variables used in the
previous area design. The addition of the income and property-value variables will greatly enhance the area
stratification for the CE and substantially reduce the between-PSU variance. It will not, however, significantly
increase the explanatory power of the model for price change. Even so, more reliable consumer expenditure
estimates will benefit the CPI, which draws its expenditure weights from the CE. Because the new stratification model is very
different from its predecessor, a sample-overlap procedure will be used to retain as many nonself-representing
PSUs from the 1998 area sample as possible.
Allocate sample and assign PSUs to strata
To allocate sample to the nonself-representing PSUs, CPI excluded the population for the self-representing
PSUs for each Census division. Table 3 presents the proportional-to-population-size sample allocation, by
Census division, for the 2018 geographic area design. There are 23 self-representing PSUs, which account for
approximately 39 percent of the total U.S. population and about 42 percent of the CPI-U population. There are
52 nonself-representing PSUs, which represent the remaining 58 percent of the CPI-U population and include
both metropolitan and micropolitan areas.
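Allocating nonself-representing PSUs in proportion to population requires rounding fractional quotas to whole PSUs. The article does not state the rounding rule, so the sketch below assumes the largest-remainder method purely for illustration. It first removes the self-representing population from each division, as the text describes. The division names and populations are hypothetical.

```python
import math


def nonself_population(division_pop: dict, self_rep_pop: dict) -> dict:
    """Remove the self-representing population from each division."""
    return {d: division_pop[d] - self_rep_pop.get(d, 0) for d in division_pop}


def allocate(total_units: int, populations: dict) -> dict:
    """Largest-remainder allocation of units proportional to population."""
    total_pop = sum(populations.values())
    quotas = {d: total_units * pop / total_pop for d, pop in populations.items()}
    alloc = {d: math.floor(q) for d, q in quotas.items()}
    leftover = total_units - sum(alloc.values())
    # Give the remaining units to the largest fractional remainders.
    order = sorted(quotas, key=lambda d: quotas[d] - alloc[d], reverse=True)
    for d in order[:leftover]:
        alloc[d] += 1
    return alloc


if __name__ == "__main__":
    divisions = {"Division X": 9_000_000, "Division Y": 5_000_000, "Division Z": 2_000_000}
    self_rep = {"Division X": 4_000_000, "Division Y": 2_000_000}
    print(allocate(7, nonself_population(divisions, self_rep)))
    # {'Division X': 4, 'Division Y': 2, 'Division Z': 1}
```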
Table 3. Distribution of selected sample units, by Census division, 2018 revision
Census division           Nonself-representing PSUs   Self-representing PSUs   Total PSUs
Total                                52                          23                 75
1—New England                         2                           1                  3
2—Middle Atlantic                     4                           2                  6
3—East North Central                  8                           2                 10
4—West North Central                  4                           2                  6
5—South Atlantic                     12                           5                 17
6—East South Central                  6                           0                  6
7—West South Central                  8                           2                 10
8—Mountain                            4                           2                  6
9—Pacific                             4                           7                 11
Source: U.S. Bureau of Labor Statistics.
The next phase of the selection process was to assign the nonself-representing PSUs within each Census
division to strata based on a model of the four stratification variables (latitude, longitude, median household
income, and median property value).20 The primary objective of the PSU stratification was to minimize the
between-PSU component of variance by making the PSUs within each stratum as homogeneous as possible
with respect to the four stratification variables. In addition, to further minimize the variance, strata within each
Census division had to have approximately the same population. In the 1998 design, this type of
constrained clustering problem was solved with a Friedman-Rubin hill-climbing algorithm.21 For the 2018 design,
CPI developed a new heuristic stratification algorithm based on k-means clustering and zero–one integer linear
programming.22
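The clustering half of such a heuristic can be sketched with plain k-means (Lloyd's algorithm) on standardized stratification variables. This is an illustration only. It omits the zero–one integer programming step that BLS paired with k-means to keep stratum populations roughly equal. The farthest-point initialization, the iteration limit and the example data are assumptions.

```python
from statistics import mean, pstdev
from typing import List

Point = List[float]


def standardize(rows: List[Point]) -> List[Point]:
    """Center each variable and scale it by its standard deviation."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]


def sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int, max_iter: int = 100) -> List[int]:
    """Assign each point to one of k clusters; assumes k <= len(points)."""
    # Deterministic farthest-point initialization.
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties
                centers[j] = [mean(col) for col in zip(*members)]
    return labels


if __name__ == "__main__":
    # Hypothetical PSUs: latitude, longitude, median income, median property value.
    psus = [
        [42.0, -71.0, 70_000, 350_000],
        [41.5, -72.0, 68_000, 330_000],
        [33.0, -97.0, 55_000, 180_000],
        [32.5, -96.5, 54_000, 170_000],
    ]
    print(kmeans(standardize(psus), k=2))  # [0, 0, 1, 1]
```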
Select a sample of PSUs
The final step of the selection process was to select one PSU per stratum. However, before making that final
selection, CPI had to employ two special selection procedures: a sample-overlap procedure and a controlled-selection procedure.
Sample-overlap methodology
In the 1998 design, a sample-overlap procedure was used to select the nonself-representing areas for the CPI
and CE surveys.23 Sample-overlap procedures increase the expected number of nonself-representing
geographic areas that would be reselected in the new design. Because the use of an overlap procedure results
in fewer new areas that need to be rotated in the sample and in fewer existing areas that need to be rotated out
of the sample, it lowers the expected costs of operational changes (e.g., hiring and training of new field staff)
associated with the new area design.
Two different sample-overlap procedures were considered: one proposed by Walter Perkins and one by
Lawrence Ernst.24 The Perkins procedure is a heuristic method that was used in previous redesigns. The Ernst
procedure uses linear programming to maximize the expected overlap, so using it would result in a higher expected number of overlapping PSUs and, consequently, a lower overall cost
of switching to the new area design for both the CPI and the CE. Only nonself-representing metropolitan PSUs
were deemed eligible for the procedure. Micropolitan areas were deemed ineligible because they did not have
enough renters for the CPI Housing Survey. Each micropolitan area must have enough renters for two samples
(of six panels each) during the decade between area redesigns. In the past, the CPI program sometimes had to
extend the area definition for the CPI Housing Survey, to include outlying rural counties and, thus, ensure that
the survey had enough renters. Table 4 allows a comparison between the expected number of overlapping
PSUs calculated with the two procedures and the expected number of nonself-representing areas selected
independently.
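One way to compute a benchmark like the "independent selection" column of table 4 is the following. If the new sample is drawn without regard to the old one, the expected number of overlapping PSUs equals the sum of the new selection probabilities of the PSUs already in the 1998 sample. The sketch assumes one PSU is drawn per stratum with probability proportional to population. The stratum contents and populations are hypothetical, and the Ernst linear-programming adjustment, which raises this expectation, is not shown.

```python
def pps_probabilities(stratum: dict) -> dict:
    """Selection probabilities when one PSU is drawn proportional to population."""
    total = sum(stratum.values())
    return {psu: pop / total for psu, pop in stratum.items()}


def expected_overlap_independent(strata: list, old_sample: set) -> float:
    """Expected number of old-sample PSUs reselected under independent draws."""
    expected = 0.0
    for stratum in strata:
        for psu, prob in pps_probabilities(stratum).items():
            if psu in old_sample:
                expected += prob
    return expected


if __name__ == "__main__":
    strata = [{"PSU a": 600_000, "PSU b": 400_000},
              {"PSU c": 250_000, "PSU d": 750_000}]
    print(expected_overlap_independent(strata, {"PSU a", "PSU d"}))  # about 1.35
```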
Table 4. Expected sample overlap of PSUs, by Census division(1)
                                                                  Sample-overlap procedure
Census division          PSUs in design   Independent selection     Perkins     Ernst
Total                          58                 13.2                 19.3       28.6
1—New England                   2                   .5                   .7        1.0
2—Middle Atlantic               4                   .7                  1.2        1.7
3—East North Central            8                  2.9                  3.6        4.8
4—West North Central            4                   .8                  1.4        2.1
5—South Atlantic               14                  3.2                  4.5        7.0
6—East South Central            6                   .8                  1.1        2.2
7—West South Central            8                  2.1                  2.6        4.4
8—Mountain                      6                  1.4                  2.6        3.2
9—Pacific                       6                   .9                  1.6        2.2
Notes:
(1) This analysis was done for a total of 87 PSUs (58 nonself-representing) before it was decided to move to the final design of 75 PSUs (52 nonself-representing).
Source: U.S. Bureau of Labor Statistics.
Ultimately, the Ernst procedure was selected and used, because it increased the overlap between the PSUs in
the new area sample and those in the 1998 sample. The outcome of the procedure gave a new set of selection
probabilities, which were used to select the sample.
Controlled selection
Controlled selection is a process of selecting a random sample of PSUs such that the probability of selecting
certain preferred combinations of PSUs increases and the probability of selecting nonpreferred combinations of
PSUs decreases. This is accomplished by controlling the interaction among the PSU selections in different
strata. Given that only one sample is ultimately selected, there may be important reasons for preferring some
possible sample outcomes over others. It is usually judged that balancing the sample with respect to one or
more additional variables (besides those from the strata) would increase the degree of confidence in the
inferences made about the population. If information on such additional variables is available, then the
population may be cross-classified and an implicit stratification achieved with reference to each of these
additional variables. However, multiple cross-classifications can be overly restrictive, making it impossible to
select a sample that meets all of the constraints.
Controlled selection can be used to control for a variety of variables. In the 1998 design, the number of PSUs
per state and the number of PSU overlaps were controlled. For example, if Florida, given its population share in
the South, was expecting 2.3 PSUs from that region, then controlled selection gave a 30-percent chance of the
state getting 3 PSUs and a 70-percent chance of it getting 2 PSUs. In this case, controlled selection eliminated
the possibility of selecting samples with fewer than two PSUs or more than three PSUs.
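For a single control variable, the Florida example amounts to randomized rounding that preserves the expected count: the count can only be the two integers bracketing the expectation. The sketch below shows only that one-variable case. Actual controlled selection solves for a joint distribution over all strata and controls at once, typically with linear programming, and that step is not reproduced here. The function names are hypothetical.

```python
import math
import random


def count_distribution(expected: float) -> dict:
    """Probabilities of the two integer counts that bracket the expected count."""
    low = math.floor(expected)
    frac = expected - low
    if frac == 0:
        return {low: 1.0}
    return {low: 1.0 - frac, low + 1: frac}


def draw_count(expected: float, rng: random.Random) -> int:
    """Draw one count from count_distribution(expected)."""
    low = math.floor(expected)
    return low + (1 if rng.random() < expected - low else 0)


if __name__ == "__main__":
    print(count_distribution(2.3))  # ~0.7 for 2 PSUs, ~0.3 for 3 PSUs
    rng = random.Random(2018)
    draws = [draw_count(2.3, rng) for _ in range(10_000)]
    print(sorted(set(draws)), sum(draws) / len(draws))
```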
For the 2018 redesign, the greatest concern was controlling the number of micropolitan areas selected in
the sample; controlling by state was deemed a secondary concern. The change in classification for the nonself-representing index areas (from Census region and size class to Census division) meant that controlling the
number of PSUs per state would be of less value.
Controlled selection is a computationally intensive process. The solution time for a controlled-selection problem
increases exponentially with the size of the problem (e.g., the number of strata). Using the software package
SOCSLP, CPI was unable to solve a two-variable controlled-selection problem for the South region.25 Therefore,
only micropolitan status was used as a control variable, and this was done at the Census-region level.
Sample outcome
After adjusting the sample selection probabilities with the use of the Ernst sample-overlap procedure and
employing controlled selection for the micropolitan areas, CPI randomly selected one PSU per stratum. The
resulting (final) area sample for the 2018 revision is shown in the appendix. Thirty-three of the 87 PSUs in the
1998 design will be dropped from the CPI. Two of these exclusions are due to treating the New York, NY, CBSA
as one PSU; previously, that CBSA was treated as three PSUs. Meanwhile, only 21 of the 75 PSUs in the 2018
design will be considered new areas. Of the 21 new areas, 14 are metropolitan CBSAs and 7 are micropolitan
CBSAs.
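The counts in this paragraph are mutually consistent: the PSUs carried over from the 1998 design are exactly the 2018 PSUs that are not new.

```latex
\underbrace{87}_{\text{1998 PSUs}} - \underbrace{33}_{\text{dropped}}
  \;=\; 54 \;=\;
\underbrace{75}_{\text{2018 PSUs}} - \underbrace{21}_{\text{new}},
\qquad
21 = \underbrace{14}_{\text{metropolitan}} + \underbrace{7}_{\text{micropolitan}}
```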
In January 2018, CPI will begin publishing indexes for the nine Census divisions, in addition to releasing the
current four regional estimates. However, given the new area design, CPI will no longer publish “region by city
size” index estimates. Because of the reduction in the number of self-representing areas, the program will be
unable to support separate index estimates for Cincinnati, OH; Cleveland, OH; Milwaukee, WI; Pittsburgh, PA;
and Portland, OR.26 One other formerly published area, Kansas City, MO, was not reselected as part of the
2018 area revision. The remaining 23 self-representing areas listed in the appendix (denoted by an “S” in the
PSU code) will continue to have area indexes published under the new area design.
New area design implementation plan
After selecting the final area design, BLS determined the process for implementing the new geographic sample
into the four surveys used to construct the CPI. The four surveys are the CE, the Telephone Point-of-Purchase
Survey (TPOPS), the Commodity and Services (C&S) survey, and the Housing Survey. In all previous CPI
geographic revisions, the conversion process occurred all at once: that is, the administration of each survey
switched from the old area sample to the new area sample in its entirety, albeit at different points in time. For
example, for the 1998 revision, the CE was switched to the new sample design in 1996; TPOPS was used to
identify outlet frames in new PSUs during the 1995–96 period; and the initial round of data collection for the
Housing and C&S surveys was completed by the fall of 1997, so that the CPI could be computed by January
1998, on the basis of the new area design.
For the 2018 area revision, the CE fully converted to the new sample in 2015. However, for the other three
surveys (which are directly managed by the CPI program), the 21 new PSUs have been divided into groups that
will be introduced over a 4-year span. This rotation process will distribute the cost of introducing new PSUs into
the Housing and C&S surveys, avoiding a spike in data collection costs before the full CPI conversion to the new
area design.
The calculation of price indexes under the new area design will begin in January 2018, with the introduction of
the first set of new PSUs into the sample. All late-dropping PSUs (i.e., existing PSUs scheduled to be rotated
out of the sample late in the implementation process) will be used as proxy candidates for late-rotating new
PSUs, until the complete set of new PSUs has been rotated into the sample. The ideal proxy for a given new
PSU was considered to be one of the dropping PSUs within the new PSU's geographic stratum. If no such dropping
PSU was available, a proxy was identified through nearest-neighbor rules, with the constraint that the proxy
fall within 200 miles of the new PSU. If no eligible proxy existed, the new PSU was considered to be a
"geographic hole" within the new area structure. Eight new PSUs had no eligible proxy, so they were given
priority in the rotation schedule.
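The proxy rules form a three-step decision: use a dropping PSU in the same stratum first, then the nearest dropping PSU within 200 miles, and otherwise treat the new PSU as a geographic hole. The minimal Python sketch below makes several assumptions that the text does not specify. It treats each PSU location as a single latitude/longitude point. It measures distance as great-circle (haversine) distance. It breaks ties within a stratum by choosing the nearest candidate. The PSU records are hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.8   # mean Earth radius (assumption for the sketch)
MAX_PROXY_MILES = 200.0       # distance limit stated in the text


def miles_between(a, b):
    """Haversine great-circle distance in miles between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def choose_proxy(new_psu, dropping_psus):
    """Return the name of a proxy for new_psu, or None for a geographic hole."""
    def distance(p):
        return miles_between(new_psu["location"], p["location"])

    # Rule 1: a dropping PSU in the new PSU's stratum (nearest one if several).
    same_stratum = [p for p in dropping_psus if p["stratum"] == new_psu["stratum"]]
    if same_stratum:
        return min(same_stratum, key=distance)["name"]
    # Rule 2: otherwise the nearest dropping PSU within 200 miles.
    nearby = [p for p in dropping_psus if distance(p) <= MAX_PROXY_MILES]
    if nearby:
        return min(nearby, key=distance)["name"]
    # Rule 3: no eligible proxy, so the new PSU is a geographic hole.
    return None


# Hypothetical dropping PSUs on the equator, where 1 degree of longitude is about 69 miles.
dropping = [
    {"name": "Old-A", "stratum": 7, "location": (0.0, 4.0)},   # about 276 miles from (0, 0)
    {"name": "Old-B", "stratum": 9, "location": (0.0, 1.0)},   # about 69 miles from (0, 0)
]
print(choose_proxy({"stratum": 7, "location": (0.0, 0.0)}, dropping))   # Old-A
print(choose_proxy({"stratum": 8, "location": (0.0, 0.0)}, dropping))   # Old-B
print(choose_proxy({"stratum": 8, "location": (0.0, 10.0)}, dropping))  # None
```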
In devising the rotation schedule, CPI determined the following field operational constraints: (1) no more than six
new PSUs could be rotated in a calendar year, and (2) no more than two new PSUs could be rotated in any of
the six BLS regional offices in a calendar year. Because there are 21 new PSUs, the new PSUs will be rotated
across four groups, or waves, over a 4-year period. Six new PSUs will be introduced in each of the first three
waves, and three new PSUs will be introduced in the fourth, and final, wave. Figure 2 shows the timeline for
introducing wave-1 PSUs into the various components of the CPI. The milestones for each successive wave
begin exactly 1 year after the corresponding milestones for the previous wave have been completed; according
to this schedule, all waves will be completed by the end of 2023. Because the publication of indexes based on
the new area design will begin in January 2018, the new PSUs for waves 2–4 will be “proxied” by a dropping
PSU. All dropping PSUs that were not designated as proxies (18 PSUs) will be dropped with wave 1. Three new
PSUs are considered geographic holes and are part of wave 2. These PSUs will be entirely imputed for the first
year of the new area design. The appendix indicates the respective wave during which each new PSU will enter
the index.
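The two field constraints turn the rotation schedule into a small packing problem. With at most six new PSUs per year, 21 PSUs need at least 4 waves (21 divided by 6, rounded up). The Python sketch below assumes a greedy first-fit rule that places geographic holes first. It is a hypothetical illustration of the constraints, not the procedure CPI used, and the PSU names and regional-office labels are invented.

```python
from collections import Counter


def schedule_waves(new_psus, max_per_wave=6, max_per_office=2):
    """Greedy first-fit assignment of new PSUs to rotation waves.

    new_psus: list of (name, regional_office, is_geographic_hole) tuples.
    PSUs with no eligible proxy are placed first so they land in the earliest
    waves. A wave accepts a PSU only if it has fewer than max_per_wave PSUs
    and fewer than max_per_office PSUs from that PSU's regional office.
    """
    ordered = sorted(new_psus, key=lambda p: not p[2])  # stable sort: holes first
    waves = []  # each wave is a list of (name, office) pairs
    for name, office, _ in ordered:
        for wave in waves:
            offices = Counter(o for _, o in wave)
            if len(wave) < max_per_wave and offices[office] < max_per_office:
                wave.append((name, office))
                break
        else:
            # No existing wave can take this PSU, so open a new wave.
            waves.append([(name, office)])
    return [[name for name, _ in wave] for wave in waves]


# Hypothetical input: 21 new PSUs spread over six regional offices, 8 of them holes.
psus = [(f"PSU{i:02d}", "ABCDEF"[i % 6], i < 8) for i in range(21)]
waves = schedule_waves(psus)
print([len(w) for w in waves])  # [6, 6, 6, 3]
```

With these inputs, the office limit never binds and the waves come out as 6, 6, 6, and 3. If many new PSUs fall under one regional office, the two-per-office limit forces additional waves.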
Appendix
Final CPI geographic sample, 2018 revision
| PSU code (1) | PSU name | PSU definition (state and county) | Stratum population | Percent of index population |
|---|---|---|---|---|
| Region 1: Northeast, Division 1: New England | | | | |
| S11A | Boston–Cambridge–Newton, MA–NH | MA: Essex, Middlesex, Norfolk, Plymouth, Suffolk; NH: Rockingham, Strafford | 4,552,402 | 1.57 |
| N11B | Hartford–West Hartford–East Hartford, CT | CT: Hartford, Middlesex, Tolland | 5,005,793 | 1.73 |
| N11C | Springfield, MA | MA: Hampden, Hampshire | 4,233,926 | 1.46 |
| Region 1: Northeast, Division 2: Middle Atlantic | | | | |
| S12A | New York–Newark–Jersey City, NY–NJ–PA | NJ: Bergen, Essex, Hudson, Hunterdon, Middlesex, Monmouth, Morris, Ocean, Passaic, Somerset, Sussex, Union; NY: Bronx, Dutchess, Kings, Nassau, New York, Orange, Putnam, Queens, Richmond, Rockland, Suffolk, Westchester; PA: Pike | 19,567,410 | 6.76 |
| S12B | Philadelphia–Camden–Wilmington, PA–NJ–DE–MD | DE: New Castle; MD: Cecil; NJ: Burlington, Camden, Gloucester, Salem; PA: Bucks, Chester, Delaware, Montgomery, Philadelphia | 5,965,343 | 2.06 |
| N12C | Pittsburgh, PA | PA: Allegheny, Armstrong, Beaver, Butler, Fayette, Washington, Westmoreland | 4,065,877 | 1.40 |
| N12D | Buffalo–Cheektowaga–Niagara Falls, NY | NY: Erie, Niagara | 3,483,174 | 1.20 |
| N12E | Rochester, NY (W1) | NY: Livingston, Monroe, Ontario, Orleans, Wayne, Yates | 3,925,318 | 1.36 |
| N12F | Reading, PA | PA: Berks | 3,562,332 | 1.23 |
| Region 2: Midwest, Division 3: East North Central | | | | |
| S23A | Chicago–Naperville–Elgin, IL–IN–WI | IL: Cook, De Kalb, Du Page, Grundy, Kane, Kendall, Lake, McHenry, Will; IN: Jasper, Lake, Newton, Porter; WI: Kenosha | 9,461,105 | 3.27 |
| S23B | Detroit–Warren–Dearborn, MI | MI: Lapeer, Livingston, Macomb, Oakland, St. Clair, Wayne | 4,296,250 | 1.48 |
| N23C | Cincinnati, OH–KY–IN | IN: Dearborn, Ohio, Union; KY: Boone, Bracken, Campbell, Gallatin, Grant, Kenton, Pendleton; OH: Brown, Butler, Clermont, Hamilton, Warren | 3,395,853 | 1.17 |
| N23D | Cleveland–Elyria, OH | OH: Cuyahoga, Geauga, Lake, Lorain, Medina | 3,257,953 | 1.12 |
| N23E | Columbus, OH | OH: Delaware, Fairfield, Franklin, Hocking, Licking, Madison, Morrow, Perry, Pickaway, Union | 3,758,510 | 1.30 |
| N23F | Milwaukee–Waukesha–West Allis, WI | WI: Milwaukee, Ozaukee, Washington, Waukesha | 3,256,494 | 1.12 |
| N23G | Dayton, OH | OH: Greene, Miami, Montgomery | 3,924,320 | 1.36 |
| N23H | Flint, MI (W1) | MI: Genesee | 3,911,189 | 1.35 |
| N23I | Janesville–Beloit, WI (W2) | WI: Rock | 3,745,126 | 1.29 |
| N23J | Frankfort, IN (W3) | IN: Clinton | 3,427,365 | 1.18 |
| Region 2: Midwest, Division 4: West North Central | | | | |
| S24A | Minneapolis–St. Paul–Bloomington, MN–WI | MN: Anoka, Carver, Chisago, Dakota, Hennepin, Isanti, Le Sueur, Mille Lacs, Ramsey, Scott, Sherburne, Sibley, Washington, Wright; WI: Pierce, St. Croix | 3,348,859 | 1.16 |
| S24B | St. Louis, MO–IL | IL: Bond, Calhoun, Clinton, Jersey, Macoupin, Madison, Monroe, St. Clair; MO: Franklin, Jefferson, Lincoln, St. Charles, St. Louis, St. Louis City, Warren | 2,787,701 | .96 |
| N24C | Omaha–Council Bluffs, NE–IA (W2) | IA: Harrison, Mills, Pottawattamie; NE: Cass, Douglas, Sarpy, Saunders, Washington | 2,974,017 | 1.03 |
| N24D | Wichita, KS (W2) | KS: Butler, Harvey, Kingman, Sedgwick, Sumner | 2,842,770 | .98 |
| N24E | Lincoln, NE | NE: Lancaster, Seward | 3,288,318 | 1.14 |
| N24F | Wahpeton, ND–MN (W3) | MN: Wilkin; ND: Richland | 2,947,903 | 1.02 |
| Region 3: South, Division 5: South Atlantic | | | | |
| S35A | Washington–Arlington–Alexandria, DC–VA–MD–WV | DC: District of Columbia; MD: Calvert, Charles, Frederick, Montgomery, Prince George's; VA: Alexandria City, Arlington, Clarke, Culpeper, Fairfax, Fairfax City, Falls Church City, Fauquier, Fredericksburg City, Loudoun, Manassas City, Manassas Park City, Prince William, Rappahannock, Spotsylvania, Stafford, Warren; WV: Jefferson | 5,636,232 | 1.95 |
| S35B | Miami–Fort Lauderdale–West Palm Beach, FL | FL: Broward, Miami–Dade, Palm Beach | 5,564,635 | 1.92 |
| S35C | Atlanta–Sandy Springs–Roswell, GA | GA: Barrow, Bartow, Butts, Carroll, Cherokee, Clayton, Cobb, Coweta, Dawson, DeKalb, Douglas, Fayette, Forsyth, Fulton, Gwinnett, Haralson, Heard, Henry, Jasper, Lamar, Meriwether, Morgan, Newton, Paulding, Pickens, Pike, Rockdale, Spalding, Walton | 5,286,728 | 1.83 |
| S35D | Tampa–St. Petersburg–Clearwater, FL | FL: Hernando, Hillsborough, Pasco, Pinellas | 2,783,243 | .96 |
| S35E | Baltimore–Columbia–Towson, MD | MD: Anne Arundel, Baltimore, Baltimore City, Carroll, Harford, Howard, Queen Anne's | 2,710,489 | .94 |
| N35F | Charlotte–Concord–Gastonia, NC–SC (W3) | NC: Cabarrus, Gaston, Iredell, Lincoln, Mecklenburg, Rowan, Union; SC: Chester, Lancaster, York | 3,035,149 | 1.05 |
| N35G | Orlando–Kissimmee–Sanford, FL (W1) | FL: Lake, Orange, Osceola, Seminole | 2,642,941 | .91 |
| N35H | Richmond, VA | VA: Amelia, Caroline, Charles City, Chesterfield, Colonial Heights City, Dinwiddie, Goochland, Hanover, Henrico, Hopewell City, King William, New Kent, Petersburg City, Powhatan, Prince George, Richmond City, Sussex | 3,027,856 | 1.05 |
| N35I | Raleigh, NC | NC: Franklin, Johnston, Wake | 2,549,176 | .88 |
| N35J | Greenville–Anderson–Mauldin, SC | SC: Anderson, Greenville, Laurens, Pickens | 3,094,518 | 1.07 |
| N35K | Winston-Salem, NC (W3) | NC: Davidson, Davie, Forsyth, Stokes, Yadkin | 2,637,083 | .91 |
| N35L | Cape Coral–Fort Myers, FL | FL: Lee | 3,091,153 | 1.07 |
| N35M | Ocala, FL | FL: Marion | 2,568,744 | .89 |
| N35N | Gainesville, FL | FL: Alachua, Gilchrist | 2,913,140 | 1.01 |
| N35O | Wilmington, NC (W2) | NC: New Hanover, Pender | 2,736,321 | .94 |
| N35P | Jacksonville, NC (W2) | NC: Onslow | 3,100,604 | 1.07 |
| N35Q | Clarksburg, WV (W1) | WV: Doddridge, Harrison, Taylor | 2,563,098 | .89 |
| Region 3: South, Division 6: East South Central | | | | |
| N36A | Louisville/Jefferson County, KY–IN (W4) | IN: Clark, Floyd, Harrison, Scott, Washington; KY: Bullitt, Henry, Jefferson, Oldham, Shelby, Spencer, Trimble | 2,529,624 | .87 |
| N36B | Birmingham–Hoover, AL | AL: Bibb, Blount, Chilton, Jefferson, Shelby, St. Clair, Walker | 2,483,606 | .86 |
| N36C | Chattanooga, TN–GA | GA: Catoosa, Dade, Walker; TN: Hamilton, Marion, Sequatchie | 2,620,595 | .90 |
| N36D | Huntsville, AL (W4) | AL: Limestone, Madison | 2,801,399 | .97 |
| N36E | Florence–Muscle Shoals, AL | AL: Colbert, Lauderdale | 2,550,408 | .88 |
| N36F | Meridian, MS (W1) | MS: Clarke, Kemper, Lauderdale | 2,397,313 | .83 |
| Region 3: South, Division 7: West South Central | | | | |
| S37A | Dallas–Fort Worth–Arlington, TX | TX: Collin, Dallas, Denton, Ellis, Hood, Hunt, Johnson, Kaufman, Parker, Rockwall, Somervell, Tarrant, Wise | 6,426,214 | 2.22 |
| S37B | Houston–The Woodlands–Sugar Land, TX | TX: Austin, Brazoria, Chambers, Fort Bend, Galveston, Harris, Liberty, Montgomery, Waller | 5,920,416 | 2.04 |
| N37C | San Antonio–New Braunfels, TX | TX: Atascosa, Bandera, Bexar, Comal, Guadalupe, Kendall, Medina, Wilson | 2,436,095 | .84 |
| N37D | Oklahoma City, OK | OK: Canadian, Cleveland, Grady, Lincoln, Logan, McClain, Oklahoma | 2,812,948 | .97 |
| N37E | Baton Rouge, LA | LA: Ascension, East Baton Rouge, East Feliciana, Iberville, Livingston, Pointe Coupee, St. Helena, West Baton Rouge, West Feliciana | 2,543,610 | .88 |
| N37F | Lafayette, LA | LA: Acadia, Iberia, Lafayette, St. Martin, Vermilion | 2,444,837 | .84 |
| N37G | Brownsville–Harlingen, TX | TX: Cameron | 2,581,037 | .89 |
| N37H | Amarillo, TX | TX: Armstrong, Carson, Oldham, Potter, Randall | 2,756,117 | .95 |
| N37I | Russellville, AR (W2) | AR: Pope, Yell | 2,620,998 | .91 |
| N37J | Paris, TX (W3) | TX: Lamar | 2,851,943 | .98 |
| Region 4: West, Division 8: Mountain | | | | |
| S48A | Phoenix–Mesa–Scottsdale, AZ | AZ: Maricopa, Pinal | 4,192,887 | 1.45 |
| S48B | Denver–Aurora–Lakewood, CO | CO: Adams, Arapahoe, Broomfield, Clear Creek, Denver, Douglas, Elbert, Gilpin, Jefferson, Park | 2,543,482 | .88 |
| N48C | Las Vegas–Henderson–Paradise, NV | NV: Clark | 3,227,960 | 1.11 |
| N48D | Provo–Orem, UT | UT: Juab, Utah | 3,724,271 | 1.29 |
| N48E | Yuma, AZ | AZ: Yuma | 3,840,701 | 1.33 |
| N48F | St. George, UT (W3) | UT: Washington | 3,206,759 | 1.11 |
| Region 4: West, Division 9: Pacific | | | | |
| S49A | Los Angeles–Long Beach–Anaheim, CA | CA: Los Angeles, Orange | 12,828,837 | 4.43 |
| S49B | San Francisco–Oakland–Hayward, CA | CA: Alameda, Contra Costa, Marin, San Francisco, San Mateo | 4,335,391 | 1.50 |
| S49C | Riverside–San Bernardino–Ontario, CA | CA: Riverside, San Bernardino | 4,224,851 | 1.46 |
| S49D | Seattle–Tacoma–Bellevue, WA | WA: King, Pierce, Snohomish | 3,439,809 | 1.19 |
| S49E | San Diego–Carlsbad, CA | CA: San Diego | 3,095,313 | 1.07 |
| S49F | Honolulu, HI | HI: Honolulu | 1,360,301 | .47 |
| S49G | Anchorage, AK | AK: Anchorage, Matanuska–Susitna | 523,154 | .18 |
| N49H | Portland–Vancouver–Hillsboro, OR–WA | OR: Clackamas, Columbia, Multnomah, Washington, Yamhill; WA: Clark, Skamania | 5,208,366 | 1.80 |
| N49I | Santa Rosa, CA (W1) | CA: Sonoma | 5,163,670 | 1.78 |
| N49J | Chico, CA | CA: Butte | 4,623,339 | 1.60 |
| N49K | Moses Lake, WA (W4) | WA: Grant | 4,363,676 | 1.51 |

Notes:
(1) PSU code: 1st character, S = self-representing or N = nonself-representing; 2nd character, region number; 3rd character, division number; 4th character, A–Q, depending on the number of PSUs within a Census division.
Note: The markers W1–W4 designate the wave during which each new PSU will enter the index; no marker indicates a continuing PSU.
Source: U.S. Bureau of Labor Statistics.
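Note (1) defines a fixed layout for PSU codes, so the codes can be decoded mechanically. The Python sketch below decodes every code in the table and checks the results against statements in the text: 75 PSUs, 23 self-representing areas, and nine Census divisions, each in the expected region. The `parse_psu_code` helper and `PSUCode` type are hypothetical names.

```python
from collections import Counter
from typing import NamedTuple

# Census division -> Census region, as grouped in the appendix table.
DIVISION_TO_REGION = {1: 1, 2: 1, 3: 2, 4: 2, 5: 3, 6: 3, 7: 3, 8: 4, 9: 4}


class PSUCode(NamedTuple):
    self_representing: bool
    region: int
    division: int
    letter: str


def parse_psu_code(code: str) -> PSUCode:
    """Decode a code such as 'N35Q' using the layout in note (1)."""
    if len(code) != 4 or code[0] not in "SN" or not code[1:3].isdigit():
        raise ValueError(f"unexpected PSU code: {code!r}")
    return PSUCode(code[0] == "S", int(code[1]), int(code[2]), code[3])


# The 75 PSU codes from the appendix table, in table order.
CODES = """
S11A N11B N11C S12A S12B N12C N12D N12E N12F
S23A S23B N23C N23D N23E N23F N23G N23H N23I N23J
S24A S24B N24C N24D N24E N24F
S35A S35B S35C S35D S35E N35F N35G N35H N35I N35J N35K N35L N35M N35N N35O N35P N35Q
N36A N36B N36C N36D N36E N36F
S37A S37B N37C N37D N37E N37F N37G N37H N37I N37J
S48A S48B N48C N48D N48E N48F
S49A S49B S49C S49D S49E S49F S49G N49H N49I N49J N49K
""".split()

parsed = [parse_psu_code(c) for c in CODES]
# Every division digit must sit inside its Census region digit.
assert all(DIVISION_TO_REGION[p.division] == p.region for p in parsed)
print(len(parsed))                                 # 75
print(sum(p.self_representing for p in parsed))    # 23
print(sorted(Counter(p.division for p in parsed).items()))
```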
SUGGESTED CITATION
Steven P. Paben, William H. Johnson, and John F. Schilp, "The 2018 revision of the Consumer Price Index geographic sample," Monthly Labor Review, U.S. Bureau of Labor Statistics, October 2016, https://doi.org/10.21916/mlr.2016.47.
NOTES
1 Because of resource constraints, the U.S. Bureau of Labor Statistics (BLS) did not rotate the 2008 geographic sample, which
was based on the 2000 decennial census.
2 The continuous rotation plan for geographic areas includes the following CPI component surveys: the Housing Survey, the
Telephone Point-of-Purchase Survey, and the Commodity and Services survey. The latter two surveys moved to 4-year within-PSU
rotation cycles as part of the 1998 CPI revision. In 2010, CPI began a multiyear effort to continually update the rent sample for the
Housing Survey. (See Frank Ptacek, “Updating the rent sample for the CPI Housing Survey,” Monthly Labor Review, August 2013,
https://www.bls.gov/opub/mlr/2013/article/updating-the-rent-sample-for-the-cpi-housing-survey.htm.) In January 2015, the
Consumer Expenditure Survey (CE) switched to a geographic sample design based on the 2010 decennial census.
3 Janet L. Williams, Eugene F. Brown, and Gary R. Zion, “The challenge of redesigning the Consumer Price Index area sample,”
Proceedings of the Survey Research Methods Section, vol. 1 (American Statistical Association, 1993), pp. 200–205.
4 Sample allocation divides the total sample into smaller samples for each classification variable.
5 Because of a lack of population, PSUs were not selected for the Northeast region’s urban nonmetropolitan index areas (size
class D100) used in the 1998 area design.
6 Revised delineations of metropolitan statistical areas, micropolitan statistical areas, and combined statistical areas, and guidance on uses of the delineations of these areas, Bulletin No. 13-01 (Office of Management and Budget, February 28, 2013), https://www.whitehouse.gov/sites/default/files/omb/bulletins/2013/b-13-01.pdf.
7 Ralph Bradley, “Analytical bias reduction for small samples in the U.S. Consumer Price Index,” Journal of Business & Economic
Statistics, vol. 25, no. 3, 2007, pp. 337–346.
8 Ibid.
9 A self-representing area represents only its own area definition. A nonself-representing area stands for multiple area definitions.
10 John F. Schilp, “Simulated statistics for the proposed by-division design in the Consumer Price Index,” Proceedings of the
Government Statistics Section (American Statistical Association, 2014).
11 The 1998 area design, used currently, includes seven “Census region by size class” groups for the nonself-representing areas,
because it has no sample allocated to the C-sized areas in the Northeast region.
12 Under the 2018 area design, the total population residing in each county included in a CBSA, both metropolitan and
micropolitan, is defined as urban. Under the 1998 design, the total population in A- and B-sized PSUs was defined as urban, but
only the population residing within the political boundaries of an “urban core” in the C-sized PSUs was defined as urban. The
population outside of an urban core, but inside a county defining a C-sized PSU, was considered rural under the 1998 design.
13 The CE additionally covers rural areas. These areas are out of scope for the CPI.
14 The original design based on the 2000 decennial census had 86 urban PSUs for the CE and CPI surveys. CPI did not receive
funding for its initiative in time to make the switch to that design. The CE program did implement the design, but then had to cut 11
PSUs because of insufficient funding in 2006.
15 The variance models are for 6-month percent-change standard errors, because the main purpose of the models is to allocate
outlets and items to areas where the samples are rotated every 6 months.
16 An overview of the ACS data products can be found at https://www.Census.gov/programs-surveys/acs/.
17 The 3-year product was discontinued in 2015.
18 Education level for those age 25 and over.
19 A price relative is the ratio of an item’s current-period price to its previous-period price.
20 Median household income and median property value were derived from 2010 5-year ACS estimates for the final stratification
model.
21 H. P. Friedman and J. Rubin, “On some invariant criteria for grouping data,” Journal of the American Statistical Association, vol.
62, 1967, pp. 1159–1178.
22 Susan L. King, John Schilp, and Erik Bergman, “Assigning PSUs to a stratification PSU,” Proceedings of the Survey Research
Methods Section (American Statistical Association, 2011), pp. 2235–2246.
23 William H. Johnson, Steven P. Paben, and John F. Schilp, "The use of sample overlap methods in the Consumer Price Index redesign," Proceedings of the Fourth International Conference on Establishment Surveys, June 11–14, 2012, Montréal, Canada (American Statistical Association, 2012).
24 See Walter M. Perkins, “1970 CPS redesign: proposed method for deriving sample PSU selection probabilities within 1970 NSR
strata,” memorandum to Joseph Waksberg (U.S. Bureau of the Census, August 5, 1970); and Lawrence R. Ernst, “Maximizing the
overlap between surveys when information is incomplete,” European Journal of Operational Research, vol. 27, no. 2, 1986, pp.
192–200.
25 SOCSLP is written in SAS by Sun Woong Kim, Steven G. Heeringa, and Peter W. Solenberger of the University of Michigan. It should remain usable in the future, because SAS will continue to be supported at BLS. For details on the methodology used in SOCSLP, see Kim, Heeringa, and Solenberger, "Optimizing solution sets in two-way controlled selection problems" (Institute for Social Research, University of Michigan), ftp://ftp.isr.umich.edu/pub/src/smp/socslp/socslp_paper.pdf.
26 Cincinnati, Cleveland, Milwaukee, Pittsburgh, and Portland were reselected as nonself-representing areas.
RELATED CONTENT
Related Articles
Comparing the Consumer Price Index with the gross domestic product price index and gross domestic product implicit price
deflator, Monthly Labor Review, March 2016.
Explaining the 30-year shift in consumer expenditures from commodities to services, 1982–2012, Monthly Labor Review, April
2014.
The first hundred years of the Consumer Price Index: a methodological and political history, Monthly Labor Review, April 2014.
Updating the rent sample for the CPI Housing Survey, Monthly Labor Review, August 2013.
Related Subjects
Consumer price index
Sampling
Prices and Spending
Geographic areas
Statistical programs and methods