04/19/2017
Consumer Price Index Commodities and Services
B. DESCRIPTION OF INFORMATION COLLECTIONS EMPLOYING STATISTICAL METHODS
Universe and Sample Size Summary
Because of the complexity, importance and diversity of its universe, the construction of the Consumer Price Index (CPI) requires a complex set of statistical techniques and samples. Conceptually, the potential universe of price quotations for the CPI is the total set of prices, placed in one-to-one correspondence to the total set of purchases of all urban consumers. The sample for ongoing pricing for the Commodities and Services (C&S) portion of the CPI is approximately 37,625 outlets with 124,172 price quotations per month.
The outlet response rate for ongoing pricing averaged 93.7% per month from October 2015 to September 2016. The roughly 6% non-response rate among outlets is due to refusals (in which case the outlet is removed from the sample) or to outlets being temporarily unavailable for pricing.
The response rate at initiation is 80.9% of eligible outlets. During initiation 10.5% of outlets are terminated, either because they refuse to participate (2.1%), are ineligible (7.1%), or cannot be located (1.3%). The following table presents response rates for outlets undergoing CPI initiation during the two most recent initiation cycles, August 2015 and February 2016:
| Type of Response at Initiation | Percent | 
| Data obtained | 75.0 | 
| Data pending – awaiting central office clearance, temporarily unavailable | 14.8 | 
| Refusal | 2.1 | 
| Ineligible – no CPI items available | 4.5 | 
| Ineligible – out-of-business, out of scope, outlet moved, outlet outside PSU | 2.5 | 
| Unable to locate | 1.2 | 
Following a formula from the Office of Management and Budget (OMB), the 80.9% response rate at initiation is calculated by dividing the percent of outlets with data obtained (75.0%) by the percent of eligible outlets. The percent of eligible outlets is estimated as the sum of outlets with data obtained (75.0%), data pending (14.8%), refusals (2.1%), and the unable-to-locate outlets estimated to be eligible (1.2% x 92.9%). The estimate that 92.9% of the unable-to-locate units are eligible is based on the percentage of located units that are eligible: [75.0% + 14.8% + 2.1%] / [75.0% + 14.8% + 2.1% + 4.5% + 2.5%].
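The calculation described above can be sketched in a few lines. This is an illustration only: the function name and argument names are hypothetical, and the inputs are the rounded percentages from the table. With those rounded inputs the eligibility rate comes out at about 92.9% and the response rate at about 80.6%, close to but not identical with the 80.9% stated; the difference may reflect rounding in the published percentages.

```python
def initiation_response_rate(obtained, pending, refusal, ineligible, unable_to_locate):
    """Return (response_rate, eligibility_rate) from percentages of sampled outlets.

    Hypothetical helper: the response rate is data obtained divided by the
    estimated eligible outlets, where unable-to-locate outlets are counted
    as eligible at the same rate as located outlets.
    """
    located_eligible = obtained + pending + refusal
    located = located_eligible + ineligible
    eligibility_rate = located_eligible / located
    estimated_eligible = located_eligible + unable_to_locate * eligibility_rate
    return obtained / estimated_eligible, eligibility_rate


# Rounded percentages taken from the initiation table.
rate, eligibility = initiation_response_rate(
    obtained=75.0,
    pending=14.8,
    refusal=2.1,
    ineligible=4.5 + 2.5,
    unable_to_locate=1.2,
)
print(f"eligibility rate: {eligibility:.1%}")
print(f"response rate:    {rate:.1%}")
```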
The estimation rate for repricing is the number of outlets (or quotes) used in estimation divided by the number of outlets (or quotes) collected in repricing; in other words, it is the percentage of all outlets (or quotes) collected in repricing that were used in estimation. The table below gives an estimated overall response rate, obtained by multiplying the average estimation rate for repricing (October 2015 to September 2016) by the average initiation rate for the initiation cycles (August 2014 to July 2016).
| Type of Rates | Outlets | Quotes | 
| Average Estimation Rates for Repricing (Oct 2015 to Sept 2016) | 90.5 | 81.3 | 
| Average Initiation Rates (Aug 2014 to Jul 2016) | 83.0 | 79.4 | 
| Overall Estimated Response Rates (average estimation rate for repricing multiplied by the average initiation rate) | 75.1 | 64.6 | 
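As a quick check of the arithmetic in the table, the overall rate is the product of the two component rates. The sketch below uses hypothetical variable names and the table's published percentages.

```python
# Average rates from the table, in percent.
estimation_rate = {"outlets": 90.5, "quotes": 81.3}
initiation_rate = {"outlets": 83.0, "quotes": 79.4}


def overall_rate(estimation_pct, initiation_pct):
    """Multiply two percentage rates and return the result as a percentage."""
    return estimation_pct * initiation_pct / 100.0


overall = {
    unit: round(overall_rate(estimation_rate[unit], initiation_rate[unit]), 1)
    for unit in estimation_rate
}
print(overall)  # {'outlets': 75.1, 'quotes': 64.6}
```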
Collection Procedures
2.i. Description of Sampling Methodology
A multi-stage stratified sampling process is employed for the CPI. The four main stages of selection are: (1) the sampling of geographic areas, (2) the sampling of outlets within the geographic areas, (3) the sampling of entry-level items (ELIs) to be priced in the outlets, and (4) the sampling of unique items from each ELI in each outlet.
BLS selects Primary Sampling Units (PSUs) or geographic areas for pricing. The geographic area definitions used are the same as those used for Core Based Statistical Areas (CBSAs). The sample pricing areas are derived from a stratified design using a controlled selection procedure that provides for the selection of one sample area from each stratum with a control on the distribution of PSUs by metropolitan/micropolitan status. In the 1998 sample design, four independent variables were used for stratifying the non-self-representing PSUs: normalized (centered and scaled by the range) longitude, the square of normalized longitude, normalized latitude, and percent urban. The initial stratification for the 2018 PSU design was based on the variables average income, average property value, latitude, and longitude.
Each year BLS systematically selects a portion of the sample of outlets and quotes such that over a four-year period most C&S sample outlets have a chance to be replaced. Not only does this re-establish the distribution of the sample, incorporate new outlet construction, and reflect shifts in outlet preferences, but it also allows many respondents to rotate out of the sample. Thus, not all respondents are retained in the sample indefinitely.
The outlet sampling frames are constructed from several sources. The primary source for all food and the majority of the other C&S items is the Telephone Point of Purchase Survey (TPOPS – OMB Control Number 1220-0044). The TPOPS provides coverage for 58% of all consumption expenditures for the CPI-U, as of December 2015. Renter and owner-occupied housing account for 32%. The remaining 10% of consumption expenditures are covered from a variety of sampling frames constructed by BLS or obtained from other sources.
The TPOPS is a computer assisted telephone collection survey, used to identify a universe of outlets from which CPI sample outlets are selected, and is conducted by the Census Bureau for BLS. TPOPS is made up of 214 purchase categories of goods and services, e.g., prescription drugs. Under TPOPS, during each quarter of the year, in rotating groups of PSU/purchase category groups, households are asked to identify the amount of their expenditures and the names and addresses of the outlets where purchases were made. Samples of outlets for pricing are selected from the TPOPS generated frames using a systematic sampling procedure. Each outlet has a probability of selection proportional to the expenditures reported for it on the TPOPS.
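Systematic selection with probability proportional to reported expenditures can be sketched as follows. This is a generic textbook version of systematic PPS sampling, not BLS production code; the function name, the frame layout, and the outlet names are all hypothetical.

```python
import random


def systematic_pps_sample(frame, n, rng=None):
    """Select n outlets with probability proportional to expenditure.

    frame: list of (outlet_name, reported_expenditure) pairs.
    An outlet whose expenditure exceeds the sampling interval can be
    selected more than once.
    """
    rng = rng or random.Random()
    total = sum(expenditure for _, expenditure in frame)
    interval = total / n
    start = rng.random() * interval  # random start in [0, interval)
    points = [start + k * interval for k in range(n)]

    selected = []
    cumulative = 0.0
    next_point = 0
    for name, expenditure in frame:
        cumulative += expenditure
        # Take every selection point that falls inside this outlet's range.
        while next_point < n and points[next_point] < cumulative:
            selected.append(name)
            next_point += 1
    return selected


# Hypothetical frame for one PSU/purchase category.
frame = [("Outlet A", 120.0), ("Outlet B", 45.0), ("Outlet C", 300.0), ("Outlet D", 35.0)]
sample = systematic_pps_sample(frame, n=2, rng=random.Random(0))
print(sample)
```

Outlets with larger reported expenditures span a wider stretch of the cumulative total, so they are more likely to contain a selection point.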
The sampling frames from which the item sample market baskets are derived are constructed using data from the most current two years of the Consumer Expenditure (CE – OMB Control Number 1220-0050) Survey, which is an ongoing survey. Each year, as the CPI rotates a portion of the outlet sample, the ELIs are resampled too. With data from these surveys assembled into the CPI item classification structure, the CPI selects the sample of ELIs using a stratified random selection procedure with each ELI having a probability of selection proportional to the expenditures reported for it on the CE Survey.
The BLS Washington Office merges the sample of ELIs with the appropriate sample of outlets. BLS field representatives then initiate the new outlets and select the specific unique items to be priced within each ELI by following an outlet-based, multistage, probability-proportional-to-sales methodology.
2.ii. Description of Estimation Methodology
Based on December 2016 CPI-U relative importances, 56% of the CPI is calculated using a geometric mean formula and 44% is based on the Laspeyres index formula. The Laspeyres portion is composed of Rent (8%), Owners’ equivalent rent (25%), and C&S items (12%). Also note that C&S items account for 68% of the CPI-U weight.
A price index constructed using geometric means more closely approximates a true cost-of-living index than does the Laspeyres, for some items. This occurs because the geometric means formula, unlike the Laspeyres formula, implicitly assumes that product substitution takes place when relative prices change. The geometric means formula assumes that relative expenditures are kept constant over time.
The Laspeyres index formula in concept simply measures the change in the weighted arithmetic mean of prices. As a fixed-weight index, the Laspeyres formula assumes that consumers do not change the amount of each item purchased as relative prices change.
All C&S stratum indexes are calculated using a geometric formula, except for those listed below. Demand elasticity studies led BLS to conclude that the Laspeyres index formula would yield the least biased measure of price change for these items.
C&S Components retaining the Laspeyres (arithmetic mean) Formula
Lodging at school, excluding board
Electricity
Utility (piped) gas service
Water and sewerage maintenance
State motor vehicle registration and license fees
Physicians' services
Dental services
Services by other medical professionals
Hospital services
Nursing homes and adult day services
Prescription drugs
Price relatives.
The price relative for each basic item-area for C&S using the geometric mean is based on the formula:
$${}_{a,i}R^{G}_{[t;t-1]} = \prod_{j \in a,i} \left( \frac{P_{j,t}}{P_{j,t-1}} \right)^{W_{j,POPS} / \sum_{k \in a,i} W_{k,POPS}}$$
The price relative for each basic item-area for C&S using Laspeyres is based on the formula:
$${}_{a,i}R^{L}_{[t;t-1]} = \frac{\sum_{j \in a,i} \left( W_{j,POPS} / P_{j,POPS} \right) P_{j,t}}{\sum_{j \in a,i} \left( W_{j,POPS} / P_{j,POPS} \right) P_{j,t-1}}$$
where
${}_{a,i}R^{G}_{[t;t-1]}$ and ${}_{a,i}R^{L}_{[t;t-1]}$ are, respectively, the geometric and Laspeyres price relatives for area-item combination a,i from the previous period, t-1 (either 1 month or 2 months ago), to the current month, t;
$P_{j,t}$ is the price of the jth observed item in month t for area-item combination a,i;
$P_{j,t-1}$ is the price of the same item in time t-1;
$P_{j,POPS}$ is an estimate of item j’s price in the sampling period when its TPOPS was conducted; and
$W_{j,POPS}$ is item j’s weight in the TPOPS, defined in detail below.
The product in the geomeans formula and sums in the Laspeyres formula are taken over all useable quotes in area-item combination a, i. It is important that the price of each quote be collected (or estimated) in both months in order to measure price change.
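A minimal sketch of the two price relatives follows. It assumes the standard weighted forms: a weight-share exponent on each price ratio for the geometric mean, and quantity weights W/P_POPS for the Laspeyres ratio. The function names, dictionary keys, and sample quotes are hypothetical.

```python
import math


def geometric_relative(quotes):
    """Weighted geometric mean of price ratios P_t / P_(t-1)."""
    total_weight = sum(q["weight"] for q in quotes)
    log_relative = sum(
        q["weight"] / total_weight * math.log(q["p_t"] / q["p_prev"])
        for q in quotes
    )
    return math.exp(log_relative)


def laspeyres_relative(quotes):
    """Ratio of weighted price sums, using implied quantities W / P_POPS."""
    numerator = sum(q["weight"] / q["p_base"] * q["p_t"] for q in quotes)
    denominator = sum(q["weight"] / q["p_base"] * q["p_prev"] for q in quotes)
    return numerator / denominator


# Hypothetical quotes usable in both months for one area-item combination.
quotes = [
    {"p_t": 2.20, "p_prev": 2.00, "p_base": 2.00, "weight": 3.0},
    {"p_t": 0.95, "p_prev": 1.00, "p_base": 1.00, "weight": 1.0},
]
print(f"geometric: {geometric_relative(quotes):.4f}")
print(f"Laspeyres: {laspeyres_relative(quotes):.4f}")
```

When the base-period price equals the previous-period price, as in this example, the Laspeyres relative reduces to a weighted arithmetic mean of the price ratios and is therefore never below the geometric relative.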
Quote weights.
For each individual observation, the weight Wj,POPS is computed as:
Wj,POPS = (α E f g b )/ (M B)
where
α is the proportion of the total dollar volume of sales for the ELI relative to the entire Point of Purchase Survey category (POPS category) within the outlet (called the outlet’s percent of POPS for the ELI);
E is an estimate of the total daily expenditure for the POPS category in the PSU half-sample by people in the U population (called the basic weight);
f is a duplication factor that accounts for any special subsampling of outlets and quotes;
g is a geographic factor used to account for differences in the index area’s coverage when the CPI is changing from an area design based on an old decennial census to a design based on a more recent census;
b is the number of times the ELI was selected to represent the item stratum, divided by the total selections for the item stratum, in the PSU half-sample;
M is the number of quotes with usable prices in both months t-1 and t for the ELI-PSU half-sample; and
B is the proportion of the item stratum’s expenditure accounted for by the ELI in the region.
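The quote weight formula above translates directly into code. The sketch below uses a hypothetical function name and made-up input values; only the formula itself comes from the text.

```python
def quote_weight(alpha, E, f, g, b, M, B):
    """Compute W_j,POPS = (alpha * E * f * g * b) / (M * B).

    alpha: outlet's percent of POPS for the ELI (as a proportion)
    E:     basic weight (estimated total daily expenditure)
    f:     duplication factor
    g:     geographic factor
    b:     share of item-stratum selections that went to this ELI
    M:     number of quotes with usable prices in both months
    B:     proportion of the item stratum's expenditure covered by the ELI
    """
    if M <= 0 or B <= 0:
        raise ValueError("M and B must be positive")
    return (alpha * E * f * g * b) / (M * B)


# Illustrative values only.
weight = quote_weight(alpha=0.5, E=1000.0, f=1.0, g=1.0, b=0.5, M=5, B=0.25)
print(weight)  # 200.0
```

Because M is in the denominator, the weight of each remaining quote rises when fewer quotes in the ELI-PSU half-sample have usable prices in both months.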
Index calculation.
When aggregating price relatives above the elementary index level, the Laspeyres formula is used exclusively, implying no substitution across elementary index cells in the CPI.
In mid-2002, BLS began publishing a Chained Consumer Price Index for All Urban Consumers (C-CPI-U). 1 The C-CPI-U is a monthly-chained index that uses a Tornqvist formula to aggregate indexes. This index is designed to be a closer approximation to a “cost-of-living index” than the present measures. By utilizing expenditure data in adjoining periods, it reflects consumer substitution across item categories in response to relative prices. The use of expenditure data for both a base period and the current period to average price change across item categories distinguishes the C-CPI-U from the existing CPI measures. Expenditure data required for the C-CPI-U calculations are available only with a lag. Thus, the C-CPI-U, unlike the CPI-U and CPI-W, is issued first in preliminary form and then subject to subsequent revisions. No additional data collection is required to support the publication of the C-CPI-U.
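To illustrate why a Tornqvist aggregate reflects substitution while a Laspeyres aggregate does not, the sketch below applies the textbook forms of both formulas to two hypothetical item categories. The function names and all numbers are illustrative assumptions, not C-CPI-U inputs.

```python
import math


def tornqvist_index(price_relatives, shares_base, shares_current):
    """Geometric mean of price relatives weighted by average expenditure shares."""
    log_index = sum(
        0.5 * (s0 + s1) * math.log(r)
        for r, s0, s1 in zip(price_relatives, shares_base, shares_current)
    )
    return math.exp(log_index)


def laspeyres_index(price_relatives, shares_base):
    """Arithmetic mean of price relatives weighted by base-period shares."""
    return sum(s0 * r for r, s0 in zip(price_relatives, shares_base))


# Item 1 rises 20% in price and loses expenditure share; item 2 is unchanged.
relatives = [1.20, 1.00]
base_shares = [0.5, 0.5]
current_shares = [0.4, 0.6]

print(f"Tornqvist: {tornqvist_index(relatives, base_shares, current_shares):.4f}")
print(f"Laspeyres: {laspeyres_index(relatives, base_shares):.4f}")
```

In this example the Tornqvist value is lower than the Laspeyres value because it averages the base and current expenditure shares, which is also why it needs current-period expenditure data that arrive with a lag.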
BLS periodically issues a report on its experimental index for the elderly. The CPI for the elderly or CPI-E is calculated monthly and is available on request. The CPI-E is a reweighting of the CPI basic indexes using expenditure weights from households headed by someone 62 years of age or older. No additional data collection is required to support the publication of the CPI-E.
2.iii. Degree of Accuracy Required
Section 2 of Title 29, Chapter 1, Subchapter 1, United States Code, which mandates the CPI, does not specify a required precision or accuracy for the index. BLS requires that the precision of the CPI be maximized given the total cost constraint imposed by the authorized spending level. BLS developed an allocation model to examine relative efficiencies of various alternative sample designs. The objective of the allocation process is to determine values for all sample design parameters which will minimize the variance of price change for the CPI at the U.S. level subject to the total cost constraint of the CPI budget. The model uses a variance function to project the variance of price change given a set of sample design parameters. It also has a cost function to project the annual cost given a set of values for the sample design parameters. A non-linear programming technique is used to determine the set of values for the sample design parameters which minimizes the variance of price change given a cost constraint. 2
Since 1978, the CPI’s sample design has accomplished variance estimation by using two or more independent samples of items and outlets in each geographic area.3 This allows two or more statistically independent estimates of the index to be made. The independent samples are called replicates, and the set of all observed prices is called the full sample.
Currently, BLS collects CPI data in 38 geographic areas across the United States. These areas consist of 31 self-representing areas and 7 non-self-representing areas. Self-representing areas are large metropolitan areas, such as the Boston, St. Louis, and San Francisco metropolitan areas. Non-self-representing areas are collections of smaller metropolitan areas. For example, one non-self-representing area is a collection of 32 small metropolitan areas in the Northeast region (Buffalo, Hartford, Providence, Bangor, and others), of which 8 were randomly selected to represent the entire set. Within each of the 38 areas, price data are collected for 211 item categories called item strata. Together the 211 item strata cover all consumer purchases. Examples of item strata are bananas, women’s dresses, and electricity.
Multiplying the number of current areas by the number of item strata gives 8,018 (= 38 x 211) different area and item combinations for which price indexes need to be calculated. Separate price indexes are calculated for each one of these 8,018 area and item combinations. After all 8,018 of these basic-level indexes are calculated, they are aggregated to form higher-level indexes, using expenditure estimates from the Consumer Expenditure Survey as their weights. Examples of higher-level geographic areas are the four regions (Northeast, Midwest, South, and West); and examples of higher-level item categories are the eight major groups (food & beverages, housing, apparel, transportation, medical care, education and communication, recreation, and other goods and services). The highest level of geographic aggregation is the U.S. city average, and the highest level of item aggregation is all items. Variances are computed with a Stratified Random Groups Method, in which variances are computed separately for certain subsets of areas and items and are then combined to produce the variance of the entire area and item combination. Subsets of items are formed by the intersection of the item category with each of the eight major groups.
In 2018, BLS will introduce a new geographic area sample for the CPI. The new area sample will have 23 self-representing areas and 9 non-self-representing areas for a total of 32 geographic areas. The 23 self-representing areas include 21 PSUs whose population is greater than 2.5 million and 2 additional units - Anchorage, AK, and Honolulu, HI. Anchorage represents all CBSAs in Alaska, and Honolulu represents all CBSAs in Hawaii. These CBSAs are unique because the locations of both states make price change in their markets geographically isolated from that in other markets. For this reason, the CBSAs in Alaska and Hawaii are treated as separate geographic strata. With 23 self-representing PSUs and nine Census divisions, the new area design will yield 6,752 basic indexes (32 index areas by 211 item strata) for the U.S. all-items CPI. This reduction (approximately 16%) in the number of basic indexes will help address the small sample bias in index estimates.
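The counts of basic indexes under the current and new area designs, and the size of the reduction, follow from simple arithmetic on the figures given above. The variable names in this sketch are hypothetical.

```python
ITEM_STRATA = 211

current_areas = 31 + 7  # self-representing + non-self-representing
new_areas = 23 + 9

current_basic_indexes = current_areas * ITEM_STRATA
new_basic_indexes = new_areas * ITEM_STRATA
reduction = (current_basic_indexes - new_basic_indexes) / current_basic_indexes

print(current_basic_indexes, new_basic_indexes, f"{reduction:.1%}")
# 8018 6752 15.8%
```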
The estimate of the CPI-U median standard error for 12-month intervals from January 2014 through December 2014 was 0.08 for All Items.
2.iv. Special Sampling -- Sampling of Time
The outlet samples of each PSU are divided into three pricing periods. Each outlet is designated for pricing during a specified period of the month. Therefore, a given item is priced at different times in different outlets in order to average out possible systematic differences between one time period of the month and another. Assigning pricing periods also ensures there is a full month between pricings for each monthly priced outlet or a full two months between pricings for bi-monthly collected outlets.
2.v. Use of Periodic Data Collection Cycles
Although BLS publishes monthly estimates of the CPI, prices for only about 59% of the total covered expenditures are collected monthly in all sampling areas. Of the 59% priced monthly, 32% reflects rent and owners’ equivalent rent and 27% reflects C&S items.
Regarding just the C&S portion (68%) of the total CPI expenditure weight, 27% is collected monthly and 41% is collected bi-monthly. The monthly priced C&S items include Food at home, Lodging away from home, Tenants insurance, Household fuels, Motor fuels, Motor vehicle parts, equipment and fees, Recreational reading materials, Education, Postage and delivery, Telephone services, and Tobacco products. (Note: in the three largest areas, New York, Chicago and Los Angeles, all sampled items are priced monthly.) Other commodities and services are priced bi-monthly, on either the "even" cycle (February, April, June, August, October and December) or the "odd" cycle (January, March, May, July, September and November).
Methods of Maximizing Response
BLS utilizes several techniques to ensure that adequate sample sizes are maintained for estimating the CPI. Initial sample sizes are larger than the desired sample sizes to cover initial non-responses, i.e., out-of-business, out-of-scope, refusal, sample items not available, and unable to locate. In rare circumstances, if the sample of outlets is deemed insufficient, the CPI will continue pricing the current sample.
Testing Plans/Procedures
Periodically, the CPI may test a new procedure or method to determine its validity. Prior to testing any new questions, the CPI will submit a nonsubstantive change to OMB for approval.
Statistical Responsibility
W. John Layng, Assistant Commissioner, Division of Consumer Prices and Price Indexes, Office of Prices and Living Conditions of BLS (Telephone: 202-691-6950) is the CPI program manager and has overall responsibility for the CPI.
William Johnson, Chief of the Survey Research and Analysis Branch of the Price Statistical Methods Division of the Office of Prices and Living Conditions (Telephone: 202-691-6921) has reviewed and approved the statistical methodology for the survey design.
2 For a complete description of the allocation process, see: Jacobson, Shawn, Leaver, Sylvia G. and Swanson, David C. (1998), “Choosing a Variance Computation Method for The Revised Consumer Price Index,” Proceedings of the Business and Economics Statistical Section, American Statistical Association, 131-136, and Swanson, David C., (1999).
	
| File Type | application/vnd.openxmlformats-officedocument.wordprocessingml.document | 
| File Title | OMB Clearance C&S Supporting Statement | 
| Author | Daniel Ginsburg | 
| File Modified | 0000-00-00 | 
| File Created | 2021-01-22 |