ATTACHMENT H
Source and Accuracy of the January 2008 CPS
Microdata File Displaced Worker,
Employee Tenure, and Occupational Mobility
SOURCE OF DATA
The data in this microdata file are from the January 2008 Current Population Survey (CPS). The U.S.
Census Bureau conducts the CPS every month, although this file has only January 2008 data. The
January 2008 survey uses two sets of questions, the basic CPS and a set of supplemental questions. The
CPS, sponsored jointly by the Census Bureau and the U.S. Bureau of Labor Statistics, is the country’s
primary source of labor force statistics for the entire population. The Census Bureau and the Bureau of
Labor Statistics also jointly sponsor the supplemental questions for January 2008.
Basic CPS. The monthly CPS collects primarily labor force data about the civilian noninstitutional
population living in the United States. The institutionalized population, which is excluded from the
population universe, is composed primarily of the population in correctional institutions and nursing
homes (91 percent of the 4.1 million institutionalized people in Census 2000). Interviewers ask questions
concerning labor force participation about each member 15 years old and over in sample households.
Typically, the week containing the nineteenth of the month is the interview week. The week containing
the twelfth is the reference week (i.e., the week about which the labor force questions are asked).
The CPS uses a multistage probability sample based on the results of the decennial census, with coverage
in all 50 states and the District of Columbia. The sample is continually updated to account for new
residential construction. When files from the most recent decennial census become available, the Census
Bureau gradually introduces a new sample design for the CPS.
In April 2004, the Census Bureau began phasing out the 1990 sample¹ and replacing it with the 2000
sample, creating a mixed sampling frame. Two simultaneous changes occurred during this phase-in
period. First, primary sampling units (PSUs)² selected for only the 2000 design gradually replaced those
selected for the 1990 design. This involved 10 percent of the sample. Second, within PSUs selected for
both the 1990 and 2000 designs, sample households from the 2000 design gradually replaced sample
households from the 1990 design. This involved about 90 percent of the sample. The new sample design
was completely implemented by July 2005.
In the first stage of the sampling process, PSUs are selected for sample. The United States is divided into
2,025 PSUs. The PSUs were redefined for this design to correspond to the Office of Management and
Budget definitions of Core-Based Statistical Areas and to improve efficiency in field
operations. These PSUs are grouped into 824 strata. Within each stratum, a single PSU is chosen for the
sample, with its probability of selection proportional to its population as of the most recent decennial
census. This PSU represents the entire stratum from which it was selected. In the case of strata
consisting of only one PSU, the PSU is chosen with certainty.
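The first-stage selection described above can be sketched in a few lines of Python. This is a minimal illustration, not the Census Bureau's production procedure: it assumes each stratum is supplied as a list of (PSU identifier, census population) pairs, and the function name, PSU names, and populations are hypothetical. The inverse of the returned probability is the factor by which the selected PSU represents its stratum.

```python
import random


def select_psu(stratum: list[tuple[str, int]], rng: random.Random) -> tuple[str, float]:
    """Select one PSU from a stratum with probability proportional to population.

    `stratum` holds (psu_id, census_population) pairs. Returns the chosen PSU id
    and its probability of selection. A one-PSU stratum is chosen with certainty.
    """
    ids = [psu_id for psu_id, _ in stratum]
    populations = [pop for _, pop in stratum]
    total = sum(populations)
    # random.choices draws with probability proportional to the weights.
    chosen = rng.choices(ids, weights=populations, k=1)[0]
    probability = populations[ids.index(chosen)] / total
    return chosen, probability


# Hypothetical stratum with three PSUs.
rng = random.Random(2008)
stratum = [("PSU-A", 120_000), ("PSU-B", 60_000), ("PSU-C", 20_000)]
psu, prob = select_psu(stratum, rng)
print(psu, prob, "first-stage weight:", 1 / prob)
```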
¹ For detailed information on the 1990 sample redesign, please see reference [1].
² The PSUs correspond to substate areas (i.e., counties or groups of counties) that are geographically contiguous.
Approximately 72,000 housing units were selected for sample from the sampling frame in January 2008.
Based on eligibility criteria, 11 percent of these housing units were sent directly to computer-assisted
telephone interviewing (CATI). The remaining units were assigned to interviewers for computer-assisted
personal interviewing (CAPI).³ Of all housing units in sample, about 59,000 were determined to be
eligible for interview. Interviewers obtained interviews at about 54,000 of these units. Noninterviews
occur when the occupants are not found at home after repeated calls or are unavailable for some other
reason.
January 2008 Supplement. In January 2008, in addition to the basic CPS questions, interviewers asked
supplementary questions about displacement of workers, employee tenure, and occupational mobility.
Questions concerning displaced workers were asked of all respondents who were at least 20 years old, and
questions concerning job tenure and occupational mobility were asked of all employed respondents who
were at least 15 years old.
Due to an error in the electronic instrument used to capture the supplement data, no individuals aged 15 or
16 were asked the job tenure and occupational mobility questions, and with only a few exceptions,
veterans aged 65 and older were not asked any of the supplement questions. The veterans who were not
given the opportunity to answer the displaced workers questions constituted 4.2 percent of the unweighted
sample and 3.9 percent of the weighted sample for those questions. The veterans and teens who were not
given the opportunity to answer the job tenure and occupational mobility questions constituted 1.8 percent
of the unweighted sample and 1.6 percent of the weighted sample for those questions.
The Census Bureau and the Bureau of Labor Statistics explored numerous options for overcoming this
problem. The option chosen used 2006 data as donor data for those cohorts who were not asked the
supplement questions. This included assigning a complete noninterview if that was the donor’s status on
the 2006 file. The following cells were used for this imputation process:
• For employed individuals aged 15 or 16 years old: age (15, 16) by sex – 4 cells.
• For veterans aged 65 or older: age (65-66, 67-70, 71-74, 75 and older) by employment
  status (employed, unemployed, not in labor force) by sex – 24 cells.
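The cell structure above can be enumerated directly, which is a quick check that the cell counts (4 and 24) follow from the listed classifications. The donor step is only a sketch: the statement does not say how a donor was chosen within a cell, so drawing one at random from the matching 2006 cell is an assumption, and the function names, field names, and donor records in the example are hypothetical. Copying the donor record unchanged reproduces the rule that a donor's complete noninterview status carries over.

```python
import random
from itertools import product

SEXES = ("male", "female")

# Employed 15- and 16-year-olds: age by sex.
TEEN_CELLS = list(product((15, 16), SEXES))

# Veterans 65 and older: age group by employment status by sex.
VETERAN_AGE_GROUPS = ("65-66", "67-70", "71-74", "75+")
EMPLOYMENT_STATUSES = ("employed", "unemployed", "not in labor force")
VETERAN_CELLS = list(product(VETERAN_AGE_GROUPS, EMPLOYMENT_STATUSES, SEXES))


def veteran_age_group(age: int) -> str:
    """Map a veteran's age (65 or older) to its imputation age group."""
    if age < 65:
        raise ValueError("veteran cells cover ages 65 and older only")
    if age <= 66:
        return "65-66"
    if age <= 70:
        return "67-70"
    if age <= 74:
        return "71-74"
    return "75+"


def impute_from_donor(cell, donors_by_cell, rng):
    """Copy the supplement record of one 2006 donor from the same cell.

    Random selection within the cell is an assumption for illustration. A donor
    recorded as a complete noninterview is copied unchanged.
    """
    return dict(rng.choice(donors_by_cell[cell]))


# Hypothetical 2006 donor records for one cell.
donors = {
    ("71-74", "employed", "male"): [
        {"interview_status": "interview", "years_with_employer": 12},
        {"interview_status": "noninterview"},
    ],
}
cell = (veteran_age_group(72), "employed", "male")
record = impute_from_donor(cell, donors, random.Random(0))
print(len(TEEN_CELLS), len(VETERAN_CELLS), record)
```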
After completion of this assignment of unedited supplement data, the standard supplement edits (identical
to the 2006 edits) were applied to all applicable records. After verification of the accuracy of the edits,
the data were weighted by applying supplement noninterview adjustments identical to the procedures used
in 2006.
Estimation Procedure. This survey’s estimation procedure adjusts weighted sample results to agree with
independently derived population estimates of the civilian noninstitutional population of the United States
and each state (including the District of Columbia). These population estimates, used as controls for the
CPS, are prepared monthly to agree with the most current set of population estimates that are released as
part of the Census Bureau’s population estimates and projections program.
The population controls for the nation are distributed by demographic characteristics in two ways:
• Age, sex, and race (White alone, Black alone, and all other groups combined).
• Age, sex, and Hispanic origin.
³ For further information on CATI and CAPI and the eligibility criteria, please see reference [2].
The population controls for the states are distributed by race (Black alone and all other race groups
combined), age (0-15, 16-44, and 45 and over), and sex.
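Making weights agree with several sets of controls at once (for example, an age-sex-race distribution and an age-sex-Hispanic origin distribution) is commonly done by iterative proportional fitting, also called raking. The sketch below shows that general technique only; this statement does not detail the CPS weighting steps, so the function name, the convergence rule, and the example controls are illustrative assumptions.

```python
def rake(weights, dimensions, controls, max_iterations=100, tol=1e-10):
    """Iterative proportional fitting (raking) of survey weights.

    weights:    initial weight for each respondent
    dimensions: one list per control dimension giving each respondent's category
    controls:   one dict per dimension mapping category -> population control total
    """
    w = list(weights)
    for _ in range(max_iterations):
        largest_adjustment = 0.0
        for categories, targets in zip(dimensions, controls):
            # Current weighted total in each category of this dimension.
            totals = {}
            for wi, cat in zip(w, categories):
                totals[cat] = totals.get(cat, 0.0) + wi
            # Scale every weight so this dimension's totals match its controls.
            factors = {cat: targets[cat] / totals[cat] for cat in totals}
            w = [wi * factors[cat] for wi, cat in zip(w, categories)]
            largest_adjustment = max(largest_adjustment,
                                     max(abs(f - 1.0) for f in factors.values()))
        if largest_adjustment < tol:
            break
    return w


# Hypothetical example: four respondents, controls by sex and by age group.
sex = ["male", "male", "female", "female"]
age = ["16-44", "45+", "16-44", "45+"]
raked = rake([1.0, 1.0, 1.0, 1.0], [sex, age],
             [{"male": 60.0, "female": 40.0}, {"16-44": 55.0, "45+": 45.0}])
print([round(x, 2) for x in raked])  # [33.0, 27.0, 22.0, 18.0]
```

Each pass scales the weights to one set of controls; repeating the passes until no factor moves the weights appreciably yields weights that satisfy all sets of controls simultaneously.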
The independent estimates by age, sex, race, and Hispanic origin, and for states by selected age groups
and broad race categories, are developed using the basic demographic accounting formula whereby the
population from the latest decennial census is updated using data on the components of population change
(births, deaths, and net international migration) with net internal migration as an additional component in
the state population estimates.
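The basic demographic accounting formula amounts to carrying the census base forward by the components of change. A minimal sketch with a hypothetical function name and hypothetical figures; net internal migration is included only for state estimates, as the text describes.

```python
def update_population(census_base: int, births: int, deaths: int,
                      net_international_migration: int,
                      net_internal_migration: int = 0) -> int:
    """Basic demographic accounting formula.

    population = base + births - deaths + net international migration
                 (+ net internal migration, for state estimates only)
    """
    return (census_base + births - deaths
            + net_international_migration + net_internal_migration)


# Hypothetical state: 1,000,000 at the census, then one period of change.
state_estimate = update_population(1_000_000, births=14_000, deaths=9_000,
                                   net_international_migration=3_000,
                                   net_internal_migration=-2_000)
print(state_estimate)  # 1006000
```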
The net international migration component in the population estimates includes a combination of the
following:
• Legal migration to the United States.
• Emigration of foreign-born and native people from the United States.
• Net movement between the United States and Puerto Rico.
• Estimates of temporary migration.
• Estimates of net residual foreign-born population, which include unauthorized migration.
Because the latest available information on these components lags the survey date, it is necessary to make
short-term projections of these components to develop the estimate for the survey date.
ACCURACY OF THE ESTIMATES
A sample survey estimate has two types of error: sampling and nonsampling. The accuracy of an estimate
depends on both types of error. The nature of the sampling error is known given the survey design; the
full extent of the nonsampling error is unknown.
Sampling Error. Since the CPS estimates come from a sample, they may differ from figures from an
enumeration of the entire population using the same questionnaires, instructions, and enumerators. For a
given estimator, the difference between an estimate based on a sample and the estimate that would result
if the sample were to include the entire population is known as sampling error. Standard errors, as
calculated by methods described in “Standard Errors and Their Use,” are primarily measures of the
magnitude of sampling error. However, they may include some nonsampling error.
Nonsampling Error. For a given estimator, the difference between the estimate that would result if the
sample were to include the entire population and the true population value being estimated is known as
nonsampling error. There are several sources of nonsampling error that may occur during the
development or execution of the survey. It can occur because of circumstances created by the
interviewer, the respondent, the survey instrument, or the way the data are collected and processed. For
example, errors could occur because:
• The interviewer records the wrong answer, the respondent provides incorrect information,
  the respondent estimates the requested information, or an unclear survey question is
  misunderstood by the respondent (measurement error).
• Some individuals who should have been included in the survey frame were missed
  (coverage error).
• Responses are not collected from all those in the sample or the respondent is unwilling to
  provide information (nonresponse error).
• Values are estimated imprecisely for missing data (imputation error).
• Forms may be lost, data may be incorrectly keyed, coded, or recoded, etc. (processing
  error).
To minimize these errors, the Census Bureau applies quality control procedures during all stages of the
production process including the design of the survey, the wording of questions, the review of the work of
interviewers and coders, and the statistical review of reports.
Two types of nonsampling error that can be examined to a limited extent are nonresponse and
undercoverage.
Nonresponse. The effect of nonresponse cannot be measured directly, but one indication of its potential
effect is the nonresponse rate. For the January 2008 basic CPS, the household-level nonresponse rate was
8.4 percent. The person-level nonresponse rate for the displaced workers, employee tenure, and
occupational mobility supplement was an additional 4.8 percent. (Veterans aged 65 and older were
excluded from this rate computation due to the circumstances discussed in "January 2008 Supplement.")
Since the basic CPS nonresponse rate is a household-level rate and the displaced workers, employee
tenure, and occupational mobility supplement nonresponse rate is a person-level rate, we cannot combine
these rates to derive an overall nonresponse rate. Nonresponding households may have fewer persons
than interviewed ones, so combining these rates may lead to an overestimate of the true overall
nonresponse rate for persons for the displaced workers, employee tenure, and occupational mobility
supplement.
Coverage. The concept of coverage in the survey sampling process is the extent to which the total
population that could be selected for sample “covers” the survey’s target population. Missed housing
units and missed people within sample households create undercoverage in the CPS. Overall CPS
undercoverage for January 2008 is estimated to be about 12 percent. CPS coverage varies with age, sex,
and race. Generally, coverage is larger for females than for males and larger for non-Blacks than for
Blacks. This differential coverage is a general problem for most household-based surveys.
The CPS weighting procedure partially corrects for bias from undercoverage, but biases may still be
present when people who are missed by the survey differ from those interviewed in ways other than age,
race, sex, Hispanic origin, and state of residence. How this weighting procedure affects other variables in
the survey is not precisely known. All of these considerations affect comparisons across different surveys
or data sources.
A common measure of survey coverage is the coverage ratio, calculated as the estimated population
before poststratification divided by the independent population control. Table 1 shows January 2008 CPS
coverage ratios by age and sex for certain race and Hispanic groups. The CPS coverage ratios can exhibit
some variability from month to month.
Table 1. CPS Coverage Ratios: January 2008

           Totals                      White only      Black only      Residual race   Hispanic
Age group  All people  Male    Female  Male    Female  Male    Female  Male    Female  Male    Female
0-15       0.88        0.89    0.87    0.90    0.88    0.80    0.77    0.91    0.93    0.94    0.89
16-19      0.87        0.87    0.86    0.87    0.88    0.84    0.84    0.95    0.72    0.97    0.96
20-24      0.80        0.78    0.82    0.80    0.83    0.67    0.76    0.82    0.84    0.88    0.95
25-34      0.81        0.78    0.84    0.80    0.85    0.63    0.76    0.81    0.88    0.77    0.89
35-44      0.88        0.84    0.91    0.86    0.93    0.78    0.82    0.81    0.81    0.83    0.96
45-54      0.89        0.87    0.91    0.89    0.92    0.79    0.86    0.85    0.89    0.77    0.88
55-64      0.92        0.92    0.93    0.93    0.94    0.86    0.90    0.83    0.94    0.84    0.92
65+        0.93        0.94    0.93    0.94    0.93    0.90    0.96    0.93    0.83    0.78    0.83
15+        0.88        0.86    0.89    0.87    0.91    0.77    0.84    0.84    0.85    0.83    0.91
0+         0.88        0.87    0.89    0.88    0.90    0.78    0.82    0.86    0.87    0.86    0.91
Notes: (1) The Residual race group includes cases indicating a single race other than White or Black,
and cases indicating two or more races.
(2) Hispanics may be any race. For a more detailed discussion on the use of parameters for race
and ethnicity, please see the “Generalized Variance Parameters” section.
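The coverage ratio defined above, and the undercoverage figure quoted in the text, reduce to simple arithmetic. In this sketch the function names are hypothetical and the estimate and control in the first call are made up; the 0.88 ratio is the all-people, all-ages (0+) value from Table 1, which corresponds to the roughly 12 percent overall undercoverage stated earlier. The last function shows, in simplified form, the factor that would bring a cell's weighted total up to its control.

```python
def coverage_ratio(estimate_before_poststratification: float,
                   independent_control: float) -> float:
    """Estimated population before poststratification divided by the control."""
    return estimate_before_poststratification / independent_control


def undercoverage_percent(ratio: float) -> float:
    """Percentage of the control population missed, given a coverage ratio."""
    return 100.0 * (1.0 - ratio)


def poststratification_factor(ratio: float) -> float:
    """Simplified weight adjustment that brings a cell's weighted total to its control."""
    return 1.0 / ratio


print(coverage_ratio(88_000_000, 100_000_000))     # hypothetical inputs -> 0.88
print(round(undercoverage_percent(0.88)))          # Table 1, 0+, all people -> 12
print(round(poststratification_factor(0.88), 3))   # -> 1.136
```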
Comparability of Data. Data obtained from the CPS and other sources are not entirely comparable.
This results from differences in interviewer training and experience and in differing survey processes.
This is an example of nonsampling variability not reflected in the standard errors. Therefore, caution
should be used when comparing results from different sources.
Data users should be careful when comparing the data from this microdata file, which reflects Census
2000-based controls, with microdata files from March 1994 through December 2001, which reflect 1990
census-based controls. Ideally, the same population controls should be used when comparing any
estimates. In reality, the use of the same population controls is not practical when comparing trend data over
a period of 10 to 20 years. Thus, when it is necessary to combine data or compare data based on different
controls or different designs, data users should be aware that changes in weighting controls or weighting
procedures can create small differences between estimates. See the following discussion for information
on comparing estimates derived from different controls or different sample designs.
Microdata files from previous years reflect the latest available census-based controls. Although the most
recent change in population controls had relatively little impact on summary measures such as averages,
medians, and percentage distributions, it did have a significant impact on levels. For example, use of
Census 2000-based controls results in about a one percent increase from the 1990 census-based controls
in the civilian noninstitutional population and in the number of families and households. Thus, estimates
of levels for data collected in 2003 and later years will differ from those for earlier years by more than what
could be attributed to actual changes in the population. These differences could be disproportionately
greater for certain population subgroups than for the total population.
Note that certain microdata files from 2002, namely June, October, and November, and the 2002 ASEC,
contain both Census 2000-based estimates and 1990 census-based estimates and are subject to the
comparability issues discussed previously. All other microdata files from 2002 reflect the 1990 census-based controls.
Users should also exercise caution because of changes caused by the phase-in of the Census 2000 files
(see “Basic CPS”). During this time period, CPS data are collected from sample designs based on
different censuses. Three features of the new CPS design have the potential to affect published
estimates: (1) the temporary disruption of the rotation pattern from August 2004 through June 2005 for a
comparatively small portion of the sample, (2) the change in sample areas, and (3) the introduction of the
new Core-Based Statistical Areas (formerly called metropolitan areas). Most of the known effect on
estimates during and after the sample redesign will be the result of changing from 1990 to 2000
geographic definitions. Research has shown that the national-level estimates of the metropolitan and
nonmetropolitan populations should not change appreciably because of the new sample design. However,
users should still exercise caution when comparing metropolitan and nonmetropolitan estimates across
years with a design change, especially at the state level.
Caution should also be used when comparing Hispanic estimates over time. No independent population
control totals for people of Hispanic origin were used before 1985.
A Nonsampling Error Warning. Since the full extent of the nonsampling error is unknown, one should
be particularly careful when interpreting results based on small differences between estimates. The
Census Bureau recommends that data users incorporate information about nonsampling errors into their
analyses, as nonsampling error could impact the conclusions drawn from the results. Caution should also
be used when interpreting results based on a relatively small number of cases. Summary measures (such
as medians and percentage distributions) probably do not reveal useful information when computed on a
subpopulation smaller than 75,000.
For additional information on nonsampling error including the possible impact on CPS data when known,
refer to references [2] and [3].
Standard Errors and Their Use. The sample estimate and its standard error enable one to construct a
confidence interval. A confidence interval is a range about a given estimate that has a specified
probability of containing the average result of all possible samples. For example, if all possible samples
were surveyed under essentially the same general conditions and using the same sample design, and if an
estimate and its standard error were calculated from each sample, then approximately 90 percent of the
intervals from 1.645 standard errors below the estimate to 1.645 standard errors above the estimate would
include the average result of all possible samples.
A particular confidence interval may or may not contain the average estimate derived from all possible
samples, but one can say with specified confidence that the interval includes the average estimate
calculated from all possible samples.
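The 1.645 multiplier is the standard normal quantile that leaves 5 percent in each tail. A minimal sketch using Python's standard library; the function name, estimate, and standard error in the example are hypothetical. The same function with level=0.68 gives a multiplier of about 1, which is why the median procedure in this statement adds and subtracts one standard error.

```python
from statistics import NormalDist


def confidence_interval(estimate: float, standard_error: float,
                        level: float = 0.90) -> tuple[float, float]:
    """Two-sided normal-theory confidence interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.645 for level=0.90
    return estimate - z * standard_error, estimate + z * standard_error


low, high = confidence_interval(500_000, 20_000)   # hypothetical inputs
print(f"{low:,.0f} to {high:,.0f}")
```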
Standard errors may also be used to perform hypothesis testing, a procedure for distinguishing between
population parameters using sample estimates. The most common type of hypothesis is that the
population parameters are different. An example of this would be comparing the percentage of men who
were part-time workers to the percentage of women who were part-time workers.
Tests may be performed at various levels of significance. A significance level is the probability of
concluding that the characteristics are different when, in fact, they are the same. For example, to
conclude that two characteristics are different at the 0.10 level of significance, the absolute value of the
estimated difference between characteristics must be greater than or equal to 1.645 times the standard
error of the difference.
The Census Bureau uses 90-percent confidence intervals and 0.10 levels of significance to determine
statistical validity. Consult standard statistical textbooks for alternative criteria.
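The decision rule above can be written as a one-line test. This sketch assumes the standard error of the difference has already been obtained (for example, with the formula for differences given in this statement); the function name and the values in the example are hypothetical.

```python
Z_010 = 1.645  # critical value for a two-sided test at the 0.10 level


def significantly_different(estimate_1: float, estimate_2: float,
                            se_difference: float) -> bool:
    """True if |estimate_1 - estimate_2| >= 1.645 * SE of the difference."""
    return abs(estimate_1 - estimate_2) >= Z_010 * se_difference


print(significantly_different(2.0, 0.8, 0.31))   # True: 1.2 >= 0.51
print(significantly_different(1.0, 0.8, 0.31))   # False: 0.2 < 0.51
```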
Estimating Standard Errors. The Census Bureau uses replication methods to estimate the standard
errors of CPS estimates. These methods primarily measure the magnitude of sampling error. However,
they do measure some effects of nonsampling error as well. They do not measure systematic biases in the
data associated with nonsampling error. Bias is the average over all possible samples of the differences
between the sample estimates and the true value.
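Replication estimates the variance by recomputing an estimate with each of several sets of replicate weights and measuring how much the replicate estimates scatter around the full-sample estimate. This statement does not give the replication formula, so the sketch assumes the successive difference replication form, Var = (4/R) Σ (x_r − x_0)², which CPS replicate-weight documentation commonly describes with R = 160 replicates. The function name is hypothetical, and the four replicate estimates in the example are made-up toy values.

```python
import math
from typing import Optional


def replicate_variance(full_sample_estimate: float,
                       replicate_estimates: list[float],
                       coefficient: Optional[float] = None) -> float:
    """Replicate variance estimate: coefficient * sum of squared deviations.

    By default the coefficient is 4 / R, the successive difference replication
    factor (an assumption here; the statement does not give the formula).
    """
    r = len(replicate_estimates)
    if coefficient is None:
        coefficient = 4.0 / r
    return coefficient * sum((x - full_sample_estimate) ** 2
                             for x in replicate_estimates)


# Toy example: a full-sample estimate and four replicate re-estimates.
variance = replicate_variance(100.0, [101.0, 99.0, 102.0, 98.0])
print(variance, math.sqrt(variance))
```

As the text notes, a variance computed this way reflects sampling error and some nonsampling error, but not systematic bias.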
Generalized Variance Parameters. While it is possible to compute and present an estimate of the
standard error based on the survey data for each estimate in a report, there are a number of reasons why
this is not done. A presentation of the individual standard errors would be of limited use, since one could
not possibly predict all of the combinations of results that may be of interest to data users. Additionally,
data users have access to CPS microdata files, and it is impossible to compute in advance the standard
error for every estimate one might obtain from those data sets. Moreover, variance estimates are based on
sample data and have variances of their own. Therefore, some methods of stabilizing these estimates of
variance, for example, by generalizing or averaging over time, may be used to improve their reliability.
Experience has shown that certain groups of estimates have similar relationships between their variances
and expected values. Modeling or generalizing may provide more stable variance estimates by taking
advantage of these similarities. The generalized variance function is a simple model that expresses the
variance as a function of the expected value of the survey estimate. The parameters of the generalized
variance function are estimated using direct replicate variances. These generalized variance parameters
provide a relatively easy method to obtain approximate standard errors for numerous characteristics. In
this source and accuracy statement, Table 3 provides the generalized variance parameters for labor force
estimates and estimates from the January 2008 supplement.
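Dividing the model variance a x² + b x by x² gives the relative variance a + b/x, which is linear in 1/x. A minimal sketch of estimating a and b from direct replicate variances, assuming ordinary least squares on that linear form; the fitting method used in production is not described in this statement and may differ (for example, by weighting). The function name is hypothetical, and the example uses synthetic variances generated from the Table 3 parameters for men, so the fit should recover them.

```python
def fit_gvf(estimates: list[float], variances: list[float]) -> tuple[float, float]:
    """Fit relvariance = a + b / x by ordinary least squares; return (a, b)."""
    z = [1.0 / x for x in estimates]                              # predictor: 1/x
    rel = [v / (x * x) for x, v in zip(estimates, variances)]    # response: relvariance
    n = len(z)
    z_bar = sum(z) / n
    rel_bar = sum(rel) / n
    s_zz = sum((zi - z_bar) ** 2 for zi in z)
    s_zr = sum((zi - z_bar) * (ri - rel_bar) for zi, ri in zip(z, rel))
    b = s_zr / s_zz
    a = rel_bar - b * z_bar
    return a, b


# Synthetic check: variances generated exactly from a = -0.000032, b = 2,971.
true_a, true_b = -0.000032, 2_971
xs = [100_000, 500_000, 1_000_000, 4_000_000, 10_000_000]
direct_variances = [true_a * x * x + true_b * x for x in xs]
a_hat, b_hat = fit_gvf(xs, direct_variances)
print(a_hat, b_hat)
```

With real replicate variances the points scatter around the fitted line, and the fitted parameters smooth that scatter, which is the stabilizing effect described above.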
The basic CPS questionnaire records the race and ethnicity of each respondent. With respect to race, a
respondent can be White, Black, Asian, American Indian or Alaska Native (AIAN), Native Hawaiian or
Other Pacific Islander (NHOPI), or combinations of two or more of the preceding. A respondent’s
ethnicity can be Hispanic or non-Hispanic, regardless of race.
The generalized variance parameters to use in computing standard errors are dependent upon the
race/ethnicity group of interest. The following table summarizes the relationship between the
race/ethnicity group of interest and the generalized variance parameters to use in standard error
calculations.
Table 2. Estimation Groups of Interest and Generalized Variance Parameters

Race/ethnicity group of interest                              Generalized variance parameters
                                                              to use in standard error calculations
Total population                                              Total or White
Total White, White AOIC, or White non-Hispanic population     Total or White
Total Black, Black AOIC, or Black non-Hispanic population     Black
Total Asian, AIAN, NHOPI; Asian, AIAN, NHOPI AOIC;            Asian, AIAN, NHOPI
  or Asian, AIAN, NHOPI non-Hispanic population
Populations from other race groups                            Asian, AIAN, NHOPI
Hispanic population                                           Hispanic
Two or more races – employment/unemployment and               Black
  educational attainment characteristics
Two or more races – all other characteristics                 Asian, AIAN, NHOPI
Notes: (1) AIAN, NHOPI are American Indian and Alaska Native, Native Hawaiian and Other Pacific
Islander, respectively.
(2) AOIC is an abbreviation for alone or in combination. The AOIC population for a race group of interest
includes people reporting only the race group of interest (alone) and people reporting multiple race
categories including the race group of interest (in combination).
(3) Hispanics may be any race.
(4) Two or more races refers to the group of cases self-classified as having two or more races.
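Table 2 amounts to a lookup from the group of interest to a parameter set in Table 3. A sketch with hypothetical group keys and characteristic labels (these are illustrative names, not codes from the microdata file):

```python
# Hypothetical group keys; the mapping follows Table 2.
PARAMETER_SET_BY_GROUP = {
    "total": "Total or White",
    "white": "Total or White",               # White, White AOIC, White non-Hispanic
    "black": "Black",                        # Black, Black AOIC, Black non-Hispanic
    "asian_aian_nhopi": "Asian, AIAN, NHOPI",
    "other_race": "Asian, AIAN, NHOPI",
    "hispanic": "Hispanic",
}

# Characteristics for which Two or more races uses the Black parameters.
BLACK_PARAMETER_CHARACTERISTICS = {"employment", "unemployment", "educational_attainment"}


def parameter_set(group: str, characteristic: str = "") -> str:
    """Return the Table 3 parameter set to use for a race/ethnicity group."""
    if group == "two_or_more_races":
        if characteristic in BLACK_PARAMETER_CHARACTERISTICS:
            return "Black"
        return "Asian, AIAN, NHOPI"
    return PARAMETER_SET_BY_GROUP[group]


print(parameter_set("two_or_more_races", "unemployment"))   # Black
print(parameter_set("two_or_more_races", "income"))         # Asian, AIAN, NHOPI
```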
Standard Errors of Estimated Numbers. The approximate standard error, $s_x$, of an estimated number
from this microdata file can be obtained by using the formula:
$$s_x = \sqrt{a x^2 + b x} \tag{1}$$
Here x is the size of the estimate and a and b are the parameters in Table 3 associated with the particular
type of characteristic. When calculating standard errors from cross-tabulations involving different
characteristics, use the set of parameters for the characteristic that will give the largest standard error.
Illustration 1
Suppose there were 4,075,000 unemployed men in the civilian labor force. Use the appropriate
parameters from Table 3 and Formula (1) to get
Illustration 1
Number of unemployed males in the civilian labor force (x)    4,075,000
a parameter (a)                                               -0.000032
b parameter (b)                                               2,971
Standard error                                                108,000
90-percent confidence interval                                3,897,000 to 4,253,000
The standard error is calculated as
$$s_x = \sqrt{-0.000032 \times 4{,}075{,}000^2 + 2{,}971 \times 4{,}075{,}000} \approx 108{,}000$$
The 90-percent confidence interval is calculated as 4,075,000 ± 1.645 × 108,000.
A conclusion that the average estimate derived from all possible samples lies within a range computed in
this way would be correct for roughly 90 percent of all possible samples.
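Formula (1) and Illustration 1 can be reproduced with a short script. The a and b values are the Table 3 parameters for men; following the illustration, the standard error is rounded to the nearest thousand before the interval is formed. The function name is hypothetical.

```python
import math

Z90 = 1.645  # multiplier used in this statement for 90-percent intervals


def se_number(x: float, a: float, b: float) -> float:
    """Formula (1): approximate standard error of an estimated number x."""
    return math.sqrt(a * x * x + b * x)


# Illustration 1: unemployed men, "Men" parameters from Table 3.
x = 4_075_000
a, b = -0.000032, 2_971
se = se_number(x, a, b)
se_reported = round(se, -3)            # the illustration reports the nearest thousand
low = round(x - Z90 * se_reported, -3)
high = round(x + Z90 * se_reported, -3)
print(f"standard error ~ {se:,.0f}, reported as {se_reported:,.0f}")
print(f"90-percent CI: {low:,.0f} to {high:,.0f}")
```

Using the unrounded standard error (about 107,589) instead would move the lower limit to about 3,898,000, so the rounding step matters at the last reported digit.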
Standard Errors of Estimated Percentages. The reliability of an estimated percentage, computed using
sample data for both numerator and denominator, depends on both the size of the percentage and its base.
Estimated percentages are relatively more reliable than the corresponding estimates of the numerators of
the percentages, particularly if the percentages are 50 percent or more. When the numerator and
denominator of the percentage are in different categories, use the parameter from Table 3 as indicated by
the numerator.
The approximate standard error, $s_{y,p}$, of an estimated percentage can be obtained by using the formula:
$$s_{y,p} = \sqrt{\frac{b}{y}\, p\,(100 - p)} \tag{2}$$
Here y is the total number of people, families, households, or unrelated individuals in the base of the
percentage, p is the percentage (0 ≤ p ≤ 100), and b is the parameter in Table 3 associated with the
characteristic in the numerator of the percentage.
Illustration 2
Suppose of 8,338,000 displaced workers, 3,109,000, or 37.3 percent, lost their jobs when a plant or
company closed down or moved. Use the appropriate parameter from Table 3 and Formula (2) to get
Illustration 2
Percentage of displaced workers who lost their jobs when a
  plant or company closed down or moved (p)                   37.3
Base (y)                                                      8,338,000
b parameter (b)                                               3,096
Standard error                                                0.93
90-percent confidence interval                                35.8 to 38.8
The standard error is calculated as
$$s_{y,p} = \sqrt{\frac{3{,}096}{8{,}338{,}000} \times 37.3 \times (100 - 37.3)} \approx 0.93$$
The 90-percent confidence interval for the percentage of displaced workers who lost their jobs when a
plant or company closed down or moved is from 35.8 to 38.8 percent (i.e., 37.3 ± 1.645 × 0.93).
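Formula (2) with the Illustration 2 inputs, as a sketch (the function name is hypothetical):

```python
import math

Z90 = 1.645


def se_percent(p: float, base: float, b: float) -> float:
    """Formula (2): standard error of a percentage p (0-100) with base y."""
    return math.sqrt(b / base * p * (100 - p))


# Illustration 2: displaced workers whose plant or company closed down or moved.
p, base, b = 37.3, 8_338_000, 3_096
se = se_percent(p, base, b)
low, high = p - Z90 * se, p + Z90 * se
print(f"standard error ~ {se:.2f}")
print(f"90-percent CI: {low:.1f} to {high:.1f}")
```

Because p(100 − p) peaks at p = 50, a percentage near 50 carries the largest standard error for a given base and parameter.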
Standard Errors of Estimated Differences. The standard error of the difference between two sample
estimates is approximately equal to
$$s_{x_1 - x_2} = \sqrt{s_{x_1}^2 + s_{x_2}^2} \tag{3}$$
where $s_{x_1}$ and $s_{x_2}$ are the standard errors of the estimates $x_1$ and $x_2$. The estimates can be numbers,
percentages, ratios, etc. This will result in accurate estimates of the standard error of the difference for the same
characteristic in two different areas, or for the difference between separate and uncorrelated
characteristics in the same area. However, if there is a high positive (negative) correlation between the
two characteristics, the formula will overestimate (underestimate) the true standard error.
Illustration 3
Suppose that of 8,732,000 employed men 25 to 29 years of age, 73,000, or 0.8 percent, were part-time
workers, and of the 7,305,000 employed women 25 to 29 years of age, 137,000, or 2.0 percent, were
part-time workers. Use the appropriate parameters from Table 3 and Formulas (2) and (3) to get
Illustration 3
                                              Male (x1)     Female (x2)   Difference
Percentage 25-29 years working part-time (p)  0.8           2.0           1.2
Base                                          8,732,000     7,305,000
b parameter (b)                               2,971         2,782
Standard error                                0.16          0.27          0.31
90-percent confidence interval                0.5 to 1.1    1.6 to 2.4    0.7 to 1.7
The standard error of the difference is calculated as
$$s_{x_1 - x_2} = \sqrt{0.16^2 + 0.27^2} \approx 0.31$$
The 90-percent confidence interval around the difference is calculated as 1.2 ± 1.645 × 0.31. Since this
interval does not include zero, we can conclude with 90 percent confidence that the percentage of part-time
women workers 25 to 29 years of age is greater than the percentage of part-time men workers
25 to 29 years of age.
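Formulas (2) and (3) together reproduce Illustration 3. Following the illustration, each group's standard error is rounded to two decimals before the two are combined; the difference formula assumes the two estimates are uncorrelated, as discussed above. Function names are hypothetical.

```python
import math

Z90 = 1.645


def se_percent(p: float, base: float, b: float) -> float:
    """Formula (2): standard error of a percentage p (0-100)."""
    return math.sqrt(b / base * p * (100 - p))


def se_difference(se_1: float, se_2: float) -> float:
    """Formula (3): standard error of a difference of uncorrelated estimates."""
    return math.sqrt(se_1 ** 2 + se_2 ** 2)


# Illustration 3: part-time workers aged 25 to 29.
se_men = round(se_percent(0.8, 8_732_000, 2_971), 2)      # Table 3, Men
se_women = round(se_percent(2.0, 7_305_000, 2_782), 2)    # Table 3, Women
diff = 2.0 - 0.8
se_diff = se_difference(se_men, se_women)
low, high = diff - Z90 * se_diff, diff + Z90 * se_diff
print(se_men, se_women, round(se_diff, 2))
print(f"90-percent CI for the difference: {low:.1f} to {high:.1f}")
print("interval excludes zero:", low > 0)
```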
Standard Errors of Estimated Medians. The sampling variability of an estimated median depends on
the form of the distribution and the size of the base. One can approximate the reliability of an estimated
median by determining a confidence interval about it. (See “Standard Errors and Their Use” for a general
discussion of confidence intervals.)
Estimate the 68-percent confidence limits of a median based on sample data using the following
procedure.
1. Determine, using Formula (2), the standard error of the estimate of 50 percent from the
   distribution.
2. Add to and subtract from 50 percent the standard error determined in step 1. These two numbers
   are the percentage limits corresponding to the 68-percent confidence interval about the estimated
   median.
3. Using the distribution of the characteristic, determine upper and lower limits of the
   68-percent confidence interval by calculating values corresponding to the two points
   established in step 2.
   Note: The percentage limits found in step 2 may or may not fall in the same characteristic
   distribution interval.
   Use the following formula to calculate the upper and lower limits.
   $$X_p = \frac{pN - N_L}{N_U - N_L}\,(U - L) + L \tag{4}$$
   where
   X_p = estimated upper and lower limits for the confidence interval (0 ≤ p ≤ 1). For purposes of
         calculating the confidence interval, p takes on the values determined in step 2. Note that
         X_p estimates the median when p = 0.50.
   N = for distribution of numbers: the total number of units (people, households, etc.) for the
       characteristic in the distribution;
       for distribution of percentages: the value 100.
   p = the values obtained in step 2, expressed as proportions.
   L, U = the lower and upper boundaries, respectively, of the interval containing X_p.
       Note: For continuous data (e.g., income or time), the upper bound of the interval containing
       X_p and the lower bound of the next interval are essentially the same and are treated as
       such in the illustration.
   N_L, N_U = for distribution of numbers: the estimated number of units (people, households, etc.)
       with values of the characteristic less than L and U, respectively;
       for distribution of percentages: the estimated percentage of units (people, households, etc.)
       having values of the characteristic less than L and U, respectively.
4. Divide the difference between the two points determined in step 3 by 2 to obtain the standard error
   of the median.
Note: Medians and their standard errors calculated as below may differ from those in published tables
and reports showing medians, since narrower income intervals were used in those calculations.
Illustration 4
Suppose you want to calculate the standard error of the estimated median of years on the lost job for all
displaced workers with the following distribution:

Illustration 4
Years on lost job    Number of persons    Cumulative number of persons    Cumulative percentage of persons
<1                   1,847,000            1,847,000                       23.83
1-2.99               2,199,000            4,046,000                       52.21
3-4.99               1,257,000            5,303,000                       68.43
5-9.99               1,267,000            6,570,000                       84.77
10-14.99             527,000              7,097,000                       91.57
15-19.99             270,000              7,367,000                       95.06
20+                  383,000              7,750,000                       100.00
Total                7,750,000
1. Using Formula (2) with b = 3,096 from Table 3, the standard error of 50 percent with a base of
   7,750,000 is 1.00 percent.
2. To obtain a 68-percent confidence interval on an estimated median, add to and subtract from 50
   percent the standard error found in step 1. This yields percentage limits of 49.00 and 51.00.
3. The lower and upper boundaries of the interval in which the percentage limits fall are L = 1 year and U = 3 years, respectively.
The estimated numbers of displaced workers with less than 1 year and less than 3 years on the lost job are NL = 1,847,000 and NU = 4,046,000, respectively.
Using Formula (4), the lower limit for the confidence interval of the median is found to be about

$$X_{0.4900} = \frac{0.4900 \times 7{,}750{,}000 - 1{,}847{,}000}{4{,}046{,}000 - 1{,}847{,}000} \times (3 - 1) + 1 \approx 2.77$$

Similarly, the upper limit is found to be about

$$X_{0.5100} = \frac{0.5100 \times 7{,}750{,}000 - 1{,}847{,}000}{4{,}046{,}000 - 1{,}847{,}000} \times (3 - 1) + 1 \approx 2.91$$
Thus, a 68-percent confidence interval for the median number of years on the lost job for displaced workers is from 2.77 to 2.91.
4. The standard error of the median is, therefore,

$$\frac{2.91 - 2.77}{2} = 0.07$$
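Step 1 states its result without the arithmetic. Assuming Formula (2) has the usual CPS form for the standard error of a percentage, with x the base and p the percentage (the formula is not restated in this part of the attachment), the 1.00-percent figure works out as follows:

```latex
s_{x,p} = \sqrt{\frac{b}{x}\,p\,(100 - p)}
        = \sqrt{\frac{3{,}096}{7{,}750{,}000} \times 50 \times 50}
        \approx \sqrt{0.9987}
        \approx 1.00 \text{ percent}
```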
Standard Errors of Quarterly or Yearly Averages. For information on calculating standard errors for CPS labor force data that involve quarterly or yearly averages, please see the “Explanatory Notes and Estimates of Error: Household Data” section in Employment and Earnings, a monthly report published by the U.S. Bureau of Labor Statistics.
Technical Assistance. If you require assistance or additional information, please contact the
Demographic Statistical Methods Division via e-mail at dsmd.source.and.accuracy@census.gov.
Table 3. Parameters for Computation of Standard Errors for Labor Force Characteristics: January 2008

Characteristic                                                          a            b
Total or White
  Civilian Labor Force, Employed                                        -0.000016    3,068
  Not in Labor Force                                                    -0.000009    1,833
  Unemployed                                                            -0.000016    3,096
  Civilian Labor Force, Employed, Not in Labor Force, and Unemployed
    Men                                                                 -0.000032    2,971
    Women                                                               -0.000031    2,782
    Both sexes, 16 to 19 years                                          -0.000022    3,096
Black
  Civilian Labor Force, Employed, Not in Labor Force, and Unemployed
    Total                                                               -0.000151    3,455
    Men                                                                 -0.000311    3,357
    Women                                                               -0.000252    3,062
    Both sexes, 16 to 19 years                                          -0.001632    3,455
Hispanic
  Civilian Labor Force, Employed, Not in Labor Force, and Unemployed
    Total                                                               -0.000141    3,455
    Men                                                                 -0.000253    3,357
    Women                                                               -0.000266    3,062
    Both sexes, 16 to 19 years                                          -0.001528    3,455
Asian, AIAN, NHOPI
  Civilian Labor Force, Employed, Not in Labor Force, and Unemployed
    Total                                                               -0.000346    3,198
    Men                                                                 -0.000729    3,198
    Women                                                               -0.000659    3,198
    Both sexes, 16 to 19 years                                          -0.004146    3,198
Notes: (1) These parameters are to be applied to basic CPS monthly labor force estimates and to the January 2008
displaced workers, employee tenure, and occupational mobility supplement data.
(2) AIAN, NHOPI are American Indian and Alaska Native, Native Hawaiian and Other Pacific Islander, respectively.
(3) The Total or White; Black; and Asian, AIAN, NHOPI parameters are to be used for both alone and in combination race group estimates.
(4) For foreign-born and noncitizen characteristics for Total or White, the a and b parameters should be multiplied by
1.3. No adjustment is necessary for foreign-born and noncitizen characteristics for Black; Hispanic; and Asian,
AIAN, NHOPI.
(5) Hispanics may be any race. For a more detailed discussion on the use of parameters for race and ethnicity, please
see the “Generalized Variance Parameters” section.
(6) For nonmetropolitan characteristics, multiply the a and b parameters by 1.5. If the characteristic of interest is total state population, not subtotaled by race or ethnicity, the a and b parameters are zero.
(7) For the group self-classified as having two or more races, use the Black parameters for all characteristics on
employment and unemployment.
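Applying the table is mostly arithmetic on a and b. The sketch that follows is a hypothetical helper, not an official tool: it transcribes only the Total or White rows, scales both parameters by one of the multipliers given in the notes, and assumes Formula (1) has the usual CPS generalized-variance form s_x = sqrt(a x² + b x) and Formula (2) the form s = sqrt((b/x) p (100 − p)). Neither formula is restated in this part of the attachment, so check them against the formulas given earlier before relying on the results.

```python
import math

# (a, b) pairs transcribed from Table 3, Total or White rows only.
TABLE_3_TOTAL_OR_WHITE = {
    "civilian_labor_force_employed": (-0.000016, 3_068),
    "not_in_labor_force": (-0.000009, 1_833),
    "unemployed": (-0.000016, 3_096),
}

# Multipliers from the table notes.
FOREIGN_BORN_FACTOR = 1.3     # foreign-born and noncitizen, Total or White only
NONMETROPOLITAN_FACTOR = 1.5  # nonmetropolitan characteristics


def adjusted(a: float, b: float, factor: float = 1.0) -> tuple[float, float]:
    """Scale both parameters by a note multiplier (1.0 means no adjustment)."""
    return a * factor, b * factor


def se_number(x: float, a: float, b: float) -> float:
    """Assumed Formula (1): standard error of an estimated number x."""
    return math.sqrt(a * x * x + b * x)


def se_percentage(x: float, p: float, b: float) -> float:
    """Assumed Formula (2): standard error of a percentage p (0-100) on base x."""
    return math.sqrt(b / x * p * (100.0 - p))
```

For example, `adjusted(*TABLE_3_TOTAL_OR_WHITE["unemployed"], FOREIGN_BORN_FACTOR)` gives the parameters for a foreign-born unemployment estimate. The notes do not say how to combine the foreign-born and nonmetropolitan multipliers, so the helper applies one factor at a time rather than guessing.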
References
[1] Bureau of Labor Statistics. 1994. Employment and Earnings. Volume 41, Number 5, May 1994. Washington, DC: Government Printing Office.
[2] U.S. Census Bureau. 2006. Current Population Survey: Design and Methodology. Technical Paper 66. Washington, DC: Government Printing Office. (http://www.census.gov/prod/2006pubs/tp-66.pdf)
[3] Brooks, C.A. and Bailar, B.A. 1978. Statistical Policy Working Paper 3 – An Error Profile: Employment as Measured by the Current Population Survey. Subcommittee on Nonsampling Errors, Federal Committee on Statistical Methodology, U.S. Department of Commerce, Washington, DC. (http://www.fcsm.gov/working-papers/spp.html)