Appendix B
National Survey of Children’s Health
Sample Frame and Sampling Flags Creation
2019 National Survey of Children’s Health sample frame
John Voorheis
Center for Economic Studies
US Census Bureau
john.l.voorheis@census.gov
301-763-5326
April 9, 2019
This document describes using administrative records to build a sample frame for the National
Survey of Children’s Health (NSCH) for 2019.
Population of interest
The population of interest is all children residing in housing units in the US on the date of the
survey.
A sample frame for all households with children
The sample frame identifies three mutually exclusive strata:
• [1] Households with explicit links to children in administrative data.
• [2a] Households without explicit links to children in administrative data, but predicted to
be likely to have children conditional on administrative data.
• [2b] Households without explicit links to children in administrative data, but predicted to
be unlikely to have children conditional on administrative data.
This document first explains the construction of the Stratum 1 flag, and then documents the
separation of Strata 2a and 2b.
The U.S. Census Bureau reviewed this data product for unauthorized disclosure of confidential
information and approved the disclosure avoidance practices applied to this release.
CBDRB-FY19-245
Stratum 1: identifying explicit links from children to addresses
The Stratum 1 flag for all households with explicit links to children comes from three data
sources: the Numident, a list of Social Security Number applicants with data updated from
various administrative records; the Census kidlink file, a prototype linkage between
children and parents based on Census and administrative records; and the Master Address
Auxiliary Reference File (MAF-ARF), which links person identifiers with the latest location
updates from a variety of administrative data and is used to update household addresses.
Using the Numident to identify children
The Numident is based on all individuals who have been assigned Social Security
Numbers. Demographic data from the Numident is updated from federal tax data and various
administrative records. There are 87,140,000 children in the 2018 Numident who will be aged
0–17 years on June 1, 2019. Figure 1 shows the distribution of date of birth for these children.
Figure 1: Distribution of date of birth, aged 0–17 years as of June 1, 2019 (2018 Numident)
Identifying the households containing the children in the Numident
To sample households with children, we must connect the children in the Numident to the
households in which they live. We do this with the Census kidlink file.
Census kidlink
The Census kidlink file uses data from Census surveys and federal administrative records to link
child PIKs (Protected Identification Keys) to parent PIKs. We can use this file to identify the
parents of children in the Numident.
The source data for the Census kidlink file are: the Census Numident, the 2010 Census Unedited
File, the IRS 1040 and 1099 files, the Medicare Enrollment Database (MEDB), Indian Health
Service database (IHS), Selective Service System (SSS), and Public and Indian Housing (PIC) and
Tenant Rental Assistance Certification System (TRACS) data from the Department of Housing
and Urban Development. Of these, the IRS 1040 provides the most significant information.
In the Census kidlink file generated March 2018, there are 62,020,000 unique records for
children who will be aged 0–17 years on June 1, 2019.
In addition to the links between parents and children available in the Census kidlink file, we
also use the links between household members that can be measured in the American
Community Survey, which is not an underlying data source for the Census kidlink file. For each child
in the Numident aged 0–17 on June 1, 2019, we harvest relationships with the head of
household and the spouse of the head of household. We then use these links to supplement
the links in the Census kidlink file.
Let us consider how many children from the Numident have been linked to a parent in the
Census kidlink file or to a parent in the ACS. Table 1 shows the number of children linked with
both a mother and a father, linked with a mother only, linked with a father only, linked with a
parent in the ACS or not linked with any parent.
Table 1: Child-parent links in the Census kidlink file relative to the Numident population, aged
0–17 years as of 2018, March 2018 Census kidlink file and ACS
Type of link                Frequency    Percent
Mother and father          57,920,000    66%
Mother only                15,380,000    18%
Father only                 2,836,000    3.3%
ACS link                       73,100    0.1%
No link                    10,930,000    13%
All children in Numident   87,140,000    100%
Figure 2 compares the distributions of date of birth for these children against the distribution
shown in Figure 1.
Figure 2: Frequency distributions of date of birth, Numident vs. kidlink entries, aged 0–17 years
as of June 1, 2018
The CARRA kidlink file was updated in March 2018 for NSCH sample frame production. We will
use the same CARRA kidlink file for production in 2019. We will, however, supplement this file
with additional parent-child linkages identified in sources which are not used to build the
CARRA kidlink file, including ACS and CPS-ASEC data.
Updating household location using the MAF-ARF
In order to update household location, we use a Census dataset called the Master Address
Auxiliary Reference File (MAF-ARF). The MAF-ARF links person identifiers to address identifiers
using Census survey data and federal administrative data. The source data for the MAF-ARF file
are: the Census Numident, the 2010 Census Unedited File, the IRS 1040 and 1099 files, the
Medicare Enrollment Database (MEDB), Indian Health Service database (IHS), Selective Service
System (SSS), and Public and Indian Housing (PIC) and Tenant Rental Assistance Certification
System (TRACS) data from the Department of Housing and Urban Development, and National
Change of Address data from the US Postal Service. Of these, the IRS 1040 provides the most
significant information.
Out of 87,140,000 children in the Numident (see footnote 1), 72,090,000 are matched directly to a MAFID. Out
of 73,300,000 kidlink-matched mothers, about 66,900,000 are matched to a MAFID. Out of
60,750,000 kidlink-matched fathers, about 55,440,000 are matched to a MAFID. Additionally,
out of 9,430,000 ACS-matched parents, 8,799,000 are matched to a MAFID.
For each child observation from the Numident, we now have multiple possible MAFIDs: the
child-to-MAF-ARF MAFID, the child-to-kidlink-to-mother-to-MAF-ARF MAFID, the
child-to-kidlink-to-father-to-MAF-ARF MAFID, and the child-to-ACS-parent-to-MAF-ARF MAFID. We allocate a
single MAFID to each child using that order. First, we assign the directly identified child MAFID
(69,380,000 cases). If the MAFID is missing, we assign the mother MAFID (5,968,000 cases).
Then, if the MAFID is still missing, we assign the father MAFID (2,218,000 cases). Finally, if the
child, kidlink mother and kidlink father MAFIDs are missing, we assign the ACS parent MAFID
(30,500 cases). That leaves 9,542,000 children from the Numident not assigned MAFIDs (a
MAFID match rate of 89.1%).
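The allocation order described above amounts to taking the first non-missing MAFID from a fixed priority list. The following Python sketch illustrates that rule on a handful of made-up records; the field names (`child`, `mother`, `father`, `acs_parent`) and the MAFID values are hypothetical and are not taken from the production files. The last lines simply re-add the rounded case counts reported in the text.

```python
# Candidate MAFID sources, in the priority order described in the text.
PRIORITY = ("child", "mother", "father", "acs_parent")


def assign_mafid(candidates):
    """Return (mafid, source) for the first non-missing candidate, else (None, None)."""
    for source in PRIORITY:
        mafid = candidates.get(source)
        if mafid is not None:
            return mafid, source
    return None, None


# Hypothetical child records; None marks a missing MAFID.
children = [
    {"child": 101, "mother": 101, "father": None, "acs_parent": None},
    {"child": None, "mother": 202, "father": 303, "acs_parent": None},
    {"child": None, "mother": None, "father": 303, "acs_parent": 404},
    {"child": None, "mother": None, "father": None, "acs_parent": 404},
    {"child": None, "mother": None, "father": None, "acs_parent": None},
]
assigned = [assign_mafid(c) for c in children]

# Arithmetic check on the rounded counts reported in the text.
reported = {"child": 69_380_000, "mother": 5_968_000,
            "father": 2_218_000, "acs_parent": 30_500}
total_assigned = sum(reported.values())      # 77,596,500, i.e. about 77.6 million
match_rate = total_assigned / 87_140_000     # about 0.89
```

Because the published counts are rounded to four significant figures, the recomputed total and match rate agree with the text only approximately.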
Some MAFIDs are associated with a large number of children. For example, out of the
77,600,000 children associated with a MAFID, 7,231,000 are associated with a MAFID with
more than 20 child-MAFID links.
The 77,600,000 children associated with a MAFID are then collapsed down to 40,020,000
unique MAFIDs. This implies 1.94 children per household for households assigned a flag.
For 2019, we apply one additional step in the construction of stratum 1. We use administrative
HUD PIC and TRACS data, which contain flags for the number of children present at the
household level for all public housing and voucher households, to enhance the existing stratum
1 process. We merge all MAFIDs not assigned a stratum 1 flag using the above kidlink-MAF-ARF
process with the most recent data on all public housing and voucher households in the PIC-TRACS
data. We will then assign a stratum 1 flag to all households which have a child-present
flag in the HUD data. This adds 215,000 households to stratum 1.
We then need to scale up the MAFID list to the universe of MAFIDs to allow sampling of
unflagged households. A merge of the 40,020,000 unique child-flagged MAFIDs with the
January 2018 ACS and 2019 MAF-X file matches 40,000,000 MAFIDs with child flags, adds
164,200,000 MAFIDs without child flags, and removes 19,000 MAFIDs with child flags. The
sample frame file now has about 203 million valid MAFIDs. Compare this with the 2011 ACS, in
which about 37 million out of 115 million households included related children (see footnote 2).
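The merge against the address universe can be thought of as three set operations: flagged MAFIDs that are also valid stay in the frame with their child flag, valid MAFIDs without a flag are added, and flagged MAFIDs that are not valid are dropped. The Python sketch below shows this on toy integer identifiers; the function name and the data are illustrative assumptions, and the production merge is not described at this level of detail in the text.

```python
def merge_with_universe(child_flagged, universe):
    """Split MAFIDs into matched, added, and removed sets and build the frame."""
    matched = child_flagged & universe   # flagged and valid: keep the child flag
    added = universe - child_flagged     # valid but unflagged: enter with flag 0
    removed = child_flagged - universe   # flagged but not valid: dropped
    frame = {mafid: int(mafid in matched) for mafid in sorted(universe)}
    return matched, added, removed, frame


# Toy identifiers standing in for MAFIDs.
flagged = {1, 2, 3, 4}
valid_universe = {2, 3, 4, 5, 6, 7, 8}
matched, added, removed, frame = merge_with_universe(flagged, valid_universe)
```

In the reported counts, 40,020,000 flagged MAFIDs less 19,000 removed gives 40,001,000, which is consistent with the 40,000,000 matched MAFIDs once rounded to four significant figures.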
Footnote 1: All unweighted counts and estimates in this document are rounded to no more than four significant figures in
accordance with Census Disclosure Review Board rules on rounding.
Footnote 2: http://www.census.gov/prod/2013pubs/p20-570.pdf
Stratum 1 construction visualization
Figure 3 shows a visualization of the sample frame construction.
Figure 3: Stratum 1 Construction
Strata 2a and 2b: identifying probabilistic links from children to addresses
In 2016, the Stratum 1 flag performed well. That is, it contained approximately the same rate of
children after sampling as had been predicted before the survey. The survey team would like
to further increase the sampling efficiency of the survey by adding more information to the
second stratum. By definition, Stratum 2 does not have explicit links from children to
households in the administrative data. In 2019 as in 2017 and 2018, we will further bifurcate
Stratum 2 into those households more likely to have children and those households less likely
to have children.
Households will be assigned to Stratum 2a based on a model of child presence as a function of
variables available in administrative data for all households in the MAF. The model is estimated
with data from the most recent year of the ACS, in which child presence can be observed. Then
parameter estimates from that model can be used to predict the likelihood of child presence for
all households. These models are estimated separately for each state, and the threshold for
bifurcation is based on an objective of minimizing the size of Stratum 2a while also maintaining
95% coverage of children in Strata 1 and 2a.
Definitions
Population or sample concepts
• 2017 ACS sample, edited and swapped
– unit of observation is the household, unless noted otherwise
– sample includes sampled vacant dwellings, unless noted otherwise
• MAF
– population but restricted to MAFIDs marked as valid for ACS
Sample frame notation
• h indexes households
• s indexes states
• C equals 1 if a household has any children, 0 otherwise
• Strata:
– S1: household with children
– S2a: household likely to have children
– S2b: household unlikely to have children
• Strata sizes:
– p(S1)
– p(S2a)
– p(S2b)
• Strata child rates:
– p(C|S1)
– p(C|S2a)
– p(C|S2b)
• Coverage with unsampled S2b:
– p(S1 ∪ S2a|C)
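To make the notation concrete, the coverage quantity p(S1 ∪ S2a|C) can be written in terms of the strata sizes and strata child rates using Bayes' rule: each stratum contributes p(S) × p(C|S) to the overall child rate p(C). The Python sketch below illustrates this with invented strata sizes and child rates; the numbers are hypothetical and the identity assumes sizes and rates are measured on the same (weighted) basis.

```python
def coverage(sizes, child_rates):
    """p(S1 ∪ S2a | C): share of child households reached when S2b is not sampled."""
    # Joint probability p(S and C) = p(S) * p(C|S) for each stratum.
    child_mass = {s: sizes[s] * child_rates[s] for s in sizes}
    p_child = sum(child_mass.values())   # overall child rate p(C)
    return (child_mass["S1"] + child_mass["S2a"]) / p_child


# Hypothetical strata sizes and child rates, for illustration only.
sizes = {"S1": 0.25, "S2a": 0.45, "S2b": 0.30}
rates = {"S1": 0.80, "S2a": 0.15, "S2b": 0.05}
cov = coverage(sizes, rates)   # 0.2675 / 0.2825
```

With these illustrative inputs, coverage falls just short of 95%, so a frame like this one would need a larger S2a (a lower threshold) to satisfy the constraint.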
Model
Our goal is a scalar measure of the likelihood of a child being associated with a MAFID. This
measure must be available for all ACS-valid MAFIDs in the MAF. Using a sample in which the
presence of children is observable, we will estimate a model of child presence. The regressors
used to make the index prediction must be observable for all MAFIDs (i.e., to predict outside of
the estimation sample to the entire MAF).
The general model is:
$C_h = f(X_h; \theta)$,
where $C$ is equal to one if a household includes any children and zero otherwise, $X$ is a vector of
characteristics available for all households, and $\theta$ is an unknown vector of parameters.
We estimate the model using the most recent ACS 1-year sample:
$E[C_h \mid X_h] = f(X_h; \hat{\beta}_{ACS})$ for households h in the ACS.
With parameter estimates from the ACS, we make predictions for the entire MAF:
$\hat{C}_h = f(X_h; \hat{\beta}_{ACS})$ for households h in the MAF.
In practice, we estimate models separately for each state. We do this to account for systematic
differences in administrative records coverage and MAF quality across states. The model can
now be specified as:
$E[C_{hs} \mid X_{hs}] = f(X_{hs}; \hat{\beta}_{s,ACS})$ for households h in state s in the ACS,
where s is the MAFID’s state and the parameters $\hat{\beta}_{s,ACS}$ now vary across states. The
state-specific predictions become:
$\hat{C}_{hs} = f(X_{hs}; \hat{\beta}_{s,ACS})$ for households h in state s in the MAF.
Estimation
The model above is estimated as a linear probability model separately for each state using the
edited and swapped 2017 ACS sample. The outcome is child_present, a flag for whether a child
is present at the sampled MAFID.
The following covariates are included (with associated data sources) and are available for each
MAFID (except where a missingness flag is used):
• 2017 ACS 5-year published aggregate data
– acs_blkgrp_childrate_lvout: proportion of residents of block group who are children,
excluding the own-observation child counts from the numerator and denominator
• MAF-ARF
– female2050: flag for female between ages 20 and 50 at MAFID
– adult2050: flag for adults between ages 20 and 50 at MAFID
– coresid_sexdiff: flag for coresidence of men and women between ages 20 and 50 at
MAFID
– miss_adult2050: flag for missingness from MAF-ARF
• IRS 1040 filings, tax year 2017
– any_kid_deduct_max: does any tax form associated with this MAFID have any
deduction related to children? (see footnote 3)
– itemized_max: does any tax form associated with this MAFID use itemized
deductions?
– miss_any_kid_deduct_max: flag for MAFIDs without associated tax forms
• VSGI NAR commercial data
– vsgi_nar_homeowner_max: does any observation associated with this MAFID record
it as homeowner-occupied?
– miss_vsgi_nar_homeowner_max: flag for MAFIDs without associated VSGI data
• Targus commercial data
– targus_homeowner_0: various flags for homeowner-occupied MAFID
– targus_homeowner_A: various flags for homeowner-occupied MAFID
– targus_homeowner_B: various flags for homeowner-occupied MAFID
– targus_homeowner_C: various flags for homeowner-occupied MAFID
– targus_homeowner_D: various flags for homeowner-occupied MAFID
– targus_homeowner_E: various flags for homeowner-occupied MAFID
– targus_homeowner_F: various flags for homeowner-occupied MAFID
– miss_targus_homeowner: flag for MAFIDs without associated Targus data
Footnote 3: The following IRS variables were used to construct any_kid_deduct_max: child exemptions and EITC
qualifying children.
Parameter estimates are stored in the file frame2018_child_present_bystate.csv.
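As an illustration of the estimation step, the sketch below fits a separate linear probability model (ordinary least squares on a 0/1 outcome) for each state and then scores a household. It is a minimal pure-Python sketch under stated assumptions: it uses only two of the covariates listed above, a synthetic sample for two hypothetical states ("A" and "B"), and helper names (`ols`, `fit_by_state`, `predict`, `cell`) that are invented here; it is not the production code or the production specification.

```python
COVARIATES = ("any_kid_deduct_max", "female2050")  # two of the flags listed above


def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting; assumes X'X is nonsingular (no collinear covariates).
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def fit_by_state(rows):
    """Fit one linear probability model per state: {state: [intercept, b1, b2]}."""
    by_state = {}
    for r in rows:
        by_state.setdefault(r["state"], []).append(r)
    betas = {}
    for state, rs in by_state.items():
        X = [[1.0] + [float(r[c]) for c in COVARIATES] for r in rs]
        y = [float(r["child_present"]) for r in rs]
        betas[state] = ols(X, y)
    return betas


def predict(beta, row):
    """Predicted child presence; an LPM prediction is not constrained to [0, 1]."""
    return beta[0] + sum(b * float(row[c]) for b, c in zip(beta[1:], COVARIATES))


def cell(state, kid, fem, n, n_child):
    """n synthetic households sharing covariates, n_child of them with a child."""
    return [
        {"state": state, "any_kid_deduct_max": kid, "female2050": fem,
         "child_present": int(i < n_child)}
        for i in range(n)
    ]


# Synthetic estimation sample for two hypothetical states, A and B.
sample = (
    cell("A", 0, 0, 10, 1) + cell("A", 0, 1, 10, 2)
    + cell("A", 1, 0, 10, 7) + cell("A", 1, 1, 10, 8)
    + cell("B", 0, 0, 10, 2) + cell("B", 0, 1, 10, 3)
    + cell("B", 1, 0, 10, 6) + cell("B", 1, 1, 10, 7)
)
betas = fit_by_state(sample)
score = predict(betas["A"], {"any_kid_deduct_max": 1, "female2050": 1})
```

The synthetic cell means are exactly additive, so the fitted coefficients recover them; with real data the fit would only approximate the cell means, and predictions outside the unit interval are possible.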
Sample frame objective function
In order to choose an optimal Stratum 2a, we use the following objective function:
• Minimize the size of Stratum 2a while maintaining coverage of at least 95%
Stratum 2a is defined as:
S2a = {households in the MAF with $\hat{C}_h > \bar{C}$ but not in S1}.
Stratum 2b is defined as:
S2b = {households in the MAF but not in S1 or S2a}.
With state-specific modeling, the objective function and coverage constraint also become
state-specific:
• Minimize the size of Stratum 2a in each state while maintaining coverage of at least 95% in
each state
State-specific Stratum 2a is defined as:
S2a = {households in the MAF with $\hat{C}_{hs} > \bar{C}_s$ but not in S1}.
Stratum 2b is defined as before.
Optimization algorithm
The optimization parameter is a threshold on the child-present prediction probability, such that
MAFIDs with values above the threshold are assigned to Stratum 2a. Starting at a low threshold
$\bar{C}$ (see footnote 4), follow this algorithm:
1. Under the current threshold $\bar{C}$, calculate the proportion of MAFIDs in Stratum 2a, p(S2a),
and the coverage of Strata 1 and 2a under no sampling of Stratum 2b, p(S1 ∪ S2a|C).
2. If p(S2a) > 0 and p(S1 ∪ S2a|C) ≥ 0.95, then increase the child prediction threshold $\bar{C}$ by one
step (e.g., 0.01) and return to (1). If p(S1 ∪ S2a|C) < 0.95, then the previous threshold $\bar{C}$ is
the optimal cutoff for S2a.
Under state-specific modeling, this algorithm is applied separately to each state.
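The threshold search can be sketched as a simple loop. The Python sketch below follows the two steps above on a toy audit sample for a single state; the record layout (`s1`, `pred`, `child`, `weight`) and the data are hypothetical. One case is not spelled out in the text: if Stratum 2a becomes empty while coverage still meets the target, the sketch stops and returns the current threshold, which is an assumption made here.

```python
def size_and_coverage(households, threshold):
    """Return (p(S2a), p(S1 ∪ S2a | C)) at a given threshold, using weights."""
    def in_2a(h):
        return (not h["s1"]) and h["pred"] > threshold

    total = sum(h["weight"] for h in households)
    total_child = sum(h["weight"] for h in households if h["child"])
    size_2a = sum(h["weight"] for h in households if in_2a(h)) / total
    covered = sum(h["weight"] for h in households
                  if h["child"] and (h["s1"] or in_2a(h)))
    return size_2a, covered / total_child


def optimal_threshold(households, target=0.95, step=0.01, start=0.0):
    """Raise the threshold until coverage would drop below the target."""
    threshold, previous = start, None
    while True:
        size_2a, cov = size_and_coverage(households, threshold)
        if cov < target:
            return previous          # None if even the starting threshold fails
        if size_2a == 0:
            return threshold         # assumption: S1 alone meets the target
        previous = threshold
        threshold = round(threshold + step, 10)  # avoid floating-point drift


def group(n, s1, pred, child):
    """n identical toy households with unit weight."""
    return [{"s1": s1, "pred": pred, "child": child, "weight": 1.0}
            for _ in range(n)]


# Toy audit sample: 100 households with children, 250 without.
households = (
    group(80, True, 0.0, True)        # Stratum 1 households with children
    + group(10, False, 0.90, True)
    + group(6, False, 0.50, True)
    + group(3, False, 0.20, True)
    + group(1, False, 0.05, True)
    + group(50, False, 0.30, False)
    + group(200, False, 0.02, False)
)
cutoff = optimal_threshold(households)
```

In this toy sample, coverage is 96% for thresholds from 0.20 up to 0.49 and drops to 90% at 0.50, so the search returns 0.49. Applying the function separately to each state's households gives the state-specific cutoffs.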
Optimal strata
Table 2 shows the optimal strata under a 95% coverage constraint for Strata 1 and 2a. The
coverage constraint assumes non-sampling of Stratum 2b. The notation is as defined above. The
strata were optimized separately for each state using parameter estimates from separate state
regressions of child presence in the 2016 ACS microdata.
Auditing the sample frame against the ACS
To examine the performance of the administrative records used to build the sample frame, we
merge the list of MAFIDs constructed above with the American Community Survey housing-unit
sample from 2017. Currently, this audit uses unedited ACS data (i.e., item nonresponses, including
for children’s age, are left as missing and are not imputed). If item nonresponse is random with
respect to the presence of children in the household, this should not cause any systematic bias
in the audit.
All estimates are weighted with the housing-unit-level weights, which include weights for vacant
units (210,000 vacant housing units in the 2017 ACS). In vacant housing units, we assign zero
children. These estimates should reflect the NSCH survey production process.
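The weighted strata child rates used in the audit can be illustrated with a short Python sketch. It treats vacant units as having zero children, as described above, and weights each housing unit by its housing-unit weight. The record layout (`stratum`, `weight`, `vacant`, `n_children`) and the values are hypothetical.

```python
def child_rate_by_stratum(units):
    """Weighted share of housing units with a child, by stratum.

    Vacant units count as having zero children.
    """
    num, den = {}, {}
    for u in units:
        has_child = 0 if u["vacant"] else int(u["n_children"] > 0)
        den[u["stratum"]] = den.get(u["stratum"], 0.0) + u["weight"]
        num[u["stratum"]] = num.get(u["stratum"], 0.0) + u["weight"] * has_child
    return {s: num[s] / den[s] for s in den}


# Toy housing units with made-up weights.
units = [
    {"stratum": "S1", "weight": 100.0, "vacant": False, "n_children": 2},
    {"stratum": "S1", "weight": 50.0, "vacant": False, "n_children": 0},
    {"stratum": "S2a", "weight": 80.0, "vacant": True, "n_children": None},
    {"stratum": "S2a", "weight": 20.0, "vacant": False, "n_children": 1},
]
rates = child_rate_by_stratum(units)
```

Here the vacant unit adds weight to the denominator of its stratum but nothing to the numerator, which lowers the estimated child rate for that stratum.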
Footnote 4: The most conservative starting threshold would be at p(S1), where p(S2b) = 0.
State-specific performance
In 2018, the smallest oversample strata were in Hawaii, Maine, Vermont, and West Virginia.
The largest oversample strata were in California, Texas, and Utah. The highest rates of Type 1
error were in DC, Florida, Louisiana, Mississippi, Nevada, and South Carolina. The highest rates of
Type 2 error were in Alaska, Hawaii, New Mexico, Texas, and Utah.
Table 3: NSCH strata, ACS, all addresses audit
State        N  p(S1)  p(S2)  p(S3)  p(C|S1)  p(C|S2)  p(C|S3)  p(C|!S1)  p(!S3|C)
US     2146000  0.22   0.449  0.331  0.763    0.153    0.046    0.112     0.951
AL       35000  0.207  0.529  0.264  0.714    0.128    0.052    0.105     0.95
AK        9100  0.13   0.511  0.359  0.741    0.149    0.188    0.157     0.882
AZ       42000  0.206  0.487  0.308  0.721    0.151    0.05     0.118     0.95
AR       20500  0.205  0.519  0.276  0.744    0.152    0.05     0.12      0.954
CA      202000  0.265  0.382  0.353  0.784    0.193    0.047    0.127     0.952
CO       36000  0.224  0.435  0.341  0.789    0.16     0.04     0.11      0.953
CT       21500  0.222  0.382  0.396  0.771    0.165    0.033    0.104     0.955
DE        6800  0.193  0.335  0.471  0.751    0.152    0.027    0.087     0.953
DC        4400  0.177  0.517  0.307  0.68     0.084    0.03     0.065     0.95
FL      114000  0.198  0.403  0.399  0.685    0.141    0.03     0.09      0.95
GA       52500  0.247  0.454  0.299  0.737    0.169    0.052    0.127     0.954
HI        9300  0.158  0.66   0.182  0.744    0.219    0.077    0.19      0.951
ID       11000  0.223  0.466  0.31   0.763    0.156    0.045    0.113     0.954
IL       90000  0.225  0.437  0.338  0.764    0.154    0.045    0.113     0.953
IN       45000  0.227  0.4    0.374  0.763    0.163    0.041    0.109     0.951
IA       31500  0.196  0.645  0.159  0.806    0.091    0.205    0.102     0.943
KS       24500  0.216  0.433  0.351  0.787    0.151    0.048    0.112     0.95
KY       31500  0.218  0.629  0.153  0.773    0.123    0.093    0.118     0.952
LA       28000  0.226  0.49   0.283  0.676    0.156    0.049    0.121     0.953
ME       15500  0.132  0.558  0.31   0.801    0.077    0.037    0.066     0.95
MD       36000  0.245  0.383  0.371  0.783    0.163    0.043    0.109     0.95
MA       39500  0.213  0.426  0.362  0.806    0.143    0.039    0.099     0.95
MI       95000  0.202  0.378  0.42   0.784    0.14     0.031    0.088     0.953
MN       63500  0.215  0.389  0.396  0.833    0.14     0.037    0.092     0.952
MS       17000  0.226  0.6    0.174  0.692    0.134    0.083    0.124     0.95
MO       46500  0.209  0.45   0.341  0.768    0.145    0.04     0.105     0.953
MT       10500  0.148  0.695  0.157  0.75     0.091    0.142    0.098     0.929
NE       19000  0.21   0.62   0.169  0.809    0.103    0.141    0.108     0.95
NV       17500  0.231  0.454  0.316  0.733    0.158    0.047    0.117     0.951
NH       10500  0.179  0.483  0.337  0.8      0.117    0.038    0.088     0.952
NJ       51500  0.234  0.393  0.373  0.806    0.184    0.044    0.121     0.95
NM       15500  0.164  0.685  0.15   0.68     0.116    0.174    0.123     0.938
NY      127000  0.209  0.467  0.324  0.749    0.156    0.041    0.113     0.954
NC       65000  0.216  0.458  0.326  0.759    0.15     0.045    0.11      0.951
ND        8800  0.175  0.683  0.142  0.805    0.082    0.155    0.09      0.941
OH       83500  0.218  0.41   0.372  0.778    0.147    0.036    0.098     0.953
OK       43000  0.205  0.668  0.128  0.719    0.132    0.142    0.133     0.95
OR       25000  0.21   0.457  0.333  0.79     0.139    0.044    0.103     0.951
PA      108000  0.199  0.404  0.397  0.787    0.133    0.033    0.089     0.953
RI        6000  0.202  0.457  0.341  0.75     0.14     0.039    0.101     0.95
SC       31000  0.212  0.49   0.298  0.724    0.133    0.044    0.103     0.952
SD        8900  0.184  0.697  0.118  0.785    0.098    0.228    0.111     0.929
TN       41000  0.223  0.461  0.316  0.743    0.144    0.046    0.108     0.951
TX      137000  0.261  0.448  0.291  0.755    0.193    0.065    0.149     0.95
UT       18000  0.305  0.388  0.306  0.83     0.21     0.06     0.148     0.954
VT        8100  0.14   0.528  0.332  0.81     0.095    0.03     0.073     0.954
VA       50500  0.242  0.395  0.363  0.786    0.157    0.042    0.106     0.951
WA       45000  0.229  0.45   0.321  0.789    0.158    0.046    0.115     0.952
WV       13500  0.143  0.648  0.208  0.731    0.114    0.182    0.13      0.833
WI       71000  0.197  0.389  0.414  0.81     0.139    0.032    0.09      0.952
WY        4100  0.184  0.667  0.149  0.797    0.122    0.079    0.114     0.953
We additionally audit the frame against an early release file of 2018 ACS microdata.
Table 4: NSCH strata, ACS2017, all addresses audit
State        N  p(S1)  p(S2)  p(S3)  p(C|S1)  p(C|S2)  p(C|S3)  p(C|!S1)  p(!S3|C)
US     1913000  0.233  0.424  0.343  0.828    0.141    0.045    0.098     0.943
AL       29500  0.228  0.511  0.262  0.79     0.123    0.045    0.097     0.954
AK        6100  0.182  0.607  0.211  0.773    0.182    0.436    0.248     0.732
AZ       36000  0.225  0.469  0.306  0.815    0.142    0.053    0.107     0.939
AR       18000  0.233  0.503  0.264  0.801    0.146    0.047    0.112     0.954
CA      184000  0.272  0.36   0.368  0.826    0.178    0.035    0.106     0.957
CO       32500  0.239  0.401  0.359  0.848    0.148    0.028    0.092     0.963
CT       19500  0.231  0.349  0.421  0.875    0.161    0.027    0.088     0.957
DE        5800  0.214  0.306  0.48   0.809    0.14     0.037    0.077     0.925
DC        3900  0.174  0.5    0.326  0.726    0.084    0.029    0.062     0.947
FL       97500  0.211  0.378  0.411  0.775    0.138    0.025    0.079     0.954
GA       45500  0.257  0.435  0.307  0.799    0.152    0.04     0.106     0.956
HI        7800  0.173  0.648  0.179  0.756    0.229    0.065    0.194     0.96
ID        9400  0.24   0.439  0.32   0.836    0.168    0.043    0.115     0.952
IL       82000  0.235  0.415  0.35   0.835    0.139    0.048    0.097     0.938
IN       40500  0.238  0.37   0.392  0.826    0.153    0.042    0.096     0.94
IA       30000  0.203  0.652  0.145  0.866    0.072    0.231    0.101     0.869
KS       22000  0.234  0.406  0.36   0.852    0.139    0.056    0.1       0.927
KY       28000  0.234  0.637  0.129  0.839    0.106    0.088    0.103     0.959
LA       23500  0.245  0.453  0.302  0.735    0.148    0.038    0.104     0.955
ME       11500  0.18   0.573  0.247  0.82     0.089    0.043    0.075     0.95
MD       32500  0.253  0.357  0.39   0.852    0.151    0.034    0.09      0.952
MA       36000  0.221  0.398  0.381  0.862    0.136    0.029    0.084     0.957
MI       82000  0.221  0.353  0.426  0.851    0.134    0.027    0.076     0.953
MN       62000  0.23   0.345  0.424  0.88     0.128    0.033    0.076     0.947
MS       14500  0.249  0.592  0.159  0.755    0.121    0.077    0.112     0.955
MO       41500  0.226  0.425  0.348  0.832    0.134    0.036    0.09      0.952
MT        8500  0.18   0.737  0.084  0.805    0.1      0.282    0.119     0.902
NE       18000  0.219  0.642  0.14   0.87     0.078    0.197    0.1       0.898
NV       16000  0.239  0.428  0.333  0.787    0.14     0.032    0.093     0.959
NH        9000  0.195  0.452  0.352  0.857    0.117    0.027    0.078     0.958
NJ       45000  0.246  0.371  0.383  0.859    0.168    0.031    0.098     0.958
NM       12000  0.186  0.722  0.092  0.728    0.122    0.296    0.142     0.891
NY      109000  0.22   0.442  0.338  0.814    0.161    0.035    0.106     0.955
NC       55500  0.232  0.422  0.347  0.824    0.144    0.032    0.093     0.958
ND        7500  0.194  0.716  0.09   0.843    0.083    0.244    0.101     0.91
OH       76000  0.228  0.382  0.39   0.843    0.134    0.029    0.081     0.955
OK       35000  0.24   0.68   0.08   0.781    0.135    0.225    0.145     0.94
OR       23500  0.214  0.451  0.335  0.839    0.122    0.043    0.088     0.943
PA       98500  0.215  0.379  0.406  0.866    0.125    0.027    0.074     0.955
RI        5200  0.206  0.41   0.384  0.836    0.123    0.025    0.076     0.959
SC       26500  0.233  0.456  0.311  0.793    0.123    0.031    0.086     0.962
SD        7900  0.208  0.709  0.083  0.859    0.099    0.294    0.119     0.91
TN       36500  0.24   0.446  0.314  0.821    0.135    0.034    0.093     0.96
TX      120000  0.276  0.422  0.302  0.809    0.175    0.045    0.121     0.956
UT       16000  0.328  0.365  0.308  0.865    0.201    0.064    0.138     0.948
VT        6400  0.169  0.483  0.348  0.861    0.129    0.033    0.089     0.948
VA       46500  0.251  0.375  0.374  0.848    0.144    0.033    0.088     0.955
WA       42000  0.238  0.44   0.322  0.841    0.148    0.031    0.099     0.963
WV       11000  0.175  0.727  0.098  0.794    0.089    0.262    0.11      0.888
WI       63000  0.217  0.354  0.429  0.867    0.14     0.027    0.078     0.954
WY        3400  0.194  0.648  0.157  0.828    0.136    0.069    0.123     0.958
Local-area Internet-accessibility
Here we describe the construction of a tract-varying Internet-accessible household flag.
Since 2012, ACS respondents have been able to submit survey forms over the Internet. ACS
paradata record whether a respondent chose the online option. The ACS paradata has been
summarized at the tract level. Our Internet-accessible household measure is equal to a
weighted proportion of the respondents that chose to submit the ACS survey over the Internet
if given the option to do so. Figure 4 shows the kernel-smoothed distribution of tract-level
Internet response for the 2013–2014 ACS survey years.
Figure 4: Kernel-smoothed probability distribution function of tract-level ACS Internet response
rate, ACS paradata, 2013–2014 survey years
[Figure 4 plot omitted; axis: ACS Internet response rate, weighted, by tract (0 to 1)]
To construct an Internet-access flag, we use the first tritile as the cut-off. A block is considered to
have low Internet access if the Internet accessibility index is below the first tritile of the block-level
distribution. For low-population blocks, we replace missing values of the block-varying
low-Internet flag with the modal value from the corresponding block group. For very new
housing units without assigned Census blocks, we assign a value of zero for this binary variable
(i.e., the default for these new households is high Internet accessibility).
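The flag construction can be sketched in Python as three steps: compute the first-tritile cut-off, flag units whose index falls below it, and fill in missing flags from the block group mode or, failing that, with zero. The record layout (`id`, `block_group`, `index`), the simple quantile rule in `first_tritile`, and the zero default when a block group has no observed flags are all assumptions made for illustration; quantile conventions vary and the production rule is not specified in the text.

```python
from collections import Counter


def first_tritile(values):
    """Simple cut-off: the value one third of the way through the sorted list."""
    ordered = sorted(values)
    return ordered[len(ordered) // 3]


def low_internet_flags(units):
    """Return {id: 0/1}, where 1 marks low Internet access."""
    cutoff = first_tritile([u["index"] for u in units if u["index"] is not None])
    flags = {u["id"]: int(u["index"] < cutoff)
             for u in units if u["index"] is not None}
    # Tally observed flags within each block group.
    by_group = {}
    for u in units:
        if u["id"] in flags and u["block_group"] is not None:
            by_group.setdefault(u["block_group"], Counter())[flags[u["id"]]] += 1
    for u in units:
        if u["id"] not in flags:
            counts = by_group.get(u["block_group"])
            # Modal block-group value if available; otherwise default to 0
            # (high access), as for new units without an assigned block.
            flags[u["id"]] = counts.most_common(1)[0][0] if counts else 0
    return flags


# Toy records; None marks a missing index or an unassigned block group.
units = [
    {"id": 1, "block_group": "A", "index": 0.10},
    {"id": 2, "block_group": "A", "index": 0.20},
    {"id": 3, "block_group": "A", "index": 0.50},
    {"id": 4, "block_group": "A", "index": None},
    {"id": 5, "block_group": "B", "index": 0.30},
    {"id": 6, "block_group": "B", "index": 0.60},
    {"id": 7, "block_group": "B", "index": 0.70},
    {"id": 8, "block_group": "B", "index": 0.80},
    {"id": 9, "block_group": "B", "index": None},
    {"id": 10, "block_group": "C", "index": 0.90},
    {"id": 11, "block_group": "C", "index": 0.95},
    {"id": 12, "block_group": None, "index": None},
]
flags = low_internet_flags(units)
```

In this toy data the cut-off is 0.50, unit 4 inherits the low-access mode of block group A, unit 9 inherits the high-access mode of block group B, and unit 12 receives the default of zero.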
Local-area household income relative to the poverty rate
The frame has a set of poverty variables from the 2017 5-year American Community Survey file.
These variables measure the proportion of households with household income in an interval
defined by the poverty rate. Figure 5 shows the kernel-smoothed probability distribution
function of the proportion of households in the block group that have household income less
than 150% of the poverty rate.
Figure 5: Kernel-smoothed probability distribution function of block-group-level 150% poverty
rate, ACS, 2017 5-year file
[Figure 5 plot omitted; axis: proportion of individuals below 150% of poverty line, weighted, by block group (0 to 1)]
Final sample frame data layout
The component data files are merged together based on MAFID. The data layout for this combined file is
given in Table 2.
Table 2: NSCH population data file layout
Variable name           Label                                  Level of variation  Type   Any missing?
mafid                   Master Address File ID                 MAFID               long   no
maf_curstate            State                                  State               str2   no
maf_curcounty           County                                 County              str3   no
maf_curblktract         Tract                                  Tract               str6   yes
maf_curblkgrp           Block group                            Block group         str1   yes
maf_curblk              Block                                  Block               str4   yes
stratum1                Stratum 1 identifier                   MAFID               byte   no
stratum2a               Stratum 2a identifier                  MAFID               byte   no
stratum2b               Stratum 2b identifier                  MAFID               byte   no
acs_tract_net_response  ACS Internet response                  Tract               float  yes
web_low                 Low web use (lowest tritile)           Tract               byte   no
blkgrp_lt_100_povrate   Pr. HH w/ inc. < 100% poverty rate     Block group         float  yes
blkgrp_100_150_povrate  Pr. HH w/ inc. 100–150% poverty rate   Block group         float  yes
blkgrp_150_185_povrate  Pr. HH w/ inc. 150–185% poverty rate   Block group         float  yes
blkgrp_185_200_povrate  Pr. HH w/ inc. 185–200% poverty rate   Block group         float  yes
blkgrp_gt_200_povrate   Pr. HH w/ inc. > 200% poverty rate     Block group         float  yes
blkgrp_lt_150_povrate   Pr. HH w/ inc. < 150% poverty rate     Block group         float  yes
mailvaldf               Valid mailing address                  MAFID               byte   yes
Filename: nsch_pop_file.sas7bdat
Population: all MAFIDs in 2019 MAF-X
Unit of observation: household (MAFID)
Number of observations: 200,100,000
Filesize: 20GB
File Type | application/pdf |
Author | Leah Meyer (CENSUS/ADDP FED) |
File Modified | 2020-01-17 |
File Created | 2020-01-17 |