Appendix B_NSCH Sample Frame and Sampling Flags Creation

National Survey of Children's Health

OMB: 0607-0990

Appendix B

National Survey of Children’s Health
Sample Frame and Sampling Flags Creation

2021 National Survey of Children’s Health sample frame 1
John Voorheis and Maria Perez-Patron
Center for Economic Studies
US Census Bureau
April 7, 2021
This document describes using administrative records to build a sample frame for the National
Survey of Children’s Health (NSCH) for 2021.

Population of interest
The population of interest is all children residing in housing units in the US on the date of the
survey.

A sample frame for all households with children
The sample frame identifies three mutually exclusive strata:
• [1] Households with explicit links to children in administrative data.
• [2a] Households without explicit links to children in administrative data but predicted to be
likely to have children conditional on administrative data.
• [2b] Households without explicit links to children in administrative data but predicted to be
unlikely to have children conditional on administrative data.
This document first explains the construction of the Stratum 1 flag, and then documents the
separation of Strata 2a and 2b.

Stratum 1: identifying explicit links from children to addresses
The Stratum 1 flag for all households with explicit links to children comes from three data sources:
1) the Numident, 2) a list of Social Security Number applicants with data updated from various
administrative records, and 3) the Census Household Composition Key (CHCK, formerly called
CARRA kidlink) file, a prototype linkage between children and parents based on Census and
administrative records. Household addresses are updated with the Master Address Auxiliary
Reference File (MAF-ARF), a file that links person identifiers with the latest location updates from
a variety of administrative data (see Figure 3). For Sample year 2021, we provide additional
granularity to the information provided in Stratum 1. In addition to a flag for whether there are
any children under 18 at a MAFID, we provide flags for whether there are any young children
(under 5, stratum 1a) or only older children (5-17, stratum 1b), based on the date of birth
information in the Numident.

1 All results have been reviewed to ensure that no confidential information is disclosed. The statistical summaries
reported in this paper have been cleared by the Census Bureau’s Disclosure Review Board, release authorization
number CBDRB-FY22-CES019-001.

Using the Numident to identify children
The Numident is based on all individuals who have been assigned Social Security Numbers.
Demographic data from the Numident is updated from federal tax data and various administrative
records. There are 73,520,000 children in the most recent Numident who will be aged 0–17 years
on June 1, 2021. Figure 1 shows the distribution of date of birth for these children.
Figure 1: Distribution of date of birth, aged 0–17 years as of December 1, 2020, Numident

The CHCK file was updated in March 2020 for NSCH sample frame production.

Identifying the households containing the children in the Numident
To sample households with children, we must connect the children in the Numident to the
households in which they live. We do this with the CHCK file.
Census Household Composition Key File
The CHCK uses data from Census surveys and federal administrative records to link children PIKs
to parent PIKs. We can use this file to identify the parents of children in the Numident.
The source data for the CHCK are: the Census Numident, the 2010 Census Unedited File, the IRS
1040 and 1099 files, the Medicare Enrollment Database (MEDB), Indian Health Service database
(IHS), Selective Service System (SSS), and Public and Indian Housing (PIC) and
Tenant Rental Assistance Certification System (TRACS) data from the Department of Housing and
Urban Development. Of these, the IRS 1040 provides the most significant information.
In the CHCK file generated in 2020, there are 64,700,000 unique records for children who will
be aged 0–17 years on June 1, 2021.
Let us consider how many children from the Numident have been linked to a parent in the
CHCK file. Table 1 shows the number of children linked with both a mother and a father, linked
with a mother only, linked with a father only, or not linked with any parent.
Table 1: Child-parent links in the CHCK file relative to the Numident population, aged 0–17 years
as of 2021, 2020 CHCK file.
Type of link              Frequency   Percent
Mother and father         51,430,000  70%
Mother only               11,300,000  15%
Father only               1,966,000   2.7%
ACS link                  68,000      0.1%
No link                   8,754,000   12%
All children in Numident  73,520,000  100% 2

Figure 2 compares the distributions of date of birth for these children against the distribution
shown in Figure 1.

2 Note that numbers in this table may not add up correctly due to rounding required for disclosure avoidance.

Figure 2. Frequency distributions of date of birth, Numident vs. CHCK entries, aged 0–17 years as
of June 1, 2020

The CHCK file was updated in 2020 for NSCH sample frame production.

Updating household location using the MAF-ARF
In order to update household location, we use a Census dataset called the Master Address
Auxiliary Reference File (MAF-ARF). The MAF-ARF links person identifiers to address identifiers
using Census survey data and federal administrative data. The source data for the MAF-ARF file
are: the Census Numident, the 2010 Census Unedited File, the IRS 1040 and 1099 files, the
Medicare Enrollment Database (MEDB), Indian Health Service database (IHS), Selective Service
System (SSS), and Public and Indian Housing (PIC) and Tenant Rental Assistance Certification
System (TRACS) data from the Department of Housing and Urban Development, and National
Change of Address data from the US Postal Service. Of these, the IRS 1040 provides the most
significant information.
Out of 84,130,000 3 children in the Numident, 68,110,000 are matched directly to a MAFID.
Out of 72,300,000 CHCK-matched mothers, about 67,530,000 are matched to a MAFID. Out of
61,330,000 CHCK-matched fathers, about 57,250,000 are matched to a MAFID.

3 All unweighted counts and estimates in this document are rounded in accordance with Census Disclosure Review
Board rules.

For each child observation from the Numident, we now have four possible MAFIDs: the child-to-MAF-ARF MAFID, the child-to-CHCK-to-mother-to-MAF-ARF MAFID, the child-to-CHCK-to-father-to-MAF-ARF MAFID, and the child-to-ACS parent-to-MAF-ARF MAFID. We allocate a single
MAFID to each child using that order. First, we assign the directly identified child MAFID
(65,470,000 cases). If the MAFID is missing, we assign the mother MAFID (5,294,000 cases).
Finally, if the MAFID is still missing, we assign the father MAFID (2,055,000 cases). That leaves
11,270,000 children from the Numident not assigned MAFIDs (a MAFID match rate of 86.6%).
There are some MAFIDs associated with a great number of children. As an example, out of
72,860,000 associated with a MAFID, 7,862,000 children are associated with a MAFID with more
than 20 child-MAFID links.
The 72,860,000 children associated with a MAFID are then collapsed down to 38,280,000
unique MAFIDS. This implies 1.9 children per household for households assigned a flag.
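The assignment order described above can be sketched in a few lines of Python. This is an illustration only: the function name assign_mafid and the example identifiers are hypothetical and are not taken from the production code, and the arithmetic uses the rounded counts quoted in the text.

```python
from typing import Optional


def assign_mafid(child: Optional[str],
                 mother: Optional[str],
                 father: Optional[str]) -> Optional[str]:
    """Return the first non-missing MAFID in the order child, mother, father."""
    for candidate in (child, mother, father):
        if candidate is not None:
            return candidate
    return None  # the child stays unassigned


# Hypothetical identifiers, for illustration only.
examples = [
    ("A001", "A002", None),   # direct child link is used
    (None, "A002", "A003"),   # falls back to the mother
    (None, None, "A003"),     # falls back to the father
    (None, None, None),       # no MAFID can be assigned
]
assigned = [assign_mafid(*row) for row in examples]

# Arithmetic check using the rounded counts quoted in the text.
children_in_numident = 84_130_000
children_with_mafid = 72_860_000
unique_mafids = 38_280_000

match_rate = children_with_mafid / children_in_numident
children_per_mafid = children_with_mafid / unique_mafids
print(f"match rate: {match_rate:.1%}")
print(f"children per flagged MAFID: {children_per_mafid:.1f}")
```

With the rounded inputs, the two ratios reproduce the 86.6% match rate and the 1.9 children per flagged household reported above.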
We then need to scale up the MAFID list to the universe of MAFIDs to allow sampling of
unflagged households. A merge of the 38,280,000 unique child-flagged MAFIDS with the 2020
MAF-X file matches 38,280,000 MAFIDS with child flags, removes 173,600,000 MAFIDS with child
flags. The sample frame file now has about 209 million valid MAFIDS of which 38,280,000 include
child flags. Compare this with the 2011 ACS, in which about 37 million out of 115 million
households included related children. 4

Stratum 1 construction visualization
Figure 3 shows a visualization of the sample frame construction.
Figure 3: Stratum 1 construction

4 http://www.census.gov/prod/2013pubs/p20-570.pdf

Strata 2a and 2b: identifying probabilistic links from children to
addresses
In 2016, the Stratum 1 flag performed well. That is, the surveyed sample contained approximately
the same rate of children as had been predicted before the survey. The survey team would like to
further increase the sampling efficiency of the survey by adding more information to the second
stratum. By definition, Stratum 2 does not have explicit links from children to households in the
administrative data. In 2021 as in previous years, we further bifurcate Stratum 2 into those
households more likely to have children and those households less likely to have children.
Households are assigned to Stratum 2a based on a model of child presence as a function of
variables available in administrative data for all households in the MAF. The model is estimated
with data from the most recent year of the ACS, in which child presence can be observed. Then
parameter estimates from that model can be used to predict the likelihood of child presence for
all households. These models are estimated separately for each state, and the threshold for
bifurcation is based on an objective of minimizing the size of Stratum 2a while also maintaining
95% coverage of children in Strata 1 and 2a.
Definitions
Population or sample concepts
• 2019 ACS sample, edited and swapped
– unit of observation is the household, unless noted otherwise
– sample includes sampled vacant dwellings, unless noted otherwise
• MAF
– population but restricted to MAFIDs marked as valid for ACS
Sample frame notation
• $h$ indexes household
• $s$ indexes states
• $C$ equals 1 if a household has any children, 0 otherwise
• Strata:
– $S_1$: household with children
– $S_{2a}$: household likely to have children
– $S_{2b}$: household unlikely to have children

• Strata sizes:
– $p(S_1)$
– $p(S_{2a})$
– $p(S_{2b})$

• Strata child rates:
– $p(C \mid S_1)$
– $p(C \mid S_{2a})$
– $p(C \mid S_{2b})$

• Coverage with unsampled $S_{2b}$:
– $p(S_1 \cup S_{2a} \mid C)$

Model
Our goal is a scalar measure of the likelihood of a child being associated with a MAFID. This
measure must be available for all ACS-valid MAFIDs in the MAF. Using a sample in which the
presence of children is observable, we will estimate a model of child presence. The regressors
used to make the index prediction must be observable for all MAFIDs (i.e., to predict outside of
the estimation sample to the entire MAF).
The general model is:
$C_h = f(X_h; \theta)$,

where $C$ is equal to one if a household includes any children and zero otherwise, $X$ is a vector of
characteristics available for all households, and $\theta$ is an unknown vector of parameters.
We estimate the model using the most recent ACS 1-year sample:
$E[C_h \mid X_h] = f(X_h; \hat{\beta}_{ACS})$ for households $h$ in the ACS.

With parameter estimates from the ACS, we make predictions for the entire MAF:
$\hat{C}_h = f(X_h; \hat{\beta}_{ACS})$ for households $h$ in the MAF.

In practice, we estimate models separately for each state. We do this to account for systematic
differences in administrative records coverage and MAF quality across states. The model can now
be specified as:
$E[C_{hs} \mid X_{hs}] = f(X_{hs}; \hat{\beta}_{s,ACS})$ for households $h$ in state $s$ in the ACS,

where $s$ is the MAFID’s state and the parameters $\hat{\beta}_{s,ACS}$ now vary across states. The state-specific
predictions become:
$\hat{C}_{hs} = f(X_{hs}; \hat{\beta}_{s,ACS})$ for households $h$ in state $s$ in the MAF.

Estimation

The model above is estimated as a linear probability model separately for each state using the
edited and swapped 2019 ACS sample. The outcome is child_present, a flag for whether a
child is present at the sampled MAFID.
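A minimal sketch of a state-by-state linear probability model follows. It is not the production code: the ordinary least squares solver is written in plain Python for self-containment, the state codes "AA" and "BB" and all data values are made up, and each toy model uses a single covariate. Fitted values from a linear probability model are not restricted to the unit interval, which the first toy state illustrates.

```python
from collections import defaultdict


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry into place.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_lpm(covariates, outcomes):
    """OLS coefficients, intercept first, for a linear probability model."""
    x = [[1.0] + list(row) for row in covariates]
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(x, outcomes)) for i in range(k)]
    return solve(xtx, xty)


def fit_by_state(sample):
    """sample: iterable of (state, covariates, child_present) tuples."""
    grouped = defaultdict(lambda: ([], []))
    for state, covs, child_present in sample:
        grouped[state][0].append(covs)
        grouped[state][1].append(child_present)
    return {s: fit_lpm(xs, ys) for s, (xs, ys) in grouped.items()}


def predict(beta, covs):
    """Predicted child presence for one MAFID given its covariates."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], covs))


# Made-up estimation sample: (state, (covariate,), child_present).
sample = [
    ("AA", (0.0,), 0), ("AA", (0.1,), 0), ("AA", (0.2,), 1), ("AA", (0.3,), 1),
    ("BB", (1,), 1), ("BB", (1,), 1), ("BB", (1,), 1), ("BB", (1,), 0),
    ("BB", (0,), 0), ("BB", (0,), 0), ("BB", (0,), 0), ("BB", (0,), 1),
]
betas = fit_by_state(sample)
c_hat_low = predict(betas["AA"], (0.0,))  # a fitted value below zero
```

Applying predict with the coefficients for a MAFID's state to every ACS-valid MAFID would give the out-of-sample predictions described in the model section.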
The following covariates are included (with associated data sources) and are available for each
MAFID (except where a missingness flag is used):
• 2019 ACS 5-year published aggregate data
– acs_blkgrp_childrate_lvout: proportion of residents of block group who
are children, excluding the own-observation child counts from the numerator and
denominator
• MAF-ARF
– female2050: flag for female between ages 20 and 50 at MAFID
– adult2050: flag for adults between ages 20 and 50 at MAFID
– coresid_sexdiff: flag for coresidence of men and women between ages 20 and
50 at MAFID
– miss_adult2050: flag for missingness from MAF-ARF
• IRS 1040 filings, tax year 2019
– any_kid_deduct_max: does any tax form associated with this MAFID have any
deduction related to children? 5
– itemized_max: does any tax form associated with this MAFID use itemized
deductions?
– miss_any_kid_deduct_max: flag for MAFIDs without associated tax forms
• VSGI NAR commercial data
– vsgi_nar_homeowner_max: does any observation associated with this MAFID
record it as homeowner-occupied?
– miss_vsgi_nar_homeowner_max: flag for MAFIDs without associated VSGI
data
• Targus commercial data
– targus_homeowner_0: various flags for homeowner-occupied MAFID
– targus_homeowner_A: various flags for homeowner-occupied MAFID
– targus_homeowner_B: various flags for homeowner-occupied MAFID
– targus_homeowner_C: various flags for homeowner-occupied MAFID
– targus_homeowner_D: various flags for homeowner-occupied MAFID
– targus_homeowner_E: various flags for homeowner-occupied MAFID
– targus_homeowner_F: various flags for homeowner-occupied MAFID
– miss_targus_homeowner: flag for MAFIDs without associated Targus data

Parameter estimates are stored in the file frame2021_child_present_bystate.csv.
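The leave-out block-group child rate in the covariate list can be illustrated as follows. This sketch assumes one literal reading of the description, namely that a household's own child count is subtracted from both the block-group child total and the block-group resident total; the function name and the numbers are hypothetical.

```python
from typing import Optional


def leave_out_child_rate(bg_children: int,
                         bg_residents: int,
                         own_children: int) -> Optional[float]:
    """Block-group child share with the household's own children removed.

    Assumption: the own-household child count is removed from both the
    numerator and the denominator.
    """
    remaining_residents = bg_residents - own_children
    if remaining_residents <= 0:
        return None  # rate is undefined once nothing is left in the denominator
    return (bg_children - own_children) / remaining_residents


# Hypothetical block group: 100 residents, 30 of them children;
# the household being scored contributes 2 of those children.
rate = leave_out_child_rate(bg_children=30, bg_residents=100, own_children=2)
```

Removing the household's own contribution keeps the covariate from mechanically encoding the outcome it is used to predict.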

5 The following IRS variables were used to make this variable: child exemptions and EITC qualifying children.

Sample frame objective function
In order to choose an optimal Stratum 2a, we use the following objective function:
• Minimize the size of Stratum 2a while maintaining coverage of at least 95%
Stratum 2a is defined as:
$S_{2a} = \{\text{households in the MAF with } \hat{C}_h > \bar{C} \text{ but not in } S_1\}$.

Stratum 2b is defined as:
$S_{2b} = \{\text{households in the MAF but not in } S_1 \text{ or } S_{2a}\}$.

With state-specific modeling, the objective function and coverage constraint also become
state-specific:
• Minimize the size of Stratum 2a in each state while maintaining coverage of at least 95% in
each state
State-specific Stratum 2a is defined as:
$S_{2a} = \{\text{households in the MAF with } \hat{C}_{hs} > \bar{C}_s \text{ but not in } S_1\}$.

Stratum 2b is defined as before.

Optimization algorithm
The optimization parameter is a threshold on the child-present prediction probability, such that
MAFIDs with values above the threshold are assigned to Stratum 2a. Starting at a low threshold
($\bar{C}$) 6, follow this algorithm:
1. Under the current threshold $\bar{C}$, calculate the proportion of MAFIDs in Stratum 2a, $p(S_{2a})$,
and the coverage of Strata 1 and 2a under no sampling of Stratum 2b, $p(S_1 \cup S_{2a} \mid C)$.

2. If $p(S_{2a}) > 0$ and $p(S_1 \cup S_{2a} \mid C) \geq 0.95$, then increase the child prediction threshold $\bar{C}$ by one
step (e.g., 0.01) and return to (1). If $p(S_1 \cup S_{2a} \mid C) < 0.95$, then the previous threshold $\bar{C}$ is
the optimal cutoff for $S_{2a}$.

Under state-specific modeling, this algorithm is applied separately to each state.
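The two-step search can be sketched as below. The record layout (Household), the function names, and the toy data are assumptions made for illustration; in practice the inputs would be the weighted ACS households for one state, with predictions from that state's model.

```python
from dataclasses import dataclass


@dataclass
class Household:
    in_s1: bool        # explicit child link, i.e. Stratum 1
    c_hat: float       # predicted child presence from the model
    has_child: bool    # observed child presence
    weight: float = 1.0


def evaluate(households, threshold):
    """Return (p(S2a), coverage of S1 and S2a among households with children)."""
    total_w = sum(h.weight for h in households)
    child_w = sum(h.weight for h in households if h.has_child)
    s2a_w = covered_w = 0.0
    for h in households:
        in_s2a = (not h.in_s1) and h.c_hat > threshold
        if in_s2a:
            s2a_w += h.weight
        if h.has_child and (h.in_s1 or in_s2a):
            covered_w += h.weight
    return s2a_w / total_w, covered_w / child_w


def find_threshold(households, start, step=0.01, target=0.95):
    """Raise the threshold until coverage would fall below the target."""
    threshold, best = start, None
    while True:
        p_s2a, coverage = evaluate(households, threshold)
        if coverage < target:
            return best            # previous threshold (None if start fails)
        best = threshold
        if p_s2a == 0:
            return best            # Stratum 2a is already empty
        threshold = round(threshold + step, 10)  # avoid float drift


# Toy data: 10 households with children, 6 of them in Stratum 1.
households = (
    [Household(True, 0.0, True)] * 6
    + [Household(False, c, True) for c in (0.50, 0.40, 0.30, 0.025)]
    + [Household(False, c, False) for c in (0.20, 0.015, 0.005, 0.001)]
)
cutoff = find_threshold(households, start=0.0)
```

In the toy data, raising the threshold past 0.02 would drop one of the ten households with children into Stratum 2b and push coverage to 0.90, so the search stops at the previous step.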

Optimal strata
Table 2 shows the optimal strata under a 95% coverage constraint for Strata 1 and 2a. The
coverage constraint assumes non-sampling of Stratum 2b. The notation is as defined above. The
strata were optimized separately for each state using parameter estimates from separate state
regressions of child presence in the 2019 ACS microdata.
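As a consistency check, the coverage measure can be recovered from the strata sizes and child rates by Bayes' rule. The sketch below uses the rounded US row of Table 2 and assumes that the table columns labeled S2 and S3 correspond to Strata 2a and 2b; the function name is hypothetical.

```python
def coverage(p_s1, p_s2a, p_s2b, c_s1, c_s2a, c_s2b):
    """p(S1 or S2a | C) from strata sizes and strata child rates."""
    covered = p_s1 * c_s1 + p_s2a * c_s2a      # p(C and (S1 or S2a))
    total = covered + p_s2b * c_s2b            # p(C)
    return covered / total


# Rounded US row of Table 2.
us_coverage = coverage(p_s1=0.22, p_s2a=0.46, p_s2b=0.33,
                       c_s1=0.76, c_s2a=0.14, c_s2b=0.04)
print(f"{us_coverage:.3f}")
```

Because the inputs are rounded to two decimals, the result only approximately matches the coverage reported in the table.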

6 The most conservative starting threshold would be at p(S1), where p(S2b) = 0.

Table 2: Optimal 2021 NSCH strata with 95% coverage constraint, state-level optimization 7
State  N          p(S1)  p(S2)  p(S3)  p(C|S1)  p(C|S2)  p(C|S3)  p(C|!S1)  p(!S3|C)  q    C_hat_S2
US     2,079,000  0.22   0.46   0.33   0.76     0.14     0.04     0.11      0.95      31   0.00
AL     33,000     0.21   0.53   0.26   0.71     0.12     0.05     0.10      0.95      23   0.03
AK     8,000      0.15   0.53   0.32   0.71     0.13     0.17     0.13      0.89      -1   -0.48
AZ     40,000     0.21   0.45   0.34   0.76     0.15     0.04     0.11      0.95      31   0.06
AR     20,000     0.21   0.58   0.20   0.74     0.12     0.07     0.11      0.95      15   0.01
CA     191,000    0.26   0.38   0.36   0.78     0.19     0.05     0.12      0.95      37   0.11
CO     35,000     0.22   0.41   0.36   0.78     0.16     0.04     0.10      0.95      36   0.08
CT     20,000     0.21   0.40   0.39   0.78     0.15     0.04     0.10      0.95      38   0.08
DE     6,500      0.19   0.42   0.39   0.72     0.11     0.03     0.08      0.95      37   0.05
DC     4,000      0.16   0.75   0.09   0.66     0.07     0.09     0.07      0.95      7    -0.03
FL     109,000    0.19   0.41   0.39   0.68     0.13     0.03     0.08      0.95      38   0.06
GA     49,000     0.24   0.46   0.30   0.72     0.15     0.06     0.12      0.95      27   0.06
HI     8,500      0.13   0.62   0.25   0.68     0.23     0.05     0.18      0.95      21   0.10
ID     11,000     0.23   0.41   0.36   0.79     0.15     0.04     0.10      0.95      34   0.09
IL     86,000     0.22   0.44   0.34   0.77     0.14     0.05     0.10      0.95      31   0.05
IN     43,500     0.22   0.45   0.33   0.74     0.13     0.04     0.10      0.95      31   0.05
IA     31,500     0.19   0.64   0.16   0.81     0.09     0.20     0.10      0.94      -1   -0.15
KS     24,000     0.22   0.42   0.36   0.78     0.15     0.05     0.11      0.95      31   0.06
KY     30,000     0.22   0.55   0.23   0.77     0.13     0.06     0.12      0.95      20   0.03
LA     26,000     0.22   0.46   0.32   0.67     0.15     0.05     0.11      0.95      31   0.07
ME     15,500     0.14   0.45   0.41   0.80     0.09     0.03     0.06      0.95      33   0.03
MD     34,000     0.25   0.41   0.35   0.78     0.15     0.04     0.10      0.95      34   0.07
MA     38,000     0.21   0.42   0.37   0.80     0.14     0.03     0.09      0.95      37   0.07
MI     91,500     0.20   0.36   0.45   0.78     0.13     0.03     0.08      0.95      43   0.07
MN     68,000     0.21   0.39   0.40   0.82     0.12     0.04     0.09      0.95      37   0.05
MS     16,000     0.22   0.70   0.07   0.68     0.11     0.17     0.12      0.95      -1   -0.18
MO     45,000     0.21   0.45   0.34   0.77     0.13     0.04     0.10      0.95      31   0.05
MT     10,000     0.15   0.71   0.14   0.76     0.09     0.17     0.10      0.91      -1   -0.17
NE     19,000     0.22   0.60   0.18   0.81     0.09     0.13     0.10      0.95      5    -0.05
NV     17,000     0.23   0.47   0.31   0.72     0.15     0.05     0.11      0.95      30   0.06
NH     10,500     0.18   0.41   0.41   0.79     0.12     0.03     0.08      0.95      38   0.06
NJ     48,500     0.23   0.39   0.38   0.79     0.18     0.04     0.11      0.95      37   0.09
NM     14,000     0.17   0.70   0.13   0.70     0.12     0.19     0.12      0.93      -1   -0.17
NY     119,000    0.20   0.45   0.34   0.75     0.15     0.04     0.11      0.95      33   0.08
NC     62,000     0.21   0.44   0.35   0.76     0.14     0.04     0.10      0.95      33   0.07
ND     8,400      0.18   0.68   0.14   0.75     0.09     0.15     0.10      0.94      -1   -0.16
OH     80,000     0.22   0.38   0.41   0.78     0.14     0.03     0.09      0.95      40   0.07
OK     40,500     0.21   0.64   0.15   0.73     0.13     0.11     0.12      0.95      7    -0.01
OR     24,500     0.20   0.46   0.34   0.77     0.13     0.04     0.10      0.95      31   0.05
PA     105,000    0.20   0.39   0.42   0.80     0.13     0.03     0.09      0.95      40   0.06
RI     5,700      0.19   0.47   0.34   0.78     0.13     0.04     0.09      0.95      34   0.06
SC     30,000     0.21   0.43   0.36   0.72     0.13     0.04     0.09      0.95      34   0.06
SD     8,700      0.20   0.69   0.11   0.80     0.10     0.19     0.11      0.94      -1   -0.19
TN     39,500     0.22   0.43   0.35   0.74     0.15     0.04     0.10      0.95      33   0.07
TX     130,000    0.26   0.46   0.28   0.75     0.18     0.07     0.14      0.95      25   0.07
UT     17,500     0.30   0.43   0.27   0.81     0.19     0.07     0.15      0.95      23   0.08
VT     7,900      0.15   0.77   0.08   0.80     0.06     0.13     0.07      0.95      -1   -0.11
VA     49,500     0.24   0.36   0.40   0.79     0.16     0.04     0.10      0.95      40   0.09
WA     43,500     0.23   0.45   0.32   0.79     0.15     0.05     0.11      0.95      31   0.07
WV     12,500     0.16   0.70   0.15   0.75     0.10     0.15     0.11      0.89      -1   -0.19
WI     70,000     0.20   0.39   0.42   0.80     0.13     0.03     0.08      0.95      40   0.06
WY     3,900      0.19   0.61   0.20   0.75     0.11     0.06     0.10      0.95      15   0.01

7 Note that the state population totals do not add up to the national population due to rounding required by Census
disclosure avoidance rules.

Auditing the sample frame against the ACS
To examine the performance of the administrative records used to build the sample frame, we
merge the list of MAFIDs constructed above with the American Community Survey housing-unit
sample from 2020. Currently, this audit uses unedited ACS data (i.e., item nonresponse is left
as missing and is not imputed, including for children’s age). If item nonresponse is random with
respect to the presence of children in the household, this should not cause any systematic bias
in the audit.
All estimates are weighted with the housing-unit-level weights, which include weights for
vacant units. In vacant housing units, we assign zero children. These estimates should reflect the
NSCH survey production process.

State-specific performance
Table 3 shows the overlap between the MAFID and ACS distributions by state. In 2021, the
smallest oversample strata are in Hawaii, Maine, Vermont, and West Virginia. The largest
oversample strata are in California, Texas, and Utah. The highest rates of Type 1 error are in DC,
Florida, Louisiana, Mississippi, Nevada, and South Carolina. The highest rates of Type 2 error are
in Alaska, Hawaii, New Mexico, Texas, and Utah.

Table 3: NSCH strata, ACS, all addresses audit 8
State  N          p(S1)  p(S2)  p(S3)  p(C|S1)  p(C|S2)  p(C|S3)  p(C|!S1)  p(!S3|C)
US     2,079,000  0.22   0.44   0.35   0.76     0.14     0.05     0.11      0.94
AL     33,000     0.21   0.53   0.26   0.71     0.12     0.05     0.10      0.95
AK     8,000      0.15   0.53   0.32   0.71     0.13     0.17     0.13      0.89
AZ     40,000     0.21   0.45   0.34   0.76     0.15     0.04     0.11      0.95
AR     20,000     0.21   0.58   0.20   0.74     0.12     0.07     0.11      0.95
CA     191,000    0.26   0.38   0.37   0.78     0.19     0.05     0.12      0.95
CO     35,000     0.22   0.41   0.36   0.78     0.15     0.04     0.10      0.95
CT     20,000     0.21   0.40   0.39   0.78     0.15     0.04     0.10      0.95
DE     6,500      0.19   0.42   0.39   0.72     0.11     0.03     0.08      0.95
DC     4,100      0.16   0.75   0.09   0.66     0.07     0.09     0.07      0.95
FL     109,000    0.19   0.42   0.39   0.68     0.13     0.03     0.08      0.95
GA     49,000     0.24   0.46   0.30   0.72     0.15     0.06     0.12      0.95
HI     8,700      0.13   0.62   0.25   0.68     0.23     0.05     0.18      0.95
ID     11,000     0.23   0.41   0.36   0.79     0.15     0.04     0.10      0.95
IL     86,000     0.22   0.44   0.34   0.77     0.14     0.05     0.10      0.95
IN     43,000     0.22   0.44   0.33   0.74     0.13     0.04     0.10      0.95
IA     31,000     0.19   0.64   0.16   0.81     0.09     0.20     0.10      0.94
KS     24,000     0.22   0.42   0.36   0.78     0.15     0.05     0.11      0.95
KY     30,000     0.22   0.55   0.23   0.77     0.13     0.06     0.12      0.95
LA     26,000     0.22   0.46   0.32   0.67     0.15     0.05     0.11      0.95
ME     15,500     0.14   0.46   0.41   0.80     0.09     0.03     0.06      0.95
MD     34,000     0.25   0.41   0.35   0.78     0.15     0.04     0.10      0.95
MA     38,000     0.21   0.42   0.37   0.80     0.14     0.03     0.09      0.95
MI     91,500     0.20   0.35   0.45   0.78     0.13     0.03     0.08      0.95
MN     68,000     0.21   0.39   0.40   0.82     0.12     0.04     0.09      0.95
MS     16,000     0.22   0.70   0.07   0.68     0.11     0.17     0.12      0.95
MO     45,000     0.21   0.45   0.34   0.77     0.13     0.04     0.10      0.95
MT     10,000     0.15   0.71   0.14   0.76     0.09     0.17     0.10      0.91
NE     19,000     0.22   0.60   0.18   0.81     0.09     0.13     0.10      0.95
NV     17,000     0.23   0.47   0.31   0.72     0.15     0.05     0.11      0.95
NH     10,000     0.18   0.41   0.41   0.79     0.12     0.03     0.08      0.95
NJ     48,500     0.23   0.39   0.38   0.79     0.18     0.04     0.11      0.95
NM     14,000     0.17   0.70   0.13   0.70     0.12     0.19     0.12      0.93
NY     119,000    0.20   0.45   0.34   0.75     0.15     0.04     0.11      0.95
NC     62,000     0.21   0.44   0.35   0.76     0.14     0.04     0.10      0.95
ND     8,400      0.18   0.68   0.14   0.75     0.09     0.15     0.10      0.94
OH     80,000     0.22   0.38   0.41   0.78     0.14     0.03     0.09      0.95
OK     40,500     0.21   0.64   0.15   0.73     0.13     0.11     0.12      0.95
OR     24,500     0.20   0.46   0.34   0.77     0.13     0.04     0.10      0.95
PA     105,000    0.20   0.39   0.42   0.80     0.13     0.03     0.09      0.95
RI     5,700      0.19   0.47   0.35   0.78     0.13     0.04     0.09      0.95
SC     30,000     0.21   0.44   0.36   0.72     0.13     0.04     0.09      0.95
SD     8,700      0.20   0.69   0.11   0.80     0.10     0.19     0.11      0.94
TN     39,500     0.22   0.43   0.34   0.74     0.15     0.04     0.10      0.95
TX     130,000    0.26   0.46   0.28   0.75     0.18     0.07     0.14      0.95
UT     17,500     0.30   0.43   0.27   0.81     0.19     0.07     0.15      0.95
VT     7,900      0.15   0.77   0.08   0.80     0.06     0.13     0.07      0.95
VA     49,500     0.24   0.36   0.40   0.79     0.16     0.04     0.10      0.95
WA     43,500     0.23   0.45   0.32   0.79     0.15     0.05     0.11      0.95
WV     12,500     0.16   0.70   0.15   0.75     0.10     0.15     0.11      0.89
WI     70,000     0.20   0.39   0.42   0.80     0.13     0.03     0.08      0.95
WY     3,900      0.19   0.61   0.20   0.75     0.11     0.06     0.10      0.95

8 National Survey of Children’s Health sample frame

We additionally audit the frame against an early release file of 2020 ACS microdata, as shown in
table 4.
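The quantities reported in the audit tables (weighted strata sizes, strata child rates, and coverage) can be sketched from household-level records as follows. The record layout, the function name, and the toy weights are assumptions for illustration; vacant units enter with has_child set to False, in line with the zero-children rule described earlier.

```python
from collections import defaultdict


def audit(records):
    """records: iterable of (stratum, has_child, weight), stratum in 1, 2a, 2b."""
    w_total = 0.0
    w_stratum = defaultdict(float)
    w_child = defaultdict(float)
    for stratum, has_child, weight in records:
        w_total += weight
        w_stratum[stratum] += weight
        if has_child:
            w_child[stratum] += weight

    result = {}
    for s in ("1", "2a", "2b"):
        result[f"p(S{s})"] = w_stratum[s] / w_total
        # Child rate is undefined for an empty stratum.
        result[f"p(C|S{s})"] = (
            w_child[s] / w_stratum[s] if w_stratum[s] else float("nan")
        )
    all_child_w = sum(w_child.values())
    # Share of households with children that fall in Strata 1 or 2a.
    result["coverage"] = (w_child["1"] + w_child["2a"]) / all_child_w
    return result


# Toy weighted records (stratum, has_child, housing-unit weight).
records = [
    ("1", True, 2.0), ("1", False, 1.0),
    ("2a", True, 1.0), ("2a", False, 3.0),
    ("2b", True, 0.5), ("2b", False, 2.5),
]
summary = audit(records)
```

Running the same function on each state's records separately would give one table row per state.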
Table 4: NSCH strata, ACS 2020, all addresses audit 9
State  N          p(S1)  p(S2)  p(S3)  p(C|S1)  p(C|S2)  p(C|S3)  p(C|!S1)  p(!S3|C)
US     1,300,000  0.23   0.42   0.35   0.84     0.11     0.04     0.08      0.94
AL     19,000     0.23   0.51   0.26   0.80     0.10     0.03     0.08      0.96
AK     4,500      0.19   0.62   0.19   0.77     0.16     0.38     0.21      0.77
AZ     24,500     0.22   0.44   0.34   0.83     0.12     0.03     0.08      0.95
AR     11,500     0.23   0.59   0.19   0.82     0.10     0.06     0.09      0.96
CA     125,000    0.27   0.35   0.38   0.84     0.15     0.03     0.09      0.96
CO     23,500     0.24   0.40   0.37   0.86     0.11     0.03     0.08      0.95
CT     13,500     0.23   0.37   0.40   0.88     0.12     0.03     0.07      0.96
DE     4,000      0.21   0.40   0.39   0.84     0.09     0.03     0.06      0.95
DC     2,700      0.17   0.75   0.08   0.79     0.06     0.05     0.06      0.98
FL     65,000     0.20   0.39   0.40   0.79     0.11     0.03     0.06      0.95
GA     28,500     0.25   0.45   0.30   0.82     0.12     0.04     0.09      0.96
HI     5,700      0.15   0.62   0.23   0.76     0.26     0.06     0.21      0.96
ID     7,100      0.24   0.39   0.38   0.85     0.15     0.05     0.10      0.93
IL     55,500     0.23   0.42   0.35   0.86     0.11     0.05     0.08      0.93
IN     27,500     0.23   0.42   0.35   0.85     0.11     0.04     0.08      0.95
IA     21,000     0.21   0.64   0.15   0.87     0.06     0.22     0.09      0.87
KS     16,000     0.23   0.39   0.39   0.86     0.11     0.06     0.08      0.92
KY     19,000     0.23   0.54   0.23   0.85     0.10     0.04     0.08      0.97
LA     14,500     0.24   0.43   0.33   0.77     0.12     0.03     0.08      0.96
ME     8,000      0.18   0.44   0.38   0.85     0.08     0.03     0.05      0.95
MD     22,500     0.25   0.39   0.36   0.86     0.11     0.04     0.07      0.96
MA     26,000     0.22   0.39   0.38   0.87     0.12     0.03     0.07      0.96
MI     53,500     0.22   0.33   0.45   0.87     0.11     0.03     0.06      0.95
MN     44,000     0.23   0.36   0.41   0.89     0.10     0.03     0.06      0.94
MS     9,500      0.25   0.72   0.04   0.76     0.09     0.27     0.10      0.96
MO     28,000     0.23   0.43   0.34   0.86     0.11     0.03     0.07      0.96
MT     6,200      0.18   0.73   0.09   0.83     0.08     0.24     0.10      0.90
NE     13,000     0.21   0.62   0.17   0.87     0.07     0.19     0.09      0.88
NV     10,500     0.23   0.44   0.33   0.80     0.11     0.03     0.08      0.96
NH     6,200      0.21   0.39   0.40   0.86     0.09     0.03     0.06      0.95
NJ     31,500     0.25   0.37   0.37   0.87     0.15     0.03     0.09      0.96
NM     7,400      0.18   0.75   0.07   0.76     0.10     0.29     0.11      0.92
NY     72,500     0.22   0.43   0.36   0.83     0.13     0.04     0.09      0.95
NC     36,500     0.23   0.41   0.36   0.83     0.11     0.03     0.07      0.96
ND     5,400      0.20   0.70   0.10   0.86     0.08     0.25     0.10      0.90
OH     51,500     0.22   0.35   0.42   0.86     0.11     0.03     0.07      0.95
OK     25,500     0.25   0.63   0.12   0.78     0.11     0.13     0.12      0.94
OR     17,000     0.22   0.44   0.35   0.86     0.10     0.03     0.07      0.95
PA     64,000     0.21   0.36   0.43   0.88     0.10     0.03     0.06      0.95
RI     3,700      0.21   0.43   0.36   0.86     0.10     0.02     0.07      0.96
SC     17,500     0.23   0.42   0.35   0.81     0.11     0.03     0.07      0.96
SD     5,900      0.21   0.71   0.08   0.87     0.08     0.24     0.09      0.93
TN     24,500     0.24   0.41   0.35   0.82     0.11     0.03     0.08      0.95
TX     77,000     0.27   0.44   0.29   0.83     0.14     0.04     0.10      0.96
UT     12,000     0.31   0.42   0.27   0.86     0.16     0.08     0.13      0.94
VT     4,500      0.18   0.77   0.05   0.86     0.07     0.16     0.07      0.96
VA     32,000     0.25   0.34   0.41   0.85     0.12     0.03     0.07      0.95
WA     31,000     0.24   0.43   0.33   0.86     0.12     0.03     0.08      0.96
WV     7,200      0.19   0.74   0.08   0.83     0.07     0.23     0.08      0.92
WI     44,500     0.21   0.36   0.43   0.86     0.11     0.02     0.06      0.96
WY     2,500      0.20   0.61   0.19   0.82     0.14     0.06     0.12      0.96

9 National Survey of Children’s Health sample frame

Local-area Internet-accessibility
Here we describe the construction of a tract-varying Internet-accessible household flag.
Since 2012, ACS respondents have been able to submit survey forms over the Internet. ACS
paradata record whether a respondent chose the online option. The ACS paradata has been
summarized at the tract level. Our Internet-accessible household measure is equal to a weighted
proportion of the respondents that chose to submit the ACS survey over the Internet if given the
option to do so. Figure 4 shows the kernel-smoothed distribution of tract-level Internet response
for the 2013–2014 ACS survey years.
Figure 4: Kernel-smoothed probability distribution function of tract-level ACS Internet response
rate, ACS paradata, 2013–2014 survey years
To construct an Internet-access flag, we use the first tercile for a cut-off. A block is considered
to have low Internet access if the Internet accessibility index is below the first tercile of the block-level distribution. For low-population blocks, we replace missing values of the block-varying low-Internet flag with the modal value from the corresponding block group. For very new housing
units without assigned Census blocks, we assign a value of zero for this binary variable (i.e., the
default for these new households is high Internet accessibility.)
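The flag construction can be sketched as below. The area identifiers, the grouping, and the index values are made up; the choice of statistics.quantiles for the tercile cut-off and the fallback of 0 when a block group has no observed flags are assumptions of this sketch, not documented rules.

```python
import statistics


def low_internet_flags(index_by_area, group_of_area):
    """Return {area: 0/1}, where 1 means low Internet access."""
    observed = [v for v in index_by_area.values() if v is not None]
    cutoff = statistics.quantiles(observed, n=3)[0]   # first tercile
    flags = {
        area: (1 if value < cutoff else 0)
        for area, value in index_by_area.items()
        if value is not None
    }

    # Collect observed flags by block group for the modal fill.
    by_group = {}
    for area, flag in flags.items():
        by_group.setdefault(group_of_area[area], []).append(flag)

    for area, value in index_by_area.items():
        if value is None:
            peers = by_group.get(group_of_area[area])
            # Assumption: default to 0 when the group has no observed flags.
            flags[area] = statistics.mode(peers) if peers else 0
    return flags


def flag_for_housing_unit(area, flags):
    """New housing units with no assigned area default to 0 (high access)."""
    return 0 if area is None else flags[area]


# Made-up accessibility index by area; "g" has a missing value.
index = {"a": 0.2, "b": 0.4, "c": 0.5, "d": 0.6, "e": 0.7, "f": 0.9, "g": None}
groups = {"a": "G1", "b": "G1", "g": "G1", "c": "G2", "d": "G2",
          "e": "G3", "f": "G3"}
flags = low_internet_flags(index, groups)
```

In the toy data, area "g" inherits the low-access flag because both observed areas in its group fall below the first tercile.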

Local-area household income relative to the poverty rate
The frame has a set of poverty variables from the 2019 5-year American Community Survey file.
These variables measure the proportion of households with household income in an interval
defined by the poverty rate. Figure 5 shows the kernel-smoothed probability distribution function
of the proportion of households in the block group that have household income less than 150%
of the poverty rate.

Figure 5: Kernel-smoothed probability distribution function of block-group-level 150% poverty
rate, ACS, 2019 5-year file

Final sample frame data layout
The component data files are merged together based on MAFID. The data layout for this
combined file is given in Table 5.
Table 5: NSCH population data file layout

Variable name           Label                                 Level of variation  Type   Domain    Any missing?
mafid                   Master Address File ID                MAFID               long   9 digits  no
maf_curstate            State                                 State               str2             no
maf_curcounty           County                                County              str3             no
maf_curblktract         Tract                                 Tract               str6             yes
maf_curblkgrp           Block group                           Block group         str1             yes
maf_curblk              Block                                 Block               str4             yes
stratum1                Stratum 1 identifier                  MAFID               byte   {0, 1}    no
stratum1a               Stratum 1a identifier                 MAFID               byte   {0, 1}    no
stratum1b               Stratum 1b identifier                 MAFID               byte   {0, 1}    no
stratum2a               Stratum 2a identifier                 MAFID               byte   {0, 1}    no
stratum2b               Stratum 2b identifier                 MAFID               byte   {0, 1}    no
acs_tract_net_response  ACS Internet response                 Tract               float  [0, 1]    yes
web_low                 Low web use (lowest tercile)          Tract               byte   {0, 1}    no
blkgrp_lt_100_povrate   Pr. HH w/ inc. < 100% poverty rate    Block group         float  [0, 1]    yes
blkgrp_100_150_povrate  Pr. HH w/ inc. 100–150% poverty rate  Block group         float  [0, 1]    yes
blkgrp_150_185_povrate  Pr. HH w/ inc. 150–185% poverty rate  Block group         float  [0, 1]    yes
blkgrp_185_200_povrate  Pr. HH w/ inc. 185–200% poverty rate  Block group         float  [0, 1]    yes
blkgrp_gt_200_povrate   Pr. HH w/ inc. > 200% poverty rate    Block group         float  [0, 1]    yes
blkgrp_lt_150_povrate   Pr. HH w/ inc. < 150% poverty rate    Block group         float  [0, 1]    yes
mailvaldf               Valid mailing address                 MAFID               byte   {0, 1}    yes

Filename: nsch_pop_file.sas7bdat
Population: all MAFIDs in 2020 MAF-X
Unit of observation: household (MAFID)
