2025 and 2027 National Youth Risk Behavior Survey
Attachment O
Sampling and Weighting Plan for the national YRBS
Contents
2. Estimation and Justification of Sample Size
2.2 Expected Confidence Intervals
3.1 Design Updates and Modifications
4 Weighting and Variance Estimation
4.2 Non-response Adjustments, Raking and Trimming
4.3 Estimators and Variance Estimation
	
	
The national YRBS is part of CDC’s larger surveillance system, the Youth Risk Behavior Surveillance System (YRBSS), which includes state, territorial, tribal, and local school-based YRBSs conducted by education and health agencies as part of cooperative agreement activities with CDC. The YRBSS was developed to monitor priority health risk behaviors that contribute to the leading causes of mortality, morbidity, and social problems among youth and young adults in the United States. YRBSS captures information on the following topics: student demographics (sex, sexual identity, race and ethnicity, and grade), youth health behaviors and conditions (sexual; injury and violence; bullying; diet and physical activity; obesity; and mental health, including suicide), and substance use behaviors (electronic vapor product and tobacco product use, alcohol use, and other drug use). Changes to the national questionnaire in 2021 included new questions that examined urgent and relevant student health behaviors and experiences, including protective factors (parental monitoring and school connectedness), housing instability, and exposure to community violence.
The national YRBS allows CDC to assess how risk behaviors change over time among the high school population in the United States. It also provides comparable data to state, territorial, tribal, and local entities conducting their own YRBSs to demonstrate how their youth’s behaviors compare to those at the national level. The objective of the sampling design for the national YRBS is to support estimation of the health risk behaviors in a nationally representative population of 9th through 12th grade students. Estimates will be generated among students overall and by sex, grade, and race/ethnicity (white, Black, Hispanic). The 2025 YRBS will be the 19th fielding of this national survey.
Section 2 of this document presents our plans for achieving the target number of participating students in the 2025 national YRBS. Section 3 describes the sampling methods, and Section 4 presents the planned weighting and variance estimation procedures.
The sample design planned for the 2025 YRBS is consistent with the sample design used in past cycles, which includes adjusting sampling parameters to reflect the changing demographics of the in-school population of high school students.
The YRBS sample size calculations are based on the following assumptions:
The main structure of the sampling design will be consistent with the design used to draw the sample for prior cycles of the YRBS.
Three Secondary Sampling Units (SSUs) will be selected within each sample Primary Sampling Unit (PSU). A PSU is defined as a county, a portion of a county, or a group of counties. An SSU is a “full” school that serves as a sampling unit that can supply a full complement of students in grades 9 through 12. SSUs with at least 28 students per grade are considered “large”; otherwise they are considered “small.” On average, each selected class will include 25 students in large schools and 20 students in small schools.1
An overall response rate of approximately 55%, calculated as the product of the school and student response rates and based on an 82% student response rate and a 68% school response rate.
Based on these assumptions, we will draw a sample of 60 PSUs, with 3 large SSUs (“full” schools) selected from each PSU, for a total of 180 large SSUs. With one class per grade, a PSU would supply on average 300 students across grades 9-12 before non-response (3 SSUs × 4 grades per school × 25 students per grade); double class sampling in one-third of large schools (see below) raises this by a factor of about 1.33, to roughly 399 students per PSU. Across the 60 sample PSUs, the sample yield from large schools will be 23,940 students before school and student non-response, and 13,167 after non-response.
To provide adequate coverage of students in small schools, we also will select small SSUs from a subsample of 20 PSUs. As in prior YRBS cycles, we will select one small SSU in each of 20 subsample PSUs, therefore adding an additional 20 SSUs to the sample. Small SSUs are expected to add 1,600 students before non-response and 880 participating students. The total expected yield is therefore 13,167 + 880 = 14,047 participating students.
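The yield arithmetic above can be reproduced in a few lines of Python. This is a minimal sketch. The constant names are ours, and the 1.33 factor for double class sampling (DCS in about one-third of large schools, so 1 + 1/3 classes per grade on average) is an assumption. We chose it because it reconciles 300 students per PSU with the stated 23,940 large-school students across 60 PSUs.

```python
# Sketch of the expected-yield arithmetic; constant names are illustrative only.
N_PSU = 60                      # sample PSUs
LARGE_PER_PSU = 3               # large SSUs selected per PSU
GRADES = 4                      # grades 9-12
CLASS_SIZE_LARGE = 25           # average students per selected class, large schools
STUDENTS_PER_GRADE_SMALL = 20   # average students per grade, small schools
DCS_FACTOR = 1.33               # assumption: DCS in ~1/3 of large schools
N_SMALL_SSU = 20                # one small SSU in each of 20 subsample PSUs
RESPONSE_RATE = 0.55            # plan value; 0.82 * 0.68 = 0.5576 before rounding


def expected_yields():
    """Return expected student counts before and after non-response."""
    large_before = N_PSU * LARGE_PER_PSU * GRADES * CLASS_SIZE_LARGE * DCS_FACTOR
    small_before = N_SMALL_SSU * GRADES * STUDENTS_PER_GRADE_SMALL
    large_after = large_before * RESPONSE_RATE
    small_after = small_before * RESPONSE_RATE
    return {
        "large_before": round(large_before),
        "large_after": round(large_after),
        "small_before": round(small_before),
        "small_after": round(small_after),
        "total_after": round(large_after + small_after),
    }


print(expected_yields())
```

The returned totals match the figures in the text: 23,940 and 13,167 students from large schools, 1,600 and 880 from small schools, and 14,047 participating students overall.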
Altogether, the proposed sample design is expected to yield 200 SSUs. Each SSU consists of either a single school (if the school includes each of grades 9-12) or two or more linked schools that individually do not include all of grades 9-12. This is done to form school-based SSUs that provide coverage for all four grades in each unit. During the grade selection process (see Section 3.6.1), physical schools are selected for each SSU.
Within each school, one class will be selected from each grade to participate in the survey, except in high-minority schools, where two classes per grade will be selected. Double class selection (DCS) has been used in all previous YRBS surveys to support health risk behavior prevalence estimates by race/ethnicity. For the 2025 YRBS, we will implement DCS in schools with high concentrations of Black students, specifically in the 33% of large schools with the highest percentages of Black students. As discussed in Section 3.4.3, this targeting was introduced to increase the number of participating Black students. In total, the sample schools are expected to yield 14,047 participating students, assuming an overall 55% response rate.
Factors that influence the size of prevalence estimate confidence intervals include (1) whether the estimate is for the full population or for a demographic subgroup (i.e., by sex, race/ethnicity, or grade); (2) the prevalence rate; and (3) the design effect (DEFF) associated with each risk behavior.2 The DEFF, which equals 1.0 for simple random sampling, reflects the variance-increasing effects of unequal weighting and sample clustering.
Based on the prior YRBS studies, which had similar designs and sample sizes, we can expect the following levels of precision:
95% confidence for domains defined by grade, sex, or race/ethnicity;
95% confidence for domains defined by crossing grade or race/ethnicity by sex; and
90% confidence for domains formed by crossing grade with race/ethnicity.
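To illustrate how the DEFF enters these precision levels, the sketch below computes the half-width of a normal-approximation confidence interval for a prevalence p estimated from n respondents. It inflates the simple-random-sampling variance by the DEFF. This is a standard textbook approximation, not the exact variance method used for YRBS estimates (see Section 4), and the example inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def ci_half_width(p, n, deff=1.0, confidence=0.95):
    """Half-width of a normal-approximation CI for a proportion.

    The SRS variance p(1 - p)/n is multiplied by the design effect, which is
    equivalent to using an effective sample size of n / deff.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return z * sqrt(deff * p * (1 - p) / n)


# Hypothetical subgroup: prevalence 50%, 1,000 respondents, DEFF of 2.0
print(round(ci_half_width(0.5, 1000, deff=2.0), 4))
```

With these inputs the half-width is roughly 0.044, or about ±4.4 percentage points. With DEFF = 1.0 it would be about ±3.1 points.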
The sampling universe for the YRBS sample consists of all regular public, Catholic, and other private school students in grades 9 through 12 in the 50 states and the District of Columbia. The target population excludes students in alternative schools, special education schools, Department of Defense (DOD) operated schools, Bureau of Indian Education (BIE) schools, vocational schools that serve only pull-out populations3, and students enrolled in regular schools unable to complete the questionnaire without special assistance. The sampling frame has been updated as described below in Section 3.2 and now contains 1,072 PSUs.
The sample will be a stratified, three-stage cluster sample, with PSUs stratified by racial/ethnic composition and by urban versus rural location. PSUs are classified as "urban" if they are in one of the 54 largest Metropolitan Statistical Areas (MSA) in the U.S.; otherwise, they are classified as "rural". Within each stratum, PSUs, defined as a county, a portion of a county, or a group of counties, will be chosen without replacement at the first stage. Exhibit 3.1 presents key sampling design features.
Exhibit 3.1 Key Sampling Design Features
| Sampling Stage | Sampling Units | Sample Size (Approximate) | Stratification | Measure of Size |
| 1 | PSU: County, a portion of a county, or a group of counties | 60 PSUs | Urban vs. non-urban (2 strata); Minority concentration (8 strata) | Aggregate school size in target grades |
| 2 | Schools | 200 second-stage units (SSUs), with >=3 per PSU | Small vs. other | Enrollment |
| 3 | Classes/students | 1 or 2 classes per grade per school; approx. 14,000 participating students | | |
We plan to replicate the main features of the 2023 YRBS sample design. As in the past few cycles, we will continue to adjust sampling parameters to reflect changing demographics of the in-school population.
Decreasing Need to Oversample Hispanic and Black Students
In general, as the proportion of Black and Hispanic students in the study population increases and the minority population becomes more evenly distributed, the parameters that drive minority oversampling can be relaxed, allowing us to maintain yields while moving towards a statistically more efficient design.
Specifically, growing percentages of Black and Hispanic students have allowed the design to be closer to a self-weighting design, and therefore, be more efficient in the sense of minimizing the variance of overall survey estimates. The main modification in the last few cycles of the study has been to define the measure of size (MOS) as eligible enrollment rather than a weighted MOS designed to oversample minority students.
In cycles prior to 2017, the allocation to strata oversampled strata with higher concentrations of minority students. In more recent cycles (2017, 2019, 2021 and 2023), however, the design moved to a nearly proportional allocation with the aim of enhancing the precision of overall estimates. While oversampling via the weighted MOS is no longer necessary to achieve sufficient numbers of Black and Hispanic students, DCS still implements oversampling of Black students by focusing this sampling on schools with high concentrations of Black students. Section 3.4.2 describes the updated stratification and Section 3.4.3 describes our DCS method.
Exhibit 3.2 presents the percentages of public high-school students who are Black and Hispanic for the YRBS sampling frames spanning 2011 to 2023. The table shows that while the percentage of Black students has remained relatively stable in the frame, the percentage of Hispanic students has been steadily increasing, from 21.7% in 2011 to 28.4% in 2023. By contrast, the percentage of Black students has fluctuated between about 14.5% and 16% (with a notable dip to 12.2% in 2015).
Exhibit 3.2 Historical Trends for Black and Hispanic Students in YRBS Frame
| | 2011 | 2015 | 2017 | 2019 | 2020 | 2021 | 2023 |
| Black | 15.94% | 12.21% | 14.87% | 15.40% | 15.44% | 14.94% | 14.52% |
| Hispanic | 21.72% | 23.58% | 23.58% | 23.96% | 24.01% | 27.06% | |
The design updates aim to increase the sample representation of Black students, which has declined steadily over the same period even though the population percentages for this group have remained stable. By contrast, the sample representation of Hispanic students has more closely tracked the group's growing presence in the population frame.
Design Updates
Two other design features are also routinely updated in each cycle:
The stratum boundaries based on the percentage of minority students are re-computed to minimize variances according to the cumulative square root rule (Dalenius-Hodges rule).4
We adjust PSU definitions to account for school openings and closings, and we may also adjust PSU sample sizes if simulated yields indicate a need to do so.
In addition, as described in Section 3.4.3, the PSU sample allocation and DCS approach have been revised to enhance the yields for minority students.
In the 2025 YRBS, we will continue the practice of constructing a more comprehensive sampling frame from different data sources. The frame will combine data files obtained from MDR Inc. (Market Data Retrieval, Inc.) and from the National Center for Education Statistics (NCES). For public and non-public schools across the nation, the MDR frame contains school information including enrollments, grades, race distributions within the school, district and county information, and contact information. The NCES frame source includes the Common Core of Data (CCD) for public schools and the Private School Survey (PSS) for non-public schools. Prior to the 2013 YRBS, a single source of national schools (MDR) was used as the sampling frame. The updated frame contains 1,072 PSUs.
The reason for constructing a combined frame from multiple data sources was to increase the coverage of schools nationally. Exhibit 3.3 illustrates the potential increase of coverage. If we consider the column of data on the left to be the previous approach and the column of data on the right to be the added NCES datasets, we can see that both sources of data are missing schools from their list (indicated by the dashed lines). The MDR schools not on the NCES files do not represent an increase in coverage as they already exist on the single-source frame. Combining helps to fill in the missing schools and ensures more representation.
This dual-source frame build method was piloted in 2015 and resulted in a 23% increase in coverage among all public and non-public high schools: a 15.5% increase among public high schools and a 46% increase among non-public high schools. The added schools increased student coverage by 2% among public high schools and by 16.5% among non-public high schools. Most of the added schools were smaller schools. This dual-source frame build method has subsequently been used in each cycle.
Exhibit 3.3 Increased Coverage with the Combined File Approach
 
When combining data sources to form a sampling frame, it is essential to eliminate duplicates across the files – that is, each school should be represented once on the final frame, regardless of the number of times it is represented in the multiple source files. To minimize duplication, schools will be matched based on NCES school identifier, address and phone number. Once the sample has been drawn, a manual review of the sampled schools will be conducted to further eliminate duplicate schools.
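The automated matching step can be sketched as follows. This is a hypothetical illustration, not the production matching code. The field names (`nces_id`, `address`, `phone`) are assumptions, and two records are treated as the same school if they share an NCES identifier, or if they share both a normalized address and a normalized phone number. A small union-find structure collapses records linked through any chain of matches, and the first record in each group is kept. The manual review of sampled schools would still follow.

```python
import re


def _norm_address(addr):
    """Lowercase and collapse punctuation/whitespace, e.g. '9 Oak Ave.' -> '9 oak ave'."""
    if not addr:
        return None
    return re.sub(r"[^a-z0-9]+", " ", addr.lower()).strip() or None


def _norm_phone(phone):
    """Keep the last 10 digits of a phone number, or None if there are fewer."""
    digits = re.sub(r"\D", "", phone or "")
    return digits[-10:] if len(digits) >= 10 else None


def dedupe_schools(records):
    """Collapse records that share an NCES ID, or share both address and phone."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    first_seen = {}
    for i, rec in enumerate(records):
        addr, phone = _norm_address(rec.get("address")), _norm_phone(rec.get("phone"))
        keys = []
        if rec.get("nces_id"):
            keys.append(("id", rec["nces_id"]))
        if addr and phone:
            keys.append(("addr_phone", addr, phone))
        for key in keys:
            if key in first_seen:
                parent[find(i)] = find(first_seen[key])  # merge the two groups
            else:
                first_seen[key] = i

    groups = {}
    for i in range(len(records)):
        groups.setdefault(find(i), []).append(i)
    return [records[min(members)] for members in groups.values()]


recs = [
    {"source": "MDR-1", "nces_id": "123", "address": "1 Main St.", "phone": "(555) 123-4567"},
    {"source": "CCD-1", "nces_id": "123", "address": "1 Main Street", "phone": "555-123-4567"},
    {"source": "MDR-2", "nces_id": None, "address": "9 Oak Ave", "phone": "555 987 6543"},
    {"source": "PSS-2", "nces_id": "A9", "address": "9 Oak Ave.", "phone": "5559876543"},
    {"source": "PSS-3", "nces_id": "B1", "address": "2 Elm Rd", "phone": "555-000-1111"},
]
print([r["source"] for r in dedupe_schools(recs)])
```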
A modification introduced in 2015 was the inclusion of a threshold for school size so that the frame does not include very small schools. The threshold is defined in terms of the aggregate school enrollment in eligible grades. The threshold was raised from the minimum of 25 used in prior cycles to a minimum total enrollment of 40. The school size threshold was established in consultation with CDC, primarily for cost efficiency but also because of concerns about confidentiality. The cost of recruiting and collecting data from very small schools outweighed the benefit of adding the relatively small number of students who attend this subset of schools. However, these efficiency gains may come at the price of under-coverage of small schools, with the potential for associated biases. This section summarizes the results of our investigation of the under-coverage impact of requiring a minimum school size.5
This analysis looked at the percentage of students that would be left out of the frame for varying values of the threshold. To assess the potential bias that might be associated with these exclusions, we also examined the percentage of Black and Hispanic students who are left out of the frame when very small schools are not included in the school frame.6 The analysis showed that the bias potential was very small for either size threshold, c=25 or c=40. It showed that 0.51% of the students would have been excluded from the frame using a truncation threshold of 25 students; for a threshold of 40, these percent exclusions increased to 0.97%. The percentages of minority students also dropped by very small amounts for the threshold of c=40 as well as for c=25.
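The following is a minimal sketch of this kind of threshold analysis. It assumes a frame of school records with hypothetical fields `enrollment` (students in eligible grades), `black`, and `hispanic`, and it reports the percentage of all students, Black students, and Hispanic students enrolled in schools below a threshold c. The toy frame is invented for illustration and does not reproduce the results reported above.

```python
def exclusion_summary(schools, c):
    """Percent of students (overall, Black, Hispanic) in schools with enrollment < c."""
    def pct(field):
        total = sum(s[field] for s in schools)
        dropped = sum(s[field] for s in schools if s["enrollment"] < c)
        return 100.0 * dropped / total

    return {"students": pct("enrollment"), "black": pct("black"), "hispanic": pct("hispanic")}


toy_frame = [
    {"enrollment": 30, "black": 3, "hispanic": 6},
    {"enrollment": 120, "black": 20, "hispanic": 30},
    {"enrollment": 850, "black": 100, "hispanic": 200},
]
for c in (25, 40):
    print(c, exclusion_summary(toy_frame, c))
```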
In summary, the truncation resulting from either size threshold leads to small levels of student-level under-coverage and, therefore, minimal impact on student-level estimates. At the same time, excluding these very small schools leads to substantial efficiencies in recruitment efforts and to increased student yields per visited school. Therefore, ICF plans to continue using a threshold of c=40 for the 2025 YRBS.
The sampling approach will utilize Probability Proportional to Size (PPS) sampling methods. In general, when the measure of size is defined as the count of final-stage sampling units, and a fixed number of units are selected in the final stage of a PPS sample, the result is an equal probability of selection for all members of the universe. This is the case for the YRBS, where student counts are used as the measure of size, and a roughly fixed number of students are selected from each school as the final stage. Thus, this design results in a roughly self-weighting sample.
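The self-weighting property can be checked with exact fractions. In this hypothetical two-stage example, schools are drawn with probability proportional to enrollment and a fixed number of students is taken in each selected school. Every student ends up with the same overall selection probability, n × m / (total enrollment). The sizes and sample sizes are invented, and the sketch assumes no unit is large enough to be a certainty selection.

```python
from fractions import Fraction


def overall_student_probs(school_sizes, n_schools, students_per_school):
    """Overall selection probability of a student in each school under PPS + fixed take."""
    total = sum(school_sizes)
    probs = []
    for size in school_sizes:
        p_school = Fraction(n_schools * size, total)  # PPS first stage
        if p_school > 1:
            raise ValueError("certainty unit: split or handle separately")
        p_within = Fraction(students_per_school, size)  # fixed take of m students
        probs.append(p_school * p_within)
    return probs


print(overall_student_probs([200, 400, 600, 800], n_schools=2, students_per_school=50))
```

Each entry equals 2 × 50 / 2,000 = 1/20, whatever the school's size. In the YRBS the take per school is only roughly fixed, which is why the text describes the design as roughly self-weighting.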
Prior cycles of YRBS have included a weighted measure of size to increase the probability of selection of high minority (Hispanic and Black) PSUs and schools. The effectiveness of a weighted measure of size in achieving oversampling is dependent upon the distributions of Black and Hispanic students in schools. The need for a weighted measure of size is predicated on a relatively low prevalence of minority students in the population; however, this premise has become less tenable with the growth in the population proportion of Black and Hispanic students as seen in Exhibit 3.2 earlier.
During the design of the initial YRBS cycles, ICF conducted a series of simulation studies that investigated the relationship of various weighting functions to the resulting numbers and percentages of minority students in the obtained samples.7 We have periodically performed new simulation studies to ensure that we use the minimum amount of measure-of-size weighting necessary to achieve target yields of Black and Hispanic students. Starting with the 2013 YRBS, we concluded that we could move to an unweighted measure of enrollment size, which increases the statistical efficiency of the design and therefore leads to more precise prevalence estimates. Therefore, an unweighted measure of size will continue to be used for the 2025 YRBS sample design.
Definition of Primary Sampling Units
In defining PSUs, several issues are considered:
Each PSU should be large enough to contain the requisite numbers of schools and students by grade, and small enough so as not to be selected with near certainty.
Each PSU should be compact geographically so that field staff can go from school to school easily.
PSU definitions should be consistent with secondary sampling unit (school) definitions.
PSUs are defined to contain at least four large high schools.
Generally, counties will be equivalent to PSUs, with two exceptions:
Low population counties are combined to provide sufficient numbers of schools.
High population counties are divided into multiple PSUs so that the resulting PSU will not be selected with certainty8.
The basic county-to-PSU assignments have remained relatively stable from one YRBS cycle to the next. As we obtain new frame data each YRBS cycle, school and student counts for each PSU are updated to account for school openings and closings. Updated county populations are aggregated from school enrollment data for the grades of interest.
The PSU frame is then screened for PSUs that no longer meet the criteria given above. We adjust the frame by re-combining small counties or PSUs as necessary to ensure sufficient size while maintaining compactness. Near-certainty PSUs are split using an automated procedure built into the sampling program.
Stratification of PSUs
The PSUs will be organized into 16 strata based on urban/rural location and minority enrollment. The approach involves computing optimum stratum boundaries using the cumulative square root of “f” method developed by Dalenius and Hodges. This method is useful when there are many PSUs at the lower levels of concentration and they become sparse as the percentage increases, which is the case here. The boundaries, or cutoffs, change as the frequency distribution (“f”) of the racial groupings changes from one survey cycle to the next.
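The cumulative square root of f rule can be sketched in a few lines. This hypothetical version bins PSU-level minority percentages into equal-width classes and accumulates the square roots of the class frequencies. It then places each stratum boundary at the upper edge of the class whose cumulative value is closest to an equal share of the total. The bin count, the use of unweighted PSU counts as f, and the closest-edge rule are our assumptions. Production software may weight PSUs by enrollment or resolve ties differently.

```python
from math import sqrt


def cum_sqrt_f_boundaries(values, n_strata, n_bins=20, lo=0.0, hi=100.0):
    """Dalenius-Hodges style boundaries for percentages in [lo, hi]."""
    width = (hi - lo) / n_bins
    freq = [0] * n_bins
    for v in values:
        freq[min(int((v - lo) / width), n_bins - 1)] += 1  # frequency f per class

    cum, running = [], 0.0
    for f in freq:
        running += sqrt(f)  # cumulative sqrt(f)
        cum.append(running)

    bounds = []
    for k in range(1, n_strata):
        target = k * cum[-1] / n_strata  # equal shares of the cumulative total
        j = min(range(n_bins), key=lambda i: abs(cum[i] - target))
        bounds.append(lo + (j + 1) * width)  # upper edge of the closest class
    return bounds


print(cum_sqrt_f_boundaries([10, 30, 60, 90], n_strata=4, n_bins=4))
```

Because sqrt(f) grows more slowly than f, crowded low-concentration classes carry less than their proportional share of the cumulative total. This keeps the dense low end from absorbing too many boundaries while still splitting it more finely than the sparse high end.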
To reiterate, the three-stage cluster sample will be stratified by racial/ethnic composition and urban versus non-urban status at the first stage. PSUs are defined as a county, a group of smaller counties, or a portion of a very large county. PSUs are classified as “urban” if they are in one of the 54 largest MSAs in the U.S.; otherwise, they are classified as “non-urban.” Additional, implicit stratification will be imposed by geography by sorting the PSU frame by state and by 5-digit ZIP Code (within state).
The specific definitions of primary strata are as follows:
If the percentage of Hispanic students in the PSU exceeds the percentage of Black students, the PSU is classified as Hispanic; otherwise, it is classified as Black.
If the PSU is within one of the 54 largest MSAs in the U.S. it is classified as ‘Urban’, otherwise it is classified as ‘Rural.’
Hispanic Urban and Hispanic Rural PSUs are classified into four density groupings depending upon the percentages of Hispanic students in the PSU.
Black Urban and Black Rural PSUs are also classified into four groupings depending upon the percentages of Black students in the PSU.
Exhibit 3.4 illustrates the process with the boundaries newly computed for the 2025 YRBS. It is worth stressing that the boundaries are re-computed for each cycle of the YRBS as we employ the Dalenius-Hodges method (described above) to allow the boundaries to adapt to the changing race/ethnic distribution of the student population. Exhibit 3.5 shows the stratum sizes using the new definitions of the 16 strata.
Exhibit 3.4 Minority Percentage Bounds for PSU Stratification
| Minority Concentration | Density Group | Urban Bounds | Rural Bounds |
| Black | 1 | 0%-16% | 0%-14% |
| Black | 2 | >16%-30% | >14%-30% |
| Black | 3 | >30%-46% | >30%-46% |
| Black | 4 | >46%-100% | >46%-100% |
| Hispanic | 1 | 0%-12% | 0%-8% |
| Hispanic | 2 | >12%-28% | >8%-22% |
| Hispanic | 3 | >28%-44% | >22%-46% |
| Hispanic | 4 | >44%-100% | >46%-100% |
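The stratum classification rules, combined with the bounds in Exhibit 3.4, can be expressed as a small lookup. This is an illustrative sketch, and the function and variable names are hypothetical. It reads each bound as inclusive at the top; for example, Black Urban density group 2 covers more than 16% up to and including 30%.

```python
# Upper bounds for density groups 1-3 (percent); group 4 is everything above.
BOUNDS = {
    ("B", "U"): (16, 30, 46),
    ("B", "R"): (14, 30, 46),
    ("H", "U"): (12, 28, 44),
    ("H", "R"): (8, 22, 46),
}


def stratum_code(pct_black, pct_hispanic, urban):
    """Return a stratum code such as 'BU3' or 'HR1' for one PSU."""
    minority = "H" if pct_hispanic > pct_black else "B"  # ties go to Black
    area = "U" if urban else "R"
    pct = pct_hispanic if minority == "H" else pct_black
    group = 1 + sum(pct > bound for bound in BOUNDS[(minority, area)])
    return f"{minority}{area}{group}"


print(stratum_code(56.0, 7.7, urban=False), stratum_code(11.05, 61.09, urban=True))
```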
Exhibit 3.5 Stratum Sizes: Number of High Schools and Students Enrolled
| Stratum | High Schools | Students Enrolled | 
| BR1 | 1,052 | 505,904 | 
| BR2 | 1,312 | 804,329 | 
| BR3 | 1,067 | 550,755 | 
| BR4 | 787 | 365,164 | 
| BU1 | 601 | 419,250 | 
| BU2 | 707 | 559,556 | 
| BU3 | 1,277 | 795,507 | 
| BU4 | 620 | 404,926 | 
| HR1 | 3,882 | 1,400,744 | 
| HR2 | 3,373 | 1,669,284 | 
| HR3 | 2,300 | 1,375,662 | 
| HR4 | 1,671 | 1,133,452 | 
| HU1 | 742 | 550,271 | 
| HU2 | 1,320 | 1,126,282 | 
| HU3 | 2,068 | 1,892,199 | 
| HU4 | 3,231 | 2,800,388 | 
| Total | 26,010 | 16,353,673 |
Exhibit 3.6 presents the percentages of Black and Hispanic students in the different strata. It shows that the “predominantly Black” strata have increasing concentrations of Black students in rural areas moving from BR1 to BR4, and similarly in urban areas (BU1 to BU4). It also shows that the “predominantly Hispanic” strata have increasing concentrations of Hispanic students in rural areas moving from HR1 to HR4, and similarly in urban areas (HU1 to HU4). For example, the percentages of Black students in the two “high Black” strata, BR4 and BU4, exceed 56%, in contrast with about 11% or less in the two “low Black” strata, BR1 and BU1.
Exhibit 3.6 Minority Percentage by Strata
| Stratum | Percent Black | Percent Hispanic | 
| BR1 | 6.72% | 4.69% | 
| BR2 | 20.61% | 10.45% | 
| BR3 | 36.36% | 9.76% | 
| BR4 | 56.03% | 7.70% | 
| BU1 | 10.65% | 5.90% | 
| BU2 | 22.40% | 12.96% | 
| BU3 | 36.48% | 18.84% | 
| BU4 | 56.29% | 16.36% | 
| HR1 | 1.99% | 4.62% | 
| HR2 | 4.26% | 14.35% | 
| HR3 | 11.26% | 32.60% | 
| HR4 | 3.05% | 71.22% | 
| HU1 | 3.31% | 6.72% | 
| HU2 | 7.50% | 20.75% | 
| HU3 | 15.80% | 34.95% | 
| HU4 | 11.05% | 61.09% | 
Allocation of the PSU Sample and Double Class Sampling
In the last few cycles of the YRBS, the sample PSUs have been allocated to the 16 strata so as to improve minority student yields, and therefore the precision of subgroup estimates. Using the 2025 YRBS stratified sampling frame, we developed a new allocation, shown in Exhibit 3.7, and validated it through simulations.
Exhibit 3.7. Sample PSU Allocation to First-Stage Strata
| Predominant Minority | Urban/Rural | Density Group Number | Stratum Code | PSU Sample Allocation |
| Black | Urban | 1 | BU1 | 2 |
| Black | Urban | 2 | BU2 | 2 |
| Black | Urban | 3 | BU3 | 3 |
| Black | Urban | 4 | BU4 | 3 |
| Black | Rural | 1 | BR1 | 2 |
| Black | Rural | 2 | BR2 | 3 |
| Black | Rural | 3 | BR3 | 2 |
| Black | Rural | 4 | BR4 | 3 |
| Hispanic | Urban | 1 | HU1 | 2 |
| Hispanic | Urban | 2 | HU2 | 4 |
| Hispanic | Urban | 3 | HU3 | 6 |
| Hispanic | Urban | 4 | HU4 | 10 |
| Hispanic | Rural | 1 | HR1 | 4 |
| Hispanic | Rural | 2 | HR2 | 6 |
| Hispanic | Rural | 3 | HR3 | 4 |
| Hispanic | Rural | 4 | HR4 | 4 |
The simulation study also allowed an investigation of two approaches to DCS. The first approach, adopted in recent cycles (the 2021 and 2023 YRBS), applied DCS in a subset of large schools (one-third of large schools). The second approach applies DCS in the subset of large schools with the highest concentrations of Black students. This alternative approach, used in two earlier cycles (the 2017 and 2019 YRBS), was designed to improve Black student yields, which have been declining relative to Hispanic student yields. Exhibits 3.8 and 3.9 show the simulation results obtained with the two DCS methods. The results show that the second method leads to much improved yields for Black students (an average of 3,525 versus 2,659), and it was the method chosen for the 2025 YRBS sample design and selection.
Exhibit 3.8 Average Yields for Black and Hispanic Students for Method 1: Target Large Schools
| | Overall Students | Black Students | Hispanic Students |
| Total | 15766 | 2659 | 4365 | 
| Grade 9 | 3945 | 668 | 1099 | 
| Grade 10 | 3939 | 660 | 1090 | 
| Grade 11 | 3941 | 665 | 1089 | 
| Grade 12 | 3941 | 666 | 1086 | 
Exhibit 3.9 Average Yields for Black and Hispanic Students for Method 2: Target Large Schools with Highest Concentrations of Black Students
| | Overall Students | Black Students | Hispanic Students |
| Total | 16144 | 3525 | 4326 | 
| Grade 9 | 4019 | 877 | 1082 | 
| Grade 10 | 4065 | 890 | 1092 | 
| Grade 11 | 4019 | 872 | 1076 | 
| Grade 12 | 4041 | 886 | 1076 | 
Selection of PSUs
Using PPS sampling, we will select a sample of 60 PSUs for the YRBS. The size measure used will be the sum of total school enrollment across schools in the PSU. With PPS sampling, the selection probability for each PSU is proportional to the PSU’s measure of size.
If $M_{klm}$ is the measure of size for school k in PSU l in stratum m, $M_{lm} = \sum_{k} M_{klm}$ is the measure of size of PSU l, and $n_m$ is the number of PSUs to be selected in stratum m, then the probability of selection of PSU l in stratum m is:

$$P_{lm} = \frac{n_m \, M_{lm}}{\sum_{l'} M_{l'm}}$$
As noted above, 20 of the 60 sample PSUs will be sub-sampled for the separate sampling of small schools. Thus, the sub-sample PSUs are assigned an additional sampling factor (20/60) in their probability of selection for small schools.
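The following is a minimal sketch of the PSU selection probabilities in one stratum. It assumes PSU measures of size have already been aggregated from school enrollment, and the function names and inputs are hypothetical. It flags any PSU whose probability would reach 1 (such PSUs are split beforehand, as described earlier) and applies the 20/60 subsampling factor for small-school sampling.

```python
SMALL_SCHOOL_SUBSAMPLE = 20 / 60  # 20 of the 60 sample PSUs also supply a small school


def psu_probabilities(psu_mos, n_select):
    """PPS selection probability n_m * M_lm / sum(M_lm) for each PSU in one stratum."""
    total = sum(psu_mos.values())
    probs = {}
    for psu, mos in psu_mos.items():
        p = n_select * mos / total
        if p >= 1:
            raise ValueError(f"PSU {psu} would be a certainty selection; split it first")
        probs[psu] = p
    return probs


def small_school_psu_probability(p_psu):
    """Probability that a PSU is selected and then subsampled for small schools."""
    return p_psu * SMALL_SCHOOL_SUBSAMPLE


probs = psu_probabilities({"A": 100, "B": 300, "C": 600}, n_select=1)
print(probs, small_school_psu_probability(probs["C"]))
```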
Second-stage units (SSUs)
Secondary Sampling Units (SSUs) are formed from single schools or combinations of schools. A single school forms its own SSU if it has students in each of grades 9-12. Schools that do not have all four grades are grouped together to form an SSU (a.k.a. a “linked school”). Most commonly, students from a school serving grades 10-12 are grouped with the 9th-grade students from a nearby school serving grades 7-9. Forming SSUs that contain all four grades ensures representation at each grade level and supports the selection of one or more classes from each grade in every SSU (the third stage).
Stratification
SSUs are stratified into two size strata consisting of small and large schools. Small schools are defined as those that cannot support the selection of an entire class at every grade level. That is, a school is considered small if it has fewer than 28 students in any grade; all other schools are considered large.
SSU selection
Three large high schools are selected from each PSU. In addition, one small school is selected from each of 20 sub-sample PSUs. SSUs will be selected using a systematic probability proportional to size (PPS) method, with the unweighted enrollment described earlier as the measure of size.
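Systematic PPS selection can be sketched as follows. The sketch lays the schools' measures of size end to end in the frame's sort order, chooses a random start within the first sampling interval, and takes the school containing each of n equally spaced points. This is a generic textbook implementation, not the study's sampling program. A school whose measure of size exceeds the interval can be hit more than once, so in practice such schools would be handled as certainty selections.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def systematic_pps(sizes, n, rng=random):
    """Return indices of n units selected by systematic PPS from an ordered list of sizes."""
    cum = list(accumulate(sizes))      # running totals of the measure of size
    interval = cum[-1] / n             # sampling interval
    start = rng.random() * interval    # random start in [0, interval)
    picks = []
    for k in range(n):
        point = start + k * interval
        # Unit whose cumulative range [cum[i-1], cum[i]) contains the point;
        # min() guards against a floating-point point landing exactly on the total.
        picks.append(min(bisect_right(cum, point), len(sizes) - 1))
    return picks


# Hypothetical PSU with six large schools (enrollment as measure of size); select 3.
print(systematic_pps([420, 1300, 880, 650, 1750, 1000], 3, rng=random.Random(2025)))
```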
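The probability of selecting large school k in PSU l and stratum m, $P_{klm}$, was computed as follows, where $M_{klm}$ is the school's measure of size, $M_{lm}$ is the total measure of size of PSU l, and $P_{lm}$ is the selection probability of PSU l:

$$P_{klm} = P_{lm} \times \frac{3 \, M_{klm}}{M_{lm}}$$

For small schools, one school was drawn from each sub-sampled PSU, so the probability of selection of a small school, $P^{S}_{klm}$, becomes the following, where $M^{S}_{lm}$ is the total measure of size of the small schools in PSU l:

$$P^{S}_{klm} = P_{lm} \times \frac{20}{60} \times \frac{M_{klm}}{M^{S}_{lm}}$$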
Note that the factor of 20/60 is the fixed probability that the PSU was selected for small school sampling.
Selection of grades
Within large SSUs, one physical school is sampled to supply each of the four high school grades. For the vast majority of SSUs, which consist of one physical school, this means that all eligible grades are included in the class selection process for the school; there is a one-to-one correspondence between SSU and school.
Within each SSU formed by linking, or combining physical schools, grade samples are drawn independently with one component school being selected to supply each grade, proportional to grade level enrollment.
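For a linked SSU, the grade-level draw can be sketched as an independent PPS choice for each grade among the component schools that enroll that grade. The input layout, school names, and function name are hypothetical. Python's `random.choices` performs the weighted draw.

```python
import random

GRADES = (9, 10, 11, 12)


def select_grade_sources(components, rng=random):
    """components: {school_name: {grade: enrollment}} for one linked SSU.

    Returns {grade: school_name}, drawing one component school per grade with
    probability proportional to its enrollment in that grade.
    """
    chosen = {}
    for grade in GRADES:
        names = [name for name, enr in components.items() if enr.get(grade, 0) > 0]
        if not names:
            raise ValueError(f"SSU does not cover grade {grade}")
        weights = [components[name][grade] for name in names]
        chosen[grade] = rng.choices(names, weights=weights, k=1)[0]
    return chosen


linked_ssu = {
    "North HS (10-12)": {10: 310, 11: 295, 12: 270},
    "Central JHS (7-9)": {7: 240, 8: 235, 9: 228},
}
print(select_grade_sources(linked_ssu, rng=random.Random(1)))
```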
For small schools, no grade level sampling is performed. All students in the eligible grades that make up the school will be selected.
Selection of classes
We will select one class per grade to participate in the survey in all schools not designated for DCS. In the subset of large schools with the highest concentrations of Black students, we will select two classes per grade instead of one. We will adopt DCS in the top tercile of sample large schools ranked by percentage of Black students, so the subset with DCS will be one-third of all large schools in the sample. DCS is an instrumental design feature for meeting precision requirements for racial/ethnic subgroup estimates.
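Flagging the DCS schools amounts to ranking the sample large schools by percentage of Black students and taking the top third. In this hypothetical sketch the field names are assumptions, and rounding one-third up is our choice. With 180 large schools it flags 60.

```python
import math


def dcs_flags(large_schools):
    """Return {school_id: classes per grade}: 2 for the top third by percent Black, else 1."""
    ranked = sorted(large_schools, key=lambda s: s["pct_black"], reverse=True)
    n_dcs = math.ceil(len(ranked) / 3)  # assumption: round the one-third up
    dcs_ids = {s["id"] for s in ranked[:n_dcs]}
    return {s["id"]: (2 if s["id"] in dcs_ids else 1) for s in large_schools}


sample = [{"id": i, "pct_black": pct} for i, pct in enumerate([4, 62, 18, 35, 9, 51])]
print(dcs_flags(sample))
```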
The method of selecting classes will vary from school to school, depending on the organization of the school and whether schools are linked. The key element of the class sampling strategy is to identify a structure that partitions the students into mutually exclusive, collectively exhaustive groupings of approximately equal size. Beyond that basic requirement, we will partition students so that every student, of either sex, has a chance of being selected. In selecting classes, we will generally give preference to mandatory courses such as English.
We will not use special procedures to sample for minorities at the school building level for two reasons:
Schools do not maintain student rosters that identify students by racial/ethnic affiliation.
Identifying student respondents based on race/ethnicity may be perceived as offensive by students and/or school administrators.
Selection of students
All students in a selected classroom will be eligible for the survey with the exception of students who cannot complete the survey independently (e.g., for language or cognitive reasons).
Replacement of schools/school systems
We will not replace refusing school districts, schools, classes, or students. Instead, the sampling design allows for school and student non-response: as discussed earlier, the numbers of selections are inflated to account for expected non-response, assuming response rates of 82% at the student level and 68% at the school level.
This section addresses the steps planned for computing sampling weights, nonresponse adjustments, and post-stratification (raking) adjustments. In addition, we describe the methods planned for trimming weights (to avoid excessive variability) and for computing weighted estimates and their variances.
Before describing these steps in turn, it is worth noting that the procedures planned for the tablet sample follow exactly the same sequence of steps.
Although the sample was designed to be self-weighting under certain idealized conditions, it will be necessary to compute weights to produce unbiased estimates. The basic weights, or sampling weights, will be computed on a case-by-case basis as the reciprocal of the probability of selection of that case.
If $k$ is the number of PSUs to be selected from stratum $i$, $N_i$ is the size of stratum $i$, and $N_{ij}$ is the size of PSU $j$ in stratum $i$ (in all cases "size" refers to student enrollment), then the probability of selection of PSU $j$ is $k \times N_{ij}/N_i$.
Assuming three large schools are to be selected within each selected PSU, and $N_{ijk}$ is the size of school $k$ in PSU $j$ of stratum $i$, the conditional probability of selection of the school, given the selection of the PSU, is $3 \times N_{ijk}/N_{ij}$ for YRBS large schools.
The derivation is similar for small schools, with an extra factor to account for the PSU subsampling probability.
If $C_{ijk}$ is the number of classes in school $ijk$, then the conditional probability of selection of a class is $1/C_{ijk}$ (or $2/C_{ijk}$ if two classes are taken). Because all students in a selected class are included, the conditional probability of selection of a student, given the selection of the class, is 1.
The overall probability of selection of a student in stratum $i$ is the product of these conditional probabilities. The probabilities of selection will be the same for all students in a given school, regardless of their race/ethnicity.
Sampling weights assigned to each student record are the reciprocal of the overall probabilities of selection for each student.
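The chain of conditional probabilities for a student in a large school can be sketched as below. The sketch assumes three large schools per selected PSU and covers large schools only. The extra PSU subsampling factor for small schools is not specified in this section and is omitted. The function name and the enrollment figures are hypothetical. Note that $N_{ij}$ cancels, leaving $3k N_{ijk} c / (N_i C_{ijk})$ for $c$ classes taken, which is why every student in a school shares one probability.

```python
def large_school_student_probability(k, N_i, N_ij, N_ijk, C_ijk, classes_taken=1):
    """Overall selection probability of a student in a large school.

    k:      number of PSUs selected from stratum i
    N_i:    student enrollment of stratum i
    N_ij:   student enrollment of PSU j in stratum i
    N_ijk:  student enrollment of school k in PSU j
    C_ijk:  number of classes in the school's class frame
    """
    p_psu = k * N_ij / N_i               # PPS selection of the PSU
    p_school = 3 * N_ijk / N_ij          # three large schools per PSU, PPS
    p_class = classes_taken / C_ijk      # equal-probability class selection
    p_student = 1.0                      # every student in the class is taken
    return p_psu * p_school * p_class * p_student


# Hypothetical enrollments for one large school
p = large_school_student_probability(k=2, N_i=200_000, N_ij=20_000,
                                     N_ijk=1_000, C_ijk=10)
base_weight = 1 / p                      # sampling weight = reciprocal
print(round(p, 6), round(base_weight, 2))  # 0.003 333.33
```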
Several adjustments are planned to account for student and school nonresponse patterns. An adjustment for student nonresponse will be made by sex and grade within school. With this adjustment, the sum of the student weights over participating students within a school matches the school's enrollment by grade and sex, as obtained during data collection. This adjustment factor will be capped in extreme situations, such as when only one or two students respond in a school, to limit the potential effects of extreme weights on the precision of survey estimates.
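A minimal sketch of this within-school cell adjustment follows. It assumes that the weights being adjusted are within-school weights, for example the inverse of the class selection probability, so that their sum can be compared with school enrollment. The function name and the cap value of 3.0 are hypothetical; the plan states only that the factor will be capped in extreme situations.

```python
from collections import defaultdict


def adjust_student_nonresponse(respondents, enrollment, max_factor=3.0):
    """Adjust within-school weights of responding students by sex x grade cell.

    respondents: list of dicts with keys 'sex', 'grade', 'weight' (one school)
    enrollment:  dict mapping (sex, grade) to the school's enrollment in that cell
    max_factor:  hypothetical cap on the adjustment factor
    """
    cell_sums = defaultdict(float)
    for r in respondents:
        cell_sums[(r["sex"], r["grade"])] += r["weight"]

    adjusted = []
    for r in respondents:
        cell = (r["sex"], r["grade"])
        # Factor that would make respondent weights sum to cell enrollment,
        # capped so that one or two respondents cannot carry extreme weights.
        factor = enrollment[cell] / cell_sums[cell]
        adjusted.append(r["weight"] * min(factor, max_factor))
    return adjusted


school = ([{"sex": "F", "grade": 9, "weight": 20.0}] * 4
          + [{"sex": "M", "grade": 9, "weight": 10.0}])
print(adjust_student_nonresponse(school, {("F", 9): 100, ("M", 9): 120}))
# [25.0, 25.0, 25.0, 25.0, 30.0]
```

In the example, the female cell reaches its enrollment of 100 exactly. The lone male respondent would need a factor of 12, so the cap applies, and that cell then sums to 30 rather than 120.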
The weights of students in participating schools will be adjusted to account for nonparticipation by other schools. The adjustment factor is the ratio of the weighted sum of measures of size over all selected schools in the stratum (numerator) to the weighted sum of measures of size over participating schools in the stratum (denominator). The adjustment factor will be computed and applied to small and large schools separately.
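The school-level ratio can be sketched as follows. Each selected school is assumed to carry its partial school weight and its measure of size, and factors are computed per stratum and size class. The field names are hypothetical. A stratum/size-class cell with no participating schools would need collapsing, which is not specified here.

```python
from collections import defaultdict


def school_nonresponse_factors(schools):
    """Return {(stratum, size_class): adjustment factor}.

    schools: one dict per selected school with keys 'stratum',
             'size_class' ('small' or 'large'), 'weight' (partial school
             weight), 'mos' (measure of size), and 'participated' (bool).
    """
    selected = defaultdict(float)
    participating = defaultdict(float)
    for s in schools:
        key = (s["stratum"], s["size_class"])
        weighted_mos = s["weight"] * s["mos"]
        selected[key] += weighted_mos
        if s["participated"]:
            participating[key] += weighted_mos
    # A cell with no participating schools divides by zero here; such cells
    # would have to be collapsed with a neighboring cell (not shown).
    return {key: selected[key] / participating[key] for key in selected}


schools = [
    {"stratum": 1, "size_class": "large", "weight": 10.0, "mos": 500, "participated": True},
    {"stratum": 1, "size_class": "large", "weight": 20.0, "mos": 400, "participated": False},
    {"stratum": 1, "size_class": "large", "weight": 5.0, "mos": 1000, "participated": True},
    {"stratum": 1, "size_class": "small", "weight": 40.0, "mos": 50, "participated": True},
]
print(school_nonresponse_factors(schools))
# {(1, 'large'): 1.8, (1, 'small'): 1.0}
```

Student weights in each participating school would then be multiplied by the factor for that school's cell.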
For large schools the partial school weight is the inverse of the probability of selection of the school given that the PSU was selected:
$$\frac{1}{3 \times N_{ijk}/N_{ij}} = \frac{N_{ij}}{3 \times N_{ijk}}$$
For small schools the partial school weight is:
 
Extreme variation in sampling weights can inflate sampling variances and offset the precision gained from a well-designed sampling plan. One strategy to compensate for these potential effects is to trim extreme weights and distribute the trimmed weight among the untrimmed weights. As in previous YRBS cycles, we will integrate the iterative trimming and raking processes in a way that makes both more efficient, statistically as well as logistically.
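The trim-and-redistribute idea can be sketched as below. The plan does not state its trimming rule. This sketch assumes a hypothetical cap of five times the median weight and spreads the excess proportionally over the untrimmed weights, so the weight total is preserved. Because redistribution can push other weights over the cap, the step repeats. The integration with raking described above is not shown.

```python
import statistics


def trim_and_redistribute(weights, cap_multiple=5.0, max_iter=50):
    """Cap weights at cap_multiple x median (a hypothetical rule) and
    spread the excess over the untrimmed weights, keeping the total fixed."""
    cap = cap_multiple * statistics.median(weights)
    w = list(weights)
    for _ in range(max_iter):
        over = [i for i, x in enumerate(w) if x > cap]
        if not over:
            break
        excess = sum(w[i] - cap for i in over)
        for i in over:
            w[i] = cap
        untrimmed_total = sum(x for x in w if x < cap)
        if untrimmed_total == 0:
            raise ValueError("no untrimmed weights left to absorb the excess")
        # Scale untrimmed weights up so they absorb exactly the excess.
        scale = 1 + excess / untrimmed_total
        w = [x * scale if x < cap else x for x in w]
    return w


trimmed = trim_and_redistribute([1.0] * 9 + [20.0])
print(trimmed[-1], round(sum(trimmed), 9))  # 5.0 29.0
```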
Post-stratification approaches capitalize on known population totals and percentages available for groups of schools and students. National racial/ethnic enrollment counts for post-stratification are obtained from two sources. Private school enrollments by grade and five racial/ethnic groups are obtained from the Private School Universe Survey (PSS), and public school enrollments by grade, sex, and five racial/ethnic categories are obtained from the Common Core of Data (CCD); both are produced by the National Center for Education Statistics (NCES). These databases are combined to produce enrollments for all schools and to develop the population counts used as controls in the post-stratification step.
An iterative approach to post-stratification, called raking, will allow the use of additional post-stratification dimensions.
For post-stratification purposes, a single race/ethnicity category is assigned to respondents with missing race/ethnicity data, those classified as “Other,” and those reporting multiple races. For private schools, we use two race/ethnicity categories: white and non-white. For public schools, we use all five categories.
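Raking (iterative proportional fitting) can be sketched as below. Weights are repeatedly scaled so that their sums match each set of marginal control totals in turn, until no margin needs further adjustment. The variables, categories, and control totals in the example are hypothetical. The interleaved trimming step used in the plan is omitted.

```python
from collections import defaultdict


def rake(records, weights, margins, max_iter=100, tol=1e-8):
    """Iterative proportional fitting of weights to marginal control totals.

    records: list of dicts giving each respondent's category on each variable
    weights: starting weights (e.g., nonresponse-adjusted weights)
    margins: {variable: {category: control total}}
    """
    w = list(weights)
    for _ in range(max_iter):
        max_change = 0.0
        for var, totals in margins.items():
            current = defaultdict(float)
            for rec, wi in zip(records, w):
                current[rec[var]] += wi
            factors = {cat: totals[cat] / current[cat] for cat in current}
            max_change = max(max_change, max(abs(f - 1.0) for f in factors.values()))
            w = [wi * factors[rec[var]] for rec, wi in zip(records, w)]
        if max_change < tol:  # every margin already matched on this pass
            break
    return w


records = [{"grade": 9, "race": "W"}, {"grade": 9, "race": "B"},
           {"grade": 10, "race": "W"}, {"grade": 10, "race": "B"}]
margins = {"grade": {9: 60, 10: 40}, "race": {"W": 70, "B": 30}}
print([round(x, 6) for x in rake(records, [1.0] * 4, margins)])
# [42.0, 18.0, 28.0, 12.0]
```

Raking matches each margin separately rather than every cross-classified cell, which is what allows more post-stratification dimensions than a full cross-tabulation would support.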
If $w_i$ is the weight of case $i$ (the inverse of the probability of selection, adjusted for nonresponse and post-stratification) and $x_i$ is a characteristic of case $i$ (e.g., $x_i = 1$ if student $i$ smokes and $x_i = 0$ otherwise), then the mean of characteristic $x$ is $\sum_i w_i x_i / \sum_i w_i$. A population total is computed similarly as $\sum_i w_i x_i$. The weighted population estimates will be computed with the Statistical Analysis System (SAS).
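The two estimators can be illustrated in a few lines. This Python sketch is only a restatement of the formulas; the plan itself uses SAS. The weights and the smoking indicator below are hypothetical.

```python
def weighted_mean(w, x):
    """Weighted mean: sum(w_i * x_i) / sum(w_i)."""
    return sum(wi * xi for wi, xi in zip(w, x)) / sum(w)


def weighted_total(w, x):
    """Estimated population total: sum(w_i * x_i)."""
    return sum(wi * xi for wi, xi in zip(w, x))


# Three hypothetical students: x = 1 if the student smokes, else 0.
weights = [100.0, 200.0, 300.0]
smokes = [1, 0, 1]
print(weighted_total(weights, smokes))  # 400.0
print(weighted_mean(weights, smokes))   # 0.6666666666666666
```

For a 0/1 indicator, the weighted mean is the estimated prevalence: here, the two smokers represent 400 of the 600 students in the weighted population.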
These estimates will be accompanied by measures of sampling variability, or sampling error, such as variances and standard errors, that account for the complex sampling design. These measures will support the construction of confidence intervals and other statistical inference, such as statistical testing (e.g., subgroup comparisons or trends over successive YRBS cycles). Sampling variances will be estimated using the method of general linearized estimators⁹ as implemented in SAS survey procedures. Software of this kind is required because it permits estimation of sampling variances for multistage stratified sampling designs and accounts for unequal weighting, sample clustering, and stratification.
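A minimal sketch of a linearized standard error for a weighted mean follows. It uses the textbook Taylor-linearization formula for a ratio mean under a stratified, clustered design, with PSUs treated as sampled with replacement and no finite population correction. It is meant to show the idea, not to reproduce SAS output exactly. The function name and data are hypothetical. With equal weights, one stratum, and one observation per PSU, the result reduces to the familiar $s/\sqrt{n}$, which serves as a sanity check.

```python
import math
from collections import defaultdict


def linearized_mean_se(y, w, strata, psus):
    """Weighted mean and its Taylor-linearization standard error, treating
    PSUs as sampled with replacement within strata (no FPC)."""
    total_w = sum(w)
    mean = sum(wi * yi for wi, yi in zip(w, y)) / total_w

    # Linearized values of the ratio mean, summed to the PSU level.
    z = defaultdict(float)
    for yi, wi, h, c in zip(y, w, strata, psus):
        z[(h, c)] += wi * (yi - mean) / total_w

    psu_values = defaultdict(list)
    for (h, _), z_hc in z.items():
        psu_values[h].append(z_hc)

    variance = 0.0
    for h, values in psu_values.items():
        n_h = len(values)
        if n_h < 2:
            raise ValueError(f"stratum {h} has fewer than two PSUs")
        z_bar = sum(values) / n_h
        # Between-PSU variability within the stratum.
        variance += n_h / (n_h - 1) * sum((v - z_bar) ** 2 for v in values)
    return mean, math.sqrt(variance)


m, se = linearized_mean_se(y=[1, 2, 3, 4], w=[1.0] * 4,
                           strata=[1, 1, 1, 1], psus=[1, 2, 3, 4])
print(m, round(se, 4))  # 2.5 0.6455
```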
1 Note that the definition of the school size strata uses the same cutoff for small schools that is traditionally used. This cutoff, not necessarily linked to the expected number of students participating per class, will result in large schools that can support double class sampling.
2 The design effect is defined as the ratio of the variance attained under the actual design to the variance that would be obtained with a simple random sample of the same size.
3 Pull-out populations include students who receive general instruction at one school but report to a different school for specialized instruction or services.
4 Dalenius, T. and Hodges, J.L., Jr. (1959) “Minimum variance stratification.” Jour. Amer. Statist. Assoc., 54, 88-101.
5 The new method for frame construction improves coverage by using a frame that combines MDR and NCES data files rather than relying on a single source. This method adds a disproportionately large number of very small schools that used to be left out of the frames based solely on the MDR files.
6 In theory, bias due to loss of coverage of these very small schools might also be assessed by comparing selected estimates of risk behavior outcomes for students in these schools with estimates from the balance of the schools or with overall estimates. This comparison is not statistically feasible, however, because the number of very small schools in recent survey cycles is relatively small, as is the student yield in these schools.
7 Errecart, M.T., Issues in Sampling African-Americans and Hispanics in School-Based Surveys. Centers for Disease Control, October 5, 1990.
8 The variance estimation process is more efficient without the need to account for certainty PSUs. The method of dividing large PSUs ensures that each sub-county PSU mirrors the distribution of schools in the county as a whole.
9 Skinner, C.J., Holt, D., and Smith, T.M.F. (1989) Analysis of Complex Surveys. John Wiley & Sons, New York, p. 50.
	
	