 
Supporting Statement B for
Extension of NEXT Generation Health Study – NICHD
[OMB No. 0925-0610]
Date: (Should be the date when the final version is sent to our office, which is after the 60-day comment period)
Denise L. Haynie, PhD, MPH
Health Behavior Branch
Division of Intramural Population Health Research
Eunice Kennedy Shriver National Institute of Child Health and Human Development
Building 6100, 7B13
6100 Executive Blvd
Bethesda, Maryland, 20892-7510
Telephone: (301) 435-6933
Fax: (301) 402-2084
E-mail: Denise_Haynie@nih.gov
Table of contents
B. COLLECTIONS OF INFORMATION EMPLOYING STATISTICAL METHODS
B.1 Respondent Universe and Sampling Methods
B.2 Procedures for the Collection of Information
B.3 Methods to Maximize Response Rates and Deal with Non-response
B.4 Test of Procedures or Methods to be Undertaken
B.5 Individuals Consulted on Statistical Aspects and Individuals Collecting and/or Analyzing Data
B. COLLECTIONS OF INFORMATION EMPLOYING STATISTICAL METHODS
B.1 Respondent Universe and Sampling Methods
A nationally representative cohort of U.S. students in grade 10 was recruited using a multistage stratified design. Primary sampling units consisted of school districts or groups of school districts stratified across the nine U.S. Census divisions. Within this sampling framework, 137 schools were selected and formally recruited; 81 (59%) agreed to participate. Tenth-grade classes were randomly selected within each recruited school, and 3,796 students were recruited to participate; youth assent and parental consent were obtained from 2,874 (76%) students. At Wave 6, 2,296 participants (80% of the total sample) completed surveys (not including respondents to the peer survey). African Americans were oversampled, resulting in 687 African American participants in the original sample, of whom 598 (87%) participated in the Wave 6 survey. There were 835 Hispanic participants in the original sample, of whom 693 (83%) participated in the Wave 6 survey. Among the 560 original NEXT Plus participants, 459 (82%) completed Wave 6 surveys, and 82% completed the most recent home visit (Wave 4).
B.2 Procedures for the Collection of Information
Statistical methodology for stratification and sample selection.
A multistage design was used for sample selection. The first stage of sampling consisted of the construction of 1,302 primary sampling units (PSUs) from a population of around 14,000 school districts. The list of school districts was supplied by Quality Educational Data, Inc. (QED). QED maintains a continuously updated list of every school district in the U.S., so the list was current. QED also maintains a current list of K-12 schools by state, with contact information, covering 100% of public, private, and Catholic schools in the U.S. Private and parochial schools were linked to public districts to ensure that these sampled schools fell within the same sample clusters as sampled public schools. PSUs were formed by grouping school districts within each Census division. Some PSUs contained only one very large school district; others contained all school districts within a county or two adjacent counties. A sample of PSUs was drawn, stratified by Census division, and a list of schools offering grade 10 was obtained only for the selected PSUs. This method of sampling reduced the cost of data collection because the sample of schools was not spread widely across the U.S. We contacted a probability sample of 137 schools, and 81 agreed to participate in the survey. We conducted a response bias analysis to determine whether the schools that consented to participate in the study differed from the schools that refused. The only significant difference between schools that participated and those that refused was in the proportion of Asian American students. Because this difference was relatively small (approximately 3 percentage points), it could have been due to the population of a single school in the refusal group and/or the oversample of schools with a high proportion of African American students.
The sampling frame for the NEXT Plus substudy (N=560) was all schools successfully recruited to participate in the basic survey. The following sampling stages were implemented.
1. In each of the nine strata (Census divisions), all recruited schools were listed.
2. Geographic cluster sampling was used to group schools in relatively close geographic proximity into clusters (or “communities”).
3. On average, two clusters per Census division were randomly selected, for a total of 20 communities.
4. Within each “community” cluster, schools were first sorted by whether they were urban, suburban, or rural to ensure representation.
5. Two schools within each cluster were then systematically sampled.
6. Each selected school contributed two classrooms that were randomly selected to participate in the basic survey.
7. At the study office, students in the selected classrooms were categorized as “overweight” or “normal weight” based on the height and weight measurements collected during the main study.
8. Seven overweight children and seven normal-weight children per school were randomly selected, across classes, from the respective weight-status categories and recruited to the NEXT Plus sample.
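The final within-school stage amounts to a simple random sample without replacement inside each weight-status group; with 20 communities, 2 schools per community, and 14 students per school, this yields the 560 NEXT Plus participants. The sketch below assumes a hypothetical roster layout of `(student_id, status)` pairs, and the handling of schools with fewer than seven eligible students (`min()`) is an assumption, since the text does not say how such cases were treated.

```python
import random

def select_next_plus(roster, n_per_group=7, seed=None):
    """Draw n_per_group students at random from each weight-status group in one school.

    roster: list of (student_id, status) pairs pooled across the school's selected
    classrooms; status is "overweight" or "normal" (hypothetical data layout).
    """
    rng = random.Random(seed)
    chosen = []
    for status in ("overweight", "normal"):
        pool = [sid for sid, s in roster if s == status]
        # Simple random sampling without replacement within the weight-status group;
        # min() is an assumption for schools with fewer eligible students.
        chosen.extend(rng.sample(pool, min(n_per_group, len(pool))))
    return chosen

# Hypothetical school: 60 students, every third one classified as overweight
roster = [(f"S{i:03d}", "overweight" if i % 3 == 0 else "normal") for i in range(60)]
print(len(select_next_plus(roster, seed=2010)))  # 14
```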
Estimation procedure.
For producing population-based estimates, each responding student is assigned a sampling weight. This weight combines a base sampling weight which is the inverse of the probability of selection of the student and an adjustment for nonresponse at the school level and the student level. The probability of selecting a student is the product of the probability of selecting the school district, the probability of selecting the school within the district and the probability of selecting the class in which the student is present. The inverse of the overall probability gives the base weight. Various selection probabilities are recorded and used to construct the sampling weight. The base weights are adjusted for nonresponse. All student level estimates including estimates of change are weighted estimates using the student weight. All student level analyses use student weights.
The objective is to select each student with a known probability of selection. Because of probability proportional to size (PPS) sampling at the first and second stages and unequal number of classes in selected schools, the overall probabilities of selection for students are unequal. As indicated above, we determine the overall probability of selecting each student in the sample considering the three stages of sampling. The base sampling weight assigned to each student is the inverse of the overall probability of selection of that student.
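As a worked illustration of the base weight, the overall selection probability is the product of the three stage-wise probabilities, and the base weight is its inverse. The probabilities below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from math import prod

def base_weight(p_psu, p_school, p_class):
    """Base weight = 1 / (P(PSU) * P(school | PSU) * P(class | school))."""
    overall = prod((p_psu, p_school, p_class))
    if not 0 < overall <= 1:
        raise ValueError("overall selection probability must be in (0, 1]")
    return 1 / overall

# Hypothetical student: PSU drawn with probability 0.05, school with probability 0.4
# within the PSU, and 2 of the school's 4 grade 10 classes drawn with equal probability.
w = base_weight(0.05, 0.4, 2 / 4)
print(round(w, 6))  # 100.0
```

A weight of 100 means this student stands in for roughly 100 grade 10 students in the population before nonresponse and raking adjustments.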
The size measure for selecting primary sampling units using PPS sampling is total enrollment. The size measure for selecting schools offering grade 10 was enrollment in grade 10. We used PPS systematic sampling to select primary sampling units and schools within selected primary sampling units. The determination of probability of selection at each stage is straightforward under PPS systematic sampling. For example, the probability of selecting a PSU (say PSU $i$) within a Census division is

$$\pi_i = \frac{n\,M_i}{M}$$

where $n$ is the number of PSUs selected, $M_i$ is the total enrollment in PSU $i$, and $M$ is the total enrollment in all the PSUs in that Census division. Similarly, we can determine the probability of selection within a selected PSU. Classes were selected within a selected school using equal probability systematic sampling. As indicated earlier, the overall probability is determined by taking the product of the probabilities of selection at the three stages.
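A minimal sketch of PPS systematic selection, assuming PSUs are listed in a fixed order and none is large enough to be a certainty unit ($n M_i / M > 1$). It returns the selected units together with the inclusion probabilities $\pi_i = n M_i / M$ from the formula above. The enrollment figures are hypothetical; the actual frame ordering and certainty-unit handling used for NEXT are not described here.

```python
import random
from bisect import bisect_right
from itertools import accumulate

def pps_systematic(sizes, n, seed=None):
    """PPS systematic selection of n units; returns (selected indices, pi_i = n*M_i/M)."""
    total = sum(sizes)
    probs = [n * m / total for m in sizes]            # pi_i = n * M_i / M
    if any(p > 1 for p in probs):
        raise ValueError("units with n*M_i/M > 1 must be taken with certainty first")
    interval = total / n                              # sampling interval M / n
    start = random.Random(seed).random() * interval   # random start in [0, interval)
    cum = list(accumulate(sizes))                     # cumulative enrollment
    points = [start + k * interval for k in range(n)]
    # Unit i is hit when a point falls in [cum[i-1], cum[i]); clamp guards float round-off.
    selected = [min(bisect_right(cum, p), len(sizes) - 1) for p in points]
    return selected, probs

# Hypothetical total enrollments for the PSUs in one Census division
enroll = [120, 300, 80, 500, 250, 150]
chosen, pi = pps_systematic(enroll, n=2, seed=42)
print(chosen, [round(p, 3) for p in pi])
```

Because every unit's cumulative-size range is no longer than the sampling interval, no unit can be hit twice, and larger PSUs are proportionally more likely to contain a selection point.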
The adjustment for nonresponse at each stage is made using the original base weights assigned to each unit. For example, the adjustment for nonresponse at the school level involves adjusting the weights of responding schools so that the sum of the adjusted weights equals the sum of the weights of all selected schools, including both respondents and nonrespondents. Similarly, the weights of the responding students are adjusted to account for nonresponding students. There is a final post-stratification adjustment of all student weights using a raking procedure, so that the weighted numbers of students in gender and race groups add up to the known numbers of grade 10 students in the population.
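The two adjustments described above can be sketched as follows, under simplifying assumptions: the nonresponse adjustment uses a single adjustment cell (production weighting often uses several cells), and raking is implemented as iterative proportional fitting over gender and race margins. All weights, categories, and population totals below are hypothetical.

```python
def nonresponse_adjust(weights, responded):
    """Ratio-adjust respondents' weights so they sum to the total weight of all selected units."""
    total_all = sum(weights)
    total_resp = sum(w for w, r in zip(weights, responded) if r)
    factor = total_all / total_resp
    return [w * factor if r else 0.0 for w, r in zip(weights, responded)]

def rake(weights, groups, targets, max_iter=1000, tol=1e-10):
    """Raking (iterative proportional fitting) of weights to known marginal totals.

    groups:  {dimension: [category of each unit]}
    targets: {dimension: {category: known population total}}
    """
    w = list(weights)
    for _ in range(max_iter):
        worst = 0.0
        for dim, labels in groups.items():
            current = {}
            for wi, lab in zip(w, labels):
                current[lab] = current.get(lab, 0.0) + wi
            factors = {lab: targets[dim][lab] / tot for lab, tot in current.items()}
            worst = max(worst, max(abs(f - 1) for f in factors.values()))
            w = [wi * factors[lab] for wi, lab in zip(w, labels)]
        if worst < tol:  # every margin already matched its target
            break
    return w

# Hypothetical example: five selected students, the fifth did not respond
base = [10.0, 20.0, 30.0, 40.0, 50.0]
adj = nonresponse_adjust(base, [True, True, True, True, False])  # -> [15.0, 30.0, 45.0, 60.0, 0.0]
gender = ["M", "F", "M", "F"]
race = ["A", "A", "B", "B"]
raked = rake(adj[:4], {"gender": gender, "race": race},
             {"gender": {"M": 70, "F": 80}, "race": {"A": 50, "B": 100}})
print([round(x, 2) for x in raked])
```

Raking needs only the marginal totals for each dimension, not the full gender-by-race cross-tabulation, which is why it is convenient when only population margins are known.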
Thus, for producing population-based estimates, each responding participant is assigned a sampling weight. This weight combines a base sampling weight which is the inverse of the probability of selection of the participant and an adjustment for nonresponse at the school level and the student level. The probability of selecting a participant is the product of the probability of selecting the school district, the probability of selecting the school within the district and the probability of selecting the class in which the student is present. The inverse of the overall probability gives the base weight. Various selection probabilities were recorded and used to construct the sampling weight. The base weights are adjusted for nonresponse. All participant level estimates including estimates of change are weighted estimates using the participant weight. All participant level analyses also use participant weights.
Degree of accuracy needed for the purpose described in the justification.
The NEXT sample is large enough to provide population estimates with a margin of error of plus or minus 3 percentage points at the 95% confidence level. In addition, this sample enables subgroup analyses comparing Hispanic, African-American, and Caucasian youth. The oversample of minorities results in a final basic survey sample with a minimum of 200 Hispanic and 200 African-American participants. As indicated in the power analysis for the NEXT Plus subsample (below), this sample will enable sophisticated longitudinal comparisons across racial/ethnic groups.
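For a rough check of the precision claim, the 95% margin of error for a proportion is $1.96\sqrt{\text{deff}\cdot p(1-p)/n}$, which is largest at $p = 0.5$. The sketch uses the Wave 6 count of 2,296 completes; the design effect (deff) values are hypothetical, since the design effect for NEXT is not reported here.

```python
from math import sqrt

def margin_of_error(n, p=0.5, deff=1.0, z=1.96):
    """Half-width of a 95% confidence interval for a proportion, inflated by a design effect."""
    return z * sqrt(deff * p * (1 - p) / n)

# Wave 6 completes; deff values are illustrative assumptions
for deff in (1.0, 2.0):
    print(deff, round(100 * margin_of_error(2296, deff=deff), 1), "percentage points")
```

With these inputs the margin is about 2.0 percentage points under simple random sampling and about 2.9 with a design effect of 2, so a clustered design can absorb a design effect of roughly 2 and stay within plus or minus 3 points.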
For specific hypotheses, the NEXT Plus subsample will be adequate to address primary hypotheses relating to obesity and cardiovascular disease. Power analysis and sample size estimation for specific hypotheses were conducted using the Monte Carlo simulation procedures recommended by Muthen and Muthen (Muthen & Muthen, 2001). Monte Carlo simulation is a widely recommended method for determining the sample size needed for sufficient statistical power in multivariate analysis and structural equation modeling. In a Monte Carlo simulation, random samples of a specified size are generated repeatedly from a population with known parameters consistent with the proposed model. Path coefficients are then estimated from each simulated sample. The percentage of simulated samples with significant parameters indicates the power of the study. The required sample size can then be determined by varying sample sizes across a series of simulations. The Monte Carlo study for determining power and sample sizes for the present study was conducted using Mplus version 3.0, which provides extensive simulation facilities for structural equation modeling.
The power analysis for determining sample sizes was conducted using a latent growth curve model for the relationship between participant physical activity and participant-reported peer physical activity, i.e., a linear model with seven repeated measures of physical activity as the outcome and one-year intervals between measures. Peer behavior was specified as a covariate, with two additional covariates (gender and SES). Simulations were conducted for two peer effect sizes intended to represent the corresponding peer behaviors and outcomes in the study (substance use, physical activity, diet, obesity). Cohen (1988) defined a small effect size as a standardized estimate of 0.1 and a medium effect size as 0.3. The loadings from the intercept to the seven outcome measures were set at 1, and the loadings to the slope were set from 0 to 7, with each unit representing a one-year interval between assessments. Missing values were also generated in the simulation, with 15% of values missing at random for each variable.
Muthen and Muthen (2001) recommend several criteria for determining appropriate sample sizes in power analysis for structural equation modeling: parameter bias should not exceed 10%, standard error bias should not exceed 5%, and coverage should remain between 90% and 98%. The Monte Carlo simulation for this study ran 1,000 replications at various sample sizes. The results indicated that a final sample size of N = 440 for the linear model with a small effect size had a statistical power of 96% to detect a peer effect, provided that missing values are random and below 15%. A separate simulation with a medium effect size indicated that a sample size of N = 150 would have a power greater than 90% to detect a peer effect. As a marker of clinical significance, a 0.3 to 0.5 SD between-group difference in physical activity should have a significant relation to health outcomes such as metabolic syndrome or adiposity. Thus, we would have the power to detect a clinically significant change in adiposity in analyses of the main sample and of selected subgroups. Subject retention has been higher in the NEXT Plus sample than in the NEXT sample. The larger NEXT sample provides power to examine smaller effects within multilevel models and comparisons across subgroups of interest. All criteria recommended by Muthen and Muthen (2001) were satisfied in the simulation studies.
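The Monte Carlo logic (generate data from known parameters, estimate, count significant replications) can be illustrated with a deliberately simplified model. This sketch is an assumption-laden stand-in, not the Mplus latent growth curve model: it uses a single-occasion regression of a standardized outcome on one standardized covariate, with 15% of cases deleted completely at random. Because it discards the seven repeated measures and the other covariates, it will not reproduce the power figures reported above.

```python
import random
from math import sqrt

def simulated_power(n, beta, missing=0.15, reps=1000, seed=2001):
    """Monte Carlo power: share of replications in which the effect of x on y is significant.

    Simplified stand-in model (an assumption, not the NEXT growth model):
    y = beta * x + e with standardized x and y; each case is dropped
    completely at random with probability `missing`.
    """
    rng = random.Random(seed)
    significant = 0
    for _ in range(reps):
        xs, ys = [], []
        for _ in range(n):
            x = rng.gauss(0.0, 1.0)
            y = beta * x + rng.gauss(0.0, sqrt(1.0 - beta ** 2))
            if rng.random() >= missing:   # keep the case about 85% of the time
                xs.append(x)
                ys.append(y)
        m = len(xs)
        mx, my = sum(xs) / m, sum(ys) / m
        sxx = sum((x - mx) ** 2 for x in xs)
        b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx   # OLS slope
        sse = sum((y - my - b * (x - mx)) ** 2 for x, y in zip(xs, ys))
        se = sqrt(sse / (m - 2) / sxx)
        if abs(b / se) > 1.96:   # two-sided test at alpha = .05 (normal approximation)
            significant += 1
    return significant / reps

for n in (150, 440):
    for beta in (0.1, 0.3):
        print(n, beta, simulated_power(n, beta, reps=200))
```

Varying `n` across runs and reading off the smallest value whose power exceeds the target mirrors the sample-size search described above.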
Unusual problems requiring specialized sampling procedures.
We anticipated insufficient sampling of African American students in the basic sample and therefore implemented a strategy to oversample this group. The minority oversampling strategy was based on a requirement of around 215 African American students at the end of Wave 4, out of a sample of 1,050 completes. We expected to retain around 180 African American students at the end of Wave 7. To obtain the additional minority students, we identified schools with a high percentage of African American students and selected additional samples of students to screen and identify minority students. The original plan also called for selecting additional primary sampling units to sample Hispanic students, but this proved unnecessary: because the percentage of Hispanic students was slightly higher than that of African American students, we were able to recruit the required 215 Hispanic students without oversampling.
Any use of periodic (less frequent than annual) data collection cycles to reduce burden.
Survey data is collected annually. The NEXT Plus in-home data collections occurred annually for the first four years of the study, but not at Waves 5 and 6. The final in-home assessment will be made at Wave 7.
B.3 Methods to Maximize Response Rates and Deal with Non-response
The initial response rate for the study is 75.7%. Retention rates have exceeded 75% at every wave (see Table 1). A concern in longitudinal studies is the loss of participants over time. To address this, additional outreach strategies were funded and employed between Wave 5 and Wave 6, resulting in increased participation at Wave 6. Specific outreach strategies included mass mailings, multiple contacts via text and e-mail, holiday cards, and birthday cards. Participants identified as having incorrect contact information who had not updated their information in response to the e-mail or text queries received two phone calls from NEXT staff. All outreach contacts were tracked. Social media, including Facebook (the study has its own page, which participants “like”) and LinkedIn, were also used to track and contact participants. Although the survey is completed online, the data collection contractor, CDM Group, Inc. (CDM), routinely deploys Health Researchers into the field to facilitate participation. Public meeting places with free Wi-Fi access are identified, and participants who have not yet completed the survey are invited to complete it at these locations. During the Wave 6 deployment, additional efforts were made to re-contact participants whose current contact information was missing. These efforts included visiting homes, contacting parents, and asking other participants from the same high school about their contact with participants whose information was missing. Parents and peers were asked to contact the participants on behalf of study staff and to share study staff contact information. These efforts re-engaged some participants who had missed one or more assessments; they completed the survey and updated their contact information.
| Table 1. Retention rates at each wave | | |
| Wave | Sample size | Percent (a) |
| 1 | 2,618 | n/a |
| 2 | 2,448 (b) | 86.8 (c) |
| 3 | 2,414 | 83.9 |
| 4 (d) | 2,183 | 75.9 |
| 5 | 2,202 | 76.6 |
| 6 | 2,296 | 79.9 |
(a) Denominator = all participants with any assessments (n = 2,874, Waves 3-6)
(b) Includes 254 participants not in Wave 1, including those from a school added at Wave 2
(c) Participants in Waves 1 and 2 (n = 2,194) divided by total at Wave 1 (n = 2,618)
(d) Transition year out of high school
Nonresponse Bias Analysis in NEXT
Bias in a survey estimate due to nonresponse depends on two components. The first is the nonresponse rate, and the second is the difference between respondents and nonrespondents in the population parameter being estimated. For example, if we estimate a population percentage by selecting a simple random sample and computing the sample percentage, and there is nonresponse, the bias in the sample percentage due to nonresponse is given by

$$B(p_r) = (1 - r)\,(P_r - P_m)$$

where $p_r$ is the sample percentage based on respondents, $r$ is the response rate, $P_r$ is the population percentage among the respondents, and $P_m$ is the population percentage among the nonrespondents. Therefore, it is important to examine both the response rate and the differences between the responding and nonresponding groups when analyzing bias in the estimates due to nonresponse. We describe below the steps that we followed to analyze bias due to nonresponse by some schools in the NEXT sample. These steps are in accordance with the statistical standards set up by the National Center for Education Statistics (NCES) for nonresponse bias analysis (http://nces.ed.gov/StatProg/2002/std4_4.asp).
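The bias formula can be evaluated directly. The inputs below are hypothetical and chosen only to show the two components at work: even a 10-point difference between respondents and nonrespondents produces a small bias when the response rate is high.

```python
def nonresponse_bias(response_rate, p_resp, p_nonresp):
    """Bias of the respondent-based percentage: B(p_r) = (1 - r) * (P_r - P_m)."""
    return (1 - response_rate) * (p_resp - p_nonresp)

# Hypothetical: 76% response; trait prevalence 20% among respondents vs. 30% among nonrespondents
print(round(nonresponse_bias(0.76, 20.0, 30.0), 2))  # -2.4
```

A negative value means the respondent-based percentage understates the population percentage; the bias vanishes when the response rate is 100% or when respondents and nonrespondents do not differ.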
1. Examination of Response Rates
We examined both the overall response rate and the response rates for various subgroups, per guideline 4-4-2A of the NCES Statistical Standards. We examined school response rates by: (1) Census division; (2) rural vs. urban location; (3) enrollment (large vs. small schools); (4) proportion of minority students; (5) poverty index for schools; and (6) school type (public, Catholic, and private). As indicated above, the only significant difference between participating schools and schools that declined was in the proportion of Asian American students (6% in non-participating schools; 3% in participating schools; p < .05). We have made appropriate weighting adjustments to reduce this bias.
We also examined the proportion of missing data among participants. The overall missing rates for Waves 1 and 2 were reported in the previous application and were 9.7% and 8.4%, respectively. For Waves 3 through 5, the overall mean percent missing was calculated for numeric variables only and adjusted for skip patterns, which resulted in a much lower overall proportion missing. Character variables were not included because in the survey they are predominantly “Other, specify” items that are answered only when the offered options do not apply; these items therefore have a high rate of missing values that do not represent participant nonresponse. Percent missing for questions that were part of a skip pattern (that is, questions answered only by participants who answered another question a particular way) was computed using the number of respondents expected to answer the question. Skip patterns are programmed into the online survey, so these questions are presented only to those who are eligible to respond; other participants are missing on these items by default. Wave 6 data is not yet available for these calculations. Table 2 shows the mean percent missing for each wave by gender and race/ethnicity. Males had a significantly higher percentage of missing data than females at Waves 4 and 5. Race/ethnicity differences were found at all three waves: African Americans had the highest percentage of missing data at Wave 3, and Hispanics had the highest percentage at Waves 4 and 5.
| Table 2. Mean percent missing on survey items by gender and race/ethnicity | | | |
| | Wave 3 | Wave 4 | Wave 5 |
| Total | 2.8 | 3.8 | 5.0 |
| Gender | | | |
| Male | 3.0 | 4.2 | 5.5 |
| Female | 2.6 | 3.3 | 4.7 |
| Race/Ethnicity | | | |
| White (referent) | 2.3 | 3.9 | 5.0 |
| African American | 3.7 | 3.7 | 5.4 |
| Hispanic | 2.7 | 4.2 | 5.7 |
| Other | 2.5 | 3.3 | 4.0 |
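The skip-pattern adjustment used for the Wave 3-5 missing-data rates can be sketched as follows: each numeric item's percent missing uses only the participants routed to that item as its denominator, and the item percentages are then averaged. The `(n_eligible, n_answered)` input layout and the counts are hypothetical.

```python
def mean_percent_missing(items):
    """Mean percent missing across numeric items, using each item's eligible respondents
    (those routed to it by the skip logic) as the denominator.

    items: list of (n_eligible, n_answered) pairs, one per numeric item.
    """
    pcts = [100.0 * (elig - ans) / elig for elig, ans in items if elig > 0]
    return sum(pcts) / len(pcts)

# Hypothetical counts: two items asked of everyone, one item behind a skip pattern
example = [(2000, 1950), (800, 780), (2000, 2000)]
print(round(mean_percent_missing(example), 2))
```

Had the skip-pattern item used all 2,000 participants as its denominator, its 1,220 structurally skipped cases would have been counted as missing, which is the inflation this adjustment avoids.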
2. Comparison of Sample and Frame Estimates
Per NCES guideline 4-4-2C, we used sampling weights based on the probability of selection of responding schools, without any nonresponse adjustment, together with data from the responding schools to compute population estimates of some characteristics available on the sampling frame (and not used for stratification when the schools were selected). These estimates were compared with the population values. Large differences, after taking sampling error into account, would have indicated bias due to nonresponse. We also generated estimates of the number of students in responding schools by race/ethnicity and compared them with the totals computed from the population of schools on the frame to determine whether there was any bias in the estimates. No such bias was found.
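The sample-versus-frame comparison rests on a Horvitz-Thompson estimate: each responding school's value is divided by its selection probability, with no nonresponse adjustment, and the resulting total is compared with the frame total. The enrollment counts, probabilities, and frame total below are hypothetical. Judging whether a difference is "large" also requires a design-based standard error, which this sketch omits.

```python
def ht_total(values, probs):
    """Horvitz-Thompson total: sum of y_i / pi_i over responding schools (no nonresponse adjustment)."""
    return sum(y / p for y, p in zip(values, probs))

def relative_difference(estimate, frame_value):
    """Signed relative difference of a sample-based estimate from the frame value."""
    return (estimate - frame_value) / frame_value

# Hypothetical: grade 10 minority enrollment in three responding schools and their selection probabilities
est = ht_total([120, 45, 300], [0.25, 0.125, 0.5])   # 480 + 360 + 600 = 1440
print(est, round(relative_difference(est, 1500), 3))
```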
3. Comparison of estimates based on respondents to estimates from external sources
Per NCES guideline 4-4-2C, we compared estimates of the prevalence of selected health behaviors with estimates from identical items on the 2009-2010 Health Behavior in School-Aged Children (HBSC) survey of 10th-grade students to determine whether there were large differences between the survey estimates. A large difference that cannot be attributed to sampling error might indicate bias in the estimates. Although comparisons were made only when the survey items were identical in both surveys, this approach is limited because differences may not be due solely to sample bias.
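A simple way to ask whether a NEXT-versus-HBSC difference exceeds sampling error is a two-proportion z test. This sketch treats both samples as simple random samples, which is an assumption: both surveys use clustered designs, so the standard error here is understated and a design-based test would be more conservative. The counts are hypothetical.

```python
from math import erfc, sqrt

def two_proportion_z(x1, n1, x2, n2):
    """Two-sided z test for a difference between two independent proportions (pooled SE)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = erfc(abs(z) / sqrt(2))   # equals 2 * (1 - Phi(|z|))
    return z, p_value

# Hypothetical counts of students reporting a behavior in each survey
z, p = two_proportion_z(230, 1000, 270, 1000)
print(round(z, 2), round(p, 3))
```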
The primary outcomes of interest in NEXT are behaviors related to obesity; these include physical activity, sedentary behavior, and diet. Responses to the Wave 1 NEXT survey for physical activity, sedentary behavior and diet did not differ significantly from responses to identical items on the HBSC survey. However, comparisons of substance use behaviors (there are no equivalent national surveys of dating violence or young drivers available for comparisons) indicated that the NEXT cohort reported a lower prevalence of smoking and alcohol use as well as lower reported use of Baltok, a fictitious ‘drug’ used to test dissembling. The fact that samples did not differ on physical activity, sedentary behavior or diet would suggest that there is little bias in the NEXT sample. Explanations for differences in reported substance use include: 1) the NEXT sample is indeed different from some national samples; and 2) the HBSC survey is anonymous while the NEXT survey is confidential but not anonymous – youth may have been more willing to report substance use in the HBSC survey, including fictional drug use.
The failure to find differences in key obesogenic behaviors, together with the likelihood that the lower reported substance use was related to the lack of anonymity, suggests that there is little or no bias in the NEXT sample. Furthermore, because subsequent NEXT surveys are completed at the same time of year and there is no evidence that concerns about anonymity will differentially affect subsequent responses, the cohort should be more than adequate for addressing the primary questions about the development of obesogenic behaviors, dating violence, substance use, and driving.
4. Comparisons of Respondents by Successive Levels of Recruitment Effort
Per NCES guideline 4-4-2D, we compared schools that agreed to participate in the survey after the first contacts with those that agreed after several attempts and those that initially refused but later agreed. Estimates of student-level characteristics were computed for each successive wave of participating schools (i.e., adding respondents in order of the level of effort used to recruit the school), using sampling weights based on probabilities of selection. If the estimates from the initial sample and the successively larger samples showed an increasing or decreasing trend, this would indicate bias due to nonresponse, because schools that never agreed may resemble the hardest-to-recruit respondents. For example, if the percentage of students who are obese increased significantly as the number of responding schools increased, this might indicate that we are underestimating the percentage of students who are obese. These analyses revealed no significant differences in the primary outcomes (e.g., body mass index, substance use) among students in schools that agreed to participate after 1 or 2 contacts, 3 to 5 contacts, or more than 5 contacts.
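The level-of-effort analysis can be sketched as a running weighted percentage: schools are added group by group in increasing order of recruitment effort, and the estimate is recomputed after each group so that any upward or downward trend becomes visible. The per-school layout (sums of student base weights with and without the trait) and all numbers are hypothetical.

```python
def cumulative_estimates(schools):
    """Weighted percentage after adding schools in increasing order of recruitment effort.

    schools: (effort_group, weighted_n_with_trait, weighted_n_total) per school, where the
    weighted counts are sums of student base weights (hypothetical layout).
    """
    results, num, den = [], 0.0, 0.0
    for group in sorted({g for g, _, _ in schools}):
        for g, with_trait, total in schools:
            if g == group:
                num += with_trait
                den += total
        results.append((group, 100.0 * num / den))
    return results

# Hypothetical groups: 1 = 1-2 contacts, 2 = 3-5 contacts, 3 = more than 5 contacts
data = [(1, 30.0, 200.0), (1, 25.0, 150.0), (2, 40.0, 250.0), (3, 10.0, 100.0)]
for group, pct in cumulative_estimates(data):
    print(group, round(pct, 1))
```

In this made-up example the estimates stay near 15% across groups, the flat pattern that would suggest little nonresponse bias.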
B.4 Test of Procedures or Methods to be Undertaken
This is the seventh wave of a longitudinal study. All data will be collected using previously developed procedures that have a demonstrated ability to yield high-quality data.
B.5 Individuals Consulted on Statistical Aspects and Individuals Collecting and/or Analyzing Data
DIPHR Statisticians: Statistical consultation
Danping Liu, PhD, Investigator
Biostatistics and Biobehavior Branch
Division of Intramural Population Health Research
301-443-7041
Paul Albert, PhD, Chief and Investigator
Biostatistics and Biobehavior Branch
Division of Intramural Population Health Research
301-496-5582
Contractor: Data collection and management
Mary Ann D’Elio
CDM Group, Inc.
240-223-3074
Subcontractor to the CDM Group: Sample design and weighting
Kadaba P. Srinath, PhD
Abt Associates, Inc.
301-347-5000
Last Minute Checklist for SSB
If attachments are referenced in the text of Supporting Statement A (SSA), please include them in SSB as well.
Attachments are referenced as Attach1.file.name, Attach2.filename, etc. Filenames should be consistent in SSA and SSB.
Attachments are in individual files, either in Word or .pdf format to be included with the package.
All information collection instruments (surveys, forms, questionnaires, telephone scripts, etc.) display the OMB number and expiration date for respondents in the upper right-hand corner of the first page (OMB #: 0925-xxxx, Expiration Date: xx/xxxx).
The Burden Statement is displayed on the first page of the data collection instrument, on the instructions and/or script, or on a cover sheet if space is limited.
| File Type | application/msword | 
| File Title | Supporting Statement 'B' Preparation - | 
| Subject | Supporting Statement 'B' Preparation - 03/21/2011 | 
| Author | OD/USER | 
| Last Modified By | Haynie, Denise (NIH/NICHD) [E] | 
| File Modified | 2016-02-11 | 
| File Created | 2016-02-01 |