SUPPORTING STATEMENT
Revised 5/6/2014
U.S. Department of Agriculture
Economic Research Service
Rural Establishment Innovation Survey (REIS)
OMB Control No. 0536-XXXX
Part B. Collection of Information Employing Statistical Methods
Universe and Respondent Selection
For the Rural Establishment Innovation Survey (REIS), the sample will be selected from the business establishment list maintained by the Bureau of Labor Statistics (BLS) as part of its Quarterly Census of Employment and Wages (QCEW) program for those state employment security departments granting approval, and from a proprietary business list frame (Dun and Bradstreet) for states not granting approval. Forty-six states and the District of Columbia have agreed to participate; 5 states have declined.
The sample will exclude business establishments with fewer than 5 employees, establishments that are not privately owned and establishments not included in ‘tradable industries’ defined as mining, manufacturing, wholesale trade, transportation and warehousing, information, finance and insurance, professional/scientific/technical services, arts, and management of businesses.
Sampling stratification will be based on North American Industry Classification System (NAICS) code, metropolitan/nonmetropolitan location, employment size class and whether or not the state has agreed to release its QCEW list frame through BLS for production. Establishments from the same strata in participating and nonparticipating states will have identical target sampling rates. The strata table below provides cell sizes for the study population and drawn sample for the combined BLS Quarterly Census of Employment and Wages and proprietary frames.
Establishment populations by strata are provided in the table below. The full study sample will have an initial sample size of 60,000; roughly 4,000 from a proprietary sample frame will receive a telephone screening survey, and the roughly 56,000 from the BLS sample will not be pre-screened because the pilot study identified a very low share of ineligible establishments. This is the number of businesses that could be contacted and re-contacted multiple times and in multiple ways in a mixed mode survey protocol while staying within the survey budget. The target sampling rates were initially computed by compiling population establishment totals across the 9 target industries for nonmetropolitan counties and metropolitan counties:
Nonmetropolitan Sample Rate = (0.66667 x 60,000) / Nonmetropolitan Establishment Total
Metropolitan Sample Rate = (0.33333 x 60,000) / Metropolitan Establishment Total
Examination of the establishment population data made it clear that the sample sizes for Management of Businesses (Headquarters) and Performing Arts Companies, Museums, Historical Sites, and Similar Institutions (Arts & Museums) would be insufficient to provide reliable statistics. In addition, the Finance and Insurance (Finance) establishment population is very large, particularly with respect to potentially tradable services in rural areas. Oversampling of Headquarters and Arts & Museums by a factor of 3.3 ensures reliable statistics and is offset by an undersampling of Finance establishments by a factor of 0.33.
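The allocation arithmetic described above can be sketched as follows. This is a minimal illustration, assuming the industry adjustment factors are applied multiplicatively to the base rate and that a rate cannot exceed 1.0 (a census of the stratum). The function names and the establishment total are hypothetical and are not figures from the study.

```python
def base_sampling_rate(share, total_sample, establishment_total):
    """Share of the total sample allocated to an area, divided by that area's establishment count."""
    if establishment_total <= 0:
        raise ValueError("establishment_total must be positive")
    return share * total_sample / establishment_total


def adjusted_rate(base_rate, industry_factor=1.0):
    """Apply an over/undersampling factor, capping the rate at 1.0."""
    return min(1.0, base_rate * industry_factor)


# Hypothetical establishment total, for illustration only.
nonmetro_total = 120_000
base = base_sampling_rate(0.66667, 60_000, nonmetro_total)
headquarters_rate = adjusted_rate(base, 3.3)   # oversampled industries
finance_rate = adjusted_rate(base, 0.33)       # undersampled industry
```

With these illustrative inputs the oversampled rate reaches the cap, which mirrors how a small stratum can end up being sampled in full.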
Table 1. Population Universe by Strata for Rural Establishment Innovation Survey
| Stratum: Industry | Stratum: Geography | Stratum: Estab. Size | Estab. Population1 | Sampling Rate | Sample | 
| Mining | Nonmetro | 5-19 | 4200 | 0.2845 | 1195 | 
| Mining | Nonmetro | 20-99 | 2508 | 0.2887 | 724 | 
| Mining | Nonmetro | 100 + | 588 | 0.5884 | 346 | 
| Mining | Metro | 5-19 | 5096 | 0.0232 | 118 | 
| Mining | Metro | 20-99 | 2979 | 0.0235 | 70 | 
| Mining | Metro | 100 + | 841 | 0.0488 | 41 | 
| Manufacturing | Nonmetro | 5-19 | 15573 | 0.3178 | 4949 | 
| Manufacturing | Nonmetro | 20-99 | 10625 | 0.3163 | 3361 | 
| Manufacturing | Nonmetro | 100 + | 4953 | 0.6239 | 3090 | 
| Manufacturing | Metro | 5-19 | 75618 | 0.0245 | 1851 | 
| Manufacturing | Metro | 20-99 | 52144 | 0.0260 | 1358 | 
| Manufacturing | Metro | 100 + | 17778 | 0.0514 | 913 | 
| Wholesale Trade | Nonmetro | 5-19 | 18629 | 0.2891 | 5386 | 
| Wholesale Trade | Nonmetro | 20-99 | 5723 | 0.2939 | 1682 | 
| Wholesale Trade | Nonmetro | 100 + | 389 | 0.5707 | 222 | 
| Wholesale Trade | Metro | 5-19 | 122693 | 0.0227 | 2781 | 
| Wholesale Trade | Metro | 20-99 | 45369 | 0.0230 | 1043 | 
| Wholesale Trade | Metro | 100 + | 6429 | 0.0464 | 298 | 
| Transportation | Nonmetro | 5-19 | 10366 | 0.2933 | 3040 | 
| Transportation | Nonmetro | 20-99 | 3895 | 0.2924 | 1139 | 
| Transportation | Nonmetro | 100 + | 448 | 0.5915 | 265 | 
| Transportation | Metro | 5-19 | 37847 | 0.0230 | 869 | 
| Transportation | Metro | 20-99 | 20003 | 0.0230 | 461 | 
| Transportation | Metro | 100 + | 4632 | 0.0466 | 216 | 
| Information | Nonmetro | 5-19 | 6964 | 0.2885 | 2009 | 
| Information | Nonmetro | 20-99 | 2134 | 0.2854 | 609 | 
| Information | Nonmetro | 100 + | 144 | 0.5417 | 78 | 
| Information | Metro | 5-19 | 29635 | 0.0222 | 657 | 
| Information | Metro | 20-99 | 17247 | 0.0223 | 384 | 
| Information | Metro | 100 + | 4722 | 0.0449 | 212 | 
| Finance | Nonmetro | 5-19 | 20395 | 0.0916 | 1868 | 
| Finance | Nonmetro | 20-99 | 3334 | 0.0918 | 306 | 
| Finance | Nonmetro | 100 + | 212 | 0.1792 | 38 | 
| Finance | Metro | 5-19 | 121239 | 0.0073 | 880 | 
| Finance | Metro | 20-99 | 27559 | 0.0072 | 199 | 
| Finance | Metro | 100 + | 6437 | 0.0146 | 94 | 
| Prof/Sci/Tech Serv. | Nonmetro | 5-19 | 16742 | 0.2839 | 4753 | 
| Prof/Sci/Tech Serv. | Nonmetro | 20-99 | 2373 | 0.2840 | 674 | 
| Prof/Sci/Tech Serv. | Nonmetro | 100 + | 214 | 0.5748 | 123 | 
| Prof/Sci/Tech Serv. | Metro | 5-19 | 181087 | 0.0225 | 4068 | 
| Prof/Sci/Tech Serv. | Metro | 20-99 | 56302 | 0.0227 | 1279 | 
| Prof/Sci/Tech Serv. | Metro | 100 + | 9838 | 0.0453 | 446 | 
| Headquarters | Nonmetro | 5-19 | 1332 | 0.9437 | 1257 | 
| Headquarters | Nonmetro | 20-99 | 728 | 0.9451 | 688 | 
| Headquarters | Nonmetro | 100 + | 149 | 1.0000 | 149 | 
| Headquarters | Metro | 5-19 | 10530 | 0.0756 | 796 | 
| Headquarters | Metro | 20-99 | 7637 | 0.0757 | 578 | 
| Headquarters | Metro | 100 + | 3349 | 0.1514 | 507 | 
| Arts & Museums | Nonmetro | 5-19 | 921 | 0.9197 | 847 | 
| Arts & Museums | Nonmetro | 20-99 | 444 | 0.9302 | 413 | 
| Arts & Museums | Nonmetro | 100 + | 47 | 1.0000 | 47 | 
| Arts & Museums | Metro | 5-19 | 4085 | 0.0729 | 298 | 
| Arts & Museums | Metro | 20-99 | 2045 | 0.0738 | 151 | 
| Arts & Museums | Metro | 100 + | 608 | 0.1612 | 98 | 
| Totals | | | 1007779 | 0.0595 | 59924 | 
The relatively small cell sizes for some of the “Large” establishment strata raise the possibility of oversampling the Large establishment strata. However, the main interest in including all establishment size classes is to ensure the ability to make inferences on the tradable sector nationally. The main focus of the study is on innovation in small and medium sized establishments. Data collection during the pilot study revealed that large establishments were responding at half the rate of small and medium-sized establishments. Thus, for the main study, the large establishment strata are oversampled by a factor of 2.
The sample for the pilot study comprised roughly 2,600 respondents from the previous 1996 ERS Rural Manufacturing Survey and 2,874 respondents drawn from the BLS sample frame.
For the states that do not approve the release of their sample through BLS, the Dun and Bradstreet (DB) sample is expected to be less current and of lower quality than the BLS sample. For the pre-screening effort, it is anticipated that the DB sample will have been updated less frequently and with less authority than the BLS-provided sample. The screening survey is very short, and since anyone answering the phone can provide this information, there will be fewer barriers to responding with contact information (Attachment J). The screening survey design and full study questions for the REIS are very similar to those of the 1996 Rural Manufacturing Survey, which was administered and validated by SESRC.
Procedures for Collecting Information
For participating establishments, REIS will be a one-time survey collection and will occur mainly in 2014. This is a voluntary, government-sponsored survey and will be conducted by an academic survey organization based at a Land Grant University. Establishments drawn from the proprietary sampling frame will be contacted through an initial telephone screening effort to determine if businesses are eligible (currently “in-business” and having 5 or more employees) for the study. During this contact, information will be obtained to identify a knowledgeable and appropriate respondent for the business and to collect all of this individual’s contact information (Attachment J). The results from the pilot survey demonstrated that the number of ineligibles in the BLS sample was very small and that identifying a specific contact within the establishment did not significantly improve response rates. For these reasons, prescreening will not be done for the BLS sample.
A letter of introduction signed by ERS Administrator Mary Bohman will be sent to the BLS sample and to eligible establishments from the proprietary sample that complete the telephone prescreening (Attachment D). The purpose of this advance letter is to notify businesses about the study and why we need their participation. The second page of this letter contains a brief list of frequently asked questions regarding confidentiality, how the respondent was identified, and estimated burden for completing the survey. In addition, the mailing includes an advance letter from Danna Moore, the study director at SESRC, that provides a web link to the survey and the justification for the token incentive as a gesture of reciprocity.
For the REIS, respondents will be asked to complete questionnaires in at least one of three possible survey modes (telephone, web, or mail, Attachments A, B and C). All survey instruments across modes will be carefully aligned to provide the same information and explanations of the survey. The web version of the survey is to be located on the SESRC WSU website with a specific URL. Each question screen will carry a banner with the survey title “National Survey of Business Competitiveness” and USDA ERS sponsor. The telephone survey introduction will be used by interviewers to explain the purpose and the sponsorship of the study. The mail surveys will use a cover letter to provide this information. All modes of contacting respondents will provide information on how to contact SESRC or ERS if they have questions or need clarifications about the study.
The survey methodology literature over the last decade has addressed the use of incentives as a means to improve response rates in household and person based surveys. However, there remain gaps in this literature with respect to detailed description of the establishment survey response process, the effectiveness of survey mode sequencing and how incentives interact within these processes to affect establishment survey respondents. The aspects of survey implementation shown to increase response rates in business surveys are, in order of importance: 1) a “Response Required By Law” message; 2) multiple contacts; and 3) cash incentives. A pilot study will use an experimental design to test various interventions that can be used to improve survey response. The experimental testing framework used in this study (see Table 2 and Table 3) is important because it will offer insights into how non-mandatory (voluntary) survey response is affected by process components and strategies. There are five objectives to be tested: 1) alternative survey mode sequencing (telephone sequence first versus mail sequence first); 2) the effectiveness of each mode; 3) the combination of postal class and packaging (first class postage versus two day priority mail class, and cardboard mailer versus brown paper envelope); 4) early stage, later stage, and repetitive application of a small token $2 cash incentive with the mail questionnaire; and 5) the timing of offering the web mode as an alternative response option for survey completion. Depending on the experimental group assignment and intervention, the business respondent will be contacted by telephone and/or by mail and will be offered one of three ways (telephone, mail, or web) to complete the survey.
These results collected within a voluntary survey environment reflect a more generalizable survey structure than those realized under mandatory government collections. We hope to capitalize on respondents’ awareness of web surveys and the offering of a choice as a means to accommodate completing the survey in a mode of their preference to determine if this is an important element of survey strategy. There is research in the household respondent survey literature that suggests offering more than one survey mode at a time can decrease survey response rather than enhance response (Millar and Dillman, 2011). This is an aspect that has not been tested in the establishment survey arena. This study specifically incorporates the idea of offering a web link at specific junctures in the contact process and then following this with email augmentation to those respondents with an email address that was gained during the telephone prescreening contact of the business.
The mode sequence selected for the full study will be contingent on findings from the pilot and the factors surrounding this decision will be fully elaborated in the pilot study assessment report submitted to OMB. The three general outcomes anticipated are statistically significant higher response rate of one mode sequence over all others, statistically significant higher response rates for two or more mode sequences over remaining mode sequences without identification of a clear dominant mode sequence, or failure to discern statistically significant differences in response rates across all mode sequences. The mode sequence selected in the first and last case would be the one with highest response rate or lowest cost, respectively. In the middle case mode sequences with statistically significant lower response rates would be abandoned and the survey would be administered by allocating an equal share of potential respondents to the remaining mode sequences. The likelihood that the pilot study will identify substantive differences between mode sequences if they exist is high: the power of detecting a difference in response rates of 0.05 between two mode sequences with a sample size of 1600 is 0.953 at the 0.01 level of significance.
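A power figure of this kind can be approximated with a two-sided z-test for the difference between two proportions. The sketch below is illustrative only: it assumes a normal approximation, treats the sample size as the number of cases per mode sequence, and uses hypothetical baseline response rates, so it is not expected to reproduce the reported value.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p1, p2, n_per_group, alpha=0.01):
    """Approximate power of a two-sided z-test comparing two response rates."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    p_bar = (p1 + p2) / 2
    # Standard error under the null (pooled) and under the alternative (unpooled).
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n_per_group)
    se_alt = sqrt(p1 * (1 - p1) / n_per_group + p2 * (1 - p2) / n_per_group)
    diff = abs(p1 - p2)
    upper = 1 - z.cdf((z_crit * se_null - diff) / se_alt)
    lower = z.cdf((-z_crit * se_null - diff) / se_alt)
    return upper + lower


# Hypothetical response rates of 20% and 25% (a 0.05 difference).
power = power_two_proportions(0.20, 0.25, 1600)
```

The result depends on the baseline response rate as well as the difference, which is why the baseline has to be stated when a power figure is reported.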
For the web survey version, the website for the survey will be secure and respondents can only access the website by entering their specific project assigned identification code. It is anticipated most respondents will be able to complete the questionnaire in one session. However, business respondents will be allowed multiple reentries to the survey website if needed to complete the questionnaire in multiple sessions.
Upon receipt of completed questionnaires, SESRC will download, enter, compile, and aggregate survey responses from each survey (mode version and interventions) and analyze all survey responses. All respondents will be asked the same survey questions about their business environment, activities and revenues, thus providing uniform data across survey modes.
All contact materials and survey questionnaires have benefited from expert consultation (internal and external) and peer review by stakeholder groups. Cognitive interviews to test the survey questionnaire were conducted in September 2013 (Attachment F). The letters and reminders were developed in collaboration with internal and external survey methodologists.
DATA EDITING PROCEDURES
Telephone screening and telephone interviewing
Survey data for all REIS samples (landline, listed, and cell) will be collected using the same computer-assisted telephone interview (CATI) system for both the screening telephone survey phase and the extended full interviews collected over the telephone. While the screening interview may vary somewhat by sample, the same editing procedures will be followed for all REIS cases. In a CATI environment, the data collection and interview process is controlled using a series of computer programs to ensure consistency and quality. At SESRC WSU, the commercial CATI software used is Voxco, which has been in use for more than 15 years. SESRC has more than 25 years of experience with CATI software. For the telephone survey administration, the CATI system programming determines which questions are asked based on business characteristics, composition, respondent characteristics, or preceding answers, and the order in which the questions are presented to interviewers. The system also presents the response options that are available for recording answers. CATI range and logic edits do much to help ensure the integrity of the data during the collection process by telephone. This editing at the time of the interview greatly reduces the need for post-interview editing and allows most questionable entries to be reviewed in real time with the respondent as part of the collection process. Although the CATI system virtually eliminates out-of-range responses and many other anomalies, some consistency and edit issues may arise. For example, interviewers may note concerns or problems that must be handled by data analysts or preparation staff after the interview is complete. Updating activities require that both manual and machine editing procedures be developed to correct interviewer, respondent, and CATI program errors and to check that updates made by data management staff were input correctly.
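The range and logic edits described above can be illustrated with a small sketch. It is not the Voxco implementation; the question definitions, field names and limits are hypothetical and only show the general idea of checking routing first and value ranges second.

```python
def check_answer(question, answer, prior_answers):
    """Return a list of edit flags for one answer; an empty list means it passes."""
    flags = []
    # Logic edit: the question applies only when its routing condition holds.
    ask_if = question.get("ask_if")
    applies = True if ask_if is None else ask_if(prior_answers)
    if not applies:
        if answer is not None:
            flags.append("answered but should have been skipped")
        return flags
    if answer is None:
        return flags  # unanswered items are left for later review
    # Range edit: numeric answers must fall inside the allowed interval.
    low, high = question["valid_range"]
    if not low <= answer <= high:
        flags.append(f"value {answer} outside range {low}-{high}")
    return flags


# Hypothetical questions, for illustration only.
employees_q = {"valid_range": (5, 100_000)}
export_share_q = {
    "valid_range": (0, 100),
    "ask_if": lambda prior: prior.get("exports") == "yes",
}

flags_range = check_answer(employees_q, 3, {})
flags_skip = check_answer(export_share_q, 40, {"exports": "no"})
flags_ok = check_answer(export_share_q, 40, {"exports": "yes"})
```

Running such checks at entry time is what allows a questionable value to be resolved with the respondent during the interview rather than afterwards.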
Because data editing may result in changes to the survey data, specific quality control procedures will be implemented. REIS survey data will be carefully examined and edited before delivering final data files to ERS USDA.
Additional data quality assurance occurs through supervision of interviewer performance. Survey monitors and survey supervisors listen to live interviews and visually check the on-screen coding of answers while the interviews are being conducted. Any problems in question delivery, interview performance, or data entry will be noted and the interviewer will be notified of performance problems. SESRC has a performance management scoring system for interviewers. This process includes meeting with each interviewer to discuss performance, review outcomes, and plan for improvement. Interviewers are routinely monitored with a goal of once a week during calling, within the first few days of calling on any given project, and as needed to meet contractual agreements. Routinely, as part of contractual agreements, SESRC monitors between 5% and 10% of all interviews for quality. If needed, an interviewer will be retrained and systematically monitored for improvement. If an interviewer is not capable of meeting performance objectives, they are terminated from calling. If an error in the data recorded by an interviewer is detected, a data correction will be made to the case. If the errors detected are severe, then all cases completed by that interviewer will be reviewed for completeness and accuracy. If any cases are suspect, then those cases will be recalled and/or particular answers verified with the business.
One critical step during the data collection process for telephone interviews includes a process whereby at the completion of an interview, each interviewer answers a set of questions about the interview. If the interviewer detects concerns with quality such as compromised respondent ability, extreme distractions, or other issues these are noted at this time. Survey supervisors routinely review these results to detect poor or suspect interviews. Quality control procedures associated with data corrections may also involve limiting the number of staff who make updates, using the CATI specifications to resolve issues in complex questionnaire sections, carefully checking updates, and performing computer runs to identify inconsistencies or illogical patterns in the data associated with the current questionnaire.
The data editing procedures for REIS will consist of five main tasks: (1) managing and resolving problem cases (error checking), (2) reading and using interviewer comments to make data updates, (3) coding questions with open ended text strings (i.e., “other, specify” responses), (4) verifying data editing updates, and (5) survey supervisor review of interviewer response outcomes on interviews. The final step will be to convert the edited data from the CATI system to the SAS data delivery files.
Mail returns, review, hand coding, and hand data entry
For completed mail questionnaires, the data entry process consists of three main stages: 1) initial data entry by one clerical staff member, 2) verification (second pass data entry) performed by a different clerical staff member, and 3) final validation, which accounts for all questionnaires by ID number, ensures all observations have been entered and verified, and corrects any errors that may have occurred during this process. The data entry program consists of a computerized online system that prompts clerical personnel for valid responses to every question in the survey. The data entry program has the same operational features as the CATI questionnaire software for range checks and question branching/skipping logic.
Prior to the initial data entry, data editing and data cleaning will occur once a large number of returned completed questionnaires are received and a coding manual has been developed. During this initial phase several hundred paper questionnaires and question answers will be reviewed for: 1) respondents’ adherence to question branching and skip instruction patterns; 2) marks and comments written in the margins or on questions; 3) completeness and open-ended numeric answers with anomalies; 4) straight lining on question banks; 5) selective checking in question banks; and any other types of errors indicating the need for data cleaning and data edits. Once a large number of paper questionnaires have been reviewed, a coding manual will be drafted and reviewed with principal investigators and researchers at ERS prior to hand coding and data entry. Cleaning decisions will be documented in the coding manual, and instructions for specific questions and problems will be developed for coders. Coding will be performed by a limited number of coders to ensure accuracy and consistency of coding. A data manager/analyst will review coding. Once questionnaires are coded, data entry will be performed by data entry staff.
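The second-pass verification stage amounts to comparing two independently keyed copies of each questionnaire. The sketch below is a hypothetical illustration of that comparison; the record layout, ID numbers and field names are assumptions, not the data entry program used for the study.

```python
def compare_entries(first_pass, second_pass):
    """Return {questionnaire_id: [fields that differ]} for double-keyed records."""
    discrepancies = {}
    for qid in sorted(set(first_pass) | set(second_pass)):
        a = first_pass.get(qid)
        b = second_pass.get(qid)
        if a is None or b is None:
            # Final validation also needs every ID accounted for in both passes.
            discrepancies[qid] = ["missing from one pass"]
            continue
        fields = sorted(f for f in set(a) | set(b) if a.get(f) != b.get(f))
        if fields:
            discrepancies[qid] = fields
    return discrepancies


# Hypothetical keyed records, for illustration only.
first = {101: {"q1": 2, "q2": 15}, 102: {"q1": 1, "q2": 40}}
second = {101: {"q1": 2, "q2": 51}, 102: {"q1": 1, "q2": 40}, 103: {"q1": 3, "q2": 8}}
discrepancies = compare_entries(first, second)
```

Each flagged field would then be resolved against the paper questionnaire.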
Web surveys
In the web survey environment, all questions allow voluntary responses and there is no insistence built into the web questionnaire program that requires an answer to maintain progression through the survey by the respondent. This also meets the best practices for human subjects research. Allowing the respondent to “not answer a question” also prevents abandonment of the interview, as it reduces the respondent’s frustration if they are unwilling to answer a given question. In order to reduce instances of questions being skipped over without answering, special screen prompts will be programmed and shown that will prompt for an answer. The goal of this functionality is to persuade the respondent to answer the question by describing the importance of the response or the purpose of the question. The types of questions that most often experience item non-response are open ended numeric questions. These questions will be carefully reviewed and pretested to determine if they require specific instructions for inclusions or exclusions. If it is found during the early stages of the study that respondents are skipping particular questions, these particular questions will be reviewed for sensitivity, wording, and/or comprehension issues. If needed, the question will be changed or information added such as an instruction, definition, or a screen prompt.
Estimation procedures
The analytical approach for addressing the study’s central research questions is discussed below:
1. What percentage of rural establishments in tradable industries introduced product, process or practice innovations in the previous 3 years?
2. What percentage of self-reported innovative establishments also demonstrate behaviors consistent with substantive innovation?
3. How do self-reported and ostensibly substantive innovation rates differ by urban/rural location, industry and establishment age?
4. What establishment and community characteristics are associated with self-reported and ostensibly substantive innovation?
5. Do ostensibly substantive innovators demonstrate faster rates of employment growth or higher survival rates than claimed innovators and non-innovators?
Questions 1-3 will be addressed using descriptive analysis. Questions 4-5 will be addressed using multivariate regression techniques. In addition, questions 2-5 will require a method for classifying innovative establishments as either claimed innovators or substantive innovators.
To address the first question, the estimated percentage of rural establishments reporting product, process or practice innovations will incorporate information from the complex sample design for the entire sample to produce valid estimates of the mean and variance, and will use pseudo-maximum likelihood methods to generate population-weighted frequency tables. Within the rural stratum, comparison of innovation rates across settlement types ranging from micropolitan counties to entirely rural counties will use domain analysis to take into account the randomness of the sample size across settlement types. As the first quantitative assessment of rural innovation in the U.S., valid variance estimation will be critical in describing the phenomenon across the rural continuum.
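The role of the sample design in estimating a percentage and its variance can be illustrated with the textbook estimator for stratified simple random sampling. This is a simplified sketch under stated assumptions: it ignores nonresponse adjustment and domain estimation, and the stratum counts are hypothetical rather than taken from Table 1.

```python
from math import sqrt


def stratified_proportion(strata):
    """Estimate a population proportion and its standard error.

    Each stratum is a dict with N (population count), n (respondents)
    and x (respondents reporting an innovation); n must be at least 2.
    """
    total_N = sum(s["N"] for s in strata)
    estimate = 0.0
    variance = 0.0
    for s in strata:
        weight = s["N"] / total_N          # stratum share of the population
        p = s["x"] / s["n"]                # within-stratum sample proportion
        fpc = 1 - s["n"] / s["N"]          # finite population correction
        estimate += weight * p
        variance += weight ** 2 * fpc * p * (1 - p) / (s["n"] - 1)
    return estimate, sqrt(variance)


# Hypothetical strata, for illustration only.
strata = [
    {"N": 1000, "n": 100, "x": 50},
    {"N": 3000, "n": 100, "x": 20},
]
rate, std_error = stratified_proportion(strata)
```

Because the two strata are sampled at different rates, the weighted estimate differs from the unweighted share of respondents reporting an innovation (70 of 200), which is the reason design information has to enter the estimation.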
However, past efforts examining measures of self-reported innovation in the European Union have identified a problem of over-reporting (North and Smallbone 2000). Lacking the resources to qualitatively assess the innovativeness of each respondent, the analysis will utilize auxiliary information on various establishment characteristics believed to be strongly associated with substantive innovation. For example, a question designed to correct for social desirability bias will ask about failed innovations at the establishment. Comparing the percentage of claimed innovators that acknowledge failed innovations to the percentage of claimed innovators that do not acknowledge failed innovations will provide one measure of possible over-reporting. Other characteristics, such as safeguards for protecting intellectual property or practices that facilitate data-driven decision-making, may also differentiate substantive innovators from claimed innovators. Variation in these observed variables may reflect variation in an unobserved factor related to substantive innovation.
Mixture models such as latent class models are well-suited to the problem of describing and analyzing observations hypothesized to come from different unobserved subgroups in the population. The two conceptual classes of most interest are substantive innovators and nominal innovators with non-innovators identified as respondents opting out of the innovation questions. However, the data could support four subgroups in the population with a subgroup of advanced non-innovators being identified; i.e., respondents that did not introduce new or significantly improved products but did utilize data-driven decision-making tools or possessed intellectual property worth protecting. Recent research examining the use of latent class models with complex survey design data (Patterson, et al. 2002; Vermunt 2007; Wedel, et al. 1998) has made it possible to apply these tools when the assumption of simple random sampling is violated.
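For reference, a standard latent class model for $J$ binary indicators can be written as below. The notation is generic and is an assumption for illustration, not the study's own specification: $\pi_c$ is the share of the population in class $c$, and $\rho_{jc}$ is the probability that a member of class $c$ reports indicator $j$ (for example, safeguards for intellectual property).

$$P(\mathbf{y}_i) = \sum_{c=1}^{C} \pi_c \prod_{j=1}^{J} \rho_{jc}^{\,y_{ij}} \left(1 - \rho_{jc}\right)^{1 - y_{ij}}, \qquad \sum_{c=1}^{C} \pi_c = 1$$

Under this form, the indicators are assumed independent within a class, and each respondent can be assigned to the class with the highest posterior probability given the observed responses.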
The validity of the latent class structure will be assessed in the short-run by comparing the industry distribution of substantive innovators with known innovation intensive industries. If ostensible substantive innovators are much more likely to be in innovation intensive industries, then this would provide prima facie evidence of the validity of the class structure. In the long-run, linking REIS to the Business Employment Dynamics data at BLS (see below) will provide longitudinal performance data to compare substantive with nominal innovators that would provide outcome based evidence of the validity of the class structure.
Questions 2 and 3 will apply the relevant innovator classification to all respondents and then estimate mean and variance of percentages as was done for the self-reported innovation variable in Question 1. Domain analysis will be used when estimating parameters across groups such as settlement type, industry or establishment age.
Question 4 will be addressed using a binary response model to investigate the relationship between innovative activity and establishment and community characteristics. Nonlinear logit or probit models able to incorporate complex survey design information are available in statistical software packages allowing unbiased estimation of parameter variance. Domain analysis will allow investigating similarities or differences with respect to innovative activity across settlement types or industry groups providing critical information for designing rural innovation policy.
The analysis will also provide an assessment of the value of the ostensibly substantive innovation classification. It is anticipated that the explanatory power of the substantive innovation model will be significantly higher than the self-reported innovation model since the latter is thought to include establishments over-reporting innovative activity due to social desirability bias. Alternatively, if the substantive innovation model does not demonstrate better explanatory power then it is less likely that the observed characteristics thought to be related to substantive innovation are correlated with the hypothesized unobserved factor.
Questions 1-4 will be addressed as soon as cleaned data from the REIS becomes available. Addressing Question 5 will not be possible until several years later when a sufficient amount of quarterly employment data is available to support survival analysis. It is anticipated that the REIS will be linked with the Business Employment Dynamics data at the Bureau of Labor Statistics that will allow examining the medium and long-term effects of innovative activity on establishment survival and employment growth.
To examine employment growth we will use a two-stage model that incorporates information from an establishment exit model to correct for the nonrandom selection of surviving establishments. This model has been widely adopted in manufacturing studies (Doms et al., 1995; Jarmin, 1999; Acs, 2002). The two stages are specified as:
(1)	$\Pr(S_i = 1 \mid Z_i) = \Phi(Z_i \gamma)$
(2)	$E[g_i \mid X_i, S_i = 1] = X_i \beta + \sigma \lambda_i$, with $\lambda_i = \phi(Z_i \hat{\gamma}) / \Phi(Z_i \hat{\gamma})$,
where $S_i$ indicates whether establishment $i$ survives, $g_i$ is its employment growth, $Z_i$ and $X_i$ are the covariate vectors of the two equations, $\gamma$ is the parameter vector from the exit equation, $\beta$ is the parameter vector from the growth equation, $\sigma$ is the covariance between the disturbance terms of the two equations and $\lambda_i$ is the inverse Mills ratio, derived from the first stage regression and used as an additional regressor to control for selection bias in the second stage. We estimate equation (1) using standard limited dependent variable techniques. We identify equation (2) via the nonlinearity of the Mills ratio, as do Evans (1987) and Doms et al. (1995).
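As an illustration of the correction term described above, the following minimal sketch computes an inverse Mills ratio from a first-stage index. It assumes a probit first stage (the text says only "standard limited dependent variable techniques"), and the function name and index values are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution: mean 0, sd 1


def inverse_mills_ratio(index: float) -> float:
    """Return phi(index) / Phi(index) for a first-stage probit index.

    `index` is the fitted linear predictor from the first-stage
    (selection) equation for an establishment that survived.
    Assumption: the first stage is a probit model.
    """
    return _STD_NORMAL.pdf(index) / _STD_NORMAL.cdf(index)


# Hypothetical fitted indices, for illustration only
for z in (-1.0, 0.0, 1.5):
    print(f"index={z:+.1f}  inverse Mills ratio={inverse_mills_ratio(z):.4f}")
```

The ratio falls as the index rises, so establishments that were unlikely to survive receive the largest correction in the second-stage growth equation.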
Establishment survival will be assessed using a widely used proportional hazard specification designed to account for the censored nature of the data. Our dependent variable, whether an establishment is continuing or has exited, is reported quarterly for each establishment and is modeled as:
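(3)	$\Pr(y_{it} = 1 \mid x_{it}) = \dfrac{\exp(x_{it}'\theta)}{1 + \exp(x_{it}'\theta)}$,
where i = 1, …, N establishments, t = 1, …, T quarters during the specified period, $x_{it}$ is the covariate vector with parameter vector $\theta$, and $y_{it}$ is 0 or 1.
The quarterly dependent variables are regarded as a panel of binary variables; each quarter, for each establishment, there is an indicator variable for whether or not the establishment has any employees. Each establishment is viewed as contributing several observations to a larger logit likelihood function, the product of each of the logit models in (3):
(4)	$L = \prod_{i=1}^{N} \prod_{t=1}^{T} P_{it}^{\,y_{it}} (1 - P_{it})^{1 - y_{it}}$, where $P_{it}$ denotes the probability in (3).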
Treating the data as a panel data set facilitates estimating flexible hazard functions because the complicated likelihood maximization problem is replaced with a familiar logit estimation problem (equation 4), which can be estimated with standard software.
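The panel treatment described above can be sketched as follows. This is an illustration under stated assumptions, not the estimation code for the study: the coding of the outcome (1 in the quarter of exit, 0 otherwise), the single time-invariant covariate, and all names and data values are hypothetical, and quarter-specific terms that would make the hazard flexible are omitted for brevity.

```python
import math
from typing import List, Optional, Tuple


def expand_quarters(exit_quarter: Optional[int], n_quarters: int) -> List[Tuple[int, int]]:
    """Return the (quarter, y) records one establishment contributes.

    Assumed coding: y = 1 in the quarter the establishment exits and
    y = 0 in every earlier quarter. An establishment still operating at
    the end of the window (exit_quarter=None) is censored and
    contributes only zeros.
    """
    last = n_quarters if exit_quarter is None else exit_quarter
    return [(t, int(t == exit_quarter)) for t in range(1, last + 1)]


def logit_prob(intercept: float, slope: float, x: float) -> float:
    """Logit probability of the event for covariate value x."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * x)))


def panel_log_likelihood(establishments, intercept: float, slope: float, n_quarters: int) -> float:
    """Sum the logit log-likelihood over all establishment-quarter records."""
    total = 0.0
    for x, exit_quarter in establishments:
        p = logit_prob(intercept, slope, x)
        for _, y in expand_quarters(exit_quarter, n_quarters):
            total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total


# Hypothetical data: (covariate value, exit quarter or None if censored)
sample = [(0.0, 3), (1.0, None), (1.0, 6)]
print(panel_log_likelihood(sample, intercept=-2.0, slope=-0.5, n_quarters=8))
```

Because each establishment-quarter record enters as an ordinary binary observation, the log-likelihood is the one maximized by any standard logit routine.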
Integrating complex survey design information into the analysis required to address Question 5 is now possible using the svyset functionality in Stata 11. Both 2-stage selection models and proportional hazard models can now be estimated using the svy command that incorporates survey design information and allows performing domain analysis on selected subpopulations to produce valid variance estimates.
Degree of Accuracy Needed
Comparing innovation rates between urban and rural establishments is a primary focus of the study. The most challenging aspect of this question with respect to sample size is comparing conventional measures of innovative or inventive activity, such as patent application rates, as these tend to be rare in both urban and rural environments. Unfortunately, we have not been able to locate previous studies that have examined patent application rates at the establishment level. However, it is possible to combine information from different sources to arrive at a reasonable estimate of the differences in patent application rates we would expect to observe. We would want a sample large enough to detect a significant difference between these expected application rates.
We combine survey results from the 1996 Rural Manufacturing Survey with the 2008 BRDIS results to arrive at an expected patent application rate for manufacturing establishments. We then use European data on differences in patent application rates between manufacturing and services to estimate patent application rates for our entire sample. We incorporate information from both differences in rural and urban patent application rates and differences in the mix of manufacturing and services to arrive at expected patent application rates for urban and rural areas among all tradable sectors.
The results from the 2008 BRDIS suggest that 1 in 5 firms with R&D units applied for at least one patent. Findings from the Rural Manufacturing Survey demonstrate that 30% of urban establishments and 22% of rural establishments had an R&D unit. For manufacturing we would therefore expect that 6% of urban establishments (1/5 of 30%) applied for a patent compared with 4.4% of rural establishments (1/5 of 22%). Given a likely rural manufacturing sample of 4,907 and urban manufacturing sample of 1,738, and assuming a 60% response rate, the power of the test for two proportions is 0.655, below the conventional threshold of 0.8 for a powerful test. This example is instructive because manufacturing is the one industry for which we have the best information and also where the events are anticipated to be less rare. However, the low power is not a problem for the study objective of comparing rural and urban innovation rates for the tradable sector as a whole.
Pearson Chi-square Test for Two Proportions
| Fixed Scenario Elements | |
| Distribution | Asymptotic normal |
| Method | Normal approximation |
| Number of Sides | 1 |
| Group 1 Proportion | 0.06 |
| Group 2 Proportion | 0.044 |
| Group 1 Sample Size | 1043 |
| Group 2 Sample Size | 2944 |
| Null Proportion Difference | 0 |
| Alpha | 0.05 |
| Computed Power | |
| Power | 0.655 |
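The proportions and sample sizes entered in the manufacturing power calculation follow from figures quoted in the text. The short sketch below retraces that arithmetic; the variable names are illustrative, and rounding expected completes to the nearest whole establishment is an assumption.

```python
# Inputs quoted in the text
patent_share_given_rd = 1 / 5   # 2008 BRDIS: firms with R&D units applying for a patent
rd_unit_share = {"urban": 0.30, "rural": 0.22}      # Rural Manufacturing Survey
initial_sample = {"urban": 1738, "rural": 4907}     # likely manufacturing sample
response_rate = 0.60                                # assumed response rate

# Expected patent application rate = share with an R&D unit x share of those applying
expected_rate = {
    area: patent_share_given_rd * rd_unit_share[area] for area in rd_unit_share
}

# Expected completed responses (assumption: rounded to the nearest whole number)
expected_completes = {
    area: round(initial_sample[area] * response_rate) for area in initial_sample
}

for area in ("urban", "rural"):
    print(f"{area}: rate={expected_rate[area]:.3f}, completes={expected_completes[area]}")
```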
To apply the power analysis to the entire sample we use patent application data from Europe to arrive at a reasonable ratio of services to manufacturing patent application rates. We then apply this ratio to our estimates of rural and urban US manufacturing patent application rates to derive the services patent application rates. The assumption is that the ratio between manufacturing and services application rates is the same in Europe and the US, which avoids the more restrictive assumption that patent application rates in Europe and the US are equal.
The services patent application rate in Europe is 41.5% of the manufacturing patent application rate. Thus, in the US we estimate that the rural services patent application rate would be 0.01826 (or 41.5% of 4.4%) and the urban services patent application rate would be 0.025 (or 41.5% of 6%). The fact that manufacturing makes up a larger share of the tradable sector in rural areas reduces the expected difference between rural and urban patent application rates overall. For the tradable sector overall, the patent application rate is expected to be 0.03183 in urban areas and 0.024575 in rural areas. Assuming a 60% response rate, an initial sample size of 30,000 will produce a test with adequate power of 0.872.
The POWER Procedure
Pearson Chi-square Test for Two Proportions
| Fixed Scenario Elements | |
| Distribution | Asymptotic normal |
| Method | Normal approximation |
| Number of Sides | 1 |
| Group 1 Proportion | 0.03183 |
| Group 2 Proportion | 0.024575 |
| Group 1 Sample Size | 6000 |
| Group 2 Sample Size | 12000 |
| Null Proportion Difference | 0 |
| Alpha | 0.05 |
| Computed Power | |
| Power | 0.872 |
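The power figures in the two tables above can be approximately reproduced with a short calculation. The sketch below is not the software procedure that generated the tables; it assumes a one-sided normal approximation with the pooled variance under the null hypothesis and the unpooled variance under the alternative, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p1: float, p2: float, n1: int, n2: int, alpha: float = 0.05) -> float:
    """One-sided power for H0: p1 - p2 = 0 under a normal approximation.

    Assumption: the test statistic uses the pooled variance under H0,
    while the alternative uses the unpooled variance.
    """
    z = NormalDist()
    pooled = (n1 * p1 + n2 * p2) / (n1 + n2)
    se_null = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    se_alt = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    critical = z.inv_cdf(1 - alpha) * se_null  # rejection threshold for the difference
    return z.cdf((abs(p1 - p2) - critical) / se_alt)


# Scenario values taken from the two tables above
manufacturing = power_two_proportions(0.06, 0.044, 1043, 2944)
tradable = power_two_proportions(0.03183, 0.024575, 6000, 12000)
print(f"manufacturing only:   {manufacturing:.3f}")
print(f"all tradable sectors: {tradable:.3f}")
```

Under these assumptions the two results come out close to the reported 0.655 and 0.872. The larger combined sample more than offsets the smaller expected difference in rates.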
By positing the magnitude of innovation events we expect to be rare in the sample, we are able to demonstrate that an initial sample size of 30,000 will be sufficient for detecting the expected difference between rural and urban establishments.
Methods to Maximize Response
Efforts to maximize response while remaining within the survey budget will use token cash incentives ($2), higher class postage and distinctive mailers in the mail modes of contact. For all modes and mode sequences this study will utilize multiple contacts as a best practice to reach the respondent and achieve response. The use of a mixed mode design, with a telephone sequence of 20 call attempts and a mail sequence, is also a known strategy to increase survey response. In addition, in the mailing portion of the study a special contact will be mailed to sampled businesses that refused during telephone contact or by mail. This letter will be specially designed to appeal and persuade, based on known psychological messaging, and to emphasize the importance of the survey request.
Tests of Procedures or Measures
After the initial design phase, the telephone version of the questionnaire was tested through internal SESRC and ERS expert review and mock telephone interviews between SESRC and ERS USDA staff. The CATI telephone instrument was tested with one ineligible known innovative business from the local WA State population to assess questionnaire length, usability, workability and question understanding, and to behavior code respondent clarifications.
After the initial testing, mail and telephone versions of the survey were tested using cognitive interviewing protocols with 6 establishments (see Attachment F for the detailed report). A special focus of the cognitive interviewing was auxiliary questions that will be used to differentiate substantive from nominal innovators. All of the auxiliary questions were easily understood and answered by the six respondents. The cognitive interviewing was also invaluable for assessing how industries outside of manufacturing would respond to questions and resulted in significant modifications to the survey instruments. Finally, the cognitive interviewing helped identify opportunities for decreasing respondent burden (e.g., allowing firms with no debt to avoid questions on borrowing).
The questionnaires will undergo comprehensive testing and usability testing by internal SESRC experts, supervisors, and interviewers during pretesting with actual respondents in a pilot phase of this study after OMB clearance. Usability pretesting during the pilot will include monitoring interviews to observe participants’ probes and clarification behaviors, noting difficulties and comments, and conducting post-testing interviews with interviewers to gain qualitative feedback about potential confusions. In addition, quantitative measures will be gathered, including time to complete the survey, paradata and navigation patterns from the web questionnaire.
The pilot study will also be used to assess item nonresponse along with problems of very limited response variation. A focus of this analysis will be to identify systematic nonresponse within particular industry or establishment size strata. With the proposed sample size of 4,000, only two of the 54 strata have empty cells and four strata have an initial sample of two. This coverage should be sufficient to identify significant nonresponse problems prior to the full study.
This study includes a pilot study with experimental components designed to evaluate impacts on less cooperative respondents who require more contacts to gain cooperation. The study tests the impact of survey mode sequencing (mail, telephone, and web) and interactions with other interventions, as shown in Tables 2 and 3. The pilot sample frame is randomly assigned to experimental groups 1 to 5. Each group varies on sequence and timing of treatments. Two-fifths of the sample is assigned to first receive the telephone sequence of survey contacts, which is then followed by questionnaire mailings for main data collection. Three-fifths of the sample frame will be contacted first by mailings with questionnaires, followed by telephone follow-ups for survey completion. Next, the groups vary on when (which specific day and mailing) a web link is offered to do the survey over the internet. For those respondents with an email address, an email contact will follow that is designed to augment the postal letter contact, as it offers a web link that can be clicked on to go directly to the survey. The interventions of $2 cash incentives and higher-class two-day priority mail (compared with first-class postage), and the number of times each is applied, will also be used at varying phases in the multi-contact sequence. The overall goal is to evaluate whether any of these interventions comparatively improve response propensity and/or bring in more of the “hard to reach” establishment respondents.
Table 2 shows the overall tests for each group and inclusion of specific treatments. Table 3 shows each group and the specific details of implementation by days across data collection. Early responders from the screening portion of the survey will not be allowed in the pilot so that all respondents experience the experimental treatments. Early responders will be encouraged in the full study.
 
[Figure: pilot study contact sequence flowchart]
Sample frame: 2011-2012 Establishment Listed Sample Frame (1996 Mnf.; D&B)
Screening telephone contact (1-5 attempts)
Experiment: split assignment of sample, telephone start vs. mail start
Mail start (3/5 of sample): pre-notification letter; 1st mail questionnaire with cover letter; postcard reminder/thank you to all respondents; 2nd mail questionnaire with cover letter (experimental random assignment to variations on postage, packaging, cash incentive); at 8 weeks, switch mode to telephone; 1-10 telephone contacts to non-responders; special refusal mailing (Attachment K).
Telephone start (2/5 of sample): pre-notification letter; 1-10 telephone survey contacts; at 8 weeks, switch mode to mail; 1st mail questionnaire to non-respondents (this would be a special refusal mailing to telephone refusals); postcard reminder/thank you to all respondents; 2nd mail questionnaire to non-respondents (experimental random assignment to variations on postage, packaging, cash incentive); special refusal mailing to telephone refusers (Attachment K).
Table 2. REIS Pilot Study Experimental Design and Stimuli
| Group | Tel Pre-screen | Pren. letter | Pren. letter has web link | Mode Sequence test | Web link timing | Web link day | When web link is offered | Web link times | Email Augm | $2 Incentive yes/no | Incentive times | Incentive day(s) | Incentive timing | Priority mail | Priority mail timing |
| G1 | Yes | Yes | No | Mail first | Early | day 7 | 1st mail qstn and after | 3 | day 14 | Yes | 2x | day 7 & day 35 | early | 1x | late day 35 2nd qstn | 
| G2 | Yes | Yes | No | Tele first | Late | day 42 | 1st mail qstn and after | 3 | day 49 | Yes | 2x | day 42 & day 56 | late | 1x | late day 56 | 
| G3 | Yes | Yes | Yes | Mail first | Very early | Day 1 | Advance letter and all mailings after | 4 | day 7 | Yes | 2x | day 1 & day 28 | very early | 1x | day 28 1st qstn | 
| G4 | Yes | Yes | No | Mail first | Early | day 7 | 1st mail qstn and after | 4 | day 7 | Yes | 2x | day 7 & day 35 | early | 2x | day 7 1st qstn and day 35 2nd qstn | 
| G5 | Yes | Yes | No | Tele first | Late | day 42 | 1st mail qstn and after | 4 | day 49 | No | None | None | None | None | None | 
Table 3. REIS Pilot Study Experimental design specific interventions and details of implementation procedures across data collection.
| Group | Exp. Design | Sample size | Tel Prescreen | Phase 1 | Phase 2 | Phase 3 | Phase 4 | Phase 5 | Phase 6 | Phase 7 | Phase 8 | Phase 9 | Phase 10 | Phase 11 |
| | | | 4 weeks | Day 1 | Day 7 | Day 14 | Day 21 | Day 28 | Day 35 | Day 42 | Day 49 | Day 56 | Day 63 | Day 70-77 |
| 1 | Mail First | 800 | tel. prescrn | Advance letter¹ NO Web link | 1st Qstn Web link $2 First class | Email Augm | | Postcard thank you reminder | 2nd Qstn Web link $2 Priority Mail | Tel 1-2 | Tel 3-4 | Tel 5-6 | Tel 7-8 | Tel 9-10 |
| 2 | Telephone First | 800 | tel. prescrn | Advance letter NO Web link | Tel 1-2 | Tel 3-4 | Tel 5-6 | Tel 7-8 | Tel 9-10 | 1st Qstn Web link $2 First class | Email Augm | Postcard thank you reminder | 2nd Qstn Web link $2 Priority Mail | Refusal mailing |
| 3 | Early Web Push Mail first | 800 | tel. prescrn | Advance letter Web link $2 | Email Augm | | Paper follow-up reminder letter | 1st Qstn Web link $2 Priority Mail | Postcard reminder | Tel 1-2 | Tel 3-4 | Tel 5-6 | Tel 7-8 | Tel 9-10 |
| 4 | All stimulus Mail 1st Qstn | 800 | tel. prescrn | Advance letter NO Web link | 1st Qstn Web link $2 Priority Mail | Email Augm | | Postcard reminder | 2nd Qstn Web link $2 Priority Mail | Tel 1-2 | Tel 3-4 | Tel 5-6 | Tel 7-8 | Tel 9-10 |
| 5 | Control Tel 1st No cash First class only | 800 | tel. prescrn | Advance letter NO Web link | Tel 1-2 | Tel 3-4 | Tel 5-6 | Tel 7-8 | Tel 9-10 | 1st Qstn NO Web link No Cash First class | Email Augm | Postcard thank you reminder | 2nd Qstn NO Web link No Cash First class | Refusal mailing |
1 All advance contacts will have an enclosure from the ERS Administrator Mary Bohman.
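The random assignment of the pilot sample to the five experimental groups in Table 3 could be sketched as follows. This is an illustration only: it assumes simple random assignment without stratification, and the establishment identifiers, seed, and function name are hypothetical.

```python
import random
from collections import Counter


def assign_groups(establishment_ids, n_groups: int = 5, seed: int = 2014) -> dict:
    """Shuffle the pilot sample and deal it into equal-sized groups.

    Assumption: simple random assignment; any stratification used in
    the actual pilot is not represented here.
    """
    ids = list(establishment_ids)
    random.Random(seed).shuffle(ids)
    # Deal shuffled ids round-robin so group sizes differ by at most one
    return {est_id: (position % n_groups) + 1 for position, est_id in enumerate(ids)}


pilot_ids = range(1, 4001)          # hypothetical ids for the 4,000 pilot establishments
assignment = assign_groups(pilot_ids)
group_sizes = Counter(assignment.values())

# Groups 1, 3 and 4 start with mail; groups 2 and 5 start with telephone (Table 2)
mail_first = sum(group_sizes[g] for g in (1, 3, 4))
tel_first = sum(group_sizes[g] for g in (2, 5))
print(sorted(group_sizes.items()), mail_first, tel_first)
```

With 800 establishments per group, the mail-first groups account for three-fifths of the pilot sample and the telephone-first groups for two-fifths, matching the split described in the text.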
Contact(s) for Statistical Aspects and Data Collection
For questions on statistical methods described above, please contact:
Timothy R. Wojan
Regional Economist
Farm and Rural Business Branch
Economic Research Service, USDA
355 E Street SW
Washington, DC 20024
Tel. 202-694-5419
twojan@ers.usda.gov
For questions on the data collection described above, please contact:
Danna L. Moore
Social and Economic Sciences Research Center
Washington State University
Pullman WA 99164-4014
Tel. 509-335-1117
moored@wsu.edu
Attachments
Attachment A Draft Rural Establishment Innovation Survey (sent out as National Survey of Business Competitiveness)
Attachment B Final CATI Script
Attachment C Screen shots of the Rural Establishment Innovation Survey Internet Application
Attachment D Draft Rural Establishment Innovation Survey Letters
Attachment F Cognitive interview Report 12-051: National Survey of Business Competitiveness
Attachment J Pre-screening Telephone Script
Attachment K Mail Short Form for Telephone Refusals
Attachment Not Referenced in Supporting Statement
Attachment G ERS Response to NASS Comments
References
Acs, Z. 2002. Innovations and the growth of cities. Northampton, MA: Edward Elgar.
Doms, M., Dunne, T. and Roberts, M.J. 1995. “The role of technology use in the survival and growth of manufacturing plants,” International Journal of Industrial Organization 13:523-542.
Evans, D.S. 1987. “The relationship between firm growth, size, and age: estimates for 100 manufacturing industries,” The Journal of Industrial Economics. 35(4):567-581.
Hsieh, F.Y. 1989. “Sample size tables for logistic regression,” Statistics in Medicine 8:795-802.
Jarmin, R.S. 1999. “Government technical assistance programs and plant survival: the role of plant ownership type,” CES Discussion Paper 99-2 February.
Millar, M.M. and Dillman, D.A. 2011. “Improving response rates to web and mixed-mode surveys,” Public Opinion Quarterly 75(2):249-269.
North, D. and Smallbone, D. 2000. “The innovativeness and growth of rural SMEs during the 1990s,” Regional Studies 34(2):145-157.
Patterson, B., Dayton, C.M., and Graubard, B.I. 2002. “Latent Class Analysis of Complex Survey Data: Application to Dietary Data,” Journal of the American Statistical Association 97(459): 721-741.
Vermunt, J.K. 2007. “Latent Class Analysis with Sampling Weights: A Maximum Likelihood Approach,” Sociological Methods and Research 36(1):87-111.
Wedel, M., ter Hofstede, F. and Steenkamp, J.-B.E.M. 1998. “Mixture Model Analysis of Complex Samples,” Journal of Classification 15(5):225-244.
1 Combination of Quarterly Census of Employment and Wages (2013Q2) and proprietary business registry from SSI for states not available through QCEW.
	 
		
	