National Program of Cancer Registries
Cancer Surveillance System (NPCR-CSS)
2021 Data Release Policy
Diagnosis Years 1995–2020
___________________________
Policy Revised August 2021
Cancer Surveillance Branch
Division of Cancer Prevention and Control
NCCDPHP, CDC
4770 Buford Hwy, N.E., Mailstop S107-4
Atlanta, GA 30341-3717
E-mail: uscsdata@cdc.gov (specify “NPCR-CSS” in subject line)
TABLE OF CONTENTS
I. INTRODUCTION
   Summary of Changes
II. OVERVIEW OF DATA
III. DATA RELEASE ACTIVITIES
   A. Public Web-based Query Systems
      1. U.S. Cancer Statistics Data Visualizations Tool
      2. CDC WONDER
      3. Federal Partners Web-Based Systems
         a) Age-adjusted rates only
         b) Age-adjusted and crude rates
      4. Environmental Public Health Tracking Network Program
         a) Tracking Program Unsmoothed Rates
         b) Tracking Program Smoothed Rates
         c) Tracking Program National Portal to State Portal
      5. Indian Health Services
   B. Data Release to Federal and Trusted Partners
      1. American Cancer Society
      2. Central Brain Tumor Registry of the United States (CBTRUS)
      3. International Association of Cancer Registries (IACR)
         a) CI5
         b) CI5plus
         c) IICC
      4. CONCORD
      5. Agency for Healthcare Research and Quality (AHRQ)
   C. Analytic datasets
      1. NPCR/SEER USCS Incidence Analytic Data
         a) NPCR Internal Survival Dataset
         b) NPCR/SEER Survival Dataset
         c) NPCR Internal Prevalence Dataset
         d) NPCR/SEER USCS Delay Adjusted Dataset
      2. NPCR/SEER USCS Incidence and Survival Public-Use Research Dataset
      3. Restricted-Access Research Dataset (RDC)
   D. Data Release Under Controlled Conditions
   E. Emergency and Provisional Data Releases
IV. PROTECTION OF DATA
   A. Assurance of Confidentiality
   B. Suppression of Rates and Counts
   C. Public Release Disclosure Statement
   D. Freedom of Information Act (FOIA) Data Requests
   E. CDC External Data Requests
V. REFERENCES
NPCR-CSS 2021 Data Release Policy
August 2021
1995–2020 Diagnosis Years
TABLE 1     Comparison of NPCR-CSS Data
APPENDIX A  State and Metro Area Cancer Registries
APPENDIX B  NPCR-CSS Overview of Data Security
APPENDIX C  Data Items for CBTRUS Dataset
APPENDIX D  NPCR/SEER USCS Analytic Data Use Agreement
APPENDIX E  CDC Nondisclosure Agreement
APPENDIX F  Data Items for NPCR/SEER USCS Internal Analytic Dataset
APPENDIX G  Data Items for NPCR Internal Survival Dataset
APPENDIX H  NPCR-CSS 308(d) Assurance of Confidentiality
APPENDIX I  NPCR-CSS 308(d) Assurance of Confidentiality FAQ
APPENDIX J  Data Items for NPCR/SEER USCS Incidence Public Use Dataset
APPENDIX K  NPCR Research Data Use Agreement
APPENDIX L  NPCR Data at the NCHS RDC Q&A
APPENDIX M  Data items for Restricted Access Research Dataset
APPENDIX N  NPCR-CSS Levels of Data Access
APPENDIX O  Data items for NPCR/SEER USCS Delay-Adjusted Database
APPENDIX P  Data items for NPCR Prevalence Database
National Program of Cancer Registries
Cancer Surveillance System
2021 Data Release Policy
August 2021
I. INTRODUCTION
This document describes the format and content of data that the Centers for Disease Control and
Prevention’s National Program of Cancer Registries (NPCR) Cancer Surveillance System (CSS)
releases or shares. This multi-year policy updates the July 2020 NPCR-CSS Data Release Policy.
This policy applies to data submitted to the Centers for Disease Control and Prevention (CDC)
for the 2021 NPCR-CSS data submission and for all future data submissions until a new policy is
provided.
The NPCR-CSS Privacy Steward, as authorized by the Chief of the Cancer Surveillance Branch,
clears all releases of state data, ensuring that the data are released according to the terms of the
NPCR-CSS Data Release Policy.
It is possible that, in future years, data release practices or the content and format of released data
may vary from those described in these guidelines. Such changes may occur as a result of
improvements in the quality of the data, changes in information technology, and evolving data
needs. However, if such variations occur, the data release practices will provide comparable
protection (or more protection) for patient confidentiality to what is described in this policy. If it
is anticipated that any data will be released with less protection (as determined by the
NPCR-CSS Privacy Steward) for patient confidentiality than is described in this policy, NPCR central
cancer registries will be notified and have ample time to respond before the data are released.
This policy is reviewed annually by the NPCR-CSS Privacy Steward and other appropriate CDC
staff members to determine whether revisions are needed.
Summary of Changes
• Updated description of NPCR-CSS IRB designation. Under the Common Rule, the project
is deemed to be a non-research public health surveillance project, and annual IRB review
is no longer necessary, page 3
• Updated description of the USCS Data Visualizations tool with software used to build the
tool and new data to be displayed: Stage at Diagnosis and Survival by Stage, page 1
• Survival data are no longer published in the CDC WONDER tool. WONDER descriptions
were updated on page 5.
• Added a description of the new NPCR/SEER Survival Dataset, page 10.
• Added new step of obtaining SEER Research Plus access before accessing USCS
databases on pages 11 and 13.
• Clarified the reviewer process for the Restricted-Access Research Dataset, page 13.
• Aligned wording in the Suppression of Rates and Counts section to Table 1, to indicate
that the suppression rule of fewer than 6 applies to restricted access data, page 16
• Updated diagnosis years available in Table 1.
• Updated data item lists for NPCR/SEER USCS Incidence Analytic Dataset (Appendix F),
NPCR Internal Survival Dataset (Appendix G), NPCR/SEER USCS Incidence Public-Use
Research Dataset (Appendix J)
• Updated text in Appendix H – NPCR-CSS 308(d) Assurance of Confidentiality
Statement to match the currently approved Assurance of Confidentiality project
description.
II. OVERVIEW OF DATA
In 1992 Congress established NPCR by enacting the Cancer Registries Amendment Act, Public
Law 102-515.4 The law authorized CDC to provide funds and technical assistance to states and
territories to improve or enhance existing cancer registries and to plan for and implement
population-based central cancer registries where they did not exist. NPCR’s purpose is to assure
the availability of more complete local, state, regional, and national cancer incidence data for the
planning and evaluation of cancer control interventions and for research. NPCR adopted
reporting requirements and definitions consistent with the National Cancer Institute’s (NCI)
Surveillance, Epidemiology, and End Results Program (SEER);11,12 required the use of uniform
data items, codes, and record layouts as defined by the consensus of members of the North
American Association of Central Cancer Registries (NAACCR);13 and established standards for
data management and data completeness, timeliness, and quality similar to those recommended
by NAACCR.13,14 In 1994, the first 37 States received funding from CDC.15 Currently, 46 States,
the District of Columbia, Puerto Rico, the U.S. Virgin Islands, and the U.S. Pacific Island
Jurisdictions are funded by NPCR (appendix A).16 NPCR-funded central cancer registries collect
data on patient demographics, primary tumor site, morphology, stage of disease at diagnosis, and
first course of treatment. In addition, NPCR central cancer registries conduct follow-up for vital
status by linking with state and national death files or active case follow-up.
Invasive and in situ cancer case reports are submitted to CDC by population-based statewide
central cancer registries in all 46 participating states, the District of Columbia, Puerto Rico,
the U.S. Virgin Islands, and the U.S. Pacific Island Jurisdictions. In each state or territory, state laws and
regulations mandate the reporting of cancer cases by facilities and practitioners who diagnose or
treat cancer to the state health department or its designee.4 The central cancer registry receives
case reports from facilities and practitioners throughout the state and processes them according
to standard data management procedures.14 Personal identifiers including the patient’s name,
Social Security number, and street address are removed from the NPCR-CSS submission prior to
the encryption and electronic transmission of these case reports to a contractor acting on behalf
of CDC. CDC and the contractor adhere to strict data security procedures when receiving,
processing, and managing the data (appendix B). CDC has an Office for Human Research
Protections (OHRP)-approved, federal-wide assurance of compliance with rules for the
protection of human subjects in research (45 Code of Federal Regulations 46). NPCR-CSS
received formal approval (protocol #2594) from CDC’s Institutional Review Board (IRB) in
October 1999, and annual approval was sought and granted through 2020. In 2021, under the
Common Rule (45 Code of Federal Regulations Part 46, Common Rule 2018), NPCR-CSS was
deemed to be a non-research public health surveillance project and annual IRB review is no
longer necessary.
Central cancer registries and federal agencies routinely publish cancer incidence data 23 months
after the close of each diagnosis year based on data that meet data quality standards.16,17
However, other versions of the same data, based on the data file as it exists at different time
periods, are usually available. For example, some central cancer registries have preliminary data
available as soon as 12 months after the close of each diagnosis year. After the publication of
official statistics, central cancer registries (as well as CDC and NCI) continue to update and
republish data with new information incorporated. When cancer incidence data are published, it
is common practice to document either the data submission date (i.e., when the data were
submitted to CDC or NCI) or the date that the file was prepared. Changes in central cancer
registry incidence data that occur more than 22 months after the close of a diagnosis year are
likely to be small; however, delays in reporting are more likely to impact certain cancer sites and
may be important for some research studies.18
CDC generates multiple data products using NPCR-only data and combined NPCR and SEER
data. The combined NPCR and SEER data are referred to as U.S. Cancer Statistics (USCS).
USCS are the official federal cancer statistics, providing the most up-to-date information on the
entire U.S. population.
III. DATA RELEASE ACTIVITIES
Starting with DP17-1701, participation in all CDC-created and hosted analytic datasets and
web-based data query systems, as outlined in this policy, is a required strategy (see footnote 1). Therefore, the
NPCR-CSS Dataset Participation Agreement is no longer provided.
A. Public Web-based Query Systems
For purposes of this policy, public web-based query systems are defined as datasets composed of
aggregated data (i.e., not individual case-specific data or microdata) that have been
modified according to accepted procedures to block breaches of confidentiality and prevent
disclosure of the patient’s identity or confidential information. Each system is backed by a
database behind a CDC firewall that contains either case-specific microdata or pre-analyzed
data tables.2, 5–10 Users are able to
access only aggregate counts and rates with all confidentiality protections built in. A
combination of confidentiality protection measures is employed for each public web-based query
system (see Table 1). These systems do not contain information that is identifiable or potentially
identifiable according to currently accepted procedures for reducing disclosure risk.2, 5–10 Before
each system is finalized, the aggregate values are analyzed to determine whether there is a need
for complementary cell suppression.2, 5–10 If appropriate, the analysis includes consultation with a
statistician with specific expertise in statistical disclosure limitation techniques. Following the
analysis, complementary cell suppression is applied as needed.
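The complementary suppression step described above can be illustrated with a minimal sketch for a single table row whose total is published. The function name `suppress_row`, the rule of hiding the next-smallest cell, and the treatment of zero counts as small cells are all assumptions made for illustration; they are not the procedure CDC applies, and the threshold must be supplied by the caller.

```python
def suppress_row(counts, threshold):
    """Return a copy of `counts` with suppressed cells replaced by None.

    Hypothetical rule for illustration only:
    - primary suppression hides every cell below `threshold`;
    - if exactly one cell is hidden and the row total is published, the
      hidden value could be recovered by subtraction, so the smallest
      remaining cell is hidden as well (complementary suppression).
    """
    primary = {i for i, c in enumerate(counts) if c < threshold}
    suppressed = set(primary)
    if len(primary) == 1 and len(counts) > 1:
        candidates = [i for i in range(len(counts)) if i not in primary]
        # Hide the smallest visible cell so the single small cell
        # can no longer be derived from the published total.
        suppressed.add(min(candidates, key=lambda i: counts[i]))
    return [None if i in suppressed else c for i, c in enumerate(counts)]


if __name__ == "__main__":
    # One small cell (3) forces a second cell to be hidden as well.
    print(suppress_row([40, 3, 25, 60], threshold=16))
```

Real tables need the same check across columns and across related tables, which is why the policy calls for a statistician with disclosure-limitation expertise where appropriate.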
Footnote 1: DP17-1701, 2. CDC Project Description, a. Approach, iii. Strategies and Activities,
Program 3: National Program of Cancer Registries (NPCR) – Component 1, Strategy 3 Cancer Data
and Surveillance (Domain 1), Data Submission (page 19)
There are no restrictions on access to public web-based query systems. A public release
disclosure statement (see IV.C. Public Release Disclosure Statement) cautions users against
inappropriate use of the data or inappropriate disclosure of information. Data are released as
delimited ASCII files, a web-based query system, or possibly through other data products (see
Table 1). As a convenience to NPCR central cancer registries, states may request from CDC a
copy of their complete state-specific analytic database that is used to populate each public
web-based query system. The following public web-based query systems are currently being released:
• USCS Data Visualizations tool
• CDC WONDER – USCS incidence and incidence/mortality rate ratios
• Federal partners’ web-based query systems
o NCI’s State Cancer Profiles
o CDC’s Environmental Public Health Tracking Program’s Tracking Network
o Chronic Disease Indicators (CDI) website and data portal
o Office on Women’s Health’s Health Information Gateway
All NPCR-CSS public web-based query systems consist of cancer incidence data selected from
the NPCR/SEER analytic database. This is the same database that provides cancer incidence data
for the annual release of USCS data products, including the Data Visualizations tool, public use
database and State Cancer Profiles. Data sources, case definitions, basic registry eligibility
criteria in terms of required data quality, population denominator sources, methods for
calculating incidence rates, and the rationale for specific cell suppression thresholds are as
described in the USCS Data Visualizations Tool Technical Notes, unless noted in separate
documentation that accompanies the data.
Separate documentation may accompany each data product that describes its unique features
(e.g., the data submission date, percentage of the U.S. population covered, diagnosis years and
cancer sites included, variables included, data suppression rules, any special data quality criteria
required for inclusion, and any unique statistical methods employed).
USCS Data Visualizations Tool
The USCS Data Visualizations tool is a web-based application built with the D3 JavaScript
library, the React framework, and web APIs. It outputs a Hypertext Markup Language (HTML)
file containing the aggregate counts and rates for the incidence, mortality, prevalence, and
survival estimates published annually, along with text documentation and data visualizations.
The tool is available at www.cdc.gov/cancer/dataviz. It currently displays single-year and 5-year
aggregate counts, crude rates, age-adjusted rates, and 95-percent confidence intervals by primary
site, sex, race, and ethnicity at the county (5-year aggregate), Congressional district (5-year
aggregate), state, and national levels. Congressional district estimates (estimated 5-year
aggregate counts and age-adjusted rates) are presented by sex, race, and ethnicity (all
races/ethnicities, non-Hispanic White, Black, and Hispanic). In addition, cancers grouped by
associated risk factors are presented by state, sex, race, and age group (single year and 5-year
aggregate). Data by stage at diagnosis and survival by
stage for select sites are presented by sex, race/ethnicity, and age at the national level. Stage at
diagnosis is categorized as localized, regional, distant, and unknown/unstaged. Preliminary and
delay-adjusted incidence rates and counts, as well as other newly identified indicators, may be
published in the tool. The Data Visualizations tool’s database is behind a CDC firewall with
pre-tabulated data created using SEER*Stat queries, which allows for the display of counts and rates
that meet suppression and confidentiality protections. Users can access only aggregate counts
and rates with all confidentiality protections built in. Downloadable ASCII files with the
pre-tabulated data are available from CDC’s website. States may request a state-specific web API.
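For readers unfamiliar with the measures named above, the following sketch shows how a crude rate and a directly age-adjusted rate with a 95-percent confidence interval can be computed from age-specific counts. It is an illustration only: the function names, the normal-approximation interval, and the example weights are assumptions, and the USCS Data Visualizations Tool Technical Notes define the methods actually used.

```python
from math import sqrt


def crude_rate(cases, population, per=100_000):
    """Total cases divided by total population, per `per` persons."""
    return sum(cases) / sum(population) * per


def age_adjusted_rate(cases, population, std_weights, per=100_000):
    """Directly age-standardized rate with an approximate 95% interval.

    cases, population: age-specific counts in the same age-group order
                       (every population must be greater than zero).
    std_weights:       standard population for each age group
                       (normalized here so the weights sum to 1).
    The interval uses a normal approximation with Poisson variance,
    which is an assumption of this sketch.
    """
    total_weight = sum(std_weights)
    rate = 0.0
    variance = 0.0
    for d, n, w in zip(cases, population, std_weights):
        share = w / total_weight
        rate += share * d / n
        variance += share ** 2 * d / n ** 2
    half_width = 1.96 * sqrt(variance)
    return rate * per, (rate - half_width) * per, (rate + half_width) * per


if __name__ == "__main__":
    # Two hypothetical age groups; the weights are illustrative, not the
    # 2000 U.S. standard population.
    cases = [10, 20]
    population = [1_000, 1_000]
    weights = [3, 1]
    print(crude_rate(cases, population))
    print(age_adjusted_rate(cases, population, weights))
```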
CDC WONDER – USCS Incidence, Mortality, and Incidence/Mortality Ratios
The USCS dataset available on CDC WONDER displays the aggregate incidence and mortality
counts, rates, and 95-percent confidence intervals, by primary site, sex, race, and ethnicity at the
state, county, regional, Metropolitan Service Areas (MSA), and national levels. Cancer
incidence/mortality rate ratios (by year, state, MSA, race, ethnicity, sex, and cancer site) are also
available. The WONDER database is stored behind a CDC firewall with case-specific microdata.
Users are able to access only aggregate counts and rates with all confidentiality protections
built in.
The WONDER tool allows users more flexibility in creating cross-tabulations than the Data
Visualizations tool. While the same underlying USCS data are available in the two tools, more
detailed breakdowns of counts and rates are available through WONDER. The additional values
result from variable selections that are not currently available in the Data Visualizations tool (see
Table 1) and include results for Metropolitan Service Areas that have met the population
threshold of 50,000 or more and standard 5-year age groups that can be combined by the user.
Federal Partners’ Web-based Systems
CDC shares aggregated data with federal partners for display in their web-based query systems.
The data are generated specifically for the partners’ needs and are shared via ASCII files.
Unless otherwise noted below, the data generally consist of aggregate cancer incidence counts,
crude rates, and age-adjusted rates for selected primary sites, age groups, and counties in the
United States (see Table 1 for more details).
Future versions may contain more detail about cancer at the county level. In 2008,
CDC began routinely publishing county data averaged over 5 years.
Age-adjusted rates only
State Cancer Profiles is a web-based query tool that public health professionals can use to
prioritize cancer control efforts at the county, state, and national levels. Data are released to NCI
SEER for the development of the State Cancer Profiles data product, which presents average
annual counts and age-adjusted incidence and mortality rates only.
Age-adjusted and crude rates
Data released to the U.S. Department of Health and Human Services, Office on Women’s Health
(OWH) include crude and age-adjusted rates. The data are available through OWH’s online tool,
Health Information Gateway.
Environmental Public Health Tracking Program
USCS data are provided to the CDC’s National Center for Environmental Health’s
Environmental Public Health Tracking Program (Tracking Program) for display on the Tracking
Network and through dashboards on CDC’s Division of Cancer Prevention and Control’s
(DCPC) website. The Tracking Network displays single-year and 5-year aggregate incidence
counts, age-adjusted rates, and 95-percent confidence intervals for selected primary sites and age
groups for selected geographic areas (see Table 1). Single-year data can be viewed at the state
level; 5-year average and 5-year summed data are available at the county level; and 3-, 5-, 7-,
and 10-year aggregate data are available at the subcounty level.
Maps of incidence counts and incidence rates for select cancers will be displayed for various
sub-county geographies. Incidence rates are based on incidence counts stratified by census tract,
year of diagnosis, age group (standard 19 groups), sex and Census-based population estimates.
These incidence counts and age-adjusted rates are displayed using spatial aggregation (census
tract, geographies with a minimum of 5,000 persons, or geographies with a minimum of 20,000
persons) and temporal aggregation (3-, 5-, 7-, or 10-year periods) schemas recommended by a
subcounty cancer data workgroup. Rates are age-adjusted to the 2000 US standard population.
Counts and rates are suppressed when there are <16 cases or <100 persons in the geographic
area. Specific to this project, an additional suppression is applied when the relative standard error
of the rate is >30%.
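The three suppression conditions stated above (fewer than 16 cases, fewer than 100 persons, or a relative standard error above 30%) can be expressed as a single check. This is a minimal sketch: the function name and argument names are hypothetical, and the caller is assumed to supply the rate and its standard error from whatever variance method the project uses.

```python
def is_suppressed(cases, population, rate, std_error,
                  min_cases=16, min_population=100, max_rse=0.30):
    """Return True when a subcounty count and rate should be withheld.

    Thresholds follow the text: <16 cases, <100 persons, or a relative
    standard error (std_error / rate) greater than 30%.
    """
    if cases < min_cases:
        return True
    if population < min_population:
        return True
    if rate <= 0:
        # A non-positive rate has no meaningful relative standard error;
        # this sketch treats it as unreliable.
        return True
    return std_error / rate > max_rse


if __name__ == "__main__":
    # Enough cases and people, but the rate is too unstable to display.
    print(is_suppressed(cases=20, population=6_000, rate=12.0, std_error=4.5))
```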
The Tracking Program’s web-based query system runs using a database behind a CDC firewall
with case-specific microdata, which allows for the calculation of locally-weighted smoothed
rates or unsmoothed rates, or both:
• Tracking Program Unsmoothed Rates
Data published are like those on State Cancer Profiles. They include cancer data from all 50
states.
• Tracking Program Smoothed Rates
Smoothing is the process of averaging a measure for an area based on information about
that area and areas around it. Please note that the main purpose of smoothing is to clarify
spatial patterns and to improve the stability of rates, not to prevent disclosure of private
information. Back-calculation of case counts from smoothed rates is sometimes possible
when the method of smoothing is made known and (non-sensitive) denominator data are
available from other sources.
Through the Tracking Program, users can access only aggregate counts and rates with all
confidentiality protections built in.
• Tracking Program National Portal to State Portal
CDC’s Tracking Program has grantees in several NPCR-funded states that are
responsible for the state-level public portals. In collaboration with the Tracking Program,
upon request, CDC-NPCR provides the state-level Tracking Network dataset to the
Tracking Program state counterpart.
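The idea of smoothing described in the list above, averaging a measure for an area using that area and the areas around it, can be sketched with simple neighborhood pooling. This is not the Tracking Program's locally weighted method; the function name, the unweighted pooling, and the adjacency data are assumptions chosen to keep the example short.

```python
def smooth_rates(cases, population, neighbors, per=100_000):
    """Pool each area's cases and population with those of its neighbors.

    cases, population: dicts keyed by area identifier.
    neighbors:         dict mapping each area to a list of adjacent areas
                       (hypothetical adjacency supplied by the caller).
    Returns a dict of smoothed rates per `per` persons.
    """
    smoothed = {}
    for area in cases:
        pool = [area, *neighbors.get(area, [])]
        pooled_cases = sum(cases[a] for a in pool)
        pooled_population = sum(population[a] for a in pool)
        smoothed[area] = pooled_cases / pooled_population * per
    return smoothed


if __name__ == "__main__":
    cases = {"A": 2, "B": 10, "C": 8}
    population = {"A": 1_000, "B": 4_000, "C": 5_000}
    neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
    print(smooth_rates(cases, population, neighbors))
```

In this scheme each smoothed rate is a known combination of the underlying counts, which is why, as the text notes, counts can sometimes be back-calculated when the method and denominators are public.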
Indian Health Services (IHS)
CDC continues to use the IHS linkage results for analyses of cancer incidence among American
Indian and Alaska Native populations. In addition to improving the cancer incidence rates presented
in the USCS Data Visualizations tool, the linkage supports an analytic database maintained by a CDC
Division of Cancer Prevention and Control employee assigned to IHS. Access to this database is
limited to approved CDC staff. The data are used to respond to requests from tribal epidemiology
centers and tribal organizations for cancer incidence rates for American Indian and Alaska Native
populations. Five-year aggregate incidence counts, age-adjusted rates, and 95-percent
confidence intervals for selected primary sites are displayed in the USCS Data Visualizations
tool (see Table 1). These data are limited to non-Hispanic American Indian and Alaska Native
people living in IHS Purchased/Referred Care Delivery Areas (PRCDA) counties. Inclusion in
this dataset also allows IHS to provide the state with the date of death obtained through
NDI-IHS linkage and/or the date the linkage occurred by diagnosis year, for registries that
complete an NDI supplemental confidentiality agreement for application Y9-0033.
B. Data Release to Federal and Trusted Partners
American Cancer Society (ACS)
CDC shares NPCR and USCS data with ACS to promote collaborations on cancer surveillance
and epidemiological research efforts. ACS’s Surveillance and Health Services Research (SHSR)
Program analyzes and disseminates cancer statistics and identifies gaps and opportunities for
cancer prevention, early detection and treatment. The SHSR annually publishes the statistical
report, Facts and Figures, and peer-reviewed journal articles that are used by public health
experts, clinicians, and scientists.
In 2018, a Memorandum of Understanding was implemented with the American Cancer Society.
Each ACS staff member must sign a Data Use Agreement form and complete annual Assurance
of Confidentiality training before being given access to the data. Beginning in 2020, due to
changes in SEER’s data release policy, CDC also obtains approval from SEER before releasing
USCS data. CDC provides ACS staff access to the following databases with record level data
through SEER*Stat software: USCS delay-adjusted database, NPCR survival database, NPCR
prevalence database, and selected variables from the NPCR and SEER Quality Control database.
The Quality Control database shared with ACS is restricted to 24-month data, excludes postal
code and census tract variables, and excludes “day” fields for date of birth and date of death.
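The field restrictions described above for the Quality Control database can be sketched as a record filter. The field names and the YYYYMMDD date layout are assumptions made for illustration; they are not taken from the actual dataset, and the 24-month restriction is not modeled here.

```python
# Hypothetical field names; the real dataset layout is not given in this policy.
EXCLUDED_FIELDS = {"postal_code", "census_tract"}
DATE_FIELDS = {"date_of_birth", "date_of_death"}


def restrict_record(record):
    """Return a copy of `record` without geographic detail or day-of-date.

    Dates are assumed to be YYYYMMDD strings, so dropping the "day"
    component keeps only the first six characters (YYYYMM).
    """
    restricted = {}
    for field, value in record.items():
        if field in EXCLUDED_FIELDS:
            continue
        if field in DATE_FIELDS and value:
            restricted[field] = value[:6]
        else:
            restricted[field] = value
    return restricted


if __name__ == "__main__":
    example = {
        "primary_site": "C509",
        "postal_code": "30341",
        "census_tract": "001234",
        "date_of_birth": "19500415",
        "date_of_death": "",
    }
    print(restrict_record(example))
```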
Central Brain Tumor Registry of the United States (CBTRUS)
CBTRUS annually publishes the print and Web versions of the statistical report, Primary Brain
Tumors in the United States Statistical Report Supplement; a previous version of the report is
available at: https://www.cbtrus.org/reports. The report includes age-adjusted rates and
corresponding 95-percent confidence intervals on brain and other central nervous system tumors
and is presented by state, histology, major histology grouping, primary site, behavior, gender,
race, ethnicity, and age at diagnosis. As a trusted partner, CBTRUS is provided access to the
NPCR Survival Dataset to include survival estimates in the annual report, conduct in-depth
analyses, and respond to queries. CDC provides individual, record-level data to CBTRUS for the
publication of this report; Appendix C lists the variables included in this dataset. Only states
meeting the USCS publication criteria are included in the dataset.
In addition, CBTRUS uses these data to respond to inquiries that are more specific than those
that are provided by the report. For these inquiries, no individual record level data are released;
only aggregated data with the corresponding confidence intervals (if applicable) and appropriate
suppression criteria are provided to data inquirers. Attribution to NPCR is provided. CBTRUS
signs data use agreements before data are released for their report and future inquiries. For
questions, contact CBTRUS staff at cbtrus@aol.com.
International Association of Cancer Registries (IACR)
The International Association of Cancer Registries (IACR) produces the Cancer Incidence in Five
Continents (CI5) and the International Incidence of Childhood Cancer (IICC). The CI5 series of
monographs, published every five years, has become the reference source of data on the
international incidence of cancer. The most recent version was published in 2017. The CI5
databases provide access to detailed information on the incidence of cancer recorded by cancer
registries (regional or national) worldwide in two formats (CI5 and CI5plus) and the IICC
provides access to detailed information on the incidence of pediatric cancers:
• CI5
Presents the basic data published in the CI5 volumes.
• CI5plus
Contains annual incidence for selected cancer registries published in CI5 for the longest
possible period.
• IICC
Presents basic pediatric data.
When IACR requests data, the formal Call for Data Submission, which gives information on the
evaluation procedure, the likely layout of the data presentation, and the questionnaire on registry
operations, will be available from the IACR website. CDC-NPCR may facilitate the call for data
on behalf of awardees. CDC-NPCR will provide additional information regarding the CI5 Call
for Data as it becomes available. There are two components of the CI5 Call for Data: 1) the
questionnaire and introductory text and 2) data submission.
Data submitted for CI5 may also be used for the IICC publication making a separate data
submission unnecessary. This IACR product does require a separate questionnaire and
introductory text to be completed by the states.
States are responsible for completing the on-line questionnaires and providing an introductory
text, indicating if the CI5 data and introductory text are also used for the IICC product.
CDC-NPCR will submit aggregated NPCR data for central cancer registries meeting USCS publication
criteria.
CONCORD
CONCORD is the global program for world-wide surveillance of cancer survival and is led by
the London School of Hygiene & Tropical Medicine and supported by the Union for
International Cancer Control (UICC). CONCORD monitors progress towards the overarching
goal of the UICC World Cancer Declaration made in 2013: “major reductions in premature
deaths from cancer, and improvements in quality of life and cancer survival”.
A call for participation in the CONCORD studies is periodically issued and extends examination
of world-wide cancer survival trends for certain cancer sites: i.e., stomach, colon, rectum, liver,
lung, breast, cervix, ovary, prostate, esophagus, pancreas, and melanoma of skin in adults, as
well as leukemias, lymphomas, and brain tumors in adults and children (0-14 years). The
protocol and dataset specifications are posted to the NPCR-CSS Document Server (CONCORD tab)
as they become available.
CDC-NPCR may facilitate the call for data on behalf of awardees by submitting NPCR data for
central cancer registries meeting USCS publication criteria for survival analyses (meet USCS
data quality criteria and have conducted active patient follow-up or linked records with the
National Death Index).
Agency for Healthcare Research and Quality (AHRQ)
The U.S. Department of Health and Human Services’ Agency for Healthcare Research and Quality
(AHRQ) is the lead federal agency charged with improving the safety and quality of America’s
health care system. It develops and disseminates knowledge, tools, and data to improve health
care systems and help Americans, health care professionals, and policy makers make informed
health decisions. NPCR-CSS data are shared with AHRQ for reports on national healthcare quality
and disparities.
C. Analytic datasets
USCS Analytic Data
Combined NPCR and SEER incidence data are referred to as U.S. Cancer Statistics (USCS).
CDC creates USCS Analytic Datasets each year that include data from central cancer registries
meeting USCS publication criteria and diagnosis year coverage. CDC, NCI staff members, and
contractors perform analyses of USCS data as needed using these internal analytic databases.
The datasets are made available via SEER*Stat software to federal employees, fellows, and
contractors in the CDC’s Division of Cancer Prevention and Control and NCI’s SEER Program
after obtaining SEER Registry Plus access, signing a NPCR Analytic Data Use Agreement
(Appendix D) and CDC Nondisclosure Agreement (Appendix E) and completing annual
Assurances of Confidentiality training. The dataset is also available to approved partnering
organizations and state central cancer registries after a Memorandum of Understanding and Data
Use Agreements are signed (see Appendix H and Appendix I).
In specially established collaborative relationships, researchers external to CDC, NCI, and ACS
may be provided access to the USCS analytic datasets. In these relationships, CDC staff must be
included in the analytic project as a co-author, Data Use Agreements must be signed, and
Assurance of Confidentiality training must be completed before access will be provided.
Additionally, access will only be allowed on-site at CDC’s Cancer Surveillance Branch offices.
See the section “External Data Requests”.
Cancer surveillance and epidemiological analyses include assessment of the completeness,
timeliness, and quality of cancer incidence data and analyses of the cancer burden and survival as
needed for meeting national cancer control objectives. Such analyses of state and national data
are conducted routinely by federal agencies, including CDC and NCI, for programmatic or
statistical purposes, as needed, to achieve the agencies’ mandates.
There are five internal analytic datasets routinely analyzed by CDC and NCI staff members:
NPCR/SEER USCS Incidence Analytic Dataset
CDC and NCI staff members and contractors conduct cancer surveillance and epidemiological
research that results in publications, data briefs, and presentations. Examples of research include
descriptive analyses by racial and ethnic populations for specific cancers, descriptions of cancer
incidence trends, and descriptive analyses of the quality of the data. Appendix F lists the
variables available in this dataset.
NPCR Internal Survival Dataset
Cancer survival data are critical for evaluating the progress and impact of early
detection/screening programs and/or comprehensive cancer control plans as well as interventions
from other sources. CDC’s NPCR-CSS calculates and publishes survival estimates on this
population at the national, state, and regional levels. Focusing on the entire NPCR-CSS dataset
supports analyses of survival estimates for rare cancers that cannot be addressed otherwise and
provides data for publication on the USCS Data Visualizations tool. Appendix G lists the
variables available in this dataset.
NPCR/SEER Survival Dataset
This database contains data from NPCR- and SEER-funded registries that have completed
National Death Index linkages or active patient follow-up for all years included in the database
and meet 95% completeness estimates. This dataset will be used to assess Healthy People 2030
cancer objective C-11: Increase the proportion of cancer survivors who are living 5 years or
longer after diagnosis. The variables included in this dataset are the same as those in the NPCR
Internal Survival Dataset, which are listed in Appendix G.
NPCR Internal Prevalence Dataset
This database provides limited-duration prevalence estimates for NPCR registries that meet
USCS publication criteria for all years included in the database and that have completed National
Death Index linkages or active patient follow-up for all years included in the database. Statistics
generated from this dataset are published on the USCS Data Visualizations tool. The variables
available in this dataset are listed in Appendix O.
NPCR/SEER USCS Delay-Adjusted Dataset
Case-reporting delay may result in an underestimate of true incidence. Researchers can adjust for
this delay using composite delay factors, thus producing more precise cancer incidence trends.
The composite delay factors used in this database were developed by SEER and are used by
NPCR, SEER, and NAACCR. The delay-adjustment factors account for cancer site, registry,
age, race, ethnicity, and diagnosis year, and are used to estimate delay-adjusted counts and rates.
The variables available in this dataset are listed in Appendix P.
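The delay adjustment described above can be illustrated with a minimal sketch. It assumes the common convention that a delay-adjusted count is the observed count multiplied by the delay factor for the matching stratum. The strata, counts, populations, and factor values below are hypothetical and are not taken from the SEER-developed factors; real factors also vary by registry, age, race, and ethnicity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stratum:
    """One reporting stratum. All values used below are made up for illustration."""
    site: str
    diagnosis_year: int
    observed_count: int
    population: int
    delay_factor: float  # hypothetical composite delay factor (1.0 means no adjustment)


def delay_adjusted_count(s: Stratum) -> float:
    """Scale the observed count by the stratum's delay factor."""
    return s.observed_count * s.delay_factor


def rate_per_100k(count: float, population: int) -> float:
    """Express a count as a rate per 100,000 population."""
    return count / population * 100_000


# Recent diagnosis years typically carry larger factors because more cases
# are still expected to be reported late (illustrative values only).
strata = [
    Stratum("lung", 2018, 400, 1_000_000, 1.01),
    Stratum("lung", 2019, 390, 1_000_000, 1.05),
]

for s in strata:
    adjusted = delay_adjusted_count(s)
    print(s.diagnosis_year, round(adjusted, 1),
          round(rate_per_100k(adjusted, s.population), 2))
```

In this made-up example the unadjusted counts suggest a decline from 2018 to 2019, while the adjusted counts do not, which is the kind of trend distortion delay adjustment is meant to reduce.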
In compliance with the 308(d) Assurance of Confidentiality, CDC and NCI employees and
contractors and partner organizations conducting these analyses are required to handle the
information in accordance with principles outlined in the CDC Staff Manual on Confidentiality
and to follow the specific procedures documented in the NPCR-CSS Confidentiality/Security
Statement (appendices B, H, and I).
In addition, CDC, SEER, and partner organization staff members are required to acknowledge
state cancer registries whenever NPCR-CSS data are presented, released, or published by CDC
by making available the following (or similar) statement:
These data were provided by central cancer registries participating in the National
Program of Cancer Registries (NPCR) and submitted to CDC in [Month, Year], and/or
the Surveillance, Epidemiology and End Results (SEER) program and submitted to NCI
in [Month, Year]. The dataset includes data for diagnosis years 1998-xxxx (excluding
SEER-Metro Registry data).
NPCR/SEER USCS Incidence and Survival Public-Use Research Dataset
For purposes of this policy, the NPCR/SEER USCS Incidence Public-Use Research Dataset
(Incidence PUD) and the NPCR Survival Public-Use Research Dataset (Survival PUD) are
defined as versions of the full NPCR/SEER USCS microdata (i.e., individual case-specific
data) that have been modified as needed to minimize the potential for disclosure of confidential
information. These datasets contain a subset of data items published in the NPCR/SEER USCS
Incidence Analytic dataset. Personal identifiers, such as a patient’s name, street address, or
Social Security number, are not included in these datasets as this information is not transmitted
by central cancer registries to CDC as part of their annual data submission. Certain data items,
such as date of birth and reporting-source (death certificate only and autopsy only) cases, may be
removed from these research datasets to minimize the potential identification of individuals
through the occurrence of a rare cancer in a person of a certain age or racial or ethnic group or
living in a specific county. The list of the variables included in the NPCR/SEER USCS Incidence
Public-Use Dataset is in Appendix J. The NPCR Survival Public-Use Research Dataset is under
development. Before the Survival PUD is released, the NPCR-CSS Data Release Policy will be
updated.
The Incidence PUD, previously only available to NPCR Registry Staff, is now available
publicly through SEER*Stat software. Upon completion, the Survival PUD will be made
available through the same mechanism. Researchers are given access to the data after obtaining
SEER Registry Plus access and signing an NPCR and SEER – U.S. Cancer Statistics Research
Data Use Agreement (Appendix K). A Public Release Disclosure Statement cautions users
against inappropriate use of the data or inappropriate disclosure of information. As additional data
protection measures, cells with fewer than 16 cases are suppressed automatically and the SEER*Stat
case listing function is disabled. This dataset allows authorized users to generate counts, crude
rates, age-adjusted incidence rates, and 95-percent confidence intervals to meet their specific
needs.
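For readers unfamiliar with these statistics, the following is a minimal sketch of a crude rate and a directly age-adjusted rate with an approximate 95-percent confidence interval. The age groups, counts, populations, and standard weights are hypothetical. The interval uses a simple normal approximation with Poisson variance, chosen here for brevity; it is an assumption of this sketch, and the interval method used by SEER*Stat may differ.

```python
import math
from typing import Sequence, Tuple


def crude_rate(cases: Sequence[int], population: Sequence[int],
               per: int = 100_000) -> float:
    """Total cases divided by total population, expressed per `per` persons."""
    return sum(cases) / sum(population) * per


def age_adjusted_rate(cases: Sequence[int], population: Sequence[int],
                      std_weights: Sequence[float], per: int = 100_000,
                      z: float = 1.96) -> Tuple[float, float, float]:
    """Direct standardization: weighted sum of age-specific rates.

    Returns (rate, lower, upper). The variance treats each age-specific
    count as Poisson, so Var(count) is estimated by the count itself.
    """
    total_weight = sum(std_weights)
    rate = 0.0
    variance = 0.0
    for d, n, w in zip(cases, population, std_weights):
        share = w / total_weight          # normalized standard-population weight
        rate += share * d / n             # contribution of this age group
        variance += share ** 2 * d / n ** 2
    half_width = z * math.sqrt(variance)
    return rate * per, (rate - half_width) * per, (rate + half_width) * per


# Hypothetical data for three age groups (young, middle, old).
cases = [10, 40, 150]
population = [50_000, 30_000, 20_000]
std_weights = [0.60, 0.25, 0.15]   # made-up standard population shares

print("crude:", round(crude_rate(cases, population), 1))
rate, lower, upper = age_adjusted_rate(cases, population, std_weights)
print("age-adjusted:", round(rate, 1), "CI:", round(lower, 1), round(upper, 1))
```

When the standard weights match the observed population shares, the age-adjusted rate equals the crude rate; the two diverge as the age structures diverge.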
Restricted-Access Research Dataset (RDC)
For purposes of this policy, the restricted-access dataset is defined as the version of the full
NPCR/SEER USCS analytic dataset, either aggregated data or microdata (i.e., individual
case-specific data), that has been modified as needed to minimize (though not necessarily
eliminate) the potential for disclosure of confidential information.
CDC uses the National Center for Health Statistics Research Data Center (NCHS RDC) as a
mechanism for researchers outside of the Division of Cancer Prevention and Control (DCPC) to
request and gain access to NPCR data for research purposes. The data are available through the
NCHS RDC only after the standard data quality reviews that occur as part of the preparation for
USCS. The restricted-access dataset is released to researchers through the NCHS RDC after
CDC authenticates the requestor’s identity and research intent through an extensive proposal
review process and after the researcher completes the NCHS RDC confidentiality and security
requirements. The requestor must also comply with the NCHS RDC’s confidentiality procedures and
data sharing agreements.
The NCHS RDC has developed and maintains detailed data sharing agreements and procedures
for user authentication and for logging and monitoring of data releases. Project
proposals are reviewed by CDC staff, including NPCR and NCHS RDC staff. Proposals may
also be shared for review with central cancer registry staff whose data are included in the
proposed project. User documentation includes a data dictionary for every diagnosis year
available at the NCHS RDC.
The use of the NCHS RDC to manage data access provides the highest level of data security and
protection of confidentiality that is available for data analysis. Using the NCHS RDC allows
CDC to comply with the Assurance of Confidentiality [308(d)] that was obtained for the NPCR-CSS
data. The NCHS RDC is also covered by a separate Assurance of Confidentiality [308(d)].
For further information regarding the NCHS RDC, refer to Appendix L of this policy.
The restricted-access dataset does not contain personal identifiers such as a patient’s name, street
address, or Social Security number as this information is not transmitted by central cancer
registries to CDC as part of their annual data submission. However, the dataset may contain
information that is potentially identifiable especially when linked with other datasets, such as the
occurrence of a rare cancer in a person of a certain age or racial or ethnic group or living in a
specific county. The data are made available to researchers through a SAS dataset. The RDC
staff creates a SAS dataset specific to each project. Researchers must include a data dictionary in
their proposal and only the requested variables are included in the SAS file.
D. Data Release Under Controlled Conditions
CDC-wide policy stipulates that a CDC program may consider release of data that cannot be
released through a public web-based system, a research dataset, or a restricted-access dataset
under certain controlled conditions [1]. These controlled conditions may include access at a
CDC-controlled data center such as the data center established at the National Center for Health
Statistics (NCHS), on-site access at CDC’s Cancer Surveillance Branch offices, or special
licensing. Except as
described above, NPCR-CSS data will not otherwise be released under these controlled
conditions while the current policy is in place. Release of data under controlled conditions will
be considered as part of discussions with partners, and a determination will be made as to
whether such releases of data will be considered for NPCR-CSS data.
E. Emergency and Provisional Data Releases
It is not anticipated that CDC will need to release NPCR-CSS data before the files have been
modified as needed to protect confidentiality as described in this policy. Such a release is
prohibited by the 308(d) Assurance of Confidentiality (appendices B, H, and I).
Provisional data and draft data tables may be shared with CDC employees and contractors,
NPCR central cancer registries, and other partners in order to facilitate quality reviews of the
data. When appropriate, individuals who participate in such reviews sign an NPCR Analytic Data
Use Agreement and a CDC Nondisclosure Agreement (when applicable) before accessing the
data or tables.
IV. PROTECTION OF DATA
A. Assurance of Confidentiality
All data collected and maintained by NPCR-CSS must be managed, presented, published, and
released with strict attention to confidentiality and security, consistent with the general principles
and guidelines established by CDC for confidential case data [1–3] and specific restrictions imposed
on NPCR-CSS data (appendices B, H, and I) [4]. Special care must be given to cancer incidence
data that are not directly identifiable because geographic and small cell data may be indirectly
identifying when combined with detailed information in case reports, laboratory reports, medical
records, or linkage with other data files [5–10].
NPCR-CSS has approval for protection under section 308(d) of the Public Health Service (PHS)
Act (42 U.S.C. 242m(d)) (appendices B, H, and I). The 308(d) confidentiality assurance protects
identifiable and potentially identifiable information from being used for any purpose other than
the purpose for which it was collected (unless the person or establishment from which it was
obtained has consented to such use). This assurance protects against disclosures under a court
order and provides protections that the Privacy Act of 1974 (5 U.S.C. 552a) does not. For
example, the Privacy Act of 1974 protects individual participants, but the 308(d) confidentiality
assurance also protects institutions. Confidentiality protection granted by CDC promises
participants and institutions that their data will be shared only with those individuals and
institutions listed in the project’s consent form or in its specified policies.
B. Suppression of Rates and Counts
When the numbers of cases or deaths used to compute rates are small, those rates tend to have
poor reliability. Another important reason for using a threshold value for suppressing cells is to
protect the confidentiality of patients whose data are included in a report by reducing or
eliminating the risk of disclosing their identity.
Therefore, to discourage misinterpretation or misuse of rates or counts that are unstable because
case or death counts are small, annual incidence and death rates and counts in publicly available
datasets and web-based query systems are suppressed if the case or death counts are below 16. A
count of fewer than about 16 results in a standard error of the rate that is approximately 25% or
more as large as the rate itself. Similarly, a case count below 16 results in the width of the 95%
confidence interval around the rate being at least as large as the rate itself. These relationships
were derived under the assumption of a Poisson process and with the standard population age
distribution assumed to be similar to the observed population age distribution. For aggregated
time periods, counts and rates are suppressed for fewer than 16 cases. However, average annual
rates and counts may not be suppressed if the total case count for the time period is 16 or more.
The cell suppression threshold value of 16, which was selected to reduce misuse and
misinterpretation of unstable rates and counts, is more than sufficient to protect patient
confidentiality.
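The arithmetic behind the threshold of 16 can be checked with a short sketch. Under the Poisson assumption stated above, a count n has a standard error of about the square root of n, so the standard error of the rate relative to the rate itself is 1/sqrt(n), and the width of a normal-approximation 95% confidence interval relative to the rate is 2 × 1.96/sqrt(n). The function names below are illustrative only and do not come from any NPCR-CSS system.

```python
import math
from typing import Optional

SUPPRESSION_THRESHOLD = 16  # public outputs suppress counts below this value


def relative_standard_error(count: int) -> float:
    """Standard error of a Poisson-based rate as a fraction of the rate."""
    return 1 / math.sqrt(count)


def relative_ci_width(count: int, z: float = 1.96) -> float:
    """Width of the normal-approximation 95% CI as a fraction of the rate."""
    return 2 * z / math.sqrt(count)


def display_count(count: int,
                  threshold: int = SUPPRESSION_THRESHOLD) -> Optional[int]:
    """Return the count if it may be shown, or None if it must be suppressed."""
    return count if count >= threshold else None


for n in (10, 15, 16, 25):
    print(n,
          round(relative_standard_error(n), 3),
          round(relative_ci_width(n), 3),
          display_count(n))
```

At exactly 16 cases the relative standard error is 25% and the interval is slightly narrower than the rate; at 15 cases or fewer the interval is wider than the rate, which matches the two relationships described above.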
Per the Data Use Agreements, researchers using restricted-access data files are required to
suppress counts and statistical results that are based on cells with fewer than 6 cases in
publications and presentations. Researchers are advised to use caution when presenting or
interpreting results based on fewer than 16 cases.
Complementary cell suppression and suppression of certain race and ethnicity combinations are
required as additional measures to assure confidentiality and stability.
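Complementary suppression is needed because a single suppressed cell can be recovered by subtracting the displayed cells from a displayed total. The sketch below shows one common approach for a single row of counts whose total is published: apply the primary threshold, and if exactly one cell ends up hidden, also hide the smallest remaining cell. This is an illustrative assumption, not a description of the procedure used in NPCR-CSS systems; the function name and the example counts are hypothetical.

```python
from typing import List, Optional


def suppress_row(counts: List[int], threshold: int = 16) -> List[Optional[int]]:
    """Apply primary and complementary suppression to one row of cell counts.

    A value of None marks a suppressed cell. The row total is assumed to be
    published alongside the cells.
    """
    # Primary suppression: hide every cell below the threshold.
    shown: List[Optional[int]] = [c if c >= threshold else None for c in counts]

    hidden = [i for i, value in enumerate(shown) if value is None]
    if len(hidden) == 1 and len(counts) > 1:
        # A lone hidden cell equals (total - visible cells), so hide the
        # smallest still-visible cell as well.
        visible = [i for i, value in enumerate(shown) if value is not None]
        partner = min(visible, key=lambda i: counts[i])
        shown[partner] = None
    return shown


print(suppress_row([120, 9, 45, 30]))   # one small cell: a second cell is hidden too
print(suppress_row([120, 9, 4, 30]))    # two small cells already protect each other
print(suppress_row([20, 30]))           # nothing below the threshold
```

The same function can be called with threshold=6 to mirror the restricted-access requirement for publications and presentations.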
C. Public Release Disclosure Statement
The following (or similar) public release disclosure statement is prominently displayed for users
of all NPCR-CSS public web-based query systems, research datasets, and restricted-access
datasets:
Data Use Restrictions: Read Carefully Before Using
By using these data, you signify your agreement to comply with the following
statutorily based requirements. The National Program of Cancer Registries (NPCR),
Centers for Disease Control and Prevention (CDC), has obtained an assurance of
confidentiality pursuant to Section 308(d) of the Public Health Service Act, 42 U.S.C.
242m(d). This assurance provides that identifiable or potentially identifiable data
collected by the NPCR may be used only for the purpose for which they were obtained
unless the person or establishment from which they were obtained has consented to such
use. Any effort to determine the identity of any reported cases, or to use the information
for any purpose other than statistical reporting and analysis, is a violation of the
assurance. Therefore users will:
• Use the data for statistical reporting and analysis only.
• Make no attempt to learn the identity of any person or establishment included in these
data.
• Make no disclosure or other use of the identity of any person or establishment
discovered inadvertently, and advise the Associate Director for Science, Office of
Science Policy and Technology Transfer, CDC (Mailstop D-50, 1600 Clifton Road,
N.E., Atlanta, Georgia, 30333; Phone: 404-639-7240) (or NCI’s SEER Program if
SEER data) and the relevant state or metropolitan area cancer registry of any such
discovery.
D. Freedom of Information Act (FOIA) Data Requests
The Freedom of Information Act (FOIA) (http://www.cdc.gov/od/foia/) generally provides that,
upon written request from any person, a federal agency (e.g., CDC) must release any agency
record unless that record falls (in whole or part) within one of nine exemptions. FOIA applies to
federal agencies only and covers only records in the possession and control of those agencies at
the time of the FOIA request (except in certain instances involving grantee-held data). Because
state-based data become a federal record in CDC’s possession, such records are subject to
disclosure in response to a FOIA request. The FOIA exemptions that may be available to protect
some aspects of state data from public disclosures in response to a FOIA request are:
• Exemption 3, which exempts information specifically protected from disclosure by statute (in
this instance, pursuant to an Assurance of Confidentiality under Section 308(d) of the Public
Health Service Act), and
• Exemption 6, which exempts from disclosure personnel and medical files and similar
files, the disclosure of which would constitute an unwarranted invasion of personal privacy.
In general, non-FOIA requests to CDC from the public, media, and other government agencies
for local cancer incidence data are referred to the state health department for a reply. There are
three reasons for this: (1) the state health departments can release cancer incidence data in
accordance with locally established policies and procedures and consistent with provisions of the
Cancer Registries Amendment Act (Public Health Service Act, 42 U.S.C. 280e–280e-4, as
amended) [4]; (2) the relative infrequency of data submission to federal agencies assures that the
state health department or its designated central cancer registry will have the most complete,
accurate, and up-to-date information; and (3) the central registry may be able to provide more
detailed data that can better meet the needs of the requestor. When the request is for data
regarding cancer incidence involving more than one state, CDC will refer the requestor to
published reports or to NPCR-CSS datasets that are released in accordance with practices
described in this document, if relevant.
E. CDC External Data Requests
Individuals, agencies, or organizations outside CDC may request data not available from a public
web-based query system or research dataset. When the requests do not identify a state, CDC staff
members or contractors tabulate the data for the inquirer. For requests that identify a state, CDC
staff members may seek states’ permission regarding use. See Appendix N for additional details.
Researchers may submit data query or study proposal requests for the NPCR/SEER USCS
Incidence Analytic Dataset to CDC. These requests must include:
• Names of individuals who will need access to the data
• Purpose and public health significance of the investigation
• Research question(s)
• Variables required beyond those in the freely-available research data
• Subset of cases needed (specifically cancer type, data years, registries)
• Planned use of data (e.g., manuscript, poster, presentation)
After CDC authenticates the requestor’s identity and research intent, and verifies that
confidentiality is maintained, a CDC analyst will process the data query and provide results to
the researcher. The requestor must comply with all confidentiality and data suppression
procedures outlined in the NPCR-CSS Assurance of Confidentiality [308(d)].
In circumstances where the researcher requires access to the USCS Analytic Datasets:
• CDC staff must be included in the analytic project as a co-author
• Data Use Agreements must be signed
• Assurance of Confidentiality training must be completed
• Access is only allowed on-site at CDC’s Cancer Surveillance Branch offices.
V. REFERENCES
1. Centers for Disease Control and Prevention. CDC/ATSDR Policy on Releasing and Sharing
Data. Atlanta: Centers for Disease Control and Prevention; 2003. Available at
http://www.cdc.gov/maso/Policy/ReleasingData.pdf.
2. Centers for Disease Control and Prevention. CDC/ATSDR/CSTE Data Release Guidelines for
Re-Release of State Data. Atlanta: Centers for Disease Control and Prevention; 2003 (available
upon request).
3. Centers for Disease Control and Prevention. CDC Staff Manual on Confidentiality. Atlanta:
Centers for Disease Control and Prevention; 1984 and National Center for Health Statistics.
NCHS Staff Manual on Confidentiality. Hyattsville, MD: National Center for Health Statistics;
1999.
4. Cancer Registries Amendment Act, Public Law 102-515, Stat. 3312 (October 22, 1992).
Available at http://www.cdc.gov/cancer/npcr/npcrpdfs/publaw.pdf.
5. American Statistical Association. Data Access and Personal Privacy: Appropriate Methods of
Disclosure Control. Alexandria, VA: American Statistical Association; 2008. Available at
https://www.amstat.org/asa/files/pdfs/POL-DataAccess-PersonalPrivacy.pdf.
6. Doyle P, Lane JI, Theeuwes JM, Zayatz LM (eds). Confidentiality, Disclosure, and Data
Access: Theory and Practical Application for Statistical Agencies. Amsterdam: Elsevier
Science BV; 2001.
7. Federal Committee on Statistical Methodology. Checklist on Disclosure Potential of Proposed
Data Releases. Available at https://nces.ed.gov/FCSM/doc/checklist_799.doc.
8. Federal Committee on Statistical Methodology. Report on Statistical Disclosure Limitation
Methodology. (Statistical Working Paper 22). Washington, DC: Office of Management and
Budget; 1994 (Revised 2005). Available at https://nces.ed.gov/FCSM/pdf/spwp22.pdf.
9. McLaughlin C. Confidentiality protection in publicly released central cancer registry data.
Journal of Registry Management 2002; 29(3):84–88.
10. Stoto M. Statistical Issues in Interactive Web-Based Public Health Data Dissemination
Systems. Draft report prepared for the National Association of Public Health Statistics and
Information Systems, Rand Corporation; September 2002.
11. Surveillance, Epidemiology, and End Results Program. The SEER Program Code Manual. 3rd
ed. Bethesda, MD: National Cancer Institute; 1998.
12. Percy C, Van Holten V, Muir C (eds). International Classification of Diseases for Oncology,
2nd edition. Geneva, Switzerland: World Health Organization; 1990.
13. Hultstrom D, editor. Standards for Cancer Registries, vol. II: Data Standards and Data
Dictionary, version 9.1, 6th ed. Springfield, IL: North American Association of Central Cancer
Registries; 2001.
14. North American Association of Central Cancer Registries. Standards for Cancer Registries,
vol. III: Standards for Completeness, Quality, Analysis, and Management of Data. Springfield,
IL: North American Association of Central Cancer Registries; 2000.
15. Hutton MD, Simpson LD, Miller DS, Weir HK, McDavid K, Hall HI. Progress toward
nationwide cancer surveillance: an evaluation of the National Program of Cancer Registries,
1994–1999. Journal of Registry Management 2001; 28(3):113–120.
16. Centers for Disease Control and Prevention. U.S. Cancer Statistics Publication Criteria.
Available at: https://www.cdc.gov/cancer/uscs/technical_notes/criteria/index.htm.
17. NAACCR Method to Estimate Completeness. A Data Analysis Tool for Calculations.
Available at: https://www.naaccr.org/analysis-and-data-improvement-tools/#COMPLETENESS.
18. Clegg LX, Feuer EJ, Midthune DN, Fay MP, Hankey BF. Impact of reporting delay and
reporting error on cancer incidence rates and trends. Journal of the National Cancer Institute
2002; 94(20):1537–45.
TABLE 1 – Comparison of the National Program of Cancer Registries-Cancer Surveillance System Datasets
Overview

Public Web-Based Query Systems

USCS Data Visualizations Tool
Format: Database of aggregate counts and rates, with text documentation
Mode of Access: Web-based query system with downloadable ASCII files
Web Address or Contact Information: USCS Web site, www.cdc.gov/cancer/dataviz
Contains Potentially Identifiable Information: No
Registry Eligibility Criteria for Data Completeness and Quality: USCS publication criteria
When Available: Updated 2022

USCS WONDER (a)
Format: Database of aggregate counts and rates, with text documentation. The database behind the
CDC firewall is case-specific microdata.
Mode of Access: Web-based query system
Web Address or Contact Information: CDC WONDER, http://wonder.cdc.gov
Contains Potentially Identifiable Information: No
Registry Eligibility Criteria for Data Completeness and Quality: USCS publication criteria
When Available: Updated 2022

USCS Data for Partners (b)
Format: Database of aggregate counts and rates, with text documentation
Mode of Access: Flat ASCII file, web-based query system, and separate brief text documentation
Web Address or Contact Information: Request from uscsdata@cdc.gov (specify “USCS County” in
subject line)
Contains Potentially Identifiable Information: No
Registry Eligibility Criteria for Data Completeness and Quality: USCS publication criteria; data
meet criteria for unknown county
When Available: Updated 2022

NCEH’s Tracking Network
Format: Database of aggregate counts and rates, with text documentation. The database behind the
CDC firewall is case-specific microdata.
Mode of Access: Web-based query system
Web Address or Contact Information: National Environmental Public Health Tracking Program,
https://ephtracking.cdc.gov/
Contains Potentially Identifiable Information: No
Registry Eligibility Criteria for Data Completeness and Quality: USCS publication criteria; data
meet criteria for unknown county
When Available: Updated 2022

(a) This data file is also shared with OWH.
(b) This data file is shared with CDI and AHRQ.
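Analytic datasets

USCS Public-Use Research Database
Format: Customized, analytic database. The database behind the SEER*Stat firewall is
case-specific microdata with enforced cell suppression and case listing disabled.
Mode of Access: SEER*Stat client-server mode only after receipt of signed Data Use Agreement
Web Address or Contact Information: https://www.cdc.gov/cancer/public-use
Contains Potentially Identifiable Information: No
Registry Eligibility Criteria for Data Completeness and Quality: USCS publication criteria
When Available: Updated 2022

USCS Restricted-Access Dataset
Format: Customized, analytic database available through proposal process
Mode of Access: On-site at CDC or through CDC staff assistance
Web Address or Contact Information: Application process available at www.cdc.gov/rdc
Contains Potentially Identifiable Information: Yes
Registry Eligibility Criteria for Data Completeness and Quality: USCS publication criteria; data
meet criteria for unknown county
When Available: Updated 2022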
TABLE 1 – Comparison of the National Program of Cancer Registries-Cancer Surveillance System Datasets
Cases Included
Public Web-Based Query Systems
USCS WONDER
NPCR/SEER
USCS County
States/ Territories
Diagnosis Years
Cancer Sites
USCS Data
Visualizations
Tool
NPCR/SEER States meeting eligibility
criteria
1999; 2000; 2001;
2002; 2003; 2004;
2005; 2006; 2007;
2008; 2009; 2010;
2011; 2012; 2013;
2014; 2015; 2011-2015; 2016; 2017;
2018; 2019; 2020
preliminary results
1999; 2000; 2001;
2002; 2003; 2004;
2005; 2006; 2007;
2008; 2009; 2010;
2011; 2012; 2013;
2014; 2015; 2016;
2017; 2018; 2019
All reportable invasive cancers; in situ female
breast, in situ male and female breast, and
benign and borderline primary intracranial and
central nervous system tumors (diagnosis year
2004)
NCEH’s Tracking
Network
Analytic datasets
USCS Public-Use
USCS Restricted-Access
Research Database
Dataset
NPCR/SEER States
meeting eligibility
criteria
NPCR States meeting
eligibility criteria
NPCR/SEER States
meeting eligibility
criteria
NPCR States meeting
eligibility criteria
2015-2019
Individual years 2001
through 2019 for
state level; 5-year
increments for county
level; 10-year increments
for DCPC melanoma
dashboard;
3-, 5-, 7- and 10-year
increments for sub-county
level
Female breast; Late stage
female breast; lung and
bronchus; bladder; brain
& other nervous system;
thyroid; leukemias (all
types; Acute myeloid
leukemia; Chronic
lymphocytic leukemia);
non-Hodgkin lymphoma;
all childhood cancers
(state level only);
childhood leukemias (state
level only); childhood
CNS & miscellaneous
intracranial & intraspinal
neoplasms (state level
only); mesothelioma (state
level only); kidney &
renal pelvis; prostate;
melanoma of skin; liver &
intrahepatic bile duct;
pancreas; oral cavity and
pharynx; esophagus,
larynx; testis, colon and
rectum
2001; 2002; 2003;
2004; 2005; 2006;
2007; 2008; 2009;
2010; 2011; 2012;
2013; 2014; 2015;
2016; 2017; 2018;
2019;
1999; 2000; 2001; 2002;
2003; 2004; 2005; 2006;
2007; 2008; 2009; 2010;
2011; 2012; 2013; 2014;
2015; 2016; 2017; 2018;
2019
All reportable invasive
cancers; in situ female
breast, and benign and
borderline primary
intracranial and central
nervous system tumors
(diagnosis year 2004)
All reportable invasive and
in situ cancers and benign
and borderline primary
intracranial and central
nervous system tumors
(diagnosis year 2004)
All reportable cancer
sites combined; female
breast; in situ female
breast; cervix uteri;
colon and rectum; lung
and bronchus;
melanoma; bladder;
prostate; oral cavity
and pharynx; brain and
other nervous system;
thyroid; kidney and
renal pelvis; stomach;
ovary; corpus and
uterus, NOS;
leukemias;
non-Hodgkin
lymphoma; liver and
intrahepatic bile duct;
pancreas, esophagus;
and childhood cancers
TABLE 1 – Comparison of the National Program of Cancer Registries-Cancer Surveillance System Datasets
Variables Included
USCS Data
Visualizations
Tool
Geographic Levels
Race/Ethnicity
Age Groups
Summary Stage
Histology
Public Web-Based Query Systems
USCS WONDER
USCS Data for
Partners
All areas combined;
All areas combined;
NPCR/SEER state,
NPCR and SEER
territory,
state or territory;
congressional
county; region; MSA
districts, county;
for cities of >500,000
SEER metropolitan
(additional levels
may be added)
area, IHS regions
(AI/AN data only)
All races combined; White; Black;
Asian/Pacific Islander (API); American
Indian/Alaska Native (AI/AN); Hispanic;
White Hispanic; White non-Hispanic; Black
Hispanic; Black non-Hispanic
All ages combined
and standard 5-year
age groups for adults
and <15, <20, and 5-year age groups for
childhood cancers
All ages combined
and standard 5-year
age groups that can
be combined by the
user
Yes (Localized,
Yes
Regional, Distant,
and
unknown/unstaged)
International Classification of Childhood
Cancers, Third Revision (all geographic areas
combined), Mesothelioma (national and state
level), Kaposi Sarcoma (national and state
level), Consensus Conference on Cancer Registration
of Brain and CNS Tumors (all geographic
areas combined)
NCEH’s Tracking
Network
Analytic datasets
USCS Public-Use
USCS Restricted-Access
Research Database
Dataset
NPCR and SEER state
or territory;
county
NPCR state;
county; sub-county (to
include census tract, 5k,
and 20k aggregations)
All areas combined; U.S.
census region; NPCR
and SEER state or
territory
NPCR and SEER state or
territory; county for
approved requests only
All races combined;
White; Black; AI/AN;
API; Hispanic;
White/Black
Hispanic/non-Hispanic
All races combined;
White; Black; AI/AN;
API; Hispanic; White
non-Hispanic; White Hispanic. (Sub-county
displayed for all races
combined)
All races reported; Hispanic;
White Hispanic; White non-Hispanic; Black Hispanic;
Black non-Hispanic
Childhood cancers: <15
and <20; all other
cancers: <50, 50–64,
65+
Childhood cancers: <15
and <20
Breast cancer: <50, 50+
All races combined;
White; Black;
Asian/Pacific Islander
(API); American
Indian/Alaska Native
(AI/AN); Hispanic;
White Hispanic; White
non-Hispanic; Black
Hispanic; Black non-Hispanic
All ages combined,
standard 5-year age
groups
No
Yes (late stage
screening amenable
cancers)
Yes
No
No
Same as USCS Data
Visualizations tool
Standard 5-year age groups
and individual ages (Month
and day of birth not provided
for confidentiality reasons. If
the age at diagnosis >99,
then grouped into one
category. Year of birth is
also grouped.)
Yes
Yes
TABLE 1 – Comparison of the National Program of Cancer Registries-Cancer Surveillance System Datasets
Confidentiality Protection/Disclosure Limitation Measures Employed
USCS Data
Visualizations
Tool
Direct or Record-Level
Identifiers?
Aggregation
Limited Number of
Variables
Grouping/Collapsing of
Variables or Response
Codes; e.g., race and age
recode
(1) Average Annual Counts
Rounded to the Nearest
Whole Number
(2) Average Annual Rates
(3) Annual Averages Are
Based on At Least 5 Years
of Data
Cell Suppression
Public Web-Based Query Systems
USCS WONDER NPCR/SEER USCS
County
Tracking Network
Analytic datasets
USCS Public-Use
USCS Restricted-Access
Research Database
Dataset
No
No
No
No
Yes, but not in output which
will be reviewed by CDC staff
for confidentiality
Yes
Yes
Yes
Yes
Yes
Yes
No
Yes
No
Yes
Yes
No
Yes
Yes
Yes
No
Yes
Yes
No
No
Yes
Yes
Yes
Yes
Yes (output reviewed by CDC analyst to ensure counts of <6 are suppressed)
Counts and rates: count of <16
Counts and rates: 5-year
total count of <16
Counts and unsmoothed
rates: count of <16 or RSE
> limit (25% for state and
county-level, 30% for
subcounty level)
Smoothed rates: RSE
> limit (25% for state and
county-level, 30% for
subcounty level)
Counts and rates: count
of <16 enforced
Case listing disabled
Complementary Cell
Suppression
Public Release Disclosure
Statement
Data Sharing Agreement
and/or IRB Approval
User Authentication
As needed
As needed
As needed
As needed
As needed
Yes
Yes
Yes
Yes
Yes
No
No
No
Yes
Yes
No
No
No
No
Yes
Logging and Monitoring
Limited
Limited
Limited
Yes, monitoring
databases used, session
type and date only
Yes
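As an illustration only, the cell suppression thresholds listed in Table 1 can be sketched in Python. The function name, argument names, and the dictionary of limits are hypothetical and do not describe any CDC system. The sketch assumes that a cell is suppressed when its count is below 16 or, where a relative standard error (RSE) is supplied, when the RSE exceeds the limit for the geographic level. Table 1 does not say how zero counts are handled, so the sketch applies the "<16" rule literally.

```python
# RSE limits (percent) by geographic level, as listed for the Tracking Network in Table 1.
RSE_LIMITS = {"state": 25.0, "county": 25.0, "subcounty": 30.0}


def is_suppressed(count, rse_percent=None, level="state", min_count=16):
    """Return True if a cell should be suppressed under the Table 1 thresholds.

    count       -- number of cases in the cell
    rse_percent -- relative standard error in percent, or None if not used
    level       -- "state", "county", or "subcounty" (hypothetical labels)
    min_count   -- counts below this value are suppressed (16 in Table 1)
    """
    if count < min_count:
        return True
    if rse_percent is not None and rse_percent > RSE_LIMITS[level]:
        return True
    return False


if __name__ == "__main__":
    # A county cell with 40 cases but an RSE of 27% is suppressed;
    # the same cell at sub-county level is not.
    print(is_suppressed(40, rse_percent=27.0, level="county"))
    print(is_suppressed(40, rse_percent=27.0, level="subcounty"))
```

The restricted-access dataset uses a lower threshold (counts of <6 in reviewed output); in this sketch that would correspond to calling the function with min_count=6.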
APPENDIX A – State and Metro Area Cancer Registries
State, Metropolitan Area, and Territory Cancer Registries by Federal Funding Source, and First
Diagnosis Year* for Which Cancer Cases Were Reportable to CDC’s NPCR or NCI’s
SEER Program
State, Metropolitan Area, or Territory  First Diagnosis Year*  Federal Funding Source
Alabama                                 1996                   NPCR
Alaska                                  1996                   NPCR
Arizona                                 1995                   NPCR
Arkansas                                1996                   NPCR
California                              1995/2000              NPCR/SEER
Los Angeles                             1992                   SEER
San Francisco-Oakland                   1973                   SEER
San Jose-Monterey                       1992                   SEER
Colorado                                1995                   NPCR
Connecticut                             1973                   SEER
Delaware                                1997                   NPCR
District of Columbia                    1996                   NPCR
Florida                                 1995                   NPCR
Georgia                                 1995/2010              NPCR/SEER
Atlanta                                 1975                   SEER
Hawaii                                  1973                   SEER
Idaho                                   1995/2018              NPCR/SEER
Illinois                                1995/2022              NPCR/SEER
Indiana                                 1995                   NPCR
Iowa                                    1973                   SEER
Kansas                                  1995                   NPCR
Kentucky                                1995/2000              NPCR/SEER
Louisiana                               1995/2000              NPCR/SEER
Maine                                   1995                   NPCR
Maryland                                1996                   NPCR
Massachusetts                           1995/2018              NPCR/SEER
Michigan                                1995                   NPCR
Detroit                                 1973                   SEER
Minnesota                               1995                   NPCR
Mississippi                             1996                   NPCR
Missouri                                1996                   NPCR
Montana                                 1995                   NPCR
Nebraska                                1995                   NPCR
Nevada                                  1995                   NPCR
New Hampshire                           1995                   NPCR
New Jersey                              1995/2000              NPCR/SEER
New Mexico                              1973                   SEER
New York                                1996/2018              NPCR/SEER
North Carolina                          1995                   NPCR
North Dakota                            1997                   NPCR
Ohio                                    1996                   NPCR
Oklahoma                                1997                   NPCR
Oregon                                  1996                   NPCR
Pennsylvania                            1995                   NPCR
Puerto Rico                             1998                   NPCR
Rhode Island                            1995                   NPCR
South Carolina                          1996                   NPCR
South Dakota                            2000                   NPCR
Tennessee                               1999                   NPCR
Texas                                   1995/2022              NPCR/SEER
U.S. Pacific Island Jurisdictions       2007                   NPCR
Utah                                    1973/2016              SEER/NPCR
Vermont                                 1996                   NPCR
Virginia                                1996                   NPCR
Virgin Islands                          2016                   NPCR
Washington                              1995                   NPCR
Seattle-Puget Sound                     1974                   SEER
West Virginia                           1995                   NPCR
Wisconsin**                             1995                   NPCR/SEER
Wyoming                                 1996                   NPCR
* Diagnosis year is the year during which a reported cancer case was first diagnosed.
** Wisconsin receives research support from SEER but is not under contract to submit data.
CDC = Centers for Disease Control and Prevention
NCI = National Cancer Institute
NPCR = National Program of Cancer Registries
SEER = Surveillance, Epidemiology, and End Results Program
APPENDIX B – NPCR-CSS Overview of Data Security
The NPCR-CSS project data reside on a dedicated server maintained by the NPCR-CSS
contractor. To ensure the security and confidentiality of project data, the following provisions
have been incorporated into the NPCR-CSS Security Plan in accordance with the requirements
of the Assurance of Confidentiality.
The NPCR-CSS server is housed in a secure facility with a guard on duty 24 hours a day. Only
authorized staff are allowed to access the facility. Support personnel are escorted by an
authorized staff member when needed. The server resides on its own local area network (LAN)
behind the NPCR-CSS contractor’s firewall. NPCR-CSS contractor project staff access the server
via VPN from their primary office location. Elevator and stairwell access is controlled by card
key 24 hours a day. During business hours, an attendant is always present at the reception desk
to guide visitors.
•
Access to the NPCR-CSS server is limited to authorized NPCR-CSS contractor project
staff. It is password-protected on its own security domain. No one, including NPCR-CSS
contractor non-project staff, is allowed access to the NPCR-CSS data.
•
All NPCR-CSS contractor project staff must sign a confidentiality agreement before
passwords and keys are assigned. All staff must pass background checks appropriate to
their responsibilities for a public trust position.
•
NPCR-CSS data that are submitted electronically are encrypted during transmission from
the States. They arrive on a document server behind the NPCR-CSS contractor’s firewall.
Each state has its own directory location so that no state has access to another state’s
data. The data are moved automatically from the document server to the NPCR-CSS
server.
•
Receipt and processing logs are maintained to document data receipt, file processing, and
report production. All reports and electronic storage media containing NPCR-CSS data
are stored under lock and key when not in use and will be destroyed once they are no
longer needed.
•
A comprehensive security plan has been developed by the NPCR-CSS contractor’s
security team. The security team consists of the Project Director, Project Manager,
Systems Lead and Security Officer, Database Administrator and LAN/WAN Security
Steward. All project staff receive annual security awareness training covering security
procedures. The NPCR-CSS contractor’s security team oversees operations to prevent
unauthorized disclosure of the NPCR-CSS data.
•
Periodic (currently quarterly, but no less than once per year) reviews and updates of the
NPCR-CSS contractor’s security processes will be conducted to adjust for rapid changes
in computer technology and to incorporate advances in security approaches. The Security
Plan will be amended as needed to maintain the continued security and confidentiality of
NPCR-CSS data.
APPENDIX C – Data Items for CBTRUS
The dataset for CBTRUS includes individual case-specific data from the NPCR-CSS dataset.
The data items to be included are listed below.
*Diagnosis Years 1995-2003 invasive cases only, ≥2004 invasive, benign, and borderline cases
Item Name
NAACCR Data Item Number
Patient ID (unique)
20
NAACCR Record Version
50
State of Residence at Diagnosis
80
County at Diagnosis-Analysis
89
Comments
Results presented as 5-year average
annual rates as the smallest time
period with <16 cell and
complementary cell suppression
required
Rural/Urban Continuum/Beale Code 2003 3310
Rural/Urban Continuum/Beale Code 2013 3312
NPCR Race Recode
Derived based on [160], [161], and [192] Same as race for USCS
NHIAv2 Derived Hispanic Origin
(Results of NAACCR Hispanic/Latino
Identification Algorithm)
191
NAPIIA
193
Sex
220
Age at Diagnosis
230
Sequence Number—Central
380
Date of Diagnosis (YEAR portion only)
390
Day and month of diagnosis not
requested
Date of Diagnosis (full date)
390
Full date
Primary Site
400
Laterality
410
Grade
440
Diagnostic Confirmation
490
Type of Reporting Source
500
Histologic Type (ICD-O-3)
522
Behavior (ICD-O-3)
523
Comment for Age at Diagnosis (230): Single year up to age 84; 85+
grouped into one category
SEER Summary Stage 1977
760
SEER Summary Stage 2000
759
Derived Summary Stage 2000
3020
NPCR Cancer Stage
Comments
Based on 759 and 3020
RX Summ--Surgery Primary Site
1290
≥2003 diagnosis years
Reason for no surgery
1340
≥2001 diagnosis years
RX Summ—Radiation
1360
≥2003 diagnosis years
RX Summ--Chemo
1390
2006-2011, ≥2015 diagnosis years
RX Summ--BRM
1410
Prior to 2006, reported as available
Rad–Regional RX Modality
1570
≥2003 diagnosis years
Based on 1360 and 1570
1 = had radiation
2 = did not have radiation
3 = patient or guardian refused
radiation
4 = radiation recommended but
unknown if received
Merged Radiation
Applied only for selection below:
8000≤I522_HistTypeICDO3≤9049 |
9056≤I522_HistTypeICDO3≤9139 |
9141≤I522_HistTypeICDO3≤9589
EDITS overrides
1990–2074
CS Site-Specific Factor 1
2880
Date of Last Contact
1750
Vital Status
1760
Vital Status Recode
1762
Record Number Recode
1775
Surv-Date Active Followup
1782
Surv-Flag Active Followup
1783
Survival Months Active Followup
1784
Surv-Date Presumed Alive
1785
WHO Grade
Diagnosis years 2001-2017 for states
included in the NPCR RSA file.
Cause of Death items (1910, 1914,
1915) are not included when review
has determined that high quality
COD information is not available for
specific states.
Surv-Flag Presumed Alive
1786
Survival Months Presumed Alive
1787
Surv-Date Dx Recode
1788
Follow-Up Source
1790
Follow-Up Source Central
1791
Cause of Death
1910
SEER Cause-Specific COD
1914
SEER Other COD
1915
ICD Revision Number
1920
Comments
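Two of the rules in the Appendix C comments are simple enough to illustrate in code: the histology selection for Merged Radiation and the top-coding of Age at Diagnosis at 85+. The Python sketch below is illustrative only. The function and constant names are hypothetical, the histology ranges are copied from the selection rule for Histologic Type (ICD-O-3), item 522, and the derivation of the merged radiation codes 1 through 4 from items 1360 and 1570 is deliberately not shown because this appendix does not specify it.

```python
# Inclusive ICD-O-3 histology ranges (NAACCR item 522) to which Merged Radiation applies,
# taken from the selection rule in the Appendix C comments.
MERGED_RADIATION_HISTOLOGY_RANGES = ((8000, 9049), (9056, 9139), (9141, 9589))


def merged_radiation_applies(hist_type_icdo3):
    """Return True if the histology code falls in one of the listed ranges."""
    return any(low <= hist_type_icdo3 <= high
               for low, high in MERGED_RADIATION_HISTOLOGY_RANGES)


def recode_age_at_diagnosis(age):
    """Return single year of age as text up to 84; ages 85 and over become '85+'."""
    if age < 0:
        raise ValueError("age at diagnosis cannot be negative")
    return "85+" if age >= 85 else str(age)


if __name__ == "__main__":
    print(merged_radiation_applies(9050))  # code in the gap between the first two ranges
    print(recode_age_at_diagnosis(91))
```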
APPENDIX D – NPCR/SEER USCS Analytic Data Use Agreement
U.S. Cancer Statistics Analytic Data
Submitted [Month, Year] (diagnosis years 1998-xxxx)
To protect the confidentiality of the individuals represented within the National Program of Cancer
Registries – Cancer Surveillance System (NPCR-CSS) data, the Centers for Disease Control and
Prevention (CDC) has obtained an Assurance of Confidentiality under Section 308(d) of the Public
Health Service Act (42 U.S.C. 242m(d)), which provides that these data can only be used for the
purpose for which they were obtained.
When using NPCR and U.S. Cancer Statistics analytic data for research purposes, it is necessary
to ensure, to the extent possible, that use of the data will be limited to research or public health
purposes. In accordance with applicable federal law, there must be no attempt to determine the
identity of individuals represented by reported cases, or to use the information for any purpose
other than for health statistical reporting and analysis.
CDC’s Division of Cancer Prevention and Control (DCPC) takes every possible measure to
ensure that the identity of data subjects cannot be determined. All direct identifiers, as well as
characteristics that might lead to identification of individuals, are omitted from the dataset.
Certain demographic and clinical information has been included for research purposes; thus, all
results must be presented or published in a manner that ensures that no individual can be
identified. In addition, there must be no attempt to identify individuals from any computer file or
to link with a computer file containing patient identifiers.
Data users must agree to the following provisions before receiving access to
U.S. Cancer Statistics Incidence, U.S. Cancer Statistics Delay Adjusted, NPCR Prevalence
and/or NPCR Survival Analytic Data. Please initial after each statement to indicate agreement.
As the recipient of the U.S. Cancer Statistics Incidence (diagnosis years {year}-{year}), U.S.
Cancer Statistics Delay Adjusted (diagnosis years {year}-{year}), NPCR Prevalence (diagnosis
years {year}-{year}), and/or NPCR Survival Analytic Data (diagnosis years {year}-{year}):
•
I will adhere to the requirements of the Data Use Agreement and understand that my
access to the data will be revoked if these requirements are violated. Initials: ______
•
I understand that NPCR data belong to the states and territories. The states’ and
territories’ agreement to use of the data is obtained through the activities outlined in the
general NPCR-CSS Data Release Policy and by specific requests to the states and
territories through the CSB management team.
Initials: ______
•
I will not use or permit others to use the datasets in any way other than for statistical
reporting and analysis. Initials: ______
•
I will not release or permit others to release the datasets or any part of them to any person
except with DCPC’s written approval. Initials: ______
•
I will not attempt to link or permit others to link the datasets with individually
identifiable records from any other dataset without DCPC’s approval. Initials: ______
•
I will not access or permit others to access (directly or remotely) the data outside the
United States. Initials: ______
•
I will not attempt to use the datasets or permit others to use them to learn the identity of
any person or establishment included in any dataset. Initials: ______
•
I will protect the data file(s) I receive with a password and/or encryption. In addition, any
temporary or permanent analysis files, such as those produced with analytic software,
will be protected in the same manner(s). Initials: ______
•
I will take the following actions if the identity of any person or establishment is
discovered inadvertently:
o Make no use of this knowledge.
o Notify DCPC’s Internal Data Users Group by emailing npcridug@cdc.gov.
o As requested by DCPC, safeguard or destroy the information that identifies an
individual or establishment.
o Inform no one else of the discovered identity. Initials: ______
•
In addition, I will make every effort to release all statistical information in such a way
as to avoid inadvertent disclosure. In order to do this:
o I agree that all oral or written reports will contain only aggregate data and I
will not report counts of fewer than 6 cases or statistics generated from fewer
than 6 cases. Initials: ______
o I understand that calculating rates or other statistics based on small numbers
can raise statistical issues concerning stability and confidentiality. I will use
appropriate caution when presenting and interpreting results based on fewer
than 16 cases. Initials: ______
o I will use complementary cell suppression to ensure that no data on an
identifiable case can be derived through subtraction or other calculation from
the combination of tables in all oral and written presentations. Initials:
______
•
I have completed the Assurance of Confidentiality Overview Course available through
HHS Learning Portal and have emailed my certificate of completion to
npcridug@cdc.gov. Initials: ______
•
I have added my project to the NPCR Internal Analysis SharePoint table and, if
applicable, I will notify and obtain permission from the Internal Data Users Group to
analyze state- and county-level data. Initials: ______
•
I will acknowledge central cancer registries whenever data are presented, released, or
published by including the following (or similar) statement:
These data were provided by central cancer registries participating in the
National Program of Cancer Registries (NPCR) and submitted to CDC in
November {year}, and/or the Surveillance, Epidemiology and End Results (SEER)
program and submitted to NCI in November {year}. The U.S. Cancer Statistics
Incidence Analytic dataset includes diagnosis years {year}–{year} (excluding
SEER-Metro Registry data); U.S. Cancer Statistics Delay Adjusted Analytic
dataset includes diagnosis years {year}–{year} (excluding SEER-Metro Registry
data), NPCR Prevalence Analytic dataset includes diagnosis years {year}–{year}
and the NPCR Survival Analytic dataset includes diagnosis years {year}–{year}.
Initials: ______
•
As appropriate, I will cite the data:
National Program of Cancer Registries SEER*Stat Database: {Database file
name} – {year}-{year}. United States Department of Health and Human Services,
Centers for Disease Control and Prevention. Released {date}, based on the
November {year} submission.
Initials: ______
•
I understand that if I require technical assistance in analyzing or interpreting the data, and
such assistance goes beyond providing non-manipulated data, IDUG members
reserve the right to request to be considered as research collaborators or co-authors in
any resulting publications or presentations. Initials: ______
•
I will provide a courtesy copy of papers or abstracts to the NPCR Internal Data Users
Group at npcridug@cdc.gov as they are entered into Documentum for clearance.
Initials: ______
•
I am familiar with the use of SEER*Stat in analyzing data or will complete the needed
training. Initials: ______
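The small-cell provisions in the list above (no reported counts of fewer than 6 cases, plus complementary cell suppression so that a hidden value cannot be recovered by subtraction) can be illustrated with a short Python sketch. This is one simple heuristic for a single table row whose total is published, not a procedure prescribed by DCPC. The function name is hypothetical, and the sketch assumes that every count below the threshold, including zero, is masked.

```python
from typing import List, Optional


def suppress_row(counts: List[int], threshold: int = 6) -> List[Optional[int]]:
    """Mask small cells in one table row; None marks a suppressed cell.

    Primary suppression: any count below `threshold` is masked.
    Complementary suppression: if exactly one cell is masked, the smallest
    remaining cell is masked as well, so the hidden value cannot be obtained
    by subtracting the visible cells from a published row total.
    """
    masked: List[Optional[int]] = [None if c < threshold else c for c in counts]
    hidden = [i for i, value in enumerate(masked) if value is None]
    if len(hidden) == 1 and len(counts) > 1:
        visible = [i for i, value in enumerate(masked) if value is not None]
        smallest = min(visible, key=lambda i: counts[i])
        masked[smallest] = None
    return masked


if __name__ == "__main__":
    # One small cell (3): the next-smallest cell (10) is masked too.
    print(suppress_row([3, 10, 20]))
```

A full implementation would also need to check column totals and any other tables published from the same data, since the agreement refers to the combination of tables in all presentations.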
If you are requesting access to a U.S. Cancer Statistics database, you must first set up SEER
Research Plus access because the database includes SEER data.
After you have access to SEER Research Plus, complete the fields below, sign and date the
agreement, and email all pages to npcridug@cdc.gov.
The email address you provide must be the same one used during the SEER Research Plus
verification process.
My signature below indicates that I agree to comply with all the above stated provisions.
__________________________________________________________________
Signature
Date
Name:____________________________________________________________
Title______________________________________________________________
Branch____________________________________________________________
Telephone____________________
E-mail:_______________________
Please return all pages of the completed form to the NPCR Internal Data Users Group at
npcridug@cdc.gov.
APPENDIX E – CDC Non-Disclosure Agreement
NONDISCLOSURE AGREEMENT FOR DATA COVERED BY AN ASSURANCE OF
CONFIDENTIALITY
For use with CDC employees involved in activities with information covered by a Section 308(d)
Assurance of Confidentiality
The success of CDC's operations depends upon the voluntary cooperation of establishments,
including States, and of persons who provide information requested by CDC programs under an
assurance that such information will be kept confidential and be used only for epidemiological or
statistical purposes.
When confidentiality is authorized, CDC operates under the restrictions of Section 308(d) of the
Public Health Service Act (42 U.S.C. §242m(d)), which provides in summary that no
information obtained in the course of its activities may be used for any purpose other than the
purpose for which it was supplied, and that such information may not be published or released in
a manner in which the establishment or person supplying the information or described in it is
identifiable unless such establishment or person has consented. As a CDC employee granted
access to information covered by Section 308(d), I understand and acknowledge that I am bound
to comply with the restrictions provided to the information under Section 308(d).
I am aware that unauthorized disclosure of information covered by Section 308(d) of the Public
Health Service Act may subject me to disciplinary action.
I am aware that unauthorized disclosure of confidential information is punishable under Title
18, Section 1905 of the U.S. Code, which reads, in relevant part:
'Whoever, being an officer or employee of the United States or of any department or agency
thereof…publishes, divulges, discloses, or makes known in any manner or to any extent not
authorized by law any information coming to him in the course of his employment or official
duties or by reason of any examination or investigation made by, or return, report or record made
to or filed with, such department or agency or officer or employee thereof, which information
concerns or relates to the trade secrets, processes, operations, style of work, or apparatus, or to
the identity, confidential statistical data, amount or source of any income, profits, losses, or
expenditures of any person, firm, partnership, corporation, or association; or permits any income
return or copy thereof or any book containing any abstract or particulars thereof to be seen or
examined by any person except as provided by law; shall be fined not more than $1,000, or
imprisoned not more than one year, or both; and shall be removed from office or employment.'
I understand that unauthorized disclosure of confidential information is also punishable under
the Privacy Act of 1974, Subsection 552a(i)(1), which reads:
'Any officer or employee of any agency, who by virtue of his employment or official position,
has possession of, or access to, agency records which contain individually identifiable
information the disclosure of which is prohibited by this section or by rules or regulations
established thereunder, and who knowing that disclosure of the specific material is so prohibited,
willfully discloses the material in any manner to any person or agency not entitled to receive it,
shall be guilty of a misdemeanor and fined not more than $5,000.'
These provisions are consistent with and do not supersede, conflict with, or otherwise alter the
employee obligations, rights, or liabilities created by existing statute or Executive order relating
to (1) classified information, (2) communications to Congress, (3) the reporting to an Inspector
General of a violation of any law, rule, or regulation, or mismanagement, a gross waste of funds,
an abuse of authority, or a substantial and specific danger to public health or safety, or (4) any
other whistleblower protection. The definitions, requirements, obligations, rights, sanctions, and
liabilities created by controlling Executive orders and statutory provisions are incorporated into
this agreement and are controlling.
My signature below indicates that I have read, understood, and agreed to comply with the
above statements.
Typed/Printed Name
Signature
Center/Institute/Office
Date
NON-EMPLOYEE 308(d) PLEDGE OF CONFIDENTIALITY
For use when Non-Employees are provided access to data covered by a 308(d) Assurance of
Confidentiality
I, as a non-CDC Employee (e.g., Guest Researcher, Visiting Fellow, Student, Trainee, employee
of a federal agency other than CDC, etc.) may be given access to information that is identifiable
or potentially identifiable to a person and that is covered by Section 308(d) of the Public Health
Service Act (42 U.S.C. §242m(d)), or an Assurance of Confidentiality. As a condition of this
access, I am required to comply with the following safeguards for the protection of this covered
data.
1. I agree to be bound by the following assurance:
In accordance with Section 308(d) of the Public Health Service Act (42 U.S.C.
§242m(d)), I agree that no information obtained in the course of the activity
described in the Assurance of Confidentiality will be used for any purpose other
than the purpose for which it was supplied, unless I am informed in writing that
such person has consented to its use for such other purposes. Further, I agree that
no information obtained in the course of the activity described in the Assurance of
Confidentiality will be disclosed in a manner in which the establishment or person
supplying the information or described in it is identifiable, unless I am informed
in writing that the establishment or person has consented to such disclosure, to
anyone other than authorized staff of CDC or staff covered under this 308(d)
Assurance.
2. I agree to maintain the following safeguards to assure that confidentiality is protected and to
provide for the physical security of the records:
To preclude observation of confidential information by persons not authorized to
have access to the information on the project, I shall maintain all records that I am
provided access to that identify establishments or persons covered by this
Assurance of Confidentiality or from which establishments or persons covered by
this Assurance of Confidentiality could be identified in locked containers or
protected computer files when not under immediate supervision by me or another
authorized member of the project. The keys or means of access to these containers
or files are not to be given to anyone other than those authorized to have access. I
further agree to abide by any additional requirements imposed by CDC for
safeguarding the identity of establishments or persons covered by this Assurance
of Confidentiality.
My signature below indicates that I have carefully read and understand this agreement and the
Assurance of Confidentiality, which pertains to the confidential nature of this project. As a(n)
_________________________ (e.g., visiting scientist, guest researcher, fellow, trainee,
employee of a federal agency other than CDC, etc.), I understand that I am prohibited from
disclosing any such confidential information that has been obtained under this project to anyone
other than authorized staff of CDC or persons covered under this Section 308(d) Assurance of
Confidentiality. I understand that any disclosure in violation of this Confidentiality Pledge may
lead to termination of my employment, fellowship, training experience, or scientific
collaboration, as well as other penalties.
_______________________________________
Printed Name
_______________________________________
Signature
_______________________________________
Date
Agreement of CDC Contractors for Safeguards Against Invasions of Privacy for Certain
Establishments or Persons Covered by an Assurance of Confidentiality
For use where Contractors/Subcontractors have access to information covered by a 308(d)
Assurance of Confidentiality
Access to data covered by an Assurance of Confidentiality, titled Assurance of Confidentiality for
the National Program of Cancer Registries Cancer Surveillance System, (“Assurance”) as
provided by Section 308(d) of the Public Health Service Act (42 U.S.C. §242m(d)), is necessary
for certain projects funded through contract task order number ______________________.
Consistent with Section 308(d), the contractor is required to give an assurance of confidentiality
and to provide for safeguards to assure that confidentiality of the data covered by the Assurance
is maintained.
To provide this assurance and these safeguards in performance of the contract, the contractor
shall
1. Be bound by the following assurances:
a. No information that is identifiable or potentially identifiable to an establishment or
person covered by the Assurance and obtained in the course of this activity may be used
for any purpose other than the purpose for which it was supplied, unless CDC informs
contractor in writing that such establishment or person has consented to its use for such
other purposes.
b. No information that is identifiable or potentially identifiable to an establishment or
person covered by the Assurance and obtained in the course of this activity may be
disclosed to anyone other than authorized staff of CDC or others noted in the Assurance,
unless CDC informs contractor in writing that such establishment or person has
consented to its disclosure to such other persons.
c. No preliminary data from studies or projects that identifies or potentially identifies an
establishment or person covered by the Assurance may be disclosed to anyone other than
authorized staff of CDC or others noted in the Assurance of Confidentiality statement,
unless this information is otherwise in the public domain or CDC has provided written
permission for use of this information to be made public. For example, if CDC clears an
abstract for a scientific presentation, this constitutes permission for public presentation.
d. New research study ideas that are not already funded through the above-referenced
contract task order may be discussed or presented during calls/meetings as part of normal
communications and coordination between CDC and the contractor; should these ideas
lead to further activities with information covered by this Assurance, these protections
will extend to those activities only if agreed to in writing by CDC.
2. Maintain the following safeguards to assure that the confidentiality provided by Section
308(d) and the Assurance is protected by the contractor and to provide for the physical
security of the records:
NPCR-CSS 2021 Data Release Policy
August 2021
1995–2020 Diagnosis Years
APPENDIX E – CDC Non-Disclosure Agreement
a. After having read the above Assurance, each employee of the contractor participating in
this project is to sign the following pledge of confidentiality:
I have carefully read and understand the CDC assurance, which pertains to the
confidential nature of identifiable or potentially identifiable data covered by the
Assurance of Confidentiality to be handled in regard to these studies and reviewed as part
of activities under task order _____________________. As an employee of the
contractor, I understand that I am prohibited by law from disclosing any such confidential
information that identifies or potentially identifies an establishment or person covered by
the Assurance of Confidentiality, which has been obtained under the terms of this
contract, to anyone other than authorized staff of CDC and that I may use this
information only for the purposes for which it was obtained and consistent with the task
order.
b. To preclude observation of confidential information that identifies or potentially
identifies an establishment or person covered by the Assurance by persons not employed
on the project, the contractor shall maintain all confidential records that identify
establishments or persons or from which establishments or persons could be identified
under lock and key.
Specifically, at each site where these items are processed or maintained, all confidential
records that will permit identification of establishments or persons are to be kept in
locked containers when not in use by the contractor’s employees. The keys or means of
access to these containers are to be held by a limited number of the contractor staff at
each site. When confidential records that will permit identification of establishments or
persons are being used in a room, admittance to the room is to be restricted to employees
pledged to confidentiality and employed on this project. If at any time the contractor’s
employees are absent from the room, it is to be locked.
c. The contractor and his professional staff will take steps to ensure that the intent of the
pledge of confidentiality is enforced at all times through appropriate qualifications
standards for all personnel working on this project and through adequate training and
periodic follow-up procedures.
3. Flow down all requirements set forth in this Agreement to all subcontracts and all
subcontract employees.
__________________________________
(Typed/printed Name)
__________________________________
(Signature)
_________________________________
(Date)
CONFIDENTIALITY
AGREEMENT
for
Access to Information Technology Resources at the
Centers for Disease Control and Prevention
and
Limitation on Disclosure of Sensitive Information Under
Contract No. ______________, Task Order __________
As an employee or subcontractor of ______________________, THE PARTICIPANT requires
a wide range of access to confidential information and Federal information technology (IT)
resources and information maintained by the Centers for Disease Control and Prevention
(CDC), an agency of the U.S. Department of Health and Human Services.
In consideration for the following mutual covenants, the parties agree as follows:
1. Within the context of CDC Contract No. _________________, Task Order _________,
and in accordance with the terms of this agreement, CDC grants limited access to the
following:
a. The Federal information technology (IT) resources generally described in Table 1.
b. Datasets and/or public use data tapes derived from information collected under an
Assurance of Confidentiality authorized by Section 308(d) of the Public Health
Service Act, also listed in Table 1.
2. THE PARTICIPANT acknowledges that within the CDC environment, a variety of
restricted access information is held, the vast bulk of which is categorized as “Sensitive
but Unclassified”, and that in the performance of CDC Contract No.
_________________, Task Order _________, the participant may require access to such
limited access information. Categories of limited access information include the
following:
• Health & health-related data on individuals, groups, entities, some of which identify
individuals
• Federal Privacy Act “systems of records”
• Information exempted from release under Freedom of Information Act
• Proprietary data
• National Defense-related information
• Information subject to contractual restrictions on access
• Information covered by a Certificate or Assurance of Confidentiality [P.H.S. Act,
Sects. 301(d) & 308(d)]
• Data collected under other specific legislative mandates (i.e. tobacco, transfer of
biological, etc.)
• Data identified as pre-release, internal working papers, etc., of federal agency
Therefore, THE PARTICIPANT further agrees to not attempt to identify any person
contained in contract data and to make no use of the identity of any person or
establishment discovered inadvertently and advise CDC of any such discovery.
3. THE PARTICIPANT acknowledges the sensitive and confidential nature of the
information covered by this agreement and agrees to employ all reasonable efforts to
maintain such information secret and confidential, such efforts to be no less than the
degree of care employed by ICF Incorporated to preserve and safeguard ICF
Incorporated’s own information.
4. THE PARTICIPANT agrees to utilize any information accessed through the
performance of CDC Contract No. _________________, Task Order _________ solely
for the purpose of performing that Contract.
5. THE PARTICIPANT has read and agrees to be bound by CDC policies and standards
regarding confidentiality and use of Federal IT resources. Further, THE PARTICIPANT
agrees to attend one hour of training by CDC on information security and the use of IT
resources at CDC.
6. THE PARTICIPANT agrees to refrain from any of the following prohibited uses:
a. Disclosing, revealing, or giving to anyone information accessed under CDC
Contract No. _________________, Task Order _________ except to employees
of ICF Incorporated who have a need for the information and who are bound to it
by like obligation as to confidentiality, without the express written permission of
CDC.
b. Attempting to override or avoid security and integrity procedures and devices
established by CDC, or its components, to control access to federal IT resources.
c. Attempting to override or avoid security and integrity procedures and devices
established by outside organizations to control access to their information systems
and IT resources.
d. Using hardware and/or software or downloading software within the scope of the
project that is not specifically authorized in writing by the Project Officer.
e. Violating copyrights or software licensing agreements.
f. Using CDC’s name or logos to misrepresent, as falling under CDC auspices,
personal materials, or materials one produces on behalf of an approved group.
7. Upon expiration of this Agreement or CDC Contract No. _________________, Task
Order _________, THE PARTICIPANT agrees to destroy or return to CDC any
information accessed through the performance of the contract that falls under one or more of
the categories listed under paragraph 2 above and that was copied, printed, or otherwise
duplicated.
8. CDC has the capability and the authority to audit its federal IT resources, and under
appropriate circumstances, monitor their use.
9. CDC may terminate this access with or without cause at any time without advance
notice.
10. THE PARTICIPANT’S authorized access automatically expires at the end of the
contract period, or sooner if so indicated in the space at the top of Table 1. A written
renewal request must be submitted two months prior to the termination, with appropriate
justification for each access to be continued. A new Agreement for Access and Limitation
on Disclosure is required for each renewal.
11. The construction, interpretation, and performance of this Agreement shall be governed by
U.S. Federal law.
Violations of this agreement or misuse of CDC’s federal IT resources may subject THE
PARTICIPANT to criminal penalties in accordance with Federal law (attached). In
addition, THE PARTICIPANT understands that other Federal laws and regulations
govern CDC’s maintenance and operation of these Federal IT resources and may apply to
THE PARTICIPANT.
12. I have read, understood, and agree to comply with the above statements.
Print Name: Last, First, MI (Person Requesting Access)
Print Name: Last, First, MI (Contractor’s Official Witness)
Current Position
____________________________________________________
Position
Signature
Signature
Date: (mm/dd/yyyy)
Date: (mm/dd/yyyy)
CDC Point of Contact (Technical Monitor or Project Officer):
______________________________________________________ Print Name: Last, First, MI
______________________________________________________ Position
______________________________________________________ Signature
______________________________________________________ Date: (mm/dd/yyyy)
Copies of the following CDC Policy statements are to be provided to each person requesting
access.
Laws, Policies and Procedures Governing Use of Electronic Mail, Intranet, Internet and Other
Information Technology (IT)
ADP Security Policy (Manual Guide-Information Resources Management, No. CDC-3, 3/15/89)
18 U.S.C. Sections 641 and 1030
Table 1
Federal Information Resources
Authorized
Federal IT Resource Name or
Description
Location
Authorizing Official(s)
Main point of entry to CDC IT resources:
Information Resources Management Office
None authorized
Other LAN account(s)
None authorized
CDC mainframe account
None authorized
CDC e-mail account
None authorized
Internet access
None authorized
CDC Intranet access
None authorized
Cancer incidence data from awardees funded
by CDC’s Program Announcement DP171701 for a cooperative agreement under
the National Program of Cancer Registries
CDC Contracting Officer’s
Representative
CDC Project Officer
Mortality data from the National Center for
Health Statistics (NCHS)1
NCHS
Population data from the U.S. Census Bureau
Data publicly available on
Internet
1 By signing this agreement, THE PARTICIPANT agrees to abide by the conditions
stipulated by NCHS in the NCHS Data Use Agreement.
Access to a specific resource does not imply access to any other resource.
Appendix 1
Access to additional resources may be granted upon written request, as described below.
A written request shall be provided to ______________________________, who will forward
the request with a statement of support of the justification provided, to
_______________________, the CDC Contracting Officer’s Representative (COR) for Contract
No. _________________, Task Order _________ in the __________________ Branch, Centers
for Disease Control and Prevention (CDC).
If the requested access involves a physically separate or limited access device or dataset, the
appropriate steward of that device or dataset shall be provided with a copy of the request for
review and authorization.
Upon acceptance of the request by all appropriate parties, an amendment to the Agreement for
Access and Limitation on Disclosure will be executed, and a copy of any appropriate limitations
on access and use will be provided. When this has been done, access will be provided.
If effective access not contained in Table 1 is recognized, or if another relationship is established
with a CDC organization that may lead to additional access to federal IT resources at CDC,
written notice of such shall be provided to ____________________________, and
_____________________, the CDC COR for Contract No. _________________, Task Order
_________ in the _______________ Branch, Centers for Disease Control and Prevention (CDC).
APPENDIX F – Data Items for NPCR/SEER USCS Incidence Analytic Dataset
SEER*Stat Category
SEER*Stat Variable Name
Age at Diagnosis
Age recode with <1 year olds
Race, Sex, Year Dx, Registry,
County
Sex
Site and Morphology
Year of diagnosis
Addr at DX – state
*County at DX Analysis
*State-county
USCS standard
USCS9819
USCS9919
USCS1019
USCS1519
Race recode for USCS
Program
*Econ status
*Region/Division
Region
Origin recode NHIA (Hispanic, Non-Hisp)
Restrictions
KS and MN data unavailable
KS and MN data unavailable
Primary Site – labeled
*Primary Site
Histologic Type ICD-O-3
*Behavior Code ICD-O-3
Grade
Grade clinical
Grade pathological
Grade post therapy
Diagnostic confirmation
ICD-O-3 Hist/behavior, labeled
*ICD-O-3 Hist/behavior, malig, labeled
Site recode ICD-O-3/WHO 2008
ICCC site recode ICD-O-3/WHO 2008
ICCC site rec extended ICD-O-3/WHO 2008
AYA site recode 2020
Lymphoid neoplasm recode 2021 revision
Behavior recode for analysis derived/WHO2008
Stage – LRD [Summary
and Historic]
*Derived SS2000
*SEER Summary Stage 2000
*SEER Summary Stage 1977
*SEER Summary Stage 2018
Merged Summary Stage
Therapy
*RX summ – surg prim site
Diagnosis years ≥2003
Extent of Disease – CS
*RX summ – chemo
Female and male breast only,
diagnosis years ≥2003, and
NPCR CCRs† only
Phase I Radiation Treatment Modality
Female & male breast and
colorectal only, diagnosis
years ≥2018
*Merged radiation
Female & male breast and
colorectal only, diagnosis
years ≥2003, and NPCR
CCRs only
*CS site-specific factor 1
Brain & other CNS and
diagnosis years 2011-2017
-
Merged estrogen receptor
Female and male breast only
and diagnosis years ≥2004
Merged progesterone receptor
Female and male breast only
and diagnosis years ≥2004
Merged HER2 receptor
Female and male breast only
and diagnosis years ≥2010
Laterality
Multiple Primary Fields
Sequence number - central
Race and Age (case data only)
Age at Diagnosis
Race 1
*IHS Link
Geographic Locations
Rural-urban continuum 2013
*Census Tract Poverty Indicator
Dates
Diagnosis years ≥ 2014,
NPCR CCRs only
Year of Birth
Month of diagnosis
Other
Type of Reporting Source
Merged System-Supplied
Alcohol-related cancers
HPV-related cancers
Obesity-related cancers
Physical inactivity-related cancers
Tobacco-related cancers
State race eth suppress
* Variable is only available in the internal incidence database; it is not available in the NPCR/SEER U.S. Cancer Statistics Public
Use Database
APPENDIX G – Data Items for NPCR Internal Survival Dataset
SEER*Stat Category
SEER*Stat Variable Name
Age at Diagnosis
Age recode with single ages and 85+
Race, Sex, Year Dx, Registry,
County
Sex
Restrictions
Year of diagnosis
Addr at DX – state
County at DX Analysis
State-county
Rural-urban continuum 2013
NPCR project flag
Economic status
Race and origin recode (NHW, NHB,
NHAIAN, NHAPI, Hispanic)
Race recode (White, Black, Other)
Site and Morphology
Primary Site – labeled
Histologic Type ICD-O-3
Behavior Code ICD-O-3
Grade
Diagnostic confirmation
ICD-O-3-Hist/behavior, labeled
ICD-O-3-Hist/behavior, malig, labeled
Site recode ICD-O-3/WHO 2008
ICCC site recode ICD-O-3/WHO 2008
Behavior recode for analysis
derived/WHO2008
Stage – LRD [Summary
and Historic]
Derived SS2000
SEER Summary Stage 2000
Merged Summary Stage 2000
Therapy
Extent of Disease – CS
*RX summ – surg prim site
Diagnosis years ≥2003
*Merged radiation
Female breast and colorectal
only, diagnosis years ≥2003, and
NPCR CCRs only
CS Site-Specific Factor 1
Brain/CNS and diagnosis
years 2011-2017
Merged estrogen receptor
Female and male breast only
and diagnosis years ≥2004
Merged progesterone receptor
Female and male breast only
and diagnosis years ≥2004
Merged HER2 receptor
Female and male breast only
and diagnosis years ≥2010
Laterality
Cause of Death (COD) and
Follow-up
Survival months – presumed alive
Survival months flag – presumed alive
Cause of death (ICD-10)
ICD revision number
Vital status
Follow-up source central
COD exclusion flag
Original vital status
Vital status recode (study cutoff used)
Cause of death recode
COD recode with Kaposi and mesothelioma
Multiple Primary Fields
Sequence number - central
Race and Age (case data only)
Age at Diagnosis
Race 1
NHIA derived Hispanic origin
Age recode with <1 year olds
Dates
Presumed alive year of last contact recode
Presumed alive month of last contact recode
Presumed alive day of last contact recode
Year of birth
Month of diagnosis
Day of diagnosis
Original day of last contact
Original month of last contact
Original year of last contact
Original year of diagnosis
Original day of diagnosis
Original month of diagnosis
Other
Type of Reporting Source
User-Specified
EDPMDE LinkVar
APPENDIX H – NPCR-CSS 308(d) Assurance of Confidentiality Statement
A public health surveillance system of population-based cancer incidence data received from
cooperative agreement holders for the National Program of Cancer Registries is being conducted
by the National Center for Chronic Disease Prevention and Health Promotion (NCCDPHP) of
the Centers for Disease Control and Prevention (CDC), an agency of the U.S. Department of
Health and Human Services, and ICF Incorporated, a contractor of CDC. The information to be
received by CDC is a subset of a standard set of data items that the state central cancer registry
routinely receives from hospitals, pathology labs, clinics, private physicians, and other mandated
reporters on all cancer cases diagnosed in the state. This information includes patient
demographics and cancer diagnosis, treatment, and outcome data.
Each year, CDC requests cumulative data from central cancer registries. The variables reported
to CDC may vary from year to year. The data submitted to CDC do not contain any direct
identifiers, such as name or Social Security Number. Though project data do not contain direct
identifiers, CCRs do report indirect identifiers such as patient demographic data items (e.g., a
unique identifier, birth date, sex, race, ethnicity, birth place, county of residence, census tract,
ZIP code) and information about the type of cancer (e.g., date of diagnosis, stage at diagnosis,
treatment). The cancer registries maintain these data permanently in longitudinal databases that
are used for public health surveillance, program planning and evaluation, and data analyses.
CDC updates its longitudinal database each year with data received from central cancer
registries. NCCDPHP, recognizing the sensitivity of the data being furnished by the states, has
applied for and obtained an Assurance of Confidentiality to provide a greater level of protection
for the data while at CDC and at the contractor site.
Individual record-level data received by CDC or its contractors as part of this public health
surveillance system that could lead to direct or indirect identification of cancer patients is
collected and maintained at CDC under Section 306 of the Public Health Service (PHS) Act (42
U.S.C. 242k) with an assurance that it will be held in strict confidence in accordance with
Section 308(d) of the PHS Act (42 U.S.C. 242m). It is used only for purposes stated in this
assurance and is not otherwise disclosed or released, even following the death of cancer
patients in this surveillance system. These data are used by CDC scientists for routine cancer
surveillance, program planning and evaluation, and to provide data for cancer-related research
questions that support the purpose of this public health surveillance program, e.g., monitoring the
frequency and distribution of disease, evaluating cancer prevention and control activities,
program planning and evaluation.
Researchers within CDC, including contract employees and qualified organizations, will be able
to access individual, record-level data (i.e., data that do not directly identify individuals but that
could lead to identification when combined with other information) for legitimate cancer-related
research questions and reporting purposes through the full NPCR CSS analytic dataset, a less
restricted dataset with information not included in the restricted-access datasets but one that does
not contain all data submitted by the CCRs. A separate complete dataset (i.e., all information
submitted by the CCRs) is available for data quality assessments only. “Qualified organizations”
are defined as organizations with staff qualified to undertake the proposed analyses by means of
specific academic training or demonstrable, related experience in cancer epidemiologic, medical,
biomedical, or statistical research and the organization is identified in the NPCR CSS Data Release
Policy. These individuals and organizations will be required to adhere to a strict security and
confidentiality protocol.
Restricted access will be provided to researchers outside of the CDC, its contractors, or qualified
organizations through the National Center for Health Statistics Research Data Centers (NCHS
RDC) or, in limited instances, an aggregated data file for federal and trusted partners. A restricted-access
dataset is defined as the version of the full NPCR CSS analytic dataset, either aggregated
data or individual, record-level data that have been modified as needed to minimize the potential
for disclosure of confidential information. For restricted-access datasets, some variables such as
county at diagnosis will only be released in a modified format. The unique identifying number
assigned to each individual by the central cancer registry is replaced by a random number assigned
at CDC to reduce the possibility of linkage to other state- or territory-level files with indirect
identifiers. This restricted access will be controlled in such a way as to limit the researchers’
ability to publish or otherwise provide others access to data that could lead to identification of an
individual (i.e., small numbers of cases, unique cancer types in a small geographic area, or
aggregated in a way that a case could be identified).
Information collected by CDC is used without personal identifiers for publication in statistical
and analytic summaries and for release in restricted release datasets for research. Information
that could lead to direct or indirect identification of cancer patients is not made available to any
group or individual that has not met the qualifications established by CDC and is not
described in the NPCR CSS Data Release Policy. In particular, such information is not disclosed
to insurance companies, any party involved in civil, criminal, or administrative litigation,
agencies of federal, state, or local government, or any other member of the public.
Collected information that could lead to direct or indirect identification of cancer patients is kept
confidential and—with the exception of CDC employees, their contractors, and qualified
researchers—no one is allowed to see or have access to the information. CDC employees and
contractors are required to handle the information in accordance with principles outlined in the
CDC Staff Manual on Confidentiality and to follow the specific procedures documented in the
Confidentiality Security Statement for this project. Qualified researchers are required to sign the
NCHS RDC data sharing agreements and abide by the NCHS RDC confidentiality procedures.
Access to data released through public-use datasets requires the user to complete and return a
signed data use agreement acknowledging confidentiality requirements. Qualified organizations
(e.g., the North American Association of Central Cancer Registries, American Cancer Society,
National Cancer Institute, and the Central Brain Tumor Registry of the United States) are
required to sign a detailed data release agreement to have access to restricted release data.
APPENDIX I – NPCR-CSS 308(d) Assurance of Confidentiality FAQ
Background
The Centers for Disease Control and Prevention (CDC) is responsible for public health
surveillance in the United States. CDC collects, compiles, and publishes a large volume of
personal, medical, epidemiologic, and statistical data. The success of CDC’s operations depends,
in part, on the agency’s ability to protect the confidentiality of these data. While it is a matter of
principle for CDC to guard sensitive information and federal statutes such as the Privacy Act of
1974 provide a degree of protection for personally identifiable data, Section 308(d) of the Public
Health Service Act (42 U.S.C. 242m(d)) enables CDC to provide the highest level of
confidentiality protection for sensitive and mission-significant research and surveillance data.
CDC received a formal delegation of authority from the National Center for Health Statistics
(NCHS) (formerly a separate agency) to grant 308(d) confidentiality protection in 1983. Section
308(d) of the Public Health Service Act ensures the confidentiality of data collected under
Sections 304 and 306 of the Public Health Service Act. These special legislative authorities were
the provisions under which NCHS collects and safeguards most of its survey data, along with the
mortality data within the National Death Index. CDC was required to establish a stringent
application process and continues to use the authority sparingly. The agency has granted
confidentiality assurances to projects deemed significant to CDC’s mission, such as surveillance
of hospital infections, AIDS and HIV infections, pregnancy-related mortality, and congenital
defects. Fewer than 65 projects have received 308(d) protection since CDC received this
authority, and currently there are approximately 20-5 active projects with 308(d) confidentiality
assurances. As a testament to the importance of this project to the mission of CDC, the National
Program of Cancer Registries (NPCR) has been afforded this special data protection.
What is stated in Public Health Service Act, Section 308(d)?
The first clause of Section 308(d) states that CDC must explain the purpose for collecting data to
persons or agencies supplying information, and it guarantees that CDC will be limited to those
specified uses unless an additional consent is obtained. Moreover, the information obtained may
be used only by CDC staff or CDC’s contractors in the pursuit of such stated purposes. The
second clause states that CDC may never release identifiable information without the advance,
explicit approval of the person or establishment supplying the information or by the person or
establishment described in the information.
What process did NPCR undertake to obtain 308(d) confidentiality protection?
NPCR staff worked with the CDC Office of General Counsel and the CDC Confidentiality and
Privacy Officer to prepare the application for the NPCR Cancer Surveillance System (CSS)
project. The application contained the following four components:
• A Justification Statement summarizing the NPCR-CSS project’s programmatic purpose,
the type of data to be collected, and the uses to be made of the information. This
statement also included an assurance that a) the requested data would not be furnished
without the guarantee of a confidentiality assurance, b) confidentiality assurance is
important to protect the individuals described in the data and to reassure the institutions
submitting data, c) the information cannot reliably be obtained from other sources, d) the
information is essential to the project’s success, e) granting the confidentiality assurance
would not prohibit CDC from fulfilling its responsibilities, and f) the advantages of
assuring confidentiality outweigh the disadvantages.
• An Assurance of Confidentiality Statement delineating anticipated data uses and those
with whom identifiable data would be shared, along with general advisements regarding
the confidentiality protection.
• A Confidentiality Security Statement detailing the stringent safeguarding measures in
place to ensure that the promise of confidentiality would not be jeopardized by practices
of staff handling the data.
• An Institutional Review Board (IRB) Review Status Statement verifying NPCR-CSS’s
exemption from CDC IRB approval. (The Human Subjects Administrator at the National
Center for Chronic Disease Prevention and Health Promotion determined that NPCR-CSS
activities are routine surveillance and not research on human subjects. Therefore,
protocol review by CDC IRB was deemed unnecessary.)
The application was submitted to the CDC Confidentiality Officer for review and modification,
prepared for presentation to the CDC Confidentiality Review Group (CRG), and in May 2000
NPCR received 308(d) confidentiality protection approval for NPCR-CSS data, including
authorization for retroactive confidentiality protection beginning with diagnosis year 1995.
NPCR must file for continuation every 5 years to maintain the assurance.
What makes 308(d) confidentiality assurance the best protection for NPCR-CSS data?
The 308(d) confidentiality assurance is the only confidentiality protection that covers routine
surveillance activities, such as those conducted by NPCR-CSS. The assurance specifies that data
protected by 308(d) may be used only for statistical or epidemiological purposes and not released
further in identifiable form without consent. Another exclusive advantage of 308(d) is that it also
protects indirectly identifiable data. Operationally, this means that NPCR may never release a
directly identifiable variable (e.g., Social Security number) or any combination of variables that
could be used to indirectly identify an individual. Finally, 308(d) provides protection for
information on both living and deceased individuals.
Are there any disadvantages to individuals or institutions protected by the 308(d)
confidentiality assurances?
A 308(d) confidentiality assurance does not pose a disadvantage for individuals or institutions
submitting data to CDC. In fact, 308(d) provides an added benefit because it prevents CDC from
freely releasing data to researchers and any other persons or entities that could request access to
the data. With the confidentiality assurance protecting NPCR-CSS data, NPCR staff members
are prohibited from sharing data except for the purposes stated at the time of data collection,
unless consent from those who provided the assurance is obtained.
Does NPCR’s 308(d) confidentiality assurance protect the data from subpoena and
Freedom of Information Act (FOIA) requests?
The 308(d) assurance is the strongest protection against compulsory legal disclosure that CDC
can offer. Although CDC receives FOIA requests, the FOIA (b)(6) exemption enables CDC to
withhold sensitive, individually identified data that would constitute a “clearly unwarranted
invasion of personal privacy.” It is CDC’s firm position that all projects covered by a 308(d)
confidentiality assurance, including NPCR-CSS, meet this exemption.
Has a case involving 308(d) been tested in court?
Yes. CDC’s ability to protect data submitted to the agency was upheld in court. The case
involved a National Institute for Occupational Safety and Health project collecting death
certificate information, which is widely accepted as the least sensitive data protected by 308(d).
The court’s ruling in favor of the non-release of these data establishes an effective precedent for
restricting access to more sensitive data, such as that collected by a cancer registry.
How long are confidential data submitted to NPCR-CSS protected?
NPCR-CSS data are covered by the 308(d) confidentiality assurance forever. Individual records
in the NPCR-CSS surveillance system are protected even following the death of the cancer
patients.
Will NPCR release CSS data to persons or agencies outside of CDC?
An assurance of confidentiality protects NPCR-CSS data held at CDC and by its contractor. The
308(d) confidentiality protection does not travel with the data, whether released publicly or through
restricted means, and any data released to qualified researchers by CDC are subject to the limits
of any coverage afforded by the requesting agency. However, it is important to note that NPCR’s
confidentiality assurance prohibits the release of any data that are directly or indirectly
identifiable. Therefore, CDC would not release highly sensitive NPCR-CSS data. Restricted-access
data are released to external researchers in accordance with the NCHS
RDC proposal process and confidentiality procedures, which prohibit attempts to identify subjects
within the record system. Under the 308(d) assurance, NPCR is permitted to release NPCR-CSS data to
qualified researchers and organizations, such as the North American Association of Central
Cancer Registries (NAACCR), American Cancer Society (ACS), and National Cancer Institute
(NCI). This is so because these entities were specifically mentioned in the NPCR-CSS
confidentiality assurance as anticipated recipients of identifiable data. Prior to the restricted
release of NPCR-CSS data to qualified organizations, a detailed data use agreement must be
signed by the requesting party (attachment I). Information that could lead to the identification of
cancer patients, through direct or indirect methods, cannot be made available to any other group
or individual. In particular, NPCR cannot disclose information to insurance companies; any party
involved in civil, criminal, or administrative litigation; agencies of federal, state, or local
government; or any other member of the public.
Are there penalties for violating the confidentiality assurance?
NPCR employees and NPCR-CSS contractor staff working on the NPCR-CSS project may be
subject to fine, imprisonment, and termination of employment for unauthorized disclosure of
confidential information. To ensure that all NPCR employees are aware of their responsibilities
to maintain and protect NPCR-CSS records and the penalties for failing to comply, CDC
employees must read and sign a data use agreement. Contract employees with access to NPCR-CSS data are required to sign a confidentiality agreement.
APPENDIX J – Data Items for NPCR/SEER USCS Incidence Public Use Research Dataset
The NPCR/SEER USCS Incidence Public Use Research dataset contains individual case-specific data from the USCS dataset, with suppression of cells with fewer than 16 cases enforced and case listing disabled.
SEER*Stat Category | SEER*Stat Variable Name | Restrictions
Age at Diagnosis | Age recode with <1 year olds |
Race, Sex, Year Dx, Registry | Sex |
 | Year of diagnosis |
 | Addr at DX – state |
 | USCS standard |
 | Race recode for USCS |
 | Program |
 | Region |
 | USCS0119 |
 | USCS1019 |
 | USCS1519 |
 | Origin recode NHIA (Hispanic, Non-Hisp) |
Site and Morphology | Primary site – labeled |
 | Histologic type ICD-O-3 |
 | Grade |
 | Grade clinical |
 | Grade pathological |
 | Diagnostic confirmation |
 | ICD-O-3 hist/behavior, labeled |
 | Site recode ICD-O-3/WHO 2008 |
 | ICCC site recode ICD-O-3/WHO 2008 |
 | ICCC site rec extended ICD-O-3/WHO 2008 |
 | AYA site recode 2020 |
 | Lymphoid neoplasm recode 2021 revision |
 | Behavior ICD-O-3 |
Stage – LRD [Summary and Historic] | Merged summary stage 2000 |
Therapy | RX summ – surg prim site | Female breast only and diagnosis years ≥2003
Extent of Disease – CS | CS site-specific factor 1 | Brain & other CNS and diagnosis years 2011-2017
 | Merged estrogen receptor | Female and male breast only and diagnosis years ≥2004
 | Merged progesterone receptor | Female and male breast only and diagnosis years ≥2004
 | Merged HER2 receptor | Female and male breast only and diagnosis years ≥2010
 | Laterality |
Multiple Primary Fields | Sequence number – central |
Geographic Locations | Rural-urban continuum 2013 calc | Grouped into 3 categories: metro (RUCC 1-3); nonmetro (RUCC 4-9); unknown
Dates | Year of birth |
 | Month of diagnosis |
Merged System-Supplied | Alcohol-related cancers |
 | HPV-related cancers |
 | Obesity-related cancers |
 | Physical inactivity-related cancers |
 | Tobacco-related cancers |
 | State race eth suppress |
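As an illustration only, the three-category grouping of the Rural-urban continuum variable (metro, RUCC 1-3; nonmetro, RUCC 4-9; unknown) could be expressed as the following Python sketch. The function name rucc_group and the treatment of every value other than 1 through 9 as unknown are assumptions made for this example; they do not describe how SEER*Stat or CDC derive the variable.

```python
# Hypothetical helper: collapses a Rural-Urban Continuum Code (RUCC) into
# the three categories named in the table above.
METRO_CODES = {"1", "2", "3"}
NONMETRO_CODES = {"4", "5", "6", "7", "8", "9"}


def rucc_group(code):
    """Return 'metro', 'nonmetro', or 'unknown' for a RUCC value.

    Assumption: anything that is not exactly one of the codes 1-9
    (missing values, blanks, out-of-range codes) is treated as unknown.
    """
    text = str(code).strip()
    if text in METRO_CODES:
        return "metro"
    if text in NONMETRO_CODES:
        return "nonmetro"
    return "unknown"


if __name__ == "__main__":
    for value in (1, "3", 4, "9", None, 10):
        print(value, "->", rucc_group(value))
```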
APPENDIX K – NPCR/SEER – U.S. Cancer Statistics Public Use Research Database Data
Use Agreement
National Program of Cancer Registries (NPCR) and Surveillance, Epidemiology, and End
Results (SEER) Incidence – U.S. Cancer Statistics
Public Use Research Database Data Use Agreement
For data submitted November, {year}
The Centers for Disease Control and Prevention (CDC) and the National Cancer Institute (NCI)
make NPCR and SEER data available to the public and researchers through various data release
activities. The NPCR and SEER Incidence – U.S. Cancer Statistics Public Use Research
Databases are an unrestricted subset of data submitted to CDC and NCI and made available only
through the National Cancer Institute’s SEER*Stat statistical software.
CDC has obtained an assurance of confidentiality for NPCR pursuant to Section 308(d) of the
Public Health Service Act, 42 U.S.C. 242m(d). Any effort to determine the identity of any
reported cases, or to use the information for any purpose other than statistical reporting and
analysis, is a violation of the assurance. All direct identifiers, as well as characteristics that might
easily lead to identifying individuals, are omitted from the NPCR and SEER Incidence – U.S.
Cancer Statistics Public Use Research Databases. Certain demographic information has been
included for research purposes; thus, all SEER*Stat results must be presented or published in a
manner that ensures that no individual can be identified. In addition, there must be no attempt to
identify individuals from any computer file or to link with a computer file containing patient
identifiers.
Data users must agree to the following provisions before receiving access to the NPCR and
SEER Incidence – U.S. Cancer Statistics {year}–{year} and {year}– {year} Public Use Research
Databases. Please initial after each statement to indicate agreement.
As the recipient of access to NPCR and SEER Incidence – U.S. Cancer Statistics Public Use
Research Databases:
• I will adhere to the requirements of the Data Use Agreement and understand that my
access to the data will be revoked if these requirements are violated. Initials: ______
• I understand that all NPCR data are owned by the states and territories. The states and
territories have established agreements with CDC regarding the use and dissemination of
the data. Initials: ______
• I will not use or permit others to use the analytic results in any way other than for
statistical reporting and analysis. Initials: ______
• I will use appropriate safeguards to prevent use or disclosure of the information other
than as provided for by this agreement. Initials: ______
• I will ensure all members of the research team who have access to the NPCR and SEER
Incidence – U.S. Cancer Statistics Public Use Research Database through SEER*Stat
have signed this agreement. Initials: ______
• I will not attempt to link or permit others to link NPCR and SEER Incidence – U.S.
Cancer Statistics Public Use Research Data with individually identifiable records from
any other dataset without CDC approval. Initials: ______
• I will not attempt to use the analytic results or permit others to use them to learn the
identity of any person or establishment included in any dataset. Initials: ______
• I will take the following actions if the identity of any person or establishment is
discovered inadvertently:
o Make no use of this knowledge.
o Notify CDC by sending an e-mail to uscsdata@cdc.gov.
o As requested by CDC, safeguard or destroy the information that identifies an
individual or establishment.
o Inform no one else of the discovered identity. Initials: ______
• I will make every effort to release all statistical information in such a way as to avoid
inadvertent disclosure by:
o Ensuring that no data on an identifiable case can be derived through subtraction or
other calculation from the combination of tables in the given publication. Initials: ______
o Ensuring that no data permit disclosure when used in combination with other
known data. Initials: ______
o Not disclosing or otherwise making public data on any unit smaller than 16. If the
total number of cases in a cell is fewer than 16, the cell data will be suppressed in
oral and written presentations. Initials: ______
• I have read the data documentation file and have an understanding of the data available in
the database and the restrictions related to their use. If I have questions regarding my
analytic approach, I will contact CDC NPCR (uscsdata@cdc.gov) for assistance. Initials: ______
• I am familiar with the use of SEER*Stat in analyzing data or will complete the needed
training. Initials: ______
• I understand that I am responsible for the results of my own analysis. The findings and
conclusions resulting from the analysis of these data are those of the authors and do not
necessarily represent the official position of CDC. Initials: ______
• I will acknowledge central cancer registries whenever data are presented, released, or
published by including the following (or similar) statement:
These data were provided by central cancer registries participating in CDC’s
National Program of Cancer Registries (NPCR) and/or NCI’s Surveillance,
Epidemiology, and End Results (SEER) Program and submitted to CDC and NCI
in November {date}. Initials: ______
• As appropriate, I will cite the data as follows:
For the {date}-{date} database: National Program of Cancer Registries and
Surveillance, Epidemiology, and End Results SEER*Stat Database: NPCR and
SEER Incidence – U.S. Cancer Statistics Public Use Research Database, Nov
{year} submission ({year}-{year}), United States Department of Health and
Human Services, Centers for Disease Control and Prevention and National Cancer
Institute. Released {date}, based on November {year} submissions. Available at
www.cdc.gov/cancer/public-use.
For the {year}-{year} database: National Program of Cancer Registries and
Surveillance, Epidemiology, and End Results SEER*Stat Database: NPCR and
SEER Incidence – U.S. Cancer Statistics Public Use Research Database with
Puerto Rico, Nov {year} submission ({year}-{year}), United States Department of
Health and Human Services, Centers for Disease Control and Prevention and
National Cancer Institute. Released {date}, based on November
{year} submissions. Available at www.cdc.gov/cancer/public-use.
Initials: ______
Users cannot be given access to the U.S. Cancer Statistics databases until SEER Research
Plus access is set up. See the instructions on page 1.
When you have access to SEER Research Plus, complete the fields below, sign and date the
agreement, and e-mail both pages to uscsdata@imsweb.com.
The e-mail address you provide must be the same one used to obtain access to SEER Research
Plus.
________________________________________________
Signature
__________________
Date
Name: ________________________________________________________________________
Title and organization: ___________________________________________________________
Telephone number: _________________________
E-mail address: ____________________
APPENDIX L – NPCR Data at the NCHS RDC Q&A
Can you summarize the data access process?
CDC uses the National Center for Health Statistics (NCHS) Research Data Center (RDC) as a
mechanism for researchers outside of the Division of Cancer Prevention and Control (DCPC) to
request and gain access to the Restricted-Access NPCR/SEER data for research purposes. The
data are available through the NCHS RDC only after the standard data quality reviews that occur
as part of the preparation for USCS.
The use of the NCHS RDC to manage data access provides the highest level of data security and
protection of confidentiality that is available for analysis of data. Any researcher must submit a
proposal that is reviewed and approved by CDC and may be reviewed by representatives from
the participating central cancer registries (CCRs) before any data analysis begins. Trained data
analysts at the NCHS RDC create a dataset that is customized to each analysis. The researcher
can run his or her own statistical analysis or have the NCHS RDC analyst run the analysis. The
NCHS RDC analyst reviews all output from statistical analysis to ensure that the researcher only
conducts analyses relevant to the approved protocol and that small cell sizes are suppressed.
Absolutely no individual level data will leave the NCHS RDC facilities. The data can only be
accessed onsite; the NCHS RDC remote option is not available for the Restricted-Access
NPCR/SEER data.
What is the National Center for Health Statistics (NCHS)?
NCHS is one of the national centers at CDC and is located in Hyattsville, Maryland. As the
Nation's principal health statistics agency, staff at NCHS compile statistical information to guide
actions and policies to improve the health of our people. More information about NCHS is
available at: http://www.cdc.gov/nchs/about.htm.
What is the Research Data Center (RDC)?
The NCHS RDC began in 1998 and has a long-standing history of managing access to health and
vital statistics data through a rigorous proposal review process as well as review of the statistical
output. The NCHS RDC mission is to give public access to the full range of health and vital
statistics data, while protecting the confidentiality of the respondents and institutions that
collected the information. There have been no breaches of confidentiality for data accessed through
the NCHS RDC.
The NCHS RDC houses sensitive, but not classified, data. It allows access to individual data
without the possibility of disclosure of identifying information. The NCHS RDC offers
statistical, programming, and consulting expertise to facilitate the data analysis for research.
The NCHS RDC is a data hosting center, not a data repository. The data extracts that are hosted
on the NCHS RDC are tailored specifically to the proposal and have a research life cycle. Once
the analysis is completed, the data extract is archived for 2 years and then destroyed.
There are currently three modes of access through the NCHS RDC, each with specific
restrictions. Access is available on-site at two locations (Hyattsville, MD and Atlanta, GA), nine
Census RDCs, or through remote electronic access. More information about the NCHS RDC is
available at: http://www.cdc.gov/rdc/
Why does CDC use the NCHS RDC?
Maintaining confidentiality is the primary objective of the NCHS RDC. Staff at NCHS RDC
have statistical expertise to address confidentiality and disclosure risk. Using the NCHS RDC
will allow CDC to comply with the Assurance of Confidentiality [308(d)] that was obtained for
the NPCR-CSS data. All researchers must take the confidentiality orientation, complete
confidentiality forms, and review the disclosure manual, all of which outline practices that are
essential to protecting the data and preventing disclosure of confidential information.
Additionally, data housed at the NCHS RDC are not subject to the Freedom of Information Act
(FOIA). More information about confidentiality is available at:
http://www.cdc.gov/rdc/B4ConfiDisc/CfD400.htm.
What is the research proposal process?
The NCHS RDC has a rigorous review process for analyses proposed by any researchers wanting
to use RADS data. All proposals will be evaluated by a Review Committee consisting of the
NCHS RDC Director, the Confidentiality Officer, the assigned NCHS RDC analyst, and NPCR
representatives. The iterative review and comment process may take 6 to 8 weeks.
Through this process, the NCHS RDC staff, the NPCR staff, and the CCR staff will fully
understand the intended analysis and will be able to provide any needed direction or restrictions
on the analysis and describe any limitations in what is proposed. It will be possible for CDC and
participating registries to disapprove a proposal. However, guidance and re-direction as needed
should be the norm. More information about the review process is available at:
http://www.cdc.gov/rdc/B3Prosal/PP300.htm.
Once a proposal has been approved, the NCHS RDC offers a secure environment for data
analyses and has processes in place to review data output for small cell sizes. This will ensure
that the NPCR suppression rules are properly applied. Through the NCHS RDC, the user can
conduct analyses and have remote access to data but cannot download the individual record level
data or obtain counts for inappropriately small cell sizes.
The use of the NCHS RDC to host the NPCR data is a win-win opportunity because of the
confidence in knowing that the data are being used correctly and safely, while at the same time
making the data available for external researchers in an appropriate way. In addition, this
approach will not overtax resources here in the Branch or in the CCRs. The NCHS RDC
provides a level of data control beyond that of any other data access system used for registry
data.
Who has access to the data and at what level?
The NCHS RDC analysts will have access to the individual record level data since it is easier to
create an analytic dataset using these data. The NCHS RDC analysts will be bound by the same
data use agreements that CDC staff sign on an annual basis. Researchers with approved
proposals will be able to conduct analyses through the NCHS RDC on the created dataset or have
the NCHS RDC analyst do the analysis for them. However, they will not be able to download
any part of the data from the NCHS RDC. Any additional variables that were not included in the
original analysis proposal will need a separate approval process.
Note that this is different from the process that NPCR has used in the past where researchers
with approved proposals would have direct access to the dataset itself including the ability to
download the data and create a listing of individual record level data and all variables in the
RADS.
Researchers have several possible modes of access to the data set created for their specific
research proposal. More information is available at:
http://www.cdc.gov/rdc/B2AccessMod/ACs200.htm.
When a researcher conducts an analysis, what type of output will he or she get?
If a researcher is on-site at the NCHS RDC, he or she can save the results on the hard drive of the
NCHS RDC computer. The NCHS RDC analyst will review the output for disclosure then either
load the output onto a flash drive supplied by the researcher or e-mail the output files to the
researcher. If a researcher is accessing the NCHS RDC remotely, he or she will send the program by
e-mail and, after disclosure review by the NCHS RDC analyst, will receive the output files by e-mail. No individual record level data are released to the researcher.
Will the CCRs be able to decide whether their data will be available through the NCHS
RDC?
Starting with DP17-1701, participation in all CDC-created and hosted analytic datasets and web-based data query systems, as outlined in the annual NPCR-CSS Data Release Policy, is a
required strategy. [DP17-1701, 2. CDC Project Description, a. Approach, iii. Strategies and
Activities, Program 3: National Program of Cancer Registries (NPCR) – Component 1, Strategy
3 Cancer Data and Surveillance (Domain 1), Data Submission (page 19)]. Therefore, data from
all CCRs meeting eligibility requirements are included. Data use is important to NPCR and for
continued support of the registries.
Will the CCRs be able to decide if their county-identifying variable (County at Dx
[NAACCR#90]) is to be available for use in the NCHS RDC?
Starting with DP17-1701, participation in all CDC-created and hosted analytic datasets and web-based data query systems, as outlined in the annual NPCR-CSS Data Release Policy, is a
required strategy. [DP17-1701, 2. CDC Project Description, a. Approach, iii. Strategies and
Activities, Program 3: National Program of Cancer Registries (NPCR) – Component 1, Strategy
3 Cancer Data and Surveillance (Domain 1), Data Submission (page 19)]. Therefore, data from
all CCRs meeting eligibility requirements are included. County data will be used only in
approved analyses and in the following ways:
• Used as a linkage variable (linkage to census data, for example) only by the NCHS RDC
analyst. The county variable will not be available to the researcher but the NCHS RDC
analyst would use it to create a linked dataset and then remove the county variable.
• Included as a confounder or other control variable, but no data are presented by county.
The NCHS RDC analyst will create dummy variables to mask the actual county name.
• Used in geographically aggregated form such as large metropolitan statistical areas (e.g.,
those with a population of 1 million or larger), multi-county regions, or geographical
areas (e.g., Appalachia or IHS Contract Health Services Delivery Areas (CHSDA)
counties). It will be possible for the NCHS RDC analyst to create these areas for the
researcher.
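The county masking described in the list above can be pictured with a short Python sketch. It is a minimal illustration under stated assumptions, not a description of the NCHS RDC analysts' actual procedure: the function name mask_counties, the field names, the "area_NNN" label format, and the sample records are all hypothetical. The sketch replaces each county code with an opaque label so that the variable can still serve as a control variable without revealing which county a record comes from.

```python
import random


def mask_counties(records, county_key="county", seed=None):
    """Return copies of the records with county codes replaced by opaque labels.

    Assumptions (illustrative only): each record is a dict, the county code
    is stored under `county_key`, and labels are assigned in shuffled order
    so that a label's number says nothing about the original code.
    """
    counties = sorted({record[county_key] for record in records})
    random.Random(seed).shuffle(counties)
    labels = {
        county: f"area_{index:03d}"
        for index, county in enumerate(counties, start=1)
    }
    # Build new dicts so the caller's original records are left unchanged.
    return [{**record, county_key: labels[record[county_key]]} for record in records]


if __name__ == "__main__":
    # Made-up sample records; the case counts are not real data.
    sample = [
        {"county": "13121", "cases": 20},
        {"county": "13089", "cases": 30},
        {"county": "13121", "cases": 5},
    ]
    for row in mask_counties(sample, seed=0):
        print(row)
```

Records from the same county share a label, so the masked variable still distinguishes counties from one another in a model while the actual county identity stays with the analyst.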
Previous data release policies indicate that the project proposals for RADS would be
reviewed by the RADS working group, facilitated by CDC with representation by the
CCRs. Does this procedure change now that the NCHS RDC is used?
The CCRs will still have input on the RADS proposals. The NCHS RDC review process also
includes the NCHS RDC analyst and the confidentiality officer, who will be responsible mainly
for disclosure review to ensure that we abide by the 308(d) assurance of confidentiality obtained
for NPCR-CSS. More information about the NCHS RDC review process is available at:
http://www.cdc.gov/rdc/B3Prosal/PP340.htm.
NPCR will obtain comments on each proposal from CCRs through the NPCR Central Cancer
Registry Council.
Will SEER data be included for analysis or will the data be limited to NPCR data?
Yes. Both NPCR and SEER data may be accessed through the NCHS RDC.
Will the NCHS RDC staff have access to SEER*Prep and SEER*Stat?
No. NPCR previously provided a SEER*Stat file to the NCHS RDC but found that researchers
only used the SAS file. Therefore, the SEER*Stat file is no longer provided.
Will researchers have access to SEER*Stat?
No. As noted above, NPCR is no longer providing a SEER*Stat file to the NCHS RDC.
What suppression rules will be used for the RADS?
The same suppression rules that are used for United States Cancer Statistics will apply. More detailed
information is available at:
https://www.cdc.gov/cancer/npcr/uscs/technical_notes/stat_methods/suppression.htm.
In addition, the suppression rules for Asians/Pacific Islanders (A/PI) and American
Indians/Alaska Natives (AI/AN) will also apply.
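A basic small-cell suppression rule can be sketched in Python as follows. This is an illustration under stated assumptions rather than the USCS implementation: the threshold of 16 is taken from the fewer-than-16 rule stated elsewhere in this policy for public use data, while the function name suppress_small_cells, the use of None as the suppression marker, and the sample counts are hypothetical. The sketch covers primary suppression only; it does not attempt complementary suppression, which prevents a suppressed cell from being recovered by subtraction from totals, and it does not implement the additional A/PI and AI/AN rules.

```python
SUPPRESSION_THRESHOLD = 16  # cells with fewer than 16 cases are suppressed


def suppress_small_cells(counts, threshold=SUPPRESSION_THRESHOLD, marker=None):
    """Return a copy of `counts` with every cell below `threshold` replaced by `marker`.

    `counts` maps a cell label (for example, a state and cancer site) to a
    case count. Following the wording of the rule literally, a count of zero
    is also below the threshold and is suppressed in this sketch.
    """
    return {
        cell: (count if count >= threshold else marker)
        for cell, count in counts.items()
    }


if __name__ == "__main__":
    # Made-up counts for illustration; not real data.
    example = {"cell A": 15, "cell B": 16, "cell C": 0, "cell D": 120}
    print(suppress_small_cells(example))
```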
Wouldn’t it be better for researchers to contact CCRs directly for linkage studies?
CDC doesn’t collect personal identifiers like name or Social Security number.
Yes, it would be best for researchers to contact CCRs directly for linkage studies that require
individual identifiers. However, valuable public health research can be conducted with access to
county-level data. Examples include linkage with U.S. Census data for socioeconomic analyses,
or to examine regional differences in the prevalence of a specific cancer.
Will IRB review be required for each proposal? If not, will NCHS require the researcher to
obtain IRB approval before they submit their proposal?
The NCHS RDC has an umbrella ethics review board (ERB) protocol that covers CDC
employees and can be extended to external researchers. The principal investigator and all
research team members who come in contact with the data must take the confidentiality
orientation and complete the confidentiality forms. One of the confidentiality forms is the
designated agent form (http://www.cdc.gov/rdc/Data/B4/DesignatedAgent.pdf), which extends
the ERB to cover external researchers.
Note that the ERB protocol serves the same function as an institutional review board (IRB)
protocol. At CDC, there is one office that coordinates the submission and tracking of human
research protocols. However, other centers, such as NCHS and the National Institute for
Occupational Safety and Health (NIOSH), have different names for these review boards: Research Ethics
Review Board (ERB) at NCHS and Human Subjects Review Board (HSRB) at NIOSH.
Researchers may choose to obtain IRB approval from their own institution, but it will not be a
requirement in the application process given the ERB extension that the NCHS RDC provides.
Does access to the RADS cost anything?
No. CDC covers the cost of analyzing RADS through the NCHS RDC.
As more researchers become aware of the RADS, they may want access to additional
variables that CCRs submit to CDC. How will this process be handled?
The addition of new variables in RADS will be discussed with CCRs prior to their inclusion in
the data release policy, which is updated annually.
How is access to the comparative effectiveness research (CER) dataset managed?
Access to the CER dataset is managed through the same NCHS RDC process. The proposal
process will not differ except that staff from the Specialized Registries funded for CER data
collection will review these proposals.
APPENDIX M – Data Items for Restricted-Access Dataset (RDC)
The restricted-access dataset contains individual case-specific data derived from the NPCR-CSS dataset.
The data are available to researchers at NCHS Research Data Centers as a SAS file. SAS files are
created specifically for each project’s needs. The data items that may be requested by researchers are
listed below.
Variable Name
Alternate Patient ID Number
Address at Diagnosis – State
Address at Diagnosis – County at Analysis*
USCS Standard
USCS9919
USCS1519
USCS9819
USCS1019
Address at Diagnosis – Census Region
Race 1
Race 2
Race Recode
Econ Status
State race eth suppress
Spanish/Hispanic Origin
IHS Link
Sex
Age at Diagnosis**
Age Recode
Birth Date***
Econ status
Rural-urban continuum 2013
Sequence Number – Central
Date of Diagnosis****
Primary Site
Laterality
Grade
Grade Clinical
Grade Pathological
Grade Post Therapy
Diagnostic Confirmation
Type of Reporting Source
Histologic Type ICD-O-3
Behavior Code ICD-O-3
Behavior Recode for Analysis Derived/WHO 2008
Primary Site Recode
SEER International Classification of Childhood Cancer (ICCC) Recode
Extended ICD-O-3/WHO 2008
AYA Site Recode 2020
Lymphoma Neoplasm Recode 2020 Revision
SEER Summary Stage 2000
SEER Summary Stage 1977
Derived SS2000
Summary Stage 2018
Merged Summary Stage
RX Summ – Surg Prim Site
Merged radiation
CS Site-Specific Factor 1
Merged Estrogen Receptor
Merged Progesterone Receptor
Merged HER2 Receptor
Over-ride Age/Site/Morph
Over-ride SeqNo/DxConf
Over-ride Site/Lat/Sequence Number
Over-ride Site/Type
Over-ride Histology
Over-ride Report Source
Over-ride Ill-define Site
Over-ride Leuk, Lymphoma
Over-ride Site/Behavior
Over-ride Site/Lat/Morph
Alcohol-related cancers
HPV-related cancers
Obesity-related cancers
Physical activity-related cancers
Tobacco-related cancers
* County data will be used only in approved analyses and in the following ways: a) used as a linkage variable
(linkage to census data, for example) only by the NCHS RDC analyst; b) included as a confounder or other
control variable, but no data are presented by county; c) used in geographically aggregated form such as large
metropolitan statistical areas (e.g., those with a population of 1 million or larger), multi-county regions, or
geographical areas (e.g., Appalachia or IHS Contract Health Services Delivery Areas (CHSDA) counties)
**Age over 99 is recoded
***Only year is provided; if age is over 99, year of birth is recoded
****Day of diagnosis is not provided
APPENDIX N – NPCR-CSS Levels of Data Access
NPCR-CSS Levels of Data Access
Internal Analytic Datasets
Includes: Record level information, Survival dataset, Prevalence dataset, Delay Adjusted dataset
Criteria: USCS criteria met, <6 cases cell suppression, complementary cell suppression
Availability: DCPC, SEER, IHS researcher or contractor
Access: Signed Data Use Agreement and Non-Disclosure Agreement, Assurance of Confidentiality training
Federal/Trusted Partners Dataset
Includes: Record level information; may include Survival dataset, Prevalence dataset, Delay Adjusted dataset
Criteria: USCS criteria met, <16 cases cell suppression, complementary cell suppression
Availability: ACS, CBTRUS, IACR, CONCORD, AHRQ, OWH, CDI, CDC’s Tracking Program
Access: Signed Data Use Agreement and Non-Disclosure Agreement; may include MOU
External Restricted Access
Includes: Record-level information
Criteria: USCS criteria met
Availability: Researcher outside DCPC through NCHS RDC
Access: Proposal submitted to NCHS RDC, signed Data Use Agreement and Non-Disclosure Agreement
Flowchart decision points (each with Yes and No branches):
Is a state or county used and identified?
Data published in USCS?
Flowchart outcomes:
NPCR and RDC review; may include state
No additional permission needed; should document its use and include proper acknowledgment
States notified of study results
Public Use Datasets
USCS Data Visualizations tool
Includes: State, county, region, and Congressional-district levels, no record-level information
Criteria: USCS criteria met, permission provided on Dataset Participation Agreement, <16 cases cell suppression enforced
Availability: Public

CDC WONDER
Includes: State, county, region, and MSA levels, no record-level information
Criteria: USCS criteria met, permission provided on Dataset Participation Agreement, <16 cases cell suppression enforced
Availability: Public

No additional permission needed; users should document its use and include proper acknowledgment

State Cancer Profiles
Includes: State and county levels, no record-level information
Criteria: USCS criteria met, permission provided on Dataset Participation Agreement, <16 cases cell suppression enforced
Availability: Public

NPCR/SEER USCS Public Use Dataset
Includes: State record-level information, no case listing
Criteria: USCS criteria met, permission provided on Dataset Participation Agreement, <16 cases cell suppression enforced
Availability: Public after signed Data Use Agreement and Non-Disclosure Agreement, annual agreements required
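
The cell suppression criteria listed in this appendix can be illustrated with a short sketch. It rests on assumptions: the policy names the thresholds (<16 cases, or <6 for internal datasets) and mentions complementary suppression, but it does not specify an algorithm. The function name and the rule used for complementary suppression (hide the smallest remaining cell when exactly one cell in a row is suppressed, so the hidden value cannot be recovered from a published total) are hypothetical simplifications.

```python
from typing import Dict, Optional

SUPPRESSION_THRESHOLD = 16  # "<16 cases" criterion; internal datasets use <6


def suppress_row(
    counts: Dict[str, int], threshold: int = SUPPRESSION_THRESHOLD
) -> Dict[str, Optional[int]]:
    """Return a copy of one table row with small cells replaced by None."""
    # Primary suppression: hide every cell below the threshold.
    # (Assumption: zero counts are treated like any other small count.)
    masked: Dict[str, Optional[int]] = {
        key: (None if value < threshold else value) for key, value in counts.items()
    }

    # Complementary suppression (assumed rule): if exactly one cell is hidden,
    # it could be derived by subtracting the visible cells from the row total,
    # so the smallest visible cell is hidden as well.
    hidden = [key for key, value in masked.items() if value is None]
    if len(hidden) == 1:
        visible = [key for key, value in masked.items() if value is not None]
        if visible:
            smallest = min(visible, key=lambda key: counts[key])
            masked[smallest] = None
    return masked
```

With counts of 120, 9, and 40 in a row, the cell with 9 cases is suppressed by the threshold and the cell with 40 cases is suppressed as its complement; passing `threshold=6` applies the internal-dataset criterion instead.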
APPENDIX O – Data Items for NPCR/SEER USCS Delay Adjusted Database
SEER*Stat Category
    SEER*Stat Variable Name

Age at Diagnosis
    Age recode with single ages and 85+
    Age recode with <1 year olds

Race, Sex, Year Dx, Registry, County
    Sex
    Year of diagnosis
    Addr at DX – state
    County at DX Analysis
    State-county
    Origin recode NHIA (Hispanic, Non-Hispanic)
    Race and origin recode (NHW, NHB, NHAIAN, NHAPI, Hispanic)

Required Delay Fields
    Delay age
    Delay factor
    Delay site
    Delay race (All, Race recode (White, Black, AIAN, CHSDA, API, Hisp, Non-Hisp)

Site and Morphology
    Behavior recode for analysis derived/WHO2008

Multiple Primary Fields
    Sequence number - central
APPENDIX P – Data Items for NPCR Prevalence Database
SEER*Stat Category
    SEER*Stat Variable Name

Age at Prevalence Date
    Age at Prevalence Data (Calculated)

Age at Diagnosis
    Age recode with single ages and 85+

Race, Sex, Year Dx, Registry, County
    Sex
    Year of diagnosis
    Addr at DX – state
    County at DX Analysis
    State-county
    NPCR project flag
    Economic status
    Race and origin recode (NHW, NHB, NHAIAN, NHAPI, Hispanic)
    Race and origin recode (NHW, NHB, NHAIAN, NHAPI, Hispanic)
    State
    County
    Race recode (White, Black, Other)

Site and Morphology
    Primary Site – labeled
    Histologic Type ICD-O-3
    Behavior Code ICD-O-3
    Grade
    Diagnostic confirmation
    ICD-O-3-Hist/behavior, labeled
    ICD-O-3-Hist/behavior, malig, labeled
    Site recode ICD-O-3/WHO 2008
    ICCC site recode ICD-O-3/WHO 2008
    Behavior recode for analysis derived/WHO2008

Stage – LRD [Summary and Historic]
    Derived SS2000
    SEER Summary Stage 2000
    Merged Summary Stage 2000

Extent of Disease – CS
    CS Site-Specific Factor 1
    CS Site-Specific Factor 2
    CS Site-Specific Factor 15
    Laterality

Cause of Death (COD) and Follow-up
    Survival months – presumed alive
    Survival months flag – presumed alive
    Cause of death (ICD-10)
    ICD revision number
    Vital status
    Follow-up source central
    COD exclusion flag
    Original vital status
    Vital status recode (study cutoff used)
    Cause of death recode
    COD recode with Kaposi and mesothelioma

Multiple Primary Fields
    Sequence number - central

Race and Age (case data only)
    Age at Diagnosis
    Race 1
    NHIA derived Hispanic origin

Dates
    Presumed alive year of last contact recode
    Presumed alive month of last contact recode
    Presumed alive day of last contact recode
    Year of birth
    Month of diagnosis
    Day of diagnosis
    Original day of last contact
    Original month of last contact
    Original year of last contact
    Original year of diagnosis
    Original day of diagnosis
    Original month of diagnosis

Other
    Type of Reporting Source

User-Specified
    EDPMDE LinkVar
File Type | application/pdf |
Author | C.L.Zadoretzky |
File Modified | 2021-12-28 |
File Created | 2021-08-11 |