CDC, NCI launch public database with millions of cancer samples
The CDC has released a public-use research database that combines data from CDC’s National Program of Cancer Registries and NCI’s SEER program.
It includes data from all 50 U.S. states, Washington, D.C., and Puerto Rico.
“More researchers now have the ability to analyze our high-quality cancer data through this new publicly available data resource. This will help further our work to positively impact cancer prevention and control as well as the care and quality of lives for those diagnosed with cancer,” Vicki Benard, PhD, branch chief of the Cancer Surveillance Branch in the CDC’s division of cancer prevention and control, told HemOnc Today.
Detailed, de-identified information on more than 22 million cancer cases recorded between 2001 and 2014 is now available online. Researchers can analyze these data to better understand cancer, inform coordinated efforts to address cancer through prevention and evaluate progress in cancer control.
HemOnc Today spoke with Benard about how the idea for this database came about, as well as the value of the information it contains.
Question: How did the idea for this database come about?
Answer: Information on newly diagnosed U.S. cancer cases is collected by the CDC’s National Program of Cancer Registries and the NCI’s SEER program on an annual basis. We have received many requests and much feedback from cancer researchers indicating that they are interested in a nationally representative database that is easy to use. This year, CDC and NCI worked together to create the first combined, de-identified population cancer incidence database to meet researchers’ needs. The database provides information about more than 22 million cancer cases in the United States diagnosed as early as 2001.
Q: Can you describe the need for this type of database?
A: Prior to the release of these databases, the public primarily accessed the combined CDC and NCI cancer data through a web-based report, which was limited in terms of what one could do with the information. To be more responsive to researchers and to make the data they need available, we released these public-use databases, which will allow researchers to perform more in-depth analyses. For example, these databases allow researchers to examine trends based upon disease histology and behavior, and public health professionals can evaluate progress in cancer control efforts by their state.
Q: What information is included in the database?
A: The database includes 14 years of data on new cancer cases diagnosed from 2001 to 2014 — the most recent year for which cancer data are available — providing information on more than 20 million cancer cases from all 50 states and Washington, D.C. We also released a secondary database that includes 10 years of data on new cancer cases diagnosed from 2004 to 2014, providing researchers with data on 17 million cancer cases from all 50 states, Washington, D.C., and Puerto Rico. Hospitals, physicians and laboratories across the nation report data to central cancer registries supported by CDC and NCI. These two databases are intended for researchers to conduct focused analyses beyond what is available through the United States Cancer Statistics web-based report and data visualization tool. We have about 30 variables available, including demographic characteristics such as age, sex, race and state of residence. We also are able to collect tumor characteristics, including tumor type, location and the extent of the tumor.
Q: What can researchers do with these data?
A: There is a lot that can be done with the combined National Program of Cancer Registries and SEER research data. Researchers can evaluate the burden of cancer by different geographic regions and demographic characteristics, including age, sex, race and ethnicity. They can look at trends by tumor characteristics, such as tumor location, type and stage. It is also possible to look at rare cancers, thanks to the large dataset.
Q: Is there anything else that you would like to mention?
A: These databases are available to researchers and will be updated annually. Information on how to access these databases and supporting documentation — including the data dictionary, checklist and technical methodology notes — is available on our website, www.cdc.gov/cancer/public-use. Links to our new data visualization tool are available at www.cdc.gov/cancer/dataviz. – by Jennifer Southall
For more information:
Vicki Benard, PhD, can be reached at the CDC, 1600 Clifton Road, Atlanta, GA 30329; email: vbenard@cdc.gov.
Disclosure: Benard reports no relevant financial disclosures.