December 05, 2017

Childhood cancer data lab designed to ‘improve pace, efficiency of discovery’

Casey S. Greene

Alex’s Lemonade Stand Foundation announced the opening of the first-of-its-kind Childhood Cancer Data Lab, with the goal of integrating data to promote childhood cancer research and continue the foundation’s vision of finding cures for childhood cancer.

The Childhood Cancer Data Lab (CCDL) is expected to include more than 1 million public genome-wide assays.

“The CCDL is a unique effort to build the capacity for data-intensive research, not across an institution but across a field. I am excited that we have the opportunity to harness data that were generated, shared and then often forgotten, for the fight against childhood cancer,” Casey S. Greene, PhD, assistant professor in the department of pharmacology at the Perelman School of Medicine at the University of Pennsylvania and director of the CCDL, said in a press release.

“If we succeed, childhood cancer researchers will be able to rapidly connect their own findings to everyone else’s, not only via the results that they write about in their papers, but by the patterns that exist in their data,” Greene added. “We hope that this will improve the pace and efficiency of discovery in the field leading to treatments being identified much earlier than they otherwise would be.”

HemOnc Today spoke with Greene about how this initiative came about, how it will fill a knowledge gap, and what those who access the database will be able to do with the information they glean.

 

Question: How did the idea for this initiative come about?

Answer: This is a tough question for me to answer in some regards, primarily because I became involved later in the process. The directors of Alex’s Lemonade Stand were looking for an opportunity to make a direct impact on an area for which they saw a need. Childhood cancer research is the area of focus for Alex’s Lemonade Stand, and one of the things that they saw a need to support was data-intensive analyses.

Q: What kind of knowledge gap will the initiative fill?

A: We hope to target two areas. First, we would like to build software infrastructure for childhood cancer researchers. Essentially, many data exist, but not all of them are easily accessible. Therefore, we hope to make data science easily accessible for childhood cancer researchers. Second, we are interested in training researchers on how to use the infrastructure once it is in place.

Q: Where is the initiative in development?

A: We want to get the refinery up and running by the beginning of 2018. Then we would like to have a full-scale release of the first set of harmonized data during the first quarter of 2018. Right now, the database software is available on GitHub and everyone can watch it being developed. Internally, we have so far run 12,000 samples through the system. These samples have been downloaded, a processor has been applied to them and some form of output has been generated. We will then go back in and see which of these were fully processed. Currently, about 7,000 samples have been fully processed. We plan to process public RNA expression assays, and we estimate there are about 2 million samples when combining RNA data and microarray data. This is only the first stage of the CCDL. Longer term, we will aim to make the data minable.

Q: Who will have access to the database?

A: Our goal is to make it available to anyone with an internet connection. The data refinery project is focused on data that are already publicly available. The goal is to make these publicly available data more easily accessible by removing many of the preprocessing hurdles that investigators must deal with in order to use the data. Because these data are public, our goal is to make this database a public resource. In the future, there may be some resources from the CCDL that we cannot make fully public, because we may work with data where sharing restrictions are designed to protect the identity of the children in the data set. However, our goal is to make the CCDL resources as freely available as possible without compromising participants.

 

Q: What will those who access the database be able to do with the information they glean?

A: Initially, we hope researchers in the field will have made their data publicly available when they publish their papers. Nearly all journals now require this. If their data have been made publicly available and we have processed them, they will be able to directly compare their data with any other public data. We expect that many will use this to discover consistent patterns between their data and others’ data. Our ultimate goal is to make all public data available in the refinery.

 

Q: Is there anything else that you would like to mention?

A: There is a project in Philadelphia called Project Cognoma, which is a citizen science project that arose out of a collaboration between Code for Philly and Data Philly. This is an example of the types of things that people can build when the data are available. This project aims to allow any investigator to build a signature of genetic changes. The way that we have seen this being used thus far is to analyze the inactivation of tumor suppressors. The Cognoma webserver builds a signature of the gene expression associated with whatever genetic changes the investigator puts in, across a dataset spanning many different cancer types. Although this is not directly related to the Alex’s Lemonade Stand CCDL right now, we have agreed to take over the maintenance of the system. We hope to keep improving the system and we are interested in adapting this to more datasets, including those focused on childhood cancers. – by Jennifer Southall

 

For more information:

Casey S. Greene, PhD, can be reached at 10-131 Smilow Center for Translational Research, Philadelphia, PA 19104; email: csgreene@mail.med.upenn.edu.

 

Disclosure: Greene reports no relevant financial disclosures.