November 14, 2016

Internet search engines can identify those at risk for lung cancer

Pattern analysis of data from internet searches may help with early detection of lung cancer, according to a study published in JAMA Oncology.

“Screening for lung carcinoma involves identifying high-risk individuals and subsequent studies to detect tumors,” Ryen W. White, PhD, chief technology officer of health intelligence at Microsoft Health and principal researcher at Microsoft Research in Redmond, Washington, and colleagues wrote. “Standing challenges of false-positives and false-negatives, and the costs associated with screening and follow-up, motivate the pursuit of new and complementary methods for early identification of lung carcinoma. We examined the feasibility of a nontraditional yet promising direction for detecting early signs of lung carcinoma.”

The researchers used search logs from Bing.com to conduct a retrospective log analysis of web searches conducted between May 2014 and October 2015 that were related to possible symptoms of lung cancer (n = 4,813,985). The investigators identified searchers whose use of the website suggested they had recently been diagnosed with lung cancer.

White and colleagues then identified patterns of searches that focused on possible symptoms in the months before diagnosis, and built a statistical classifier to identify users who eventually carried out searches indicating they had been diagnosed with lung cancer.

Of the users who searched for lung cancer symptoms, 5,443 later entered queries that strongly suggested a recent cancer diagnosis. The remaining searchers were considered not to have cancer.

Model performance overall was strong, according to the researchers. They calculated an area under the receiver operating characteristic curve of 0.9535, with true-positive rates ranging from 3% to 57% for false-positive rates ranging from 0.00001 to 0.001.
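For readers unfamiliar with these metrics, the sketch below shows one way to compute an area under the ROC curve and the true-positive rate achievable at a fixed false-positive rate. It is an illustration only, not the researchers' code: the function names and the six-item toy data set are hypothetical, and the study's classifier and data are not reproduced here.

```python
def roc_points(scores, labels):
    """Return (fpr, tpr) pairs from sweeping the decision threshold high to low.

    scores: classifier outputs, higher means "more likely positive".
    labels: 1 for positive (diagnosed), 0 for negative.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        # Handle tied scores together so ties produce a single ROC point.
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / n_neg, tp / n_pos))
        i = j
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def tpr_at_fpr(points, max_fpr):
    """Highest true-positive rate reachable without exceeding max_fpr."""
    return max(tpr for fpr, tpr in points if fpr <= max_fpr)


# Hypothetical toy data: three positives and three negatives.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
toy_labels = [1, 1, 0, 1, 0, 0]
toy_points = roc_points(toy_scores, toy_labels)
toy_auc = auc(toy_points)                    # 8/9 for this toy data
toy_tpr = tpr_at_fpr(toy_points, 0.0)        # 2/3 with zero false positives
```

Reading the true-positive rate at a very small false-positive rate, as the researchers did, matters in screening because even a tiny false-positive rate applied to millions of searchers would flag a large number of people who do not have cancer.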

The five most prominent risk factors for lung carcinoma were evidence of family history (RR = 7.54; 95% CI, 3.93-14.47), radon (RR = 2.52; 95% CI, 1.13-5.62), primary location (RR = 2.46; 95% CI, 1.36-4.44), age (RR = 3.55; 95% CI, 3.35-3.77) and occupation (RR = 1.96; 95% CI, 1.14-3.39). Evidence of smoking also was a major risk factor (RR = 1.64; 95% CI, 1.03-2.26), but was not ranked in the top five because it was difficult to identify a history of smoking from search terms, White and colleagues wrote.
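As a rough guide to how a relative risk (RR) and its 95% CI are commonly derived, the sketch below uses the standard log-scale approximation. The article does not state which method the researchers used, so this is an assumption, and the counts in the example are hypothetical rather than taken from the study.

```python
import math


def relative_risk(cases_exposed, n_exposed, cases_unexposed, n_unexposed, z=1.96):
    """Return (rr, lower, upper) using the usual log-scale normal approximation.

    cases_exposed / n_exposed: outcome count and group size with the risk factor.
    cases_unexposed / n_unexposed: outcome count and group size without it.
    z: normal quantile; 1.96 gives an approximate 95% interval.
    """
    risk_exposed = cases_exposed / n_exposed
    risk_unexposed = cases_unexposed / n_unexposed
    rr = risk_exposed / risk_unexposed
    # Standard error of log(RR).
    se = math.sqrt(1 / cases_exposed - 1 / n_exposed
                   + 1 / cases_unexposed - 1 / n_unexposed)
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, lower, upper


# Hypothetical counts: 20 of 1,000 exposed vs. 10 of 1,000 unexposed searchers.
rr, lower, upper = relative_risk(20, 1000, 10, 1000)
print(f"RR = {rr:.2f}; 95% CI, {lower:.2f}-{upper:.2f}")
```

An interval built this way is symmetric around the RR on the log scale, so the lower and upper bounds multiply to roughly the RR squared. An interval that includes 1 indicates the association is not statistically distinguishable from no effect.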

“In a real-world deployment, web search engines could serve as a filter to identify patients who would benefit from clinical screening,” White and colleagues wrote. “Health-conscious patients may volunteer to receive alerts if concerning activity is detected. Communicating early detection outcomes with searchers without causing unnecessary alarm and associated costs needs more attention.” – by Andy Polhamus

Disclosure: The researchers report no relevant financial disclosures.