Fact checked by Shenaz Bagha

December 23, 2024

Google, X data useful in tracking seasonal allergy trends


Key takeaways:

  • Twitter and Google data had a significant relationship with ED records.
  • Future research could focus on predicting allergy trends in response to climate change.

Using online data surveillance methods on seasonal allergy trends can help in predictive modeling, according to a study in PNAS Nexus.

Elias Stallard-Olivera, a PhD student in ecology and evolutionary biology at the University of Colorado, Boulder, and Noah Fierer, PhD, a professor of ecology and evolutionary biology at CU Boulder, wrote that even though more than a quarter of adults in the U.S. suffer from seasonal allergies, there is a knowledge gap in their spatiotemporal patterns.

The research team hopes to expand their data collection to other online media sources such as Facebook. Image: Adobe Stock

The gap exists because allergy symptoms are difficult to quantify and because mild symptoms rarely lead to hospital visits, according to the authors.

“There isn’t a good metric for measuring the intensity of seasonal allergies,” Stallard-Olivera said in a press release related to the study. “Traditional allergy prediction methods, like pollen counts, often fall short, making it essential to develop more reliable ways of identifying and tracking allergen sources.”

Methods

To validate the researchers’ approach, the study compared data extracted from Twitter (known as X since July 2023) posts and Google searches against hospital record data from certain counties in California.

Hospital ED records with seasonal allergy-related ICD-10 codes from January 1, 2016, to December 31, 2020, were obtained from the California Department of Health Care Access and Information (HCAI). Data from 2020 were discarded due to the COVID-19 pandemic.

To extract data from Twitter, Stallard-Olivera and Fierer developed a list of terms and hashtags that would indicate the person posting was experiencing seasonal allergy symptoms. Noting that only 1% to 3% of all Twitter posts use geotags, the researchers found 67,033 posts with geotags that met their criteria.
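As a rough illustration of this kind of term-and-geotag filter, the Python sketch below keeps only posts that carry a geotag and mention at least one allergy-related term. The term list, the `text` and `geo` field names and the sample posts are all hypothetical; the article does not report the study's actual terms or data format.

```python
import re

# Hypothetical term list; the study's actual terms and hashtags are not given here.
ALLERGY_TERMS = ["allergies", "hay fever", "pollen", "#allergyseason"]

# One case-insensitive pattern that matches any of the terms.
_PATTERN = re.compile("|".join(re.escape(t) for t in ALLERGY_TERMS), re.IGNORECASE)


def is_candidate(post: dict) -> bool:
    """Keep a post only if it has a geotag and mentions at least one term."""
    has_geotag = post.get("geo") is not None
    return has_geotag and bool(_PATTERN.search(post.get("text", "")))


# Made-up sample posts; "geo" is an assumed (latitude, longitude) field.
posts = [
    {"text": "My allergies are killing me today", "geo": (34.05, -118.24)},
    {"text": "Pollen everywhere #allergyseason", "geo": None},
    {"text": "Great weather for a hike", "geo": (39.74, -104.99)},
]

candidates = [p for p in posts if is_candidate(p)]
print(len(candidates))
```

A keyword match like this only produces candidates. Posts that mention a term without describing the poster's own symptoms would still need to be screened out in a later step.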

The same data collection method was used for the nationwide analysis within the continental United States. However, since the nationwide analysis was not compared with ED records, posts made during the COVID-19 pandemic were included.

Machine learning was used to identify Twitter posts that were accurate and related to users experiencing seasonal allergies. To build an annotated test and training set, researchers took 4,000 Twitter posts from the national dataset and labeled them as either relevant or irrelevant. They also built a time series for each county in California that measured daily allergy Twitter posts.
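A per-county daily time series of this kind can be sketched in a few lines of Python. The example assumes each post has already been labeled relevant or irrelevant by a classifier and assigned to a county; the tuple layout, the function name `daily_counts` and the sample data are illustrative assumptions, not the study's code.

```python
from collections import defaultdict
from datetime import date


def daily_counts(posts, start, end):
    """Return {county: [relevant-post count for each day from start to end]}."""
    n_days = (end - start).days + 1
    series = defaultdict(lambda: [0] * n_days)
    for county, day, relevant in posts:
        # Count only posts labeled relevant that fall inside the date window.
        if relevant and start <= day <= end:
            series[county][(day - start).days] += 1
    return dict(series)


# Made-up posts as (county, date, relevant-label) tuples.
posts = [
    ("Los Angeles", date(2019, 4, 1), True),
    ("Los Angeles", date(2019, 4, 1), True),
    ("Los Angeles", date(2019, 4, 3), True),
    ("Fresno", date(2019, 4, 2), True),
    ("Fresno", date(2019, 4, 2), False),
]

series = daily_counts(posts, date(2019, 4, 1), date(2019, 4, 3))
print(series)
```

Days with no relevant posts stay at zero, so every county that appears in the output has a series of the same length and can be compared day by day with other sources.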

Data from Google searches were extracted based on the daily probability that seasonal allergy-related terms were used in searches. The list of terms included keywords and Google Trends’ Freebase ID codes.

The research team looked at codes that included seasonal allergies, antihistamines, environmental allergies and other topics related to searches for seasonal allergy information. These data were taken from the Google Extended Trends for Health (GT-E) API.

Nielsen designated market areas (DMAs), which are used to measure local television consumption, served as the spatial resolution for a second dataset of daily DMA-level Twitter and ED counts to compare with GT-E data.

California dataset

Twitter posts and Google searches were cointegrated with hospital ED records and with each other across the majority of California counties and DMAs, according to the research team’s analysis. Cointegration indicates that the datasets share a long-term equilibrium, so internet data and hospital ED record data do not drift apart over time.

Researchers also found a significant linear relationship between seasonal allergy-related online activity and hospital ED records. They noted that this suggests a close alignment between these occurrences. Pearson correlations were greater than 0.5 in California counties that had higher populations, which again demonstrated a significant time-independent relationship between internet activity (Twitter posts and Google searches) and ED records.
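For readers unfamiliar with the statistic, the Pearson correlation between two aligned daily series can be computed as in the Python sketch below. The numbers are invented for illustration and are not from the study, and the function name `pearson` is a hypothetical helper.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (spread_x * spread_y)


# Invented daily counts for one county: allergy posts and allergy ED visits.
tweets = [2, 5, 9, 14, 8, 3]
ed_visits = [1, 3, 6, 9, 6, 2]

r = pearson(tweets, ed_visits)
print(round(r, 2))
```

Values near 1 mean the two series rise and fall together. The threshold of 0.5 mentioned above is the level the researchers reported for more populous California counties.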

Nationwide dataset

The nationwide analysis used posts from 144 counties with populations higher than 500,000 across the continental U.S. Researchers emphasized that less populous counties did not have enough geolocated Twitter data, but the 144 counties used in analysis represented over half of the U.S. population.

Data showed that each county experienced a spring allergy season from March to May each year, as well as a fall allergy season from September to October. The timing of when allergy seasons began and ended varied widely across regions, with earlier peaks occurring at more southerly latitudes.

For example, peak allergy season occurs in March in Florida and in April and May in the Midwest and Northeast. Colorado, Florida and Texas also experience small upticks in allergies during the winter months.

Allergy season intensity also varied year-to-year. The Twitter-based model allowed the research team to quantify interannual spatiotemporal trends based on time and allergy intensity across locations and years.

The Southeastern U.S. experienced the most severe seasonal allergies, and Florida and Southern California experienced the least severe seasonal allergies. Within California, the Central Valley experienced the most intense allergies.

The fall allergy season in Southern California varies so much, the authors continued, that in some years it is nearly absent despite relatively strong seasons on average.

In conclusion

Stallard-Olivera and Fierer further emphasized the significant relationship present between seasonal allergy-related online data and hospital ED records. They explained that internet tracking methods such as these could allow researchers in the future to predict allergy trends in response to climate change. Data from Facebook and other online media could be used to improve accuracy.

“We are just thankful that so many allergy sufferers are willing to complain about their allergies on social media,” Fierer said in the press release.

“Now that we know when allergies are most likely to spike, the next step is to better determine the specific triggers of allergies and why the spikes occur when they do.”
