Fact checked by Shenaz Bagha

March 18, 2025

‘Pitch-shifting’ protects speakers’ identities in voice data for cognitive assessments

Key takeaways:

  • Researchers developed a computational framework to preserve the confidentiality of voice data used for cognitive assessments.
  • The method accurately determined cognitive states in more than 60% of participants.

A “pitch-shifting” technique may help mitigate privacy risks of digital voice analyses while preserving features needed to determine cognitive health, according to researchers.

“Research has demonstrated that digital voice measures can detect early signs of cognitive decline by analyzing features such as speech rate, articulation, pitch variation and pauses, which may signal cognitive impairment when deviating from normative patterns,” Vijaya B. Kolachalama, PhD, FAHA, associate professor of medicine at Boston University’s Chobanian & Avedisian School of Medicine, and colleagues wrote in Alzheimer’s & Dementia.

New research from Boston University suggests that digital voice analysis may be a reliable, noninvasive method for determining cognitive status, despite concerns over privacy. Image: Adobe Stock

Kolachalama and colleagues developed a computational framework that can alter the pitch of a sound to protect the identity of speakers who provide voice data for cognitive assessments. The framework also incorporates modifications like speed alteration and noise addition, according to a press release related to the study.
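
The three modifications named above can be illustrated with a minimal sketch. It is not the researchers' framework, whose implementation is not described here. It assumes a naive resampling approach, which raises or lowers pitch but also changes playback speed at the same time, and a simple Gaussian noise model. All function names, the 16 kHz sample rate and the parameter values are hypothetical.

```python
import math
import random


def make_tone(freq_hz, duration_s, sample_rate=16000):
    """Generate a pure sine tone as a stand-in for a voice recording."""
    n = int(duration_s * sample_rate)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


def pitch_shift_by_resampling(samples, semitones):
    """Naive pitch shift: read the signal faster or slower by linear interpolation.

    Assumption: this is the simplest possible approach. It changes duration
    along with pitch, so it combines pitch shifting and speed alteration.
    """
    factor = 2.0 ** (semitones / 12.0)  # +12 semitones doubles the frequency
    out_len = int(len(samples) / factor)
    out = []
    for i in range(out_len):
        pos = i * factor
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append((1 - frac) * samples[lo] + frac * samples[hi])
    return out


def add_noise(samples, noise_std, seed=0):
    """Add zero-mean Gaussian noise; the seed makes the result repeatable."""
    rng = random.Random(seed)
    return [s + rng.gauss(0.0, noise_std) for s in samples]


def zero_crossing_freq(samples, sample_rate=16000):
    """Rough frequency estimate from upward zero crossings (clean tones only)."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if a < 0 <= b)
    return crossings * sample_rate / len(samples)


# Hypothetical usage: shift a 200 Hz tone up 4 semitones, then add light noise.
tone = make_tone(200, 1.0)
obfuscated = add_noise(pitch_shift_by_resampling(tone, 4), noise_std=0.01)
```

Audio libraries typically shift pitch without changing duration, using more involved signal processing than this sketch.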

The researchers then conducted a study to examine the tool’s ability to distinguish between normal cognition, mild cognitive impairment and dementia.

The computational framework was applied to voice recordings of 128 individuals from the Framingham Heart Study (FHS), stored as WAV files, and of 85 individuals from the DementiaBank Delaware (DBD) corpus, recorded as MP3 files and converted to WAV. All participants had their spoken responses to neuropsychological tests recorded.

The researchers measured voice obfuscation via equal error rate (EER) and used machine learning to assess diagnostic utility.
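
Equal error rate is the operating point of a speaker-matching system at which the rate of falsely accepted impostors equals the rate of falsely rejected genuine speakers. The sketch below shows one simple way to estimate it from two lists of similarity scores. It is an illustration under stated assumptions, not the study's evaluation code: the function name, the threshold sweep and the example scores are all hypothetical.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Estimate EER by sweeping a decision threshold over all observed scores.

    genuine_scores: similarity scores for same-speaker comparisons.
    impostor_scores: similarity scores for different-speaker comparisons.
    A comparison is accepted when its score is at or above the threshold.
    """
    thresholds = sorted(set(genuine_scores) | set(impostor_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # False acceptance rate: impostors scoring at or above the threshold.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        # False rejection rate: genuine speakers scoring below the threshold.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Report the midpoint where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Made-up scores for illustration only.
genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.6, 0.3, 0.2, 0.1]
print(equal_error_rate(genuine, impostor))
```

In a voice obfuscation setting, a higher EER generally means a matching system finds it harder to link an altered recording back to its speaker.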

When incorporating the top 20 acoustic features that the researchers said are relevant to cognitive assessment, they found that the framework operated with 62.2% accuracy (EER = 0.3335) in cognitive differentiation on the FHS dataset and 63.7% (EER = 0.1796) on the DBD dataset.

Kolachalama and colleagues noted that the FHS dataset had a longer average speech duration than the DBD dataset (74.36 ± 26.62 minutes vs. 10.81 ± 4.82 minutes), which resulted in a comparatively broader acoustic range with fuller vocal characteristics.

“By leveraging techniques such as pitch-shifting as a means of voice obfuscation, we demonstrated the ability to mitigate privacy risks while preserving the diagnostic value of acoustic features,” Kolachalama said in the release.

Reference:

BU researchers develop computational tools to safeguard privacy without degrading voice-based cognitive markers. https://www.eurekalert.org/news-releases/1076857. Published March 14, 2025. Accessed March 14, 2025.