Speech Analysis

Understanding relevant speech features to characterize the potential onset of Alzheimer's Disease

Alzheimer's Disease is a neurodegenerative disorder that causes progressive damage to the brain. As the disease progresses, it affects cognitive abilities, including memory, language, and communication.

Therefore, analyzing speech can help detect cognitive decline that may lead to the onset of Alzheimer's Disease.

More specifically, research shows that linguistic and acoustic features from speech can be used as potential biomarkers for the early detection of Alzheimer's Disease. Changes in these features can indicate cognitive decline, and tracking these changes over time allows for monitoring the disease progression.

The following work has been conducted on the training set of the ADReSSo Challenge 2021, which consists of 166 speech recordings in which 79 healthy control participants and 87 "Probable Alzheimer's Disease" participants perform the Cookie Theft picture description task.

Linguistic Features

Linguistic features of speech refer to the specific characteristics of language used to understand how people communicate, like word choice or sentence structure. 

Lexical Diversity

People with Alzheimer's Disease tend to speak with a more limited range of words than before, since the disease affects their ability to remember and use different words.

Lexical diversity can be measured by counting the number of different words used in the produced speech. Individuals affected by Alzheimer's Disease may use fewer unique words than healthy controls.

Figure 1 - Boxplots of the number of unique words in the picture description task to evaluate lexical diversity, for both Control and AD subjects. Control speech has an average number of unique words of 53.18 (standard deviation: 20.81), while AD speech has an average of 41.61 (standard deviation: 17.26).
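The unique-word count described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used for the figure: the function name, the lowercasing, and the tokenization rule (alphabetic tokens, with an optional apostrophe inside a word) are assumptions.

```python
import re


def unique_word_count(transcript: str) -> int:
    """Count distinct word forms in a transcript, ignoring case.

    Assumption: a "word" is a run of letters, optionally containing one
    apostrophe (e.g. "don't"). No stemming or lemmatization is applied.
    """
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", transcript.lower())
    return len(set(words))


# Hypothetical transcript fragment, for illustration only.
sample = "The boy is on the stool and the stool is tipping"
print(unique_word_count(sample))
```

Because longer descriptions naturally contain more distinct words, a raw count like this one also reflects how much a participant says, which is worth keeping in mind when comparing groups.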

Sentiment Expression: Valence & Arousal

Individuals with Alzheimer's Disease can experience a blunting of affect, which can lead to a reduction of expressed sentiments. This may result in less emotionally charged descriptions: speech can show reduced valence and arousal expression.

Valence refers to the positive or negative emotional content of a stimulus, while arousal refers to the intensity associated with an emotion (e.g. excitement). Since brain regions involved in emotion regulation and processing can be affected by Alzheimer's Disease, expressing or experiencing a full range of emotions can become more challenging. Therefore, individuals with Alzheimer's Disease can show reduced emotional reactivity, that is, less valence and arousal expression.

Figure 2 - Sentiment Analysis boxplots of valence (left) and arousal (right) expression scores, for both Control and AD subjects. Control speech has an average valence expression of 0.52 (standard deviation: 0.06), while AD speech has an average of 0.49 (standard deviation: 0.08). Regarding arousal, control speech has an average arousal expression of 0.53 (standard deviation: 0.06), while AD speech has an average of 0.49 (standard deviation: 0.08).

Nouns VS Pronouns Usage

As Alzheimer's Disease progresses, individuals may experience increasing difficulties with word retrieval and entity naming, which can result in using fewer nouns and more pronouns.

The noun-to-pronoun ratio is the number of nouns (e.g., dog, kitchen, house) divided by the number of pronouns (e.g., he, she, they) used in speech.

Figure 3 - Boxplots of noun-to-pronoun ratios, for both Control and AD subjects. Control speech has an average noun-to-pronoun ratio of 2.12 (standard deviation: 1.23), while AD speech has an average of 1.71 (standard deviation: 1.79).

Being affected by Alzheimer's Disease can lead to a decrease of this noun-to-pronoun ratio. This change may be particularly evident in conversations or tasks that require the use of specific nouns, such as describing a picture or recalling a specific event.
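As a rough sketch of how the noun-to-pronoun ratio could be computed, the function below takes tokens that have already been part-of-speech tagged. The tag names ("NOUN", "PRON") follow the Universal POS convention and are an assumption, as are the function name and the choice to return NaN when no pronoun is present; the source does not say which tagger or convention was used.

```python
from typing import Iterable, Tuple


def noun_pronoun_ratio(tagged_tokens: Iterable[Tuple[str, str]]) -> float:
    """Return (number of nouns) / (number of pronouns).

    Assumption: each item is a (word, tag) pair with Universal POS tags.
    Returns NaN when the transcript contains no pronoun.
    """
    nouns = 0
    pronouns = 0
    for _word, tag in tagged_tokens:
        if tag == "NOUN":
            nouns += 1
        elif tag == "PRON":
            pronouns += 1
    if pronouns == 0:
        return float("nan")
    return nouns / pronouns


# Hypothetical, hand-tagged fragment for illustration only.
tagged = [
    ("the", "DET"), ("boy", "NOUN"), ("takes", "VERB"), ("a", "DET"),
    ("cookie", "NOUN"), ("while", "SCONJ"), ("she", "PRON"),
    ("washes", "VERB"), ("dishes", "NOUN"),
]
print(noun_pronoun_ratio(tagged))
```

In practice the (word, tag) pairs would come from a part-of-speech tagger applied to the transcript.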

Word Length

People with Alzheimer's Disease are more likely to use shorter words than before, since the disease causes difficulties remembering and using complex words.

The average word length refers to the average number of letters in words used in speech. As the neurodegenerative disease progresses, individuals may have increasing difficulties with word retrieval and language production, resulting in further reductions in the average word length.

Figure 4 - Boxplots of average word length values, for both Control and AD subjects. Control speech has an average word length of 3.20 (standard deviation: 0.16), while AD speech has an average of 3.05 (standard deviation: 0.18).
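The average word length defined above can be illustrated with a short Python sketch. The function name and the tokenization choices (splitting on whitespace, counting only alphabetic characters so that punctuation and apostrophes do not add to a word's length) are assumptions made for this example.

```python
def average_word_length(transcript: str) -> float:
    """Mean number of letters per word in a transcript.

    Assumption: words are whitespace-separated, and only alphabetic
    characters count towards a word's length. Returns 0.0 when the
    transcript contains no word.
    """
    lengths = []
    for token in transcript.split():
        n_letters = sum(ch.isalpha() for ch in token)
        if n_letters > 0:
            lengths.append(n_letters)
    if not lengths:
        return 0.0
    return sum(lengths) / len(lengths)


# Hypothetical transcript fragment, for illustration only.
sample = "the girl is reaching for a cookie"
print(round(average_word_length(sample), 2))
```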

Acoustic Features

Acoustic features of speech refer to the physical aspects of the signal produced by speech that can be analyzed, such as energy, amplitude or frequency. 

Quality of Pronunciation

Alzheimer's Disease can affect the ability to articulate words correctly, leading to changes in the pronunciation of certain words. 

To measure pronunciation quality, the confidence level of the Automatic Speech Recognition (ASR) system can be used: it gives an indication of how accurately the system is able to transcribe the spoken words.

Figure 1 - Boxplots of the ASR system's confidence level to evaluate the quality of pronunciation, for both Control and AD subjects. Control speech has an average confidence level of -0.39 (standard deviation: 0.11), while AD speech has an average of -0.44 (standard deviation: 0.14). Speech samples with a confidence level below -1.0 are considered failed transcriptions.

More precisely, ASR systems are designed to recognize patterns in speech and convert them into text. The confidence level of an ASR system is a measure of how confident the system is in its transcription of the speech. Thus, speech with a low quality of pronunciation is likely to be harder for the ASR system to transcribe, which results in a lower confidence level.
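A recording-level confidence score could be derived along the following lines. This sketch assumes the ASR system exposes one confidence score per transcribed segment (for instance an average log-probability, which would be consistent with the negative values reported above, although the source does not name the system or the score). The function names and the averaging step are illustrative assumptions; only the -1.0 failure threshold comes from the Figure 1 caption.

```python
from statistics import mean
from typing import Sequence

# Threshold from the figure caption: below this, transcription is
# considered failed.
FAILURE_THRESHOLD = -1.0


def recording_confidence(segment_scores: Sequence[float]) -> float:
    """Average the per-segment confidence scores of one recording.

    Assumption: the ASR system returns one score per segment.
    """
    return mean(segment_scores)


def is_failed(confidence: float, threshold: float = FAILURE_THRESHOLD) -> bool:
    """Flag a recording whose confidence falls below the threshold."""
    return confidence < threshold


# Hypothetical per-segment scores, for illustration only.
scores = [-0.30, -0.45, -0.42]
confidence = recording_confidence(scores)
print(round(confidence, 2), is_failed(confidence))
```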

Loudness: General Level & Variability

Individuals with Alzheimer's Disease can exhibit changes in their speech volume, including decreased general loudness level and reduced loudness variability.

This can be related to changes in the brain regions responsible for controlling vocal production. Furthermore, it may also be related to changes in their emotional state and cognitive processing abilities. For example, people with Alzheimer's Disease may exhibit decreased emotional expression, leading to reduced loudness and loudness variability in speech.

Figure 2 - Boxplots of loudness median values (left) and number of loudness peaks per second (right) to evaluate speech loudness, for both Control and AD subjects. Control speech has an average loudness median of 0.51 (standard deviation: 0.41), while AD speech has an average of 0.33 (standard deviation: 0.26). Regarding the number of loudness peaks per second, control speech has an average number of 1.69 (standard deviation: 0.69), while AD speech has an average of 1.46 (standard deviation: 0.63).
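The two loudness statistics in the figure can be sketched as follows, given a loudness contour sampled at a fixed frame rate. The function name, the simple local-maximum rule used to define a peak, and the input format are assumptions; the source does not describe how loudness was extracted or how peaks were detected.

```python
from statistics import median
from typing import Sequence, Tuple


def loudness_summary(
    loudness: Sequence[float], frames_per_second: float
) -> Tuple[float, float]:
    """Return (median loudness, loudness peaks per second).

    Assumption: a frame is a peak when it is strictly greater than the
    previous frame and at least as large as the next one.
    """
    if not loudness:
        raise ValueError("loudness contour is empty")
    peaks = 0
    for i in range(1, len(loudness) - 1):
        if loudness[i] > loudness[i - 1] and loudness[i] >= loudness[i + 1]:
            peaks += 1
    duration_s = len(loudness) / frames_per_second
    return median(loudness), peaks / duration_s


# Hypothetical contour: 8 frames at 4 frames per second (2 seconds).
contour = [0.1, 0.5, 0.2, 0.6, 0.3, 0.2, 0.4, 0.1]
med, peaks_per_s = loudness_summary(contour, frames_per_second=4)
print(med, peaks_per_s)
```

A real contour is usually noisier than this toy example, so a minimum peak height or some smoothing would typically be applied before counting peaks.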

Speech Consistency

Changes in the vocal tract and vocal folds caused by Alzheimer's Disease can lead to less consistent speech, and therefore to more variability in spectral flux.

Spectral flux variability is a measure of how unevenly the spectral content of speech, which reflects its pitch and energy, changes over time. By taking its inverse, the consistency of speech can be measured.

Individuals with Alzheimer's Disease have increased variability in their spectral flux, compared to healthy individuals. This means that their spectral flux values are more spread out and less consistent across time.

Figure 3 - Boxplots of spectral flux standard deviation inverse values to evaluate the consistency of speech, for both Control and AD subjects. Control speech has an average spectral flux inverse deviation of 0.75 (standard deviation: 0.22), while AD speech has an average of 0.64 (standard deviation: 0.17).
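To make the consistency measure concrete, the sketch below computes spectral flux from a sequence of magnitude spectra (one per analysis frame) and then takes the inverse of its standard deviation. It assumes flux is defined as the Euclidean distance between consecutive spectra, which is one common definition; the source does not specify the exact formula, frame settings, or normalization, and the function names are hypothetical.

```python
import math
from statistics import pstdev
from typing import List, Sequence


def spectral_flux(spectra: Sequence[Sequence[float]]) -> List[float]:
    """Euclidean distance between each pair of consecutive spectra."""
    return [
        math.sqrt(sum((b - a) ** 2 for a, b in zip(prev, cur)))
        for prev, cur in zip(spectra, spectra[1:])
    ]


def speech_consistency(spectra: Sequence[Sequence[float]]) -> float:
    """Inverse of the standard deviation of spectral flux.

    Assumption: population standard deviation; returns infinity when
    the flux is perfectly constant.
    """
    flux = spectral_flux(spectra)
    if len(flux) < 2:
        raise ValueError("need at least three frames")
    sd = pstdev(flux)
    return float("inf") if sd == 0 else 1.0 / sd


# Hypothetical magnitude spectra: 4 frames, 2 frequency bins each.
frames = [[1.0, 0.0], [1.0, 0.0], [0.0, 0.0], [0.0, 2.0]]
print(spectral_flux(frames), round(speech_consistency(frames), 3))
```

A higher value means the frame-to-frame spectral changes are more uniform, matching the reading of the boxplots above.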

Voice Pitch & Resonance

Changes in the vocal tract and vocal folds can also affect the way sound is produced, resulting in more breathy or strained speech, with a higher-pitched and less resonant voice.

The F1/F0 amplitude ratio measures the relationship between the pitch of a person's voice (F0) and the shape of their mouth and throat (F1).

Figure 4 - Boxplots of F1/F0 amplitude ratio mean values to evaluate the relationship between voice pitch and resonance, for both Control and AD subjects. Control speech has an average F1/F0 amplitude mean value of -82.62 (standard deviation: 32.24), while AD speech has an average of -94.36 (standard deviation: 35.21).

For individuals with Alzheimer's Disease, the F1/F0 amplitude ratio is lower than for healthy individuals. This suggests that the F1 amplitude (related to the vocal tract) is reduced relative to the F0 amplitude (related to the fundamental frequency), which is consistent with a higher-pitched, less resonant voice.
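As an illustration of how such an amplitude ratio behaves, the sketch below expresses it on a decibel scale. This is an assumption: the source does not state the scale or the extraction tool, although a logarithmic ratio would be consistent with the negative values reported in Figure 4. The function name and inputs are hypothetical.

```python
import math


def amplitude_ratio_db(f1_amplitude: float, f0_amplitude: float) -> float:
    """Ratio of the F1 amplitude to the F0 amplitude, in decibels.

    Assumption: both amplitudes are positive linear values, and the
    ratio is reported as 20 * log10(F1 / F0).
    """
    if f1_amplitude <= 0 or f0_amplitude <= 0:
        raise ValueError("amplitudes must be positive")
    return 20.0 * math.log10(f1_amplitude / f0_amplitude)


# Hypothetical amplitudes: F1 ten times weaker than F0.
print(round(amplitude_ratio_db(0.1, 1.0), 1))
```

On this scale, a more negative value means the F1 amplitude is weaker relative to the F0 amplitude, and a value of 0 means the two are equal.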

REFERENCES: