The following work was conducted on the training set of the ADReSSo Challenge 2021, which consists of 166 speech recordings in which 79 healthy control and 87 "Probable Alzheimer's Disease" participants perform the Cookie Theft picture description task.
Linguistic features of speech refer to the specific characteristics of language used to understand how people communicate, like word choice or sentence structure.
Lexical diversity can be measured by counting the number of different words used in produced speech. Individuals affected by Alzheimer's Disease may use fewer unique words than healthy controls.
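As a rough illustration of the counting described above, here is a minimal sketch. The function names and the tokenization rule (lowercased runs of letters and apostrophes) are assumptions made for this example, and the type-token ratio is included only as one common way to normalize for transcript length; neither is confirmed as the method used in this work.

```python
import re


def _tokenize(transcript: str) -> list:
    # Assumed tokenization: lowercase, keep runs of letters and apostrophes.
    return re.findall(r"[a-z']+", transcript.lower())


def unique_word_count(transcript: str) -> int:
    """Number of distinct word forms in the transcript."""
    return len(set(_tokenize(transcript)))


def type_token_ratio(transcript: str) -> float:
    """Distinct words divided by total words (0.0 for an empty transcript)."""
    words = _tokenize(transcript)
    return len(set(words)) / len(words) if words else 0.0


sample = "The boy is on the stool and the stool is falling"
print(unique_word_count(sample))
print(type_token_ratio(sample))
```

The raw count grows with the length of the description, so comparing speakers on the ratio rather than the count may be preferable when recordings differ in duration.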
Valence refers to the positive or negative emotional content of a stimulus, while arousal refers to the intensity associated with an emotion (e.g. excitement). Since brain regions involved in emotion regulation and processing can be affected by Alzheimer's Disease, expressing or experiencing a full range of emotions can become more challenging. Therefore, individuals with Alzheimer's Disease can show reduced emotional reactivity, e.g. lower expressed valence and arousal.
The noun-to-pronoun ratio is the ratio of nouns (e.g., dog, kitchen, house) to pronouns (e.g., he, she, they) used in speech.
Being affected by Alzheimer's Disease can lead to a decrease in this noun-to-pronoun ratio. This change may be particularly evident in conversations or tasks that require the use of specific nouns, such as describing a picture or recalling a specific event.
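The ratio can be sketched as follows, assuming the transcript has already been part-of-speech tagged. The input format (pairs of token and Universal POS tag such as "NOUN" or "PRON"), the function name, and the handling of transcripts without pronouns are all assumptions of this example; the tagger itself is out of scope here.

```python
from typing import Iterable, Tuple


def noun_to_pronoun_ratio(tagged: Iterable[Tuple[str, str]]) -> float:
    """Count of NOUN tags divided by count of PRON tags.

    `tagged` is a hypothetical pre-tagged transcript: (token, tag) pairs.
    When there are no pronouns, the denominator is clamped to 1 so the
    result stays finite (a choice made for this sketch only).
    """
    nouns = 0
    pronouns = 0
    for _token, tag in tagged:
        if tag == "NOUN":
            nouns += 1
        elif tag == "PRON":
            pronouns += 1
    return nouns / max(pronouns, 1)


specific = [("the", "DET"), ("girl", "NOUN"), ("takes", "VERB"),
            ("a", "DET"), ("cookie", "NOUN")]
vague = [("she", "PRON"), ("takes", "VERB"), ("it", "PRON")]

print(noun_to_pronoun_ratio(specific))
print(noun_to_pronoun_ratio(vague))
```

The two toy utterances describe the same event; the second replaces both nouns with pronouns and therefore yields a lower ratio.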
The average word length refers to the average number of letters in the words used in speech. As the neurodegenerative disease progresses, individuals may have increasing difficulty with word retrieval and language production, resulting in a shorter average word length.
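A minimal sketch of this measure is given below. The function name and the tokenization (runs of ASCII letters, so a contraction such as "boy's" counts as two words) are assumptions of this example rather than the procedure used in this work.

```python
import re


def average_word_length(transcript: str) -> float:
    """Mean number of letters per word; 0.0 if the transcript has no words."""
    words = re.findall(r"[A-Za-z]+", transcript)
    if not words:
        return 0.0
    return sum(len(word) for word in words) / len(words)


print(average_word_length("The mother is drying dishes"))
```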
Acoustic features of speech refer to the physical aspects of the signal produced by speech that can be analyzed, such as energy, amplitude or frequency.
To measure pronunciation quality, the confidence level of the Automatic Speech Recognition (ASR) system can be used: it gives an indication of how accurately the system is able to transcribe the spoken words.
More precisely, ASR systems are designed to recognize patterns in speech and convert them into text. The confidence level of an ASR system is a measure of how confident the system is in its transcription of the speech. Thus, speech with poor pronunciation is likely to be transcribed with lower confidence by the ASR system.
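One simple way to turn this idea into a single number per recording is to average the per-word confidences. The sketch below assumes the ASR output is available as (word, confidence) pairs with confidences between 0 and 1; that format, the function name, and the example values are hypothetical and do not describe any particular ASR system's API.

```python
from statistics import mean
from typing import Sequence, Tuple


def mean_asr_confidence(words: Sequence[Tuple[str, float]]) -> float:
    """Average per-word confidence of a transcription.

    `words` is a hypothetical ASR result: (word, confidence) pairs.
    """
    if not words:
        raise ValueError("transcription contains no words")
    return mean(confidence for _word, confidence in words)


# Illustrative values only, not real ASR output.
transcription = [("the", 0.98), ("cookie", 0.91), ("jar", 0.62)]
print(mean_asr_confidence(transcription))
```

A lower average would then be read as a sign that the recording was harder to transcribe, which the text above uses as a proxy for pronunciation quality.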
Reduced loudness and loudness variability in speech can be related to changes in the brain regions responsible for controlling vocal production. They may also be related to changes in emotional state and cognitive processing abilities: for example, people with Alzheimer's Disease may exhibit decreased emotional expression, which in turn reduces the loudness and loudness variability of their speech.
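Spectral flux measures how much the spectrum of the speech signal (the distribution of its energy across frequencies) changes from one frame to the next; spectral flux variability measures how much this quantity fluctuates over time. By taking its inverse, the consistency of speech can be measured.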
Individuals with Alzheimer's Disease have increased variability in their spectral flux, compared to healthy individuals. This means that their spectral flux values are more spread out and less consistent across time.
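The computation can be sketched as follows, assuming the magnitude spectrum of each short frame has already been extracted. The flux definition used here (Euclidean distance between consecutive spectra), the use of the standard deviation as the variability measure, and all function names are assumptions of this example; feature extraction toolkits may normalize differently, for instance by dividing the standard deviation by the mean.

```python
import math
from statistics import pstdev
from typing import List, Sequence


def spectral_flux(frames: Sequence[Sequence[float]]) -> List[float]:
    """Euclidean distance between each pair of consecutive magnitude spectra."""
    flux = []
    for previous, current in zip(frames, frames[1:]):
        squared = sum((c - p) ** 2 for p, c in zip(previous, current))
        flux.append(math.sqrt(squared))
    return flux


def flux_variability(frames: Sequence[Sequence[float]]) -> float:
    """Standard deviation of the flux values (needs at least two frames)."""
    return pstdev(spectral_flux(frames))


def flux_consistency(frames: Sequence[Sequence[float]]) -> float:
    """Inverse of the variability; infinite when the flux never varies."""
    variability = flux_variability(frames)
    return 1.0 / variability if variability > 0 else float("inf")


# Toy spectra with two frequency bins per frame.
varying = [[0.0, 0.0], [3.0, 4.0], [3.0, 4.0]]
print(spectral_flux(varying))
print(flux_variability(varying))
print(flux_consistency(varying))
```

Flux values that are spread out across time give a larger standard deviation and therefore a smaller consistency score.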
The F1/F0 amplitude ratio measures the relationship between the pitch of a person's voice (F0) and the shape of their mouth and throat (F1).
For individuals with Alzheimer's Disease, the F1/F0 amplitude ratio is lower than for healthy individuals. That is, the F1 amplitude (related to the vocal tract) is reduced relative to the F0 amplitude (related to the fundamental frequency), resulting in a less resonant voice.
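For a single frame, the ratio can be sketched as below, assuming the spectral amplitudes at F1 and at F0 have already been measured. Expressing the ratio in decibels and the function name are assumptions of this example; the text above does not state the unit used.

```python
import math


def f1_f0_amplitude_ratio_db(f1_amplitude: float, f0_amplitude: float) -> float:
    """Amplitude at the first formant relative to the amplitude at F0, in dB.

    Both inputs are linear amplitudes and must be strictly positive.
    """
    if f1_amplitude <= 0 or f0_amplitude <= 0:
        raise ValueError("amplitudes must be positive")
    return 20.0 * math.log10(f1_amplitude / f0_amplitude)


# Illustrative amplitudes: F1 at half the amplitude of F0.
print(f1_f0_amplitude_ratio_db(0.5, 1.0))
```

A negative value means the first formant carries less amplitude than the fundamental; a per-recording feature would typically aggregate such frame-level values, for example by averaging over voiced frames.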
REFERENCES:
Qiao et al., "Alzheimer’s Disease Detection from Spontaneous Speech through Combining Linguistic Complexity and (Dis)Fluency Features with Pretrained Language Models" (2021)
Luz et al., "Detecting cognitive decline using speech only: The ADReSSo Challenge" (2021)
Fraser et al., "Linguistic features identify Alzheimer’s disease in narrative speech" (2015)