Developing a spontaneous speech-based artificial intelligence for Alzheimer's disease detection

Laurine Dargaud, Abhista Partal, Anton Birn, Nicki Skafte Detlefsen


Background: Alzheimer's Disease (AD) is a neurodegenerative syndrome affecting over 35 million elderly people worldwide and ranking, with other dementia forms, as the seventh leading cause of death. Since the FDA has recently accepted promising treatments for reducing cognitive decline, there is a growing need for fast and reliable detection of AD. Meanwhile, research shows spontaneous speech provides valuable insights into brain cognitive abilities, as AD patients exhibit subtle speech alterations that a trained speech-based AI can recognize.

Objective: This study aims to present a machine learning model developed for the detection of AD based on picture description speech samples. 

Method: The proposed model processes speech to predict the patient's state (Healthy/AD) and the corresponding class probabilities. It follows a multi-modal approach, analyzing both linguistic (what is said) and acoustic (how it is said) features extracted from pre-trained language models (BERT and GPT-3) and an automatic speech recognition model (Whisper). The dataset used is the AD classification dataset, from the ADReSSo Challenge, which contains 237 Cookie Theft picture description recordings produced by cognitively normal subjects and AD-diagnosed patients. 166 recordings are used for training (with 52% AD), and 71 for testing (with 50% AD). Leave-One-Out cross-validation is used for training, feature selection, and model validation. The final prediction is determined using a soft majority voting system with 12 classifiers.
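The soft majority voting step can be illustrated with a minimal sketch. It assumes each of the 12 classifiers outputs a pair of class probabilities [P(Healthy), P(AD)] and that the ensemble averages them with equal weights; the weighting scheme, the function name `soft_vote`, and the probability values below are hypothetical and are not taken from the study.

```python
from statistics import fmean

CLASSES = ("Healthy", "AD")


def soft_vote(probabilities):
    """Average per-classifier class probabilities and pick the most likely class.

    `probabilities` is a list with one [P(Healthy), P(AD)] pair per classifier.
    Returns the predicted label and the averaged class probabilities.
    """
    if not probabilities:
        raise ValueError("need at least one classifier output")
    n_classes = len(CLASSES)
    # Equal-weight average of each class probability across classifiers (assumption).
    mean_probs = [fmean(p[k] for p in probabilities) for k in range(n_classes)]
    best = max(range(n_classes), key=lambda k: mean_probs[k])
    return CLASSES[best], mean_probs


# Hypothetical outputs of 12 classifiers for a single recording (illustrative only).
example = [
    [0.30, 0.70], [0.45, 0.55], [0.60, 0.40], [0.20, 0.80],
    [0.55, 0.45], [0.35, 0.65], [0.40, 0.60], [0.25, 0.75],
    [0.65, 0.35], [0.30, 0.70], [0.50, 0.50], [0.40, 0.60],
]
label, mean_probs = soft_vote(example)
print(label, [round(p, 4) for p in mean_probs])
```

Unlike hard voting, which counts one vote per classifier, averaging probabilities lets confident classifiers outweigh uncertain ones and yields the class probabilities reported alongside the prediction.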

Results: On the unseen test set, the proposed model achieved an accuracy of 87.32% (ROC AUC = 94%), an F1-score of 86.57%, a specificity of 91.67%, and a sensitivity of 82.86%. An absolute Pearson correlation coefficient of 0.725 with a standard cognitive test, the Mini-Mental State Examination (MMSE), is reached.

Conclusions: This study shows the potential of AI-empowered, speech-based cognitive assessment tools, built on language models and automatic speech recognition, to detect AD, yielding promising results similar to existing pen-and-paper tests.

Keywords: artificial intelligence, large language model, automatic speech recognition, Alzheimer’s disease, speech biomarker


In this work, the dataset from the ADReSSo Challenge, created for the Alzheimer’s disease classification task, is used.

This dataset contains speech recordings of English-language descriptions of the Cookie Theft picture from the Boston Diagnostic Aphasia Examination.

Participants fall into two distinct groups: healthy control individuals with normal cognition (HC) and individuals diagnosed with Alzheimer's Disease (AD). A total of 237 participants contributed to this dataset, with 166 individuals included in the training subset and 71 in the test subset (a ratio of approximately 70:30). To reduce bias during training, the two groups are matched on age and gender using propensity scores.

“Cookie Theft” picture, from the Boston Diagnostic Aphasia Examination


The final multi-modal model reaches an accuracy of 87.32% (F1-score: 86.57%), with a high specificity of 91.67% and a sensitivity of 82.86% on the unseen test set. In addition, a Youden’s J index of 74.52% is obtained (sensitivity + specificity − 1), a common summary measure of a diagnostic test's performance in the medical field.
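As a worked check, the reported metrics can be recomputed from confusion-matrix counts. The counts used below (29 true positives, 6 false negatives, 33 true negatives, 3 false positives, with AD as the positive class) are not stated in the text; they are an assumption, inferred as integer counts that reproduce the reported percentages on a 71-recording test set.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Compute standard binary diagnostic metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # true positive rate (AD correctly detected)
    specificity = tn / (tn + fp)          # true negative rate (HC correctly cleared)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    f1 = 2 * tp / (2 * tp + fp + fn)      # harmonic mean of precision and recall
    youden_j = sensitivity + specificity - 1
    return {
        "accuracy": accuracy,
        "f1": f1,
        "specificity": specificity,
        "sensitivity": sensitivity,
        "youden_j": youden_j,
    }


# Assumed counts, inferred from the reported percentages (not given in the source).
metrics = diagnostic_metrics(tp=29, fn=6, tn=33, fp=3)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Under these assumed counts, the five values round to the figures quoted above, which suggests the reported metrics are mutually consistent.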

Confusion Matrix on the test set