Using vocal characteristics to classify psychological distress in adult helpline callers: Retrospective observational study
Iyer, R., Nedeljkovic, M., & Meyer, D.
Background: Elevated psychological distress has demonstrated impacts on individuals’ health. Reliable and efficient ways to detect distress are key to early intervention. Artificial intelligence has the potential to detect states of emotional distress in an accurate, efficient, and timely manner.
Objective: The aim of this study was to automatically classify short segments of speech obtained from callers to national suicide prevention helpline services as showing high versus low psychological distress, using a range of vocal characteristics in combination with machine learning approaches.
Methods: A total of 120 telephone call recordings were first converted to 16-bit pulse code modulation format. Short variable-length segments of each call were rated on psychological distress using the distress thermometer by the responding counselor and by a second team of psychologists (n=6) blinded to the initial ratings. Following this, 24 vocal characteristics were extracted from 40-ms speech frames nested within segments within calls. After highly correlated variables were eliminated, 19 remained. Of these 19 vocal characteristics, 7 were identified and validated as predictors of psychological distress using a penalized generalized additive mixed effects regression model, accounting for nonlinearity, autocorrelation, and moderation by sex. Speech frames were then grouped using k-means clustering based on the selected vocal characteristics. Finally, component-wise gradient boosting incorporating these clusters was used to classify each speech frame according to high versus low psychological distress. Classification accuracy was confirmed via leave-one-caller-out cross-validation, ensuring that speech segments from individual callers were not used in both the training and test data.
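The leave-one-caller-out scheme described above can be illustrated with a minimal sketch. It is not the authors' code: the function name `leave_one_caller_out` and the toy caller labels are hypothetical, and the sketch covers only the splitting step, not the clustering or boosting models. Each fold holds out every speech frame from one caller, so no caller contributes frames to both the training and test data.

```python
from collections import defaultdict
from typing import Hashable, Iterator, List, Sequence, Tuple


def leave_one_caller_out(
    caller_ids: Sequence[Hashable],
) -> Iterator[Tuple[Hashable, List[int], List[int]]]:
    """Yield (held_out_caller, train_idx, test_idx), one fold per caller."""
    # Group frame indices by the caller they belong to.
    frames_by_caller = defaultdict(list)
    for idx, caller in enumerate(caller_ids):
        frames_by_caller[caller].append(idx)

    for held_out, test_idx in frames_by_caller.items():
        # Training data is every frame from every other caller.
        train_idx = [
            idx
            for caller, frames in frames_by_caller.items()
            if caller != held_out
            for idx in frames
        ]
        yield held_out, train_idx, test_idx


# Hypothetical toy data: one caller label per speech frame.
caller_ids = ["c1", "c1", "c2", "c3", "c3", "c3"]
folds = list(leave_one_caller_out(caller_ids))
for held_out, train_idx, test_idx in folds:
    train_callers = {caller_ids[i] for i in train_idx}
    # No caller may contribute frames to both sides of a fold.
    assert held_out not in train_callers
    print(held_out, train_idx, test_idx)
```

Splitting by caller rather than by frame matters because frames from the same caller are highly correlated; a frame-level split would let the model see the test caller's voice during training. In scikit-learn, `LeaveOneGroupOut` provides the same kind of split.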
Results: The sample comprised 87 female and 33 male callers. From an initial pool of 19 characteristics, 7 vocal characteristics were identified. After grouping speech frames into 2 separate clusters (correlation with sex of caller, Cramer’s V=0.02), the component-wise gradient boosting algorithm classified psychological distress with a high level of accuracy, with an area under the receiver operating characteristic curve of 97.39% (95% CI 96.20-98.45) and an area under the precision-recall curve of 97.52% (95% CI 95.71-99.12). Thus, 39,282 of 41,883 (93.39%) speech frames nested within 728 of 754 (96.6%) segments were classified as exhibiting low psychological distress, and 71,455 of 75,503 (94.64%) speech frames nested within 382 of 423 (90.3%) segments were classified as exhibiting high psychological distress. As the probability of high psychological distress increased, male callers spoke louder, with greater vowel articulation but also greater roughness (subharmonic depth). In contrast, female callers exhibited decreased vocal clarity (entropy), a greater proportion of signal noise, higher frequencies, increased breathiness (spectral slope), and increased roughness of speech with increasing psychological distress. Individual caller random effects contributed 68% to risk reduction in the classification algorithm, followed by cluster configuration (23.4%), spectral slope (4.4%), and the 50th percentile frequency (4.2%).
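For readers unfamiliar with the area under the receiver operating characteristic curve reported above, the following minimal sketch shows one standard way to compute it: as the probability that a randomly chosen high-distress frame receives a higher predicted score than a randomly chosen low-distress frame, with ties counted as one half. The function name `auroc` and the toy labels and scores are hypothetical and are not taken from the study; how the authors computed their estimates and confidence intervals is not stated here.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one frame of each class")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = high distress) and predicted probabilities.
labels = [0, 0, 1, 1, 0, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90]
example_auc = auroc(labels, scores)
print(round(example_auc, 3))
```

This pairwise form is quadratic in the number of frames, so it is meant only to make the definition concrete; for data sets with many thousands of frames a rank-based routine such as scikit-learn's `roc_auc_score` would be the practical choice.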
Conclusions: The high level of accuracy achieved suggests possibilities for real-time detection of psychological distress in helpline settings and has potential uses in pre-emptive triage and evaluations of counseling outcomes.