Objective: To design a Natural Language Processing (NLP) algorithm capable of detecting suicide-related content in patients’ written communication to their therapists, to support rapid response and clinical decision-making in telehealth settings.
Method: A training dataset of therapy transcripts from 1,864 patients was assembled by using a proxy model, anchored on therapists’ suicide prevention interventions, to detect patient content endorsing suicidality; human expert raters then assessed the level of suicide risk endorsed by the patients the proxy model identified (i.e., no risk, risk factors, ideation, method, or plan). A bag-of-words classification model was then built iteratively from the expert raters’ annotations to detect suicide risk level in 85,216 labeled patient sentences from the training dataset.
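As a rough illustration of what a bag-of-words sentence classifier involves, the sketch below counts words per label and scores new sentences with a multinomial naive Bayes rule. It is a minimal sketch under stated assumptions, not the study's model: the abstract does not name the classifier family, so naive Bayes is swapped in here for simplicity; the class name `NaiveBayesBow`, the two-label scheme, and the four toy sentences are all invented for the example, whereas the study used five expert-rated risk levels.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase and split into word tokens; word order is discarded (bag of words)."""
    return re.findall(r"[a-z']+", sentence.lower())


class NaiveBayesBow:
    """Hypothetical multinomial naive Bayes over word counts with add-one smoothing."""

    def fit(self, sentences, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for sentence, label in zip(sentences, labels):
            self.word_counts[label].update(tokenize(sentence))
        # Vocabulary is the union of all words seen in training.
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def log_scores(self, sentence):
        # Words never seen in training are ignored at prediction time.
        tokens = [t for t in tokenize(sentence) if t in self.vocab]
        total = sum(self.class_counts.values())
        scores = {}
        for c in self.classes:
            denom = sum(self.word_counts[c].values()) + len(self.vocab)
            score = math.log(self.class_counts[c] / total)
            for t in tokens:
                score += math.log((self.word_counts[c][t] + 1) / denom)
            scores[c] = score
        return scores

    def predict(self, sentence):
        scores = self.log_scores(sentence)
        return max(scores, key=scores.get)


# Toy, invented training data (assumption: binary labels for brevity).
train_sentences = [
    "i keep thinking about ending my life",
    "i feel hopeless and want to die",
    "i had a good walk with my dog today",
    "work was busy but i slept well",
]
train_labels = ["risk", "risk", "no_risk", "no_risk"]

model = NaiveBayesBow().fit(train_sentences, train_labels)
print(model.predict("i feel hopeless"))
print(model.predict("slept well after work"))
```

The same class handles more than two labels unchanged, so a five-level scheme like the one the raters used would only require different values in `train_labels`.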
Results: The final NLP model distinguished risk-related content from non-risk content with good accuracy (AUC = 82.78%).
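For readers less familiar with the metric, AUC can be read as the probability that a randomly chosen risk sentence receives a higher model score than a randomly chosen non-risk sentence; a value of 82.78 on a percentage scale corresponds to 0.8278 on the 0-to-1 scale. The sketch below computes it directly from that pairwise definition. The function name `roc_auc` and the six labels and scores are made up for illustration and are not data from the study.

```python
def roc_auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Invented example: 1 = risk-related sentence, 0 = non-risk sentence.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]

auc = roc_auc(labels, scores)
print(f"AUC = {auc:.4f} ({auc * 100:.2f}%)")  # prints "AUC = 0.8889 (88.89%)"
```

The pairwise loop is quadratic in the number of sentences, which is fine for a demonstration; on a dataset the size of the one described here, a rank-based implementation would be the more practical choice.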
Conclusions: Suicide risk could be reliably identified by the NLP algorithm. The risk detection model could help telehealth clinicians provide crisis resources in a timely manner. This modeling approach could also be applied to other psychotherapy research tasks to help clarify how the psychotherapy process unfolds for each patient and therapist.