Public surveillance of social media for suicide using advanced deep learning models in Japan: Time series study from 2012 to 2022

Year: 2023 Source: Journal of Medical Internet Research. (2023). 25, 1-12. DOI: 10.2196/47225 SIEC No: 20231703

Background: Social media platforms have been increasingly used to express suicidal thoughts, feelings, and acts, raising public concerns over time. A large body of literature has explored the suicide risks identified by people’s expressions on social media. However, there is not enough evidence to conclude that social media provides public surveillance for suicide without aligning suicide risks detected on social media with actual suicidal behaviors. Corroborating this alignment is a crucial foundation for suicide prevention and intervention through social media and for estimating and predicting suicide in countries with no reliable suicide statistics.

Objective: This study aimed to corroborate whether the suicide risks identified on social media align with actual suicidal behaviors. This aim was achieved by tracking suicide risks detected in 62 million tweets posted in Japan over a 10-year period and assessing the locational and temporal alignment of such suicide risks with actual suicidal behaviors recorded in national suicide statistics.

Methods: This study used a human-in-the-loop approach to identify suicide-risk tweets posted in Japan from January 2013 to December 2022. This approach involved keyword-filtered data mining, manual data screening, and data refinement via an advanced natural language processing model, Bidirectional Encoder Representations from Transformers (BERT). The tweet-identified suicide risks were then compared with actual suicide records in both temporal and spatial dimensions to test whether they were statistically correlated.

Results: Twitter-identified suicide risks and actual suicide records were temporally correlated by month in the 10 years from 2013 to 2022 (correlation coefficient=0.533; P<.001); the coefficient rose to 0.652 when the Twitter-identified suicide risks were advanced by 1 month before comparison with the actual suicide records. These 2 indicators were also spatially correlated by city with a correlation coefficient of 0.699 (P<.001) for the 10-year period. Among the 267 cities in the top quintile of suicide risks identified from both tweets and actual suicide records, 73.5% (n=196) of cities overlapped. In addition, Twitter-identified suicide risks were lower after midnight than in the afternoon, and higher on Saturdays and Sundays than on weekdays.
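
The lagged comparison reported above can be illustrated with a minimal sketch. It is not the authors' code: the function names, the synthetic monthly series, and the use of a plain Pearson coefficient are all assumptions made for illustration, and it adopts one reading of the 1-month shift, namely that tweet-identified risk in month t is paired with suicide records in month t+1.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def lagged_correlation(tweet_risk, suicides, lag_months):
    """Correlate tweet_risk[t] with suicides[t + lag_months].

    Assumption: a positive lag means tweets lead the official records.
    """
    if lag_months < 0:
        raise ValueError("lag_months must be non-negative")
    if lag_months == 0:
        return pearson(tweet_risk, suicides)
    return pearson(tweet_risk[:-lag_months], suicides[lag_months:])


# Synthetic monthly counts (hypothetical, not data from the study).
tweet_risk = [3, 5, 4, 8, 6, 9, 7, 10, 8, 12, 11, 9]
# Records constructed to follow the tweet series by exactly one month.
suicides = [30] + [10 * v for v in tweet_risk[:-1]]

same_month = lagged_correlation(tweet_risk, suicides, 0)
one_month_lead = lagged_correlation(tweet_risk, suicides, 1)
print(f"same month: {same_month:.3f}, tweets leading by 1 month: {one_month_lead:.3f}")
```

Because the synthetic records are built to trail the tweet series by one month, the lag-1 coefficient exceeds the same-month coefficient, which mirrors the direction of the reported change from 0.533 to 0.652 without reproducing those values.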

Conclusions: Social media platforms provide an anonymous space where people express their suicidal thoughts, ideation, and acts. Such expressions can serve as an alternative source for estimating and predicting suicide in countries without reliable suicide statistics. They can also provide real-time tracking of suicide risks, serving as an early warning for suicide. The identification of areas where suicide risks are highly concentrated is crucial for location-based mental health planning, enabling suicide prevention and intervention through social media in a spatially and temporally explicit manner.

Wang, S., Ning, H., Huang, X., Xiao, Y., Zhang, M., Yang, E.F., ... & Zeng, Y.