Artificial Intelligence-driven Assessment of Sleep Quality: Comparing Artificial Intelligence-generated Sleep Questionnaire with Pittsburgh Sleep Quality Index in Undergraduate Medical Students

*Corresponding author: B. Dharani, Department of Physiology, Dr. M.G.R. Educational and Research Institute, Chennai, Tamil Nadu, India. doctordharanibhaskaran@gmail.com
How to cite this article: Dharani B, Suba A, Abeetha S. Artificial Intelligence-driven Assessment of Sleep Quality: Comparing Artificial Intelligence-generated Sleep Questionnaire with Pittsburgh Sleep Quality Index in Undergraduate Medical Students. Glob J Med Pharm Biomed Update. 2025;20:13. doi: 10.25259/GJMPBU_31_2025
Abstract
Objectives:
Artificial intelligence (AI) is increasingly used in sleep medicine for tasks such as scoring respiratory events and staging sleep. Despite these developments, AI-based sleep assessment tools require stringent validation to ensure the reliability and accuracy of their reports. The present study analyzes the performance of an AI-generated sleep quality assessment tool in comparison with the Pittsburgh Sleep Quality Index (PSQI) among undergraduate medical students. By evaluating the agreement between these two assessment tools, we aim to explore the potential of AI in expanding personalized sleep medicine.
Material and Methods:
This was a cross-sectional study conducted among 300 undergraduate medical students using two sleep assessment tools. The first is the traditional PSQI tool, and the second is an AI-generated sleep quality assessment tool for undergraduate medical students, structured and designed by Chat-GPT, which utilizes the same seven domains as the PSQI.
Results:
The mean score of the AI-generated assessment (10.65 ± 2.30) was lower than the mean PSQI score (11.94 ± 2.45). In addition, only slight agreement was found between the two scores, with a Cohen’s weighted kappa coefficient of 0.133. Because higher PSQI scores indicate worse sleep, this suggests that the AI tool may underestimate sleep disturbance compared to PSQI scoring.
Conclusion:
While the AI-generated questionnaire for sleep quality assessment has the potential benefit of scalability and automation in research, the present study highlights the crucial necessity for careful training and validation of AI tools.
Keywords
Artificial intelligence questionnaire
Chat-GPT
Personalized medicine
Sleep quality
Young adults
INTRODUCTION
Sleep is regarded as a crucial component in maintaining homeostasis, mental integrity, cognitive well-being, and physical health.[1] A normal sleep-wake cycle helps provide the sufficient sleep needed to maintain a resting state of the body.
Previous studies have shown that sleep habits are, in general, heritable, correlated with the structure and function of the brain, and fundamentally related to overall health.[2] Recent technological advances have further highlighted the integral role of sleep in physical and cognitive well-being.[3] Sleep disorders are increasingly prevalent among young individuals.[4,5]
Historically, the Pittsburgh Sleep Quality Index (PSQI), developed in 1989, has been regarded as a gold standard for determining a person’s sleep quality. It is a self-administered questionnaire that assesses sleep quality across several dimensions, including sleep duration, disturbances, and latency.[6]
Recently, artificial intelligence (AI) has emerged as an innovative tool in the healthcare sector. AI has been increasingly utilized in various aspects of sleep medicine, including scoring respiratory events, staging sleep, predicting circadian rhythms, diagnosing insomnia, and profiling obstructive sleep apnea (OSA).[7] These advances suggest that AI has considerable potential to change sleep assessment by offering more efficient and personalized evaluations.
Advanced machine learning algorithms can analyze complex sleep data, such as polysomnography recordings, to pinpoint specific patterns of sleep disorders. This may improve the efficiency and accuracy of diagnosis and pave the way for personalized treatment plans based on a person’s unique sleep profile.[8]
Devices such as the Belun Ring use AI to track the stages of sleep and identify conditions like OSA, offering real-time data that can inform customized interventions. This demonstrates that the integration of AI in wearable technology enhances the effectiveness of personalized sleep medicine. Furthermore, AI-enabled software may provide personalized recommendations for improving sleep hygiene by analyzing user data and suggesting behavioral modifications. These breakthroughs in AI enhance its potential to redefine sleep medicine by enabling tailored approaches that can address individual preferences and needs.[9]
Despite these AI developments in the field of sleep, sleep assessment tools require stringent validation to ensure the reliability and accuracy of their reports. Studies that compare established sleep assessment tools, such as the PSQI, with AI-driven assessments are crucial for analyzing the efficacy of AI in sleep medicine.
The present study aims to analyze the performance of an AI-generated assessment tool of sleep quality in comparison with the conventional sleep assessment tool, PSQI, among undergraduate medical students. By evaluating this agreement and comparing these two assessment tools, we aim to explore the potential of AI in expanding personalized sleep medicine.
Aim
This study aims to analyze the effectiveness and performance of an AI-generated assessment tool for sleep quality by comparing it with the PSQI among undergraduate medical students and to investigate its potential integrated application in personalized sleep medicine.
MATERIAL AND METHODS
Ethical considerations
The current study was conducted after obtaining Institutional Ethics Committee clearance (IEC NO: 562/2022/IEC/ACSMCH). All participants were thoroughly informed about the study, and written informed consent was obtained from them prior to data collection. The anonymity and confidentiality of the participants were maintained.
Study design
This cross-sectional study was conducted among 300 undergraduate medical students.
Study participants
Undergraduate medical students aged 18–30 years who had given written informed consent to participate were included in the study.
Undergraduate medical students with a history of OSA or any other sleep disorder, acute or chronic cardiac or respiratory illness, substance use, chronic diseases such as diabetes or arthritis, or night shift work were excluded.
Assessment tools
Two different sleep quality assessment tools were used. The first is the traditional PSQI, a validated tool that assesses sleep quality over the past month.
The second tool is an AI-generated sleep quality assessment tool for undergraduate medical students, structured and designed by Chat-GPT, which utilizes the same seven domains of the PSQI [Supplementary File].
In this study, Chat-GPT was chosen as the AI tool because of its sophisticated Natural Language Processing capabilities, adaptability, and easy accessibility in obtaining personalized questionnaire-based assessments.[10] Furthermore, it can facilitate human-like conversational interactions, allowing for a more user-friendly and engaging experience than conventional static questionnaire formats.[11] Hence, we chose Chat-GPT as a tool whose output could potentially align with the PSQI.
Collection of data
The study participants were administered both the AI-generated and the PSQI sleep quality assessment questionnaires, which cover the same seven domains of sleep.
Statistical analysis of the collected data was performed using the Statistical Package for the Social Sciences (SPSS) software, version 20. Normality was first assessed using the Shapiro–Wilk test. Because the data were not normally distributed, the Wilcoxon signed-rank test was used to compare the scores of the two tools. Agreement between the sleep quality categories assigned by the two tools was assessed using Cohen’s weighted kappa coefficient. Cross-tabulation was performed to identify discrepancies in classification between the two tools.
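The paired comparison described above can be illustrated with a short sketch. The study itself used SPSS; the Python function below is only an assumed, minimal re-implementation of the Wilcoxon signed-rank test (normal approximation with tie correction, no continuity correction). The names `wilcoxon_signed_rank`, `psqi_scores`, and `ai_scores` are hypothetical, and the ten score pairs are invented for illustration; they are not study data.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Return (W, z, p) for paired samples x and y.

    W is the smaller of the positive and negative rank sums; the two-sided
    p-value uses the normal approximation with a correction for ties.
    """
    # Zero differences carry no information about direction and are dropped.
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t**3 - t
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)

    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return w, z, p


# Hypothetical paired global scores for ten participants (NOT study data).
psqi_scores = [12, 14, 9, 11, 13, 15, 10, 12, 16, 8]
ai_scores = [10, 13, 9, 10, 11, 12, 11, 10, 13, 7]

w, z, p = wilcoxon_signed_rank(psqi_scores, ai_scores)
print(f"W = {w}, z = {z:.3f}, p = {p:.4f}")
```

For a sample of 300 participants the normal approximation is usually adequate; statistical packages may additionally apply a continuity correction or an exact method, so their output can differ slightly from this sketch.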
RESULTS
The Shapiro–Wilk test showed that the scores of the AI-generated sleep assessment questionnaire and the PSQI questionnaire were not normally distributed (P < 0.0001); therefore, they were compared using the Wilcoxon signed-rank test. The results show a statistically significant difference between the two sleep assessment tools (Wilcoxon statistic = 8841.0, P < 0.0001). This indicates that one of the two tools systematically yields lower or higher estimates of sleep disturbance than the other.
The mean PSQI score was 11.94 ± 2.45, and the mean AI sleep assessment score was 10.65 ± 2.30. The AI tool score was therefore lower than the PSQI score. Because higher scores indicate worse sleep, this suggests that the AI-generated tool tends to underestimate sleep disturbance, that is, to rate sleep quality as better than the PSQI does.
To assess the agreement between the AI-generated classification and the PSQI classification of sleep quality, Cohen’s weighted kappa coefficient was computed. The resulting value was 0.133, which indicates slight agreement between the two classifications. Cross-tabulation showed that the AI-generated questionnaire often assigned participants to categories different from those assigned by the PSQI.
Table 1 shows the cross-tabulation of PSQI and AI tool sleep quality categories. The AI tool classified 45 participants with poor sleep quality on the PSQI as having good sleep. Of the participants with severe sleep disturbances on the PSQI, it classified 25 as having good sleep and 189 as having poor sleep; only four were identified by the AI tool as having severe sleep disturbances. This low agreement between the two tools is reflected in the Cohen’s kappa value of 0.133.
| PSQI sleep category | AI: Good sleep | AI: Poor sleep | AI: Severe sleep disturbances |
|---|---|---|---|
| Poor sleep | 45 | 37 | 0 |
| Severe sleep disturbances | 25 | 189 | 4 |
AI: Artificial intelligence
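As an illustration of how an agreement coefficient is derived from a cross-tabulation such as Table 1, the following Python sketch computes Cohen’s weighted kappa from a square table of counts. Several points are assumptions rather than details stated in the article: the weighting scheme (linear or quadratic) is not reported, so both are offered; an all-zero "Good sleep" row is added for the PSQI because Table 1 lists no such row; and the `landis_koch` helper uses the commonly cited Landis and Koch descriptive bands, with which the label "slight" appears consistent. The function names are hypothetical, and the value obtained from the tabled counts need not match the reported 0.133 exactly.

```python
def weighted_kappa(table, weights="quadratic"):
    """Cohen's weighted kappa for a square table of counts.

    Rows are the categories assigned by one tool, columns those assigned by
    the other, both in the same ordinal order.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]
    power = 2 if weights == "quadratic" else 1

    observed = 0.0  # weighted observed disagreement
    expected = 0.0  # weighted disagreement expected by chance
    for i in range(k):
        for j in range(k):
            w = abs(i - j) ** power  # common scale factor cancels in the ratio
            observed += w * table[i][j]
            expected += w * row_tot[i] * col_tot[j] / n
    return 1.0 - observed / expected


def landis_koch(kappa):
    """Descriptive label for kappa using the Landis and Koch bands (assumed)."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Counts from Table 1; rows = PSQI category, columns = AI category,
# ordered good, poor, severe. The first row is an assumed all-zero row.
counts = [
    [0, 0, 0],
    [45, 37, 0],
    [25, 189, 4],
]

for scheme in ("linear", "quadratic"):
    value = weighted_kappa(counts, weights=scheme)
    print(scheme, round(value, 3), landis_koch(value))
```

Linear weights penalize every one-category disagreement equally, whereas quadratic weights penalize two-category disagreements (for example, severe on the PSQI but good on the AI tool) more heavily, so the two schemes can give noticeably different values for the same table.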
This low agreement (kappa = 0.133) between the two tools indicates that the AI-generated questionnaire categorizes the sleep quality of participants differently from the PSQI. This could be due to a difference in scoring sensitivity that leads the AI tool to underestimate sleep disturbance. The analysis shows that the AI-generated questionnaire classifies participants with moderate sleep disturbances as having good sleep quality rather than poor sleep quality. It also suggests a lower sensitivity in identifying individuals with severe sleep disturbances, potentially leading to discrepancies in classification.
DISCUSSION
The current study [Figure 1] found a statistically significant difference between PSQI scores and AI-generated sleep quality assessment scores among undergraduate medical students. The mean score of the AI-generated assessment (10.65 ± 2.30) was lower than the mean PSQI score (11.94 ± 2.45), indicating that the AI tool might underestimate sleep disturbance relative to PSQI scoring.

Figure 1: Diagrammatic representation of the comparison between the Pittsburgh Sleep Quality Index and the artificial intelligence sleep assessment. PSQI: Pittsburgh Sleep Quality Index, AI: Artificial intelligence
In addition, a slight agreement was found between these two assessment scores, as indicated by a Cohen’s weighted kappa coefficient of 0.133. This again highlights the potential discrepancy in classification of sleep quality.
The PSQI is a well-known, validated questionnaire for assessing sleep quality in various groups. Previous studies have reported good internal consistency, with Cronbach’s alpha values from 0.69 to 0.84, and good test-retest reliability.[12] In contrast, the AI-generated sleep quality assessment tool in the current study showed only slight agreement with the PSQI, as reflected by its low kappa coefficient.
This discrepancy suggests that the AI tool lacks sensitivity in identifying severe sleep disturbances and tends to misclassify participants with moderate sleep disturbances as having good-quality sleep.
Even though AI-driven sleep assessment has the advantage of scalability and systematization, its accuracy depends on the quality of the data and the robustness of the algorithms used. Recent AI-driven sleep quality assessment tools have shown promise, with a few models attaining noteworthy agreement in categorizing sleep quality.[13,14] For example, a previous study has shown that an AI model classified sleep stages with Cohen’s kappa values ranging from 0.70 to 0.84.[15] Nevertheless, these AI tools need large training datasets and refined modeling to meet the standards and reliability of existing validated tools like the PSQI.
Furthermore, the underestimation by the AI tool found in the present study highlights the limited ability of AI to analyze the subjective factors of sleep quality that are built-in components of the PSQI. In addition, the algorithm of the AI tool may not be adequate to capture individual variations in sleep patterns and the environmental components that influence sleep quality. Table 2 summarizes the pros and cons of using the AI-generated sleep questionnaire versus the PSQI, based on the results observed in this study.
| Key aspects | PSQI | AI-generated sleep questionnaire |
|---|---|---|
| Validation of tool | Globally validated sleep assessment tool | Not yet validated |
| Accuracy | Proven reliability with high specificity and sensitivity | Tended to underestimate sleep disturbance in the current study, with low sensitivity and low agreement with the PSQI (kappa = 0.133) |
| Reliability of sleep quality categorization | It has been proven to provide accurate categorization and was traditionally used as a reference standard. | Tends to misclassify moderate or severe sleep disturbances with reported slight agreement with PSQI |
| Analysis of subjective component | Consists of both objective and subjective components of sleep quality | It was observed to have limited capability to assess the subjective component of sleep |
| Scalability | Manual scoring is done conventionally and hence it is less scalable | It is highly scalable because it is fully automated and can be used for large-scale population studies |
| Time consumption | Manual scoring and interpretation are time-consuming | Easier and faster to assess, analyze, and interpret |
| Price | Might need professional administration support or licensing to use | After proper development and validation, it becomes a cost-effective option |
| Customization | Since it has a static structure, it is not customizable | It can be updated regularly and tailored according to the study population |
| Transparency of analysis | Fully transparent with regard to scoring and interpretation | Lacks transparency in analysis, since such tools are regarded as “black box” models |
| Role of data quality | Since it has validated scoring framework, it is totally independent of training data | The analysis and performance are highly dependent upon model robustness and training data |
AI: Artificial intelligence, PSQI: Pittsburgh sleep quality index
Limitation and future directions
The present study found that an AI-generated sleep quality assessment tool powered by ChatGPT tends to underestimate sleep disturbance when compared to the conventional, validated PSQI.
To enhance the specificity and sensitivity of AI-driven sleep assessment tools, several suggestions should be considered for future research.
The questions formulated by AI tools need to be refined rigorously to align with validated tools like the PSQI, so that they capture both the objective and the subjective aspects of sleep quality and improve accuracy as part of a personalized assessment. Integrating the questionnaire data with data from wearable devices such as actigraphs might improve categorization accuracy.[16,17] It is also essential to fine-tune the AI model with sleep datasets that cover different demographic groups, to increase the capability of AI to identify even minor variations in the quality of sleep.
Frequent calibration and validation of AI-generated sleep assessment scores against gold-standard instruments like polysomnography are required to obtain a consistent and accurate categorization of sleep quality.[18]
Further, the preference of study participants between the PSQI and the AI-generated sleep questionnaire was not assessed in the current study, which is a potential limitation and an area for further exploration.
If these points are addressed alongside advances in AI, these tools can become more reliable, accurate, and clinically relevant, thereby facilitating their broader use in research and clinical settings of sleep medicine. Future studies should explore such refined healthcare AI tools to individualize and simplify AI-driven sleep evaluation at low cost and in less time.
CONCLUSION
In conclusion, while an AI-generated questionnaire for sleep quality assessment has potential benefits in research due to its scalability and automation, the present study underscores the crucial necessity for careful training of AI algorithms and for the refinement and validation of AI tools.
With such refinement, these AI tools may be able to provide highly sensitive, reliable, and accurate sleep quality assessments. Future research should focus on improving AI algorithms by utilizing diverse data inputs to align more closely with existing, validated instruments, such as the PSQI.
With such advancements in AI, the AI-generated sleep questionnaire may complement, and potentially replace, conventional sleep tools, such as the PSQI, in the context of research and clinical practice, paving the way for personalized sleep medicine.
Ethical approval:
The research/study was approved by the Institutional Review Board at A.C.S MEDICAL COLLEGE AND HOSPITAL, number 562/2022/IEC/ACSMCH, dated 6th October, 2022.
Declaration of patient consent:
The authors certify that they have obtained all appropriate patient consent.
Conflicts of interest:
There are no conflicts of interest.
Use of artificial intelligence (AI)-assisted technology for manuscript preparation:
The authors confirm that there was no use of artificial intelligence (AI)-assisted technology in the writing or editing of the manuscript and no images were manipulated using AI.
Financial support and sponsorship: Nil.
References
- Application of Noninvasive Brain Stimulation for Sleep Quality Enhancement and Cognitive Improvement. Stud Health Technol Inform. 2023;308:556-61.
- The Interrelation of Sleep and Mental and Physical Health is Anchored in Grey-matter Neuroanatomy and Under Genetic Control. Commun Biol. 2020;3:171.
- The Functions of Sleep: A Cognitive Neuroscience Perspective. Proc Natl Acad Sci U S A. 2022;119:e2201795119.
- Prevalence of Obstructive Sleep Apnoea Risk and Its Association with Anthropometric Indices of Cardiometabolic Risks and Cognition in Young and Middle-aged Adults. Indian J Physiol Pharmacol. 2024;68:42-9.
- A Comparative Analysis of Partial Sleep Restriction Versus Split Sleep Regimen On Cognitive Processing, Declarative Memory and Affective Behaviour in Nursing Students. Indian J Physiol Pharmacol. 2024;68:316-24.
- The Pittsburgh Sleep Quality Index: A New Instrument for Psychiatric Practice and Research. Psychiatry Res. 1989;28:193-213.
- Artificial Intelligence in Sleep Medicine: Present and Future. World J Clin Cases. 2023;11:8106-10.
- Artificial Intelligence in Sleep Medicine: The Dawn of a New Era. Nat Sci Sleep. 2024;16:445-50.
- Revolutionizing Sleep Health: The Emergence and Impact of Personalized Sleep Medicine. J Pers Med. 2024;14:598.
- Applications of the Natural Language Processing Tool ChatGPT in Clinical Practice: Comparative Study and Augmented Systematic Review. JMIR Med Inform. 2023;11:e48933.
- ChatGPT: Perspectives from Human-computer Interaction and Psychology. Front Artif Intell. 2024;7:1418869.
- Test-retest Reliability and Validity of the Pittsburgh Sleep Quality Index in Primary Insomnia. J Psychosom Res. 2002;53:737-40.
- Artificial Intelligence in Sleep Medicine: Background and Implications for Clinicians. J Clin Sleep Med. 2020;16:609-18.
- Enhanced Sleep Staging with Artificial Intelligence: A Validation Study of New Software for Sleep Scoring. Front Artif Intell. 2023;6:1278593.
- Clinical Applications of Artificial Intelligence in Sleep Medicine: A Sleep Clinician's Perspective. Sleep Breath. 2023;27:39-55.
- AI-Driven Sleep Staging from Actigraphy and Heart Rate. PLoS One. 2023;18:e0285703.
- Validation of Zulu Watch against Polysomnography and Actigraphy for On-wrist Sleep-wake Determination and Sleep-depth Estimation. Sensors (Basel). 2020;21:76.

