TY - JOUR
T1 - SARS detection in chest CT scan images using the bootstrapped ViT-B/16 model
AU - Appati, Justice Kwame
AU - Ziamah, Bless
AU - Akrofi, Herbert Ansah
AU - Dodoo, Albert Ankomah
N1 - Publisher Copyright:
© The Author(s), under exclusive licence to Springer Nature Switzerland AG 2025.
PY - 2025
Y1 - 2025
AB - This study investigates the application of vision transformer (ViT) models to the automated detection of COVID-19 from chest CT scan images. While convolutional neural networks (CNNs) have been widely used for this task, they are limited in capturing long-range dependencies and global context. To address these challenges, we explore ViT models, which leverage self-attention mechanisms to analyze images as sequences of patches. We develop and evaluate two ViT-based approaches: a custom ViT model built from scratch and a fine-tuned pre-trained ViT-B/16 model. Using a dataset of 2482 chest CT scan images (1252 COVID-19 positive and 1230 negative), we compare these models against state-of-the-art CNN-based methods. Our results demonstrate the superiority of the ViT-based approach: the fine-tuned ViT-B/16 model achieves an accuracy of 98.83%, precision of 99.29%, recall of 98.23%, and F1-score of 98.76%, surpassing existing CNN-based models, including DenseNet201 and VGG19. The study highlights the effectiveness of transfer learning in adapting pre-trained ViT models for COVID-19 detection and demonstrates the potential of the ViT architecture to capture subtle patterns and global context in medical images. These findings advance AI-assisted COVID-19 diagnosis and pave the way for further exploration of transformer-based architectures in medical image analysis.
KW - Convolutional neural network
KW - Deep learning
KW - Fine-tuning
KW - Multi-head attention
KW - Patch embeddings
KW - Pre-trained
KW - Vision transformer
UR - http://www.scopus.com/inward/record.url?scp=85217625909&partnerID=8YFLogxK
DO - 10.1007/s42044-025-00231-1
M3 - Article
AN - SCOPUS:85217625909
SN - 2520-8438
JO - Iran Journal of Computer Science
JF - Iran Journal of Computer Science
ER -