TY - GEN
T1 - Benchmarking Akan ASR Models Across Domain-Specific Datasets
T2 - Future Technologies Conference, FTC 2025
AU - Mensah, Mark Atta
AU - Wiafe, Isaac
AU - Ekpezu, Akon
AU - Appati, Justice Kwame
AU - Abdulai, Jamal Deen
AU - Wiafe-Akenten, Akosua Nyarkoa
AU - Yeboah, Frank Ernest
AU - Odame, Gifty
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2026.
PY - 2026
Y1 - 2026
N2 - Most existing automatic speech recognition (ASR) research evaluates models using in-domain datasets. However, it seldom examines how those models generalize across diverse speech contexts. This study addresses this gap by benchmarking seven Akan ASR models built on transformer architectures, such as Whisper and Wav2Vec2, on four Akan speech corpora. These datasets encompass various domains, including culturally relevant image descriptions, informal conversations, biblical scripture readings, and spontaneous financial dialogues. A comparison of the word error rate and character error rate highlighted domain dependency: models performed optimally only within their training domains and showed marked accuracy degradation in mismatched scenarios. This study also identified distinct error behaviors between the Whisper and Wav2Vec2 architectures. Whereas fine-tuned Whisper Akan models produced more fluent but potentially misleading transcription errors, Wav2Vec2 produced more obvious yet less interpretable outputs when encountering unfamiliar inputs. This trade-off between readability and transparency in ASR errors should be considered when selecting architectures for low-resource language (LRL) applications. These findings highlight the need for targeted domain adaptation techniques, adaptive routing strategies, and multilingual training frameworks for Akan and other LRLs.
AB - Most existing automatic speech recognition (ASR) research evaluates models using in-domain datasets. However, it seldom examines how those models generalize across diverse speech contexts. This study addresses this gap by benchmarking seven Akan ASR models built on transformer architectures, such as Whisper and Wav2Vec2, on four Akan speech corpora. These datasets encompass various domains, including culturally relevant image descriptions, informal conversations, biblical scripture readings, and spontaneous financial dialogues. A comparison of the word error rate and character error rate highlighted domain dependency: models performed optimally only within their training domains and showed marked accuracy degradation in mismatched scenarios. This study also identified distinct error behaviors between the Whisper and Wav2Vec2 architectures. Whereas fine-tuned Whisper Akan models produced more fluent but potentially misleading transcription errors, Wav2Vec2 produced more obvious yet less interpretable outputs when encountering unfamiliar inputs. This trade-off between readability and transparency in ASR errors should be considered when selecting architectures for low-resource language (LRL) applications. These findings highlight the need for targeted domain adaptation techniques, adaptive routing strategies, and multilingual training frameworks for Akan and other LRLs.
KW - Akan ASR
KW - Automatic speech recognition
KW - Cross-dataset validation
KW - Low-resource languages
KW - Natural language processing
KW - Transformer models
UR - https://www.scopus.com/pages/publications/105028355105
U2 - 10.1007/978-3-032-07995-4_15
DO - 10.1007/978-3-032-07995-4_15
M3 - Conference contribution
AN - SCOPUS:105028355105
SN - 9783032079947
T3 - Lecture Notes in Networks and Systems
SP - 210
EP - 225
BT - Proceedings of the Future Technologies Conference, FTC 2025, Volume 3
A2 - Arai, Kohei
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 6 November 2025 through 7 November 2025
ER -