TY - JOUR
T1 - Content-based Image Retrieval using Tesseract OCR Engine and Levenshtein Algorithm
AU - Adjetey, Charles
AU - Adu-Manu, Kofi Sarpong
N1 - Publisher Copyright: © 2021. All Rights Reserved.
PY - 2021
Y1 - 2021
AB - Image Retrieval Systems (IRSs) are applications that allow one to retrieve images saved at any location on a network. Most IRSs use reverse lookup to find images stored on the network based on image properties such as size, filename, title, color, texture, shape, and description. This paper provides a technique for obtaining a full image document when the user has only some portions of the document under search. To demonstrate the reliability of the proposed technique, we designed a system to implement the algorithm. A combination of an Optical Character Recognition (OCR) engine and an improved text-matching algorithm was used in the system implementation. The Tesseract OCR engine and the Levenshtein algorithm were integrated to perform the image search. The text extracted from the query image is compared to the text stored in the database. For example, a query result is returned when a match ratio of 0.15 or above is obtained. The results showed 100% successful retrieval of the appropriate file based on the match, even when partial query images were submitted.
KW - Image Retrieval Systems
KW - Levenshtein Algorithm
KW - Optical Character Recognition (OCR)
KW - Tesseract OCR engine
KW - image processing
KW - text matching algorithm
UR - http://www.scopus.com/inward/record.url?scp=85112207654&partnerID=8YFLogxK
U2 - 10.14569/IJACSA.2021.0120776
DO - 10.14569/IJACSA.2021.0120776
M3 - Article
AN - SCOPUS:85112207654
SN - 2158-107X
VL - 12
SP - 666
EP - 675
JO - International Journal of Advanced Computer Science and Applications
JF - International Journal of Advanced Computer Science and Applications
IS - 7
ER -