Implementation of Tesseract OCR for Building a Receipt Recognition Application on Android

Yoel Andreas(1*), Kartika Gunadi(2), Anita Nathania Purbowo(3)


(1) Informatics Study Program
(2) Informatics Study Program
(3) Informatics Study Program
(*) Corresponding Author

Abstract


As daily life becomes more convenience-oriented, people look for faster ways to get routine tasks done. Recording each day's expenses by hand is one such task, and it takes time. This work addresses the problem with an Android application that reads receipts through the device's camera, records the expenses, and categorizes them. Reading a receipt requires Optical Character Recognition (OCR), which is performed here with Tesseract-OCR. The OCR output is then processed to extract the expense amounts, categories, and item names. Several pre-processing stages must be applied to the image before recognition to get the best results. Testing used a scenario-based study covering several cases, for example receipts printed in dotted fonts and receipts with many lines. The test results show that Tesseract's OCR output depends heavily on the pre-processing stage that precedes it. Tesseract's performance drops on images with dotted fonts: if pre-processing fails to join the dots that make up each letter, accuracy falls drastically. Receipts with many lines also reduce Tesseract's performance. When Tesseract is used for Handwritten Character Recognition (HCR), the results depend on how the text is written: if the handwriting is cursive or untidy, Tesseract has difficulty recognizing it.
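
One common way to join the separate dots of a dotted font before OCR is morphological closing (dilation followed by erosion). The abstract does not state which operation the application uses, so the following is only a minimal sketch under that assumption, written in plain Python on a small binary grid. The function names (`dilate`, `erode`, `close_gaps`) and the 3x3 window size are illustrative choices, not taken from the paper.

```python
from typing import Callable, List

# A binary image as rows of pixels: 1 = ink (foreground), 0 = paper.
Image = List[List[int]]


def _window_filter(img: Image, k: int, pick: Callable[[List[int]], int]) -> Image:
    """Slide a k x k square window over img and reduce each window with pick.

    Pixels outside the image are ignored, so leave a blank margin around
    the text to avoid border effects.
    """
    h, w = len(img), len(img[0])
    r = k // 2
    out: Image = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                img[yy][xx]
                for yy in range(max(0, y - r), min(h, y + r + 1))
                for xx in range(max(0, x - r), min(w, x + r + 1))
            ]
            row.append(pick(window))
        out.append(row)
    return out


def dilate(img: Image, k: int = 3) -> Image:
    """Grow ink: a pixel becomes 1 if any pixel in its window is 1."""
    return _window_filter(img, k, max)


def erode(img: Image, k: int = 3) -> Image:
    """Shrink ink: a pixel stays 1 only if every pixel in its window is 1."""
    return _window_filter(img, k, min)


def close_gaps(img: Image, k: int = 3) -> Image:
    """Morphological closing: dilate, then erode with the same window."""
    return erode(dilate(img, k), k)


# A horizontal stroke drawn as separate dots, with a blank margin around it.
dotted = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [1, 0, 1, 0, 1, 0, 1],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
closed = close_gaps(dotted)
```

In this sketch a 3x3 window bridges one-pixel gaps but leaves wider gaps open, which illustrates why recognition can still fail when the dots of a letter are too far apart for the chosen window.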


Keywords


Optical Character Recognition; Tesseract; Receipt; OCR; Handwritten Character Recognition; HCR



