Implementation of Tesseract OCR for Building a Receipt Recognition Application on Android

Authors

  • Yoel Andreas, Informatics Study Program
  • Kartika Gunadi, Informatics Study Program
  • Anita Nathania Purbowo, Informatics Study Program

Keywords:

women in the construction industry, number of female staff, interest, basis of acceptance, contractors, consultants

Abstract

The fast pace of modern life makes people more inclined to look for quick ways to get things done. Recording the day's expenses is no exception: doing it by hand takes time. To address this, an application can read receipts through an Android device's camera, helping users record their expenses and sort them into categories. This requires Optical Character Recognition (OCR), which can be performed with Tesseract-OCR. The OCR output is then processed to extract the expenses, categories, and item names. To obtain the best results, the input image must pass through several pre-processing stages. Testing was carried out as a scenario study covering several cases, for example receipts printed in dotted fonts and receipts with many lines. The test results show that Tesseract's OCR output depends heavily on the pre-processing that is applied. Tesseract's performance drops on images with dotted fonts: if pre-processing cannot join letters that are broken into separate dots, its accuracy falls drastically. Receipts with many lines also reduce Tesseract's performance. Tesseract's results for Handwritten Character Recognition (HCR) likewise depend on how the text is written; if the handwriting is cursive or untidy, Tesseract has difficulty carrying out HCR.
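The abstract names two technical steps without showing them: joining the dots of dotted-font characters during pre-processing, and turning Tesseract's raw text into item names, prices, and categories. The Python sketch below illustrates both under stated assumptions; it is not the authors' Android implementation. Binary dilation with a square structuring element stands in for whatever morphological operation the app actually uses, and `CATEGORY_KEYWORDS`, `SKIP_WORDS`, the assumed price format (whole rupiah amounts with `.` or `,` as the thousands separator) and the sample receipt are all hypothetical. A real app would more likely call a library routine such as OpenCV's `dilate` on the camera image rather than the pure-Python loop used here.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical keyword table (assumption): the abstract does not say how
# categories are assigned, so a simple word lookup is used here.
CATEGORY_KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "food": ("nasi", "ayam", "mie", "roti"),
    "drink": ("teh", "kopi", "jus", "es"),
}
# Hypothetical summary-line words that must not be recorded as items.
SKIP_WORDS = {"total", "subtotal", "tunai", "kembali", "ppn", "pajak"}

# Assumed line format: "<item name> <price>", where the price is an integer
# with optional "." or "," thousands separators (e.g. "25.000" or "5000").
PRICE_LINE = re.compile(
    r"^(?P<name>.*?[A-Za-z].*?)\s+(?P<price>\d{1,3}(?:[.,]\d{3})+|\d+)\s*$"
)

Image = List[List[int]]  # binary image: 1 = ink (foreground), 0 = background


def dilate(img: Image, radius: int = 1) -> Image:
    """Binary dilation with a (2*radius+1) x (2*radius+1) square kernel.

    Every ink pixel spreads to its neighbours, which can bridge the small
    gaps between the dots of a dotted-font character.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x]:
                for dy in range(-radius, radius + 1):
                    for dx in range(-radius, radius + 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w:
                            out[ny][nx] = 1
    return out


def count_components(img: Image) -> int:
    """Count 8-connected ink blobs (a rough proxy for separate glyph pieces)."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if img[y][x] and not seen[y][x]:
                count += 1
                seen[y][x] = True
                stack = [(y, x)]
                while stack:  # iterative flood fill over the 8 neighbours
                    cy, cx = stack.pop()
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = cy + dy, cx + dx
                            if (0 <= ny < h and 0 <= nx < w
                                    and img[ny][nx] and not seen[ny][nx]):
                                seen[ny][nx] = True
                                stack.append((ny, nx))
    return count


def categorize(name: str) -> str:
    """Return the first category whose keyword appears as a word in the name."""
    words = name.lower().split()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(k in words for k in keywords):
            return category
    return "other"


def parse_receipt(ocr_text: str) -> List[Tuple[str, int, str]]:
    """Extract (item name, price, category) tuples from raw OCR text."""
    items = []
    for raw in ocr_text.splitlines():
        m = PRICE_LINE.match(raw.strip())
        if not m:
            continue  # header, date, or footer line without a trailing price
        name = m.group("name").strip()
        if any(w in SKIP_WORDS for w in name.lower().split()):
            continue  # summary line such as TOTAL, not a purchased item
        price = int(re.sub(r"[.,]", "", m.group("price")))
        items.append((name, price, categorize(name)))
    return items


if __name__ == "__main__":
    dots = [[0, 0, 0, 0, 0],
            [1, 0, 1, 0, 1],
            [0, 0, 0, 0, 0]]
    print(count_components(dots), count_components(dilate(dots)))  # 3 1
    sample = "TOKO MAKMUR\nNASI GORENG 25.000\nES TEH 5.000\nSABUN MANDI 8.500\nTOTAL 38.500"
    print(parse_receipt(sample))
```

On the toy image, three separate dots become a single connected blob after dilation, which is the effect the pre-processing stage needs before Tesseract sees the text. In the parsing step, the `TOTAL` line is skipped so it is not counted as a purchase; keyword lookup is used only because it is the simplest categorization scheme to show.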

Published

2020-04-22
