Comparison of Character Recognition and Text Recognition Using Extended MNIST, the IAM Database, and Tesseract on Handwritten Ijazah (Diploma Certificates)

Authors

  • Made Yoga Mahardika, Informatics Study Program
  • Kartika Gunadi, Informatics Study Program
  • Alexander Setiawan, Informatics Study Program

Abstract

The central challenge in handwriting recognition is building a technique that can recognize writing in its many styles and forms. Unlike printed computer fonts, which are consistent, each person's handwriting is unique in form and varies in consistency. This problem arises in ijazah (diploma certificate) documents, where the data fields are filled in by hand.

Data field locations are segmented using the run length smoothing algorithm (RLSA), with dots as the segmentation features. The handwritten text recognition (HTR) technique requires data segmented into words, whereas the handwritten character recognition (HCR) technique requires data segmented into characters. HCR uses the LeNet5 model with the EMNIST dataset. HTR uses the Tesseract tool and a convolutional recurrent neural network (CRNN) with the IAM database.
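To illustrate the smoothing step, the following is a minimal sketch of horizontal run length smoothing on a binary image, not the authors' implementation. It assumes that 1 marks a foreground (ink) pixel and 0 marks background, and that only background runs bounded by foreground on both sides are filled. The function names `rlsa_row` and `rlsa_horizontal` and the `threshold` parameter are hypothetical.

```python
def rlsa_row(row, threshold):
    """Smooth one binary row (assumption: 1 = foreground, 0 = background).

    Background runs of length <= threshold that lie between two
    foreground pixels are filled with foreground.
    """
    out = list(row)
    last_fg = None  # index of the most recent foreground pixel seen
    for i, px in enumerate(row):
        if px == 1:
            if last_fg is not None:
                gap = i - last_fg - 1  # background pixels between the two
                if 0 < gap <= threshold:
                    for j in range(last_fg + 1, i):
                        out[j] = 1
            last_fg = i
    return out


def rlsa_horizontal(image, threshold):
    """Apply rlsa_row to every row of a 2D list of 0/1 values."""
    return [rlsa_row(row, threshold) for row in image]


# A dotted line (alternating ink and gaps) merges into one solid run,
# while the wider gap before the last pixel is left untouched.
dotted = [1, 0, 1, 0, 1, 0, 0, 0, 0, 1]
smoothed = rlsa_row(dotted, threshold=1)
```

Under these assumptions, a row of dots on a form merges into a single connected run, which can then be located as one field region. The threshold controls how far apart dots may be and still be joined.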

In an experiment on 10 scanned image samples, segmentation achieved an average accuracy of 95.6%. The HCR technique failed at the letter segmentation step on cursive handwriting. The best technique was HTR with the Tesseract tool, which achieved word accuracy above 69% when tested on 5 scan samples (15 data fields).
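The abstract does not define word accuracy. One common reading, assumed here, is the fraction of recognized words that exactly match the ground truth. The sketch below shows that calculation. The function name `word_accuracy` and the sample words are hypothetical and are not taken from the experiment.

```python
def word_accuracy(predicted, expected):
    """Fraction of words recognized exactly (assumed definition).

    Both arguments are lists of words aligned by position.
    The comparison is case-sensitive.
    """
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        return 0.0
    matches = sum(1 for p, e in zip(predicted, expected) if p == e)
    return matches / len(expected)


# Hypothetical example: two of three words match exactly.
truth = ["SURABAYA", "JAWA", "TIMUR"]
output = ["SURABAYA", "JAWA", "TIMUP"]
accuracy = word_accuracy(output, truth)
```

Exact matching is strict: a single wrong character makes the whole word count as an error, so word accuracy is usually lower than character-level accuracy on the same data.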

Published

2020-10-03