Comparison of Character Recognition and Text Recognition Using Extended MNIST, the IAM Database, and Tesseract on Handwritten Ijazah

Made Yoga Mahardika, Kartika Gunadi, Alexander Setiawan

Abstract


The central problem in handwriting recognition is how a technique can recognize many kinds of writing in many forms. Unlike computer-printed letters, which are consistent, each person's handwriting is unique in form and consistency. This problem arises in ijazah (diploma certificate) documents, where the data fields are filled in by hand.

Data location segmentation uses the run length smoothing algorithm with dots as segmentation features. The handwritten text recognition (HTR) technique requires data segmented into words, while the handwritten character recognition (HCR) technique requires data segmented into characters. HCR uses the LeNet5 model with the EMNIST dataset. HTR uses the Tesseract tool and a convolutional recurrent neural network with the IAM database.
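As a rough illustration of the run length smoothing idea mentioned above, the following is a minimal sketch of a horizontal pass over a binary image, assuming 1 marks ink and 0 marks background. The function names and the threshold value are hypothetical and are not taken from the paper; the authors' actual implementation and parameters are not given in this abstract.

```python
def rlsa_row(row, threshold):
    """Smooth one binary row: fill background runs (0s) of length
    <= threshold that lie between two foreground pixels (1s)."""
    out = list(row)
    last_fg = None  # index of the most recent foreground pixel seen
    for i, pixel in enumerate(row):
        if pixel == 1:
            if last_fg is not None:
                gap = i - last_fg - 1  # number of 0s between the two 1s
                if 0 < gap <= threshold:
                    for j in range(last_fg + 1, i):
                        out[j] = 1
            last_fg = i
    return out


def rlsa_horizontal(image, threshold):
    """Apply the row smoothing to every row of a binary image
    given as a list of lists (hypothetical helper)."""
    return [rlsa_row(row, threshold) for row in image]


# A dotted line (alternating ink and gaps) merges into one solid run,
# while the wider gap before the last pixel is left untouched.
dotted = [1, 0, 1, 0, 1, 0, 0, 0, 0, 1]
smoothed = rlsa_row(dotted, threshold=2)
```

With a small threshold, the closely spaced dots of a form field merge into a single connected region that can then be located as one block, which is one plausible way dots could serve as a segmentation feature.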

In an experiment on 10 scanned image samples, segmentation achieved an average accuracy of 95.6%. The HCR technique failed at the letter segmentation step on cursive handwriting. The best technique was HTR with the Tesseract tool, which achieved word accuracy above 69% when tested on 5 scan samples (15 data fields).
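The abstract does not define how word accuracy is computed. One common reading, shown here only as an assumption, is the fraction of fields whose recognized text exactly matches the ground truth after trimming whitespace. The function name and the sample strings below are hypothetical and are not results from the paper.

```python
def word_accuracy(predicted, expected):
    """Fraction of items whose predicted text exactly matches the
    expected text after trimming whitespace (assumed definition)."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        return 0.0
    correct = sum(
        p.strip() == e.strip() for p, e in zip(predicted, expected)
    )
    return correct / len(expected)


# Made-up example fields, for illustration only.
truth = ["Surabaya", "Made", "2019"]
output = ["Surabaya", "Mode", "2019 "]
score = word_accuracy(output, truth)  # 2 of 3 fields match
```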

Keywords


run length smoothing algorithm; Extended MNIST; IAM database; handwritten character recognition; handwritten text recognition; convolutional recurrent neural network; tesseract; segmentation; ijazah


