Comparison of Character Recognition and Text Recognition Using Extended MNIST, the IAM Database, and Tesseract on Handwritten Ijazah

Made Yoga Mahardika, Kartika Gunadi, Alexander Setiawan


The challenge in handwriting recognition is how a technique can recognize many styles of writing in many forms. Unlike computer typefaces, which are consistent, each person's handwriting is unique in form and consistency. This problem appears in ijazah documents, where the data is handwritten.

Data location segmentation uses the run length smoothing algorithm (RLSA) with dots as segmentation features. The handwritten text recognition (HTR) technique requires data segmented into words, while the handwritten character recognition (HCR) technique requires data segmented into characters. HCR uses the LeNet5 model with the EMNIST dataset. HTR uses the Tesseract tool and a convolutional recurrent neural network with the IAM database.
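To illustrate the smoothing step named above, the following is a minimal sketch of horizontal run length smoothing on a binary image, not the authors' implementation. It assumes ink pixels are 1 and background pixels are 0; the function names and the threshold value are hypothetical. Background runs no longer than the threshold that lie between two ink pixels are filled, so a row of separate dots merges into one connected segment.

```python
def rlsa_row(row, threshold):
    """Fill background runs (0s) of length <= threshold that lie between two ink pixels (1s)."""
    out = list(row)
    last_ink = None  # index of the most recent ink pixel seen so far
    for i, pixel in enumerate(row):
        if pixel == 1:
            if last_ink is not None:
                gap = i - last_ink - 1  # number of background pixels between the two ink pixels
                if 0 < gap <= threshold:
                    for j in range(last_ink + 1, i):
                        out[j] = 1
            last_ink = i
    return out


def rlsa_horizontal(image, threshold):
    """Apply the row-wise smoothing to every row of a binary image (list of lists)."""
    return [rlsa_row(row, threshold) for row in image]


# A dotted line (alternating ink and background) becomes one solid run,
# while the wider gap in the second row is left untouched.
dotted = [
    [1, 0, 1, 0, 1, 0, 1],
    [1, 0, 0, 0, 0, 1, 0],
]
smoothed = rlsa_horizontal(dotted, threshold=1)
```

With a threshold of 1, the first row becomes a solid run and the second row is unchanged, because its gap of four background pixels exceeds the threshold.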

In an experiment on 10 scanned sample images, segmentation achieved an average accuracy of 95.6%. The HCR technique failed at the character segmentation step on cursive handwriting. The best technique was HTR with the Tesseract tool, which achieved a word accuracy above 69% when tested on 5 scanned samples (15 data fields).
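The abstract does not define how word accuracy is computed. One common reading, assumed here, is the fraction of recognized words that exactly match the ground-truth word in the same position. The sketch below follows that assumption; the function name and the sample words are hypothetical and are not data from the experiment.

```python
def word_accuracy(predicted, expected):
    """Fraction of positions where the predicted word equals the expected word.

    Assumes both lists are aligned word by word; surrounding whitespace is ignored.
    """
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same number of words")
    if not expected:
        return 0.0
    matches = sum(p.strip() == e.strip() for p, e in zip(predicted, expected))
    return matches / len(expected)


# Illustrative values only: two of three words are recognized correctly.
score = word_accuracy(["Made", "Yoga", "Mahardlka"], ["Made", "Yoga", "Mahardika"])
```

An exact-match criterion is strict: a single wrong character makes the whole word count as an error, which is one reason word accuracy is usually lower than character accuracy on the same output.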


run length smoothing algorithm; Extended MNIST; IAM database; handwritten character recognition; handwritten text recognition; convolutional recurrent neural network; Tesseract; segmentation; ijazah
