Mathematical Formula Detection on Digital Document Pages Using the Convolutional Neural Network Method

Martina Marcelline Taslim, Kartika Gunadi, Alvin Nathaniel Tjondrowiguno

Abstract


Mathematical formulae in academic papers or scientific journals are an important part of said documents. However, mathematical formulae are oftentimes not properly recognized by Optical Character Recognition (OCR) processes. One of the causes of this failure is the difference between mathematical formulae and ordinary text. Therefore, mathematical formula detection in those document pages might help with this problem.

Formula detection is done by converting digital document pages into images, performing text line segmentation and word segmentation, and then classifying the resulting segments with a Convolutional Neural Network (CNN). The aim is to help OCR processes by recognizing which parts of the document pages contain formulae and which parts do not. The CNN architectures used for classification have 64 kernels in each convolutional layer.
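The segmentation steps described above can be illustrated with a minimal sketch. The abstract does not state which segmentation algorithm is used, so the projection-profile approach below is an assumption, as are the function names `segment_lines` and `segment_words`, the `min_gap` parameter, and the representation of a page as a binary grid (1 for ink, 0 for background).

```python
from typing import List, Tuple

Image = List[List[int]]  # binary grid: 1 = ink, 0 = background (assumed representation)


def segment_lines(page: Image) -> List[Tuple[int, int]]:
    """Return (start_row, end_row_exclusive) for each run of rows that contain ink."""
    spans = []
    start = None
    for y, row in enumerate(page):
        has_ink = any(row)
        if has_ink and start is None:
            start = y                      # a text line begins
        elif not has_ink and start is not None:
            spans.append((start, y))       # a blank row closes the line
            start = None
    if start is not None:
        spans.append((start, len(page)))   # line runs to the bottom edge
    return spans


def segment_words(line: Image, min_gap: int = 2) -> List[Tuple[int, int]]:
    """Split one text line into (start_col, end_col_exclusive) spans.

    Two spans are separated when at least `min_gap` consecutive
    columns contain no ink (hypothetical threshold).
    """
    width = len(line[0]) if line else 0
    col_has_ink = [any(row[x] for row in line) for x in range(width)]
    spans = []
    start = None
    last_ink = None
    gap = 0
    for x, ink in enumerate(col_has_ink):
        if ink:
            if start is None:
                start = x
            last_ink = x
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                spans.append((start, last_ink + 1))
                start = None
    if start is not None:
        spans.append((start, last_ink + 1))
    return spans


page = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 1, 1, 0],
    [1, 0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 1, 0, 0, 0, 0],
]
lines = segment_lines(page)
words_per_line = [segment_words(page[top:bottom]) for top, bottom in lines]
```

In a pipeline of this kind, each line or word crop would then be passed to the classifier, which labels it as formula or non-formula.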

For displayed formulae (formulae that do not share their space with regular text), the model uses 10 groups of Convolutional-ReLU-Max Pooling layers. For inline formulae (formulae that share a text line with regular text), 12 groups of Convolutional-ReLU-Max Pooling layers are used. These architectures achieve an F1 score of 0.980 for displayed formula classification in 1-column documents, 0.940 for 2-column documents, and 0.916 for inline formulae.
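For reference, the F1 score is the harmonic mean of precision and recall. The sketch below shows how it is computed from classification counts. The function name `f1_score` and the counts in the usage line are illustrative assumptions and are not taken from the experiments reported here.

```python
def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Compute F1 = 2 * P * R / (P + R) from raw counts.

    Returns 0.0 when precision or recall is undefined (no predictions
    or no positive examples), which is one common convention.
    """
    predicted = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative counts only: 90 formulae found, 10 false alarms, 10 missed.
example = f1_score(true_pos=90, false_pos=10, false_neg=10)
```

Because F1 balances false alarms against missed formulae, it is a reasonable single figure for a detector whose positive class (formulae) is much rarer than ordinary text.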


Keywords


Machine Learning; Artificial Neural Network; Convolutional Neural Network; Image


