Design and Development of an Android Application for Converting Speech to Text in Indonesian Using Machine Learning to Assist the Deaf

David Wibisono(1*), Rolly Intan(2), Endang Setyati(3)


(1) Informatics Engineering Study Program
(2) Informatics Engineering Study Program
(3) Informatics Engineering Study Program
(*) Corresponding Author

Abstract


Speech recognition systems have reached a word error rate (WER) of 11.85% on English speech. Large speech datasets have helped make machine learning popular for this task, because they allow a model to generalize well and so improve recognition accuracy. Inspired by Baidu's Deep Speech, this paper implements that architecture to pursue the same goal for Indonesian speech. We use datasets that vary by source: speech recorded in clean environments, speech recorded in noisy environments, and synthesized speech from the Apple and Bing speech synthesizers. The main problem is that the variety of the datasets affects the resulting WER according to dataset size. Greater dataset variety supports good generalization in the machine learning model, but it also introduces more ambiguity in the language model.
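Because WER is the metric the abstract reports, a minimal sketch of how it is commonly computed may help: the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the recognizer output, divided by the number of reference words. The function name `word_error_rate` and the Indonesian example sentences are illustrative assumptions and are not taken from the paper, which may normalize text differently before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Hypothetical helper for illustration; not the authors' evaluation code.
    """
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i] + [0] * len(hyp_words)
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref_words)


# Made-up example: one reference word ("ke") is missing from the hypothesis.
wer = word_error_rate("saya pergi ke pasar", "saya pergi pasar")
print(f"WER = {wer:.2%}")
```

Under this definition WER can exceed 100% when the hypothesis contains many inserted words, so it is an error rate relative to the reference length and not a bounded percentage.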

 


Keywords


Deep Speech, Machine Learning, Speech Recognition, Neural Network, CTC, Language Model.



