Design and Development of an Android Application for Converting Speech to Text in Indonesian Using Machine Learning to Assist the Deaf
(1) Informatics Engineering Study Program
(2) Informatics Engineering Study Program
(3) Informatics Engineering Study Program
(*) Corresponding Author
Abstract
Speech recognition systems have achieved a word error rate (WER) as low as 11.85% on English speech. Large speech datasets have helped make machine learning popular for speech recognition, because they allow a model to generalize well. Inspired by Baidu's Deep Speech, this paper implements its architecture to pursue the same goal for Indonesian speech. We use datasets that vary by source: speech recorded in clean environments, speech recorded in noisy environments, and synthesized speech from Apple and Bing. The main problem is that the variety of the datasets, together with their size, influences the resulting WER. Greater dataset variety helps the model generalize well, but it also introduces considerable ambiguity in the language model.
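Since WER is the metric reported throughout, a minimal sketch of how it is commonly computed may help: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the recognizer's output, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration and are not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()  # assumption: whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word out of four reference words.
    print(word_error_rate("saya pergi ke pasar", "saya pergi pasar"))
```

Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference.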
References
Amodei, D., Anubhai, R., Battenberg, E., and Case, C., et al. Deep Speech: End-to-End Speech Recognition in English and Mandarin. Proceedings of the 33rd International Conference on Machine Learning, New York, NY, 02595. DOI=https://arxiv.org/abs/1512.02595.
Dahl, G., Yu, D., Deng, L., and Acero, A. 2011. Context-dependent pre-trained deep neural networks for large vocabulary speech recognition. IEEE Transactions on Audio, Speech, and Language Processing. 30-42.
Graves, A., Fernandez, S., Gomez, F., and Schmidhuber, J. 2006. Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks. Proceedings of the 23rd International Conference on Machine Learning. ACM, Pittsburgh, PA, 369-376. DOI=https://dl.acm.org/citation.cfm?doid=1143844.1143891.
Hannun, A., Case, C., Casper, J., and Catanzaro, B., et al. Deep Speech: Scaling up end-to-end speech recognition. 1412.5567. DOI=http://arxiv.org/abs/1412.5567.
Hannun, A. Y., Maas, A. L., Jurafsky, D., and Ng, A. Y. 2014. First-pass large vocabulary continuous speech recognition using bi-directional recurrent DNNs. abs/1408.2873. DOI=http://arxiv.org/abs/1408.2873.
Hinton, G., Li, D., Yu, D., Dahl, G., Mohamed, A., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Sainath, T., and Kingsbury, B. 2012. Deep Neural Network for Acoustic Model Speech Recognition. IEEE Signal Processing Magazine, 1-3.
Lee, H., Pham, P., Largman, Y., and Ng, A. Y. 2009. Unsupervised feature learning for audio classification using convolutional deep belief networks. In Advances in Neural Information Processing Systems, 1096–1104.
Mohamed, A., Dahl, G., and Hinton, G. 2011. Acoustic modeling using deep belief networks. IEEE Transactions on Audio, Speech, and Language Processing, 99.
Na'im, A., and Syaputra, H. 2010. Kewarganegaraan, Suku Bangsa, Agama, Dan Bahasa Sehari-hari Penduduk Indonesia. Badan Pusat Statistik, Jakarta, Jawa Barat.
Tebelskis, J. 1995. Speech Recognition using Neural Networks. Doctoral Thesis. CMU Order Number: CMU-CS- -142., Carnegie Mellon University.