JP Journal of Heat and Mass Transfer
Special Volume, Issue I, Advances in Mechanical System and ICT-convergence, Pages 77 - 81
(June 2018) http://dx.doi.org/10.17654/HMSI118077
THE IMPACT OF EACH DEEP NEURAL NETWORK LAYER ON THE PERFORMANCE OF END-TO-END VIETNAMESE SPEECH RECOGNITION
Nguyen Hong Quang
Abstract: In this paper, we analyze the impact of each deep neural network (DNN) layer on the performance of end-to-end Vietnamese speech recognition using 1D convolution layers and bi-directional gated recurrent unit (GRU) layers. In the first experiment, we use a spectrogram input, a fully connected (FC) layer, and a connectionist temporal classification (CTC) layer to recognize Vietnamese spoken digits. In the next two experiments, we extend these three components with 1D convolution layers and GRU layers. The results of the three experiments show that, for Vietnamese speech recognition, 1D convolution and bi-directional GRU layers are the most effective of the DNN configurations tested.
Keywords and phrases: end-to-end speech recognition, deep neural network.