End-to-end automatic speech recognition for low-resource languages : a thesis submitted in partial fulfillment for the degree of Doctor of Philosophy in Computer Science at the School of Mathematical and Computational Sciences, Massey University, Auckland, New Zealand

Satwinder Singh

dc.confidential	Embargo : yes	en_US
dc.contributor.advisor	Wang, Ruili
dc.contributor.author	Satwinder Singh
dc.date.accessioned	2023-08-07T02:42:48Z
dc.date.accessioned	2023-08-27T23:15:15Z
dc.date.available	2023-08-07T02:42:48Z
dc.date.available	2023-08-27T23:15:15Z
dc.date.issued	2023
dc.description.abstract	Automatic speech recognition (ASR) for low-resource languages presents numerous challenges because crucial linguistic resources are lacking, including annotated speech corpora, lexicons, and raw text in the language. In this thesis, we propose several approaches to improve fundamental frequency estimation and speech recognition for low-resource languages. Firstly, we propose DeepF0, a new deep learning technique for fundamental frequency (F0) estimation. Existing models have limited learning capability because they use a shallow receptive field. DeepF0 extends the receptive field by using dilated convolutional blocks. Additionally, we improve training efficiency and speed by building the network from residual blocks with residual connections. DeepF0 achieves state-of-the-art results while using 77.4% fewer network parameters. Secondly, we introduce a new meta-learning framework for low-resource speech recognition that improves on the model-agnostic meta-learning (MAML) approach. Our framework addresses MAML's training instability and slow convergence by using a multi-step loss (MSL). MSL calculates a loss at each step of MAML's inner loop and combines these losses using a weighted importance vector that prioritizes the loss at the last step. Thirdly, we propose an end-to-end ASR approach for low-resource languages that exploits synthesized datasets along with real speech datasets. We evaluate our approach on Punjabi, a low-resource language that is spoken by millions of people worldwide but still lacks annotated speech datasets. Our empirical results show that our synthesized datasets (Google-synth and CMU-synth) can significantly improve the accuracy of our ASR model. Lastly, we introduce a self-training approach, also known as pseudo-labeling, to enhance the performance of low-resource speech recognition. 
While most self-training research has focused on high-resource languages such as English, our work focuses on the low-resource Punjabi language. To filter out low-quality pseudo-labels, we employ a length-normalized confidence score. Overall, our experimental evaluation validates the efficacy of the proposed approaches and shows that they outperform existing baseline approaches for F0 estimation and low-resource speech recognition.	en_US
dc.identifier.uri	http://hdl.handle.net/10179/19790
dc.publisher	Massey University	en_US
dc.rights	© The Author	en_US
dc.subject	Automatic speech recognition	en
dc.subject	Deep learning (Machine learning)	en
dc.subject	Panjabi language	en
dc.subject.anzsrc	460212 Speech recognition	en
dc.title	End-to-end automatic speech recognition for low-resource languages : a thesis submitted in partial fulfillment for the degree of Doctor of Philosophy in Computer Science at the School of Mathematical and Computational Sciences, Massey University, Auckland, New Zealand	en_US
dc.type	Thesis	en_US
massey.contributor.author	Satwinder Singh	en_US
thesis.degree.discipline	Computer Science	en_US
thesis.degree.grantor	Massey University	en_US
thesis.degree.level	Doctoral	en_US
thesis.degree.name	Doctor of Philosophy (PhD)	en_US
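
The abstract attributes DeepF0's gains to a wider receptive field from dilated convolutions and to residual connections. The following is a minimal pure-Python sketch of both ideas. The kernel size of 3, the doubling dilation schedule (1, 2, 4, 8), and the helper names are illustrative assumptions and hypothetical. They are not DeepF0's published configuration or code.

```python
from typing import List, Sequence


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Receptive field (in input samples) of stacked stride-1 1-D convolutions.

    A layer with kernel size k and dilation d widens the field by (k - 1) * d.
    """
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def dilated_conv1d(x: Sequence[float], weights: Sequence[float], dilation: int) -> List[float]:
    """Single-channel dilated 1-D cross-correlation (as in DL frameworks),
    zero-padded so the output has the same length as the input."""
    left = (len(weights) - 1) * dilation // 2  # centre the dilated kernel
    out = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(weights):
            idx = t - left + i * dilation  # kernel taps are `dilation` samples apart
            if 0 <= idx < len(x):
                acc += w * x[idx]
        out.append(acc)
    return out


def residual_block(x: Sequence[float], weights: Sequence[float], dilation: int) -> List[float]:
    """y = x + conv(x): the identity shortcut gives gradients a direct path."""
    y = dilated_conv1d(x, weights, dilation)
    return [a + b for a, b in zip(x, y)]


if __name__ == "__main__":
    # Four layers with kernel size 3 (hypothetical): undilated vs. doubling dilations.
    print(receptive_field(3, [1, 1, 1, 1]))  # 9
    print(receptive_field(3, [1, 2, 4, 8]))  # 31
    print(residual_block([0, 0, 1, 0, 0], [1, 1, 1], dilation=2))  # [1.0, 0.0, 2.0, 0.0, 1.0]
```

With doubling dilations, the receptive field grows exponentially with depth. The same four layers, with the same number of weights, see 31 samples instead of 9. This is how dilation widens the context without adding parameters.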
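
The abstract describes the multi-step loss (MSL) only in outline. It computes a loss at each step of MAML's inner loop and combines the losses through an importance vector that favours the last step. The following sketch illustrates that weighted combination in plain Python. The annealing schedule in the hypothetical `importance_weights` helper is an assumption chosen for illustration. It starts uniform, then shifts weight onto the final step as training progresses, with a floor `min_weight` for the earlier steps. It is not the schedule used in the thesis.

```python
from typing import List, Sequence


def importance_weights(num_steps: int, progress: float, min_weight: float = 0.03) -> List[float]:
    """Hypothetical importance vector over inner-loop steps.

    At progress=0 every step is weighted equally; as progress -> 1 the
    non-final steps decay linearly to `min_weight` and the freed mass moves
    to the last step. The weights always sum to 1.
    """
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    uniform = 1.0 / num_steps
    if not 0.0 <= min_weight <= uniform:
        raise ValueError("min_weight must lie in [0, 1/num_steps]")
    p = min(max(progress, 0.0), 1.0)  # clamp training progress to [0, 1]
    non_final = uniform - p * (uniform - min_weight)
    last = 1.0 - (num_steps - 1) * non_final
    return [non_final] * (num_steps - 1) + [last]


def multi_step_loss(step_losses: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of the losses recorded at each inner-loop step."""
    if len(step_losses) != len(weights):
        raise ValueError("need one weight per inner-loop step")
    return sum(w * loss for w, loss in zip(weights, step_losses))


if __name__ == "__main__":
    losses = [2.0, 1.6, 1.3, 1.1, 1.0]  # illustrative per-step losses, not thesis results
    early = importance_weights(5, progress=0.0)
    late = importance_weights(5, progress=1.0)
    print(multi_step_loss(losses, early))  # equal weights: the plain mean, about 1.4
    print(multi_step_loss(losses, late))   # dominated by the last step, about 1.06
```

All steps contribute a gradient signal early in training. This is the property MSL relies on to stabilize MAML. Later, the objective approaches the standard MAML loss at the final step.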
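
The abstract states that low-quality pseudo-labels are filtered with a length-normalized confidence score, but it does not give the formula. The following sketch assumes a common form: the hypothesis log-probability divided by its token count. Three elements are hypothetical stand-ins for whatever the decoder and selection rule in the thesis actually provide: the `PseudoLabel` record, its per-token log-probabilities, and the fixed threshold.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class PseudoLabel:
    utt_id: str
    text: str
    token_logprobs: List[float]  # hypothetical: per-token log-probabilities from decoding


def length_normalized_confidence(token_logprobs: Sequence[float]) -> float:
    """Average per-token log-probability, so long transcripts are not
    penalized merely for containing more tokens."""
    if not token_logprobs:
        return float("-inf")  # empty hypotheses are never trusted
    return sum(token_logprobs) / len(token_logprobs)


def filter_pseudo_labels(labels: Sequence[PseudoLabel], threshold: float) -> List[PseudoLabel]:
    """Keep pseudo-labels whose length-normalized confidence reaches `threshold`."""
    return [pl for pl in labels if length_normalized_confidence(pl.token_logprobs) >= threshold]


if __name__ == "__main__":
    labels = [
        PseudoLabel("long", "<20-token transcript>", [-0.1] * 20),  # sum -2.0, mean -0.1
        PseudoLabel("short", "<2-token transcript>", [-0.5, -0.5]),  # sum -1.0, mean -0.5
    ]
    kept = filter_pseudo_labels(labels, threshold=-0.3)
    print([pl.utt_id for pl in kept])  # ['long']
```

Without normalization, the summed log-probabilities (-2.0 vs -1.0 above) would favour the shorter hypothesis simply because it has fewer tokens. Dividing by length removes that bias before the threshold is applied.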

Files

Original bundle

Name:: SatwinderSinghPhDThesis.pdf
Size:: 1.24 MB
Format:: Adobe Portable Document Format

Collections

Theses and Dissertations