Massey Documents by Type

Permanent URI for this community: https://mro.massey.ac.nz/handle/10179/294

Search Results

Now showing 1 - 2 of 2
  • Item
    Multi-microphone speech enhancement technique using a novel neural network beamformer : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Engineering at Massey University, Albany, New Zealand
    (Massey University, 2014) Yoganathan, Vaitheki
    This thesis presents a novel speech enhancement algorithm to reduce background noise in the acquired speech signal. It introduces an innovative speech beamformer that uses an input delay neural network based adaptive filter for noise reduction. Speech communication is considered the most popular and natural way for humans to communicate with computers. In the past few decades, there has been an increased demand for speech-based applications; examples include personal dictation devices, hands-free telephony, voice recognition for robotics, speech-controlled equipment, and automated phone systems. However, these applications require a high signal-to-noise ratio to function effectively. Background noise sources such as factory machinery, television, radio, computers, or a competing speaker often degrade the quality of the acquired signals. The problem of removing these unwanted signals from the acquired speech signal has been investigated by various authors. However, there is still room for improvement on the existing methods. A multi-microphone neural network based switched Griffiths-Jim beamformer structure was implemented using the LabVIEW software. The conventional noise reduction section of the Griffiths and Jim beamformer structure was improved with a non-linear neural network approach. A partially connected three-layer neural network structure was implemented for rapid real-time processing. The error back-propagation algorithm was used to train the neural network structure. Although it is a slow gradient learning algorithm, it can easily be replaced with other algorithms such as the fast back-propagation algorithm. The proposed algorithms show promising noise reduction improvement over earlier adaptive algorithms such as the normalised least mean squares adaptive filter. However, the performance of the neural network depends on its chosen parameters, such as the learning rate, the amount of training given, and the size of the neural network structure. Tests with a speech-controlled system demonstrate that the neural network based beamformer significantly improves the recognition rate of the system.
  • Item
    Real-time adaptive noise cancellation for automatic speech recognition in a car environment : a thesis presented in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Engineering at Massey University, School of Engineering and Advanced Technology, Auckland, New Zealand
    (Massey University, 2008) Qi, Ziming
    This research is mainly concerned with a robust method for improving the performance of real-time speech enhancement and noise cancellation for Automatic Speech Recognition (ASR) in a real-time environment. The thesis, titled “Real-time adaptive beamformer for Automatic speech Recognition in a car environment”, therefore presents an applied technique that combines a beamforming method with an ASR method. In this thesis, a novel solution is presented to the following question: how can the driver’s voice control the car using ASR? The solution is an ASR using a hybrid system that comprises an acoustic beamforming Voice Activity Detector (VAD) and an adaptive Wiener filter. The beamforming approach is based on the normalized least-mean-squares (NLMS) algorithm and is intended to improve the signal-to-noise ratio (SNR). The microphone has been implemented with a VAD that uses time-delay estimation together with magnitude-squared coherence (MSC). An experiment clearly shows the ability of the composite system to reduce noise from outside a defined active zone. In real-time environments, a speech recognition system in a car has to receive only the driver’s voice while suppressing background noise, such as voices from the radio. Therefore, this research presents a hybrid real-time adaptive filter which operates within a geometrical zone defined around the head of the desired speaker. Any sound from outside this zone is considered to be noise and is suppressed. As this defined geometrical zone is small, it is assumed that only the driver's speech arrives from this zone. The technique uses three microphones to define a geometry-based VAD that cancels unwanted speech coming from outside the zone. When only unwanted speech arrives from outside the desired zone, it is muted at the output of the hybrid noise canceller. When unwanted speech and desired speech arrive at the same time, the proposed VAD cannot distinguish between them. In such a situation an adaptive Wiener filter is switched on for noise reduction, and the SNR is improved by as much as 28 dB. To assess the quality of the signal filtered by the Wiener filter, a template-matching speech recognition system that uses a Wiener filter is designed for testing. In this thesis, a commercial speech recognition system is also used to test the proposed beamforming-based noise cancellation and the adaptive Wiener filter.
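
Both abstracts refer to the normalised least mean squares (NLMS) adaptive filter, the first as a baseline and the second as the basis of its beamforming approach. The following is a minimal Python sketch of NLMS used as a two-channel adaptive noise canceller, not code from either thesis. The function names (`nlms_cancel`, `snr_db`, `demo`), the tap count, the step size, the three-tap noise path, and the sinusoid standing in for speech are all assumptions chosen for illustration.

```python
import numpy as np


def nlms_cancel(primary, reference, taps=8, mu=0.1, eps=1e-8):
    """Subtract an NLMS estimate of the noise from the primary channel.

    primary:   desired signal plus noise (e.g. the speech microphone)
    reference: a signal correlated with the noise only (assumed available)
    Returns the error signal (the enhanced output) and the final weights.
    """
    primary = np.asarray(primary, dtype=float)
    reference = np.asarray(reference, dtype=float)
    w = np.zeros(taps)      # adaptive filter weights
    buf = np.zeros(taps)    # most recent reference samples, newest first
    out = np.zeros(len(primary))
    for n in range(len(primary)):
        buf = np.roll(buf, 1)
        buf[0] = reference[n]
        noise_est = w @ buf                      # filter output
        e = primary[n] - noise_est               # error = enhanced sample
        w += (mu / (eps + buf @ buf)) * e * buf  # normalised LMS update
        out[n] = e
    return out, w


def snr_db(clean, observed):
    """SNR of `observed` relative to the known clean signal, in dB."""
    residual = observed - clean
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(residual ** 2))


def demo(n=8000, seed=0):
    """Synthetic check: a sinusoid stands in for speech (assumption)."""
    rng = np.random.default_rng(seed)
    t = np.arange(n)
    speech = np.sin(2 * np.pi * 0.01 * t)
    noise_ref = rng.standard_normal(n)
    path = np.array([0.6, -0.3, 0.1])            # hypothetical noise path
    noise_at_mic = np.convolve(noise_ref, path)[:n]
    primary = speech + noise_at_mic
    enhanced, _ = nlms_cancel(primary, noise_ref)
    half = n // 2                                # skip the convergence phase
    before = snr_db(speech[half:], primary[half:])
    after = snr_db(speech[half:], enhanced[half:])
    return before, after


if __name__ == "__main__":
    before, after = demo()
    print(f"SNR before: {before:.1f} dB, after: {after:.1f} dB")
```

The division by the reference energy in the weight update is what distinguishes NLMS from plain LMS: it keeps the effective step size stable when the input level changes. The sketch assumes a separate noise-only reference channel; a Griffiths-Jim beamformer, as described in the first abstract, instead derives its noise references from the microphone array through a blocking stage.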