Real-time adaptive noise cancellation for automatic speech recognition in a car environment : a thesis presented in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Engineering at Massey University, School of Engineering and Advanced Technology, Auckland, New Zealand
This research is concerned with a robust method for improving real-time speech enhancement
and noise cancellation for Automatic Speech Recognition (ASR) in a real-time environment.
The thesis, titled “Real-time adaptive beamformer for Automatic Speech Recognition in a car
environment”, therefore presents an application of a beamforming method combined with an
ASR method. It offers a novel solution to the following question:
How can the driver’s voice control the car using ASR?
The solution in this thesis is an ASR system built on a hybrid of acoustic beamforming,
a Voice Activity Detector (VAD), and an adaptive Wiener filter.
The beamforming approach is based on the theory of normalized least-mean-squares (NLMS)
adaptive filtering and is used to improve the signal-to-noise ratio (SNR). The microphone
array is implemented with a VAD which uses time-delay estimation
together with magnitude-squared coherence (MSC). An experiment clearly shows the ability
of the composite system to reduce noise originating outside a defined active zone. In a
real-time environment, a speech recognition system in a car has to receive only the driver’s
voice while suppressing background noise, e.g. voices from the radio. This research therefore
presents a hybrid real-time adaptive filter which operates within a geometrical zone defined
around the head of the desired speaker. Any sound from outside this zone is considered to be
noise and is suppressed. As the defined zone is small, it is assumed that only the driver’s
speech arrives from within it. The technique uses three microphones to form a geometry-based
VAD that cancels unwanted speech coming from outside the zone. When unwanted speech alone
arrives from outside the desired zone, it is muted at the output of the hybrid noise
canceller. When unwanted speech and desired speech arrive at the same time, the proposed VAD
cannot distinguish between them. In that situation an adaptive Wiener filter is switched
on for noise reduction, improving the SNR by as much as 28 dB.
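The NLMS principle mentioned above can be illustrated with a minimal sketch. This is not
the thesis implementation: it assumes a simple two-input noise canceller (one primary
signal, one noise reference) rather than the three-microphone beamformer, and the function
names (nlms_cancel, snr_db, demo), the filter length, the step size, the synthetic sine
"speech" and the made-up acoustic path are all hypothetical choices made for illustration.

```python
import math
import random


def nlms_cancel(primary, reference, taps=8, mu=0.1, eps=1e-8):
    """Adaptive noise canceller using the NLMS weight update.

    primary:   samples containing the wanted signal plus noise
    reference: samples correlated with the noise only
    Returns the error signal, which is the noise-reduced output.
    """
    w = [0.0] * taps      # adaptive filter weights
    buf = [0.0] * taps    # most recent reference samples, newest first
    out = []
    for d, x in zip(primary, reference):
        buf = [x] + buf[:-1]
        y = sum(wi * xi for wi, xi in zip(w, buf))   # current noise estimate
        e = d - y                                    # enhanced output sample
        norm = eps + sum(xi * xi for xi in buf)      # input power (the "N" in NLMS)
        step = mu * e / norm
        w = [wi + step * xi for wi, xi in zip(w, buf)]
        out.append(e)
    return out


def snr_db(clean, observed):
    """SNR in dB of `observed` measured against the known `clean` signal."""
    p_sig = sum(s * s for s in clean)
    p_err = sum((o - s) ** 2 for s, o in zip(clean, observed))
    return 10.0 * math.log10(p_sig / p_err)


def demo(n=4000, seed=0):
    """Return (SNR before, SNR after) on synthetic data (assumed test signal)."""
    rng = random.Random(seed)
    speech = [math.sin(2 * math.pi * 0.01 * k) for k in range(n)]  # stand-in for speech
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
    path = [0.6, -0.3, 0.1]  # hypothetical path from noise source to primary input
    coloured = [sum(path[j] * noise[k - j] for j in range(len(path)) if k - j >= 0)
                for k in range(n)]
    primary = [s + v for s, v in zip(speech, coloured)]
    enhanced = nlms_cancel(primary, noise)
    half = n // 2  # ignore the initial convergence period
    return snr_db(speech[half:], primary[half:]), snr_db(speech[half:], enhanced[half:])
```

Calling demo() returns the SNR of the second half of the signal before and after
filtering; under these assumptions the second value should be the larger one. The figures
it produces say nothing about the 28 dB improvement reported for the adaptive Wiener
filter, which is a different filter evaluated on real recordings.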
To assess the quality of the signal produced by the Wiener filter, a template-matching
speech recognition system that uses a Wiener filter is designed for testing. A commercial
speech recognition system is also used to test the proposed beamforming-based noise
cancellation and the adaptive Wiener filter.