Wavelet-based birdsong recognition for conservation : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Computer Science at Massey University, Palmerston North, New Zealand
According to the International Union for the Conservation of Nature Red Data List,
nearly a quarter of the world's bird species are either threatened or at risk of extinction.
To be able to protect endangered species, we need accurate survey methods that reliably
estimate numbers and hence population trends. Acoustic monitoring is the most
commonly-used method to survey birds, particularly cryptic and nocturnal species,
not least because it is non-invasive, unbiased, and relatively time-effective. Unfortunately,
the resulting data still have to be analysed manually. The current practice,
manual spectrogram reading, is tedious and prone to bias due to observer variation.
While there is a large literature on automatic recognition of targeted recordings of
small numbers of species, automatic analysis of long field recordings has not been well
studied to date. This thesis considers this problem in detail, presenting experiments
demonstrating the true efficacy of recorders in natural environments under different
conditions, and then working to reduce the noise present in the recording, as well as to
segment and recognise a range of New Zealand native bird species.
The primary issues with field recordings are that the birds are at variable distances
from the recorder, that the recordings are corrupted by many different forms of noise,
that the environment affects the quality of the recorded sound, and that birdsong is
often relatively rare within a recording. Thus, methods of dealing with faint calls,
denoising, and effective segmentation are all needed before individual species can be
recognised reliably. Experiments presented in this thesis demonstrate clearly the effects
of distance and environment on recorded calls. Some of these results are unsurprising;
for example, an inverse square relationship with distance largely holds. Perhaps more
surprising is that the height from which a call is transmitted has a significant effect on
the recorded sound. Statistical analyses of the experiments, which demonstrate many
significant environmental and sound factors, are presented.
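
As a rough illustration of the inverse square relationship mentioned above, the following sketch computes the expected drop in received level under idealised free-field spherical spreading. The function names and the free-field assumption are illustrative only and are not taken from the thesis; real recordings also depend on the environment and on the height of the source.

```python
import math


def relative_intensity(d_ref: float, d: float) -> float:
    """Intensity at distance d relative to distance d_ref (inverse square law)."""
    if d_ref <= 0 or d <= 0:
        raise ValueError("distances must be positive")
    return (d_ref / d) ** 2


def level_drop_db(d_ref: float, d: float) -> float:
    """Drop in sound level (dB) when moving from d_ref to d.

    Assumes ideal spherical spreading with no absorption, reflection,
    or vegetation effects (an assumption, not a field model).
    """
    if d_ref <= 0 or d <= 0:
        raise ValueError("distances must be positive")
    # 10*log10 of the intensity ratio equals 20*log10 of the distance ratio.
    return 20.0 * math.log10(d / d_ref)


if __name__ == "__main__":
    for distance in (10.0, 20.0, 40.0, 80.0):
        print(distance, round(level_drop_db(10.0, distance), 2))
```

Under this idealisation each doubling of distance costs about 6 dB, which gives a baseline against which measured attenuation can be compared.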
Regardless of these factors, the recordings have noise present, and removing this
noise is helpful for reliable recognition. A method for denoising based on the wavelet
packet decomposition is presented and demonstrated to significantly improve the quality
of recordings. Following this, wavelets are also used to implement a call detection
algorithm that identifies regions of the recording with calls from a target bird species.
This algorithm is validated using four New Zealand native species, namely Australasian
bittern (Botaurus poiciloptilus), brown kiwi (Apteryx mantelli), morepork (Ninox novaeseelandiae),
and kakapo (Strigops habroptilus), but could be used for any species.
The results demonstrate high recall rates and tolerable false positive rates when compared
to human experts.
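
The general idea of denoising with a wavelet packet decomposition can be sketched as follows. This is a minimal illustration and not the method developed in the thesis: the Haar wavelet, the full packet tree of fixed depth, the soft universal threshold, and all function names are assumptions chosen to keep the example short and dependency-free.

```python
import math
import random
import statistics

SQRT2 = math.sqrt(2.0)


def haar_split(x):
    """One Haar analysis step: return (approximation, detail) coefficients."""
    a = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    d = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return a, d


def haar_merge(a, d):
    """Inverse of haar_split."""
    x = []
    for ai, di in zip(a, d):
        x.append((ai + di) / SQRT2)
        x.append((ai - di) / SQRT2)
    return x


def wp_decompose(x, levels):
    """Full wavelet packet tree: split every node at every level."""
    nodes = [list(x)]
    for _ in range(levels):
        nxt = []
        for node in nodes:
            nxt.extend(haar_split(node))
        nodes = nxt
    return nodes


def wp_reconstruct(nodes):
    """Merge leaf nodes pairwise until the signal is recovered."""
    nodes = list(nodes)
    while len(nodes) > 1:
        nodes = [haar_merge(nodes[i], nodes[i + 1])
                 for i in range(0, len(nodes), 2)]
    return nodes[0]


def soft(c, t):
    """Soft thresholding: shrink the coefficient towards zero by t."""
    return math.copysign(max(abs(c) - t, 0.0), c)


def wp_denoise(x, levels=3):
    """Threshold every leaf except the pure approximation node."""
    n = len(x)
    if n == 0 or n % (2 ** levels):
        raise ValueError("length must be a positive multiple of 2**levels")
    # Noise estimate from the finest detail coefficients (median absolute
    # deviation), then the universal threshold sigma * sqrt(2 ln n).
    _, d1 = haar_split(x)
    sigma = statistics.median([abs(c) for c in d1]) / 0.6745
    t = sigma * math.sqrt(2.0 * math.log(n))
    leaves = wp_decompose(x, levels)
    kept = [leaves[0]] + [[soft(c, t) for c in leaf] for leaf in leaves[1:]]
    return wp_reconstruct(kept)


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


if __name__ == "__main__":
    rng = random.Random(0)
    clean = [math.sin(2 * math.pi * i / 256) for i in range(1024)]
    noisy = [c + rng.gauss(0.0, 0.3) for c in clean]
    print("MSE before:", mse(noisy, clean))
    print("MSE after: ", mse(wp_denoise(noisy), clean))
```

A practical system would likely use a smoother wavelet, select which packet nodes to keep, and tune the threshold to the recording; the sketch only shows the decompose, threshold, and reconstruct pattern.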