Machine learning and audio processing: a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Computer Science at Massey University, Albany, Auckland, New Zealand
In this thesis, we addressed two important theoretical issues, one in deep
neural networks and one in clustering. We also developed a new approach for
polyphonic sound event detection, one of the most important applications
in the audio processing area.
The three novel approaches developed are:
(i) The Large Margin Recurrent Neural Network (LMRNN), which improves
the discriminative ability of standard Recurrent Neural Networks by
introducing a large margin term into the widely used cross-entropy loss
function. The large margin term applies the large margin discriminative
principle as a heuristic that guides the convergence process during
training, and it fully exploits the information in the data labels by
considering both the target category and the competing categories.
(ii) The Robust Multi-View Continuous Subspace Clustering (RMVCSC)
approach, which performs clustering on a common view-invariant
subspace learned from all views. The clustering result and the common
representation subspace are optimised simultaneously by a single
continuous objective function. In this objective function, a robust
estimator automatically clips specious inter-cluster connections while
maintaining reliable intra-cluster correspondences. As a result,
RMVCSC can untangle heavily mixed clusters without the number of
clusters being set in advance.
(iii) A novel polyphonic sound event detection approach based on the
Relational Recurrent Neural Network (RRNN), which uses the relational
reasoning ability of RRNNs to untangle overlapping sound events in audio
recordings. Unlike previous works, which mixed and packed all historical
information into a single common hidden memory vector, the developed
approach allows different pieces of historical information to interact
with one another across an audio recording, which makes it effective and
efficient at untangling overlapping sound events.
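The abstract does not state the exact form of the LMRNN large margin term. As a rough illustration only, the sketch below assumes a hinge-style term that penalises the gap between the target logit and the strongest competing logit whenever that gap falls below a margin, and adds it to the cross-entropy with a weight. The hinge form, `margin`, `weight`, and all function names are hypothetical. In a recurrent network the same loss would be evaluated on the output logits at each time step.

```python
import math
from typing import Sequence


def softmax_cross_entropy(logits: Sequence[float], target: int) -> float:
    """Standard cross-entropy for one example, using a stable log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def large_margin_term(logits: Sequence[float], target: int, margin: float) -> float:
    """Assumed hinge-style penalty: zero once the target logit beats the
    strongest competing category by at least `margin`."""
    competitor = max(z for j, z in enumerate(logits) if j != target)
    return max(0.0, margin - (logits[target] - competitor))


def large_margin_loss(logits: Sequence[float], target: int,
                      margin: float = 1.0, weight: float = 0.5) -> float:
    """Cross-entropy (target category) plus a weighted margin term
    (target vs. competing categories); the weighting is hypothetical."""
    return (softmax_cross_entropy(logits, target)
            + weight * large_margin_term(logits, target, margin))
```

For logits `[2.0, 1.5, -1.0]` with target 0, the target leads the strongest competitor by 0.5, so with `margin=1.0` the term contributes 0.5 before weighting; once the lead reaches the margin, only the cross-entropy remains.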
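The abstract does not name the robust estimator used in RMVCSC. The sketch below assumes the Geman-McClure estimator, ρ(d) = μd²/(μ + d²), whose reweighting factor (μ/(μ + d²))² stays near 1 for close pairs and decays towards 0 for distant ones. This is one way to "clip" specious connections. The sketch is deliberately simplified: the points are taken as already embedded in a fixed common subspace, the candidate edges (for example from a nearest-neighbour graph) are given, and clusters are read off as connected components of the edges whose weight survives a hypothetical `keep` threshold. RMVCSC itself optimises the subspace and the clustering jointly, which this sketch does not attempt.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def geman_mcclure_weight(dist_sq: float, mu: float) -> float:
    """Reweighting factor of rho(d) = mu*d^2 / (mu + d^2), i.e. rho'(d) / (2d).
    Near 1 for close pairs, decaying towards 0 for distant pairs."""
    return (mu / (mu + dist_sq)) ** 2


def squared_distance(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def clusters_from_edges(points: List[Point], edges: List[Tuple[int, int]],
                        mu: float, keep: float = 0.5) -> List[int]:
    """Drop edges whose robust weight falls below `keep` (hypothetical
    threshold), then label connected components. The number of clusters
    is never supplied; it emerges from the surviving edges."""
    parent = list(range(len(points)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in edges:
        w = geman_mcclure_weight(squared_distance(points[i], points[j]), mu)
        if w >= keep:
            parent[find(i)] = find(j)  # keep this connection: merge components

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    labels: List[int] = []
    roots: dict = {}
    for i in range(len(points)):
        labels.append(roots.setdefault(find(i), len(roots)))
    return labels
```

With two tight groups of points joined by a single long edge, that edge receives a near-zero weight and is clipped, so the points come out as two clusters even though no cluster count is given.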
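To illustrate the contrast between one packed hidden vector and interacting memories, the sketch below keeps several memory slots and lets each slot attend over all slots plus the current audio frame using scaled dot-product attention. It is a heavily simplified stand-in for a relational memory core, not the thesis's architecture. Identity query/key/value projections, a single attention head, and a fixed interpolation `gate` replace the learned projections, multi-head attention, and gating of an actual RRNN. All names and the slot initialisation are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def relational_memory_step(memory: List[Vector], x: Vector,
                           gate: float = 0.5) -> List[Vector]:
    """One simplified update: each slot attends over every slot and the new
    frame x (all of the same dimension), then blends the attended vector
    into itself with a fixed gate (a stand-in for learned gating)."""
    candidates = memory + [x]  # slots can read each other, not just the input
    scale = 1.0 / math.sqrt(len(x))
    new_memory = []
    for slot in memory:
        weights = softmax([scale * dot(slot, c) for c in candidates])
        attended = [sum(w * c[k] for w, c in zip(weights, candidates))
                    for k in range(len(x))]
        new_memory.append([(1.0 - gate) * s + gate * a
                           for s, a in zip(slot, attended)])
    return new_memory


def run(frames: List[Vector], num_slots: int) -> List[Vector]:
    """Process a sequence of feature frames with several memory slots
    instead of a single hidden vector (one-hot initialisation is arbitrary)."""
    dim = len(frames[0])
    memory = [[1.0 if k == i % dim else 0.0 for k in range(dim)]
              for i in range(num_slots)]
    for x in frames:
        memory = relational_memory_step(memory, x)
    return memory
```

Because every slot is updated from every other slot at each frame, evidence about different overlapping events can, in principle, be held in separate slots rather than being mixed into one vector.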
All three approaches were tested on widely used datasets and compared with
recently published works. The experimental results demonstrate the
effectiveness and efficiency of the developed approaches.