L'arte di interazione musicale : new musical possibilities through multimodal techniques : a dissertation submitted to the Victoria University of Wellington and Massey University in fulfillment of the requirements for the degree of Doctor of Philosophy in Sonic Arts, New Zealand School of Music
Multimodal communication is an essential aspect of human perception,
facilitating the ability to reason, deduce, and understand meaning. Utilizing
multimodal senses, humans are able to relate to the world in many different
contexts. This dissertation examines issues surrounding multimodal
communication as it pertains to human-computer interaction. If humans rely on
multimodality to interact with the world, how can multimodality benefit the ways
in which humans interface with computers? Can multimodality be used to help
the machine understand more about the person operating it and what
associations derive from this type of communication?
This research places multimodality within the domain of musical
performance, a creative field rich with nuanced physical and emotive aspects.
This dissertation asks: what kinds of new sonic collaborations between musicians
and computers are possible through the use of multimodal techniques? Are there
specific performance areas where multimodal analysis and machine learning can
benefit the training of musicians? Similarly, can multimodal interaction or analysis
support new forms of creative processes?
Applying multimodal techniques to music-computer interaction is a
burgeoning effort. As such, the scope of this research is to lay a foundation of
multimodal techniques for the future. To that end, the first work presented is a
software system for capturing synchronous multimodal data streams from nearly
any musical instrument, interface, or sensor system.
This dissertation also presents a variety of multimodal analysis scenarios for
machine learning. These include automatic performer recognition for both string
and drum instrument players, to demonstrate the significance of multimodal
musical analysis. Training the computer to recognize who is playing an
instrument suggests important information is contained not only within the
acoustic output of a performance, but also in the physical domain. Machine
learning is also used to perform automatic drum-stroke identification, training
the computer to recognize which hand a drummer uses to strike a drum. There
are many applications for drum-stroke identification including more detailed
automatic transcription, interactive training (e.g. computer-assisted rudiment
practice), and enabling efficient analysis of drum performance for metrics
tracking.
This research also presents the use of multimodal techniques in
the context of everyday practice. A practicing musician played a
sensor-augmented instrument and recorded his practice over an extended period of
time, yielding a corpus of metrics and visualizations from his performance. Additional
multimodal metrics are discussed in the research and demonstrate new types of
performance statistics obtainable from a multimodal approach.
The primary contributions of this work include (1) a new software tool
enabling musicians, researchers, and educators to easily capture multimodal
information from nearly any musical instrument or sensor system; (2) an
investigation of multimodal machine learning for automatic performer recognition
of both string players and percussionists; (3) multimodal machine learning for
automatic drum-stroke identification; (4a) the application of multimodal
techniques to musical pedagogy and training scenarios; (4b) an investigation of
novel multimodal metrics; and (5) an investigation of the possibilities,
affordances, and design considerations of multimodal musicianship, both in the
acoustic domain and in other musical interface scenarios. This work provides a
foundation from which engaging music-computer interactions can occur in the
future, benefitting from the unique nuances of multimodal techniques.