Catalyzed by the invention of magnetic tape recording, audio production has transformed from a technical practice into an artistic one, and the roles of producer, engineer, composer, and performer have merged for many forms of music. However, while these roles have changed, the way we interact with audio-production tools has not: it still relies on conventions established in the 1970s for audio engineers. Users communicate their audio concepts to these complex tools using knobs and sliders that control low-level technical parameters. As a result, musicians currently need technical knowledge of signals in addition to their musical knowledge to make novel music. Yet many musicians, experienced and casual alike, simply do not have the time or desire to acquire this technical knowledge. While simpler tools (e.g., Apple's GarageBand) exist, they are limiting and frustrating to users.
To support these audio-production novices, we must build audio-production tools with affordances for them. We must identify interactions that enable novices to communicate their audio concepts without requiring technical knowledge and develop systems that can understand these interactions.
This dissertation advances our understanding of this problem by investigating three interaction types that are inspired by how novices communicate audio concepts to other people: language, vocal imitation, and evaluation. Because learning from an individual can be time-consuming for a user, much of this dissertation focuses on learning general audio concepts offline through crowdsourcing, rather than learning user-specific audio concepts. This work introduces algorithms, frameworks, and software for learning audio concepts via these interactions and investigates the strengths and weaknesses of both the algorithms and the interaction types. These contributions provide a research foundation for a new generation of audio-production tools.
This problem is not limited to audio-production tools. Other media-production tools, such as software for graphics, image, and video design and editing, are also controlled by low-level technical parameters that require technical knowledge and experience to use effectively. The methods contributed in this dissertation for learning mappings from descriptive language and feedback to low-level control parameters may also be adapted for creative production tools in these other media. Together, these contributions can help unlock the creativity trapped in everyone who has the desire to make music and other media but does not have the technical skills required for today's tools.