The objective of my research is to build human-centered machine listening and audio processing systems that enable new interactions for observing the acoustic world and expressing oneself through sound. I aim to use machine learning as a tool to augment human skills and intelligence rather than to fully automate processes. I carry out this research in the fields of human-computer interaction (HCI) and machine learning, and I apply it to the content domain of audio, a rich application space due to the inherent subjectivity, ambiguity, and context dependence of auditory perception. To carry out my research, I develop new methods, conduct large-scale human-subject experiments with crowdsourcing, build working systems, and study users interacting with these systems to gain insights.

Sound Event Detection

Sound event detection (SED) aims to detect and describe the events in an acoustic scene given a continuous acoustic signal. It has the potential to enable powerful applications in diverse domains such as bioacoustic monitoring, urban noise monitoring, music transcription, electric vehicle sensing, assistive technologies, and more. I research many aspects of the sound event detection pipeline, including best practices for audio annotation, audio representation learning for SED, model compression, and interactive SED systems for making sense of large-scale audio collections. Most of this research has been conducted in the context of the Sounds of New York City (SONYC) project, a large NSF-funded project to monitor, analyze, and mitigate urban noise pollution.
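To make the pipeline concrete, here is a minimal sketch of its final stage: turning per-frame event probabilities into discrete event segments. The model producing the probabilities, the threshold, and the input values are all hypothetical, not details of the SONYC systems.

```python
import numpy as np

def detect_events(frame_probs, threshold=0.5):
    """Return (start_frame, end_frame) pairs for runs of frames whose
    event probability stays at or above the threshold. Frame indices
    can be mapped to seconds using the model's hop size."""
    active = np.asarray(frame_probs) >= threshold
    events, start = [], None
    for i, is_active in enumerate(active):
        if is_active and start is None:
            start = i                      # event onset
        elif not is_active and start is not None:
            events.append((start, i))      # event offset
            start = None
    if start is not None:                  # event runs to end of clip
        events.append((start, len(active)))
    return events

# Hypothetical frame probabilities for one sound class:
probs = [0.1, 0.2, 0.9, 0.95, 0.8, 0.1, 0.05, 0.7, 0.6, 0.2]
print(detect_events(probs))  # → [(2, 5), (7, 9)]
```

Real systems typically add smoothing (e.g. median filtering) before thresholding so brief probability dips do not split a single event in two.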


Natural Audio Production Interfaces

The way we interact with audio production tools relies on conventions established in the 1970s for audio engineers. Users communicate their audio concepts to these complex tools using knobs and sliders that control low-level technical parameters. Musicians currently need technical knowledge of signals, in addition to their musical knowledge, to make novel music. However, many experienced and casual musicians simply do not have the time or desire to acquire this technical knowledge. While simpler tools (e.g., Apple's GarageBand) exist, they are limiting and frustrating to users. In this research, I focus on bridging the gap between the intentions of both amateur and professional musicians and the audio manipulation tools available through software. Rather than force unintuitive interactions, or remove control altogether, we reframe the controls to work within the interaction paradigms identified by research on how audio engineers and musicians communicate auditory concepts to each other: evaluative feedback, natural language, vocal imitation, and exploration.
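As a toy illustration of the natural-language paradigm (not the published systems), one can imagine mapping a descriptor such as "warm" onto per-band equalizer gains and letting the user scale how strongly the quality is applied. The descriptor curves below are invented for the example.

```python
# Hypothetical learned curves over four EQ bands
# (low, low-mid, high-mid, high), in dB:
DESCRIPTOR_CURVES = {
    "warm":   [ 3.0,  1.5, -1.0, -2.0],
    "bright": [-2.0, -1.0,  2.0,  4.0],
    "muddy":  [ 4.0,  2.0, -3.0, -4.0],
}

def eq_from_descriptor(word, amount=1.0):
    """Scale a descriptor's gain curve by how strongly the user
    wants the quality (amount in [0, 1])."""
    curve = DESCRIPTOR_CURVES.get(word.lower())
    if curve is None:
        raise KeyError(f"no curve learned for descriptor {word!r}")
    return [amount * gain for gain in curve]

print(eq_from_descriptor("warm", amount=0.5))  # → [1.5, 0.75, -0.5, -1.0]
```

The point of the sketch is the interaction, not the numbers: the user expresses intent in their own vocabulary, and a single high-level control replaces many low-level technical parameters.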



Crowdsourced Audio Annotation and Quality Evaluation

In the past decade, audio researchers have often turned to crowdsourcing to hasten or scale their efforts for audio annotation and audio quality evaluation. However, the implications of this have historically been understudied. To address this, I research best practices for performing crowdsourced audio annotation and quality evaluation in order to obtain high-quality data with high throughput. This includes investigating how data should be presented to annotators, how annotations should be aggregated (if necessary), and the expected annotation quality and throughput.
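One simple aggregation strategy is a majority vote over independent annotators, with clips discarded when agreement is too low. The input format and threshold below are illustrative, not the protocols studied in this research.

```python
from collections import Counter

def aggregate_labels(annotations, min_agreement=0.5):
    """Map each clip id to its majority label, or to None when no
    single label exceeds the agreement threshold.

    annotations: dict of clip id -> list of labels from
    independent annotators (hypothetical format)."""
    consensus = {}
    for clip, labels in annotations.items():
        label, count = Counter(labels).most_common(1)[0]
        consensus[clip] = label if count / len(labels) > min_agreement else None
    return consensus

votes = {"clip_a": ["siren", "siren", "jackhammer"],
         "clip_b": ["dog bark", "siren"]}
print(aggregate_labels(votes))  # → {'clip_a': 'siren', 'clip_b': None}
```

More sophisticated schemes weight annotators by estimated reliability rather than counting every vote equally, which matters when annotator quality varies widely, as it does on crowdsourcing platforms.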
