SONYC Urban Sound Tagging (SONYC-UST) is a dataset for the development and evaluation of machine listening systems for realistic urban noise monitoring. The audio was recorded from the SONYC acoustic sensor network. Volunteers on the Zooniverse citizen science platform tagged the presence of 23 classes that were chosen in consultation with the New York City Department of Environmental Protection. These 23 fine-grained classes can be grouped into 8 coarse-grained classes. The recordings are split into three sets: training, validation, and test. The training and validation sets are disjoint with respect to the sensor from which each recording came, and the test set is displaced in time. For increased reliability, three volunteers annotated each recording. In addition, members of the SONYC team subsequently created a subset of verified, ground-truth tags using a two-stage annotation procedure in which two annotators independently tagged each recording and then collectively resolved any disagreements. This subset of recordings with verified annotations intersects with all three recording splits, and every recording in the test set has verified annotations. In version 2 of this dataset, we have also included coarse spatiotemporal context information to aid in tag prediction when time and location are known. For more details on the motivation and creation of this dataset, see the DCASE 2020 Urban Sound Tagging with Spatiotemporal Context Task website.
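As a rough illustration of two properties described above (sensor-disjoint training and validation sets, and fine-grained tags that roll up into coarse-grained tags), here is a minimal Python sketch. The field names (`split`, `sensor_id`, `fine_tags`), the tag identifiers such as `"1-1"`, and the rule that a coarse tag is present when any of its fine tags is present are all assumptions made for this example. They are not taken from the dataset's actual annotation schema, so check the released files before adapting it.

```python
from collections import defaultdict

# Toy rows. Field names, tag identifiers, and values are hypothetical
# and only mimic the structure described in the text.
rows = [
    {"split": "train", "sensor_id": 0, "fine_tags": {"1-1": 1, "1-2": 0, "2-1": 0}},
    {"split": "train", "sensor_id": 1, "fine_tags": {"1-1": 0, "1-2": 0, "2-1": 1}},
    {"split": "validate", "sensor_id": 2, "fine_tags": {"1-1": 0, "1-2": 1, "2-1": 0}},
]


def sensors_by_split(rows):
    """Collect the set of sensor IDs that contribute to each split."""
    sensors = defaultdict(set)
    for row in rows:
        sensors[row["split"]].add(row["sensor_id"])
    return dict(sensors)


def fine_to_coarse(fine_tags):
    """Roll fine tags up to coarse tags.

    Assumes a fine tag ID has the form "<coarse>-<fine>" and that a coarse
    class counts as present if any of its fine classes is present.
    """
    coarse = defaultdict(int)
    for fine_id, present in fine_tags.items():
        coarse_id = fine_id.split("-")[0]
        coarse[coarse_id] = max(coarse[coarse_id], present)
    return dict(coarse)


splits = sensors_by_split(rows)
# True when no sensor appears in both the training and validation sets.
is_disjoint = splits["train"].isdisjoint(splits["validate"])
coarse = [fine_to_coarse(row["fine_tags"]) for row in rows]
```

A check like `is_disjoint` is a cheap sanity test to run after loading the annotations, since leakage of a sensor across the two sets would inflate validation scores.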
To learn more about the dataset, read our paper to be presented at DCASE 2020:
Cartwright, M., Cramer, J., Mendez, A.E.M., Wang, Y., Wu, H., Lostanlen, V., Fuentes, M., Dove, G., Mydlarz, C., Salamon, J., Nov, O., Bello, J.P. SONYC-UST-V2: An Urban Sound Tagging Dataset with Spatiotemporal Context. In Proceedings of the Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE), 2020.