Courses in Summer Semester 2018 / Master's Seminar: Polyphonic Pitch Detection and Modeling

Lecture Slides:

   01. How to Read a Paper [PDF] 10.04.2018
   02. How to Prepare a Presentation and Write a Report [PDF] 10.04.2018


Classification Approaches

1. Poliner, Graham E., and Daniel PW Ellis. "A discriminative model for polyphonic piano transcription." EURASIP Journal on Advances in Signal Processing 2007.1 (2006): 048317.

2. Sigtia, Siddharth, Emmanouil Benetos, and Simon Dixon. "An end-to-end neural network for polyphonic piano music transcription." IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP) 24.5 (2016): 927-939.

3. Wang, Qi, Ruohua Zhou, and Yonghong Yan. "A two-stage approach to note-level transcription of a specific piano." Applied Sciences 7.9 (2017): 901.

4. Thickstun, John, et al. "Invariances and Data Augmentation for Supervised Music Transcription." arXiv preprint arXiv:1711.04845 (2017).

5. Zhang, Yu, William Chan, and Navdeep Jaitly. "Very deep convolutional networks for end-to-end speech recognition." Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE International Conference on. IEEE, 2017.

6. Lee, Jongpil, et al. "SampleCNN: End-to-End Deep Convolutional Neural Networks Using Very Small Filters for Music Classification." Applied Sciences 8.1 (2018): 150.

7. Cakir, Emre, et al. "Polyphonic sound event detection using multi label deep neural networks." Neural Networks (IJCNN), 2015 International Joint Conference on. IEEE, 2015.

8. Adavanne, Sharath, Archontis Politis, and Tuomas Virtanen. "Multichannel Sound Event Detection Using 3D Convolutional Neural Networks for Learning Inter-channel Features." arXiv preprint arXiv:1801.09522 (2018).

9. Lacoste, Alexandre, and Douglas Eck. "A supervised classification algorithm for note onset detection." EURASIP Journal on Applied Signal Processing 2007.1 (2007): 153-153.

Temporal Modeling with Recurrent Networks

1. Trabelsi, Chiheb, et al. "Deep complex networks." arXiv preprint arXiv:1705.09792 (2017).

2. Choi, Keunwoo, et al. "Convolutional recurrent neural networks for music classification." Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE International Conference on. IEEE, 2017.

3. Parascandolo, Giambattista, et al. "Convolutional recurrent neural networks for polyphonic sound event detection." IEEE/ACM Transactions on Audio, Speech, and Language Processing 25.6 (2017): 1291-1303.

4. Hayashi, Tomoki, et al. "Duration-Controlled LSTM for Polyphonic Sound Event Detection." IEEE/ACM Transactions on Audio, Speech, and Language Processing 25.11 (2017): 2059-2070.

5. Hawthorne, Curtis, et al. "Onsets and Frames: Dual-Objective Piano Transcription." arXiv preprint arXiv:1710.11153 (2017).

6. Chung, Junyoung, et al. "A recurrent latent variable model for sequential data." Advances in neural information processing systems. 2015.

7. Thickstun, John, Zaid Harchaoui, and Sham Kakade. "Learning features of music from scratch." arXiv preprint arXiv:1611.09827 (2016).

8. Vaswani, Ashish, et al. "Attention is all you need." Advances in Neural Information Processing Systems. 2017.

Language Models

1. T. Mikolov, M. Karafiát, L. Burget, J. Černocký, and S. Khudanpur. "Recurrent neural network based language model." In Interspeech, volume 2, page 3, 2010.

2. Ryynanen, Matti P., and Anssi Klapuri. "Polyphonic music transcription using note event modeling." Applications of Signal Processing to Audio and Acoustics, 2005. IEEE Workshop on. IEEE, 2005.

3. Sigtia, Siddharth, et al. "RNN-based Music Language Models for Improving Automatic Music Transcription." (2014).

4. Boulanger-Lewandowski, Nicolas, Yoshua Bengio, and Pascal Vincent. "Modeling temporal dependencies in high-dimensional sequences: Application to polyphonic music generation and transcription." arXiv preprint arXiv:1206.6392 (2012).

5. Wang, Qi, Ruohua Zhou, and Yonghong Yan. "Polyphonic Piano Transcription with a Note-Based Music Language Model." Applied Sciences 8.3 (2018): 470.

6. Boulanger-Lewandowski, Nicolas, Yoshua Bengio, and Pascal Vincent. "High-dimensional sequence transduction." Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on. IEEE, 2013.

Variational Encoders

1. Larochelle, Hugo, and Iain Murray. "The neural autoregressive distribution estimator." Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics. 2011.

2. Fabius, Otto, and Joost R. van Amersfoort. "Variational recurrent auto-encoders." arXiv preprint arXiv:1412.6581 (2014).

3. Hennig, Jay A., Akash Umakantha, and Ryan C. Williamson. "A Classifying Variational Autoencoder with Application to Polyphonic Music Generation." arXiv preprint arXiv:1711.07050 (2017).

Sequence Modeling

1. Gehring, Jonas, et al. "Convolutional sequence to sequence learning." arXiv preprint arXiv:1705.03122 (2017).

2. Ullrich, Karen, and Eelco van der Wel. "Music transcription with convolutional sequence-to-sequence models." (2018).

3. Sutskever, Ilya, Oriol Vinyals, and Quoc V. Le. "Sequence to sequence learning with neural networks." Advances in neural information processing systems. 2014.

4. Gan, Zhe, et al. "Deep temporal sigmoid belief networks for sequence modeling." Advances in Neural Information Processing Systems. 2015.

5. Yang, Zhen, et al. "Improving neural machine translation with conditional sequence generative adversarial nets." arXiv preprint arXiv:1703.04887 (2017).

Music Generation

1. Yang, Li-Chia, Szu-Yu Chou, and Yi-Hsuan Yang. "MidiNet: A convolutional generative adversarial network for symbolic-domain music generation." Proceedings of the 18th International Society for Music Information Retrieval Conference (ISMIR’2017), Suzhou, China. 2017.

2. Johnson, Daniel D. "Generating polyphonic music using tied parallel networks." International Conference on Evolutionary and Biologically Inspired Music and Art. Springer, Cham, 2017.


1. Wang, Yunbo, et al. "PredRNN: Recurrent Neural Networks for Predictive Learning using Spatiotemporal LSTMs." Advances in Neural Information Processing Systems. 2017.

2. Cogliati, Andrea, Zhiyao Duan, and Brendt Wohlberg. "Piano transcription with convolutional sparse lateral inhibition." IEEE Signal Processing Letters 24.4 (2017): 392-396.

3. Shi, Xingjian, et al. "Convolutional LSTM network: A machine learning approach for precipitation nowcasting." Advances in neural information processing systems. 2015.

4. Lai, Guokun, et al. "Modeling Long- and Short-Term Temporal Patterns with Deep Neural Networks." arXiv preprint arXiv:1703.07015 (2017).