Difference between revisions of "Counterfactuals in music generation"

From CCRMA Wiki
(Reorganized page with updates by week. Still filling in)

Revision as of 12:12, 12 May 2021

Introduction

The space of human-AI co-creation is ripe with possibility: machine learning models can augment human abilities, producing outcomes that neither humans nor AI could achieve alone. However, many state-of-the-art ML systems today act as “black boxes” that give end-users no control over their outputs. For creative work, we want ML systems that are both expressive in their outputs and controllable by an end-user.

In music generation specifically, current models are designed more for listeners than for composers. While generative models such as MuseNet and GANSynth can create outputs with impressive harmonies, rhythms, and styles, they offer the user no way to refine those features. If the user doesn’t like the musical output, their only option is to re-generate, which produces a completely different composition. Moreover, changing how these models generate output requires machine learning experience and hours of training time, which is not feasible for composers.

Counterfactual inference is reasoning about “what would have happened, had some intervention been performed, given that something else in fact occurred” ([https://causalai.net/r60.pdf Bareinboim et al. 2020]). Counterfactual scenarios are useful for reasoning about which components of a model are necessary and/or sufficient to produce certain kinds of outputs.
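As a minimal sketch of what such a counterfactual query looks like, the standard three-step recipe (abduction, action, prediction) can be run on a toy structural model. The equation y = 2x + u and the names `outcome` and `counterfactual_outcome` are assumptions made for illustration; they are not part of any music model used in this project.

```python
# Toy structural causal model (hypothetical, for illustration only):
#   y = 2 * x + u
# x is an input we can intervene on; u is unobserved background noise.

def outcome(x: float, u: float) -> float:
    """Structural equation for y."""
    return 2.0 * x + u


def counterfactual_outcome(x_observed: float, y_observed: float,
                           x_alternative: float) -> float:
    # 1. Abduction: recover the noise consistent with what actually happened.
    u = y_observed - 2.0 * x_observed
    # 2. Action: replace x with the alternative value.
    # 3. Prediction: recompute y while holding the recovered noise fixed.
    return outcome(x_alternative, u)


# We observed x = 1 and y = 5. What would y have been had x been 0?
y_cf = counterfactual_outcome(x_observed=1.0, y_observed=5.0, x_alternative=0.0)
print(y_cf)  # 3.0
```

The key point is that the noise term is inferred from the observed outcome and then held fixed, which is what separates a counterfactual from simply re-running the model with a new input.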

Updates

Weeks 1 & 2 were primarily spent doing literature review. I learned a lot of background information on causal inference, primarily informed by Pearl's Causal Hierarchy.

Week 3

This week, I tentatively narrowed the scope of the project to music generation. I have been experimenting with various music generation models from Google Magenta, including the Music Transformer and Coconet.

Week 4

I am still experimenting with the generative models, trying to build intuition for how they work and which parameters I can adjust. The orderless composition property of Coconet is particularly interesting: the sampling strategy appears to be non-deterministic, so we could run a counterfactual in the vein of "what would have happened if the notes had been generated in a different order?"
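To make the generation-order counterfactual concrete, here is a toy sketch. It is not Coconet: the `generate` function, the averaging rule, and the fixed `steps` list are all hypothetical stand-ins. Each position is filled in conditioned on the notes already placed, and the per-position "noise" is held fixed so that only the order changes between the factual and counterfactual runs.

```python
def generate(order, steps, start_pitch=60):
    """Fill positions in the given order (toy stand-in for orderless sampling).

    Each new pitch is the integer mean of the pitches placed so far
    (or start_pitch if none), plus a fixed per-position step.
    """
    pitches = {}
    for position in order:
        if pitches:
            context = sum(pitches.values()) // len(pitches)
        else:
            context = start_pitch
        pitches[position] = context + steps[position]
    # Return the pitches in positional order, not generation order.
    return [pitches[i] for i in range(len(steps))]


# Hypothetical fixed noise: one step per position, shared by both runs.
steps = [2, -1, 3, 0]

factual = generate([0, 1, 2, 3], steps)         # left-to-right order
counterfactual = generate([3, 2, 1, 0], steps)  # right-to-left order

print(factual)         # [62, 61, 64, 62]
print(counterfactual)  # [63, 60, 63, 60]
```

Because the steps are identical in both runs, any difference between the two melodies is attributable to the generation order alone.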

Week 5

I am now using the [https://magenta.tensorflow.org/ddsp DDSP] framework in my project. My project abstract can be viewed [https://docs.google.com/document/d/1tbPepxLPPV2MJjuzKfh2UqA_vHr0eMc81PPHPzi0mUg here]. DDSP is particularly interesting because, rather than generating audio directly with a neural network, it uses neural networks to drive the parameters of traditional sound synthesizers, which in turn generate the audio. This greatly reduces the number of parameters the neural networks need, and thus drastically reduces the amount of training data needed to produce models that can do impressive feats such as timbre transfer.
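The idea of driving a traditional synthesizer with a handful of control parameters can be sketched with a small additive (harmonic) synthesizer. This is a plain-Python illustration under stated assumptions, not the DDSP library's API: the function `harmonic_synth` and its arguments are hypothetical, and the control values are set by hand, whereas in DDSP a network would predict them over time.

```python
import math


def harmonic_synth(f0_hz, amplitude, harmonic_weights,
                   sample_rate=16000, n_samples=160):
    """Sum sinusoids at integer multiples of f0 (hypothetical helper)."""
    # Normalize the weights so they describe a distribution over harmonics.
    total = sum(harmonic_weights)
    weights = [w / total for w in harmonic_weights]
    samples = []
    for n in range(n_samples):
        t = n / sample_rate
        # Harmonic k (1-based) sits at frequency k * f0.
        value = sum(
            w * math.sin(2.0 * math.pi * f0_hz * (k + 1) * t)
            for k, w in enumerate(weights)
        )
        samples.append(amplitude * value)
    return samples


# Three control parameters fully describe this short frame of audio.
audio = harmonic_synth(f0_hz=220.0, amplitude=0.5,
                       harmonic_weights=[1.0, 0.5, 0.25])
print(len(audio))  # 160
```

Here 160 audio samples are determined by just five numbers (one pitch, one loudness, three harmonic weights), which gives a feel for why a model that predicts controls can be much smaller than one that predicts raw samples.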

One particularly interesting work that DDSP cites is [https://dl.acm.org/doi/10.1145/2557500.2557544 Active learning of intuitive control knobs for synthesizers using Gaussian processes]. The authors learn high-level "knobs" that map high-level concepts, such as "scariness" or "steadiness," to the synthesizer's control space (i.e., parameters such as fundamental frequency (F0), amplitude, and harmonic distribution). This is relevant because they are building an interactive ML system for music generation, and it is also a natural setting in which to explore counterfactual possibilities (e.g., if the "scariness" knob had been lower, what would the composition have been like?).
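A knob counterfactual of this kind can be sketched as follows. The cited paper learns its mapping with Gaussian processes; this sketch swaps in plain linear interpolation between two hand-picked presets, purely for illustration. The presets `CALM` and `SCARY`, their values, and the function `knob_to_params` are all hypothetical.

```python
# Hypothetical presets at the two ends of a "scariness" knob.
CALM = {"f0_hz": 220.0, "amplitude": 0.3, "harmonic_weights": [0.8, 0.15, 0.05]}
SCARY = {"f0_hz": 110.0, "amplitude": 0.9, "harmonic_weights": [0.2, 0.3, 0.5]}


def lerp(a, b, t):
    """Linear interpolation: a at t = 0, b at t = 1."""
    return (1.0 - t) * a + t * b


def knob_to_params(scariness):
    """Map a knob value in [0, 1] to low-level synthesizer controls."""
    t = min(1.0, max(0.0, scariness))  # clamp to the valid knob range
    return {
        "f0_hz": lerp(CALM["f0_hz"], SCARY["f0_hz"], t),
        "amplitude": lerp(CALM["amplitude"], SCARY["amplitude"], t),
        "harmonic_weights": [
            lerp(c, s, t)
            for c, s in zip(CALM["harmonic_weights"], SCARY["harmonic_weights"])
        ],
    }


# Factual setting versus "what if the scariness knob had been lower?"
factual = knob_to_params(0.8)
counterfactual = knob_to_params(0.2)
print(factual["f0_hz"], counterfactual["f0_hz"])
```

Everything except the knob value is held fixed, so comparing the two parameter sets (or the audio synthesized from them) isolates the effect of the knob.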

TODO:

- Intuitive control knobs

- Bayesian optimization

- Gaussian Processes

Week 6

TODO:

- Design Adjectives

- knob prototype (link colab)

- idea of higher-level and lower-level control knobs

- talked to Tobi Gerstenberg; describe convo

Week 7

TODO:

- discussion about building intuition for which control parameters to control (and how fine-grained): using envelopes.

- improved knob prototype