# Switching models for musical interpretation
### Daniel J. McDonald
### Indiana University
### 22 November 2019

---

## Disclaimer

Approximation for least squares `$(n \gg p)$`
  - Homrighausen and .green[McDonald]. (2019+). JCGS.
  
Approximation for dimension reduction `$(p \gg n)$`  
  - Homrighausen and .green[McDonald]. (2016). JCGS.
  - Ding and .green[McDonald]. (2017). Bioinformatics.
  - Ding and .green[McDonald]. (2019). Under review.
  
Algorithms for large data
  - .green[McDonald]. ADMM for large constrained kernel PCA.
  - .green[McDonald] and Khodadadi. (2019). AAAI.
  - .green[McDonald], Sharpnack, Bassett, and Sadhanala. Trend filtering for Spatio-temporal exponential families.

Model selection and penalized M-estimators
  - Homrighausen and .green[McDonald]. (2013). ICML.
  - Homrighausen and .green[McDonald]. (2014). Machine Learning.
  - Homrighausen and .green[McDonald]. (2017). Stat. Sinica.
  - .green[McDonald]. SURE for exponential families.
    
Dependence and high dimensions
  - Homrighausen and .green[McDonald]. (2018). JSCS.
  - .green[McDonald] and Shalizi. (2018+). Under review.
  - .green[McDonald], Shalizi and Schervish. (2017). JMLR.
  - .green[McDonald]. (2017). AISTATS.
  - .green[McDonald]. Minimax non-parametric regression.

???

* Not my usual research area

* more of a "side project"

* but one I care about a lot

---

???
Music major

Conservatory

---
class: middle

???

* Have a professional cello

* That one was backup for 15 years

* Turns out, non-professionals don't need backup instruments

* Nonetheless, BS in Cello from a top conservatory

* have a taste for classical music

---

## Musical taste

* Easy to describe music you like:
  - "Jazzy sound"
  - "Strong beat"
  - "good lyrics"
  - "anything by Taylor Swift"
  
* Harder to describe a .red[performance]

* Classical music is mainly about performances of the .red[same] music

* How do we think about which ones we like?

---

## Primer on "classical" music

* Written between the 6th century and today

* Includes music written during the Classical period (1750&ndash;1820)

<blockquote cite="Leonard Bernstein">The real difference is that when a composer writes a piece of what’s usually called classical music, he puts down the exact notes that he wants, the exact instruments or voices that he wants to play or sing those notes—even the exact number of instruments or voices; and he also writes down as many directions as he can think of. </blockquote>

* Generally much more "musically" complicated

???

* Musically complicated = wider range of chords, keys, instrumentation, contrasts

* Hey Jude: 3 chords (2 others briefly) in 7 minutes. Same key the whole time.

* For today, Chopin is running example

* Chopin: 6 unique chords in first 10 seconds. Two key areas in 1.5 minutes of music.

---

???

Which one do you like better?

---

## What's different?

1. Mistakes

2. Extraneous noise

3. Recording quality

4. Articulation/Legato/Bowing/Breathing

5. Dynamics

6. Tempo/Rubato

???

The first three are uninteresting.

The others are about .red[.bold[interpretation]]

We like performances with "better" interpretations

---
class: inverse, middle, center
background-image: url("https://www.wien.info/media/images/boesendorfer-piano-schriftzug-gespiegelt-3to2.jpeg/image_gallery")
background-size: cover

???

Piano music

* Simplifies the problem
  - No bowing, fingering, breathing, glissando
  
* Focus on __tempo__

---

## Musical tempo

* Notes change "speed"

* Sometimes purposeful

* Speed is important for .red[.bold[interpretation]]

---
class: inverse, center, middle

# What is this "music"?

---
## Important musical terms

* __Beat__: strongly felt impetus

* __Measure__: collection of notes delimited by vertical "barlines"

* __Time signature__: number of beats per measure; type of note that gets the beat

* __Tempo__: the prevailing speed, measured in beats per minute (bpm)

* __Dynamics__: loudness of the note

---

## Data

* CHARM Mazurka Project

* Focus on timing only (dynamics also available)

* 50 recordings: Chopin Mazurka Op. 68 No. 3

* Recorded between 1931 and 2006

* 45 different performers

---

## Chopin & Mazurkas

.pull-left[
__Chopin__

* Born in Poland

* Moved to Paris at 21

* Virtuoso pianist

* Wrote mainly piano music
]

.pull-right[
__Mazurkas__

* A Polish dance

* Chopin composed at least 58 for piano

* Repetition is very important

* Certain rhythmic figures
]

???

Everything he wrote includes piano

---

background-image: url("gfx/entire-mazurka.jpg")
background-position: center
background-size: contain

???

Tempo markings, importantly, only 2 + rit and fermata

Dotted eighth sixteenth

ABA structure

Minor phrases

Repetition

Chord progression

---
class: center, middle, inverse

# Switching Kalman Filter

---

## Thinking about tempo

.pull-left[
2. Accelerando (speed up)

3. Allargando (slow down)

4. Tenuto (emphasis)
]

.pull-right[
![](https://upload.wikimedia.org/wikipedia/commons/thumb/4/43/Metronome_Mälzel_1.jpg/291px-Metronome_Mälzel_1.jpg)
]

---
## Transition diagram

.pull-left-70[
<img src="gfx/markov-trans.svg" width="70%" style="display: block; margin: auto;" />
]

2. .accel[__Speeding up__]

3. .decel[__Slowing down__]

4. .stress[__Emphasis__]

---

## Intentions vs. observations

???

Musicians aren't perfect.

Observe noisy realization

---

## Switching state space models
.pull-left[
<p style="text-align:center;">
<img src="gfx/business-cycle.png" height="175px"><figcaption>Economics (Kim and Nelson, 1998; Chauvet and Piger, 2008)</figcaption></p>
]
.pull-right[
<p style="text-align:center;">
<img src="gfx/animal-movement.png" height="175px"><figcaption>Animal movement (Patterson, et al., 2008; Block, et al., 2011)</figcaption></p>
]

<br>

---

## Inference

* Also unknown parameters `$\theta$`

* If you know `$\{S_k\}_{k=1}^n$` and `$\theta$`, Kalman filter gives `$\{\hat{X}_k\}_{k=1}^n$`

* If you know `$\{X_k\}_{k=1}^n$` and `$\theta$`, Viterbi algorithm gives `$\{\hat{S}_k\}_{k=1}^n$`

* We need to learn `$\{S_k,\ X_k\}_{k=1}^n$`

* And we need to estimate `$\theta$`
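
---

## Sketch: Viterbi decoding

The Viterbi step mentioned on the previous slide can be illustrated on a plain discrete hidden Markov chain. This is a minimal sketch, not the talk's implementation: the function name `viterbi` and the table-based inputs are assumptions, and in the switching model the per-state log densities would come from the continuous states `$X_k$` rather than from a fixed table.

```python
import math


def viterbi(log_init, log_trans, log_emit):
    """Most likely state path for a discrete hidden Markov chain.

    log_init[s]     : log P(S_1 = s)
    log_trans[r][s] : log P(S_k = s | S_{k-1} = r)
    log_emit[k][s]  : log density of observation k under state s
    """
    n, m = len(log_emit), len(log_init)
    score = [log_init[s] + log_emit[0][s] for s in range(m)]
    back = []  # back[k-1][s] = best predecessor of state s at step k
    for k in range(1, n):
        new_score, ptr = [], []
        for s in range(m):
            best_r = max(range(m), key=lambda r: score[r] + log_trans[r][s])
            ptr.append(best_r)
            new_score.append(score[best_r] + log_trans[best_r][s] + log_emit[k][s])
        score = new_score
        back.append(ptr)
    # Trace the best final state back to the start.
    state = max(range(m), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


# Toy example (hypothetical numbers): two sticky states, four observations.
log = math.log
init = [log(0.5), log(0.5)]
trans = [[log(0.9), log(0.1)], [log(0.1), log(0.9)]]
emit = [[log(0.8), log(0.2)]] * 2 + [[log(0.2), log(0.8)]] * 2
print(viterbi(init, trans, emit))  # [0, 0, 1, 1]
```

The recursion costs `$O(nM^2)$` for `$M$` states, which is why knowing `$\{X_k\}$` would make the discrete problem easy.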

---

## Kalman filter

* Developed in the late '50s to track missiles

$$
`\begin{aligned}
X_{k+1} &= d_k + T_k X_k + \eta_{k+1} & \eta_{k+1} &\sim \textrm{N}(0, Q_{k+1})\\
Y_k &= c_k + Z_k X_k + \epsilon_{k}&\epsilon_k & \sim \textrm{N}(0, G_k)\\
\end{aligned}`
$$

* Assume `$X_0$` is Gaussian

* Just track mean and variance of `$X_k\ |\ \{Y_i\}_{i=1}^k$`

* Does this iteratively for each `$k$`

* Gives "filter" estimate of `$\{X_k\}_{k=1}^n$` and likelihood

???
Here T_k and Z_k and the components of epsilon are contained in theta

---

## Switching Kalman filter (for our model)

.pull-left[
$$
`\begin{aligned}
X_{k+1} &= d(s_k,s_{k-1}) + T(s_k,s_{k-1}) X_k + \eta_{k+1}\\
Y_k &= c(s_k) + Z(s_k) X_k + \epsilon_{k}\\\\
\eta_{k} &\sim \textrm{N}(0, Q(s_k,s_{k-1}))\\
\epsilon_k & \sim \textrm{N}(0, G(s_k))
\end{aligned}`
$$
]

---
## Examples

`\begin{align}
  1\rightarrow 1 && 1\rightarrow 2\\
  x_{2} &= 
  \begin{pmatrix}1&0\\0&0\end{pmatrix} x_{1} 
        &   x_{3}
                    &= \begin{pmatrix} l_i\mu_{\textrm{acc}}\\ \mu_{\textrm{acc}}\end{pmatrix} +
  \begin{pmatrix}1&0\\0&0\end{pmatrix} x_{2} +
                         \mbox{N}\left(0,\ \sigma_{\textrm{acc}}^2\begin{pmatrix} l_i^2 & l_i\\ l_i & 1 \end{pmatrix}\right)\\
  y_2 &= (1\quad  0)  x_2 + \mbox{N}(0,\
                                 \sigma_\epsilon^2) &
y_3 &= (1\quad  0) x_3 + \mbox{N}(0,\
                                 \sigma_\epsilon^2).
\end{align}`

`\begin{align}
  1\rightarrow 4 && 4\rightarrow 1\\
  x_{2} &= \begin{pmatrix}0 \\ \mu_{\textrm{stress}} \end{pmatrix} +
  \begin{pmatrix}1&0\\0&0\end{pmatrix} x_{1} + 
  \textrm{N}\left(0,\ \begin{pmatrix}0&0\\0&\sigma^2_{\textrm{stress}}\end{pmatrix}\right)
        &  x_{3} &= 
  \begin{pmatrix}1&0\\0&0\end{pmatrix} x_{2} \\
  y_2 &= (1\quad  1)  x_2 + \mbox{N}(0,\
                                 \sigma_\epsilon^2) &
y_3 &= (1\quad  0) x_3 + \mbox{N}(0,\
                                 \sigma_\epsilon^2).
\end{align}`

???

x is dim-2 (speed, acceleration)

What is li?
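
---

## Sketch: parameter matrices by transition

The four transitions written out on the Examples slide can be collected into one lookup. This is a sketch under assumptions: the function name `switch_params` and its argument names are hypothetical, only the transitions shown on that slide are covered, and `l_i` is passed through as whatever per-note quantity `$l_i$` denotes there.

```python
def switch_params(prev, cur, l_i, mu_acc, sigma_acc, mu_stress, sigma_stress):
    """Return (d, T, Q, Z) for the transitions 1->1, 1->2, 1->4 and 4->1."""
    T = [[1.0, 0.0], [0.0, 0.0]]  # carry the first state component forward
    zero = [[0.0, 0.0], [0.0, 0.0]]
    # The observation row is (1 1) in the stress state, (1 0) otherwise.
    Z = [1.0, 1.0] if cur == 4 else [1.0, 0.0]
    if (prev, cur) in {(1, 1), (4, 1)}:
        d, Q = [0.0, 0.0], zero  # no shift and no state noise
    elif (prev, cur) == (1, 2):
        s2 = sigma_acc ** 2
        d = [l_i * mu_acc, mu_acc]
        Q = [[s2 * l_i ** 2, s2 * l_i], [s2 * l_i, s2]]
    elif (prev, cur) == (1, 4):
        d = [0.0, mu_stress]
        Q = [[0.0, 0.0], [0.0, sigma_stress ** 2]]
    else:
        raise NotImplementedError("transition not shown on the Examples slide")
    return d, T, Q, Z


# Hypothetical parameter values, chosen only to make the output easy to read.
d, T, Q, Z = switch_params(1, 2, l_i=2.0, mu_acc=3.0, sigma_acc=0.5,
                           mu_stress=1.0, sigma_stress=0.1)
print(d, Q, Z)  # [6.0, 3.0] [[1.0, 0.5], [0.5, 0.25]] [1.0, 0.0]
```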

---

## 1-step Kalman filter &mdash; .green[`kalman()`]

Get estimates of `$X_{k}$` given a new observation `$y_k$`

Input: 
  * New data &mdash; `$y_k$`, 
  * Parameter matrices &mdash; `$d_k$`, `$c_k$`, `$T_k$`, `$Z_k$`, `$Q_k$`, `$G_k$`, 
  * Previous state mean and variance &mdash; `$x_{k-1}$`, `$P_{k-1}$`

Predict new state `$\longrightarrow \hat{x}_k = d + Tx_{k-1}$` &emsp; `$\hat{P}_k = Q+TP_{k-1}T^\top$`

Predict current data `$\longrightarrow \hat{y}_k = c + Z\hat{x}_k$` &emsp; `$F=G + Z\hat{P}_kZ^\top$`

Calculate error `$\longrightarrow v = y_k - \hat{y}_k$` &emsp; `$K = \hat{P}_kZ^\top F^{-1}$`

Update `$\longrightarrow x_k = \hat{x}_k + Kv$` &emsp; `$P_k = (I - KZ)\hat{P}_k$`

Log Likelihood `$\longrightarrow \ell_k(\theta) \propto \ell_{k-1}(\theta) - v^\top F^{-1} v - \log(|F|)$`

???

If know S, then that pins down all the parameter matrices

Loop this over 1 ... n

Maximize over theta
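
---

## Sketch: one Kalman step in code

The predict, error, and update steps on the previous slide can be written out directly. This is a minimal sketch, not the `kalman()` function from the talk: the name `kalman_step` is hypothetical, the observation is assumed scalar (so `$F$` is a number and no matrix inverse is needed), and the covariance update uses the form `$(I - KZ)\hat{P}_k$`.

```python
import math


def kalman_step(y, d, c, T, Z, Q, G, x_prev, P_prev, loglik_prev=0.0):
    """One predict/update step: p-dimensional state, scalar observation.

    d, Z, x_prev are length-p lists; T, Q, P_prev are p x p nested lists;
    y, c, G are floats. Assumes Q and P_prev are symmetric.
    """
    p = len(x_prev)
    # Predict new state: x_hat = d + T x_prev, P_hat = Q + T P_prev T'
    x_hat = [d[i] + sum(T[i][j] * x_prev[j] for j in range(p)) for i in range(p)]
    TP = [[sum(T[i][k] * P_prev[k][j] for k in range(p)) for j in range(p)]
          for i in range(p)]
    P_hat = [[Q[i][j] + sum(TP[i][k] * T[j][k] for k in range(p)) for j in range(p)]
             for i in range(p)]
    # Predict current data: y_hat = c + Z x_hat, F = G + Z P_hat Z'
    y_hat = c + sum(Z[j] * x_hat[j] for j in range(p))
    PZ = [sum(P_hat[i][j] * Z[j] for j in range(p)) for i in range(p)]
    F = G + sum(Z[i] * PZ[i] for i in range(p))
    # Error and gain: v = y - y_hat, K = P_hat Z' / F
    v = y - y_hat
    K = [PZ[i] / F for i in range(p)]
    # Update: x = x_hat + K v, P = (I - K Z) P_hat
    x_new = [x_hat[i] + K[i] * v for i in range(p)]
    P_new = [[P_hat[i][j] - K[i] * PZ[j] for j in range(p)] for i in range(p)]
    # Gaussian log-likelihood increment, constants included
    loglik = loglik_prev - 0.5 * (math.log(2 * math.pi * F) + v * v / F)
    return x_new, P_new, loglik


# One-dimensional check with made-up numbers: prior N(0, 1), noise variance 1, y = 2.
x, P, ll = kalman_step(y=2.0, d=[0.0], c=0.0, T=[[1.0]], Z=[1.0], Q=[[0.0]],
                       G=1.0, x_prev=[0.0], P_prev=[[1.0]])
print(x, P)  # [1.0] [[0.5]]
```

Looping this over `$k = 1, \dots, n$` with the matrices selected by a known state sequence accumulates the log-likelihood that is then maximized over `$\theta$`.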

---

## We don't know the discrete states

Pretend there are only 2 states

<div id="htmlwidget-4e541f5c49ab8699d7e3" style="width:100%;height:400px;" class="widgetframe html-widget"></div>
<script type="application/json" data-for="htmlwidget-4e541f5c49ab8699d7e3">{"x":{"url":"gfx//widgets/widget_s-tree.html","options":{"xdomain":"*","allowfullscreen":false,"lazyload":false}},"evals":[],"jsHooks":[]}</script>

.Large[_k_ = .orange[1], .dark-blue[2], .red[3], .green[4]]

???

I have 4 states

2nd order Markov

Leads to 11 states in 1-Markov

Piece has 231 notes

---

## 3,645,000,000,000,000,000,000,000,000,000,000,000,
## 000,000,000,000,000,000,000,000,000,000,000,000,
## 000,000,000,000,000,000,000,000,000,000,000,000,
## 000,000,000,000,000,000,000,000,000,000,000,000,
## 000,000,000,000,000,000,000,000,000,000,000,000,
## 000,000,000,000,000,000,000,000,000,000,000,000,
## 000,000,000,000,000,000,000,000
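
---

## Sketch: counting the paths

The size of the number on the previous slide can be checked with exact integer arithmetic. This sketch assumes the count is the number of state sequences for 11 first-order states over 231 notes (the figures from the speaker notes); the variable names are illustrative.

```python
n_states = 11   # states after rewriting the 2nd-order chain as 1st-order
n_notes = 231   # notes in the piece

n_paths = n_states ** n_notes   # Python integers are exact at any size
digits = str(n_paths)

# Scientific notation built from the exact digits (truncated, not rounded).
print(f"{digits[0]}.{digits[1:4]}e{len(digits) - 1}")
print(len(digits), "digits")
```

Enumerating every path is hopeless at this scale, which motivates tracking only a small set of them.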

---

## Discrete particle filter &mdash; .green[`dpf()`]

1. Track at most `$J$` paths through the `$M^n$` tree

2. At time `$k$`, given `$J$` paths, propagate each one forward

3. Sample the `$JM$` possibilities to get only `$J$`

4. Iterate forward through time until done

<div id="htmlwidget-ebce5b1d454449217921" style="width:100%;height:200px;" class="widgetframe html-widget"></div>
<script type="application/json" data-for="htmlwidget-ebce5b1d454449217921">{"x":{"url":"gfx//widgets/widget_small-tree.html","options":{"xdomain":"*","allowfullscreen":false,"lazyload":false}},"evals":[],"jsHooks":[]}</script>

???

This is a greedy approximation

The sampling step is important

Probability of sampling is proportional to current weight times likelihood times trans prob
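
---

## Sketch: propagate and resample

The four steps on the previous slide can be sketched with a toy likelihood. This is not the `dpf()` implementation: the names `dpf_step` and `run_dpf` are hypothetical, the per-state likelihood table stands in for the Kalman-filter likelihood of each extended path, and plain multinomial resampling (with duplicate draws merged) is an assumption, since the resampling scheme is not spelled out here.

```python
import random


def dpf_step(paths, weights, trans, lik, max_paths, rng):
    """Extend every kept path by each state, then resample to at most max_paths.

    trans[r][s] is a transition probability; lik[s] is a toy likelihood of the
    new observation under state s.
    """
    cand, w = [], []
    for path, wt in zip(paths, weights):
        for s in range(len(lik)):
            # weight = current weight * transition probability * likelihood
            new_w = wt * trans[path[-1]][s] * lik[s]
            if new_w > 0.0:
                cand.append(path + [s])
                w.append(new_w)
    total = sum(w)
    if total == 0.0:
        raise ValueError("all candidate paths have zero weight")
    w = [x / total for x in w]
    if len(cand) <= max_paths:
        return cand, w
    # Multinomial resampling: draw max_paths indices, merge duplicate draws.
    draws = rng.choices(range(len(cand)), weights=w, k=max_paths)
    counts = {}
    for i in draws:
        counts[i] = counts.get(i, 0) + 1
    kept = sorted(counts)
    return [cand[i] for i in kept], [counts[i] / max_paths for i in kept]


def run_dpf(liks, trans, max_paths, seed=0):
    """Run the step over all observations, starting from a uniform prior."""
    rng = random.Random(seed)
    m = len(trans)
    total = sum(liks[0])
    paths = [[s] for s in range(m)]
    weights = [liks[0][s] / total for s in range(m)]
    for lik in liks[1:]:
        paths, weights = dpf_step(paths, weights, trans, lik, max_paths, rng)
    return paths, weights


# Toy run with made-up numbers: two sticky states, six observations, J = 3.
trans = [[0.9, 0.1], [0.1, 0.9]]
liks = [[0.8, 0.2]] * 3 + [[0.2, 0.8]] * 3
paths, weights = run_dpf(liks, trans, max_paths=3)
for path, wt in zip(paths, weights):
    print(path, round(wt, 3))
```

Memory stays bounded by `$J$` paths no matter how long the piece is; the price is that a discarded path can never be recovered.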

---

## The complete algorithm

For each performance:

1. Guess a parameter vector `$\theta$`

2. .green[`dpf()`] gives greedy state sequence `$\{\hat{S}_k\}_{k=1}^n$`

3. It gives the likelihood as a side effect via .green[`kfilter()`]

4. Iterate 1&ndash;3 to maximize for `$\theta \in \Theta$`

5. Run the .green[`ksmoother()`] to get estimate for `$\{X_k\}_{k=1}^n$`

???

kfilter() 1 step appears in dpf()

ksmoother() is conditional on all the data
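
---

## Sketch: maximizing the likelihood over `$\theta$`

Steps 1 through 4 on the previous slide amount to a search over `$\theta$` with the filter supplying the likelihood. The sketch below is a deliberately simplified stand-in: it uses a scalar random-walk-plus-noise model with no switching, a single parameter (the state innovation variance), and a grid search instead of a real optimizer. The names `loglik` and `fit` are hypothetical.

```python
import math


def loglik(theta, ys, obs_var=1.0):
    """Kalman-filter log-likelihood of a scalar random walk observed with noise.

    theta is the variance of the random-walk innovations; the first
    observation is used to initialize the state mean.
    """
    x, P, ll = ys[0], obs_var, 0.0
    for y in ys[1:]:
        P_hat = P + theta            # predict state variance
        F = P_hat + obs_var          # predicted observation variance
        v = y - x                    # one-step prediction error
        K = P_hat / F                # Kalman gain
        x, P = x + K * v, P_hat * (1.0 - K)
        ll -= 0.5 * (math.log(2 * math.pi * F) + v * v / F)
    return ll


def fit(ys, grid):
    """Return the grid value with the largest log-likelihood."""
    return max(grid, key=lambda th: loglik(th, ys))


# Made-up data with a steady drift; the grid values are arbitrary.
ys = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
grid = [0.01, 0.1, 1.0, 10.0, 100.0]
best = fit(ys, grid)
print(best, round(loglik(best, ys), 3))
```

In the switching model each likelihood evaluation would itself run the particle filter over the discrete states, so every candidate `$\theta$` is far more expensive than in this toy.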

---

# Similar performances

---

## The estimated parameters

For each performance, we estimate `$\theta$` by penalized maximum likelihood.

The parameters are things like:

- average speed in different states
- some variance parameters
- transition probabilities
  
We have strong prior information.

???
Examples of strong priors

---

## Distance matrix on parameters

.pull-left-40[
* Use Mahalanobis distance on `$\theta$`
$$d(\theta_1,\theta_2) = \sqrt{(\theta_1-\theta_2)^\top V^{-1}(\theta_1-\theta_2)} $$

* `$V$` is prior covariance matrix

* Incorporates correlations correctly on probability vectors

* Some performances have no "close" neighbors
]
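
---

## Sketch: Mahalanobis distance

The distance `$d(\theta_1,\theta_2)$` on the previous slide can be computed without forming `$V^{-1}$` explicitly, by solving `$Vz = \theta_1-\theta_2$`. This is a sketch with hypothetical helper names (`solve`, `mahalanobis`) and made-up inputs; it assumes `$V$` is symmetric positive definite.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def mahalanobis(theta1, theta2, V):
    """sqrt((theta1 - theta2)' V^{-1} (theta1 - theta2))."""
    diff = [a - b for a, b in zip(theta1, theta2)]
    z = solve(V, diff)
    return math.sqrt(sum(d * zi for d, zi in zip(diff, z)))


# With V = I the distance reduces to the Euclidean one.
print(mahalanobis([3.0, 4.0], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]))  # 5.0
# Positive correlation in V shrinks the distance along the (1, 1) direction.
print(mahalanobis([1.0, 1.0], [0.0, 0.0], [[2.0, 1.0], [1.0, 2.0]]))
```

Applying `mahalanobis` to every pair of fitted parameter vectors would give a distance matrix of the kind used to compare performances.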

---

## Probability of "stress"

---
<img src="gfx/clust-2-1.svg" style="display: block; margin: auto;" />

---
<img src="gfx/similar-perfs-1.svg" style="display: block; margin: auto;" />

---
<img src="gfx/rubinstein-perfs-1.svg" style="display: block; margin: auto;" />

---
class: middle, center, inverse

[<h2>Examining performances and parameters</h2>](https://dajmcdon.shinyapps.io/ChopinMazurkaApp/)

???

Show Cortot recording again. Contrast with Tomsic parameter.

Is it Cortot?

The Hatto Scandal of the Concert Artist Label

_Beyond the Score: Music as Performance_ by Nicholas Cook

---

## Why a switching model?

* Most statistical methods for estimating functions assume "smoothness"

* Trend filtering, splines, wavelets

---

## Model fragility

---

## In summary

* We develop a switching model for tempo decisions

* We give an algorithm for performing likelihood inference

* We estimate our model using a large collection of recordings of the same composition

* We demonstrate how the model is able to recover performer intentions

* We use the learned representations to compare and contrast recordings

---

## Future work

* Build a similar model for dynamics, then examine tempo and dynamics together

* Working on an extension to vocal music, glissandi, vibrato, scooping, etc.

* Want a fast implementation to use for teaching

---

## Collaborators, etc.

.pull-left[
<p style="text-align:center;">
<img src="gfx/craphael.jpg" height="200px">
<img src="gfx/mmcbride.jpg" height="200px">
</p>
<p style="text-align:center;">
<img src="gfx/rob_granger.jpg" height="200px"></p>
]

.pull-right[
<iframe width="460" height="250" src="https://www.youtube.com/embed/W8RTpOe-AqA?start=68" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
<p style="text-align:center;"><img src="gfx/nsf-logo.png" height="200px"></p>
]