Markov-Switching Models for Musical Interpretation
Daniel J. McDonald
“Jazzy sound”
“Strong beat”
“good lyrics”
“anything by Taylor Swift”
Harder to describe a performance
Classical music is mainly about performances of the same music
How do we think about which ones we like?
The real difference is that when a composer writes a piece of what’s usually called classical music, he puts down the exact notes that he wants, the exact instruments or voices that he wants to play or sing those notes—even the exact number of instruments or voices; and he also writes down as many directions as he can think of.
–Leonard Bernstein
Generally much more musically complicated
Notes change speed
Sometimes purposeful
Speed is important for interpretation
Notes: all those little black dots
Beat: strongly felt impetus
Measure: collections of notes delimited by vertical “barlines”
Time signature: number of beats per measure; type of note that gets the beat
Tempo: the prevailing speed, measured in beats per minute (bpm)
Dynamics: loudness of the note
Focus on timing only (dynamics also available)
50 recordings: Chopin Mazurka Op. 68 No. 3
Recorded between 1931 and 2006
45 different performers
Fryderyk Chopin (1810–1849)
Born in Poland
Moved to Paris at 21
Virtuoso pianist
Wrote mainly piano music
Mazurka
A Polish dance
Chopin composed at least 58 for piano
Repetition is very important
Certain rhythmic figures
1. Constant tempo
2. Accelerando (speeding up)
3. Ritardando (slowing down)
4. Tenuto (emphasis)
Before, \(|\mathcal{S}| = 4\). Now, \(|\mathcal{S}| = 11\)
Viterbi algorithm gives estimates of \(\{S_k\}\)
Computation is \(O(n)\): likelihood of \(p\) and estimates of \(\{S_k\}\)
Inside of EM (Baum-Welch) gives estimates of \(p\)
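As a point of reference, here is a minimal Viterbi sketch for a generic discrete HMM in Python. It is not the implementation used here; the transition matrix and emission log-likelihoods are placeholders.

```python
import numpy as np

def viterbi(log_init, log_trans, log_emit):
    """Most likely state sequence for a discrete HMM.

    log_init:  (S,) log initial-state probabilities
    log_trans: (S, S) log transition probabilities, rows = from-state
    log_emit:  (n, S) log-likelihood of each observation under each state
    Runs in O(n * S^2) time, i.e., linear in the sequence length n.
    """
    n, S = log_emit.shape
    delta = np.zeros((n, S))             # best log-prob ending in each state
    back = np.zeros((n, S), dtype=int)   # argmax backpointers
    delta[0] = log_init + log_emit[0]
    for k in range(1, n):
        cand = delta[k - 1][:, None] + log_trans   # (S, S): from i to j
        back[k] = np.argmax(cand, axis=0)
        delta[k] = cand[back[k], np.arange(S)] + log_emit[k]
    # backtrack from the best terminal state
    states = np.empty(n, dtype=int)
    states[-1] = np.argmax(delta[-1])
    for k in range(n - 2, -1, -1):
        states[k] = back[k + 1, states[k + 1]]
    return states
```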
\[ \begin{aligned} X_{k+1} &= d_k + T_k X_k + \eta_{k+1}, & \eta_{k+1} &\sim \textrm{N}(0, Q_{k+1}),\\ Y_k &= c_k + Z_k X_k + \epsilon_{k}, &\epsilon_k & \sim \textrm{N}(0, G_k).\\ \end{aligned} \]
Assume \(X_0\) is Gaussian
KF just tracks mean and variance of \(X_k\ |\ \{Y_i\}_{i=1}^k\)
Does this iteratively for each \(k\)
kfilter() gives estimates of \(\{X_k\; |\; Y_1,\ldots,Y_k\}_{k=1}^n\)
kfilter() also gives the likelihood of \(\theta\)
ksmoother() gives estimates of \(\{X_k\; |\; Y_1,\ldots,Y_n\}_{k=1}^n\)
\[ \begin{aligned} X_{k+1} &= {\color{Plum} d_k} + {\color{Plum} T_k} X_k + \eta_{k+1}, & \eta_{k+1} &\sim \textrm{N}(0, {\color{Plum} Q_{k+1}}),\\ Y_k &= {\color{Plum} c_k} + {\color{Plum} Z_k} X_k + \epsilon_{k}, &\epsilon_k & \sim \textrm{N}(0, {\color{Plum} G_k}).\\ \end{aligned} \]
Parameter matrices may depend on unknown parameters \(\theta\)
kfilter() \(+\) ksmoother() algorithms give estimates of \(\{X_k\}\)
Computation is \(O(n)\): likelihood of \(\theta\) and estimates of \(\{X_k\}\)
Inside of EM gives estimates of \(\theta\)
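Since the filter returns the likelihood of \(\theta\), one could also fit \(\theta\) by direct numerical maximization instead of EM. A sketch, where build_matrices(), y, and theta_init are hypothetical stand-ins for the actual parametrization and data:

```python
from scipy.optimize import minimize

def negloglik(theta):
    # build_matrices() is a hypothetical map from the free parameters
    # theta to the system matrices; details depend on the parametrization
    d, T, Q, c, Z, G, x0, P0 = build_matrices(theta)
    # kalman_filter() as sketched above; its third output is the log-likelihood
    return -kalman_filter(y, d, T, Q, c, Z, G, x0, P0)[2]

theta_hat = minimize(negloglik, theta_init, method="Nelder-Mead").x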
\[ \begin{aligned} X_{k} &= d(s_k,\ s_{k-1}) + T(s_k,\ s_{k-1}) X_{k-1} + \eta_{k}\\ Y_k &= c(s_k) + Z(s_k) X_k + \epsilon_{k}\\ \eta_{k} &\sim \textrm{N}(0, Q(s_k,\ s_{k-1}))\\ \epsilon_k &\sim \textrm{N}(0, G(s_k)) \end{aligned} \]
We want to estimate: \(\theta\), \(p\), \(\{X_k\}\), and \(\{S_k\}\)
Exact inference costs \(O(|\mathcal{S}|^n)\), with \(|\mathcal{S}|=11\) and \(n=231\)
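For a sense of scale, the number of state sequences being summed over:

```python
print(len(str(11 ** 231)))  # 241: the count of digits in 11**231
```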
dpf()
Track at most \(J\) paths through the \(|\mathcal{S}|^n\) tree
At time \(k\), receive \(J\) paths, each a state sequence with a weight
Propagate each of the \(J\) paths forward, creating \(\approx J^2\) possibilities
Using the weights, sample the \(\approx J^2\) possibilities to keep only \(J\)
Iterate forward through time until done; return the \(J\) sequences and their weights (sketched below)
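A minimal sketch of the idea behind dpf(), in Python. The step_loglik() hook is a hypothetical stand-in for the per-step likelihood (one switching Kalman-filter update in this application), and for brevity the weighted sampling step is replaced by greedy selection of the top \(J\) paths:

```python
import numpy as np

def discrete_particle_filter(n_steps, log_trans, step_loglik, J):
    """Track at most J high-weight paths through the |S|^n tree.

    log_trans: (S, S) array of log transition probabilities.
    step_loglik(k, path): hypothetical hook returning the log-likelihood
        contribution of observation k when the path ends in state path[-1].
    """
    S = log_trans.shape[0]
    # initialize with one path per state (uniform initial weights for brevity)
    particles = [((s,), step_loglik(0, (s,))) for s in range(S)]
    particles = sorted(particles, key=lambda p: -p[1])[:J]
    for k in range(1, n_steps):
        candidates = []
        for path, lw in particles:        # propagate each surviving path
            for s in range(S):            # ... branching over next states
                lt = log_trans[path[-1], s]
                if np.isneginf(lt):
                    continue              # skip disallowed transitions
                new_path = path + (s,)
                candidates.append((new_path, lw + lt + step_loglik(k, new_path)))
        # prune back to J paths; the slide samples J using the weights,
        # here we greedily keep the J largest for simplicity
        particles = sorted(candidates, key=lambda p: -p[1])[:J]
    return particles  # J state sequences with their log weights
```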
For each performance:
dpf() gives greedy state sequence \(\{\widehat{S}_k\}_{k=1}^n\)
Run kfilter(), dpf(), and ksmoother() at \(\widehat{\theta}\) to get \(\{\widehat{S}_k\}_{k=1}^n\) and \(\{\widehat{X}_k\}_{k=1}^n\)
For each performance, we estimate \(p\) and \(\theta\) by penalized maximum likelihood.
Abuse notation and redefine \(\theta := (p, \theta)\), which includes:
average speed in different states
some variance parameters
transition probabilities
We have strong prior information.
Use the Mahalanobis distance \[d(\theta,\theta') = \sqrt{(\theta-\theta')^\mathsf{T} \mathbf{V}^{-1}(\theta-\theta')}\]
\(\mathbf{V}\) is the prior covariance matrix
Correctly incorporates correlations among the entries of probability vectors
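A short sketch of this distance in Python, computed via a Cholesky solve rather than forming \(\mathbf{V}^{-1}\) explicitly:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def mahalanobis(theta, theta_prime, V):
    """d(theta, theta') = sqrt((theta - theta')^T V^{-1} (theta - theta'))."""
    diff = np.asarray(theta) - np.asarray(theta_prime)
    # solve V z = diff via Cholesky instead of inverting V explicitly
    z = cho_solve(cho_factor(V), diff)
    return np.sqrt(diff @ z)
```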
Some performances have no close neighbors
We develop a switching model for tempo decisions
We give an algorithm for performing likelihood inference
We estimate our model using a large collection of recordings of the same composition
We demonstrate how the model recovers performer intentions
We use the learned representations to compare and contrast recordings
Similar model for dynamics
Can we do this fast and “online”?
Use it for teaching?
Markov Models for Music — dajmcdon/hmms-chopin-2025