layout: true <div class="my-footer"><span><a href="https://dajmcdon.github.io/chopin-2022-slides/" style="color:white">dajmcdon.github.io/chopin-2022-slides</a></span></div> --- background-image: url("gfx/cover-1.svg") background-size: contain background-position: top <br/><br/><br/><br/><br/><br/><br/><br/><br/><br/> .center[# Markov switching state-space models for uncovering musical interpretation] .pull-left[ ###Daniel J. McDonald ###University of British Columbia ####24 October 2022 ] .pull-right[ ![](gfx/qrcodes-1.png) <br> ] --- background-image: url("gfx/State-cello-0299.jpg") background-size: contain --- background-image: url("gfx/jacobs.jpg") background-size: contain ??? * Dabbled in Economics * PhD is Statistics * IU happened to have an opening * Met Prof. Chris Raphael at my interview * Accepted the Job --- background-image: url("https://img.freepik.com/free-vector/musical-pentagram-sound-waves-notes-background_1017-33911.jpg?w=1800&t=st=1666393969~exp=1666394569~hmac=2bd5438306b82462af3c31fb52c733010cd2f8ff069fcba5324d16a25638af2f") background-size: contain ??? 
* Occasionally dabbled in Music on the Side * First invited talk at Laval on Music * Finally published the paper in 2021 * First in-person talk post-pandemic * Seemed more fun than talking about all the COVID modelling or riveting stat theory --- ## Musical taste .pull-left[ * Easy to describe music you like: - "Jazzy sound" - "Strong beat" - "good lyrics" - "anything by Taylor Swift" ] .pull-right[ <div class="tenor-gif-embed" data-postid="8009181" data-share-method="host" data-aspect-ratio="1.30612" data-width="80%"><a href="https://tenor.com/view/head-phones-music-recording-studio-listening-to-music-bart-gif-8009181">Listening To Music GIF</a>from <a href="https://tenor.com/search/head+phones-gifs">Head Phones GIFs</a></div> <script type="text/javascript" async src="https://tenor.com/embed.js"></script> ] -- .pull-left[ * Harder to describe a .tertiary[performance] * Classical music is mainly about performances of the .tertiary[same] music * How do we think about which ones we like? ] --- ## Primer on "classical" music * Written between 6th century and today * Includes music written during the Classical period (1750–1820) <blockquote cite="Leonard Bernstein">The real difference is that when a composer writes a piece of what’s usually called classical music, he puts down the exact notes that he wants, the exact instruments or voices that he wants to play or sing those notes—even the exact number of instruments or voices; and he also writes down as many directions as he can think of. </blockquote> * Generally much more "musically" complicated <div class=center> <img src="gfx/hey-jude-single.jpg" width=125></img> <br> <audio controls> <source src="gfx/hey-jude-clip.m4a"> </audio> </div> ??? * Musically complicated = wider range of chords, keys, instrumentation, contrasts * Hey Jude: 3 chords (2 others briefly) in 7 minutes. Same key the whole time. * For today, Chopin is running example * Chopin: 6 unique chords in first 10 seconds. 
Two key areas in 1.5 minutes of music. --- class: middle <div class="pull-left"> <img src="gfx/me.jpg" height="500px"></img> <br> <div class="center"> <audio controls> <source src="gfx/RubinsteinFlatTempo.m4a"> </audio> </div> </div> <div class="pull-right"> <img src="gfx/rubin.jpg" height="500px"></img> <br> <div class="center"> <audio controls> <source src="gfx/Rubinstein.m4a"> </audio> </div> </div> ??? Which one do you like better? --- ## What's different? .pull-left[ 1. Mistakes 2. Extraneous noise 3. Recording quality 4. Articulation / Legato / Bowing / Breathing 5. Dynamics 6. Tempo / Rubato ] .pull-right[ ![:scale 90%](https://media.wbur.org/wp/2019/04/gustavo-dudamel.jpg) .tiny[Source: WBUR Boston] ] ??? The first three are uninteresting. The others are about .red[.bold[interpretation]] We like performances with "better" interpretations --- class: inverse, middle, center background-image: url("https://www.worldpianonews.com/wp-content/uploads/2020/05/Bosendorfer-280VC-custom4_Post-scaled.jpg") background-size: cover ??? Piano music * Simplifies the problem - No bowing, fingering, breathing, glissando * Focus on __tempo__ * professional pianists would cringe as I say this, but contrast with strings / winds / singers --- ## Musical tempo <img src="gfx/rubinstein-tempo-1.svg" width="100%" style="display: block; margin: auto;" /> * Notes change "speed" * Sometimes purposeful * Speed is important for .tertiary[.bold[interpretation]] ??? * Orange is Rubinstein, as recorded * Dashed is "mine", mechanically forced Rubinstein to be constant speed * Mention axes * Note the "slow down" at the end (phrase boundary) --- class: inverse, center, middle # What is this "music"? 
--- ## Important musical terms <img src="gfx/ChopinFirst3.jpeg" width="60%" style="display: block; margin: auto;" /> .pull-left-narrow[ ### Notes ### Beat ### Measure ### Time signature ### Tempo ### Dynamics ] .pull-right-wide[ All those little black dots Strongly felt impetus Collections of notes delimited by vertical "barlines" Number of beats / measure; type of note that gets the beat The prevailing speed, measured in bpm Loudness of the note ] ??? * notes indicate pitch and relative duration * (relative to...) beat is the baseline * accents / pedal markings --- ## Data * CHARM Mazurka Project <img src="gfx/charm.png" width="60%" style="display: block; margin: auto;" /> * Focus on timing only (dynamics also available) * 50 recordings: Chopin Mazurka Op. 68 No. 3 * Recorded between 1931 and 2006 * 45 different performers --- class: middle <img src="gfx/all-performance-lines-1.svg" width="100%" style="display: block; margin: auto;" /> ??? * Note the shaded region --- ## Chopin & Mazurkas .pull-left[ .secondary.large[Fryderyk Chopin (1810–1849)] * Born in Poland * Moved to Paris at 21 * Virtuoso pianist * Wrote mainly piano music ] .pull-right[ .secondary.large[Mazurka] * A Polish dance * Chopin composed at least 58 for Piano * Repetition is very important * Certain rhythmic figures ] .center[ ![:scale 40%](gfx/mazurka-dance.jpeg) ] ??? Everything he wrote includes piano --- background-image: url("gfx/entire-mazurka.jpg") background-position: center background-size: contain <div class="center"> <audio controls> <source src="gfx/IMSLP365286-PMLP02288-Mazurka_F-Dur_Op_68_Nr_3.mp3"> </audio> </div> </div> ??? * Recording by Christoph Zbinden, available from the IMSLP under CC By 4.0. 
* Tempo markings, importantly, only 2 + rit and fermata * Dotted eighth sixteenth * ABA structure * Minor phrases * Repetition * Chord progression --- ## Previous analysis Nicholas Cook, Craig Sapp, and Andrew Earis at CHARM Among other things, examines correlations between tempo curves .pull-left[ .secondary[Richter 1976] .center[ ![:scale 66%](http://www.mazurka.org.uk/ana/pcor/mazurka68-3-s50/Richter1976-tn.png) ]] .pull-right[ .secondary[Cortot 1951] .center[ ![:scale 66%](http://www.mazurka.org.uk/ana/pcor/mazurka68-3-s50/Cortot1951-tn.png) ]] --- ## Tempos and smoothing * Most statistical methods for estimating functions assume "smoothness" * Splines, wavelets, trend filtering <img src="gfx/alternative-smoothers-1.svg" width="100%" style="display: block; margin: auto;" /> --- class: center, middle <img src="gfx/unnamed-chunk-1-1.svg" width="100%" style="display: block; margin: auto;" /> --- class: center, middle, inverse # Switching Kalman Filter --- ## Thinking about tempo <br/> .pull-left[ ### 1. Playing in tempo <br/> ### 2. Accelerando (speed up) <br/> ### 3. Allargando (slow down) <br/> ### 4. Tenuto (emphasis) ] .pull-right[.center[ ![](https://upload.wikimedia.org/wikipedia/commons/thumb/4/43/Metronome_Mälzel_1.jpg/291px-Metronome_Mälzel_1.jpg) ]] --- ## Transition diagram .pull-left-wide[ <img src="gfx/markov-trans.svg" width="80%" style="display: block; margin: auto;" /> ] .pull-right-narrow[ <h3 class=const>1. Constant tempo</h3> <br/> <h3 class=accel>2. Speeding up</h3> <br/> <h3 class=decel>3. Slowing down</h3> <br/> <h3 class=stress>4. Emphasis</h3> ] --- ## Intentions vs. observations <img src="gfx/ss-mod-flow.svg" width="75%" style="display: block; margin: auto;" /> ??? * Musicians aren't perfect. * Imagine a speed that they'll maintain in CT state. * Accelerate/Decel from there. * Need to track tempo/accel in `\(X_k\)`. 
* These depend on `\(S_k\)` and `\(S_{k-1}\)` * Observe noisy realization --- ## Kalman filter .pull-left[* Developed in the late 1950s to track missiles] $$ `\begin{aligned} X_{k+1} &= d_k + T_k X_k + \eta_{k+1} & \eta_{k+1} &\sim \textrm{N}(0, Q_{k+1})\\ Y_k &= c_k + Z_k X_k + \epsilon_{k}&\epsilon_k & \sim \textrm{N}(0, G_k)\\ \end{aligned}` $$ .pull-left[ * Assume `\(X_0\)` is Gaussian * Just track mean and variance of `\(X_k\ |\ \{Y_i\}_{i=1}^k\)` * Does this iteratively for each `\(k\)` * Gives "filter" estimate of `\(\{X_k\}_{k=1}^n\)` and likelihood * Matrices can depend on parameters `\(\theta\)` * or on previous `\(X\)` / `\(Y\)` ] .pull-right[ ![](https://assets.bwbx.io/images/users/iqjWHBFdfxIU/iEmS3Tk.cj8U/v0/1400x-1.jpg) .tiny[Photographer: Anas Baba/AFP/Getty Images] ] --- ## Inference .emphasis[ We want to estimate 1. parameters `\(\theta\)` — transition probabilities, mean tempos, variances 2. hidden states `\(S_k\)` 3. hidden states `\(X_k\)` ] If you know `\(\{S_k\}_{k=1}^n\)` and `\(\theta\)`
.tertiary[Kalman filter] gives `\(\{\widehat{X}_k\}_{k=1}^n\)` * this is `\(O(n)\)` for each likelihood evaluation If you know `\(\{X_k\}_{k=1}^n\)` and `\(\theta\)`
.tertiary[Viterbi algorithm] gives `\(\{\widehat{S}_k\}_{k=1}^n\)` * this is `\(O(n)\)` for each likelihood evaluation .info[ But how do we estimate all of it? With both unknown, each likelihood evaluation is `\(O(M^n)\)`. ] ??? * Evaluating the likelihood for one `\(\theta\)` is exponential in \# of notes. * Non-convex to optimize over both. -- .secondary[This is a problem] --- ## Switching Kalman filter (for our model) <br><br> .pull-left[ $$ `\begin{aligned} X_{k} &= d(s_k,\ s_{k-1}) + T(s_k,\ s_{k-1}) X_{k-1} + \eta_{k}\\\\ Y_k &= c(s_k) + Z(s_k) X_k + \epsilon_{k}\\\\ \eta_{k} &\sim \textrm{N}(0, Q(s_k,\ s_{k-1}))\\\\ \epsilon_k & \sim \textrm{N}(0, G(s_k)) \end{aligned}` $$ ] .pull-right[ <img src="gfx/ss-mod-flow.svg" width="100%" style="display: block; margin: auto;" /> ] --- ## Examples `\begin{align} 1\rightarrow 1 && 1\rightarrow 2\\ x_{2} &= \begin{pmatrix}1&0\\0&0\end{pmatrix} x_{1} & x_{3} &= \begin{pmatrix} l_i\mu_{\textrm{acc}}\\ \mu_{\textrm{acc}}\end{pmatrix} + \begin{pmatrix}1&0\\0&0\end{pmatrix} x_{1} + \textrm{N}\left(0,\ \sigma_{\textrm{acc}}^2\begin{pmatrix} l_i^2 & l_i\\ l_i & 1 \end{pmatrix}\right)\\\\ y_2 &= (1\quad 0) x_2 + \textrm{N}(0,\ \sigma_\epsilon^2) & y_3 &= (1\quad 0) x_3 + \textrm{N}(0,\ \sigma_\epsilon^2). \end{align}` <br> -- <hr> <br> `\begin{align} 1\rightarrow 4 && 4\rightarrow 1\\ x_{2} &= \begin{pmatrix}0 \\ \mu_{\textrm{stress}} \end{pmatrix} + \begin{pmatrix}1&0\\0&0\end{pmatrix} x_{1} + \textrm{N}\left(0,\ \begin{pmatrix}0&0\\0&\sigma^2_{\textrm{stress}}\end{pmatrix}\right) & x_{3} &= \begin{pmatrix}1&0\\0&0\end{pmatrix} x_{2} \\\\ y_2 &= (1\quad 1) x_2 + \textrm{N}(0,\ \sigma_\epsilon^2) & y_3 &= (1\quad 0) x_3 + \textrm{N}(0,\ \sigma_\epsilon^2). \end{align}` ??? x is dim-2 (speed, acceleration) What is `\(l_i\)`?
??? I have 4 states 2nd order Markov Leads to 11 states in 1-Markov Piece has 231 notes --- ## Discrete particle filter — .secondary[dpf()] 1. Track at most `\(J\)` paths through the `\(M^n\)` tree 2. At time `\(k\)`, given `\(J\)` paths, propagate each one forward 3. Sample the `\(JM\)` possibilities to get only `\(J\)` 4. Iterate forward through time until done
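A minimal sketch of steps 1–4 (hypothetical Python, not the actual implementation; `log_obs` and `log_trans` stand in for the model's observation likelihood and transition probabilities, and the sampling in step 3 is replaced by deterministic top-`\(J\)` selection to keep the sketch short):

```python
def dpf(log_obs, log_trans, n_steps, M, J):
    """Greedy discrete particle filter (beam search) sketch.

    log_obs(k, s): log-likelihood of observation k in discrete state s
    log_trans(r, s): log transition probability from state r to state s
    Tracks at most J of the M**n_steps possible state paths.
    """
    # Initialize one particle per state: (path, log-weight)
    particles = sorted(
        [((s,), log_obs(0, s)) for s in range(M)],
        key=lambda p: -p[1],
    )[:J]
    for k in range(1, n_steps):
        # Propagate each of the J paths into all M successors
        expanded = [
            (path + (s,), lw + log_trans(path[-1], s) + log_obs(k, s))
            for path, lw in particles
            for s in range(M)
        ]
        # Reduce the J*M candidates back to J; the talk samples in
        # proportion to the weights, here we simply keep the top J
        particles = sorted(expanded, key=lambda p: -p[1])[:J]
    return particles
```

Sampling in proportion to the weights, as in the talk, avoids the purely greedy behaviour of top-`\(J\)` selection.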
??? * This is a greedy approximation * AKA "Beam Search" * The sampling step is important * Probability of sampling is proportional to current weight times likelihood times trans prob * Example supposing J = 5 --- ## The complete procedure For each performance: * Iterate 1–3 to maximize over `\(\theta \in \Theta\)`, produce `\(\widehat\theta\)` 1. Guess a parameter vector `\(\theta\)` (in some sensible way) 2. .secondary[dpf()] gives greedy state sequence `\(\{\widehat{S}_k\}_{k=1}^n\)` 3. It gives the likelihood as a side effect via .secondary[kfilter()] * Rerun .secondary[dpf()] and .secondary[ksmoother()] at `\(\widehat{\theta}\)` to get `\(\{\widehat{S}_k\}_{k=1}^n\)` and `\(\{\widehat{X}_k\}_{k=1}^n\)` ??? * The 1-step kfilter() appears inside dpf() * ksmoother() is conditional on all the data -- --- <br/> ![:scale 50%](https://alliancecan.ca/themes/custom/site_theme/logo.svg) ??? * Takes a few minutes per performance when 4 is done intelligently * I used 6hr walltime on Cedar with 1 perf / core and 10 restarts
Examples of strong priors --- ## Distance matrix on parameters .pull-left[ * Use Mahalanobis `$$d(\theta,\theta') = \sqrt{(\theta-\theta')^\top V^{-1}(\theta-\theta')}$$` <br/> * `\(V\)` is prior covariance matrix <br/> * Incorporates correlations correctly on probability vectors <br/> * Some performances have no "close" neighbors ] .pull-right[ <img src="gfx/parametric-clusters-1.png" width="90%" style="display: block; margin: auto;" /> ] --- <img src="gfx/good-clusters.png" width="705px" height="600px" style="display: block; margin: auto;" /> --- <img src="gfx/clustered-parameters-1.svg" width="100%" style="display: block; margin: auto;" /> --- ## Probability of "stress" <img src="gfx/clustered-p14-1.svg" width="100%" style="display: block; margin: auto;" /> --- <img src="gfx/clust-1-1.svg" width="100%" style="display: block; margin: auto;" /> --- <img src="gfx/clust-2-1.svg" width="100%" style="display: block; margin: auto;" /> --- <img src="gfx/similar-perfs-1.svg" width="100%" style="display: block; margin: auto;" /> --- <img src="gfx/rubinstein-perfs-1.svg" width="100%" style="display: block; margin: auto;" /> --- class: middle <img src="gfx/cortot-performance-1.svg" width="100%" style="display: block; margin: auto;" /> ??? * Not actually Cortot * Likely Hatto or perhaps even an engineered recording * This was a major scandal for the British Concert Artist record label in about 2006/7 * ~100 other fake recordings * Discovered by uploading to Gracenote database --- ## In summary * We develop a switching model for tempo decisions * We give an algorithm for performing likelihood inference * We estimate our model using a large collection of recordings of the same composition * We demonstrate how the model is able to recover performer intentions * We use the learned representations to compare and contrast recordings -- ### Future work * Similar model for dynamics * Can we do this fast and "online"? * Use it for teaching? --- ## Collaborators, funding, etc. 
.pull-left[ <p style="text-align:center;"> <img src="gfx/craphael.jpg" height="200px"> <img src="gfx/mmcbride.jpg" height="200px"> </p> <p style="text-align:center;"> <img src="gfx/rob_granger.jpg" height="200px"></p> ] .pull-right[ <iframe width="460" height="250" src="https://www.youtube.com/embed/W8RTpOe-AqA?start=68" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen align="center"></iframe> <br><br><br> <img src="gfx/nsf-logo.png" height="100px" align="left" hspace=5%> <img src="https://www.nserc-crsng.gc.ca/img/logos/img-logo2-en.png" height="100px" align="right" hspace=5%> ] --- class: inverse, middle, center # Appendix --- ## 1-step Kalman filter — .secondary[kalman()] Get estimates of `\(X_{k}\)` given a new observation `\(y_k\)` Input: * New data — `\(y_k\)`, * Parameter matrices — `\(d_k\)`, `\(c_k\)`, `\(T_k\)`, `\(Z_k\)`, `\(Q_k\)`, `\(G_k\)`, * Previous state mean and variance — `\(x_{k-1}\)`, `\(P_{k-1}\)` Predict new state
`\(\hat{x}_k = d + Tx_{k-1}\)`   `\(\hat{P}_k = Q+TP_{k-1}T^\top\)` Predict current data
`\(\hat{y}_k = c + Z\hat{x}_k\)`   `\(F=G + Z\hat{P}_kZ^\top\)` Calculate error
`\(v = y_k - \hat{y}_k\)`   `\(K = \hat{P}_kZ^\top F^{-1}\)` Update
`\(x_k = \hat{x}_k + Kv\)`   `\(P_k = (I - KZ)\hat{P}_k\)` Log Likelihood
`\(\ell_k(\theta) \propto \ell_{k-1}(\theta) - v^\top F^{-1} v - \log(|F|)\)` ??? If know S, then that pins down all the parameter matrices Loop this over 1 ... n Maximize over theta
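---

## 1-step Kalman filter in code

A scalar (one-dimensional) sketch of the five steps above, written in Python with all matrices collapsed to numbers; an illustration under those assumptions, not the talk's implementation. The log-likelihood is accumulated up to constants, as on the previous slide.

```python
from math import log

def kfilter_step(y, d, c, T, Z, Q, G, x_prev, P_prev, ll_prev=0.0):
    """One scalar Kalman filter step, mirroring the slide's equations."""
    # Predict new state
    x_hat = d + T * x_prev
    P_hat = Q + T * P_prev * T
    # Predict current data
    y_hat = c + Z * x_hat
    F = G + Z * P_hat * Z
    # Calculate error and Kalman gain
    v = y - y_hat
    K = P_hat * Z / F
    # Update
    x = x_hat + K * v
    P = (1 - K * Z) * P_hat
    # Log-likelihood, up to additive constants
    ll = ll_prev - v * v / F - log(F)
    return x, P, ll
```

Looping this over `\(k = 1, \ldots, n\)`, with the parameter values pinned down by a known state sequence, gives the `\(O(n)\)` likelihood evaluation used earlier.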