class: center, middle, inverse, title-slide # Switching models for musical interpretation ### Daniel J. McDonald ### Indiana University ### 22 November 2019 --- ## Disclaimer .pull-left[ #### Inference + Optimization + Approximation Approximation for Least squares `\((n \gg p)\)` - Homrighausen and .green[McDonald]. (2019+). JCGS. Approximation for dimension reduction `\((p \gg n)\)` - Homrighausen and .green[McDonald]. (2016). JCGS. - Ding and .green[McDonald]. (2017). Bioinformatics. - Ding and .green[McDonald]. (2019). Under review. Algorithms for large data - .green[McDonald]. ADMM for large constrained kernel PCA. - .green[McDonald] and Khodadadi. (2019). AAAI. - .green[McDonald], Sharpnack, Bassett, and Sadhanala. Trend filtering for Spatio-temporal exponential families. ] .pull-right[ #### Tuning parameter selection and risk estimation and nonparametrics Model selection and penalized M-estimators - Homrighausen and .green[McDonald]. (2013). ICML. - Homrighausen and .green[McDonald]. (2014). Machine Learning. - Homrighausen and .green[McDonald]. (2017). Stat. Sinica. - .green[McDonald]. SURE for exponential families. Dependence and high dimensions - Homrighausen and .green[McDonald]. (2018). JSCS. - .green[McDonald] and Shalizi. (2018+). Under review. - .green[McDonald], Shalizi and Schervish. (2017). JMLR. - .green[McDonald]. (2017). AISTATS. - .green[McDonald]. Minimax non-parametric regression. ] ??? * Not my usual research area * more of a "side project" * but one I care about a lot --- background-image: url("gfx/State-cello-0299.jpg") background-size: contain ??? Music major Conservatory --- class: middle .pull-left-70[  ] .pull-right-30[  ] ??? * Have a professional cello * That one was backup for 15 years * Turns out, non-professionals don't need backup instruments * Nonetheless, BS in Cello from a top conservatory * have a taste for classical music --- ## Musical taste * Easy to describe music you like: - "Jazzy sound" - "Strong beat" - "good lyrics" - "anything by Taylor Swift" * Harder to describe a .red[performance] * Classical music is mainly about performances of the .red[same] music * How do we think about which ones we like? --- ## Primer on "classical" music * Written between 6th century and today * Includes music written during the Classical period (1750–1820) <blockquote cite="Leonard Bernstein">The real difference is that when a composer writes a piece of what’s usually called classical music, he puts down the exact notes that he wants, the exact instruments or voices that he wants to play or sing those notes—even the exact number of instruments or voices; and he also writes down as many directions as he can think of. </blockquote> * Generally much more "musically" complicated .center[<img src="gfx/hey-jude-single.jpg" height="175px">] ??? * Musically complicated = wider range of chords, keys, instrumentation, contrasts * Hey Jude: 3 chords (2 others briefly) in 7 minutes. Same key the whole time. * For today, Chopin is running example * Chopin: 6 unique chords in first 10 seconds. Two key areas in 1.5 minutes of music. --- class: middle .pull-left[ <img src="gfx/me.jpg" height="500px"> ] .pull-right[ <img src="gfx/rubin.jpg" height="500px"> ] ??? Which one do you like better? --- ## What's different? 1. Mistakes 2. Extraneous noise 3. Recording quality 4. Articulation/Legato/Bowing/Breathing 5. Dynamics 6. Tempo/Rubato ??? The first three are uninteresting. The others are about .red[.bold[interpretation]] We like performances with "better" interpretations --- class: inverse, middle, center background-image: url("https://www.wien.info/media/images/boesendorfer-piano-schriftzug-gespiegelt-3to2.jpeg/image_gallery") background-size: cover ??? Piano music * Simplifies the problem - No bowing, fingering, breathing, glissando * Focus on __tempo__ --- ## Musical tempo <img src="gfx/rubinstein-tempo-1.svg" style="display: block; margin: auto;" /> * Notes change "speed" * Sometimes purposeful * Speed is important for .red[.bold[interpretation]] --- class: inverse, center, middle # What is this "music"? --- ## Important musical terms .pull-left-20[ .green[.bolder[Notes]] .green[.bolder[Beat]] .green[.bolder[Measure]] .green[.bolder[Time signature]] .green[.bolder[Tempo]] .green[.bolder[Dynamics]] ] .pull-right-80[ All those little black dots Strongly felt impetus Collections of notes delimited by vertical "barlines" Number of beats / measure; type of note that gets the beat The prevailing speed, measured in bpm Loudness of the note ] <img src="gfx/ChopinFirst3.jpeg" width="60%" style="display: block; margin: auto;" /> --- ## Data * CHARM Mazurka Project <img src="gfx/charm.png" width="60%" style="display: block; margin: auto;" /> * Focus on timing only (dynamics also available) * 50 recordings: Chopin Mazurka Op. 68 No. 3 * Recorded between 1931 and 2006 * 45 different performers --- class: middle <img src="gfx/all-performance-lines-1.svg" style="display: block; margin: auto;" /> --- class: middle <img src="gfx/example-perf-1.svg" style="display: block; margin: auto;" /> --- ## Chopin & Mazurkas .pull-left[ #### Fryderyk Chopin (1810–1849) * Born in Poland * Moved to Paris at 21 * Virtuoso pianist * Wrote mainly piano music ] .pull-right[ #### Mazurka * A Polish dance * Chopin composed at least 58 for Piano * Repetition is very important * Certain rhythmic figures ] ??? Everything he wrote includes piano --- background-image: url("gfx/entire-mazurka.jpg") background-position: center background-size: contain ??? Tempo markings, importantly, only 2 + rit and fermata Dotted eighth sixteenth ABA structure Minor phrases Repetition Chord progression --- class: center, middle, inverse # Switching Kalman Filter --- ## Thinking about tempo .pull-left[ 1. Playing in tempo 2. Accelerando (speed up) 3. Allargando (slow down) 4. Tenuto (emphasis) ] .pull-right[  ] --- ## Transition diagram .pull-left-70[ <img src="gfx/markov-trans.svg" width="70%" style="display: block; margin: auto;" /> ] .pull-right-30[ 1. .const[__Constant tempo__] 2. .accel[__Speeding up__] 3. .decel[__Slowing down__] 4. .stress[__Emphasis__] ] --- ## Intentions vs. observations <img src="gfx/ss-mod-flow.svg" width="80%" style="display: block; margin: auto;" /> ??? Musicians aren't perfect. Observe noisy realization --- ## Switching state space models .pull-left[ <p style="text-align:center;"> <img src="gfx/business-cycle.png" height="175px"><figcaption>Economics (Kim and Nelson, 1998; Chauvet and Piger, 2008)</figcaption></p> ] .pull-right[ <p style="text-align:center;"> <img src="gfx/animal-movement.png" height="175px"><figcaption>Animal movement (Patterson, et al., 2008; Block, et al., 2011)</figcaption></p> ] <br> <p style="text-align:center;"> <img src="gfx/language-processing.png" height="175px"><figcaption>NLP (Fox, et al., 2011)</figcaption></p> --- ## Inference * Also unknown parameters `\(\theta\)` * If you know `\(\{S_k\}_{k=1}^n\)` and `\(\theta\)`, Kalman filter gives `\(\{\hat{X}_k\}_{k=1}^n\)` * If you know `\(\{X_k\}_{k=1}^n\)` and `\(\theta\)`, Viterbi algorithm gives `\(\{\hat{S}_k\}_{k=1}^n\)` * We need to learn `\(\{S_k,\ X_k\}_{k=1}^n\)` * And we need to estimate `\(\theta\)` --- ## Kalman filter * Developed in the late '50s to track missiles $$ `\begin{aligned} X_{k+1} &= d_k + T_k X_k + \eta_{k+1} & \eta_{k+1} &\sim \textrm{N}(0, Q_{k+1})\\ Y_k &= c_k + Z_k X_k + \epsilon_{k}&\epsilon_k & \sim \textrm{N}(0, G_k)\\ \end{aligned}` $$ * Assume `\(X_0\)` is Gaussian * Just track mean and variance of `\(X_k\ |\ \{Y_i\}_{i=1}^k\)` * Does this iteratively for each `\(k\)` * Gives "filter" estimate of `\(\{X_k\}_{k=1}^n\)` and likelihood ??? Here At and Zt and the components of epsilon are contained in theta --- ## Switching Kalman filter (for our model) .pull-left[ $$ `\begin{aligned} X_{k+1} &= d(s_t,s_{k-1}) + T(s_k,s_{k-1}) X_k + \eta_{k+1}\\ Y_t &= c(s_k) + Z(s_k) X_k + \epsilon_{k}\\\\ \eta_{k} &\sim \textrm{N}(0, Q(s_k,s_{k-1}))\\ \epsilon_k & \sim \textrm{N}(0, G(s_k)) \end{aligned}` $$ ] .pull-right[ <img src="gfx/ss-mod-flow.svg" width="1000%" style="display: block; margin: auto;" /> ] --- ## Examples `\begin{align} 1\rightarrow 1 && 1\rightarrow 2\\ x_{2} &= \begin{pmatrix}1&0\\0&0\end{pmatrix} x_{1} & x_{3} &= \begin{pmatrix} l_i\mu_{\textrm{acc}}\\ \mu_{\textrm{acc}}\end{pmatrix} + \begin{pmatrix}1&0\\0&0\end{pmatrix} x_{1} + \mbox{N}\left(0,\ \sigma_{\textrm{acc}}^2\begin{pmatrix} l_i^2 & l_i\\ l_i & 1 \end{pmatrix}\right)\\ y_2 &= (1\quad 0) x_2 + \mbox{N}(0,\ \sigma_\epsilon^2) & y_3 &= (1\quad 0) x_3 + \mbox{N}(0,\ \sigma_\epsilon^2). \end{align}` <br> -- <hr> <br> `\begin{align} 1\rightarrow 4 && 4\rightarrow 1\\ x_{2} &= \begin{pmatrix}0 \\ \mu_{\textrm{stress}} \end{pmatrix} + \begin{pmatrix}1&0\\0&0\end{pmatrix} x_{1} + \textrm{N}\left(0,\ \begin{pmatrix}0&0\\0&\sigma^2_{\textrm{stress}}\end{pmatrix}\right) & x_{3} &= \begin{pmatrix}1&0\\0&0\end{pmatrix} x_{2} \\ y_2 &= (1\quad 1) x_2 + \mbox{N}(0,\ \sigma_\epsilon^2) & y_3 &= (1\quad 0) x_3 + \mbox{N}(0,\ \sigma_\epsilon^2). \end{align}` ??? x is dim-2 (speed, acceleration) What is li? --- ## 1-step Kalman filter — .green[`kalman()`] Get estimates of `\(X_{k}\)` given a new observation `\(y_k\)` Input: * New data — `\(y_k\)`, * Parameter matrices — `\(d_k\)`, `\(c_k\)`, `\(T_k\)`, `\(Z_k\)`, `\(Q_k\)`, `\(G_k\)`, * Previous state mean and variance — `\(x_{k-1}\)`, `\(P_{k-1}\)` Predict new state `\(\longrightarrow \hat{x}_k = d + Tx_{k-1}\)`   `\(\hat{P}_k = Q+TP_{k-1}T^\top\)` Predict current data `\(\longrightarrow \hat{y}_k = c + Z\hat{x}_k\)`   `\(F=G + Z\hat{P}_kZ^\top\)` Calculate error `\(\longrightarrow v = y_k - \hat{y}_k\)`   `\(K = \hat{P}_kZ^\top F^{-1}\)` Update `\(\longrightarrow x_k = \hat{x}_k + Kv\)`   `\(P_k = \hat{P}_k(I - Z^\top K)\)` Log Likelihood `\(\longrightarrow \ell_k(\theta) \propto \ell_{k-1}(\theta) - v^\top F^{-1} v - \log(|F|)\)` ??? If know S, then that pins down all the parameter matrices Loop this over 1 ... n Maximize over theta --- ## We don't know the discrete states Pretend there are only 2 states
.Large[_k_ = .orange[1], .dark-blue[2], .red[3], .green[4]] ??? I have 4 states 2nd order Markov Leads to 11 states in 1-Markov Piece has 231 notes --- class: middle, center, inverse ## 3,645,000,000,000,000,000,000,000,000,000,000,000, ## 000,000,000,000,000,000,000,000,000,000,000,000, ## 000,000,000,000,000,000,000,000,000,000,000,000, ## 000,000,000,000,000,000,000,000,000,000,000,000, ## 000,000,000,000,000,000,000,000,000,000,000,000, ## 000,000,000,000,000,000,000,000,000,000,000,000, ## 000,000,000,000,000,000,000,000 --- ## Discrete particle filter — .green[`dpf()`] 1. Track at most `\(J\)` paths through the `\(M^n\)` tree 2. At time `\(k\)`, given `\(J\)` paths, propogate each one forward 3. Sample the `\(JM\)` possibilities to get only `\(J\)` 4. iterate forward through time until done
??? This is a greedy approximation The sampling step is important Probability of sampling is proportional to current weight times likelihood times trans prob --- ## The complete algorithm For each performance: 1. Guess a parameter vector `\(\theta\)` 2. .green[`dpf()`] gives greedy state sequence `\(\{\hat{S}_k\}_{k=1}^n\)` 3. It gives the likelihood as a side effect via .green[`kfilter()`] 4. Iterate 1–3 to maximize for `\(\theta \in \Theta\)` 5. Run the .green[`ksmoother()`] to get estimate for `\(\{X_k\}_{k=1}^n\)` ??? kfilter() 1 step appears in dpf() ksmoother() is conditional on all the data --- class: middle, center <img src="gfx/two-perfs-1.svg" style="display: block; margin: auto;" /> --- class: middle, center, inverse # Similar performances --- ## The estimated parameters For each performance, we estimate `\(\theta\)` by penalized maximum likelihood. The parameters are things like: - average speed in different states - some variance parameters - transition probabilities We have strong prior information. ??? Examples of strong priors --- <img src="gfx/estimated-parm-1.svg" style="display: block; margin: auto;" /> --- ## Distance matrix on parameters .pull-left-40[ * Use Mahalanobis distance on `\(\theta\)` $$d(\theta_1,\theta_2) = \sqrt{(\theta_1-\theta_2)^\top V^{-1}(\theta_1-\theta_2)} $$ * `\(V\)` is prior covariance matrix * Incorporates correlations correctly on probability vectors * Some performances have no "close" neighbors ] .pull-right-60[ <img src="gfx/parametric-clusters-1.svg" style="display: block; margin: auto;" /> ] --- <img src="gfx/other-removed-1.svg" style="display: block; margin: auto;" /> --- <img src="gfx/clustered-parameters-1.svg" style="display: block; margin: auto;" /> --- ## Probability of "stress" <img src="gfx/clustered-p14-1.svg" style="display: block; margin: auto;" /> --- <img src="gfx/clust-1-1.svg" style="display: block; margin: auto;" /> --- <img src="gfx/clust-2-1.svg" style="display: block; margin: auto;" /> --- <img src="gfx/similar-perfs-1.svg" style="display: block; margin: auto;" /> --- <img src="gfx/rubinstein-perfs-1.svg" style="display: block; margin: auto;" /> --- class: middle <img src="gfx/cortot-performance-1.svg" style="display: block; margin: auto;" /> --- class: middle, center, inverse [<h2>Examining performances and parameters</h2>](https://dajmcdon.shinyapps.io/ChopinMazurkaApp/) ??? Show Cortot recording again. Contrast with Tomsic parameter. Is it Cortot? The Hatto Scandal of the Concert Artist Label _Beyond the Score: Music as Performance_ by Nicholas Cook --- ## Why a switching model? * Most statistical methods for estimating functions assume "smoothness" * Trend filtering, splines, wavelets -- <img src="gfx/alternative-smoothers-1.svg" style="display: block; margin: auto;" /> --- ## Model fragility <img src="gfx/bad-model-1.svg" style="display: block; margin: auto;" /> --- ## In summary * We develop a switching model for tempo decisions * We give an algorithm for performing likelihood inference * We estimate our model using a large collection of recordings of the same composition * We demonstrate how the model is able to recover performer intentions * We use the learned representations to compare and contrast recordings --- ## Future work * Similar idea for dynamics here, examine the combination * Working on an extension to vocal music, glissandi, vibrato, scooping, etc. * Want a fast implementation to use for teaching --- ## Collaborators, etc. .pull-left[ <p style="text-align:center;"> <img src="gfx/craphael.jpg" height="200px"> <img src="gfx/mmcbride.jpg" height="200px"> </p> <p style="text-align:center;"> <img src="gfx/rob_granger.jpg" height="200px"></p> ] .pull-right[ <iframe width="460" height="250" src="https://www.youtube.com/embed/W8RTpOe-AqA?start=68" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe> <p style="text-align:center;"><img src="gfx/nsf-logo.png" height="200px"></p> ]