class: center, middle, inverse, title-slide

.title[
# On the geometric interpretation of MFPCA
]
.subtitle[
## and the use of the Gram matrix
]
.author[
### Steven Golovkine · Edward Gunning · Andrew J. Simpkin · Norma Bargary
]
.institute[
### 54es Journées de Statistique de la SFdS
]
.date[
### July 5th, 2023
]

---

# Multivariate functional data

<figure>
<center>
<img src="data:image/png;base64,#./img/data_matrix.svg" alt="observation" width="70%"/>
</center>
</figure>

---

# Some notations

- Observation space:
`$$\mathcal{H} = \underbrace{\mathcal{L}^2(\mathcal{T}_1) \times \cdots \times \mathcal{L}^2(\mathcal{T}_P)}_{P \text{ terms}}.$$`

- Inner product in `\(\mathcal{H}\)`:
`$$\langle\!\langle f, g \rangle\!\rangle = \sum_{p = 1}^P \int_{\mathcal{T}_p} f^{(p)}(t_p)g^{(p)}(t_p)\mathrm{d}t_p.$$`

- For `\(N\)` realizations of a process `\(X\)`, we denote
  * the mean function `\(\mu\)`,
  * the covariance operator `\(\Gamma\)`, with covariance kernel `\(C\)`,
  * the Gram (inner-product) matrix `\(\mathbf{M}\)`.

- Each feature of each observation is sampled on a regular grid of `\(M_p\)` points.
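
---

# Some notations

In practice, the inner product is approximated by quadrature on the sampling grids. A minimal `numpy` sketch (a sketch only; each feature is assumed to be stored as an array of values on its regular grid, and the function name is illustrative, not taken from any package):

```python
import numpy as np

def inner_product(f, g, grids):
    """Approximate <<f, g>> = sum_p of the integral of f^(p) g^(p) over T_p
    by trapezoidal quadrature on each feature's sampling grid."""
    return sum(np.trapz(f_p * g_p, t_p) for f_p, g_p, t_p in zip(f, g, grids))

# Toy example with P = 2 features observed on [0, 1].
t1, t2 = np.linspace(0, 1, 101), np.linspace(0, 1, 51)
f = [np.sin(2 * np.pi * t1), t2**2]
g = [np.cos(2 * np.pi * t1), 1 - t2]
print(inner_product(f, g, [t1, t2]))
```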

---

# Cloud of individuals

<br>
<figure>
<center>
<img src="data:image/png;base64,#./img/cloud_obs.svg" alt="cloud_obs" width="47%"/>
<img src="data:image/png;base64,#./img/cloud_obs_proj.svg" alt="cloud_obs_proj" width="47%"/>
</center>
</figure>

---

# Cloud of individuals

* Let `\(\pi_n, n \in \{1, \dots, N\}\)`, be weights on the observations such that `\(\sum_n \pi_n = 1\)`.

* Distance between observations
`$$d^2(\mathrm{M}_f, \mathrm{M}_g) = \langle\!\langle f - g, f - g \rangle\!\rangle, \quad f, g \in \mathcal{H}.$$`

* Inertia of the cloud `\(\mathcal{C}_{\!N}\)` using `\(d\)`
`$$\sum_{n = 1}^N \pi_n d^2(\mathrm{M}_n, \mathrm{G}_{\mu}) = \frac{1}{2}\sum_{n = 1}^N \sum_{m = 1}^N \pi_n \pi_m d^2(\mathrm{M}_n, \mathrm{M}_m) = \sum_{p = 1}^P \int_{\mathcal{T}_p} \text{Var} X^{(p)}(t_p)\mathrm{d}t_p.$$`

--

* Another distance between observations
`$$d^2_{\Gamma}(\mathrm{M}_f, \mathrm{M}_g) = \langle\!\langle f - g, \Gamma(f - g) \rangle\!\rangle, \quad f, g \in \mathcal{H}.$$`

* Inertia of the cloud `\(\mathcal{C}_{\!N}\)` using `\(d_{\Gamma}\)`
`$$\sum_{n = 1}^N \pi_n d^2_{\Gamma}(\mathrm{M}_n, \mathrm{G}_{\mu}) = \frac{1}{2}\sum_{n = 1}^N \sum_{m = 1}^N \pi_n \pi_m d^2_{\Gamma}(\mathrm{M}_n, \mathrm{M}_m) = \sum_{p = 1}^P \int_{\mathcal{T}_p} \lvert\!\lvert\!\lvert C_{p \cdot}(t_p, \cdot) \rvert\!\rvert\!\rvert^2 \mathrm{d}t_p.$$`

---

# Cloud of features

<figure>
<center>
<img src="data:image/png;base64,#./img/cloud_features.svg" alt="cloud_features" width="47%"/>
<img src="data:image/png;base64,#./img/cloud_features_proj.svg" alt="cloud_features_proj" width="47%"/>
</center>
</figure>

---

# Cloud of features

* Distance between features
`$$\mathrm{d}^2(\mathrm{M}_f, \mathrm{M}_g) = \sum_{n = 1}^N \pi_n \langle\!\langle X_n - \mu, f - g\rangle\!\rangle^2, \quad f, g \in \mathcal{H}.$$`

* Inertia of the cloud `\(\mathcal{C}_{\!P}\)`
`$$\sum_{n = 1}^N \pi_n \mathrm{d}^2(\mathrm{M}_n, \mathrm{G}_{\mu}) = \frac{1}{2}\sum_{n = 1}^N \sum_{m = 1}^N \pi_n \pi_m d^2_{\Gamma}(\mathrm{M}_n, \mathrm{M}_m) = \sum_{p = 1}^P \int_{\mathcal{T}_p} \lvert\!\lvert\!\lvert C_{p \cdot}(t_p, \cdot) \rvert\!\rvert\!\rvert^2 \mathrm{d}t_p.$$`

* Correlation coefficient
`$$\cos \theta_{fg} = \frac{\sum_{n = 1}^N \pi_n \langle\!\langle X_n - \mu, f \rangle\!\rangle \langle\!\langle X_n - \mu, g \rangle\!\rangle}{\left(\sum_{n = 1}^N \pi_n \langle\!\langle X_n - \mu, f \rangle\!\rangle^2\right)^{1/2}\left(\sum_{n = 1}^N \pi_n \langle\!\langle X_n - \mu, g \rangle\!\rangle^2\right)^{1/2}} = \frac{\langle\!\langle f, \Gamma g \rangle\!\rangle}{\langle\!\langle f, \Gamma f \rangle\!\rangle^{1/2}\langle\!\langle g, \Gamma g \rangle\!\rangle^{1/2}}.$$`

---

# Duality diagram

<figure>
<center>
<img src="data:image/png;base64,#img/duality_diagram.svg" alt="diagram" width="50%"/>
<figcaption>Duality diagram (extended from <a id='cite-delacruzDualityDiagramData2011'></a><a href='#bib-delacruzDualityDiagramData2011'>De la Cruz and Holmes (2011)</a>).</figcaption>
</center>
</figure>

---

# MFPCA

* Consider the matrix `\(\mathbf{M}\)` of size `\(N \times N\)` with entries
`$$\mathbf{M}_{ij} = \sqrt{\pi_i \pi_j}\langle\!\langle X_i - \mu, X_j - \mu\rangle\!\rangle, \quad i, j = 1, \dots, N.$$`

* Eigenvalues of `\(\Gamma\)` and `\(\mathbf{M}\)` are related by
`$$\lambda_k = l_k, \quad k = 1, 2, \dots$$`

* Eigenfunctions of `\(\Gamma\)` and eigenvectors of `\(\mathbf{M}\)` are related (taking uniform weights `\(\pi_n = 1/N\)`) by
`$$\phi_k(t) = \frac{1}{\sqrt{Nl_k}}\sum_{n = 1}^N v_{nk}\{X_n(t) - \mu(t)\}, \quad k = 1, 2, \dots$$`

* Scores are given by
`$$c_{nk} = \sqrt{Nl_k}v_{nk}, \quad n = 1, \dots, N, \quad k = 1, 2, \dots$$`
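
---

# MFPCA

These relations translate directly into code. A minimal `numpy` sketch of the Gram-matrix route (a sketch only, assuming uniform weights `\(\pi_n = 1/N\)`, features stored as `\((N, M_p)\)` arrays on regular grids, and illustrative names not taken from any package):

```python
import numpy as np

def trapz_weights(t):
    """Trapezoidal quadrature weights for a sampling grid t."""
    w = np.zeros_like(t, dtype=float)
    dt = np.diff(t)
    w[:-1] += dt / 2
    w[1:] += dt / 2
    return w

def mfpca_gram(X, grids, n_components=3):
    """MFPCA via the Gram matrix, with uniform weights pi_n = 1/N.

    X     : list of length P; the p-th entry is an (N, M_p) array holding
            the p-th feature of the N observations on a regular grid.
    grids : list of the P sampling grids (1-d arrays of length M_p).
    """
    N = X[0].shape[0]
    Xc = [x - x.mean(axis=0) for x in X]               # centre each feature

    # Gram matrix M_ij = (1/N) <<X_i - mu, X_j - mu>>.
    M = sum((xc * trapz_weights(t)) @ xc.T for xc, t in zip(Xc, grids)) / N

    # Eigendecomposition of the symmetric N x N matrix, largest values first.
    l, v = np.linalg.eigh(M)
    l, v = l[::-1][:n_components], v[:, ::-1][:, :n_components]

    # phi_k = (1 / sqrt(N l_k)) sum_n v_nk (X_n - mu), one block per feature.
    phis = [xc.T @ (v / np.sqrt(N * l)) for xc in Xc]  # list of (M_p, K)
    # c_nk = sqrt(N l_k) v_nk.
    scores = v * np.sqrt(N * l)                        # (N, K)
    return l, phis, scores
```

Whatever the grid sizes `\(M_p\)`, the eigenproblem solved here is only `\(N \times N\)`, which is what makes the Gram-matrix route attractive for very dense grids and images.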

---

# Computational complexity

* Let `\(M^a = \sum_{p = 1}^P M_p^a\)` and `\(K = \sum_{p = 1}^P K_p\)`.

* Using the diagonalization of the covariance operator (<a id='cite-happMultivariateFunctionalPrincipal2015'></a><a href='https://doi.org/10.1080/01621459.2016.1273115'>Happ and Greven (2018)</a>)
`$$\mathcal{O}\left(\underbrace{NM^2 + M^3 + N\sum_{p = 1}^P M_pK_p}_{\substack{\text{Univariate covariance} \\ \text{decomposition}}} + \underbrace{NK^2 + K^3}_{\substack{\text{Univariate scores} \\ \text{decomposition}}} + \underbrace{K\sum_{p = 1}^P M_pK_p + NK^2}_{\substack{\text{Multivariate eigencomponents} \\ \text{and scores estimation}}}\right).$$`

* Using the diagonalization of the inner product matrix
`$$\mathcal{O}\left(\underbrace{N^2M^1 + N^3}_{\substack{\text{Gram matrix} \\ \text{decomposition}}} + \underbrace{KPN + KN}_{\substack{\text{Multivariate eigencomponents} \\ \text{and scores estimation}}}\right).$$`

* Note that the smoothing step is not included in these computational complexities.

---

# Simulation of multivariate functional data

<figure>
<center>
<img src="data:image/png;base64,#./img/computation_time_1.svg" alt="comput_time_image_1" width="100%"/>
</center>
</figure>

---

# Simulation of multivariate functional data

<figure>
<center>
<img src="data:image/png;base64,#./img/mise_1.svg" alt="reconst_error_image_1" width="100%"/>
</center>
</figure>

---

# Simulation of image data

<figure>
<center>
<img src="data:image/png;base64,#./img/computation_time.svg" alt="comput_time_image" width="100%"/>
</center>
</figure>

---

# Simulation of image data

<figure>
<center>
<img src="data:image/png;base64,#./img/mise.svg" alt="reconst_error_image" width="100%"/>
</center>
</figure>

---

# Takeaway ideas

* We gave a geometric interpretation of the duality between the rows and columns of a functional data matrix.

* We provided relationships between the eigenelements of the covariance operator and those of the Gram matrix (a toy numerical check is sketched in the appendix slide).

* When to use the covariance operator?
  - For one-dimensional curves only.
  - For sparse to relatively dense functional data.

* When to use the Gram matrix?
  - For two-dimensional (or higher-dimensional) functional data, e.g., images.
  - For ultra-dense functional data.

* The paper is available on arXiv: [arXiv:2306.12949](https://arxiv.org/abs/2306.12949)

<h2 style="color:#005844;"><center>Thank you for your attention!</center></h2>

---

# References

<p><cite><a id='bib-delacruzDualityDiagramData2011'></a><a href="#cite-delacruzDualityDiagramData2011">De la Cruz, O. and S. Holmes</a> (2011). “The Duality Diagram in Data Analysis: Examples of Modern Applications”. In: <em>The Annals of Applied Statistics</em> 5.4, pp. 2266–2277. ISSN: 1932-6157.</cite></p>

<p><cite><a id='bib-happMultivariateFunctionalPrincipal2015'></a><a href="#cite-happMultivariateFunctionalPrincipal2015">Happ, C. and S. Greven</a> (2018). “Multivariate Functional Principal Component Analysis for Data Observed on Different (Dimensional) Domains”. In: <em>Journal of the American Statistical Association</em> 113.522, pp. 649–659. DOI: <a href="https://doi.org/10.1080/01621459.2016.1273115">10.1080/01621459.2016.1273115</a>.</cite></p>
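
---

# Appendix: a toy check of the duality

A small simulated check (illustrative only: a single one-dimensional feature, uniform weights, rank-two toy data) that the discretized covariance operator and the Gram matrix share their nonzero eigenvalues, i.e. `\(\lambda_k = l_k\)`:

```python
import numpy as np

rng = np.random.default_rng(42)
N, M = 20, 101
t = np.linspace(0, 1, M)
dt = t[1] - t[0]

# Toy sample: random combinations of two Fourier basis functions.
basis = np.vstack([np.sqrt(2) * np.sin(2 * np.pi * t),
                   np.sqrt(2) * np.cos(2 * np.pi * t)])
X = (rng.normal(size=(N, 2)) * np.array([2.0, 1.0])) @ basis
Xc = X - X.mean(axis=0)

# Covariance route: eigenvalues of the discretized covariance kernel.
lam_cov = np.linalg.eigvalsh(Xc.T @ Xc / N * dt)[::-1][:2]

# Gram route: eigenvalues of M_ij = (1/N) <X_i - mu, X_j - mu>.
lam_gram = np.linalg.eigvalsh(Xc @ Xc.T / N * dt)[::-1][:2]

print(np.allclose(lam_cov, lam_gram))   # True: the nonzero spectra coincide
```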