class: center, middle, inverse, title-slide .title[ # Learning the regularity of the curves in FDA ] .subtitle[ ## with applications ] .author[ ###
Steven Golovkine
· Nicolas Klutchnikoff · Valentin Patilea ] .institute[ ### CSDA & EcoSta Workshop on Statistical Data Science ] .date[ ### August 27, 2022 ]

---

# Functional Data Analysis

</br></br>

<figure> <center> <img src="img/temperature.png" alt="temperature" width="500"/> <img src="img/precipitation.png" alt="precipitation" width="500"/> <figcaption>Canadian weather dataset <a id='cite-ramsay2005'></a>(<a href='#bib-ramsay2005'>Ramsay and Silverman, 2005</a>).</figcaption> </center> </figure>

---

# Functional Data Analysis

<figure> <center> <img src="img/power.png" alt="power" width="700"/> <figcaption>Household Active Power Consumption (UC Irvine ML Repository).</figcaption> </center> </figure>

---

# Sample path regularity — a key concept

* Data: noisy measurements of the curves at (possibly random) discrete time points.
* Recover the curves
> A nonparametric regression problem for which optimality depends on the curve regularity;
> see, *e.g.*, <a id='cite-tsybakov2009'></a>(<a href='https://doi.org/10.1007/b13794'>Tsybakov, 2009</a>).
* Estimate the mean and the covariance
> The optimal rates depend on the curve regularity;
> see, *e.g.*, <a id='cite-cai2010'></a>(<a href='#bib-cai2010'>Cai and Yuan, 2010</a>), <a id='cite-cai2011'></a>(<a href='#bib-cai2011'>Cai and Yuan, 2011</a>), <a id='cite-cai2016'></a>(<a href='#bib-cai2016'>Cai and Yuan, 2016</a>).
* Fit usual predictive models (linear, ...)
> The optimal convergence rates depend on the predictor curve regularity;
> see, *e.g.*, <a id='cite-hall2007'></a>(<a href='#bib-hall2007'>Hall and Horowitz, 2007</a>).

---

# Sample path regularity

</br>

* Let `\(\mathcal{O}_\star\)` be a neighborhood of `\(t_0\)`.
* For `\(H_{t_0} \in (0, 1)\)`, `\(L_{t_0} > 0\)`, and `\(u, v \in \mathcal{O}_\star\)`, assume that the stochastic process `\(X\)` satisfies the condition: `$$\mathbb{E}\left[(X_u - X_v)^2\right] \asymp L_{t_0}^2\lvert v - u \rvert^{2H_{t_0}}.$$`
* `\(H_{t_0}\)` is called **the local regularity of the process `\(X\)`** on `\(\mathcal{O}_\star\)`.
* Our parameter `\(H_{t_0}\)` is related to the Hurst exponent.
* The definition extends to smoother sample paths by using the derivatives of `\(X_u\)` and `\(X_v\)` instead.

---

# Local regularity vs. eigenvalue decrease rate

</br>

* The (local) regularity is related to the decrease rate of the eigenvalues of the covariance operator of the process.
* Under some conditions, if `$$\lambda_j \sim j^{-\nu}, \qquad j \geq 1,$$` for some `\(\nu > 1\)`, then `$$2(H + \delta) = \nu - 1$$` when the sample paths admit derivatives up to order `\(\delta\)` and `\(H\)` is the regularity of the derivatives of order `\(\delta\)`.

--

</br></br></br>

<h3 style="color:#005844;"><center>The value of `\(\nu\)` is usually assumed to be given!</center></h3>

---

# Observed data

* Let `\(X^{(1)}, \dots, X^{(N)}\)` be an independent sample of a random process `\(X = (X(t) : t \in [0, 1])\)` with continuous trajectories.
* For each `\(1 \leq n \leq N\)`,
> `\(M_n\)` is a random positive integer;
> `\(T_m^{(n)} \in [0, 1]\)`, `\(1 \leq m \leq M_n\)`, are the (random) observation times, *i.e.* the design points, for the curve `\(X^{(n)}\)`.
* The observations are `\((Y_m^{(n)}, T_m^{(n)})\)`, `\(1 \leq m \leq M_n\)`, `\(1 \leq n \leq N\)`, where `$$Y_m^{(n)} = X^{(n)}(T_m^{(n)}) + \sigma(T_m^{(n)}, X^{(n)}(T_m^{(n)})) e_m^{(n)}.$$`
* `\(e_m^{(n)}\)` are independent copies of a standardized error term.
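
---

# Observed data: a simulation sketch

A minimal sketch of the observation scheme above, assuming `\(X\)` is standard Brownian motion (local regularity `\(H = 1/2\)`) and a constant noise level in place of `\(\sigma(\cdot, \cdot)\)`; all names and constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_curve(m, sigma=0.05):
    """One noisy, discretely observed curve: Y_m = X(T_m) + sigma * e_m."""
    t = np.sort(rng.uniform(0.0, 1.0, size=m))   # random design points T_m
    # Brownian motion: independent Gaussian increments, variance = time gaps.
    dt = np.diff(t, prepend=0.0)
    x = np.cumsum(rng.normal(0.0, np.sqrt(dt)))  # X(T_m)
    y = x + sigma * rng.normal(size=m)           # noisy observations Y_m
    return t, y

# N curves, each observed at a random number M_n of design points.
N = 100
curves = [simulate_curve(m=int(rng.integers(50, 101))) for _ in range(N)]
```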
---

# Estimation

* For `\(s, t \in \mathcal{O}_\star\)`, let `$$\theta(s, t) = \mathbb{E}\left[(X_t - X_s)^2\right] \approx L_{t_0}^2 \lvert t - s \rvert^{2H_{t_0}}.$$`
* Let `\(t_1\)` and `\(t_3\)` be such that `\([t_1, t_3] \subset \mathcal{O}_\star\)`, and denote by `\(t_2\)` the midpoint of `\([t_1, t_3]\)`.
* A natural proxy of `\(H_{t_0}\)` is given by `$$\frac{\log(\theta(t_1, t_3)) - \log(\theta(t_1, t_2))}{2\log 2}, \quad\text{if}~ t_3 - t_1 ~\text{is small.}$$`

--

* An estimator of `\(H_{t_0}\)` is given by `$$\frac{\log(\widehat \theta(t_1, t_3)) - \log(\widehat \theta(t_1, t_2))}{2\log 2}, \quad\text{if}~ t_3 - t_1 ~\text{is small,}$$` where, given a nonparametric estimator `\(\widetilde{X}_{t}\)` of `\(X_{t}\)`, `$$\widehat\theta(s, t) = \frac{1}{N}\sum_{n = 1}^N \left(\widetilde{X}^{(n)}_{t} - \widetilde{X}^{(n)}_{s}\right)^2.$$`
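
---

# Estimation: a code sketch

A minimal sketch of the log-ratio estimator of `\(H_{t_0}\)`, applied to the simulated curves of the earlier slide; a crude moving-window average stands in for the nonparametric smoother `\(\widetilde{X}\)`, and all names and tuning values are illustrative.

```python
import numpy as np

def smooth_at(t_obs, y_obs, t, h=0.05):
    """Crude stand-in for the presmoother X~: average of the
    observations falling in the window [t - h, t + h]."""
    mask = np.abs(t_obs - t) <= h
    return y_obs[mask].mean()

def estimate_H(curves, t0, delta=0.1, h=0.05):
    """Log-ratio estimator of the local regularity H_{t0}."""
    # t2 is the midpoint of [t1, t3], a small interval around t0.
    t1, t2, t3 = t0 - delta, t0, t0 + delta

    def theta_hat(s, t):
        # Empirical mean of (X~_t - X~_s)^2 over the N curves.
        diffs = [smooth_at(tt, yy, t, h) - smooth_at(tt, yy, s, h)
                 for tt, yy in curves]
        return np.mean(np.square(diffs))

    return (np.log(theta_hat(t1, t3))
            - np.log(theta_hat(t1, t2))) / (2 * np.log(2))
```

For the simulated Brownian curves, `estimate_H(curves, t0=0.5)` should return a value close to the true regularity `\(H = 1/2\)`.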
---

# Estimator of the mean function

* For any `\(t \in \mathcal{T}\)`, let `$$w_n(t; h) = 1 \quad \text{if}\quad \sum_{m = 1}^{M_n} \mathbf{1}\{\vert T_m^{(n)} - t \vert \leq h\} \geq k_0, \quad n \in \{1, \dots, N\},$$` and `\(w_n(t; h) = 0\)` otherwise.
* Then, `$$W_N(t; h) = \sum_{n = 1}^N w_n(t; h).$$`
* The estimator of the mean function is `$$\widehat{\mu}_N(t; h) = \frac{1}{W_N(t; h)}\sum_{n = 1}^N w_n(t; h)\widehat{X}^{(n)}(t), \quad t \in \mathcal{T}.$$`
* An adaptive optimal bandwidth is `$$\widehat{h}_{\mu}^\star = C_\mu (N\mathfrak{m})^{-1 / (1 + 2\widehat{H}_t)},$$` with `\(\mathfrak{m}\)` the expected number of observation points per curve.

---

# Results for mean estimation

<figure> <center> <img src="img/exp1_mu.png" alt="exp1_mu" width="800"/> <figcaption>ISE with respect to the true mean function `\(\mu\)`.</figcaption> </center> </figure>

---

# Estimator of the covariance function

* For any `\(s \neq t\)`, let `$$W_N(s, t; h) = \sum_{n = 1}^N w_n(s; h)w_n(t; h).$$`
* The estimator of the covariance function is, for `\(\lvert s - t \rvert > \delta\)`, `$$\widehat{\Gamma}_N(s, t; h) = \frac{1}{W_N(s, t; h)}\sum_{n = 1}^N w_n(s; h)\widehat{X}^{(n)}(s)\, w_n(t; h)\widehat{X}^{(n)}(t) - \widehat{\mu}^\star(s)\widehat{\mu}^\star(t).$$`
* An adaptive optimal bandwidth is `$$\widehat{h}_{\Gamma}^\star = C_\Gamma (N\mathfrak{m})^{-1 / (1 + 2\min(\widehat{H}_s, \widehat{H}_t))}.$$`

---

# Results for covariance estimation

<figure> <center> <img src="img/exp1_gamma.png" alt="exp1_gamma" width="800"/> <figcaption>ISE with respect to the true covariance function `\(\Gamma\)`.</figcaption> </center> </figure>

---

# Takeaway ideas

* The available data in FDA are usually **noisy** measurements at **discrete, possibly random, design points**.
* The usual FDA methods require the **reconstruction** of the curves.
* The optimal curve recovery depends on the purpose, but in most cases it **depends on the regularity** of the sample paths.
* We formalize the concept of local regularity of the process and propose a **simple** estimator of it, together with adaptive estimators of the mean and covariance functions.
* The paper on the estimation of the regularity is available at <center> <a href="https://projecteuclid.org/journals/electronic-journal-of-statistics/volume-16/issue-1/Learning-the-smoothness-of-noisy-curves-with-application-to-online/10.1214/22-EJS1997.full">DOI: 10.1214/22-EJS1997</a> </center>
* A preprint of the paper on the estimation of the mean and covariance is available at <center> <a href="https://arxiv.org/abs/2108.06507">https://arxiv.org/abs/2108.06507</a> </center>

<h2 style="color:#005844;"><center>Thank you for your attention!</center></h2>

---

# References

<p><cite><a id='bib-cai2010'></a><a href="#cite-cai2010">Cai, T. T. and M. Yuan</a> (2010). “Nonparametric Covariance Function Estimation for Functional and Longitudinal Data”. Technical report. University of Pennsylvania and Georgia Institute of Technology.</cite></p>
<p><cite><a id='bib-cai2011'></a><a href="#cite-cai2011">Cai, T. T. and M. Yuan</a> (2011). “Optimal estimation of the mean function based on discretely sampled functional data: Phase transition”. In: <em>Annals of Statistics</em> 39.5, pp. 2330–2355.</cite></p>
<p><cite><a id='bib-cai2016'></a><a href="#cite-cai2016">Cai, T. T. and M. Yuan</a> (2016). “Minimax and Adaptive Estimation of Covariance Operator for Random Variables Observed on a Lattice Graph”. In: <em>Journal of the American Statistical Association</em> 111.513, pp. 253–265.</cite></p>
<p><cite><a id='bib-hall2007'></a><a href="#cite-hall2007">Hall, P. and J. L. Horowitz</a> (2007). “Methodology and convergence rates for functional linear regression”. In: <em>The Annals of Statistics</em> 35.1, pp. 70–91.</cite></p>
<p><cite><a id='bib-ramsay2005'></a><a href="#cite-ramsay2005">Ramsay, J. and B. W. Silverman</a> (2005). <em>Functional Data Analysis</em>. 2nd ed. Springer Series in Statistics. New York: Springer-Verlag.</cite></p>
<p><cite><a id='bib-tsybakov2009'></a><a href="#cite-tsybakov2009">Tsybakov, A. B.</a> (2009). <em>Introduction to Nonparametric Estimation</em>. Springer Series in Statistics. New York, NY: Springer. DOI: <a href="https://doi.org/10.1007/b13794">10.1007/b13794</a>.</cite></p>
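
---

# Appendix: mean estimation — a code sketch

A sketch of the mean estimator with its adaptive bandwidth, continuing the earlier sketches (it reuses the simulated `curves` and `estimate_H`); the window average standing in for `\(\widehat{X}^{(n)}(t)\)`, and the choices of `\(C_\mu\)` and `\(k_0\)`, are illustrative.

```python
import numpy as np

def mean_estimate(curves, t, h, k0=2):
    """Sketch of mu_hat_N(t; h): average a crude estimate of X^(n)(t)
    over the curves with at least k0 design points in [t - h, t + h]."""
    values = []
    for t_obs, y_obs in curves:
        in_window = np.abs(t_obs - t) <= h
        if in_window.sum() >= k0:                    # w_n(t; h) = 1
            values.append(y_obs[in_window].mean())   # stand-in for X^(n)(t)
    return np.mean(values)  # (1 / W_N) * sum of w_n(t; h) X^(n)(t)

# Adaptive bandwidth: h_mu_star = C_mu * (N * m)^(-1 / (1 + 2 * H_hat)).
N = len(curves)
m_bar = np.mean([len(t_obs) for t_obs, _ in curves])  # proxy for m
H_hat = estimate_H(curves, t0=0.5)                    # earlier sketch
h_star = 1.0 * (N * m_bar) ** (-1.0 / (1.0 + 2.0 * H_hat))  # C_mu = 1
mu_hat = mean_estimate(curves, t=0.5, h=h_star)
```

For the simulated Brownian curves, `mu_hat` should be close to the true mean `\(\mu(t) = 0\)`.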