Estimating the extrema of noisy curves and optimization using spline surface approximation

Estimating maxima and minima of a noisy curve turns out to be very hard and to a large part is still an open question. In this blog post we will discover some strategies to encounter the problem. Another interpretation is given in the context of optimization. Consider an optimization problem where the (smooth) objective function is only evaluated at some, maybe random and noisy, points or due to computational issues can only be sparsely evaluated. Then this algorithm will give a fast estimate of the optimal values.

1. Estimation based on Splines

We assume that a smooth at least two times differentiable curve X(t) \in \mathbb{R} is observed at independent randomly-distributed points t_{k} \in [0,1], \; k=1,\dots, T contaminated with additional noise. Our model is then given by

(1)   \begin{equation*} Y(t_{k})=X(t_{k}) + \epsilon_{k}\end{equation*}


where \epsilon_{k} are i.i.d. random variables with \mathbf{E}\left[\varepsilon_{k}|X\right]=0, Var\left(\epsilon_{k}|X\right)= \sigma^2 and \epsilon_{k} is independent of X= X(t_1),\dots, X(t_k). Our goal is to determine all t_0 such that the derivative

(2)   \begin{equation*} \frac{\partial X(t)}{\partial t}(t_0)=0 \text{ and } \frac{\partial X(t)}{\partial t^2}(t_0) \neq 0.\end{equation*}

1.1 Natural cubic splines

Consider the following minimization problem, among all functions with two continuous derivatives find f \in C^{2}( [0,1] ) such that

(3)   \begin{equation*} \sum_{i=1}^T \{ Y_i - m(t_i) \}^2 + \lambda \int\{m^{''}(t) \}^2 dt\end{equation*}

is minimized. where \lambda is a fixed smoothing parameter and Y_i := Y(t_{i}). It can be shown [1], that (3) has an unique minimizer which is a particular natural cubic spline. More generally a natural cubic spline with knots \zeta_1<\dots<\zeta_K is a piecewise polynomial of order 4, with additional constrains such that the the function is linear beyond the boundary knots, represented with K Basis functions N_1(t),\dots, N_K(t). When the knots are given by \zeta_i=t_i, i=1,\dots,T the natural cubic spines solves (3) in that case an optimal solution for 3 is given by m(t)=\sum_{j=1}^T N_j(t) \theta_j.

Let

    \begin{equation*} \textbf{N}= \begin{bmatrix} N_1(t_1)& N_2(t_1) &\dots & N_K(t_1)\\ N_1(t_2)& N_2(t_2) & \dots & N_K(t_2)\\ \vdots & \vdots & \ddots & \vdots \\ N_1(t_T)& N_2(t_T) & \dots & N_K(t_T)\\ \end{bmatrix}, \Omega= \begin{bmatrix} \int N^{''}_1(t)N^{''}_1(t)dt &\dots & \int N^{''}_K(t)N^{''}_1(t)dt\\ \int N^{''}_1(t)N^{''}_2(t)dt & \dots & \int N^{''}_K(t)N^{''}_2(t)dt\\ \vdots & \ddots & \vdots \\ \int N^{''}_1(t)N^{''}_K(t)dt & \dots & \int N^{''}_K(t)N^{''}_K(t)dt\\ \end{bmatrix}  \end{equation*}

the solution to the minimization problem using \textbf{Y}=(Y_1,\dots,Y_T)^T is thus given by

(4)   \begin{equation*} \hat{\theta}}= (\textbf{N}^T \textbf{N} +T \lambda \Omega)^{-1} \textbf{N}^T \textbf{Y} \mathop{=}\limits^{\text{#def}} \textbf{K} \textbf{Y}\end{equation*}

A popular basis choice is derived from the truncated power series and given by

(5)   \begin{equation*} N_1(t)= 1,\; N_1(t)= t,\; N_{k+2}(t)= d_k(t)-d_{K-1}(t)\end{equation*}

where

(6)   \begin{equation*} d_k(t) = \frac{(t-\zeta_k)^3_+ - (t-\zeta_K)^3_+}{\zeta_K-\zeta_k}.\end{equation*}



We will use this basis as illustration for our method from here on. An important inside is that for a fixed interval limited by the nodes t \in [\zeta_k, \zeta_{k+1}], f(t) can be represented using a cubic function (see [2]). In particular

(7)   \begin{equation*}\hat{m}_k(t)=  \sum_{l=1}^{k} \frac{\theta_{l+2} }{\zeta_K - \zeta_l}  t^3 +  \sum_{l=1}^{k} \frac{- 3 \theta_{l+2} \zeta_l}{\zeta_K - \zeta_l}  t^2 +  \left( \theta_2 + \sum_{l=1}^{k} \frac{3\theta_{l+2} \zeta_l^2}{\zeta_K - \zeta_l} \right) t + \left( \theta_1 + \sum_{l=1}^{k} \frac{-\theta_{l+2} \zeta_l^3}{\zeta_K - \zeta_l} \right)\end{equation*}


The figure shows an example with X(t)=sin(10t)+t, \; -0.5\leq t \leq0.5 the grey dots show the observed Y while the black dots indicate the estimated extrema by the algorithm. The spline (brown curve) is estimated using K=5, the rainbow colored curves are the representation of the spline using a polynomial of order 3 between a sector delimited by two neighbor knots.


Accordingly,

(8)   \begin{equation*}  \hat{m} _k^{'}(t)= \underbrace{ \sum_{l=1}^{k} \frac{3 \theta_{l+2}}{\zeta_K - \zeta_l} }_{A_k} t^2 + \underbrace{ \sum_{l=1}^{k} \frac{- 6 \theta_{l+2} \zeta_l}{\zeta_K - \zeta_l} }_{B_k} t + \underbrace{ \theta_2 + \sum_{l=1}^{k} \frac{3\theta_{l+2} \zeta_l^2}{\zeta_K - \zeta_l} }_{C_k} \text{ and }  \hat{m} _k^{''}(t)= \sum_{l=1}^{k} \frac{ \theta_{l+2} ( - 6 \zeta_l + 6  t )}{ \zeta_K - \zeta_l }\end{equation*}

For B_k^2- 4 A_kC_k>0, the real possible extrema a_k of \hat{m}_k are thus given by

(9)   \begin{equation*} a_k = - \frac{B_k}{2A_k} \pm \frac{\sqrt{B_k^2- 4 A_kC_k}}{2A_k}\end{equation*}


this will then be an extrema of \hat{m}(t) if a_k \in [\zeta_k, \zeta_{k+1}] and \hat{m}^{''}(a_k) \neq 0.

1.2 Estimation and Asymptotic

To estimate the extrema we replace the spline coefficients \theta by \hat{\theta}, their estimated counterpart, as specified in (4) to derive \hat{A}_k, \hat{B}_k, \hat{C}_k and \hat{a}_k according to (8) and (9). First verify that for fixed \lambda, K

(10)   \begin{equation*}E(\hat{\theta}|X)  - \theta= ((\textbf{N}^T \textbf{N} +T \lambda \Omega)^{-1} - (\textbf{N}^T \textbf{N})^{-1}) \textbf{N}^T \textbf{N} \theta\end{equation*}


and

(11)   \begin{equation*}Var(\hat{\theta}|X) = \sigma^2 (\textbf{N}^T \textbf{N} +T \lambda \Omega)^{-1} \textbf{N}^T \textbf{N} (\textbf{N}^T \textbf{N} +T \lambda \Omega)^{-1} = \sigma^2 \textbf{K} \textbf{K}^T\end{equation*}


therefore

(12)   \begin{equation*}E(\hat{A}_k|X) - A_k= \iota^A_k ( E(\hat{\theta}|X) - \theta), \; Var(\hat{A}_k|X) = \iota_k^A Var(\hat{\theta}|X) (\iota_k^A)^T\end{equation*}


(13)   \begin{equation*}E(\hat{B}_k|X) - B_k= \iota^B_k( E(\hat{\theta}|X) - \theta), \; Var(\hat{B}_k|X) = \iota_k^B Var(\hat{\theta}|X) (\iota_k^B)^T\end{equation*}


(14)   \begin{equation*}E(\hat{C}_k|X) - C_k= \iota^C_k( E(\hat{\theta}|X) - \theta),\; Var(\hat{C}_k|X) = \iota_k^C Var(\hat{\theta}|X) (\iota_k^C)^T\end{equation*}


where \iota^A_k=(0,0,\frac{3}{\zeta_K - \zeta_1},\dots,\frac{3}{\zeta_K - \zeta_k}, \underbrace{\dots,0}_{K-k -2 } )^T, \iota^B_k=(0,0,\frac{-6 \zeta_1}{\zeta_K - \zeta_1},\dots,\frac{-6\zeta_k}{\zeta_K - \zeta_k},  \underbrace{\dots,0}_{ K-k -2 } )^T ,\iota^C_k=(0,1,\frac{3 \zeta_1^2}{\zeta_K - \zeta_1},\dots,\frac{3 \zeta_k^2}{\zeta_K - \zeta_k},  \underbrace{\dots,0}_{ K-k-2 } )^T.


After using a Taylor (see here for example) expansion and some calculus we can derive

(15)   \begin{equation*} \begin{split}E(\hat{a_k}|X)= a_k +( a'_{kA_k} \iota^A_k  +   a'_{kB_k} \iota^B_k +   a'_{kC_k} \iota^C_k ) ( E(\hat{\theta}|X) - \theta) \\ +  ( a''_{kA_k A_k } \iota^A_k  +   a'_{kB_k} \iota^B_k +   a'_{kC_k} \iota^C_k )   Var(\hat{\theta}|X)  ( \iota^A_k + \iota^B_k ) + o(\frac{1}{T})\end{split}\end{equation*}


The first order Taylor expansion of the variance is given by

(16)   \begin{equation*}Var(\hat{a}|X) \approx E([ ( a'_{kA_k} \iota^A_k  +   a'_{kB_k} \iota^B_k +   a'_{kC_k} \iota^C_k ) ( E(\hat{\theta}|X) - \theta) ]^2)\end{equation*}


where

(17)   \begin{equation*}a'_A= \frac{1}{2} (\frac{6AC+B^2}{\sqrt{4AC+B^2}}-B)\end{equation*}


(18)   \begin{equation*}a'_B=\frac{1}{2} A (\frac{B}{\sqrt{4AC-B^2}}-1)\end{equation*}

(19)   \begin{equation*}a'_C=\frac{A^2}{\sqrt{4AC+B^2}}\end{equation*}


(20)   \begin{equation*}a^{''}_{AA}=\frac{A^2}{\sqrt{4AC+B^2}}\end{equation*}

2. White noise representation

The white noise version, as descibed in [3], of (1) is

(21)   \begin{equation*}y_t=m(t) + e_t\end{equation*}


where the noise term e_t can be interpreted as n^{-1/2} v(t)^{1/2} DW(t) and v(t) is the error variance when the design point equals t, and D W(t) can be figuratively represented as a ‘derivative’ of standard Brownian motion, in the sense
that DW(t)dt = dW(t).

In the white noise case, the corresponding equivalent basis to is given by \phi(t|s)=(t-s)^3_{+} with the knot density \rho(s) then

(22)   \begin{equation*}\hat{m}(t)=\sum_{k=1}^4 \theta_k t^{k-1} + \int_0^1 \theta(s) \rho(s) \phi(t|s)ds\end{equation*}



Then the continuous analogue to (8) is given by

(23)   \begin{equation*}A(s)= \int_0^s \frac{3(  \theta_3+  \theta_4 + \theta(s)\rho(s) ) } {1 - s}  ds\end{equation*}


(24)   \begin{equation*}B(s)= \int_0^s  \frac{- 6 ( \theta_3+  \theta_4 +   \theta(s)\rho(s)) s}{1 - s}  ds\end{equation*}


(25)   \begin{equation*}C(s) = \theta_2 +  \int_0^s \frac{3 ( \theta_3+  \theta_4 +   \theta(s)\rho(s)s^2}{1 - s} ds\right\end{equation*}

Accordingly a is an extrema of \hat{m}(t) if and only if

(26)   \begin{equation*}a = - \frac{B(a)}{2A(a)} \pm \frac{\sqrt{B(a)^2- 4 A(a)C(a)}}{2A(a)} \;\text{and} \; m^{''}(a) \neq 0\end{equation*}


therefore an extrema is observed if a is a fix point.

[1] P. Craven and G. Wahba, “Smoothing noisy data with spline functions,” Numerische mathematik, vol. 31, pp. 377-403, 1978.
[Bibtex]
@article{Wahba1979,
title={Smoothing noisy data with spline functions},
author={Peter Craven and Grace Wahba},
journal={Numerische Mathematik},
year={1978},
volume={31},
pages={377-403}
}
[2] C. H. Reinsch, “Smoothing by spline functions,” Numerische mathematik, vol. 10, pp. 177-183, 1967.
[Bibtex]
@article{Reinsch1967,
title={Smoothing by spline functions},
author={Christian H. Reinsch},
journal={Numerische Mathematik},
year={1967},
volume={10},
pages={177-183}
}
[3] P. Hall and J. D. Opsomer, “Theory for penalised spline regression,” Biometrika, vol. 92, iss. 1, p. 105–118, 2005.
[Bibtex]
@article{Hall92,
ISSN = {00063444},
URL = {http://www.jstor.org/stable/20441169},
abstract = {Penalised spline regression is a popular new approach to smoothing, but its theoretical properties are not yet well understood. In this paper, mean squared error expressions and consistency results are derived by using a white-noise model representation for the estimator. The effect of the penalty on the bias and variance of the estimator is discussed, both for general splines and for the case of polynomial splines. The penalised spline regression estimator is shown to achieve the optimal nonparametric convergence rate established by Stone (1982).},
author = {Peter Hall and J. D. Opsomer},
journal = {Biometrika},
number = {1},
pages = {105--118},
publisher = {[Oxford University Press, Biometrika Trust]},
title = {Theory for Penalised Spline Regression},
urldate = {2022-04-16},
volume = {92},
year = {2005}
}

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.