Estimating the extrema of noisy curves and optimization using spline surface approximation

Estimating the maxima and minima of a noisy curve turns out to be surprisingly hard and remains, to a large extent, an open question. In this blog post we explore some strategies to tackle the problem. Another interpretation arises in the context of optimization: consider an optimization problem where the (smooth) objective function is only evaluated at a few, possibly random and noisy, points, or can only be evaluated sparsely for computational reasons. The algorithm presented here then gives a fast estimate of the optimal values.

1. Estimation based on Splines

We assume that a smooth, at least twice differentiable curve X(t) \in \mathbb{R} is observed at independently, randomly distributed points t_{k} \in [0,1], \; k=1,\dots, T, contaminated with additive noise. Our model is then given by

(1)   \begin{equation*} Y(t_{k})=X(t_{k}) + \epsilon_{k}\end{equation*}


where the \epsilon_{k} are i.i.d. random variables with \mathbf{E}\left[\epsilon_{k}|X\right]=0, Var\left(\epsilon_{k}|X\right)= \sigma^2, and \epsilon_{k} is independent of X= X(t_1),\dots, X(t_T). Our goal is to determine all t_0 such that

(2)   \begin{equation*} \frac{\partial X}{\partial t}(t_0)=0 \text{ and } \frac{\partial^2 X}{\partial t^2}(t_0) \neq 0.\end{equation*}

1.1 Natural cubic splines

Consider the following minimization problem: among all functions m \in C^{2}( [0,1] ) with two continuous derivatives, find the one for which

(3)   \begin{equation*} \sum_{i=1}^T \{ Y_i - m(t_i) \}^2 + \lambda \int\{m^{''}(t) \}^2 dt\end{equation*}

is minimized, where \lambda is a fixed smoothing parameter and Y_i := Y(t_{i}). It can be shown [1] that (3) has a unique minimizer, which is a particular natural cubic spline. More generally, a natural cubic spline with knots \zeta_1<\dots<\zeta_K is a piecewise cubic polynomial (of order 4) with the additional constraint that the function is linear beyond the boundary knots; it can be represented with K basis functions N_1(t),\dots, N_K(t). When the knots are given by \zeta_i=t_i, i=1,\dots,T, the natural cubic spline solves (3); in that case an optimal solution of (3) is given by m(t)=\sum_{j=1}^T N_j(t) \theta_j.

Let

    \begin{equation*} \textbf{N}= \begin{bmatrix} N_1(t_1)& N_2(t_1) &\dots & N_K(t_1)\\ N_1(t_2)& N_2(t_2) & \dots & N_K(t_2)\\ \vdots & \vdots & \ddots & \vdots \\ N_1(t_T)& N_2(t_T) & \dots & N_K(t_T)\\ \end{bmatrix}, \Omega= \begin{bmatrix} \int N^{''}_1(t)N^{''}_1(t)dt &\dots & \int N^{''}_K(t)N^{''}_1(t)dt\\ \int N^{''}_1(t)N^{''}_2(t)dt & \dots & \int N^{''}_K(t)N^{''}_2(t)dt\\ \vdots & \ddots & \vdots \\ \int N^{''}_1(t)N^{''}_K(t)dt & \dots & \int N^{''}_K(t)N^{''}_K(t)dt\\ \end{bmatrix}  \end{equation*}

then the solution to the minimization problem, with \textbf{Y}=(Y_1,\dots,Y_T)^T, is given by

(4)   \begin{equation*} \hat{\theta}= (\textbf{N}^T \textbf{N} +T \lambda \Omega)^{-1} \textbf{N}^T \textbf{Y} \overset{\text{def}}{=} \textbf{K} \textbf{Y}\end{equation*}
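For concreteness, (4) is a single linear solve once \textbf{N}, \Omega and \textbf{Y} are available as arrays. The following NumPy sketch (function and variable names are my own) avoids forming the inverse explicitly.

```python
# Sketch: the penalized least-squares coefficients of (4),
# theta_hat = (N^T N + T*lambda*Omega)^{-1} N^T Y, via a linear solve.
import numpy as np

def spline_coefficients(N: np.ndarray, Omega: np.ndarray, Y: np.ndarray, lam: float) -> np.ndarray:
    """N: (T, K) basis matrix, Omega: (K, K) penalty matrix, Y: (T,) observations."""
    T = N.shape[0]
    lhs = N.T @ N + T * lam * Omega      # K x K system matrix
    rhs = N.T @ Y
    return np.linalg.solve(lhs, rhs)     # avoids the explicit inverse
```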

A popular basis choice is derived from the truncated power series and given by

(5)   \begin{equation*} N_1(t)= 1,\; N_2(t)= t,\; N_{k+2}(t)= d_k(t)-d_{K-1}(t)\end{equation*}

where

(6)   \begin{equation*} d_k(t) = \frac{(t-\zeta_k)^3_+ - (t-\zeta_K)^3_+}{\zeta_K-\zeta_k}.\end{equation*}
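A direct NumPy implementation of this basis might look as follows; the function name and the choice of knots are illustrative, not part of the original method.

```python
# Sketch: truncated-power natural cubic spline basis of (5)-(6).
import numpy as np

def natural_spline_basis(t: np.ndarray, knots: np.ndarray) -> np.ndarray:
    """Return the (len(t), K) matrix with columns N_1(t), ..., N_K(t)."""
    t = np.asarray(t, dtype=float)
    zeta = np.sort(np.asarray(knots, dtype=float))
    K = zeta.size

    def d(k):  # d_k(t) of (6); k is 0-based here, i.e. k = 0, ..., K-2
        num = np.maximum(t - zeta[k], 0.0) ** 3 - np.maximum(t - zeta[K - 1], 0.0) ** 3
        return num / (zeta[K - 1] - zeta[k])

    cols = [np.ones_like(t), t]          # N_1(t) = 1, N_2(t) = t
    d_last = d(K - 2)                    # d_{K-1}(t) in the 1-based notation of (5)
    for k in range(K - 2):               # N_{k+2}(t) = d_k(t) - d_{K-1}(t)
        cols.append(d(k) - d_last)
    return np.column_stack(cols)

# Example: basis matrix at the design points with K = 5 equally spaced knots
# N = natural_spline_basis(t, np.linspace(0, 1, 5))
```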



We will use this basis as an illustration for our method from here on. An important insight is that on a fixed interval between two knots, t \in [\zeta_k, \zeta_{k+1}], \hat{m}(t) can be represented by a cubic function (see [2]). In particular,

(7)   \begin{equation*}\hat{m}_k(t)=  \sum_{l=1}^{k} \frac{\theta_{l+2} }{\zeta_K - \zeta_l}  t^3 +  \sum_{l=1}^{k} \frac{- 3 \theta_{l+2} \zeta_l}{\zeta_K - \zeta_l}  t^2 +  \left( \theta_2 + \sum_{l=1}^{k} \frac{3\theta_{l+2} \zeta_l^2}{\zeta_K - \zeta_l} \right) t + \left( \theta_1 + \sum_{l=1}^{k} \frac{-\theta_{l+2} \zeta_l^3}{\zeta_K - \zeta_l} \right)\end{equation*}


The figure shows an example with X(t)=\sin(10t)+t, \; -0.5\leq t \leq 0.5. The grey dots show the observed Y, while the black dots indicate the extrema estimated by the algorithm. The spline (brown curve) is estimated using K=5; the rainbow-colored curves are the representation of the spline by a polynomial of order 3 on each sector delimited by two neighboring knots.
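A fit of this kind can be reproduced with an off-the-shelf smoothing-spline routine as a quick sanity check. The sketch below assumes SciPy ≥ 1.10 (for make_smoothing_spline), uses an arbitrary noise level and smoothing parameter, and locates extrema by a crude grid search on the derivative rather than by the closed-form segment solution developed next.

```python
# Sketch: reproducing a fit like the one in the figure with SciPy's smoothing spline.
# Assumes SciPy >= 1.10; the noise level and smoothing parameter lam are illustrative.
import numpy as np
from scipy.interpolate import make_smoothing_spline

rng = np.random.default_rng(1)
t = np.sort(rng.uniform(-0.5, 0.5, size=200))                  # random design points t_k
y = np.sin(10 * t) + t + rng.normal(scale=0.2, size=t.size)    # Y(t_k) = X(t_k) + eps_k

spline = make_smoothing_spline(t, y, lam=1e-4)                 # cubic smoothing spline fit
d1 = spline.derivative(1)                                      # m'(t)

# crude check: extrema show up as sign changes of m'(t) on a fine grid
grid = np.linspace(-0.5, 0.5, 2001)
idx = np.where(np.diff(np.sign(d1(grid))) != 0)[0]
print("approximate extrema near t =", grid[idx])
```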


Differentiating (7) on each segment gives

(8)   \begin{equation*}  \hat{m} _k^{'}(t)= \underbrace{ \sum_{l=1}^{k} \frac{3 \theta_{l+2}}{\zeta_K - \zeta_l} }_{A_k} t^2 + \underbrace{ \sum_{l=1}^{k} \frac{- 6 \theta_{l+2} \zeta_l}{\zeta_K - \zeta_l} }_{B_k} t + \underbrace{ \theta_2 + \sum_{l=1}^{k} \frac{3\theta_{l+2} \zeta_l^2}{\zeta_K - \zeta_l} }_{C_k} \text{ and }  \hat{m} _k^{''}(t)= \sum_{l=1}^{k} \frac{ \theta_{l+2} ( - 6 \zeta_l + 6  t )}{ \zeta_K - \zeta_l }\end{equation*}

For B_k^2- 4 A_kC_k>0, the real candidate extrema a_k of \hat{m}_k are thus given by

(9)   \begin{equation*} a_k = - \frac{B_k}{2A_k} \pm \frac{\sqrt{B_k^2- 4 A_kC_k}}{2A_k}\end{equation*}


This will then be an extremum of \hat{m}(t) if a_k \in [\zeta_k, \zeta_{k+1}] and \hat{m}^{''}(a_k) \neq 0.
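Combining (8) and (9), the candidate extrema can be collected segment by segment. The sketch below is one possible implementation: it assumes the coefficient vector theta is ordered as (\theta_1,\dots,\theta_K), scans only the segments on which representation (7) holds, and all helper names are mine.

```python
# Sketch: segment-wise extrema of the fitted spline via (8) and (9).
import numpy as np

def spline_extrema(theta: np.ndarray, knots: np.ndarray, tol: float = 1e-10) -> np.ndarray:
    """Return candidate extrema a_k with a_k in [zeta_k, zeta_{k+1}] and m''(a_k) != 0."""
    theta = np.asarray(theta, dtype=float)
    zeta = np.asarray(knots, dtype=float)
    K = zeta.size
    extrema = []
    for k in range(1, K - 1):                             # 1-based segments k = 1, ..., K-2
        w = theta[2:k + 2] / (zeta[K - 1] - zeta[:k])     # theta_{l+2} / (zeta_K - zeta_l)
        A = 3.0 * np.sum(w)
        B = -6.0 * np.sum(w * zeta[:k])
        C = theta[1] + 3.0 * np.sum(w * zeta[:k] ** 2)
        disc = B * B - 4.0 * A * C
        if abs(A) < tol or disc <= 0:
            continue
        for root in ((-B + np.sqrt(disc)) / (2 * A), (-B - np.sqrt(disc)) / (2 * A)):
            if zeta[k - 1] <= root <= zeta[k]:            # a_k must lie in its own segment
                second = np.sum(6.0 * w * (root - zeta[:k]))   # m''(a_k) from (8)
                if abs(second) > tol:
                    extrema.append(root)
    return np.array(extrema)
```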

1.2 Estimation and Asymptotics

To estimate the extrema we replace the spline coefficients \theta by their estimated counterpart \hat{\theta}, as specified in (4), to derive \hat{A}_k, \hat{B}_k, \hat{C}_k and \hat{a}_k according to (8) and (9). First, verify that for fixed \lambda and K

(10)   \begin{equation*}E(\hat{\theta}|X)  - \theta= ((\textbf{N}^T \textbf{N} +T \lambda \Omega)^{-1} - (\textbf{N}^T \textbf{N})^{-1}) \textbf{N}^T \textbf{N} \theta\end{equation*}


and

(11)   \begin{equation*}Var(\hat{\theta}|X) = \sigma^2 (\textbf{N}^T \textbf{N} +T \lambda \Omega)^{-1} \textbf{N}^T \textbf{N} (\textbf{N}^T \textbf{N} +T \lambda \Omega)^{-1} = \sigma^2 \textbf{K} \textbf{K}^T\end{equation*}


therefore

(12)   \begin{equation*}E(\hat{A}_k|X) - A_k= (\iota^A_k)^T ( E(\hat{\theta}|X) - \theta), \; Var(\hat{A}_k|X) = (\iota_k^A)^T Var(\hat{\theta}|X)\, \iota_k^A\end{equation*}


(13)   \begin{equation*}E(\hat{B}_k|X) - B_k= (\iota^B_k)^T ( E(\hat{\theta}|X) - \theta), \; Var(\hat{B}_k|X) = (\iota_k^B)^T Var(\hat{\theta}|X)\, \iota_k^B\end{equation*}


(14)   \begin{equation*}E(\hat{C}_k|X) - C_k= (\iota^C_k)^T ( E(\hat{\theta}|X) - \theta),\; Var(\hat{C}_k|X) = (\iota_k^C)^T Var(\hat{\theta}|X)\, \iota_k^C\end{equation*}


where \iota^A_k=(0,0,\frac{3}{\zeta_K - \zeta_1},\dots,\frac{3}{\zeta_K - \zeta_k}, \underbrace{0,\dots,0}_{K-k-2} )^T, \iota^B_k=(0,0,\frac{-6 \zeta_1}{\zeta_K - \zeta_1},\dots,\frac{-6\zeta_k}{\zeta_K - \zeta_k}, \underbrace{0,\dots,0}_{K-k-2} )^T, \iota^C_k=(0,1,\frac{3 \zeta_1^2}{\zeta_K - \zeta_1},\dots,\frac{3 \zeta_k^2}{\zeta_K - \zeta_k}, \underbrace{0,\dots,0}_{K-k-2} )^T.
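These moments translate directly into code. The sketch below assumes \textbf{N}, \Omega, \theta, \sigma^2 and \lambda are available, uses 0-based arrays for the 1-based formulas above, and is only meant to illustrate (10)-(14).

```python
# Sketch: conditional bias and variance of A_k, B_k, C_k via (10)-(14).
import numpy as np

def theta_moments(N, Omega, theta, sigma2, lam):
    """Bias E(theta_hat|X) - theta and Var(theta_hat|X) for fixed lambda and K."""
    T = N.shape[0]
    G = N.T @ N                                            # assumes N has full column rank
    P_inv = np.linalg.inv(G + T * lam * Omega)
    bias = (P_inv - np.linalg.inv(G)) @ G @ theta          # (10)
    K_mat = P_inv @ N.T                                    # the matrix K of (4)
    var = sigma2 * K_mat @ K_mat.T                         # (11)
    return bias, var

def iota_vectors(knots, k):
    """iota^A_k, iota^B_k, iota^C_k for segment k (1-based, k <= K - 2)."""
    zeta = np.asarray(knots, dtype=float)
    K = zeta.size
    iota_A, iota_B, iota_C = np.zeros(K), np.zeros(K), np.zeros(K)
    iota_C[1] = 1.0                                        # the theta_2 entry of C_k
    denom = zeta[K - 1] - zeta[:k]
    iota_A[2:k + 2] = 3.0 / denom
    iota_B[2:k + 2] = -6.0 * zeta[:k] / denom
    iota_C[2:k + 2] = 3.0 * zeta[:k] ** 2 / denom
    return iota_A, iota_B, iota_C

# Example:
# bias, var = theta_moments(N, Omega, theta, sigma2=0.04, lam=1e-4)
# iA, iB, iC = iota_vectors(knots, k=2)
# bias_Ak, var_Ak = iA @ bias, iA @ var @ iA               # (12)
```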


Using a second-order Taylor expansion and some calculus, we can derive

(15)   \begin{equation*} \begin{split} E(\hat{a}_k|X)= a_k + ( a'_{A_k} \iota^A_k + a'_{B_k} \iota^B_k + a'_{C_k} \iota^C_k )^T ( E(\hat{\theta}|X) - \theta) \\ + \frac{1}{2} \operatorname{tr}\!\left( \nabla^2_{\theta}\, a_k \; Var(\hat{\theta}|X) \right) + o\!\left(\tfrac{1}{T}\right) \end{split}\end{equation*}

where \nabla^2_{\theta}\, a_k denotes the Hessian of a_k in \theta, built from the second derivatives a''_{A_k A_k}, a''_{A_k B_k},\dots and the outer products of \iota^A_k, \iota^B_k, \iota^C_k.


The first-order Taylor expansion for the variance gives

(16)   \begin{equation*}Var(\hat{a}_k|X) \approx E\big(\big[ ( a'_{A_k} \iota^A_k + a'_{B_k} \iota^B_k + a'_{C_k} \iota^C_k )^T ( \hat{\theta} - E(\hat{\theta}|X) ) \big]^2 \,\big|\, X\big) = ( a'_{A_k} \iota^A_k + a'_{B_k} \iota^B_k + a'_{C_k} \iota^C_k )^T\, Var(\hat{\theta}|X)\, ( a'_{A_k} \iota^A_k + a'_{B_k} \iota^B_k + a'_{C_k} \iota^C_k )\end{equation*}


where, writing D:=\sqrt{B^2-4AC} and taking the root a = \frac{-B+D}{2A} of (9) (the other root is handled analogously),

(17)   \begin{equation*}a'_{A}= -\frac{C}{A D} - \frac{a}{A}\end{equation*}


(18)   \begin{equation*}a'_{B}= \frac{1}{2A}\left(\frac{B}{D}-1\right)\end{equation*}

(19)   \begin{equation*}a'_{C}= -\frac{1}{D}\end{equation*}


(20)   \begin{equation*}a^{''}_{AA}= \frac{2C\,(D^2 - AC)}{A^2 D^3} + \frac{2a}{A^2}\end{equation*}
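A small sketch of the resulting delta-method variance (16), using the derivatives (17)-(19) for the '+' root of (9); the function names are mine and the gradient is assembled exactly as in (16).

```python
# Sketch: first-order (delta-method) variance of a_k-hat as in (16),
# using the derivatives (17)-(19) for the root a = (-B + sqrt(B^2 - 4AC)) / (2A).
import numpy as np

def a_gradient(A, B, C):
    """(da/dA, da/dB, da/dC) at the '+' root of (9)."""
    D = np.sqrt(B * B - 4.0 * A * C)
    a = (-B + D) / (2.0 * A)
    dA = -C / (A * D) - a / A          # (17)
    dB = (B / D - 1.0) / (2.0 * A)     # (18)
    dC = -1.0 / D                      # (19)
    return dA, dB, dC

def extremum_variance(A, B, C, iota_A, iota_B, iota_C, var_theta):
    """Var(a_k-hat | X) ~= g^T Var(theta_hat | X) g, with g the gradient of a_k in theta."""
    dA, dB, dC = a_gradient(A, B, C)
    g = dA * iota_A + dB * iota_B + dC * iota_C
    return g @ var_theta @ g
```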

2. White noise representation

The white-noise version of (1), as described in [3], is

(21)   \begin{equation*}y_t=m(t) + e_t\end{equation*}


where the noise term e_t can be interpreted as T^{-1/2} v(t)^{1/2}\, DW(t), with v(t) the error variance when the design point equals t, and DW(t) can be figuratively regarded as the ‘derivative’ of standard Brownian motion, in the sense that DW(t)\,dt = dW(t).

In the white-noise case, the continuous counterpart of the truncated-power basis is \phi(t|s)=(t-s)^3_{+} with knot density \rho(s), and the estimator takes the form

(22)   \begin{equation*}\hat{m}(t)=\sum_{k=1}^4 \theta_k t^{k-1} + \int_0^1 \theta(s) \rho(s) \phi(t|s)ds\end{equation*}



Then the continuous analogue to (8) is given by

(23)   \begin{equation*}A(t)= 3\theta_4 + 3\int_0^t \theta(s)\rho(s)\, ds\end{equation*}


(24)   \begin{equation*}B(t)= 2\theta_3 - 6\int_0^t s\,\theta(s)\rho(s)\, ds\end{equation*}


(25)   \begin{equation*}C(t) = \theta_2 + 3\int_0^t s^2\,\theta(s)\rho(s)\, ds\end{equation*}

so that \hat{m}'(t)=A(t)\,t^2+B(t)\,t+C(t).

Accordingly, a is an extremum of \hat{m}(t) if and only if

(26)   \begin{equation*}a = - \frac{B(a)}{2A(a)} \pm \frac{\sqrt{B(a)^2- 4 A(a)C(a)}}{2A(a)} \;\text{and} \; \hat{m}^{''}(a) \neq 0\end{equation*}


Therefore, an extremum is found exactly where a is a fixed point of this mapping.
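This fixed-point reading suggests a simple iteration. The sketch below is purely illustrative: \theta(s) and \rho(s) are assumed to be supplied as vectorized functions, the integrals in (23)-(25) are replaced by plain trapezoidal quadrature, and convergence of the iteration is not guaranteed in general.

```python
# Sketch: locating an extremum in the white-noise formulation as a fixed point of (26).
# theta_fun(s) and rho_fun(s) stand for the coefficient function theta(s) and the knot
# density rho(s) of (22); theta2, theta3, theta4 are the polynomial coefficients.
import numpy as np

def _trapz(y, x):
    """Simple trapezoidal rule, to keep the sketch self-contained."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def abc(t, theta2, theta3, theta4, theta_fun, rho_fun, n_grid=400):
    """A(t), B(t), C(t) of (23)-(25) by numerical integration on [0, t]."""
    s = np.linspace(0.0, max(t, 1e-12), n_grid)
    w = theta_fun(s) * rho_fun(s)
    A = 3.0 * theta4 + 3.0 * _trapz(w, s)
    B = 2.0 * theta3 - 6.0 * _trapz(s * w, s)
    C = theta2 + 3.0 * _trapz(s ** 2 * w, s)
    return A, B, C

def fixed_point_extremum(a0, theta2, theta3, theta4, theta_fun, rho_fun,
                         sign=+1, n_iter=50, tol=1e-8):
    """Iterate a <- (-B(a) + sign * sqrt(B(a)^2 - 4 A(a) C(a))) / (2 A(a)), as in (26)."""
    a = a0
    for _ in range(n_iter):
        A, B, C = abc(a, theta2, theta3, theta4, theta_fun, rho_fun)
        disc = B * B - 4.0 * A * C
        if disc < 0 or abs(A) < 1e-14:
            return None                  # no real candidate on this branch
        a_new = (-B + sign * np.sqrt(disc)) / (2.0 * A)
        if abs(a_new - a) < tol:
            return a_new
        a = a_new
    return a
```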

[1] P. Craven and G. Wahba, “Smoothing noisy data with spline functions,” Numerische Mathematik, vol. 31, no. 4, pp. 377–403, 1978. doi:10.1007/BF01404567
[2] C. H. Reinsch, “Smoothing by spline functions,” Numerische Mathematik, vol. 10, pp. 177–183, 1967.
[3] P. Hall and J. D. Opsomer, “Theory for penalised spline regression,” Biometrika, vol. 92, no. 1, pp. 105–118, 2005.
