Fast Kernel Density Estimation using the Fast Fourier Transform

1. Setup

This post is about how to speed up the computation of kernel density estimators using the FFT (Fast Fourier Transform). Let X_1, \dots, X_T be a random sample drawn from an unknown distribution with density f. Recall that the kernel density estimator with bandwidth h is given by

(1)   \begin{equation*}\hat{f}_h(u)= \frac{1}{T} \sum_{i=1}^T K_\textbf{h}(u-X_i)\end{equation*}
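Here K_\textbf{h}(u) = \frac{1}{h} K(u/h) denotes the kernel K rescaled by the bandwidth h. The implementation below uses the Epanechnikov kernel, which written out for reference reads

\begin{equation*}K(u) = \frac{3}{4}\left(1-u^2\right)\mathbb{1}_{\{|u| \leq 1\}}, \qquad K_\textbf{h}(u) = \frac{3}{4h}\left(1-\left(\tfrac{u}{h}\right)^2\right)\mathbb{1}_{\{|u| \leq h\}}.\end{equation*}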

1.1 Convolution Theorem

In the following we use the notation from a previous article, where the convolution theorem for the Fourier transform was introduced. The discrete version reads as follows: let f and g be two functions defined at evenly spaced points; their convolution is given by

(2)   \begin{equation*}(f * g)_n := \sum_{m=-\infty}^{\infty} f_m g_{n-m}.\end{equation*}


To speed up the kernel computation we will exploit the key property of the Fourier transform that it turns convolution into pointwise multiplication:

(3)   \begin{equation*}\widetilde{(f * g)}_n = \tilde{f}_n \cdot \tilde{g}_n \end{equation*}
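As a quick numerical illustration of (3) (a minimal numpy sketch added for concreteness, not part of the original derivation): zero-padding both sequences to the full output length makes the product of the DFTs reproduce the ordinary convolution from (2).

import numpy as np

# Two short sequences sampled at evenly spaced points
f = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([0.5, 1.0, 0.5])

# Direct (linear) convolution as in (2)
direct = np.convolve(f, g)

# Convolution theorem (3): transform, multiply pointwise, transform back.
# Zero-padding to the full output length avoids circular wrap-around.
n = len(f) + len(g) - 1
via_fft = np.fft.ifft(np.fft.fft(f, n) * np.fft.fft(g, n)).real

print(np.allclose(direct, via_fft))  # True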

1.2 Using FFT Convolution to estimate \hat{f}_h

Since the discrete convolution theorem requires functions observed at evenly spaced points, we create a fine grid u \in (u_1, \dots, u_{T'}) of length T' = 2^k, k \in \mathbb{N}, on which we want to evaluate the density.

(4)   \begin{equation*}\hat{f}_h(u)= \frac{1}{T} \sum_{i=1}^{T} K_\textbf{h}(u- X_i )  =   \frac{1}{T} \sum_{i=1}^{T}  K_\textbf{h}(u) * \delta(u-  X_i )   \end{equation*}

where \delta(\cdot) denotes the Dirac delta function. Evaluating the estimate directly at all points (u_1, \dots, u_{T'}) therefore requires \mathcal{O}(T'T) operations.

Following [1], denote the discrete Fourier transform of \hat{f}_h by \tilde{f}_h. Using the convolution and translation properties of the Fourier transform, the transform of (4) at a discrete frequency s is given by

(5)   \begin{equation*}\tilde{f}_h(s) = \frac{1}{T } \sum_{j=1}^{T}  \tilde{K}_\textbf{h}  (s) e^{i s X_j} =  \tilde{K}_\textbf{h}  (s) \tilde{u}(s) \end{equation*}


where \tilde{u}(s) = \frac{1}{T} \sum_{j=1}^{T} e^{i s X_j} is the Fourier transform of the data. The result corresponding to (4) is then obtained by applying the inverse FFT. In practice, \tilde{u}(s) is approximated by binning the observations into a histogram with bin boundaries u_1, \dots, u_{T'} and applying the fast Fourier transform to the bin counts. All these steps together involve only \mathcal{O}(T' \log T') operations, plus \mathcal{O}(T) for the binning.
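To get a feel for the difference, take the values used in the implementation below, T = 500 observations and T' = 2^7 = 128 grid points: the direct evaluation performs T'T = 64{,}000 kernel evaluations, whereas T' \log_2 T' = 128 \cdot 7 = 896, and the gap only widens as T grows because the binning step is merely linear in T.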

1.3 Implementation

The method is implemented with the fft routines that come with the numpy package. numpy's fft() returns the transform with the zero-frequency component first, followed by the positive and then the negative frequencies; fftshift() swaps the two halves of the vector so that the output is ordered symmetrically around zero. The same swap is applied to the back-transformed result in the listing below: because the kernel is evaluated on a grid centred at zero rather than starting at the first grid point, the circular convolution comes out shifted by half a period, and fftshift() moves it back into place. On this example the computation using the FFT was around 5 times faster than the direct evaluation.
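Before the full listing, here is a minimal, self-contained check of this behaviour (an illustration added for clarity; the array sizes are arbitrary). The first two prints show the frequency ordering before and after fftshift(), and the last two show how the same half-swap moves a circular convolution with a centred kernel back over the data point.

import numpy as np

# Frequency ordering: zero first, then positive, then negative frequencies;
# fftshift() reorders them symmetrically around zero.
print(np.fft.fftfreq(8))                   # [ 0.     0.125  0.25   0.375 -0.5   -0.375 -0.25  -0.125]
print(np.fft.fftshift(np.fft.fftfreq(8)))  # [-0.5   -0.375 -0.25  -0.125  0.     0.125  0.25   0.375 ]

# The same swap undoes the shift of a circular convolution when the kernel
# is centred in the middle of the grid, as in the listing below.
n = 8
kernel = np.zeros(n)
kernel[n//2 - 1:n//2 + 2] = [1.0, 2.0, 1.0]  # peak at index n//2
spike = np.zeros(n)
spike[2] = 1.0                               # one "observation" in bin 2

conv = np.fft.ifft(np.fft.fft(kernel) * np.fft.fft(spike)).real
print(np.argmax(conv))                   # 6: shifted by n//2 = 4
print(np.argmax(np.fft.fftshift(conv)))  # 2: back over the observation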

import numpy as np
import matplotlib.pyplot as plt
import timeit

# Simulation: draw N observations from a normal distribution with mean 0 and standard deviation 1.5
N = 500
X = np.random.normal(0, 1.5, N)

plt.scatter(np.linspace(0, N, N), X, c="blue")
plt.xlabel('sample')
plt.ylabel('X')
plt.show()

# Presetting: number of grid points (a power of two, matching T' = 2^k above)
Nout = 2**7

def epan_kernel(u, b):
    # Epanechnikov kernel rescaled by the bandwidth b:
    # K_b(u) = (3/(4b)) * (1 - (u/b)^2) for |u| <= b, and 0 otherwise
    u = u/b
    return max(0, 1./b*3./4*(1-u**2))

# Usual (direct) estimation: evaluate the kernel at every observation for a given grid point s.
# Note that epan_kernel already applies the 1/h scaling, so the bandwidth is passed as b.
def dens(s, X, h=0.5):
    return [epan_kernel(s - x, h) for x in X]

start = timeit.default_timer()

grid = np.linspace(-5, 5, num=Nout)               # evaluation grid u_1, ..., u_T'
density = np.array([(1./X.size)*sum(dens(y, X)) for y in grid])
plt.plot(grid, density)

stop = timeit.default_timer()
print('Time: ', stop - start)
 
# Estimation using FFT
start = timeit.default_timer()

to = np.linspace(-5, 5, Nout+1)            # histogram bin edges (Nout bins)
t = grid                                   # evaluation grid
h = 0.5
kernel = [epan_kernel(x, h) for x in t]    # kernel evaluated on the grid, centred at zero
kernel_ft = np.fft.fft(kernel)             # Fourier transform of the kernel
hist, bins = np.histogram(X, bins=to)      # bin counts stand in for the Dirac spikes in (4)
density_tmp = kernel_ft*np.fft.fft(hist)   # pointwise multiplication in the frequency domain, cf. (5)
density_fft = (1./X.size)*np.fft.fftshift(np.fft.ifft(density_tmp).real)  # back-transform, undo the half-period shift
plt.plot(t, density_fft)

stop = timeit.default_timer()
print('Time: ', stop - start)
print(np.sqrt(sum((density - density_fft)**2)))  # L2 difference between the two estimates

References

[1] B. W. Silverman, “Algorithm AS 176: Kernel Density Estimation Using the Fast Fourier Transform,” Journal of the Royal Statistical Society, Series C (Applied Statistics), vol. 31, no. 1, pp. 93–99, 1982.

7 thoughts on “Fast Kernel Density Estimation using the Fast Fourier Transform”

  1. Pingback: Kernel based Estimators for Multivariate Densities and Functions – The Big Data Blog

  2. Pingback: Kernel Regression using the Fast Fourier Transform – The Big Data Blog

  3. Petkov

    Could you give an example for a multidimensional kernel in Python (for example 2D)?

  4. Nichole

    Thanks for sharing this!
    and note that there’s one typo:
    epan_kernel(s-x,1) should be epan_kernel((s-x)/h,1)

    -Nichole

    • Heiko Wagner (Post author)

      Hi Nichole,
      thanks for the remark. I’ll fix that.

  5. Anders Munk-Nielsen

    Thanks for a super helpful post. Just to be fair to the/a standard implementation of the density (requiring T^2 evaluations), I suggest using the following vectorized evaluation of the kernel:

    grid = np.linspace(-5, 5, num=Nout)
    density = epan_kernel(np.subtract.outer(grid, X), b=0.5).mean(axis=1)
    density /= density.sum()

    (it’s using an np.subtract.outer to compute the T^2 differences between all points in the grid and the data vector)

    This way, the vectorized version is faster than the FFT version with 500 data points on my machine. But asymptotically, FFT will (of course) eventually triumph.

    Best, Anders
