1. Kernel Functions
In general, a kernel
is an integrable function satisfying

is symmetric (e.g.
if
)
.
Popular univariate (
) kernel functions:
- Uniform:

- Epanechnikov:

- Gaussian:

An easy way to construct a multivariate (g>1) kernel from an univariate kernel is to construct a product kernel. Let
and
be an univariate kernel then
![Rendered by QuickLaTeX.com \[K(u) = \prod_{i=1}^g K_u(u_i).\]](https://www.thebigdatablog.com/wp-content/ql-cache/quicklatex.com-d92242cd97e77329bf918b83560183ea_l3.png)
An important feature is to scale kernel functions by a parameter matrix
with
.
2. Kernel Density Estimation
A famous application for kernels is to estimate the underlining density functions of a given independent and identically distributed
-variate random vectors
drawn from some distribution with an unknown density
. A kernel based density estimator is then given by
![Rendered by QuickLaTeX.com \[\hat{f}_h(u)= \frac{1}{n} \sum_{i=1}^n K_\textbf{H}(u-X_i)\]](https://www.thebigdatablog.com/wp-content/ql-cache/quicklatex.com-5d742e72f1b171227bf35c737b7a21a1_l3.png)
A naive Matlab implementation is straightforward:
% ------------------------------------------------------------------------------
% Description: g-Dimensional Density Estimator
%
% ------------------------------------------------------------------------------
% Usage: -
% ------------------------------------------------------------------------------
% Inputs Simulation:
%X - observations (Dimension: Txg)
%x - g dimensional output coordinates (Dimension: T_outxg)
%H - Bandwidth matrix (Dimension: gxg)
%type - Kernel choice, 'Gauss' for gaussian kernel, Epanechnikov else.
% ------------------------------------------------------------------------------
%Output:
%fit - Estimated Density
% ------------------------------------------------------------------------------
% Keywords: kernel, density estimation
% ------------------------------------------------------------------------------
% See also: -
% ------------------------------------------------------------------------------
% Author: Heiko Wagner, 2017/01/18
% ------------------------------------------------------------------------------
function [fit] = density(X,x,H,k_type)
T = size(X,1);
T_out = size(x,1);
%% Epanechnikov Kernel function
function [k]=epan(t);
k= 3/4*(1-t.^2);
k((abs(t)-1)>0)=0;
end
%%Construct Productkernel
function [A]=kernel(Xn)
g=size(Xn,2);
A=1;
if(strcmp(k_type,'Gauss')==1) %Use Gaussian Kernel
for (m=1:g)
A= A.*normpdf(Xn(:,m)) ;
end
else %Use Epanechnikov Kernel
for (m=1:g)
A= A.*epan(Xn(:,m)) ;
end
end
end
Xn=zeros( size(X) );
function [f_j]=getpoint(x_j)
%%%Construct g dimensional grid around point j, at this pint you can improve the code if you use the Epanechnikov Kernel by only selecting certain observations
Xn = X - repmat( x_j,T,1 );
f_j= 1/T*det(H)^(-1)* sum( kernel(Xn*H^(-1)) )
end
%%%To estimate not just a single point we estimate the function for an entire set of g dimensional output coordinates (x)
[out]=arrayfun(@(s) getpoint( x(s,:) ), 1:T_out , 'UniformOutput', false);
fit=[out{:}]';
end
The run time of this an algorithm in
-notation, that evaluates the density at
points, is then given by
. Can we do better? Yes, we can. A smarter way is to make use of the fast-fourier transformation (FFT) introduced by [1].
[Bibtex]
@article{Silverman1982,
ISSN = {00359254, 14679876},
URL = {http://www.jstor.org/stable/2347084},
author = {B. W. Silverman},
journal = {Journal of the Royal Statistical Society. Series C (Applied Statistics)},
number = {1},
pages = {93--99},
publisher = {[Wiley, Royal Statistical Society]},
title = {Algorithm AS 176: Kernel Density Estimation Using the Fast Fourier Transform},
volume = {31},
year = {1982}
}
Pingback: Nonparametric Density estimation using Spark – The Big Data Blog
Pingback: Kernel Regression using Pyspark – The Big Data Blog
Pingback: Fast Kernel Density Estimation using the Fast Fourier Transform – The Big Data Blog