Kernel based Estimators for Multivariate Densities and Functions

1. Kernel Functions

In general, a kernel K: \mathbb{R}^g \rightarrow \mathbb{R}  is an integrable function  satisfying

  1. \int_{\mathbb{R}^g} K(u) du =1
  2. K(u) is symmetric (e.g. K(u)=K(-u) if g=1)
  3. K(u) \geq 0.

Popular univariate (g=1) kernel functions:

  • Uniform: K(u)= \frac{1}{2} \textbf{1}_{ \{|u|\leq 1 \} }
  • Epanechnikov: K(u)= \frac{3}{4} (1-u^2)\textbf{1}_{ \{|u|\leq 1 \} }
  • Gaussian:  K(u)={\frac {1}{\sqrt {2\pi }}}e^{-{\frac {1}{2}}u^{2}}}

An easy way to construct a multivariate (g>1) kernel from an univariate kernel is to construct a product kernel. Let u=(u_1, \dots, u_g) and K_u be an univariate kernel then

    \[K(u) = \prod_{i=1}^g K_u(u_i).\]

An important feature is to scale kernel functions by a parameter matrix \textbf{H}=\{h_{ij}\}_{i,j=1,\dots,g} with K_\textbf{H}(u)=|\textbf{H}|^{-1} K(\textbf{H}^{-1} u ).

2. Kernel Density Estimation

A famous application for kernels is to estimate the underlining density functions of a given independent and identically distributed g-variate random vectors (X_1,\dots, X_n) drawn from some distribution with an unknown density f. A kernel based density estimator is then given by

    \[\hat{f}_h(u)= \frac{1}{n} \sum_{i=1}^n K_\textbf{H}(u-X_i)\]

A naive Matlab implementation is straightforward:

% ------------------------------------------------------------------------------ 
% Description: g-Dimensional Density Estimator
% ------------------------------------------------------------------------------ 
% Usage:       - 
% ------------------------------------------------------------------------------ 
% Inputs Simulation:      
%X    - observations (Dimension: Txg)
%x    - g dimensional output coordinates (Dimension: T_outxg)
%H    - Bandwidth matrix (Dimension: gxg)
%type - Kernel choice, 'Gauss' for gaussian kernel, Epanechnikov else.
% ------------------------------------------------------------------------------ 
%fit  - Estimated Density
% ------------------------------------------------------------------------------ 
% Keywords:    kernel, density estimation
% ------------------------------------------------------------------------------ 
% See also:    -  
% ------------------------------------------------------------------------------ 
% Author:      Heiko Wagner, 2017/01/18
% ------------------------------------------------------------------------------ 
function [fit] = density(X,x,H,k_type)
T = size(X,1);
T_out = size(x,1);
%% Epanechnikov Kernel function
function [k]=epan(t);
k= 3/4*(1-t.^2);
%%Construct Productkernel
function [A]=kernel(Xn)
    if(strcmp(k_type,'Gauss')==1)   %Use Gaussian Kernel
        for (m=1:g)
           A= A.*normpdf(Xn(:,m))  ;
    else                            %Use Epanechnikov Kernel
        for (m=1:g)
           A= A.*epan(Xn(:,m))  ;
Xn=zeros( size(X) ); 
function [f_j]=getpoint(x_j)
    %%%Construct g dimensional grid around point j, at this pint you can improve the code if you use the Epanechnikov Kernel by only selecting certain observations
    Xn = X - repmat( x_j,T,1 );  
    f_j= 1/T*det(H)^(-1)* sum( kernel(Xn*H^(-1)) )
%%%To estimate not just a single point we estimate the function for an entire set of g dimensional output coordinates (x) 
[out]=arrayfun(@(s) getpoint( x(s,:) ), 1:T_out , 'UniformOutput', false);

The run time of this an algorithm in \mathcal{O}-notation, that evaluates the density at m points, is then given by \mathcal{O}(mn). Can we do better? Yes, we can. A smarter way is to make use of the fast-fourier transformation (FFT) introduced by [1].

[1] B. W. Silverman, “Algorithm as 176: kernel density estimation using the fast fourier transform,” Journal of the royal statistical society. series c (applied statistics), vol. 31, iss. 1, p. 93–99, 1982.
ISSN = {00359254, 14679876},
URL = {},
author = {B. W. Silverman},
journal = {Journal of the Royal Statistical Society. Series C (Applied Statistics)},
number = {1},
pages = {93--99},
publisher = {[Wiley, Royal Statistical Society]},
title = {Algorithm AS 176: Kernel Density Estimation Using the Fast Fourier Transform},
volume = {31},
year = {1982}

3 thoughts on “Kernel based Estimators for Multivariate Densities and Functions

  1. Pingback: Nonparametric Density estimation using Spark – The Big Data Blog

  2. Pingback: Kernel Regression using Pyspark – The Big Data Blog

  3. Pingback: Fast Kernel Density Estimation using the Fast Fourier Transform – The Big Data Blog

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.