Getting Bandwidth Used By Scipy's Gaussian_kde Function

January 05, 2024

I'm using SciPy's stats.gaussian_kde function to generate a kernel density estimate (kde) function from a data set of x,y points. This is a simple MWE of my code: import numpy as n

Solution 1:

Short answer

The bandwidth is kernel.covariance_factor() multiplied by the standard deviation of the sample that you are using.

(This holds for a 1D sample; in the default case the factor is computed using Scott's rule of thumb.)

Example:

import numpy as np
from scipy.stats import gaussian_kde

sample = np.random.normal(0., 2., 100)
kde = gaussian_kde(sample)
f = kde.covariance_factor()
# gaussian_kde uses the unbiased sample covariance, hence ddof=1
bw = f * sample.std(ddof=1)
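To make the Scott's rule remark concrete, here is a minimal sketch that recomputes the factor by hand as n ** (-1 / (d + 4)), with n the number of points and d the number of dimensions, and compares it with what gaussian_kde reports. The seed and the variable names scott and bw are my own choices for illustration, and the sketch assumes an unweighted sample.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)          # fixed seed, only for reproducibility
sample = rng.normal(0.0, 2.0, 100)
kde = gaussian_kde(sample)

# Scott's rule of thumb for an unweighted sample: n ** (-1 / (d + 4))
n, d = sample.size, 1
scott = n ** (-1.0 / (d + 4))           # 100 ** (-1/5), about 0.398

print(np.isclose(kde.covariance_factor(), scott))

# Bandwidth in data units: factor times the sample std (ddof=1)
bw = kde.covariance_factor() * sample.std(ddof=1)
print(np.isclose(bw, np.sqrt(kde.covariance[0, 0])))
```

Both checks should print True: the second one confirms that, in 1D, the bandwidth is the square root of the kernel covariance stored on the object.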

The pdf that you get is this:

from pylab import plot
x_grid = np.linspace(-6, 6, 200)
plot(x_grid, kde.evaluate(x_grid))

You can check this by building the same KDE with another library, say, sklearn:

from sklearn.neighbors import KernelDensity
def kde_sklearn(x, x_grid, bandwidth):
    kde_skl = KernelDensity(bandwidth=bandwidth)
    kde_skl.fit(x[:, np.newaxis])
    # score_samples() returns the log-likelihood of the samples
    log_pdf = kde_skl.score_samples(x_grid[:, np.newaxis])
    pdf = np.exp(log_pdf)
    return pdf

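Now using the same code from above you get:

# bandwidth = f (the bare factor): this curve does not match the SciPy one
plot(x_grid, kde_sklearn(sample, x_grid, f))

# bandwidth = bw (factor times the sample std): this curve matches the SciPy one
plot(x_grid, kde_sklearn(sample, x_grid, bw))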
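If you would rather compare numbers than eyeball two plots, the same check can be sketched without sklearn. The helper manual_kde below is hypothetical (it is not part of SciPy or sklearn); it simply averages Gaussian kernels whose standard deviation is the given bandwidth. I am assuming an unweighted 1D sample and using ddof=1 for the standard deviation.

```python
import numpy as np
from scipy.stats import gaussian_kde

def manual_kde(x, x_grid, bandwidth):
    """Average of Gaussian kernels with std `bandwidth`, one per sample point."""
    z = (x_grid[:, np.newaxis] - x[np.newaxis, :]) / bandwidth
    norm = x.size * bandwidth * np.sqrt(2.0 * np.pi)
    return np.exp(-0.5 * z**2).sum(axis=1) / norm

rng = np.random.default_rng(1)          # fixed seed, only for reproducibility
sample = rng.normal(0.0, 2.0, 100)
x_grid = np.linspace(-6, 6, 200)

kde = gaussian_kde(sample)
f = kde.covariance_factor()
bw = f * sample.std(ddof=1)

reference = kde.evaluate(x_grid)
matches_bw = np.allclose(manual_kde(sample, x_grid, bw), reference)
matches_f = np.allclose(manual_kde(sample, x_grid, f), reference)
print(matches_bw, matches_f)
```

Only the bw version should agree with kde.evaluate; passing the bare factor f gives a much narrower kernel and a visibly different curve.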

Solution 2:

I've got it, the line is:

kernel.covariance_factor()

From scipy.stats.gaussian_kde.covariance_factor:

Computes the coefficient (kde.factor) that multiplies the data covariance matrix to obtain the kernel covariance matrix. The default is scotts_factor. A subclass can overwrite this method to provide a different method, or set it through a call to kde.set_bandwidth.

One can check that the kernel built with this bandwidth value is equivalent to the kernel generated with the default bandwidth. To do this, obtain a new kernel with the bandwidth given by covariance_factor(), and compare its value at an arbitrary point with that of the original kernel:

# x_data and y_data are the arrays from the question
kernel = stats.gaussian_kde(np.vstack([x_data, y_data]))
print(kernel([0.5, 1.3]))

bw = kernel.covariance_factor()
kernel2 = stats.gaussian_kde(np.vstack([x_data, y_data]), bw_method=bw)
print(kernel2([0.5, 1.3]))
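For anyone who wants to run the comparison above without the data from the question, here is a self-contained sketch. The x_data and y_data arrays are synthetic stand-ins that I made up, not the original data set.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)          # fixed seed, only for reproducibility
x_data = rng.normal(0.0, 1.0, 200)      # synthetic stand-in for the question's data
y_data = rng.normal(1.0, 0.5, 200)
values = np.vstack([x_data, y_data])    # shape (2, n): one row per dimension

kernel = stats.gaussian_kde(values)
bw = kernel.covariance_factor()

# A scalar bw_method is used directly as the covariance factor
kernel2 = stats.gaussian_kde(values, bw_method=bw)

point = np.array([[0.5], [1.3]])        # one 2D point, shape (d, 1)
same = np.allclose(kernel(point), kernel2(point))
print(same)
```

Note that the scalar passed as bw_method is the dimensionless factor, not a bandwidth in data units, which is why the two kernels agree.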

Solution 3:

I came across this old question because I also wanted to know which bandwidth SciPy's gaussian_kde uses. I would like to add to (and amend) the previous answers: the covariance factor is used in SciPy's kde.py code as:

self.covariance = self._data_covariance * self.factor**2

Therefore the full kernel covariance is the sample covariance times the square of the so-called covariance factor (Scott's factor by default), which can be retrieved with kde.factor or kde.covariance_factor().
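This relation can be checked from the public attributes alone. The sketch below uses made-up 2D data and assumes an unweighted sample, so that np.cov (which defaults to ddof=1) reproduces the data covariance that gaussian_kde computes internally. The name per_axis_bw is mine.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(3)          # fixed seed, only for reproducibility
data = rng.normal(size=(2, 150))        # made-up data, shape (d, n)
kde = gaussian_kde(data)

# Kernel covariance = sample covariance * factor**2
expected = np.cov(data) * kde.factor**2
ok = np.allclose(kde.covariance, expected)
print(ok)

# Per-axis kernel std, i.e. factor * std of each coordinate
per_axis_bw = np.sqrt(np.diag(kde.covariance))
print(per_axis_bw)
```

In more than one dimension the kernel is described by the full covariance matrix, so the per-axis values ignore any correlation between coordinates.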
