The Python module astroML contains an improved version of pylab's hist, where the binning of the histogram can be chosen automatically based on different statistical models. Two noteworthy choices are the Freedman-Diaconis rule and the Bayesian blocks method, both demonstrated in the examples below.
The following Python code calculates the optimal number of bins from the Freedman-Diaconis rule without using the astroML module:
def bins(t):
    """Optimal number of bins according to the Freedman-Diaconis rule."""
    t = sorted(t)
    n = len(t)
    iqr = t[3 * n // 4] - t[n // 4]     # interquartile range (IQR)
    width = 2 * iqr * n ** (-1. / 3)    # bin width: 2 * IQR * n^(-1/3)
    return int((t[-1] - t[0]) / width)

An interesting alternative to histograms for estimating a distribution is Kernel Density Estimation (KDE), where a kernel (a non-negative, symmetric function that integrates to one, e.g. a normal distribution) is placed at each datapoint, and the sum of the kernels gives the kernel density estimate.
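To make the idea concrete, here is a minimal hand-rolled sketch of a Gaussian KDE (the fixed bandwidth h=0.3 is an arbitrary illustrative choice; the scipy gaussian_kde used below selects its bandwidth automatically):

import numpy as np

def kde(data, grid, h=0.3):
    """Average one unit-area Gaussian kernel per datapoint over grid."""
    # h = 0.3 is a hand-picked bandwidth, purely for illustration
    u = (grid[:, None] - np.asarray(data)[None, :]) / h
    kernels = np.exp(-0.5 * u ** 2) / (h * np.sqrt(2 * np.pi))
    # averaging n unit-area kernels keeps the estimate's total area at one
    return kernels.mean(axis=1)

data = np.random.normal(0, 1, size=100)
grid = np.linspace(-4, 4, 200)
density = kde(data, grid)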
As an example, the following code generates a data set x and plots it using all three of the methods just described:
import numpy as np
from astroML.plotting import hist
from scipy.stats import gaussian_kde
import pylab

# generate data from two gaussians
x1 = np.random.normal(0, 0.5, size=1000)
x2 = np.random.normal(1.5, 0.3, size=1000)
x = np.concatenate((x1, x2))

# plot histogram binned by the Freedman-Diaconis rule (filled opaque blue bins)
hist(x, density=True, alpha=0.2, bins='freedman')

# plot histogram binned by Bayesian blocks (green step line)
hist(x, density=True, bins='blocks', histtype='step')

# plot KDE using gaussian kernels (red line)
my_pdf = gaussian_kde(x)
data_range = np.linspace(min(x), max(x), 100)
pylab.plot(data_range, my_pdf(data_range))
pylab.savefig('test')
This produces the following plot, where the filled blue bins come from the Freedman-Diaconis rule, the green step line from the Bayesian blocks method, and the red curve from the KDE.
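If I remember correctly, astroML also exposes the Freedman-Diaconis bin width directly as freedman_bin_width in astroML.density_estimation, so the hand-rolled bins() above can be sanity-checked against it (treat the exact location and return value of that helper as an assumption):

import numpy as np
from astroML.density_estimation import freedman_bin_width

x = np.random.normal(0, 1, size=1000)

# bin count from the hand-rolled bins() function above
print(bins(x))

# astroML's Freedman-Diaconis bin width, converted to a bin count
width = freedman_bin_width(x)  # assumed to return the width as a float
print(int((x.max() - x.min()) / width))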