
Plotting A Histogram With Overlaid PDF

This is a follow-up to my previous couple of questions. Here's the code I'm playing with: import pandas as pd import matplotlib.pyplot as plt import scipy.stats as stats import num

Solution 1:

You should plot the histogram with density=True if you hope to compare it to a true PDF. Otherwise your normalization (amplitude) will be off.
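To see why the amplitude is off without normalization, here is a small sketch on synthetic data (the `sample` array is an assumption, not the data from the question). It uses `np.histogram`, which applies the same `density` normalization that matplotlib's `hist` relies on:

```python
import numpy as np

rng = np.random.default_rng(0)
sample = rng.normal(size=1000)  # synthetic stand-in for the real data

# Raw counts versus density-normalized bar heights on the same 20 bins
counts, edges = np.histogram(sample, bins=20)
density, _ = np.histogram(sample, bins=20, density=True)
widths = np.diff(edges)

total_count = counts.sum()             # equals the sample size
total_area = np.sum(density * widths)  # equals 1, like the area under a PDF

# density is just the counts rescaled by 1 / (n * bin width)
rescaled = counts / (counts.sum() * widths)
```

Raw counts sum to the number of samples, so their scale depends on the sample size and bin width, while a PDF always integrates to 1.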

Also, you need to specify the x-values (as a sorted array) when you plot the PDF:

fig, ax = plt.subplots()
df2[df2[column] > -999].hist(column, alpha=0.5, density=True, ax=ax)

# Fit on the same filtered values that the histogram shows
valid = df2.loc[df2[column] > -999, column].dropna()
param = stats.norm.fit(valid)
x = np.linspace(valid.min(), valid.max(), 100)  # ordered x-values

plt.plot(x, stats.norm.pdf(x, *param), color='r')
plt.show()
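Since `df2` isn't shown in full, here is a self-contained sketch of the same overlay on made-up data. The frame `df`, the column name `"B"`, and the treatment of -999 as a missing-value sentinel are all assumptions for illustration:

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to view interactively
import matplotlib.pyplot as plt
import scipy.stats as stats

rng = np.random.default_rng(0)
values = rng.normal(loc=5.0, scale=2.0, size=1000)
values[:20] = -999  # assumed sentinel marking missing measurements
df = pd.DataFrame({"B": values})
column = "B"

# Filter once, then use the same values for the histogram and the fit
valid = df.loc[df[column] > -999, column].dropna()

fig, ax = plt.subplots()
ax.hist(valid, bins=30, alpha=0.5, density=True)

mu, sigma = stats.norm.fit(valid)
x = np.linspace(valid.min(), valid.max(), 100)  # sorted x-values
ax.plot(x, stats.norm.pdf(x, mu, sigma), color="r")
```

Filtering once keeps the fitted curve and the bars on the same data. If the sentinel rows were left in the fit, the estimated mean and standard deviation would be pulled far away from the values the histogram displays.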



As an aside, a histogram isn't always the best way to compare a continuous variable with a distribution. (Your sample data are discrete, but the link uses a continuous variable.) The choice of bins can distort the shape of the histogram, which may lead to incorrect inference. The ECDF is often a better illustration of the distribution for a continuous variable because it requires no binning choice:

def ECDF(data):
    n = sum(data.notnull())
    x = np.sort(data.dropna())
    y = np.arange(1, n+1) / n
    return x,y

fig, ax = plt.subplots()
valid = df2.loc[df2[column] > -999, column]
plt.plot(*ECDF(valid), marker='o')

param = stats.norm.fit(valid.dropna())
x = np.linspace(valid.min(), valid.max(), 100)  # ordered x-values

plt.plot(x, stats.norm.cdf(x, *param), color='r')
plt.show()
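The ECDF comparison can also be tried without the original data. The following is a minimal sketch on synthetic values; the helper name `ecdf` and the `sample` array are assumptions, and the helper takes a NumPy array rather than a pandas Series:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to view interactively
import matplotlib.pyplot as plt
import scipy.stats as stats

def ecdf(values):
    """Return the sorted non-NaN values and their cumulative fractions."""
    values = np.asarray(values, dtype=float)
    x = np.sort(values[~np.isnan(values)])
    y = np.arange(1, x.size + 1) / x.size
    return x, y

rng = np.random.default_rng(1)
sample = rng.normal(loc=0.0, scale=1.0, size=500)
sample[:5] = np.nan  # a few missing values to exercise the NaN handling

x_ecdf, y_ecdf = ecdf(sample)
mu, sigma = stats.norm.fit(x_ecdf)
grid = np.linspace(x_ecdf.min(), x_ecdf.max(), 100)

fig, ax = plt.subplots()
ax.plot(x_ecdf, y_ecdf, marker="o", linestyle="none", markersize=3)
ax.plot(grid, stats.norm.cdf(grid, mu, sigma), color="r")
```

Each point rises by 1/n at an observed value, so the plot involves no bin width or bin edges to choose.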
