Skip to content Skip to sidebar Skip to footer

Python Distribution Fitting With Sum Of Square Error (sse)

I am trying to find an optimal distribution curve fit to my data consisting of y-axis = [0, 0, 0, 0, 0.24, 0.53, 0.49, 0.64, 0.54, 0.78, 0.59, 0.44, 0.34, 0.88, 0.2, 0.

Solution 1:

Your situation is not the same as that in the one treated in the question you cited. You have both the ordinates and the abscissae of the data points, rather than the usual i.i.d. sample. I would suggest that you use scipy curve_fit. Here's a sample.

x_axis = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0, 21.0, 22.0, 23.0, 24.0, 25.0, 26.0, 27.0, 28.0, 29.0, 30.0, 31.0, 32.0, 33.0, 34.0]
y_axis = [0, 0, 0, 0, 0.24, 0.53, 0.49, 0.64, 0.54, 0.78, 0.59, 0.44, 0.34, 0.88, 0.2, 0.49, 0.39, 0.39, 0.29, 0.2, 0.05, 0.05, 0.25, 0.05, 0.1, 0.15, 0.1, 0.1, 0.1, 0, 0, 0, 0, 0]

## y_axis values must be normalised
sum_ys = sum(y_axis)
y_axis = [_/sum_ys for _ in y_axis]
print (sum(y_axis))

from scipy.stats import gamma, norm
from scipy.optimize import curve_fit

defgamma_f(x, a, loc, scale):
    return gamma.pdf(x, a, loc, scale)

defnorm_f(x, loc, scale):
    return norm.pdf(x, loc, scale)

fitting = norm_f

result = curve_fit(fitting, x_axis, y_axis)
print (result)

import matplotlib.pyplot as plt

plt.plot(x_axis, y_axis, 'ro')
plt.plot(x_axis, [fitting(_, *result[0]) for _ in x_axis], 'b-')
plt.axis([0,35,0,.5])
plt.show()

This version shows how to do one plot, for the normal fit to the data. (The gamma provides a poor fit.) Only two parameters are needed for the normal. In general you would need only the first part of the output results, the estimates of the parameters, shape, location and scale.

(array([  2.3352639 ,  -3.08105104,  10.15024823]), array([[   5954.86532869,  -27818.92220973,  -19675.22421994],
       [ -27818.92220973,  133161.76500251,   90741.43608615],
       [ -19675.22421994,   90741.43608615,   66054.79087992]]))

Notice that the pdf of the gamma distribution is also available in scipy, as are the others that you need, I think, saving you the work of coding them.

The most important thing I omitted from the first code was the need to normalise the y-values, that is, to make them sum to one, since they should approximate histogram heights.

Solution 2:

I tried your example using OpenTURNS platform Here what I got.

I started with the same data as you after importing openturns and openturs.viewer.View for plotting

import openturns as ot
    from openturns.viewerimportView

    x_axis = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 
          12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0, 21.0, 
          22.0, 23.0, 24.0, 25.0, 26.0, 27.0, 28.0, 29.0, 30.0, 31.0, 
          32.0, 33.0, 34.0]

    y_axis = [0, 0, 0, 0, 0.24, 0.53, 0.49, 0.64, 0.54, 0.78, 0.59, 0.44, 
          0.34, 0.88, 0.2, 0.49, 0.39, 0.39, 0.29, 0.2, 0.05, 0.05, 
          0.25, 0.05, 0.1, 0.15, 0.1, 0.1, 0.1, 0, 0, 0, 0, 0]

First step: we can define the corresponding distribution

    distribution = ot.UserDefined(ot.Sample([[s] forsin x_axis]), y_axis)
    graph = distribution.drawPDF()
    graph.setColors(["black"])
    graph.setLegends(["your input"])

at this stage, if you View(graph) you would get:

enter image description here

Second step: we can derive a sample from the obtained distibution

sample = distribution.getSample(10000)

this sample will be used to fit any kind of distributions. I tried with WeibullMin and Gamma distributions

# WeibullMin Factory
    distribution2 = ot.WeibullMinFactory().build(sample)
    print(distribution2)
    graph2 = distribution2.drawPDF() ; graph2.setLegends(["Best WeibullMin"])
    >>> WeibullMin(beta = 8.83969, alpha = 1.48142, gamma = 4.76832)

    # Gamma Factory
    distribution3 = ot.GammaFactory().build(sample)
    print(distribution3)
    >>> Gamma(k = 2.08142, lambda = 0.25157, gamma = 4.9995)
    graph3 = distribution3.drawPDF() ; graph3.setLegends(["Best Gamma"]) ; 
    graph3.setColors(["blue"])

    # plotting all the results
    graph.add(graph2) ; graph.add(graph3)
    View(graph)

enter image description here

Solution 3:

I think its the best and simple way to calculate the sum of square error:

#write the function

defSSE(y_true, y_pred):

     sse= np.sum((y_true-y_pred)**2)

     print(sse)

#now call the function and get results

SSE(y_true, y_pred)

Post a Comment for "Python Distribution Fitting With Sum Of Square Error (sse)"