
Best Fit Line On Log Log Scales In Python 2.7

This is a network IP frequency rank plot in log scales. After completing this portion, I am trying to plot the best fit line on log-log scales using Python 2.7. I have to use matpl

Solution 1:

Data that falls along a straight line on a log-log scale follows a power-law relationship of the form y = a*x^m. Taking the logarithm of both sides gives the linear equation that you are fitting:

log(y) = m*log(x) + c

where the intercept c is log(a).

Calling np.polyfit(log(x), log(y), 1) provides the values of m and c. You can then use these values to calculate the fitted values of log_y_fit as:

log_y_fit = m*log(x) + c

and the fitted values that you want to plot against your original data are:

y_fit = exp(log_y_fit) = exp(m*log(x) + c)
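
As a quick check of the relationship above, here is a minimal sketch on synthetic data. The values a_true and m_true are made up for illustration and are not from the question. It fits the line in log space and then maps the fit back to the original scale, where exp(c) recovers the prefactor of the power law:

```python
import numpy as np

# Made-up power-law data for illustration: y = a_true * x**m_true
a_true, m_true = 50.0, -1.2
x = np.arange(1, 101, dtype=float)
y = a_true * x ** m_true

# Fit a straight line in log space: log(y) = m*log(x) + c
m, c = np.polyfit(np.log(x), np.log(y), 1)

# Map the fitted line back to the original scale
y_fit = np.exp(m * np.log(x) + c)

# The intercept is in log units, so exp(c) is the power-law prefactor
a_fit = np.exp(c)
```

Because the data here is noise-free, m and a_fit should match m_true and a_true up to floating-point error.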

So, the two problems you are having are that:

  1. you are calculating the fitted values using the original x coordinates, not the log(x) coordinates

  2. you are plotting the logarithm of the fitted y values without transforming them back to the original scale
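
To make the two problems concrete, the sketch below compares both ways of evaluating the fitted polynomial. The data is invented for illustration, and the names wrong and right are mine, not from the question:

```python
import numpy as np

z = np.arange(1, 51, dtype=float)   # ranks 1..50
y = 1000.0 * z ** -0.8              # made-up power-law frequencies

# Straight-line fit in log space, wrapped as a callable polynomial
line = np.poly1d(np.polyfit(np.log(z), np.log(y), 1))

# Both problems at once: evaluated at z, and the result is still in log units
wrong = line(z)

# Evaluated at log(z), then transformed back to the original scale with exp
right = np.exp(line(np.log(z)))

max_err_wrong = np.max(np.abs(wrong - y))
max_err_right = np.max(np.abs(right - y))
```

On this noise-free data max_err_right is essentially zero, while max_err_wrong is large because the curve is neither evaluated at the right coordinates nor on the right scale.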

I've addressed both of these in the code below by replacing plt.plot(z, np.poly1d(np.polyfit(logA, logB, 1))(z)) with:

m, c = np.polyfit(logA, logB, 1)  # fit log(y) = m*log(x) + c
y_fit = np.exp(m*logA + c)  # calculate the fitted values of y
plt.plot(z, y_fit, ':')

This could be placed on one line as: plt.plot(z, np.exp(np.poly1d(np.polyfit(logA, logB, 1))(logA))), but I find that makes it harder to debug.

A few other things that are different in the code below:

  • You are using a list comprehension when you calculate logA from z to filter out any values < 1, but z is a linear range and only the first value is < 1. It seems easier to just create z starting at 1 and this is how I've coded it.

  • I'm not sure why you have the term x*log(x) in your list comprehension for logA. This looked like an error to me, so I didn't include it in the answer.
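
The point about starting z at 1 can be seen in a small sketch. The length n is a hypothetical stand-in for len(x):

```python
import numpy as np

n = 5  # hypothetical number of data points, standing in for len(x)

# Ranks 0..n-1: log(0) is -inf, which would break the fit
with np.errstate(divide="ignore"):  # silence the divide-by-zero warning for this demo
    log_from_zero = np.log(np.arange(n))

# Ranks 1..n: every logarithm is finite, so no filtering is needed
log_from_one = np.log(np.arange(1, n + 1))
```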

This code should work correctly for you:

import numpy as np
import matplotlib.pyplot as plt

# x and y are the data arrays from the question
fig = plt.figure()
ax = fig.add_subplot(111)

z = np.arange(1, len(x)+1)  # start at 1, to avoid error from log(0)

logA = np.log(z)  # no need for list comprehension since all z values >= 1
logB = np.log(y)

m, c = np.polyfit(logA, logB, 1)  # fit log(y) = m*log(x) + c
y_fit = np.exp(m*logA + c)  # calculate the fitted values of y

plt.plot(z, y, color ='r')
plt.plot(z, y_fit, ':')

ax.set_yscale('symlog')
ax.set_xscale('symlog')
#slope, intercept = np.polyfit(logA, logB, 1)
plt.xlabel("Pre_referer")
plt.ylabel("Popularity")
ax.set_title('Pre Referral URL Popularity distribution')
plt.show()

When I run it on simulated data, I get the following graph:

[Figure: log-log graph with fitted line]

Solution 2:

I figured out another solution for this problem. Sharing this because it might be helpful.

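import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# x and y are the data arrays from the question
fig = plt.figure()
ax = fig.add_subplot(111)
z = np.arange(len(x)) + 1  # ranks start at 1, to avoid log10(0)
print(z)
print(y)
rank = np.log10(z)
freq = np.log10(y)
# ordinary least-squares fit of log10(y) against log10(rank)
m, b, r_value, p_value, std_err = stats.linregress(rank, freq)
print("slope: %s" % m)
print("r-squared: %s" % r_value**2)
print("intercept: %s" % b)
plt.plot(rank, freq, 'o', color='r')
abline_values = m * rank + b  # fitted line, still in log10 units
plt.plot(rank, abline_values)
plt.show()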

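This essentially achieves the objective as well. It uses linregress from the scipy.stats module, and it plots the log10-transformed values on linear axes instead of plotting the original values on log-scaled axes.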
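
The two solutions use different logarithm bases (natural log in Solution 1, log10 in Solution 2). The sketch below, on invented noisy data and using np.polyfit for both fits so that it needs only NumPy, suggests how the results relate: the slope is the same in either base, the intercepts differ by a factor of ln(10), and both give the same curve once transformed back to the original scale. The r-squared value is computed here from np.corrcoef as an assumption-free stand-in for r_value**2:

```python
import numpy as np

# Invented noisy power-law data, for illustration only
rng = np.random.RandomState(0)
z = np.arange(1, 201, dtype=float)
y = 300.0 * z ** -1.1 * np.exp(rng.normal(0.0, 0.1, z.size))

# Fit in base 10 (as in Solution 2) and in base e (as in Solution 1)
m10, b10 = np.polyfit(np.log10(z), np.log10(y), 1)
me, ce = np.polyfit(np.log(z), np.log(y), 1)

# Transform each fitted line back to the original scale
y_fit10 = 10 ** (m10 * np.log10(z) + b10)
y_fite = np.exp(me * np.log(z) + ce)

# Squared correlation of the log-transformed data
r_squared = np.corrcoef(np.log10(z), np.log10(y))[0, 1] ** 2
```

With the back-transformed values, the fitted line can be drawn on log-scaled axes alongside the original data, whichever base was used for the fit.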
