In Python, How Can I Calculate Correlation And Statistical Significance Between Two Arrays Of Data?
Solution 1:
If you want to calculate the Pearson correlation coefficient, then scipy.stats.pearsonr is the way to go, although the reported significance (the p-value) is only meaningful for larger data sets. This function does not require the data to be manipulated to fall into a specified range. The value for the correlation falls in the interval [-1, 1]; perhaps that was the confusion?
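As a minimal sketch of the pearsonr call described above: the arrays x and y are made-up sample data (an assumption for illustration, not from the question), generated so that y is roughly linear in x.

```python
import numpy as np
from scipy import stats

# Hypothetical sample data: y depends linearly on x, plus some noise.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(scale=0.5, size=200)

# pearsonr returns the correlation coefficient and a two-sided p-value.
r, p_value = stats.pearsonr(x, y)
print(f"r = {r:.3f}, p = {p_value:.3g}")
```

The inputs are used as-is, with no rescaling; r always lies in [-1, 1], and a small p-value suggests the correlation is unlikely to be due to chance alone.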
If the significance is not terribly important, you can use numpy.corrcoef().
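A short sketch of the numpy.corrcoef() route, again with made-up numbers for x and y. Note that the function returns a full correlation matrix, so the coefficient between the two arrays is the off-diagonal entry.

```python
import numpy as np

# Hypothetical sample data, chosen to be nearly linear.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# For two 1-D inputs the result is a 2x2 matrix:
# the diagonal is 1, and the off-diagonal is the Pearson r of x and y.
corr_matrix = np.corrcoef(x, y)
r = corr_matrix[0, 1]
print(r)
```

No p-value is produced here, which is why this only fits when significance is not needed.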
The Mahalanobis distance does take into account the correlation between variables, but it provides a distance measure, not a correlation. (It is computed from the inverse of a covariance matrix; when that matrix is positive definite it behaves as a proper distance function, and it reduces to the Euclidean distance when the covariance is the identity.)
Solution 2:
You can use the Mahalanobis distance between these two arrays, which takes into account the correlation between them.
The function is in the scipy package: scipy.spatial.distance.mahalanobis
There's a nice example here
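As a rough sketch of how scipy.spatial.distance.mahalanobis is called: the function takes two 1-D vectors and the inverse of a covariance matrix. The data set and the way the covariance is estimated here are assumptions for illustration only.

```python
import numpy as np
from scipy.spatial import distance

# Hypothetical data set: 500 observations of 3 variables,
# the first two of which are strongly correlated.
rng = np.random.default_rng(0)
true_cov = np.array([[1.0, 0.8, 0.0],
                     [0.8, 1.0, 0.0],
                     [0.0, 0.0, 1.0]])
data = rng.multivariate_normal(mean=np.zeros(3), cov=true_cov, size=500)

# mahalanobis() expects the INVERSE of the covariance matrix.
VI = np.linalg.inv(np.cov(data, rowvar=False))

# Distance between two observations, scaled by the estimated covariance.
u, v = data[0], data[1]
d = distance.mahalanobis(u, v, VI)
print(d)
```

The result is a single non-negative number describing how far apart u and v are; it is not a correlation coefficient and carries no significance test.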
Solution 3:
scipy.spatial.distance.euclidean()
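This gives the Euclidean distance between two points, which can be passed as two NumPy arrays, two lists, etc. Like the Mahalanobis distance, it is a distance measure, not a correlation.
import scipy.spatial.distance as spsd
# nparray1 and nparray2 are 1-D arrays (or lists) of equal length
spsd.euclidean(nparray1, nparray2)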
You can find more info here http://docs.scipy.org/doc/scipy/reference/spatial.distance.html