
Matplotlib, How To Loop?

So I have this in Matplotlib:

plt.scatter(X[:, 0:1][Y == 0], X[:, 2:3][Y == 0])
plt.scatter(X[:, 0:1][Y == 1], X[:, 2:3][Y == 1])
plt.scatter(X[:, 0:1][Y == 2], X[:, 2:3][Y == 2])

Solution 1:

Probably the simplest way to display your data is with a single plot containing multiple colors.

The key is to label the data more efficiently. You have the right idea with np.intersect1d(Y, Y), but though clever, this is not the best way to set up unique values. Instead, I recommend using np.unique. Not only will that remove the need to hard-code the argument to plt.legend, but the return_inverse argument will give you an integer code for each element, which you can pass directly as a plotting attribute.

A minor point is that you can index single columns with a single index, rather than a slice.

For example,

import numpy as np
import matplotlib.pyplot as plt

X = np.loadtxt('iris.csv', skiprows=1, delimiter=',', usecols=[0, 1, 2, 3])
Y = np.loadtxt('iris.csv', skiprows=1, delimiter=',', usecols=[4], dtype=str)

labels, indices = np.unique(Y, return_inverse=True)
# c (not color) maps the integer codes through the colormap
scatter = plt.scatter(X[:, 0], X[:, 2], c=indices)

The array indices indexes into the three unique values in labels to get the original array back. You can therefore supply the index as a label for each element.
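To make the relationship between labels and indices concrete, here is a minimal sketch on a small made-up array. The species names are a hypothetical stand-in for the last column of iris.csv, not data read from the file.

```python
import numpy as np

# Hypothetical stand-in for the species column of iris.csv
Y = np.array(['setosa', 'virginica', 'setosa', 'versicolor', 'virginica'])

labels, indices = np.unique(Y, return_inverse=True)
print(labels)   # sorted unique values
print(indices)  # position of each element of Y within labels

# Indexing labels with indices rebuilds the original array
assert np.array_equal(labels[indices], Y)
```

Because indices is a plain integer array with one entry per row, it can be handed to any plotting argument that accepts per-point numeric values.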

Constructing a legend for such a labeled dataset is something that matplotlib fully supports out of the box, as I learned from the question "matplotlib add legend with multiple entries for a single scatter plot", which was inspired by this solution. The gist of it is that the object that plt.scatter returns has a method, legend_elements, which does all the work for you:

plt.legend(scatter.legend_elements()[0], labels)

legend_elements returns a tuple with two items. The first is a list of handles, one per distinct label, that can be used as the first argument to legend. The second is a list of default text labels based on the numerical labels you supplied. We discard these in favor of our actual text labels.
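As a rough end-to-end sketch of the approach, the following uses random data in place of iris.csv. The array contents, the class names 'a', 'b', 'c', and the choice of the Agg backend are all assumptions made so the example runs without the file or a display.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, assumed here so no display is needed
import matplotlib.pyplot as plt
import numpy as np

# Made-up data: 30 rows, 4 feature columns, 3 hypothetical classes
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
Y = np.repeat(['a', 'b', 'c'], 10)

labels, indices = np.unique(Y, return_inverse=True)

fig, ax = plt.subplots()
# c maps each integer code to a color through the default colormap
scatter = ax.scatter(X[:, 0], X[:, 2], c=indices)

# One handle per distinct code; the default text is replaced by our labels
handles, default_text = scatter.legend_elements()
legend = ax.legend(handles, labels)
```

The same calls work through the plt interface; the explicit fig and ax objects are only used here to keep the sketch self-contained.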

Solution 2:

You can do a much better job with the indexing by splitting the data properly.

The indexing expression X[:, 0:1][Y == n] extracts a view of the first column of X as a two-dimensional array with a single column. It then applies the boolean mask Y == n to the rows of that view. Both steps can be done more concisely as a single step: X[Y == n, 0]. Even so, this remains a bit inefficient, since the mask has to be recomputed and applied for every unique value in Y.
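A small sketch of the two indexing forms, using a made-up 4x3 array and integer labels rather than the iris data, shows that they select the same values and differ only in shape:

```python
import numpy as np

# Made-up data: 4 rows, 3 columns, two classes
X = np.arange(12.0).reshape(4, 3)
Y = np.array([0, 1, 0, 1])

two_step = X[:, 0:1][Y == 0]  # slice keeps the column axis: shape (2, 1)
one_step = X[Y == 0, 0]       # integer column index drops it: shape (2,)

print(two_step.shape, one_step.shape)

# Same values either way
assert np.array_equal(two_step.ravel(), one_step)
```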

My other solution called for np.unique to group the labels. But np.unique works by sorting the array. We can do that ourselves:

X = np.loadtxt('iris.csv', skiprows=1, delimiter=',', usecols=[0, 1, 2, 3])
Y = np.loadtxt('iris.csv', skiprows=1, delimiter=',', usecols=[4], dtype=str)

ind = np.argsort(Y)
X = X[ind, :]
Y = Y[ind]

To find where Y changes, you can apply an operation like np.diff, but tailored to strings:

diffs = Y[:-1] != Y[1:]

The mask can be converted to split indices with np.flatnonzero:

inds = np.flatnonzero(diffs) + 1

And finally, you can split the data:

data = np.split(X, inds, axis=0)
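The sort, compare, and split steps can be traced on a tiny made-up example. The arrays and the names X_sorted, Y_sorted, and groups are illustrative assumptions; kind='stable' is passed only so that rows within a group keep their original order, which the steps above do not require.

```python
import numpy as np

# Made-up labels and a matching 6x2 data array
Y = np.array(['b', 'a', 'c', 'a', 'b', 'a'])
X = np.arange(12).reshape(6, 2)

# Sort rows so that equal labels are adjacent
ind = np.argsort(Y, kind='stable')
X_sorted, Y_sorted = X[ind], Y[ind]

# True wherever a label differs from its successor
diffs = Y_sorted[:-1] != Y_sorted[1:]

# Index of the first row of each new group (after the first group)
inds = np.flatnonzero(diffs) + 1

# One sub-array per label, in sorted label order
groups = np.split(X_sorted, inds, axis=0)
for g in groups:
    print(g.shape)
```

Each label's rows end up in one contiguous block, so a single pass over the sorted array replaces one boolean mask per label.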

For good measure, you can even convert the split data into a dictionary instead of a list:

labels = np.concatenate(([Y[0]], Y[inds]))
data = dict(zip(labels, data))

You can plot with a loop, but much more efficiently now.

for label, group in data.items():
    plt.scatter(group[:, 0], group[:, 2], label=label)
# The label passed to each scatter call supplies the legend text
plt.legend()
