Python Iterate Through Connected Components In Grayscale Image
Solution 1:
You can use networkx
:
from itertools import product, repeat
import numpy as np
import networkx as nx
arr = np.array(
[[255,255,255,255,255,255,255],
[255,255, 0 ,10 ,255,255, 1 ],
[255,30 ,255,255,50 ,255, 9 ],
[51 ,20 ,255,255, 9 ,255,240],
[255,255,80 ,50 ,170,255, 20],
[255,255,255,255,255,255, 0 ],
[255,255,255,255,255,255, 69]])
# generate edges
shift = list(product(*repeat([-1, 0, 1], 2)))
x_max, y_max = arr.shape
edges = []
for x, y in np.ndindex(arr.shape):
for x_delta, y_delta in shift:
x_neighb = x + x_delta
y_neighb = y + y_delta
if (0 <= x_neighb < x_max) and (0 <= y_neighb < y_max):
edge = (x, y), (x_neighb, y_neighb)
edges.append(edge)
# build graph
G = nx.from_edgelist(edges)
# draw graph
pos = {(x, y): (y, x_max-x) for x, y in G.nodes()}
nx.draw(G, with_labels=True, pos=pos, node_color='coral', node_size=1000)
# draw graph with numbers
labels = dict(np.ndenumerate(arr))
node_color = ['coral'if labels[n] == 255else'lightgrey'for n in G.nodes()]
nx.draw(G, with_labels=True, pos=pos, labels=labels, node_color=node_color, node_size=1000)
# build subgraphselect = np.argwhere(arr < 255)
G1 = G.subgraph(map(tuple, select))
# draw subgraphpos = {(x, y): (y, x_max-x) forx, y in G1.nodes()}
labels1 = {n:labels[n] for n in G1.nodes()}
nx.draw(G1, with_labels=True, pos=pos, labels=labels1, node_color='lightgrey', node_size=1000)
# find connected components and DFS trees
for i in nx.connected_components(G1):
source = next(iter(i))
idx = nx.dfs_tree(G1, source=source)
print(arr[tuple(np.array(idx).T)])
Output:
[ 0 10 50 9 50 80 20 30 51 170][ 9 1 240 20 0 69]
Solution 2:
So after so much researches for suitable implementation of connected components, I came up with my solution. In order to reach the best I can do in terms of performance, I relied on these rules:
- Not to use
networkx
because it's slow according to this benchmark - Use vectorized actions as much as possible because Python based iterations are slow according to this answer.
I'm implementing an algorithm of connected components of image here only because I believe this is an essential part of this question.
Algorithm of connected components of image
import numpy as np
import numexpr as ne
import pandas as pd
import igraph
defget_coords(arr):
x, y = np.indices(arr.shape)
mask = arr != 255return np.array([x[mask], y[mask]]).T
defcompare(r1, r2):
#assuming r1 is a sorted array, returns:# 1) locations of r2 items in r1# 2) mask array of these locations
idx = np.searchsorted(r1, r2)
idx[idx == len(r1)] = 0
mask = r1[idx] == r2
return idx, mask
defget_reduction(coords, s):
d = {'s': s, 'c0': coords[:,0], 'c1': coords[:,1]}
return ne.evaluate('c0*s+c1', d)
defget_bounds(coords, increment):
return np.max(coords[1]) + 1 + increment
defget_shift_intersections(coords, shifts):
# instance that consists of neighbours found for each node [[0,1,2],...]
s = get_bounds(coords, 10)
rdim = get_reduction(coords, s)
shift_mask, shift_idx = [], []
for sh in shifts:
sh_rdim = get_reduction(coords + sh, s)
sh_idx, sh_mask = compare(rdim, sh_rdim)
shift_idx.append(sh_idx)
shift_mask.append(sh_mask)
return np.array(shift_idx).T, np.array(shift_mask).T,
defconnected_components(coords, shifts):
shift_idx, shift_mask = get_shift_intersections(coords, shifts)
x, y = np.indices((len(shift_idx), len(shift_idx[0])))
vertices = np.arange(len(coords))
edges = np.array([x[shift_mask], shift_idx[shift_mask]]).T
graph = igraph.Graph()
graph.add_vertices(vertices)
graph.add_edges(edges)
graph_tags = graph.clusters().membership
values = pd.DataFrame(graph_tags).groupby([0]).indices
return values
coords = get_coords(arr)
shifts=((0,1),(1,0),(1,1),(-1,1))
comps = connected_components(coords, shifts=shifts)
for c in comps:
print(coords[comps[c]].tolist())
Outcome
[[1, 2], [1, 3], [2, 1], [2, 4], [3, 0], [3, 1], [3, 4], [4, 2], [4, 3], [4, 4]]
[[1, 6], [2, 6], [3, 6], [4, 6], [5, 6], [6, 6]]
Explanation
Algorithm consists of these steps:
We need to convert image to coordinates of non-white cells. It can be done using function:
def get_coords(arr): x, y = np.indices(arr.shape) mask = arr != 255return np.array([y[mask], x[mask]]).T
I'll name an outputting array by
X
for clarity. Here is an output of this array, visually:Next, we need to consider all the cells of each shift that intersects with
X
:In order to do that, we should solve a problem of intersections I posted few days before. I found it quite difficult to do using multidimensional
numpy
arrays. Thanks to Divakar, he proposes a nice way of dimensionality reduction usingnumexpr
package which fastens operations even more thannumpy
. I implement it here in this function:defget_reduction(coords, s): d = {'s': s, 'c0': coords[:,0], 'c1': coords[:,1]} return ne.evaluate('c0*s+c1', d)
In order to make it working, we should set a bound
s
which can be calculated automatically using a functiondefget_bounds(coords, increment): return np.max(coords[1]) + 1 + increment
or inputted manually. Since algorithm requires increasing coordinates, pairs of coordinates might be out of bounds, therefore I have used a slight
increment
here. Finally, as a solution to my post I mentioned here, indexes of coordinates ofX
(reduced to 1D), that intersects with any other array of coordinatesY
(also reduced to 1D) can be accessed via functiondefcompare(r1, r2): # assuming r1 is a sorted array, returns:# 1) locations of r2 items in r1# 2) mask array of these locations idx = np.searchsorted(r1, r2) idx[idx == len(r1)] = 0 mask = r1[idx] == r2 return idx, mask
Plugging all the corresponding
arrays
of shifts. As we can see, abovementioned function outputs two variables: an array of index locations in the main setX
and its mask array. A proper indexes can be found usingidx[mask]
and since this procedure is being applied for each shift, I implementedget_shift_intersections(coords, shifts)
method for this case.Final: constructing nodes & edges and taking output from
igraph
. The point here is thatigraph
performs well only with nodes that are consecutive integers starting from 0. That's why my script was designed to use mask-based access to item locations inX
. I'll explain briefly how did I useigraph
here:I have calculated coordinate pairs:
[[1, 2], [1, 3], [1, 6], [2, 1], [2, 4], [2, 6], [3, 0], [3, 1], [3, 4], [3, 6], [4, 2], [4, 3], [4, 4], [4, 6], [5, 6], [6, 6]]
Then I assigned integers for them:
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15]
My edges looks like this:
[[0, 1], [1, 4], [2, 5], [3, 7], [3, 0], [4, 8], [5, 9], [6, 7], [6, 3], [7, 10], [8, 12], [9, 13], [10, 11], [11, 12], [11, 8], [13, 14], [14, 15]]
Output of
graph.clusters().membership
looks like this:[0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1]
And finally, I have used
groupby
method ofPandas
to find indexes of separate groups (I use Pandas here because I found it to be the most efficient way of grouping in Python)
Notes
Download of igraph
is not straightforward, you might need to install it from unofficial binaries.
Post a Comment for "Python Iterate Through Connected Components In Grayscale Image"