Skip to content Skip to sidebar Skip to footer

Find All Paths From Leafs To Each Node In A Forest

I asked this question in parts, because I didn't have enough infromations, but now that I have, I can ask the full question. So I have data in text file which has 2 columns. First

Solution 1:

I would suggest changing all_paths to leaf_paths, meaning that it would only yield those paths that start at a leaf.

Then use those paths to:

  • Identify the root it leads to (the last element in the path)
  • Identify the leaf (the first element in the path)
  • Iterate all non-leaves in that path and combine each of them in a pair with the leaf.
  • Store these pairs in a dictionary that is keyed by the root

Here is how you would alter all_paths at two places marked with a comment:

defleaf_paths(adj):
    defrecur(path):
        node = path[-1]
        neighbors = [neighbor for neighbor in adj[node] ifnot neighbor in path]
        ifnot neighbors:
            yield path
        for neighbor in neighbors:
            yieldfrom recur(path + [neighbor])

    # identify the internal nodes (not leaves)
    internals = set(parent for parents in adj.values() for parent in parents) 
    for node in adj:
        ifnot node in internals:  # require that the starting node is a leafyieldfrom recur([node])

Then add this function:

def all_leaf_pairs(paths):
    trees = {}

    forpathin paths:
        iflen(path) > 1:
            root = path[-1]
            ifnot root in trees:
                trees[root] = []
            it = iter(path)
            leaf = next(it)
            trees[root].extend((leaf, node) for node in it)
    return trees

And your main program would do:

data = [
    ["ANALYTICAL_BALANCE","BFG_DEPOSIT"],
    ["CUSTOMER_DETAIL","BALANCE"],
    ["BFG_2056", "FFD_15"],
    ["BALANCE","BFG_16"],
    ["BFG_16","STAT_HIST"],
    ["ANALYTICAL_BALANCE","BFG_2056"],
    ["CUSTOM_DATA","AND_11"],
    ["AND_11","DICT_DEAL"],
    ["DICT_DEAL","BFG_2056"]
]

adj = create_adj(data)
paths = leaf_paths(adj)

import pprint
pprint.pprint(all_leaf_pairs(paths))

This will output:

{'BFG_DEPOSIT': [('ANALYTICAL_BALANCE', 'BFG_DEPOSIT')],
 'FFD_15': [('ANALYTICAL_BALANCE', 'BFG_2056'),
            ('ANALYTICAL_BALANCE', 'FFD_15'),
            ('CUSTOM_DATA', 'AND_11'),
            ('CUSTOM_DATA', 'DICT_DEAL'),
            ('CUSTOM_DATA', 'BFG_2056'),
            ('CUSTOM_DATA', 'FFD_15')],
 'STAT_HIST': [('CUSTOMER_DETAIL', 'BALANCE'),
               ('CUSTOMER_DETAIL', 'BFG_16'),
               ('CUSTOMER_DETAIL', 'STAT_HIST')]}

Explanation of leaf_paths

This function uses recursion. It defines recur in its scope.

But the main code starts with identifying the internal nodes (i.e. the nodes that have at least one child). Since adj provides the parent(s) for a given node, we just have to collect all those parents.

We use this set of internal nodes to make sure we start the recursion only on leaf nodes, as in the output we want to have paths that always start out with a leaf node.

The recur function will walk from the given leaf towards any root it can find. It extends the path with the next parent it can find (neighbor) and performs recursion until there is no more parent (i.e., it is a root). When that happens the accumulated path (that starts with a leaf and ends with a root) is yielded.

leaf_paths itself yields any path that recur yields.

Post a Comment for "Find All Paths From Leafs To Each Node In A Forest"