Find All Binary Splits Of A Nominal Attribute
Solution 1:
Just for reference, your binary splits are also known as partitions with exactly 2 parts. Each 2-partition is fully determined by a subset (the other half of the partition is the complement of the subset), hence the relationship to combinations.
In fact, if you generate the powerset of your string in shortlex order, you can essentially fold the powerset in half to produce the desired partitions.
import itertools
defbsplit(chars):
"Returns a list of all unique 2-partitions."assertlen(chars) >= 2# first, we generate the powerset in shortlex order,# minus the empty set and its complement
subsets = (itertools.combinations(chars, k) for k inrange(1, len(chars)))
subsets = itertools.chain.from_iterable(subsets)
subsets = [''.join(sub) for sub in subsets]
# then, we "fold" the set in half--pairing each subset # in the first half with its complement from the second half
half = len(subsets) // 2returnlist(zip(subsets[:half], reversed(subsets[half:])))
deftest(*strings):
for string in strings:
for left, right in bsplit(string):
print(left, right)
print()
test('ab', 'abc', 'abcd', 'abcde')
This also shows that there are (2^n - 2) / 2) = 2^(n - 1) - 1)
partitions of size 2 for a set of size n
.
Obviously, you can't use this for large sequences because it needs to materialize (almost) the whole powerset all at once. Although, it does suggest an efficient solution that avoids generating duplicates: enumerate the first 2^(n - 1) - 1)
non-empty elements of the ordered powerset, and map each subset to its corresponding partition.
Solution 2:
I would use itertools.product
to write a function that splits a sequence into all possible divisions of two halves. I'd iterate through that and remove duplicates using a set.
import itertools
defbinary_splits(seq):
for result_indices in itertools.product((0,1), repeat=len(seq)):
result = ([], [])
for seq_index, result_index inenumerate(result_indices):
result[result_index].append(seq[seq_index])
#skip results where one of the sides is emptyifnot result[0] ornot result[1]: continue#convert from list to tuple so we can hash it lateryieldmap(tuple, result)
defbinary_splits_no_dupes(seq):
seen = set()
for item in binary_splits(seq):
key = tuple(sorted(item))
if key in seen: continueyield key
seen.add(key)
for left, right in binary_splits_no_dupes("abcd"):
print left, right
Result:
('a', 'b', 'c') ('d',)
('a', 'b', 'd') ('c',)
('a', 'b') ('c', 'd')
('a', 'c', 'd') ('b',)
('a', 'c') ('b', 'd')
('a', 'd') ('b', 'c')
('a',) ('b', 'c', 'd')
Post a Comment for "Find All Binary Splits Of A Nominal Attribute"