Most Pythonic Way To Split An Array By Repeating Elements

I have a list of items that I want to split based on a delimiter. I want all delimiters to be removed and the list to be split when a delimiter occurs twice. For example, if the d

Solution 1:

I don't think there's going to be a nice, elegant solution to this (I'd love to be proven wrong of course) so I would suggest something straightforward:

def nSplit(lst, delim, count=2):
    output = [[]]
    delimCount = 0
    for item in lst:
        if item == delim:
            delimCount += 1
        elif delimCount >= count:
            output.append([item])
            delimCount = 0
        else:
            output[-1].append(item)
            delimCount = 0
    return output

 

>>> nSplit(['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g'], 'X', 2)
[['a', 'b'], ['c', 'd'], ['f', 'g']]
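
It may help to see how this function behaves at the edges. The following is only a sketch: it repeats the function so that it runs on its own, and the variable names (sample, trailing, leading, long_run) are made up for illustration. It suggests that trailing delimiters are dropped silently, leading delimiters produce an empty first sublist, and a run longer than count still causes only one split.

```python
def nSplit(lst, delim, count=2):
    output = [[]]
    delimCount = 0
    for item in lst:
        if item == delim:
            delimCount += 1
        elif delimCount >= count:
            # First non-delimiter after a long enough run starts a new sublist
            output.append([item])
            delimCount = 0
        else:
            output[-1].append(item)
            delimCount = 0
    return output

sample = ['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g']

# Delimiters at the end never meet a following item, so nothing is appended
trailing = nSplit(sample + ['X', 'X'], 'X', 2)

# Delimiters at the start leave the initial empty sublist in place
leading = nSplit(['X', 'X'] + sample, 'X', 2)

# Four delimiters in a row are treated as a single split point
long_run = nSplit(['a', 'X', 'X', 'X', 'X', 'b'], 'X', 2)
```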

Solution 2:

Here's a way to do it with itertools.groupby():

import itertools

class MultiDelimiterKeyCallable(object):
    def __init__(self, delimiter, num_wanted=1):
        self.delimiter = delimiter
        self.num_wanted = num_wanted

        self.num_found = 0

    def __call__(self, value):
        if value == self.delimiter:
            self.num_found += 1
            if self.num_found >= self.num_wanted:
                self.num_found = 0
                return True
        else:
            self.num_found = 0

def split_multi_delimiter(items, delimiter, num_wanted):
    keyfunc = MultiDelimiterKeyCallable(delimiter, num_wanted)

    return (list(item
                 for item in group
                 if item != delimiter)
            for key, group in itertools.groupby(items, keyfunc)
            if not key)

items = ['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g']

print(list(split_multi_delimiter(items, "X", 2)))

I must say that cobbal's solution is much simpler for the same results.
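
To see why the stateful key in this answer works, here is a small sketch that records the key returned for each item of the sample list. The class is repeated so the snippet runs on its own; the names keyfunc and keys are only for illustration. The key is True only for the delimiter that completes a run, and an implicit None for everything else, so groupby isolates each completing delimiter in a group of its own.

```python
class MultiDelimiterKeyCallable(object):
    def __init__(self, delimiter, num_wanted=1):
        self.delimiter = delimiter
        self.num_wanted = num_wanted
        self.num_found = 0

    def __call__(self, value):
        if value == self.delimiter:
            self.num_found += 1
            if self.num_found >= self.num_wanted:
                self.num_found = 0
                return True
        else:
            self.num_found = 0
        # Every other path falls through and returns None implicitly

items = ['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g']

keyfunc = MultiDelimiterKeyCallable('X', 2)
keys = [keyfunc(item) for item in items]
# Only the second 'X' of each pair is keyed True; the lone 'X' before 'g' is not
```

The groups keyed None still contain the first delimiter of each pair, which is why split_multi_delimiter filters delimiters out of every group it keeps.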

Solution 3:

Use a generator function to maintain state of your iterator through the list, and the count of the number of separator chars seen so far:

l = ['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g'] 

def splitOn(ll, x, n):
    cur = []
    splitcount = 0
    for c in ll:
        if c == x:
            splitcount += 1
            if splitcount == n:
                yield cur
                cur = []
                splitcount = 0
        else:
            cur.append(c)
            splitcount = 0
    yield cur

print(list(splitOn(l, 'X', 2)))
print(list(splitOn(l, 'X', 1)))
print(list(splitOn(l, 'X', 3)))

l += ['X','X']
print(list(splitOn(l, 'X', 2)))
print(list(splitOn(l, 'X', 1)))
print(list(splitOn(l, 'X', 3)))

prints:

[['a', 'b'], ['c', 'd'], ['f', 'g']]
[['a', 'b'], [], ['c', 'd'], [], ['f'], ['g']]
[['a', 'b', 'c', 'd', 'f', 'g']]
[['a', 'b'], ['c', 'd'], ['f', 'g'], []]
[['a', 'b'], [], ['c', 'd'], [], ['f'], ['g'], [], []]
[['a', 'b', 'c', 'd', 'f', 'g']]

EDIT: I'm also a big fan of groupby, here's my go at it:

from itertools import groupby

def splitOn(ll, x, n):
    cur = []
    for isdelim, grp in groupby(ll, key=lambda c: c == x):
        if isdelim:
            nn = sum(1 for c in grp)
            while nn >= n:
                yield cur
                cur = []
                nn -= n
        else:
            cur.extend(grp)
    yield cur

Not too different from my earlier answer; it just lets groupby take care of iterating over the input list, creating alternating groups of delimiter and non-delimiter items. The non-delimiter items just get added onto the current sublist, while the length of each delimiter run decides how many times a new sublist is started. For long lists, this is probably a bit more efficient, as groupby does its grouping in C (the key lambda still runs in Python) and still only iterates over the list once.
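
To make the grouping step concrete, here is a minimal sketch of what groupby hands back for the sample list before any splitting happens. The helper name delimiter_runs is hypothetical and not part of the answer above.

```python
from itertools import groupby

def delimiter_runs(items, delim):
    """Return (is_delimiter, run) pairs in the order groupby produces them."""
    return [(is_delim, list(run))
            for is_delim, run in groupby(items, key=lambda c: c == delim)]

sample = ['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g']
runs = delimiter_runs(sample, 'X')
# runs alternates between non-delimiter runs and delimiter runs:
# (False, ['a', 'b']), (True, ['X', 'X']), (False, ['c', 'd']), ...
```

Each True run only needs to be counted, which is what the sum(...) in the groupby version of splitOn does; a run of length 1 (the lone 'X' before 'g') is shorter than n=2 and so triggers no split.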

Solution 4:

a = ['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g']
b = [[c for c in q if c != 'X'] for q in "".join(a).split('X' * 2)]

This gives

[['a', 'b'], ['c', 'd'], ['f', 'g']]

where the 2 is the number of consecutive delimiters that triggers a split. There is most likely a better way to do this.
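
One caveat worth checking: because this approach round-trips through a single string, it appears to rely on every item being a one-character string. A minimal sketch makes that visible; the function name split_via_join and the variable names are hypothetical.

```python
def split_via_join(items, delim, n):
    """Join the items into one string, split on n delimiters, drop leftovers."""
    chunks = "".join(items).split(delim * n)
    return [[ch for ch in chunk if ch != delim] for chunk in chunks]

# Works as intended when every item is a single character
single_chars = split_via_join(
    ['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g'], 'X', 2)

# Multi-character items are broken apart into individual characters
multi_chars = split_via_join(['ab', 'X', 'X', 'cd'], 'X', 2)
```

In the second call the items 'ab' and 'cd' come back as separate characters, so the loop-based answers are the safer choice for lists of arbitrary objects.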

Solution 5:

Very ugly, but I wanted to see if I could pull this off as a one-liner and I thought I would share. I beg you not to actually use this solution for anything of any importance though. The ('X', 2) at the end is the delimiter and the number of times it should be repeated.

(lambda delim, count: map(lambda x:filter(lambda y:y != delim, x), reduce(lambda x, y: (x[-1].append(y) if y != delim or x[-1][-count+1:] != [y]*(count-1) else x.append([])) or x, ['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g'], [[]])))('X', 2)

EDIT

Here's a breakdown. I also eliminated some redundant code that was far more obvious when written out like this. (changed above also)

# Wrap everything in a lambda form to avoid repeating values
(lambda delim, count:
    # Filter the delimiters out of all sublists after construction
    map(lambda x: filter(lambda y: y != delim, x), reduce(
        lambda x, y: (
            # Add the value to the current sub-list...
            x[-1].append(y) if
                # ...unless it is a delimiter that completes a run
                # of the specified number of delimiters
                y != delim or x[-1][-count+1:] != [y]*(count-1) else
                # in which case, start a new sublist
                x.append([])) or x,
        ['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g'], [[]])
    )
)('X', 2)
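
For readers on Python 3, where reduce lives in functools and map/filter return iterators, the same reduce-based idea can be sketched with a named step function. This is an illustration rather than a drop-in translation: the names reduce_split and step are made up, and the count == 1 guard is an addition that the one-liner does not have.

```python
from functools import reduce  # reduce is not a builtin on Python 3

def reduce_split(items, delim, count):
    """Split items wherever delim occurs count times in a row."""
    def step(sublists, item):
        current = sublists[-1]
        # The delimiter completes a run when the current sublist already
        # ends with count - 1 delimiters. The count == 1 guard is an
        # addition here: every single delimiter then triggers a split.
        completes_run = item == delim and (
            count == 1 or current[-(count - 1):] == [delim] * (count - 1))
        if completes_run:
            sublists.append([])   # start a new sublist
        else:
            current.append(item)  # stray delimiters are filtered out below
        return sublists

    sublists = reduce(step, items, [[]])
    # Drop the delimiters that were accumulated while counting
    return [[x for x in sub if x != delim] for sub in sublists]

sample = ['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g']
result = reduce_split(sample, 'X', 2)
```

As in the one-liner, delimiters are temporarily stored in the sublists so that the run length can be read off the tail, then removed in a final pass.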
