
In Python, How Do I Split A String And Keep The Separators?

Here's the simplest way to explain this. Here's what I'm using:

re.split('\W', 'foo/bar spam\neggs')
-> ['foo', 'bar', 'spam', 'eggs']

Here's what I want:

someMethod('\W', 'foo

Solution 1:

>>> re.split(r'(\W)', 'foo/bar spam\neggs')
['foo', '/', 'bar', ' ', 'spam', '\n', 'eggs']
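Because the pattern contains a capturing group, re.split puts each matched separator into the result. When two separators are adjacent, it also returns an empty string between them. A minimal sketch, assuming you want to drop those empty pieces (the helper name split_keep is hypothetical, not part of the re module):

```python
import re

def split_keep(pattern, text):
    # The capturing group in `pattern` makes re.split include each separator.
    # Adjacent separators (or a separator at either end) yield '' pieces; drop them.
    return [piece for piece in re.split(pattern, text) if piece]

print(re.split(r'(\W)', 'foo, bar'))    # ['foo', ',', '', ' ', 'bar']
print(split_keep(r'(\W)', 'foo, bar'))  # ['foo', ',', ' ', 'bar']
```

Filtering is a trade-off: it also removes empty strings you might care about, such as the leading '' in re.split(r'(\W)', '/foo').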

Solution 2:

If you are splitting on newlines, use splitlines(True) (the argument is keepends).

>>> 'line 1\nline 2\nline without newline'.splitlines(True)
['line 1\n', 'line 2\n', 'line without newline']

(Not a general solution, but I'm adding it here in case someone arrives without knowing this method exists.)
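splitlines() recognizes several line boundaries besides '\n' (for example '\r' and '\r\n'). If you want to split only after '\n' and keep it, a zero-width lookbehind with re.split is one option; a sketch, assuming Python 3.7+, which is when re.split started allowing splits on empty matches:

```python
import re

text = 'line 1\nline 2\nline without newline'

# Split right after each '\n'; the lookbehind consumes nothing,
# so every '\n' stays attached to the end of its line.
by_regex = re.split(r'(?<=\n)', text)
print(by_regex)  # ['line 1\n', 'line 2\n', 'line without newline']

# Difference 1: a trailing newline leaves a final '' with re.split, not with splitlines.
print(re.split(r'(?<=\n)', 'a\nb\n'))  # ['a\n', 'b\n', '']
print('a\nb\n'.splitlines(True))       # ['a\n', 'b\n']

# Difference 2: splitlines also treats '\r' as a line boundary; the regex does not.
print('a\rb'.splitlines(True))         # ['a\r', 'b']
print(re.split(r'(?<=\n)', 'a\rb'))    # ['a\rb']
```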

Solution 3:

Another example: split on non-alphanumeric characters and keep the separators.

import re
a = "foo,bar@candy*ice%cream"
print(re.split(r'([^a-zA-Z0-9])', a))

output:

['foo', ',', 'bar', '@', 'candy', '*', 'ice', '%', 'cream']

explanation

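re.split(r'([^a-zA-Z0-9])', a)

() <- capturing group: keeps the separators in the result
[] <- character class: matches exactly one of the characters listed inside
^a-zA-Z0-9 <- the leading ^ negates the class: any character except ASCII letters (upper/lower case) and digits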
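Since the character class matches a single character, a run of separators such as ",," produces an empty string between each pair. Adding + after the class makes one match cover the whole run. A short sketch on a made-up input string:

```python
import re

a = "foo,,bar@@candy"

# One character per match: adjacent separators leave '' between them.
print(re.split(r'([^a-zA-Z0-9])', a))
# ['foo', ',', '', ',', 'bar', '@', '', '@', 'candy']

# '+' treats each run of non-alphanumeric characters as one separator.
print(re.split(r'([^a-zA-Z0-9]+)', a))
# ['foo', ',,', 'bar', '@@', 'candy']
```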

Solution 4:

If you have only one separator, you can use list comprehensions:

text = 'foo,bar,baz,qux'
sep = ','

Appending/prepending separator:

result = [x + sep for x in text.split(sep)]
# ['foo,', 'bar,', 'baz,', 'qux,']
# to get rid of the trailing separator on the last element:
result[-1] = result[-1][:-len(sep)]
# ['foo,', 'bar,', 'baz,', 'qux']

result = [sep + x for x in text.split(sep)]
# [',foo', ',bar', ',baz', ',qux']
# to get rid of the leading separator on the first element:
result[0] = result[0][len(sep):]
# ['foo', ',bar', ',baz', ',qux']

Separator as its own element:

result = [u for x in text.split(sep) for u in (x, sep)]
# ['foo', ',', 'bar', ',', 'baz', ',', 'qux', ',']
result = result[:-1]  # to get rid of the trailing separator
# ['foo', ',', 'bar', ',', 'baz', ',', 'qux']
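Because every piece and every separator is kept, joining the result should give back the original string. A sketch that wraps the "separator as its own element" comprehension in a function (the name split_keep_sep is hypothetical) and checks that round trip:

```python
def split_keep_sep(text, sep):
    # Interleave each piece with sep, then drop the extra sep
    # that the comprehension appends after the last piece.
    return [u for x in text.split(sep) for u in (x, sep)][:-1]

text = 'foo,bar,baz,qux'
result = split_keep_sep(text, ',')
print(result)                   # ['foo', ',', 'bar', ',', 'baz', ',', 'qux']
print(''.join(result) == text)  # True
```

Edge cases follow str.split: an empty string gives [''], and a trailing separator gives a final '' piece ('a,' becomes ['a', ',', '']).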

Solution 5:

Another no-regex solution that works well on Python 3

# Split strings and keep the separator
test_strings = ['<Hello>', 'Hi', '<Hi> <Planet>', '<', '']

def split_and_keep(s, sep):
    if not s:
        return ['']  # consistent with str.split()

    # Find a replacement character that is not used in the string,
    # i.e. just use the highest character in it plus one.
    # Note: this fails if ord(max(s)) == 0x10FFFF (ValueError).
    p = chr(ord(max(s)) + 1)

    # Put p right after every separator, then split on p
    return s.replace(sep, sep + p).split(p)

for s in test_strings:
    print(split_and_keep(s, '<'))


# If the Unicode limit is reached, it fails explicitly with a ValueError
unicode_max_char = chr(1114111)
ridiculous_string = '<Hello>' + unicode_max_char + '<World>'
print(split_and_keep(ridiculous_string, '<'))
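The sentinel trick above breaks when the string already contains U+10FFFF. A sketch of a no-regex alternative that needs no sentinel: it scans with str.find and is meant to produce the same pieces as split_and_keep (each separator attached to the end of the piece before it). The name split_and_keep_find is hypothetical.

```python
def split_and_keep_find(s, sep):
    """Split s on sep, keeping each sep at the end of the piece before it."""
    if not sep:
        raise ValueError('empty separator')  # mirrors str.split('')
    parts = []
    start = 0
    while True:
        idx = s.find(sep, start)
        if idx == -1:
            # No more separators: the rest of the string is the last piece.
            parts.append(s[start:])
            return parts
        end = idx + len(sep)
        parts.append(s[start:end])  # piece plus its trailing separator
        start = end

for s in ['<Hello>', 'Hi', '<Hi> <Planet>', '<', '']:
    print(split_and_keep_find(s, '<'))

# No sentinel character is needed, so U+10FFFF in the input is not a problem.
print(split_and_keep_find('<Hello>' + chr(0x10FFFF) + '<World>', '<'))
```

Each search resumes where the previous separator ended, so the string is scanned once instead of being rewritten with replace() and split again.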
