In Python, How Do I Split A String And Keep The Separators?
Here's the simplest way to explain this. Here's what I'm using:
re.split('\W', 'foo/bar spam\neggs') -> ['foo', 'bar', 'spam', 'eggs']
Here's what I want:
someMethod('\W', 'foo
Solution 1:
>>> re.split(r'(\W)', 'foo/bar spam\neggs')
['foo', '/', 'bar', ' ', 'spam', '\n', 'eggs']
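This works because re.split also returns the text of any capturing group in the pattern. Here is a minimal sketch of wrapping that up, assuming the pattern you pass in has no capturing groups of its own (the name split_keep is made up for illustration). Note that two separators in a row leave an empty string between them.

```python
import re

def split_keep(pattern, text):
    """Split text on pattern, keeping each separator as its own list item.

    Hypothetical helper: assumes `pattern` contains no capturing groups,
    since every group would add extra items to the result.
    """
    # Wrapping the whole pattern in (...) makes re.split return the separators.
    return re.split(f'({pattern})', text)

print(split_keep(r'\W', 'foo/bar spam\neggs'))
# ['foo', '/', 'bar', ' ', 'spam', '\n', 'eggs']

# Adjacent separators produce an empty string between them.
print(split_keep(r'\W', 'a, b'))
# ['a', ',', '', ' ', 'b']
```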
Solution 2:
If you are splitting on newlines, use splitlines(True).
>>> 'line 1\nline 2\nline without newline'.splitlines(True)
['line 1\n', 'line 2\n', 'line without newline']
(Not a general solution, but adding this here in case someone comes here not realizing this method existed.)
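A quick sketch of why this is handy: with the line endings kept, joining the pieces gives back the original string exactly, even when the endings are mixed. The sample text is made up for illustration.

```python
text = 'line 1\r\nline 2\nline without newline'

# keepends=True is the same as the positional True used above.
lines = text.splitlines(keepends=True)
print(lines)
# ['line 1\r\n', 'line 2\n', 'line without newline']

# Nothing is lost, so the split can be undone with a plain join.
assert ''.join(lines) == text
```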
Solution 3:
Another example: split on non-alphanumeric characters and keep the separators.
import re
a = "foo,bar@candy*ice%cream"
re.split('([^a-zA-Z0-9])',a)
output:
['foo', ',', 'bar', '@', 'candy', '*', 'ice', '%', 'cream']
explanation
re.split('([^a-zA-Z0-9])',a)
() <- keep the separators
[] <- match any single character from the set inside the brackets
^a-zA-Z0-9 <- negate the set: anything except letters (upper/lower case) and digits
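A small variation, in case it helps: adding + after the character class groups a run of adjacent separators into one item, and filtering drops the empty string that re.split produces when a separator sits at the very start or end. The name split_keep_runs and the sample input are made up for this sketch.

```python
import re

def split_keep_runs(text):
    """Split on runs of non-alphanumeric characters, keeping each run."""
    # The + makes ', ' or '@@' come back as a single separator item.
    parts = re.split(r'([^a-zA-Z0-9]+)', text)
    # Drop empty strings caused by separators at the start or end.
    return [p for p in parts if p]

print(split_keep_runs('foo, bar@@candy!'))
# ['foo', ', ', 'bar', '@@', 'candy', '!']
```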
Solution 4:
If you have only 1 separator, you can employ list comprehensions:
text = 'foo,bar,baz,qux'
sep = ','
Appending/prepending separator:
result = [x + sep for x in text.split(sep)]
# ['foo,', 'bar,', 'baz,', 'qux,']
# to get rid of the trailing separator
result[-1] = result[-1].strip(sep)
# ['foo,', 'bar,', 'baz,', 'qux']
result = [sep + x for x in text.split(sep)]
# [',foo', ',bar', ',baz', ',qux']
# to get rid of the leading separator
result[0] = result[0].strip(sep)
# ['foo', ',bar', ',baz', ',qux']
Separator as its own element:
result = [u for x in text.split(sep) for u in (x, sep)]
# ['foo', ',', 'bar', ',', 'baz', ',', 'qux', ',']
result = result[:-1]  # to get rid of the trailing separator
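One caveat with the clean-up step: str.strip treats its argument as a set of characters, so with a multi-character separator it can remove more than the separator itself. Here is a sketch that avoids the clean-up altogether by never adding the extra separator in the first place. Both function names are made up for illustration, and sep is assumed to be non-empty (str.split raises ValueError otherwise).

```python
def split_append_sep(text, sep):
    """Keep the separator attached to the end of each piece except the last."""
    parts = text.split(sep)
    return [p + sep for p in parts[:-1]] + parts[-1:]

def split_interleave_sep(text, sep):
    """Return the separator as its own element between the pieces."""
    result = []
    for i, part in enumerate(text.split(sep)):
        if i:  # a separator goes before every piece except the first
            result.append(sep)
        result.append(part)
    return result

print(split_append_sep('foo,bar,baz,qux', ','))
# ['foo,', 'bar,', 'baz,', 'qux']
print(split_interleave_sep('foo--bar--baz', '--'))
# ['foo', '--', 'bar', '--', 'baz']
```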
Solution 5:
Another no-regex solution that works well on Python 3
# Split strings and keep separator
test_strings = ['<Hello>', 'Hi', '<Hi> <Planet>', '<', '']

def split_and_keep(s, sep):
    if not s: return ['']  # consistent with string.split()
    # Find replacement character that is not used in string
    # i.e. just use the highest available character plus one
    # Note: This fails if ord(max(s)) = 0x10FFFF (ValueError)
    p = chr(ord(max(s)) + 1)
    return s.replace(sep, sep + p).split(p)

for s in test_strings:
    print(split_and_keep(s, '<'))

# If the unicode limit is reached it will fail explicitly
unicode_max_char = chr(1114111)
ridiculous_string = '<Hello>' + unicode_max_char + '<World>'
print(split_and_keep(ridiculous_string, '<'))
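If the placeholder-character limit bothers you, here is a sketch of a different technique that gives the same pieces without a placeholder. Note that it does use regex, so it is not the approach of this answer: it splits on the empty position right after each separator with a lookbehind, which relies on re.split accepting empty matches (Python 3.7+). The name split_after_sep is made up for illustration, and sep is assumed to be a non-empty literal string.

```python
import re

def split_after_sep(s, sep):
    """Split s after every occurrence of sep, keeping sep on the left piece."""
    if not sep:
        raise ValueError('empty separator')
    # (?<=...) matches the zero-width position just after the separator;
    # re.escape keeps characters like '.' or '*' literal.
    return re.split(f'(?<={re.escape(sep)})', s)

for s in ['<Hello>', 'Hi', '<Hi> <Planet>', '<', '']:
    print(split_after_sep(s, '<'))
# ['<', 'Hello>']
# ['Hi']
# ['<', 'Hi> <', 'Planet>']
# ['<', '']
# ['']

# No placeholder is needed, so the highest code point is not a problem.
print(split_after_sep('<Hello>' + chr(0x10FFFF) + '<World>', '<'))
```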