Python Split String By Spaces Except When In Quotes, But Keep The Quotes
Solution 1:
The basic tool for processing strings is the regular expression module (re).
Given the information you provided (which may be insufficient), the following code does the job:
import re
r = re.compile(r'(?! )[^[]+?(?= *\[)|\[.+?\]')
s1 = "Quantity [*,'EXTRA 05',*] [*,'EXTRA 09',*]"
print(r.findall(s1))
print('---------------')
s2 = ("'zug hug'Quantity boondoggle 'fish face monkey "
      "dung' [*,'EXTRA 05',*] [*,'EXTRA 09',*]")
print(r.findall(s2))
result
['Quantity', "[*,'EXTRA 05',*]", "[*,'EXTRA 09',*]"]
---------------
["'zug hug'Quantity boondoggle 'fish face monkey dung'", "[*,'EXTRA 05',*]", "[*,'EXTRA 09',*]"]
The regular expression pattern must be understood as follows:
| means OR, so the regex pattern is made of two partial REs:
(?! )[^[]+?(?= *\[)
and
\[.+?\]
The first partial RE:
The core is [^[]+
Brackets define a set of characters. Because the symbol ^ comes right after the opening bracket [, the set is defined as all the characters other than the ones that follow the ^.
Here, [^[] means any character that isn't an opening bracket [ and, since a + follows the set, [^[]+ means a sequence of one or more characters, none of which is an opening bracket.
Next, there is a question mark after [^[]+: it makes the + lazy, so the sequence caught is as short as possible and stops as soon as what comes after the question mark can match.
Here, what follows the ? is (?= *\[), a lookahead assertion composed of (?=...), which marks it as a positive lookahead assertion, and of ' *\[', the sequence in front of which the caught sequence must stop. ' *\[' means zero, one or more blanks followed by an opening bracket (the backslash \ is needed to cancel the meaning of [ as the opening of a set of characters).
There is also (?! ) in front of the core. It is a negative lookahead assertion: it makes this partial RE catch only sequences that do not begin with a blank, which avoids catching runs of blanks on their own. Remove this (?! ) and you'll see the effect.
The second partial RE:
\[.+?\] means: the opening bracket character [, then a sequence of characters caught by .+? (the dot matches any character except \n); this sequence must stop in front of the closing bracket character ], which is the last character caught.
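To see what each piece of the pattern contributes, here is a small sketch that runs the answer's pattern next to two altered copies of it: one without the (?! ) guard and one with a greedy .+ between the brackets. The variable names are only illustrative.

```python
import re

s = "Quantity [*,'EXTRA 05',*] [*,'EXTRA 09',*]"

# The pattern from the answer: two alternatives joined by |
full = re.compile(r"(?! )[^[]+?(?= *\[)|\[.+?\]")
# Same pattern with the negative lookahead (?! ) removed
no_guard = re.compile(r"[^[]+?(?= *\[)|\[.+?\]")
# Same pattern with a greedy .+ instead of the lazy .+? inside the brackets
greedy = re.compile(r"(?! )[^[]+?(?= *\[)|\[.+\]")

with_guard = full.findall(s)
without_guard = no_guard.findall(s)
greedy_result = greedy.findall(s)

print(with_guard)     # ['Quantity', "[*,'EXTRA 05',*]", "[*,'EXTRA 09',*]"]
print(without_guard)  # ['Quantity', ' ', "[*,'EXTRA 05',*]", ' ', "[*,'EXTRA 09',*]"]
print(greedy_result)  # ['Quantity', "[*,'EXTRA 05',*] [*,'EXTRA 09',*]"]
```

Without the guard, each blank in front of a bracket is caught as a match of its own; with the greedy .+, the two bracketed groups are swallowed as a single match.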
EDIT
import re
string = "Quantity [*,'EXTRA 05',*] [*,'EXTRA 09',*]"
print(re.split(r' (?=\[)', string))
result
['Quantity', "[*,'EXTRA 05',*]", "[*,'EXTRA 09',*]"]
Solution 2:
A warning for picky people: this algorithm WON'T correctly split every string you pass through it, only strings shaped like these:
"Quantity [*,'EXTRA 05',*] [*,'EXTRA 09',*]"
"Quantity [*,'EXTRA 05',*]"
"Quantity [*,'EXTRA 05',*] [*,'EXTRA 10',*] [*,'EXTRA 07',*] [*,'EXTRA 09',*]"
string = "Quantity [*,'EXTRA 05',*] [*,'EXTRA 09',*]"
parts = string.split(" ")
splitted_string = []
# This adds "Quantity" at position 0 of splitted_string
splitted_string.append(parts[0])
# The loop goes from 1 to the length of parts, increasing x by 2:
# in the first iteration x is 1 and x+1 is 2, in the second x=3 and x+1=4, etc.
# The first iteration concatenates "[*,'EXTRA" and "05',*]" into one string
# The second iteration concatenates "[*,'EXTRA" and "09',*]" into one string
# If the string is longer, it still works
for x in range(1, len(parts), 2):
    splitted_string.append("%s %s" % (parts[x], parts[x+1]))
When I execute the code, splitted_string ends up containing:
['Quantity', "[*,'EXTRA 05',*]", "[*,'EXTRA 09',*]"]
splitted_string[0] = 'Quantity'
splitted_string[1] = "[*,'EXTRA 05',*]"
splitted_string[2] = "[*,'EXTRA 09',*]"
I think that is exactly what you're looking for. If I'm wrong, or if you need some explanation of the code, let me know. I hope it helps.
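The same pairing idea can be wrapped in a function that refuses input it cannot handle instead of failing with an IndexError. This is only a sketch: the name split_pairs and the ValueError check are my own additions, not part of the answer above.

```python
def split_pairs(text):
    """Split 'Quantity [..A B..] [..C D..]' into the first word plus re-joined pairs.

    Assumes exactly one space inside each bracketed group, as in the sample strings.
    """
    parts = text.split(" ")
    head, rest = parts[0], parts[1:]
    if len(rest) % 2 != 0:
        # An odd number of fragments means the input does not have the expected shape
        raise ValueError("expected an even number of fragments after the first word")
    # Glue fragments back together two by two
    return [head] + [rest[i] + " " + rest[i + 1] for i in range(0, len(rest), 2)]


print(split_pairs("Quantity [*,'EXTRA 05',*] [*,'EXTRA 09',*]"))
print(split_pairs("Quantity [*,'EXTRA 05',*]"))
```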
Solution 3:
Assuming you want a general solution for splitting at spaces but not at spaces inside quotations: I don't know of any Python library that does this, but that doesn't mean there isn't one.
In the absence of a known pre-rolled solution I would simply roll my own. It's relatively easy to scan a string looking for spaces and then use Python slicing to divide the string into the parts you want. To ignore spaces in quotes, you can keep a flag that flips whenever a quote symbol is encountered, switching the space detection off and on.
This is some code I knocked up to do this; it is not extensively tested:
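def spaceSplit(string):
    last = 0          # start index of the piece currently being scanned
    splits = []
    inQuote = None    # the quote character we are inside of, if any
    for i, letter in enumerate(string):
        if inQuote:
            # Only the same quote character closes the quotation
            if letter == inQuote:
                inQuote = None
        else:
            if letter == '"' or letter == "'":
                inQuote = letter
        # Split only at spaces that are outside quotes
        if not inQuote and letter == ' ':
            splits.append(string[last:i])
            last = i + 1
    # Add whatever remains after the last split point
    if last < len(string):
        splits.append(string[last:])
    return splits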
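As an aside to the hand-rolled scanner above: for the plain case of quoted words separated by spaces, the standard library's shlex module in non-POSIX mode is one pre-rolled option, since that mode keeps the surrounding quotes on each token. A minimal sketch (the wrapper name is just for illustration):

```python
import shlex


def shlex_split_keep_quotes(text):
    # posix=False leaves the enclosing quote characters on each quoted token
    return shlex.split(text, posix=False)


tokens = shlex_split_keep_quotes('This is "a test" to your split "string with quotes"')
print(tokens)  # ['This', 'is', '"a test"', 'to', 'your', 'split', '"string with quotes"']
```

As far as I can tell from the shlex documentation, non-POSIX mode does not recognize quotes that appear in the middle of a word, so a token like [*,'EXTRA 05',*] would still be split at its inner space. For that kind of input the scanner above or a regex is the safer choice.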
Solution 4:
Try this
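def parseString(inputString):
    output = inputString.split()
    res = []
    count = 0   # number of quote marks seen so far; odd means we are inside a quote
    temp = []   # words of the quoted phrase being collected
    for word in output:
        if word.startswith('"') and count % 2 == 0:
            if len(word) > 1 and word.endswith('"'):
                # A single quoted word such as "x" is already complete
                res.append(word)
            else:
                temp.append(word)
                count += 1
        elif count % 2 == 1 and not word.endswith('"'):
            temp.append(word)
        elif word.endswith('"'):
            # Closing word: join the collected words back into one phrase
            temp.append(word)
            count += 1
            tempWord = ' '.join(temp)
            res.append(tempWord)
            temp = []
        else:
            res.append(word)
    print(res)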
Input:
parseString('This is "a test" to your split "string with quotes"')
Output: ['This', 'is', '"a test"', 'to', 'your', 'split', '"string with quotes"']
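The word-by-word bookkeeping above can also be written as a single regular expression: try to match a whole quoted phrase first, and fall back to a run of non-space characters. This is a sketch under the assumption that quotes are balanced and are not escaped inside a phrase; the function name is hypothetical.

```python
import re

# A double-quoted phrase, or a single-quoted phrase, or a run of non-space characters
TOKEN = re.compile(r'''"[^"]*"|'[^']*'|\S+''')


def split_keep_quotes(text):
    # findall returns the matches in order, with the quotes still attached
    return TOKEN.findall(text)


print(split_keep_quotes('This is "a test" to your split "string with quotes"'))
# ['This', 'is', '"a test"', 'to', 'your', 'split', '"string with quotes"']
```

Note that a quote which starts in the middle of a word (for example abc"d e") is not treated as opening a phrase here, because \S+ has already begun matching at that point.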