String Substitutions Based On The Matching Object (python)
Solution 1:
Despite Wiktor's truly Pythonic answer, there is still the question of why the OP's original algorithm wouldn't work. Basically, there are two issues:
1. The call new_content = re.sub(regex, repl_func(mobj), content) substitutes all matches of regex with the replacement value of the very first match. The correct call is new_content = re.sub(regex, repl_func, content). As documented for re.sub, when the replacement is a function, it is invoked for each match with the current match object.
2. repl_func(mobj) does some unnecessary exception handling, which can be simplified:
my_dict = {'\n': '', '+': 'rep1', '*': 'rep2', '/': 'rep3', '-': 'rep4'}

def repl_func(mobj):
    # look up the matched text; unmapped matches are replaced by ''
    return my_dict.get(mobj.group(0), '')
This is equivalent to Wiktor's solution - he just got rid of the function definition itself by using a lambda expression.
With this modification, the for mobj in re.finditer(regex, content): loop has become superfluous, as it does the same calculation multiple times.
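To make the first issue concrete, here is a small sketch contrasting the two calls. The sample content string and the name first_match are assumptions made for illustration; they are not taken from the OP's code.

```python
import re

regex = r'[\n+*/-]'
my_dict = {'\n': '', '+': 'rep1', '*': 'rep2', '/': 'rep3', '-': 'rep4'}

def repl_func(mobj):
    return my_dict.get(mobj.group(0), '')

content = "A*B+C-D/E"  # sample input, assumed for illustration

# Wrong: repl_func(first_match) is evaluated once, before re.sub runs,
# so re.sub receives a plain string and uses it for every match.
first_match = re.search(regex, content)
wrong = re.sub(regex, repl_func(first_match), content)

# Right: pass the function itself; re.sub calls it once per match.
right = re.sub(regex, repl_func, content)

print(wrong)  # Arep2Brep2Crep2Drep2E
print(right)  # Arep2Brep1Crep4Drep3E
```

In the wrong call the replacement is fixed before any substitution happens, which is why every operator ends up as rep2.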
Just for the sake of completeness, here is a working solution using re.finditer(). It builds the result string from the matched slices of content:
import re

my_regx = r'[\n+*/-]'
my_dict = {'\n': '', '+': 'rep1', '*': 'rep2', '/': 'rep3', '-': 'rep4'}
content = "A*B+C-D/E"
res = ""
cbeg = 0
for mobj in re.finditer(my_regx, content):
    # get matched string and its slice indexes
    mstr = mobj.group(0)
    mbeg = mobj.start()
    mend = mobj.end()
    # replace matched string
    mrep = my_dict.get(mstr, '')
    # append non-matched part of content plus replacement
    res += content[cbeg:mbeg] + mrep
    # set new start index of remaining slice
    cbeg = mend
# finally add remaining non-matched slice
res += content[cbeg:]
print(res)
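As a variation on the loop above, the same slicing idea can be wrapped in a function that collects the pieces in a list and joins them once at the end. This is only a sketch: the function name substitute and its parameter names are hypothetical and do not appear in the original answer.

```python
import re

def substitute(pattern, mapping, text):
    """Replace each match of pattern in text via mapping ('' if unmapped)."""
    pieces = []
    pos = 0  # start of the not-yet-copied part of text
    for mobj in re.finditer(pattern, text):
        pieces.append(text[pos:mobj.start()])          # unmatched slice
        pieces.append(mapping.get(mobj.group(0), ''))  # replacement
        pos = mobj.end()
    pieces.append(text[pos:])  # trailing unmatched slice
    return ''.join(pieces)

my_regx = r'[\n+*/-]'
my_dict = {'\n': '', '+': 'rep1', '*': 'rep2', '/': 'rep3', '-': 'rep4'}
result = substitute(my_regx, my_dict, "A*B+C-D/E")
print(result)  # Arep2Brep1Crep4Drep3E
```

For this input it gives the same string as re.sub(my_regx, repl_func, content), which remains the simpler choice.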
Solution 2:
The r'[+\-*/]' regex does not match a newline, so your '\n': 'rep2' entry would never be used. To handle newlines as well, add \n to the regex: r'[\n+*/-]'.
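A quick way to check which characters each character class picks up is re.findall. The sample string below is an assumption chosen for illustration.

```python
import re

sample = "a+b\nc-d"  # assumed sample text containing a newline

without_newline = re.findall(r'[+\-*/]', sample)
with_newline = re.findall(r'[\n+*/-]', sample)

print(without_newline)  # ['+', '-']
print(with_newline)     # ['+', '\n', '-']
```

Only the second pattern reports the newline, so only that one can trigger the '\n' entry of the dictionary.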
Next, you get None because your regex does not contain any named capturing groups; see the re docs:
match.lastgroup
The name of the last matched capturing group, or None if the group didn't have a name, or if no group was matched at all.
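The behaviour of match.lastgroup can be seen in a short sketch. The group names plus and minus, the by_name dictionary, and the sample strings are hypothetical, introduced only for this illustration.

```python
import re

# No named groups: lastgroup is None, as in the OP's case.
unnamed = re.search(r'[+\-*/]', "a+b")
print(unnamed.lastgroup)  # None

# With named groups, lastgroup is the name of the group that matched.
named_regex = r'(?P<plus>\+)|(?P<minus>-)'
named = re.search(named_regex, "a+b")
print(named.lastgroup)  # plus

# A dictionary keyed by group name then works with lastgroup.
by_name = {'plus': 'rep1', 'minus': 'rep4'}
result = re.sub(named_regex, lambda m: by_name[m.lastgroup], "a+b-c")
print(result)  # arep1brep4c
```

So a lookup keyed on lastgroup only makes sense if every alternative in the pattern is a named group.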
To replace using the match, you do not even need re.finditer; use re.sub with a lambda as the replacement:
import re
content = '''
Blah - blah \n blah * blah + blah.
'''
regex = r'[\n+*/-]'
my_dict = { '+': 'rep1', '\n': 'rep2'}
new_content = re.sub(regex, lambda m: my_dict.get(m.group(),""), content)
print(new_content)
# => rep2Blah  blah rep2 blah  blah rep1 blah.rep2
The m.group() call gets the whole match (the whole match is stored in match.group(0)). If you had a pair of unescaped parentheses in the pattern, it would create a capturing group, and you could access the first one with m.group(1), etc.
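To illustrate the difference between the whole match and capturing groups, here is a minimal sketch. The pattern and input are hypothetical and are not part of the question.

```python
import re

# Two pairs of unescaped parentheses create two capturing groups.
pattern = r'([+*])(\d)'  # hypothetical pattern: an operator, then a digit
m = re.search(pattern, "x+1")

print(m.group())   # +1  (whole match, same as m.group(0))
print(m.group(0))  # +1
print(m.group(1))  # +   (first capturing group)
print(m.group(2))  # 1   (second capturing group)
```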