
String Substitutions Based On The Matching Object (python)

I struggle to understand the group method in Python's regular expressions library. In this context, I am trying to do substitutions on a string depending on the match object. That is,

Solution 1:

Despite Wiktor's truly pythonic answer, there is still the question of why the OP's original algorithm wouldn't work. Basically, there are two issues:

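The call new_content = re.sub(regex, repl_func(mobj), content) substitutes all matches of regex with the replacement value of the very first match, because repl_func(mobj) is evaluated once, before re.sub even runs, and re.sub only ever sees its return value as a fixed string.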

The correct call is new_content = re.sub(regex, repl_func, content). As documented for re.sub, when the replacement is a function, it is called for every non-overlapping match, with the current match object as its only argument.
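A small sketch of the difference between the two calls, assuming a sample string "A*B+C-D/E" and using the simplified dictionary lookup for repl_func. Here re.search stands in for "the first match" that the original loop would see on its first iteration:

```python
import re

my_dict = {'+': 'rep1', '*': 'rep2', '/': 'rep3', '-': 'rep4'}
regex = r'[+*/-]'
content = "A*B+C-D/E"

def repl_func(mobj):
    # Look up the replacement for whatever this particular match found
    return my_dict.get(mobj.group(0), '')

# Buggy: repl_func runs once, up front, on the first match only ('*'),
# so its result 'rep2' becomes a fixed replacement string for every match.
first = re.search(regex, content)
buggy = re.sub(regex, repl_func(first), content)

# Correct: pass the function itself; re.sub calls it once per match.
fixed = re.sub(regex, repl_func, content)

print(buggy)  # Arep2Brep2Crep2Drep2E
print(fixed)  # Arep2Brep1Crep4Drep3E
```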

repl_func(mobj) does some unnecessary exception handling, which can be simplified:

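import re

my_dict = {'\n': '', '+': 'rep1', '*': 'rep2', '/': 'rep3', '-': 'rep4'}
def repl_func(mobj):
    # my_dict is only read here, so no "global" declaration is needed;
    # .get() returns '' for any matched character without an entry
    return my_dict.get(mobj.group(0), '')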

This is equivalent to Wiktor's solution; he just got rid of the function definition itself by using a lambda expression.

With this modification, the for mobj in re.finditer(regex, content): loop becomes superfluous: re.sub already visits every match, so the loop just repeats the same substitution once per match.

Just for the sake of completeness, here is a working solution using re.finditer(). It builds the result string from the non-matched slices of content plus the replacement for each match:

import re

my_regx = r'[\n+*/-]'
my_dict = {'\n': '', '+': 'rep1', '*': 'rep2', '/': 'rep3', '-': 'rep4'}
content = "A*B+C-D/E"
res = ""
cbeg = 0
for mobj in re.finditer(my_regx, content):
    # get matched string and its slice indexes
    mstr = mobj.group(0)
    mbeg = mobj.start()
    mend = mobj.end()

    # look up the replacement for the matched string
    mrep = my_dict.get(mstr, '')

    # append non-matched part of content plus replacement
    res += content[cbeg:mbeg] + mrep

    # set new start index of remaining slice
    cbeg = mend

# finally add remaining non-matched slice
res += content[cbeg:]
print(res)  # => Arep2Brep1Crep4Drep3E
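The same finditer approach can be wrapped in a reusable function and checked against re.sub. This is a sketch; sub_with_finditer is a hypothetical helper name, and it collects the pieces in a list and joins them once, which is the usual idiom instead of repeated string concatenation:

```python
import re

def sub_with_finditer(pattern, mapping, text):
    """Hypothetical helper: rebuild text from the gaps between matches
    plus one looked-up replacement per match."""
    parts = []
    last_end = 0
    for m in re.finditer(pattern, text):
        parts.append(text[last_end:m.start()])     # unmatched slice before this match
        parts.append(mapping.get(m.group(0), ''))  # replacement for this match
        last_end = m.end()
    parts.append(text[last_end:])                  # tail after the last match
    return ''.join(parts)

my_regx = r'[\n+*/-]'
my_dict = {'\n': '', '+': 'rep1', '*': 'rep2', '/': 'rep3', '-': 'rep4'}
content = "A*B+C-D/E\n"

manual = sub_with_finditer(my_regx, my_dict, content)
builtin = re.sub(my_regx, lambda m: my_dict.get(m.group(0), ''), content)
print(manual)             # Arep2Brep1Crep4Drep3E
print(manual == builtin)  # True
```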

Solution 2:

The r'[+\-*/]' regex does not match a newline, so your '\n': 'rep2' entry would never be used. If you need it, add \n to the character class: r'[\n+*/-]'.
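A quick way to see this, assuming a small sample string that contains both a "+" and a newline:

```python
import re

text = "a+b\nc"
# Character class without \n: the newline is never matched
without_nl = re.findall(r'[+\-*/]', text)
# Character class with \n added (the trailing '-' is a literal hyphen)
with_nl = re.findall(r'[\n+*/-]', text)
print(without_nl)  # ['+']
print(with_nl)     # ['+', '\n']
```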

Next, you get None from match.lastgroup because your regex does not contain any named capturing groups. From the re docs:

match.lastgroup: The name of the last matched capturing group, or None if the group didn’t have a name, or if no group was matched at all.
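If you do want lastgroup to be useful, the pattern needs named groups. A minimal sketch, assuming one hypothetical group name per operator so that the alternative which matched tells you which replacement to use:

```python
import re

# Hypothetical group names: only one alternative participates in each match,
# so m.lastgroup is the name of that alternative's group
pattern = r'(?P<plus>\+)|(?P<star>\*)|(?P<newline>\n)'
by_name = {'plus': 'rep1', 'star': 'rep2', 'newline': 'rep3'}

unnamed = re.search(r'[+*\n]', "a+b")  # no groups at all
named = re.search(pattern, "a+b")
print(unnamed.lastgroup)  # None
print(named.lastgroup)    # plus

result = re.sub(pattern, lambda m: by_name[m.lastgroup], "a+b*c\n")
print(result)  # arep1brep2crep3
```

In practice the plain dictionary lookup on m.group() is simpler; named groups pay off when the alternatives are longer patterns rather than single characters.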

To replace based on the match, you do not even need re.finditer; use re.sub with a lambda as the replacement:

import re
content = '''
Blah - blah \n blah * blah + blah.
'''

regex = r'[\n+*/-]'
my_dict = { '+': 'rep1', '\n': 'rep2'}
new_content = re.sub(regex, lambda m: my_dict.get(m.group(),""), content)
print(new_content)
# => rep2Blah  blah rep2 blah  blah rep1 blah.rep2


m.group() with no argument returns the whole match, which is the same as m.group(0). If you had a pair of unescaped parentheses in the pattern, it would create a capturing group, and you could access the first one with m.group(1), the second with m.group(2), and so on.
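A short illustration of whole-match versus capturing-group access, assuming a made-up pattern with two capturing groups (a word, then an operator):

```python
import re

m = re.search(r'(\w+)\s*([+*/-])', "total + 5")
print(m.group())   # total +   (whole match)
print(m.group(0))  # total +   (same as m.group())
print(m.group(1))  # total     (first capturing group)
print(m.group(2))  # +         (second capturing group)
```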
