How Do I Get Lines Between Same Pattern Using Python Regex

September 28, 2022

I have a string s as follows: s='abc123abcfndfabc1234drfabc'. I want to extract the substrings that occur between occurrences of 'abc'. In this case the output should be: 123, fndf, 1234drf

Solution 1:

Unless this is an assignment where you must use regex, you should use vikramls's split()-based solution (Solution 2 below): it's over three times as fast as Avinash Raj's regex-based solution (Solution 4 below), and that's not including the time to import the re module.

Here are some timings done on a 2GHz Pentium 4, using Python 2.6.6.

$ timeit.py -n 100000 -s "import re;p=re.compile(r'(?<=abc).*?(?=abc)');s='abc123abcfndfabc1234drfabc'" "p.findall(s)"

100000 loops, best of 3: 6.32 usec per loop

$ timeit.py -n 100000 -s "p='abc';s='abc123abcfndfabc1234drfabc'" "s.split(p)"

100000 loops, best of 3: 2.03 usec per loop

A variation of the above that discards the initial and final members of the list is slightly slower, but still more than twice as fast as the regex.

$ timeit.py -n 100000 -s "p='abc';s='abc123abcfndfabc1234drfabc'" "s.split(p)[1:-1]"

100000 loops, best of 3: 2.49 usec per loop

And for completeness, here's vks's regex (Solution 3 below). The "'!'" quoting prevents the ! from invoking bash history expansion. (Alternatively, you can use set +o histexpand to turn history expansion off and set -o histexpand to turn it back on.)

$ timeit.py -n 100000 -s "import re;p=re.compile(r'(?<=abc)((?:(?"'!'"abc).)+)abc');s='abc123abcfndfabc1234drfabc'" "p.findall(s)"

100000 loops, best of 3: 6.67 usec per loop
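The same comparison can be reproduced from inside Python with the standard timeit module instead of the command line. This is only a sketch: the helper name best_usec and the labels in the candidates list are made up for illustration, and the absolute numbers will differ from the Pentium 4 figures above depending on hardware and Python version.

```python
import re
import timeit

s = 'abc123abcfndfabc1234drfabc'
sep = 'abc'

# The two regexes timed above, compiled once outside the timed code.
lookaround = re.compile(r'(?<=abc).*?(?=abc)')
tempered = re.compile(r'(?<=abc)((?:(?!abc).)+)abc')

# (label, callable) pairs; the labels are illustrative only.
candidates = [
    ('lookaround regex', lambda: lookaround.findall(s)),
    ('split', lambda: s.split(sep)),
    ('split + slice', lambda: s.split(sep)[1:-1]),
    ('tempered regex', lambda: tempered.findall(s)),
]


def best_usec(func, number=100000, repeat=3):
    """Return the best per-call time in microseconds over `repeat` runs."""
    best = min(timeit.repeat(func, number=number, repeat=repeat))
    return best / number * 1e6


if __name__ == '__main__':
    for name, func in candidates:
        print('%-18s %.2f usec per loop' % (name, best_usec(func)))
```

Taking the minimum of the repeats mirrors the "best of 3" figure that timeit.py reports.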

Solution 2:

Not using regex:

s = "abc123abcfndfabc1234drfabc"
# Split on the delimiter and drop empty pieces (here, the ones at both ends).
print(', '.join(w for w in s.split('abc') if w))

Output:

123, fndf, 1234drf
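One caveat: filtering out empty strings only works when the string starts and ends with the delimiter, as it does in the question. If there is text before the first 'abc' or after the last one, that text is kept as well, even though it is not between two delimiters. A minimal sketch that slices off the first and last pieces instead (the function name between is made up for illustration):

```python
def between(s, sep):
    """Return the non-empty pieces of s enclosed by two occurrences of sep."""
    parts = s.split(sep)
    # parts[0] precedes the first sep and parts[-1] follows the last one,
    # so neither is enclosed on both sides; keep only the middle pieces.
    return [p for p in parts[1:-1] if p]


print(between('abc123abcfndfabc1234drfabc', 'abc'))  # ['123', 'fndf', '1234drf']
print(between('xabc123abcy', 'abc'))                 # ['123']
```

With the second input, the filter-only version would also return 'x' and 'y'.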

Solution 3:

(?<=abc)((?:(?!abc).)+)abc

Try this and grab the capture group. See the demo:

http://regex101.com/r/yP3iB0/17

import re
# Group 1 captures the text between delimiters; findall returns that group.
p = re.compile(r'(?<=abc)((?:(?!abc).)+)abc')
test_str = u"abc123abcfndfabc1234drfabc"

print(re.findall(p, test_str))

Solution 4:

Use positive lookbehind and lookahead assertions, as below.

>>> import re
>>> s="abc123abcfndfabc1234drfabc"
>>> re.findall(r'(?<=abc).*?(?=abc)', s)
['123', 'fndf', '1234drf']


Explanation:

(?<=abc) Positive lookbehind, which asserts that the match must be immediately preceded by abc.
.*? Non-greedy match of zero or more characters.
(?=abc) Positive lookahead, which asserts that the match must be immediately followed by abc.
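The reason the assertions matter is that they do not consume the delimiter, so the 'abc' that ends one match can also start the next. The sketch below contrasts this with a plain capturing pattern that consumes both delimiters and therefore skips every other piece. The helper between_regex is a hypothetical name, added only to show how re.escape lets the same idea work for a delimiter that contains regex metacharacters.

```python
import re

s = 'abc123abcfndfabc1234drfabc'

# Consuming pattern: each match swallows its closing 'abc', so the next
# match cannot reuse it as an opening delimiter and 'fndf' is skipped.
consuming = re.findall(r'abc(.*?)abc', s)

# Lookaround pattern: the delimiters are only asserted, never consumed.
lookaround = re.findall(r'(?<=abc).*?(?=abc)', s)

print(consuming)   # ['123', '1234drf']
print(lookaround)  # ['123', 'fndf', '1234drf']


def between_regex(text, sep):
    """Lookaround-based extraction for an arbitrary literal delimiter."""
    d = re.escape(sep)  # treat the delimiter literally, e.g. '.' stays a dot
    return re.findall('(?<=%s).*?(?=%s)' % (d, d), text)


print(between_regex('a.b1a.b2a.b', 'a.b'))  # ['1', '2']
```

Note that . does not match a newline by default, so pieces that span multiple lines would need the re.DOTALL flag.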
