How Do I Get Lines Between Same Pattern Using Python Regex
Solution 1:
Unless this is an assignment where you must use regex, you should use vikramls's split()-based solution: it's over three times as fast as Avinash Raj's regex-based solution, and that's not including the time to import the re module.
Here are some timings done on a 2GHz Pentium 4, using Python 2.6.6.
$ timeit.py -n 100000 -s "import re;p=re.compile(r'(?<=abc).*?(?=abc)');s='abc123abcfndfabc1234drfabc'" "p.findall(s)"
100000 loops, best of 3: 6.32 usec per loop
$ timeit.py -n 100000 -s "p='abc';s='abc123abcfndfabc1234drfabc'" "s.split(p)"
100000 loops, best of 3: 2.03 usec per loop
A variation that discards the first and last members of the list is slightly slower, but still more than twice as fast as the regex.
$ timeit.py -n 100000 -s "p='abc';s='abc123abcfndfabc1234drfabc'" "s.split(p)[1:-1]"
100000 loops, best of 3: 2.49 usec per loop
And for completeness, here's vks's regex. The "'!'" quoting prevents the ! from triggering bash history expansion. (Alternatively, you can use set +o histexpand to turn history expansion off and set -o histexpand to turn it back on.)
$ timeit.py -n 100000 -s "import re;p=re.compile(r'(?<=abc)((?:(?"'!'"abc).)+)abc');s='abc123abcfndfabc1234drfabc'" "p.findall(s)"
100000 loops, best of 3: 6.67 usec per loop
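The same comparison can also be run from inside Python with the standard timeit module. The sketch below is an illustration, not the original benchmark: it times all four variants in one run on the same input string. Absolute numbers will differ from the Pentium 4 figures above, and wrapping each call in a lambda adds a small constant overhead to every entry.

```python
import re
import timeit

# Input string and patterns taken from the timeit.py runs above.
s = 'abc123abcfndfabc1234drfabc'
lookaround = re.compile(r'(?<=abc).*?(?=abc)')
tempered = re.compile(r'(?<=abc)((?:(?!abc).)+)abc')  # vks's pattern

candidates = [
    ('regex, lookaround', lambda: lookaround.findall(s)),
    ("regex, vks's pattern", lambda: tempered.findall(s)),
    ('split', lambda: s.split('abc')),
    ('split, trimmed', lambda: s.split('abc')[1:-1]),
]

NUMBER = 100000  # loops per measurement, as in the runs above

if __name__ == '__main__':
    for name, func in candidates:
        # Best of 3 repetitions, reported per loop in microseconds.
        best = min(timeit.repeat(func, number=NUMBER, repeat=3))
        print('%-22s %.2f usec per loop' % (name, best / NUMBER * 1e6))
```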
Solution 2:
Not using regex:
s = "abc123abcfndfabc1234drfabc"
print(', '.join(w for w in s.split('abc') if w))
Output:
123, fndf, 1234drf
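Filtering out empty pieces and the [1:-1] slice timed in Solution 1 agree on this input, but not in general. A minimal sketch, using two hypothetical helper names (between_filtered and between_trimmed), shows where they diverge: text before the first marker or after the last one, and two adjacent markers.

```python
def between_filtered(s, marker):
    # Hypothetical helper: keep every non-empty piece of the split
    # (the approach shown above).
    return [w for w in s.split(marker) if w]


def between_trimmed(s, marker):
    # Hypothetical helper: drop the piece before the first marker and the
    # piece after the last one (the s.split(p)[1:-1] variant from Solution 1).
    return s.split(marker)[1:-1]


s = "abc123abcfndfabc1234drfabc"
print(between_filtered(s, 'abc'))  # ['123', 'fndf', '1234drf']
print(between_trimmed(s, 'abc'))   # ['123', 'fndf', '1234drf']

# Text outside the markers: filtering keeps it, trimming drops it.
print(between_filtered('xyzabc123abc', 'abc'))  # ['xyz', '123']
print(between_trimmed('xyzabc123abc', 'abc'))   # ['123']

# Adjacent markers: filtering drops the empty piece, trimming keeps it.
print(between_filtered('abcabc', 'abc'))  # []
print(between_trimmed('abcabc', 'abc'))   # ['']
```

Which behavior is "right" depends on whether text outside the outermost markers should count.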
Solution 3:
(?<=abc)((?:(?!abc).)+)abc
Try this. Grab the capture. See the demo:
http://regex101.com/r/yP3iB0/17
import re
p = re.compile(r'(?<=abc)((?:(?!abc).)+)abc')
test_str = "abc123abcfndfabc1234drfabc"
# With one capture group, findall returns just the group: ['123', 'fndf', '1234drf']
print(p.findall(test_str))
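Because findall returns only the captured group here, the closing abc that the pattern consumes never shows up in the result; the lookbehind still succeeds at the start of the next match because it inspects text that has already been consumed. A small sketch with finditer makes the difference between group(0) and group(1) visible. It also shows one behavioral difference from the lookaround pattern in Solution 4: the + in (?:(?!abc).)+ requires at least one character, so two adjacent markers produce no match.

```python
import re

# Pattern from Solution 3: (?:(?!abc).) consumes one character only if
# "abc" does not start at that position (often called a tempered greedy token).
p = re.compile(r'(?<=abc)((?:(?!abc).)+)abc')
test_str = "abc123abcfndfabc1234drfabc"

for m in p.finditer(test_str):
    # group(0) includes the closing "abc"; group(1) is the text between markers.
    print(repr(m.group(0)), repr(m.group(1)))

pieces = [m.group(1) for m in p.finditer(test_str)]
print(pieces)  # ['123', 'fndf', '1234drf']

# The + needs at least one character, so adjacent markers yield nothing here,
# whereas the lookaround pattern from Solution 4 returns an empty string.
print(p.findall("abcabc"))                          # []
print(re.findall(r'(?<=abc).*?(?=abc)', "abcabc"))  # ['']
```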
Solution 4:
Use positive lookahead and lookbehind assertions, like this:
>>> import re
>>> s="abc123abcfndfabc1234drfabc"
>>> re.findall(r'(?<=abc).*?(?=abc)', s)
['123', 'fndf', '1234drf']
Explanation:
(?<=abc)
Positive lookbehind, which asserts that the text immediately preceding the match must be abc.
.*?
Non-greedy match of zero or more characters (any character except a newline).
(?=abc)
Positive lookahead, which asserts that the text immediately following the match must be abc.
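Two details of this pattern matter in practice. A minimal sketch on the question's string shows why the non-greedy .*? is needed (a greedy .* runs to the last abc and swallows the inner markers), and, assuming the text between markers may span several lines, that re.DOTALL is required because . does not match a newline by default. The multi-line sample string is made up for illustration.

```python
import re

s = 'abc123abcfndfabc1234drfabc'

# Non-greedy .*? stops at the first following "abc".
lazy = re.findall(r'(?<=abc).*?(?=abc)', s)
# Greedy .* runs to the last "abc", swallowing the markers in between.
greedy = re.findall(r'(?<=abc).*(?=abc)', s)
print(lazy)    # ['123', 'fndf', '1234drf']
print(greedy)  # ['123abcfndfabc1234drf']

# Hypothetical multi-line input: without re.DOTALL, "." stops at "\n".
text = 'abc\nline one\nabc\nline two\nabc'
plain = re.findall(r'(?<=abc).*?(?=abc)', text)
blocks = re.findall(r'(?<=abc).*?(?=abc)', text, re.DOTALL)
print(plain)                       # []
print([b.strip() for b in blocks])  # ['line one', 'line two']
```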