
Why Can't The Underscore Be Matched By '\W'?

I know that _ cannot be matched by \W while any other punctuation can. As the docs state: \w is a set of alphanumeric characters and the underscore. At the same time: I have alway

Solution 1:

Much of the regular expression syntax in Python's re module comes from Perl, which was itself influenced by sed and awk. \w comes from that lineage and has a long history.


In the original regex module (which was deprecated in Python 1.5), \w did not include _, as is evident from the Python 1.4 documentation:

\w

Matches any alphanumeric character; this is equivalent to the set [a-zA-Z0-9].


P.S. While it is not very convenient, you can match every non-\w character plus _ with the character class [\W_].
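A minimal sketch of the difference, using the current re module. The sample strings and variable names are made up for illustration:

```python
import re

sample = "a_b-c"

# \W alone skips the underscore, because _ belongs to \w.
non_word = re.findall(r"\W", sample)
print(non_word)          # ['-']

# [\W_] matches any non-word character and also the underscore.
with_underscore = re.findall(r"[\W_]", sample)
print(with_underscore)   # ['_', '-']

# Typical use: split on runs of punctuation, whitespace, or underscores.
text = "snake_case, kebab-case; dot.case"
words = re.split(r"[\W_]+", text)
print(words)             # ['snake', 'case', 'kebab', 'case', 'dot', 'case']
```

Inside a character class, \W keeps its meaning, so [\W_] is simply the union of "not a word character" and the literal underscore.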
