Python Regex Can't Find Substring But It Should

May 08, 2024

I am trying to parse HTML with BeautifulSoup to extract the webpage title. Sometimes this does not work because the website is badly written, for example when it contains a bad end tag. When

Solution 1:

Use the re.DOTALL flag so that . matches newline characters as well. A raw string avoids the unnecessary backslash escapes before < and >.

import re
result = re.search(r'<title>(.+?)</title>', html, re.DOTALL)

As the documentation says:

...without this flag, '.' will match anything except a newline
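
To illustrate the difference, here is a minimal sketch. The sample html string is made up for this example and is not from the original question; it only assumes that the title text spans several lines.

```python
import re

# Hypothetical sample page: the title text spans several lines.
html = "<html><head><title>\nExample\nPage\n</title></head></html>"

# Without re.DOTALL, '.' cannot match '\n', so the search finds nothing.
no_flag = re.search(r"<title>(.+?)</title>", html)

# With re.DOTALL, '.' also matches '\n', so the multi-line title is captured.
with_flag = re.search(r"<title>(.+?)</title>", html, re.DOTALL)

# Guard against a missing match before calling group().
title = with_flag.group(1).strip() if with_flag else None
```

Checking the result for None before calling group() avoids an AttributeError on pages that have no title at all.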

Solution 2:

If you want to grab the text between the <title> and </title> tags, you can use this regexp:

pattern = r"<title>([^<]+)</title>"

re.findall(pattern, html_string)
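
As a rough sketch of how this pattern behaves, the snippet below runs it on a made-up html_string (an assumption for illustration, not data from the question). Because [^<] matches any character except <, including newlines, no flag is needed here.

```python
import re

# Hypothetical input: malformed markup with a title that spans two lines.
html_string = "<html><title>My\nPage</title><body><p>unclosed"

pattern = r"<title>([^<]+)</title>"

# [^<] matches any character except '<', newlines included.
titles = re.findall(pattern, html_string)

# findall returns a list of captured groups; take the first one if present.
first_title = titles[0].strip() if titles else None
```

Note that this pattern will not match an empty title, a title whose text itself contains <, or a <title> tag that carries attributes.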
