Beautiful Soup Just Extract Header Of A Table

May 08, 2024

I want to extract information from the table on the following website using Beautiful Soup in Python 3.5: http://www.askapatient.com/viewrating.asp?drug=19839&name=ZOLOFT I h

Solution 1:

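This is caused by broken HTML on the page. Switch to a more lenient parser such as html5lib, which repairs malformed markup the same way a web browser does (it is installed separately with: pip install html5lib). Here is what works for me: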

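```python
from pprint import pprint

import requests
from bs4 import BeautifulSoup

url = "http://www.askapatient.com/viewrating.asp?drug=19839&name=ZOLOFT"
# Send a browser-like User-Agent; some sites reject the default one sent by requests
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36'}
response = requests.get(url, headers=headers)
response.raise_for_status()  # stop early on 4xx/5xx responses

# HTML parsing part: html5lib rebuilds the broken markup like a browser would
soup = BeautifulSoup(response.content, "html5lib")
table = soup.find("table", attrs={"class": "ratingsTable"})
if table is None:
    raise ValueError("ratingsTable not found; the page layout may have changed")

# One list of cell texts per <tr>; rows that contain only <th> cells yield []
comments = [[td.get_text() for td in row.find_all("td")]
            for row in table.find_all("tr")]
pprint(comments)
```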
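If you cannot be sure html5lib is installed, or you want the header cells and the data cells kept apart, the sketch below may help. It is not part of the original answer: make_soup and table_to_records are hypothetical helper names, the parser order is only a rough leniency ranking, and the table markup is made up for illustration. It assumes header cells are <th> and data cells are <td>.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Tried in order; html5lib and lxml are optional installs, html.parser ships with Python
PREFERRED_PARSERS = ("html5lib", "lxml", "html.parser")


def make_soup(markup):
    """Parse markup with the first installed parser from PREFERRED_PARSERS.

    Returns (soup, parser_name).
    """
    for parser in PREFERRED_PARSERS:
        try:
            return BeautifulSoup(markup, parser), parser
        except FeatureNotFound:
            continue  # this parser library is not installed; try the next one
    raise RuntimeError("no HTML parser available")


def table_to_records(table):
    """Return (header, records) for a <table>.

    The first row with <th> cells becomes the header; every row with <td>
    cells becomes a dict keyed by those header names.
    """
    header = []
    records = []
    # find_all is recursive, so rows inside an implicit <tbody> are found too
    for row in table.find_all("tr"):
        ths = row.find_all("th")
        tds = row.find_all("td")
        if ths and not header:
            header = [th.get_text(strip=True) for th in ths]
        elif tds:
            values = [td.get_text(strip=True) for td in tds]
            records.append(dict(zip(header, values)))
    return header, records


# Hypothetical sample markup standing in for the real page
SAMPLE = """
<table class="ratingsTable">
  <tr><th>Rating</th><th>Comment</th></tr>
  <tr><td>4</td><td>first comment</td></tr>
  <tr><td>2</td><td>second comment</td></tr>
</table>
"""

soup, parser = make_soup(SAMPLE)
header, records = table_to_records(soup.find("table", class_="ratingsTable"))
print(parser, header, records)
```

With the real page, you would pass response.content to make_soup instead of SAMPLE. Keeping the header separate from the rows also makes it obvious when only the header came through, which is the symptom a strict parser can produce on broken markup.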
