Trying To Scrape Multiple Urls From One Page

November 21, 2023

I am trying to scrape the info from the election results in 18 NI constituencies here: http://www.eoni.org.uk/Elections/Election-results-and-statistics/Election-results-and-statist

Solution 1:

This seems to do the job.

I've removed some imports and other code that isn't needed here; just re-add them if you need them elsewhere.

The error message was due to trying to do a regex comparison on a soup object; it needs to be cast to a string first (the same problem as discussed in the link @Huzefa posted, so that was definitely relevant).
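To illustrate that point, here is a minimal sketch, assuming a made-up one-link HTML snippet rather than the real page. Passing the BeautifulSoup tag straight to re.match raises a TypeError, while passing str(tag) works:

```python
import re
from bs4 import BeautifulSoup

# Hypothetical snippet for illustration only; not taken from the real page.
html = '<a href="/Elections/Elections-2019/Example">Example</a>'
tag = BeautifulSoup(html, "html.parser").a

try:
    # re.match expects a str (or bytes), not a bs4 Tag object
    re.match("<a href", tag)
    raised_type_error = False
except TypeError:
    raised_type_error = True

# Casting the tag to str gives re.match the serialized markup to work on
matched_after_cast = re.match("<a href", str(tag)) is not None
```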

Fixing that still left the issue of isolating the correct strings. I've simplified the regex used for matching, then split the matched tag's string on the " character and taken the second item from the result, which is our URL.

import re

import requests
from bs4 import BeautifulSoup

url = 'http://www.eoni.org.uk/Elections/Election-results-and-statistics/Election-results-and-statistics-2003-onwards/Elections-2019/UK-Parliamentary-Election-2019-Results'
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")

# Match anchor tags whose markup starts with an href into the Elections-2019 section
re_pattern = "<a href=\".*/Elections-2019/.*"
link_list = []
for a in soup('a'):
    # re.match needs a string, so cast the tag with str() first
    if a.has_attr('href') and re.match(re_pattern, str(a)):
        # Splitting on double quotes puts the href value at index 1
        link_list.append(str(a).split('"')[1])
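One caveat with matching on the serialized tag: the pattern only matches when href is the first attribute in the anchor. As a possible alternative (not the approach used above), here is a sketch that filters on the href value itself and resolves relative links with urljoin. The function name extract_links, the sample HTML, and its paths are hypothetical, chosen only for illustration:

```python
import re
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_links(html, base_url, pattern=r"/Elections-2019/"):
    """Return absolute URLs of anchors whose href contains the pattern."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    # Passing a compiled regex as href makes find_all search each href value
    for a in soup.find_all("a", href=re.compile(pattern)):
        # urljoin turns relative hrefs into absolute URLs
        links.append(urljoin(base_url, a["href"]))
    return links


# Made-up sample markup; the real page structure may differ.
sample_html = """
<a class="nav" href="/Elections/Elections-2019/Example-Constituency">Example</a>
<a href="/About-us">About</a>
<a>no href here</a>
"""
links = extract_links(sample_html, "http://www.eoni.org.uk/")
```

With a real page, the same function could be called on response.text and the page URL instead of the sample markup.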

Hope it fits your purpose, ask if anything is unclear.
