Trying To Print Href Tags From A Site And Getting Weird Results
I'm trying to print the href attributes of the links on the page below. Here's my first attempt.
# the Python 3 version:
from bs4 import BeautifulSoup
import urllib.request
resp = urllib.request.urlopen
Solution 1:
LinkedIn loads its data asynchronously. If you view the page source (Ctrl + U on Windows) for the URL you're fetching, you won't find the results you expect, because JavaScript loads them after the base page has already loaded.
BeautifulSoup only parses the HTML it is given; it won't execute the JavaScript on the page that fetches that data.
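A minimal sketch of the effect, using the standard library's html.parser in place of BeautifulSoup so it runs without extra packages. The HTML string is a made-up stand-in for what a server might send before any script runs, and the names HrefCollector, STATIC_HTML, and static_hrefs are hypothetical:

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collects href values from <a> tags present in the static markup."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


# Hypothetical stand-in for the HTML the server sends before any script runs.
STATIC_HTML = """
<html><body>
  <a href="/login">Sign in</a>
  <div id="results"></div>
  <script>
    // In a browser this would add the result links after page load.
    document.getElementById("results").innerHTML =
      '<a href="/in/example-profile">Example</a>';
  </script>
</body></html>
"""


def static_hrefs(html):
    """Return the hrefs a static parser sees; script bodies are never executed."""
    parser = HrefCollector()
    parser.feed(html)
    parser.close()
    return parser.hrefs


print(static_hrefs(STATIC_HTML))
```

Only the sign-in link is reported. The link inside the script block is plain text to the parser, which is the same reason a static fetch of a script-driven page comes back without the search results.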
To solve this, you would need to work out which API endpoints the page calls and have your script call those directly. Expect to adjust your requests so they pass the CSRF check. Alternatively, use their official API.
Solution 2:
I tested some Selenium code which seems to do the trick.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.service import Service

# Selenium 4 style: the geckodriver path is passed through a Service object
driver = webdriver.Firefox(service=Service(executable_path=r'C:\files\geckodriver.exe'))
driver.set_page_load_timeout(30)
driver.get("https://www.linkedin.com/search/results/all/?keywords=tim%20morgan&origin=GLOBAL_SEARCH_HEADER")
# Collect every anchor that has an href attribute and print its target
elems = driver.find_elements(By.XPATH, "//a[@href]")
for elem in elems:
    print(elem.get_attribute("href"))
driver.quit()
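The loop prints every href exactly as the browser reports it, so the output will usually contain duplicates and non-page links such as javascript: or mailto: entries. As an optional follow-up, here is a small sketch for tidying that list with the standard library. The helper name tidy_hrefs is hypothetical, and dropping fragments and non-http(s) schemes is an assumption about what you want to keep:

```python
from urllib.parse import urldefrag, urlsplit


def tidy_hrefs(hrefs):
    """Return unique http(s) links in first-seen order, ignoring #fragments."""
    seen = set()
    result = []
    for href in hrefs:
        if not href:
            continue  # get_attribute can return None or an empty string
        url, _fragment = urldefrag(href)
        if urlsplit(url).scheme not in ("http", "https"):
            continue  # skip javascript:, mailto:, and similar schemes
        if url not in seen:
            seen.add(url)
            result.append(url)
    return result
```

With the Selenium code above, this could be used as tidy_hrefs(elem.get_attribute("href") for elem in elems), printing each returned URL instead of the raw attribute values.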