Trying To Print Href Tags From A Site And Getting Weird Results

December 01, 2023

I'm trying to print the href attributes of the link below. Here's my first attempt.

# the Python 3 version:
from bs4 import BeautifulSoup
import urllib.request
resp = urllib.request.urlopen

Solution 1:

LinkedIn loads its data asynchronously. If you view the source (Ctrl + U on Windows) of the URL you're fetching, you won't find the results you expect, because JavaScript fetches them after the page has already loaded with the base information.

BeautifulSoup only parses the HTML it is given; it won't execute the JavaScript on the page that fetches that data.
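To make the point concrete, here is a minimal sketch of what any non-executing HTML parser sees. It uses the standard library's html.parser as a stand-in for BeautifulSoup so that it runs without extra installs; the sample page, the HrefCollector class, and the extract_hrefs helper are all hypothetical and only there for illustration. One link is in the markup, the other would only exist after a browser ran the script.

```python
from html.parser import HTMLParser

# Hypothetical page: one link is in the markup, the other is created by a script.
SAMPLE_HTML = """
<html><body>
  <a href="/static-link">Static</a>
  <script>
    var a = document.createElement("a");
    a.href = "/injected-link";
    document.body.appendChild(a);
  </script>
</body></html>
"""


class HrefCollector(HTMLParser):
    """Collects href values from <a> start tags found in the raw markup."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


def extract_hrefs(html):
    """Return the href of every <a> tag present in the HTML text itself."""
    collector = HrefCollector()
    collector.feed(html)
    return collector.hrefs


if __name__ == "__main__":
    # The script body is treated as plain text, so its link never appears.
    print(extract_hrefs(SAMPLE_HTML))
```

Only the link written directly in the markup is reported. The one the script would add is missing, which is the same situation as fetching the LinkedIn page with urllib and parsing it.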

To solve this, you would need to work out which API endpoints the page calls and have your script call them directly, for example:

https://www.linkedin.com/voyager/api/search/filters?filters=List()&keywords=tim%20morgan&q=universalAll&queryContext=List(primaryHitType-%3EPEOPLE)

You would also have to adjust your call so that it passes the CSRF check. Alternatively, use their official API.
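As a rough sketch of what calling that endpoint from a script might look like, the snippet below rebuilds the URL above from its query parameters and attaches session headers. The build_search_request helper is hypothetical, and the header names ("Cookie", "csrf-token") and placeholder values are assumptions, not something confirmed here; inspect the request in your browser's network tab to see what the site actually sends. The request is only constructed, not sent.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

API_BASE = "https://www.linkedin.com/voyager/api/search/filters"


def build_search_request(keywords, session_cookie, csrf_token):
    """Build (but do not send) a request for the search endpoint shown above."""
    params = {
        "filters": "List()",
        "keywords": keywords,
        "q": "universalAll",
        "queryContext": "List(primaryHitType->PEOPLE)",
    }
    # Keep parentheses literal and encode spaces as %20, matching the URL above.
    query = urlencode(params, quote_via=quote, safe="()")
    headers = {
        # Assumed header names; verify them against a real browser request.
        "Cookie": session_cookie,
        "csrf-token": csrf_token,
    }
    return Request(f"{API_BASE}?{query}", headers=headers)


if __name__ == "__main__":
    req = build_search_request("tim morgan", "<session cookie>", "<csrf token>")
    print(req.full_url)
```

Passing the result to urllib.request.urlopen would send it, but without valid session values the CSRF check mentioned above will most likely reject the call.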

Solution 2:

I tested some Selenium code which seems to do the trick.

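from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.service import Service

# Selenium 4 takes the geckodriver path through a Service object
service = Service(executable_path=r'C:\files\geckodriver.exe')
driver = webdriver.Firefox(service=service)
driver.set_page_load_timeout(30)

# Load the search page in a real browser so its JavaScript runs
driver.get("https://www.linkedin.com/search/results/all/?keywords=tim%20morgan&origin=GLOBAL_SEARCH_HEADER")

# Collect every anchor that has an href attribute and print its target
# (find_elements_by_xpath was removed in Selenium 4; use find_elements with By)
elems = driver.find_elements(By.XPATH, "//a[@href]")
for elem in elems:
    print(elem.get_attribute("href"))

driver.quit()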
