
Python Web Scraping List From Webpage To Text File

I took a Python class in my junior year of college but have forgotten a lot. For work, I was asked to find a way to scrape some data from a website. I have a Python file tha

Solution 1:

I used Selenium so the script can navigate the page and click its buttons.

Code:

import time
from selenium import webdriver
from bs4 import BeautifulSoup
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Selenium initializations
driver = webdriver.Chrome()
driver.get('https://www.powertoolreplacementparts.com/briggs-stratton-part-finder/#/s/BRG//498260/1/y')
wait = WebDriverWait(driver, 30)
driver.maximize_window()

# Locate and click the "Where Used" button
# (find_element_by_* was removed in Selenium 4.3; use find_element(By..., ...) instead)
driver.find_element(By.XPATH, "//input[@id='aripartsSearch_whereUsedBtn_0'][@class='ariPartListWhereUsed ariImageOverride'][@title='Where Used']").click()
wait.until(EC.visibility_of_element_located((By.XPATH, '//*[@id="ari_searchResults_Grid"]/ul')))


# Initialize BS4 and look for the "Show More" button
soup = BeautifulSoup(driver.page_source, "html.parser")
show = soup.find('li', {'class': 'ari-search-showMore'})

# Keep clicking the "Show More" button until it is no longer visible
while show is not None:
    time.sleep(2)
    hidden_element = driver.find_element(By.CSS_SELECTOR, '#ari-showMore-unhide')
    if hidden_element.is_displayed():
        print("Element found")
        hidden_element.click()
        # Re-parse the page so the soup reflects what is now in the DOM
        soup = BeautifulSoup(driver.page_source, "html.parser")
        show = soup.find('li', {'class': 'ari-search-showMore'})
    else:
        print("Element not found")
        break

# Re-parse once more, then write the parsed data to the text file "data.txt"
soup = BeautifulSoup(driver.page_source, "html.parser")
with open("data.txt", "w", encoding="utf-8") as f:
    rows = soup.find_all('li', {'class': 'ari-ModelByPrompt'})
    for row in rows:
        part = row.text.replace(" ", "").replace("\n", "")
        print(part)
        f.write(part + ",")

driver.quit()

Output:

Element found
Element found
Element found
Element not found
093412-0011-01
093412-0011-02
093412-0015-01
093412-0039-01
093412-0060-01
093412-0136-01
093412-0136-02
093412-0139-01
093412-0150-01
093412-0153-01
093412-0154-01
093412-0169-01
093412-0169-02
093412-0172-01
093412-0174-01
093412-0315-A1
093412-0339-A1
093412-0360-A1
093412-0636-A1
093412-0669-A1
093412-1015-E1
093412-1039-E1
093412-1060-E1
093412-1236-E1
093412-1236-E2
093412-1253-E1
093412-1254-E1
093412-1269-E1
093412-1274-E1
093412-1278-E1
093432-0035-01
093432-0035-02
093432-0035-03
093432-0036-01
093432-0036-03
093432-0036-04
093432-0037-01
093432-0038-01
093432-0038-03
093432-0041-01
093432-0140-01
093432-0145-01
093432-0149-01
093432-0152-01
093432-0157-01
093432-0158-01
093432-0160-01
093432-0192-B1
093432-0335-A1
093432-0336-A1
093432-0337-A1
093432-0338-A1
093432-1035-B1
093432-1035-E1
093432-1035-E2
093432-1035-E4
093432-1036-B1
093432-1036-E1
093432-1037-E1
093432-1038-B1
093432-1038-E1
093432-1240-B1
093432-1240-E1
093432-1257-E1
093432-1258-E1
093432-1280-B1
093432-1280-E1
093432-1281-B1
093432-1281-E1
093432-1282-B1
093432-1282-E1
093432-1286-B1
093452-0049-01
093452-0141-01
093452-0168-01
093452-0349-A1
093452-1049-B1
093452-1049-E1
093452-1049-E5
093452-1241-E1
093452-1242-E1
093452-1277-E1
093452-1283-B1
093452-1283-E1
09A412-0267-E1
09A413-0201-E1
09A413-0202-E1
09A413-0202-E2
09A413-0202-E3
09A413-0203-E1
09A413-0522-E1
09K432-0022-01
09K432-0023-01
09K432-0024-01
09K432-0115-01
09K432-0116-01
09K432-0116-02
09K432-0117-01
09K432-0118-01
120502-0015-E1

Content of the file:

093412-0011-01,093412-0011-02,093412-0015-01,093412-0039-01,093412-0060-01,093412-0136-01,093412-0136-02,093412-0139-01,093412-0150-01,093412-0153-01,093412-0154-01,093412-0169-01,093412-0169-02,093412-0172-01,093412-0174-01,093412-0315-A1,093412-0339-A1,093412-0360-A1,093412-0636-A1,093412-0669-A1,093412-1015-E1,093412-1039-E1,093412-1060-E1,093412-1236-E1,093412-1236-E2,093412-1253-E1,093412-1254-E1,093412-1269-E1,093412-1274-E1,093412-1278-E1,093432-0035-01,093432-0035-02,093432-0035-03,093432-0036-01,093432-0036-03,093432-0036-04,093432-0037-01,093432-0038-01,093432-0038-03,093432-0041-01,093432-0140-01,093432-0145-01,093432-0149-01,093432-0152-01,093432-0157-01,093432-0158-01,093432-0160-01,093432-0192-B1,093432-0335-A1,093432-0336-A1,093432-0337-A1,093432-0338-A1,093432-1035-B1,093432-1035-E1,093432-1035-E2,093432-1035-E4,093432-1036-B1,093432-1036-E1,093432-1037-E1,093432-1038-B1,093432-1038-E1,093432-1240-B1,093432-1240-E1,093432-1257-E1,093432-1258-E1,093432-1280-B1,093432-1280-E1,093432-1281-B1,093432-1281-E1,093432-1282-B1,093432-1282-E1,093432-1286-B1,093452-0049-01,093452-0141-01,093452-0168-01,093452-0349-A1,093452-1049-B1,093452-1049-E1,093452-1049-E5,093452-1241-E1,093452-1242-E1,093452-1277-E1,093452-1283-B1,093452-1283-E1,09A412-0267-E1,09A413-0201-E1,09A413-0202-E1,09A413-0202-E2,09A413-0202-E3,09A413-0203-E1,09A413-0522-E1,09K432-0022-01,09K432-0023-01,09K432-0024-01,09K432-0115-01,09K432-0116-01,09K432-0116-02,09K432-0117-01,09K432-0118-01,120502-0015-E1,
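The file content above ends with a trailing comma because `f.write(part + ",")` runs once per row. A minimal sketch that avoids it, assuming the rows are `li` elements with class `ari-ModelByPrompt` as in the script. It swaps in the standard library's `html.parser` for BeautifulSoup so it runs without extra packages; `ModelRowParser`, `extract_parts` and `write_parts` are hypothetical names, not part of the original answer.

```python
from html.parser import HTMLParser


class ModelRowParser(HTMLParser):
    """Collect the text of every <li> whose class list contains 'ari-ModelByPrompt'."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._depth = 0   # > 0 while inside a matching <li>
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if self._depth:
            if tag == "li":          # track nested <li> so we close on the right one
                self._depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == "li" and "ari-ModelByPrompt" in classes:
            self._depth = 1
            self._buf = []

    def handle_endtag(self, tag):
        if self._depth and tag == "li":
            self._depth -= 1
            if self._depth == 0:
                # Same cleanup as the original script: drop spaces and newlines
                text = "".join(self._buf).replace(" ", "").replace("\n", "")
                self.rows.append(text)

    def handle_data(self, data):
        if self._depth:
            self._buf.append(data)


def extract_parts(html):
    """Return the cleaned part numbers found in an HTML string."""
    parser = ModelRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def write_parts(parts, path):
    """Write the parts as one comma-separated line with no trailing comma."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(",".join(parts))
```

For example, `extract_parts(driver.page_source)` could replace the `find_all` loop, and `write_parts(parts, "data.txt")` writes the same list without the final comma.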

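The "Show More" loop in the script has no upper bound on clicks, and `find_element` raises `NoSuchElementException` if the button is removed from the page entirely. A hedged alternative is a bounded loop that treats a missing button like a hidden one. The helper `click_until_hidden` and its default limit of 50 clicks are hypothetical choices; it only assumes the element offers `is_displayed()` and `click()`, as a Selenium WebElement does.

```python
import time


def click_until_hidden(find_button, max_clicks=50, delay=2.0):
    """Click a "Show More"-style button until it is gone or hidden.

    find_button: zero-argument callable returning the button element,
    or None once it is no longer in the page.
    Returns the number of clicks performed.
    """
    clicks = 0
    while clicks < max_clicks:
        button = find_button()
        if button is None or not button.is_displayed():
            break                # nothing left to reveal
        button.click()
        clicks += 1
        time.sleep(delay)        # give the page time to reveal the next batch
    return clicks
```

With Selenium, `find_button` could be `lambda: next(iter(driver.find_elements(By.CSS_SELECTOR, '#ari-showMore-unhide')), None)`, since `find_elements` returns an empty list instead of raising when nothing matches.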
Solution 2:

1) Open Chrome to https://www.powertoolreplacementparts.com/briggs-stratton-part-finder/#/s/BRG//498260/1/y

2) Open the Network tab in Chrome DevTools.

3) Click on "Where Used".

4) Look for the API call to the endpoint 'GetModelSearchModelsForPrompt'.

5) Copy the URL: https://partstream.arinet.com/Search/GetModelSearchModelsForPrompt?cb=jsonp1506134982932&arib=BRG&arisku=498260&modelName=&responsive=true&arik=AjydG6MJi4Y9noWP0hFB&aril=en-US&ariv=https%253A%252F%252Fwww.powertoolreplacementparts.com%252Fbriggs-stratton-part-finder%252F

6) Open that URL with requests. Parsing the response takes some clever thinking, because they return HTML inside "JSON".
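Step 5 copies the URL by hand. A sketch that rebuilds it from its query parameters with `urllib.parse`, which makes the `arisku` part number easy to swap; the parameter values are copied from the URL in step 5. Whether the `cb` callback name and the `arik` key stay valid for other requests is an assumption not checked here, and `build_prompt_url` is a hypothetical name. Note that `ariv` is encoded twice in the original URL (`%253A` is an encoded `%3A`), so the sketch percent-encodes the page URL once and lets `urlencode` encode it again.

```python
from urllib.parse import quote, urlencode

BASE = "https://partstream.arinet.com/Search/GetModelSearchModelsForPrompt"
SITE = "https://www.powertoolreplacementparts.com/briggs-stratton-part-finder/"


def build_prompt_url(sku, callback="jsonp1506134982932", brand="BRG"):
    """Rebuild the 'Where Used' API URL for a given part number (sku)."""
    params = {
        "cb": callback,
        "arib": brand,
        "arisku": sku,
        "modelName": "",
        "responsive": "true",
        "arik": "AjydG6MJi4Y9noWP0hFB",
        "aril": "en-US",
        # Encoded once here; urlencode encodes the '%' again, giving %253A etc.
        "ariv": quote(SITE, safe=""),
    }
    return BASE + "?" + urlencode(params)
```

The result can then be fetched as in step 6, for example with `requests.get(build_prompt_url("498260")).text`.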

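Step 6 notes that the response is HTML packed inside "JSON". The `cb=jsonp…` parameter suggests a JSONP response, that is, a JSON value wrapped in a call such as `jsonp1506134982932(...)`. A minimal sketch that strips such a wrapper and decodes the JSON, assuming that response shape; the original answer does not show which key (if any) holds the HTML, so the test input is made up and `unwrap_jsonp` is a hypothetical name.

```python
import json
import re

# Matches "callbackName( ... )" with an optional trailing semicolon
_JSONP = re.compile(r"^\s*[\w$.]+\s*\((.*)\)\s*;?\s*$", re.DOTALL)


def unwrap_jsonp(text):
    """Return the decoded payload of a JSONP response (plain JSON also works)."""
    match = _JSONP.match(text)
    payload = match.group(1) if match else text
    return json.loads(payload)
```

With requests this would be something like `unwrap_jsonp(requests.get(url).text)`; any HTML string in the result can then be parsed with BeautifulSoup as in Solution 1.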