Trying To Scrape Table Using Pandas From Selenium's Result
Solution 1:
You can get the table using the following code:
import time
from selenium import webdriver
import pandas as pd

chrome_path = r"Path to chrome driver"
driver = webdriver.Chrome(chrome_path)
url = 'http://www.bursamalaysia.com/market/securities/equities/prices/#/?filter=BS02'
driver.get(url)
time.sleep(2)  # give the JavaScript-rendered table a moment to load
df = pd.read_html(driver.page_source)[0]  # read_html returns a list of DataFrames; the prices table is the first
print(df.head())
This is the output:
No Code Name Rem Last Done LACP Chg % Chg Vol ('00) Buy Vol ('00) Buy Sell Sell Vol ('00) High Low
(first five rows: LCTITAN-CB, SUMATEC [S], LCTITAN [S], KRONO [S] and LCTITAN-CE, each with its last done price, change, volume, buy/sell and high/low figures)
To get data from all pages you can crawl the remaining pages and combine the resulting frames (DataFrame.append is deprecated in recent pandas versions, so pd.concat is the safer choice), as sketched below.
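A minimal sketch of that idea, assuming the site paginates via a clickable "next" control; the 'a.next' CSS selector is a placeholder, not the page's real one, so inspect the page and substitute the actual pagination element:

import time
import pandas as pd
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException

driver = webdriver.Chrome(r"Path to chrome driver")
driver.get('http://www.bursamalaysia.com/market/securities/equities/prices/#/?filter=BS02')
time.sleep(2)

frames = []
while True:
    # parse the table currently rendered on the page
    frames.append(pd.read_html(driver.page_source)[0])
    try:
        # placeholder selector; replace with the page's real "next page" control
        driver.find_element_by_css_selector('a.next').click()
    except NoSuchElementException:
        break  # no more pages
    time.sleep(2)

all_prices = pd.concat(frames, ignore_index=True)
print(len(all_prices))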
Solution 2:
Answer:
target = driver.find_elements_by_id('bm_equities_prices_table')
df = pd.read_html(target[0].get_attribute('outerHTML'))
Reason for target[0]: driver.find_elements_by_id('bm_equities_prices_table') returns a list of Selenium web elements; in your case there is only one matching element, hence [0].

Reason for get_attribute('outerHTML'): we want the HTML of the element. There are two variants you can ask for: 'innerHTML' vs 'outerHTML'. We chose 'outerHTML' because we need to include the current element itself (where the table headers are, I suppose) rather than only the inner contents of the element.

Reason for df[0]: pd.read_html() returns a list of data frames, the first of which is the result we want, hence [0].
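Putting Solution 2 together, a minimal end-to-end sketch; the element id bm_equities_prices_table comes from the answer above, and the driver setup follows the same assumptions as Solution 1:

import time
import pandas as pd
from selenium import webdriver

driver = webdriver.Chrome(r"Path to chrome driver")
driver.get('http://www.bursamalaysia.com/market/securities/equities/prices/#/?filter=BS02')
time.sleep(2)

# find_elements_by_id returns a list; this page has a single table with that id
target = driver.find_elements_by_id('bm_equities_prices_table')

# outerHTML includes the <table> element itself, so read_html also picks up the headers
df = pd.read_html(target[0].get_attribute('outerHTML'))[0]
print(df.head())

driver.quit()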