
How To Find A Source When A Website Uses Javascript

What I want to achieve: I am trying to scrape the website below using Beautiful Soup, but when I load the page I do not get the table that shows the various quotes. In my previous po

Solution 1:

Many web pages use JavaScript to modify the page after it loads. Beautiful Soup cannot see these changes because it only parses the HTML it is given and does not execute JavaScript. I can think of two options:

  • You could use a tool like Selenium, which drives a full-fledged browser that executes the JavaScript.
  • You could open the website in Chrome or Firefox, open the web inspector, then refresh the page. Watch for XHR requests in the Network tab; you may find the request that fetches the data you are looking for. If you find it, you can load that URL directly instead of the main page.
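The second option can be sketched roughly as follows, using only the standard library. This is a minimal sketch under stated assumptions, not the site's actual API: the URL in the usage comment and the payload shape (a "quotes" list whose items have "month" and "priorSettle" keys) are hypothetical, and you would substitute whatever request and field names you actually see in the Network tab.

```python
import json
import urllib.request


def fetch_json(url, timeout=10):
    """Download and decode a JSON document from an endpoint found in the Network tab."""
    # Some servers reject requests that lack a browser-like User-Agent header.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def quotes_to_rows(payload):
    """Turn the (assumed) payload shape into one text line per quote."""
    # Assumption: payload looks like
    # {"quotes": [{"month": "...", "priorSettle": "..."}, ...]}.
    # Missing keys are shown as "-" instead of raising KeyError.
    return [
        "{} {}".format(quote.get("month", "-"), quote.get("priorSettle", "-"))
        for quote in payload.get("quotes", [])
    ]


# Usage (hypothetical URL; replace it with the request you found):
# for row in quotes_to_rows(fetch_json("https://example.com/path/to/quotes.json")):
#     print(row)
```

When such an endpoint exists, this is usually lighter than driving a browser, because no HTML parsing or JavaScript execution is needed.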

Solution 2:

Is there any way you could run a Python web client that actually executes the JavaScript on the page, so that you can then scrape the results?
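One way to do this is with Selenium, as Solution 1 suggests. The following is a rough sketch, assuming Selenium 4 and a recent Chrome are installed; the function name render_with_selenium and its parameters are made up for illustration. It loads the page in headless Chrome, waits until the JavaScript has created the element you are after, and returns the rendered HTML, which can then be handed to Beautiful Soup.

```python
def render_with_selenium(url, table_id, timeout=15):
    """Return the page HTML once an element with id `table_id` exists in the DOM."""
    # Imported lazily so this file can be imported without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    # Assumption: a Chrome version that supports the new headless mode.
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait for the JavaScript-generated element; raises TimeoutException
        # if it has not appeared after `timeout` seconds.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.ID, table_id))
        )
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even if an error occurred


# Usage (replace the placeholders with the page and element id you need):
# html_source = render_with_selenium("https://example.com/page.html", "some-table-id")
```

Waiting for a specific element, rather than sleeping for a fixed time, keeps the script from reading the page before the data has arrived.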

Solution 3:

If you are not comfortable with Selenium, use PyQt:

"""
Install PyQt on Ubuntu:
    sudo apt-get install python3-pyqt5
    sudo apt-get install python3-pyqt5.qtwebengine
or on other OS (64 bit versions of Python)
    pip3 install PyQt5
(PyQt5 5.12 and later ship QtWebEngine separately:
    pip3 install PyQtWebEngine)
"""
from bs4 import BeautifulSoup
import sys
from PyQt5.QtWidgets import QApplication
from PyQt5.QtCore import QUrl
from PyQt5.QtWebEngineWidgets import QWebEngineView



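class Render(QWebEngineView):
    """Load a URL in a browser view that is never shown and keep the rendered HTML."""

    def __init__(self, url):
        self.html = None
        # A QApplication must exist before any widget is created.
        self.app = QApplication(sys.argv)
        QWebEngineView.__init__(self)
        self.loadFinished.connect(self._loadFinished)
        self.load(QUrl(url))
        # Run the event loop until callable() quits it.
        self.app.exec_()

    def _loadFinished(self, result):
        # toHtml() is asynchronous; it passes the HTML to the callback.
        self.page().toHtml(self.callable)

    def callable(self, data):
        self.html = data
        self.app.quit()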


url = 'https://www.cmegroup.com/trading/energy/refined-products/methanol-t2-fob-rdam-icis.html'
html_source = Render(url).html
soup = BeautifulSoup(html_source, 'html.parser')
table = soup.find('table', {'id': 'quotesFuturesProductTable1'})
for tr in table.find_all('tr'):
    print(tr.get_text(" ", strip=True))

Outputs:

Month Charts Last Change Prior Settle Open High Low Volume Hi/Low Limit Updated
NOV 2018 Show Price Chart - - 357.00 - - - 0 No Limit / No Limit 18:01:39 CT 31 Oct 2018
DEC 2018 Show Price Chart - - 357.00 - - - 0 No Limit / No Limit 18:01:39 CT 31 Oct 2018
JAN 2019 Show Price Chart - - 345.00 - - - 0 No Limit / No Limit 18:01:39 CT 31 Oct 2018
FEB 2019 Show Price Chart - - 345.00 - - - 0 No Limit / No Limit 18:01:36 CT 31 Oct 2018
MAR 2019 Show Price Chart - - 342.00 - - - 0 No Limit / No Limit 18:02:29 CT 31 Oct 2018
APR 2019 Show Price Chart - - 339.00 - - - 0 No Limit / No Limit 18:01:47 CT 31 Oct 2018
MAY 2019 Show Price Chart - - 334.00 - - - 0 No Limit / No Limit 18:03:23 CT 31 Oct 2018
JUN 2019 Show Price Chart - - 334.00 - - - 0 No Limit / No Limit 18:01:53 CT 31 Oct 2018
JUL 2019 Show Price Chart - - 337.00 - - - 0 No Limit / No Limit 16:45:00 CT 31 Oct 2018
AUG 2019 Show Price Chart - - 337.00 - - - 0 No Limit / No Limit 16:45:00 CT 31 Oct 2018
SEP 2019 Show Price Chart - - 335.00 - - - 0 No Limit / No Limit 16:45:00 CT 31 Oct 2018
OCT 2019 Show Price Chart - - 335.00 - - - 0 No Limit / No Limit 16:45:00 CT 31 Oct 2018
NOV 2019 Show Price Chart - - 335.00 - - - 0 No Limit / No Limit 16:45:00 CT 31 Oct 2018
DEC 2019 Show Price Chart - - 335.00 - - - 0 No Limit / No Limit 16:45:00 CT 31 Oct 2018

Some warnings are also sent to standard error.
