
Selenium Python Pull Data From Dynamic Table That Refreshes Every 5 Seconds

I am trying to pull data from a real-time table/dashboard that refreshes every 5 seconds. Because it refreshes every 5 seconds, it gives me incomplete records [I think starting from

Solution 1:

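You could just use requests to fetch the page; the data would then be complete. Note that requests does not run JavaScript, so this only works if the table's data is already present in the HTML the server returns.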

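import requests
import time

url = "insert url here"  # placeholder: your dashboard's URL

while True:
    page = requests.get(url, timeout=10)
    page.raise_for_status()  # stop on HTTP errors instead of parsing an error page

    # Parse data from page.text here

    time.sleep(5)  # match the table's 5-second refresh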
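If the table is in the HTML that requests returns, the # Parse data step could look something like the sketch below. It is a minimal, stdlib-only parser built on html.parser and assumes a plain <table> of <tr> rows with <td>/<th> cells; TableParser and parse_table are hypothetical names, not part of any library. If the table is built by JavaScript, page.text won't contain it and you'll need one of the approaches in Solution 2.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()          # convert_charrefs=True: "&amp;" arrives as "&"
        self.rows = []              # finished rows, each a list of cell strings
        self._row = None            # cells of the row currently being read
        self._cell = None           # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_table(html):
    """Return the table rows found in an HTML string (hypothetical helper)."""
    parser = TableParser()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return parser.rows
```

Inside the loop, rows = parse_table(page.text) would give you a list of rows, each a list of cell strings, which you can then filter or store.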

Solution 2:

From the comments, you have a couple of approaches. As you're unable to share your site, the best I can do is describe what you need to do and how I got an equivalent site working.

Both approaches use http://www.emojitracker.com/ as an example site.

Approach 1 - get your data at the network layer:

  • Go to your site in Chrome.
  • Open DevTools.
  • Go to the Network tab.
  • Find the call that pulls down your data; you're looking for the GET request.

For the example site, I can see an entry called rankings in the Network tab.

The Headers tab describes what you need. For this site there's no authentication, nothing special, and no payload to send; all that's needed is the API URL and the method:

Request URL: http://www.emojitracker.com/api/rankings
Request Method: GET

It couldn't be simpler to throw that into Python:

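import requests

# Call the API endpoint found in the Network tab
response = requests.get("http://www.emojitracker.com/api/rankings", timeout=10)
response.raise_for_status()  # fail loudly on HTTP errors
data = response.json()  # a list of ranking entries
for line in data:
    print(line['id'])
    print(line['score'])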

That prints the ID and the score of each entry in the JSON response.
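Because each call returns the whole ranking, you can poll the endpoint on the same 5-second cadence and compare consecutive snapshots instead of scraping a half-refreshed table. This is a minimal sketch, assuming each response is a list of objects with 'id' and 'score' keys (as the loop above implies); scores_by_id and score_changes are hypothetical helper names, and the sample data in the test is made up.

```python
def scores_by_id(rankings):
    """Map each entry's 'id' to its 'score' (entries shaped like the rankings response)."""
    return {entry["id"]: entry["score"] for entry in rankings}


def score_changes(previous, current):
    """Return {id: difference} for ids whose score changed between two polls.

    Ids that appear only in `current` are compared against a score of 0;
    ids that disappeared from `current` are ignored.
    """
    old = scores_by_id(previous)
    new = scores_by_id(current)
    return {
        entry_id: score - old.get(entry_id, 0)
        for entry_id, score in new.items()
        if score != old.get(entry_id, 0)
    }
```

For example, take previous = response.json(), wait 5 seconds with time.sleep(5), fetch again as current, and print(score_changes(previous, current)) to see which entries moved.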


Approach 2 - hacking the JavaScript:

  • Go to the site and let the page load.
  • Open DevTools.
  • Go to the Console.
  • Select the Sources tab and pause the JavaScript (top right corner). Pay attention to where execution stops. Resume and pause a few times and note the different functions involved. Also look at what they do to discern other functions involved.

When you're ready, go to the Console tab and type this.stop(). In the console, this refers to the window object, so this is the same as calling window.stop(). On the site you provided, this stops the update calls.

This should give you the time you need to get your data.

From here, you have two choices to get your data going again.

  1. The simplest way is to just refresh the page. This restarts the page with new, streaming data. Do this with:
     driver.refresh()
  2. The more fun way: read the JS and figure out how to restart the stream! Use the console's autocomplete to help you.

Reviewing the JS where it paused (from the steps above), plus a bit of trial and error, I found:

this.startRawScoreStreaming()

It produces this output:

application.js:90 Subscribing to score stream (raw)
ƒ (event) {
      return incrementScore(event.data);
    }

And the page starts streaming again.

Finally, to run these JS snippets in Selenium, use driver.execute_script:

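# driver: a Selenium WebDriver that already has the page open
driver.execute_script('this.stop()')  # pause the live updates
# do your stuff (scrape the now-static table)
driver.execute_script('this.startRawScoreStreaming()')  # restart the stream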
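If you do this often, wrapping the pause and resume in a context manager makes sure the stream is restarted even when your scraping code raises. This is a minimal sketch: paused_updates is a hypothetical helper, driver is assumed to be a Selenium WebDriver (any object with an execute_script method works), and the default scripts are the ones found above for the example site, so other sites will need their own.

```python
from contextlib import contextmanager


@contextmanager
def paused_updates(driver,
                   stop_js="this.stop()",
                   resume_js="this.startRawScoreStreaming()"):
    """Pause the page's live updates for the duration of a with-block.

    driver: assumed to be a Selenium WebDriver (anything with execute_script).
    stop_js / resume_js: the scripts found for the example site; adjust per site.
    """
    driver.execute_script(stop_js)      # freeze the table
    try:
        yield driver                    # scrape inside the with-block
    finally:
        # Runs even if the scraping code raises, so the page never stays frozen.
        driver.execute_script(resume_js)
```

Usage would be with paused_updates(driver): followed by your scraping code. The try/finally is the main design choice here: without it, an exception while reading the table would leave the page's updates stopped.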
