Scraping Javascript Data Within A Grid Of A Webpage Using Selenium And Python

My issue is that I need all the data in the grid for the applications that have subdomains on the website https://applipedia.paloaltonetworks.com (data containing NAME, CATEGORY, SUBCATEGORY,

Solution 1:

To list every app that has subdomains on https://applipedia.paloaltonetworks.com/, use WebDriverWait to wait until the desired elements are visible, then read their contents. The following solution does this:

  • Code Block:

    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    
    options = Options()
    options.add_argument("start-maximized")
    options.add_argument("disable-infobars")
    options.add_argument("--disable-extensions")
    options.add_argument("--disable-gpu")
    # Selenium 4 passes the driver path through a Service object; chrome_options= and
    # executable_path= are deprecated there and removed in later 4.x releases.
    service = Service(r'C:\WebDrivers\ChromeDriver\chromedriver_win32\chromedriver.exe')
    driver = webdriver.Chrome(service=service, options=options)
    driver.get('https://applipedia.paloaltonetworks.com/')
    # Wait up to 20 seconds for the name links, skipping rows whose ottawagroup is 0 or 2.
    elements = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//table[@class='btmTable' and @id='dataTable']//tbody[@id='bodyScrollingTable']//tr[not(@ottawagroup='0') and not(@ottawagroup='2')]/td/a")))
    for element in elements:
        # innerHTML keeps the whitespace around the link text, hence the padded output.
        print(element.get_attribute("innerHTML"))
    
  • Console Output:

    DevTools listening on ws://127.0.0.1:12927/devtools/browser/d4a5d576-a4b0-4a3d-959b-9d37aff36fc6
    
                                    2ch
    
    
                                    51.com
    
    
                                    adobe-connect
    
    
                                    adobe-connectnow
    
    
                                    adobe-creative-cloud
    
    
                                    aim
    
    
                                    aim-express
    
    
                                    ali-wangwang
    
    
                                    amazon-cloud-drive
    
    
                                    amazon-music
    
    
                                    ameba-now
    
    
                                    assembla
    
    
                                    autodesk360
    
    
                                    avaya-webalive
    
    
                                    bacnet
    
    
                                    baidu-hi
    
    
                                    bebo
    
    
                                    bitbucket
    
    
                                    boxnet
    
    
                                    buddybuddy
    
    
                                    chinaren
    
    
                                    cisco-spark
    
    
                                    cloudapp
    
    
                                    cloudforge
    
    
                                    cloudinary
    
    
                                    concur
    
    
                                    confluence
    
    
                                    convo
    
    
                                    cyph
    
    
                                    daum
    
    
                                    dcinside
    
    
                                    diameter
    
    
                                    dnp3
    
    
                                    dochub
    
    
                                    docstoc
    
    
                                    docusign
    
    
                                    draw.io
    
    
                                    dropbox
    
    
                                    egnyte
    
    
                                    evernote
    
    
                                    facebook
    
    
                                    fetion
    
    
                                    filestack
    
    
                                    flickr
    
    
                                    flixwagon
    
    
                                    fuze-meeting
    
    
                                    gatherplace
    
    
                                    genesys
    
    
                                    git
    
    
                                    github
    
    
                                    gitlab
    
    
                                    glassdoor
    
    
                                    globalmeet
    
    
                                    gmail
    
    
                                    google-calendar
    
    
                                    google-cloud-storage
    
    
                                    google-docs
    
    
                                    google-hangouts
    
    
                                    google-plus
    
    
                                    google-spaces
    
    
                                    google-talk
    
    
                                    google-translate
    
    
                                    google-video
    
    
                                    gotomypc
    
    
                                    gotowebinar
    
    
                                    gtp
    
    
                                    hadoop
    
    
                                    hightail
    
    
                                    hipchat
    
    
                                    hootsuite
    
    
                                    huddle
    
    
                                    hulu
    
    
                                    hyves
    
    
                                    iccp
    
    
                                    icloud
    
    
                                    iec-60870-5-104
    
    
                                    imeet
    
    
                                    imgur
    
    
                                    instagram
    
    
                                    instan-t
    
    
                                    ip-messenger
    
    
                                    ipsec
    
    
                                    irc
    
    
                                    issuu
    
    
                                    itunes
    
    
                                    jira
    
    
                                    join-me
    
    
                                    jumpshare
    
    
                                    kaixin
    
    
                                    kaixin001
    
    
                                    kakaotalk
    
    
                                    laiwang
    
    
                                    landesk
    
    
                                    linkedin
    
    
                                    live-mesh
    
    
                                    lotus-notes
    
    
                                    lotuslive
    
    
                                    lucidpress
    
    
                                    mail.ru
    
    
                                    mail.ru-agent
    
    
                                    maytech
    
    
                                    meebo
    
    
                                    meetup
    
    
                                    mega
    
    
                                    mendeley
    
    
                                    mercurial
    
    
                                    mixi
    
    
                                    modbus
    
    
                                    ms-ds-smb
    
    
                                    ms-lync
    
    
                                    ms-office365
    
    
                                    ms-onedrive
    
    
                                    msn
    
    
                                    myspace
    
    
                                    nateon-im
    
    
                                    netease-webdisk
    
    
                                    netflix
    
    
                                    ning
    
    
                                    noteworthy
    
    
                                    now-tv
    
    
                                    odnoklassniki
    
    
                                    onehub
    
    
                                    owncloud
    
    
                                    paltalk
    
    
                                    pastebin
    
    
                                    pcanywhere
    
    
                                    pinterest
    
    
                                    pivotaltracker
    
    
                                    powow
    
    
                                    prezi
    
    
                                    proofhub
    
    
                                    qik
    
    
                                    qliksense-cloud
    
    
                                    qq
    
    
                                    quip
    
    
                                    quora
    
    
                                    rally-software
    
    
                                    readytalk
    
    
                                    reddit
    
    
                                    rediffbol
    
    
                                    renren
    
    
                                    rtp
    
    
                                    salesforce
    
    
                                    sap-jam
    
    
                                    screencast
    
    
                                    scribd
    
    
                                    second-life
    
    
                                    secure-data-space
    
    
                                    sendthisfile
    
    
                                    service-now
    
    
                                    sharefile
    
    
                                    sharepoint
    
    
                                    sharevault
    
    
                                    showmax
    
    
                                    siemens-s7
    
    
                                    signiant
    
    
                                    sina-uc
    
    
                                    sina-weibo
    
    
                                    skydrive
    
    
                                    slack
    
    
                                    slideshare
    
    
                                    smartsheet
    
    
                                    snmp
    
    
                                    softros-messenger
    
    
                                    solarwinds
    
    
                                    soundcloud
    
    
                                    sourceforge
    
    
                                    spark-im
    
    
                                    ss7-map
    
    
                                    stocktwits
    
    
                                    storify
    
    
                                    subversion
    
    
                                    surveymonkey
    
    
                                    syncplicity
    
    
                                    tableau
    
    
                                    teamdrive
    
    
                                    teamup-calendar
    
    
                                    teamviewer
    
    
                                    thwapr
    
    
                                    torch-browser
    
    
                                    trello
    
    
                                    tumblr
    
    
                                    twitter
    
    
                                    uc-yun
    
    
                                    viber
    
    
                                    vimeo
    
    
                                    vine
    
    
                                    virustotal
    
    
                                    vkontakte
    
    
                                    vnc
    
    
                                    watchdox
    
    
                                    webex
    
    
                                    wechat
    
    
                                    weiyun
    
    
                                    whatsapp
    
    
                                    windows-azure
    
    
                                    windows-defender-atp
    
    
                                    workday
    
    
                                    yahoo-im
    
    
                                    yammer
    
    
                                    youku
    
    
                                    yousendit
    
    
                                    youtube
    
    
                                    yunpan360
    
    
                                    yy-voice
    
    
                                    zalo
    
    
                                    zendesk
    
    
                                    zenefits
    
    
                                    zettahost
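
The padding in the output above comes from innerHTML, which keeps the whitespace around each link's text. A minimal sketch of normalizing such values in plain Python follows. The helper name clean_names and the sample strings are illustrative assumptions, not part of the original answer.

```python
def clean_names(raw_values):
    """Strip surrounding whitespace from each value and drop empty results."""
    cleaned = []
    for value in raw_values:
        name = value.strip()
        if name:  # skip values that were whitespace only
            cleaned.append(name)
    return cleaned


# Sample values padded the way innerHTML returned them above (illustrative only).
sample = ["\n        2ch\n    ", "\n        51.com\n    ", "   \n"]
print(clean_names(sample))
```

With the elements list from the code block, this could be called as clean_names(e.get_attribute("innerHTML") for e in elements).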
    

Solution 2:

The code below gets the list of domains that have subdomains quickly and cleanly:

# Reuses driver and the imports from Solution 1.
WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "[ottawagroup='1'] a")))
domains = driver.execute_script("return [...document.querySelectorAll(\"[ottawagroup='1'] a\")].map(e => e.textContent.trim())")
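
As one possible refinement of the snippet above, the selector can be passed to execute_script as an argument instead of being embedded in the JavaScript string, which avoids the escaped quotes. This is a sketch only: the helper name fetch_link_texts is hypothetical, it assumes the wait has already succeeded, and it relies on Selenium exposing extra execute_script arguments to the script as arguments[0], arguments[1], and so on.

```python
# JavaScript that collects the trimmed text of every element matching a selector.
# The selector arrives as arguments[0], so no quote escaping is needed.
COLLECT_TEXT_JS = (
    "return [...document.querySelectorAll(arguments[0])]"
    ".map(e => e.textContent.trim())"
)


def fetch_link_texts(driver, selector="[ottawagroup='1'] a"):
    """Return the trimmed text of each element matching `selector`.

    `driver` is assumed to be a Selenium WebDriver whose page has finished loading.
    """
    return driver.execute_script(COLLECT_TEXT_JS, selector)
```

Usage would then be domains = fetch_link_texts(driver), with the default selector matching the one used in this solution.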
