Following Siblings Selenium Python With Conditions

February 02, 2023

I'm trying to collect the following siblings up to a certain sibling, but I still can't figure out how to do it. I tried to locate the siblings before and after by class name, but I got wr

Solution 1:

Regarding your expected output, why not extract the text from all span elements, since they are already in document order? For example, with lxml:

# tree is the parsed document (built with lxml.html.fromstring further down)
data = tree.xpath("//span/text()")
print(*data, sep="\n")

Output :

2 August 2020
1
2
3
4
15 August 2020
5
6

If you really want to use loops and build a dictionary, here is a proposal. First, the data:

data = """<div class="MainClass">

        <div class="InfoClass">
            <div class="left-wrap">
              <span class="date">2 August 2020</span>
            </div>
        </div>

        <div class="DataClass">
            <em class="Code">
                <span>1</span>
            </em>
        </div>
        
        <div class="DataClass">
            <em class="Code">
                <span>2</span>
            </em>
        </div>
        
        <div class="DataClass">
            <em class="Code">
                <span>3</span>
            </em>
        </div>
        
        <div class="DataClass">
            <em class="Code">
                <span>4</span>
            </em>
        </div>
    
        <div class="InfoClass">
            <div class="left-wrap">
              <span class="date">15 August 2020</span>
            </div>
        </div>

        <div class="DataClass">
            <em class="Code">
                <span>5</span>
            </em>
        </div>

        <div class="DataClass">
            <em class="Code">
                <span>6</span>
            </em>
        </div>
</div>"""

Then, the code :

import lxml.html
tree = lxml.html.fromstring(data)

dates = [el.text for el in tree.xpath("//span[@class='date']")]
print(dates)

dc=[]
for els in dates:
    lists=[el.text for el in tree.xpath("//div[span[text()='"+els+"']]/../following-sibling::div[@class='DataClass']//span[preceding::span[@class='date'][1][.='"+els+"']]")]
    dc.append(lists)

print(dc)

dictionary = dict(zip(dates,dc))
print(dictionary)

Comments :

First, you extract the dates into a list. Then, for each date, everything relies on the following XPath (the one you were looking for?) to get the corresponding DataClass spans:

//div[span[text()='"+els+"']]/../following-sibling::div[@class='DataClass']//span[preceding::span[@class='date'][1][.='"+els+"']]

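Each "+els+" is replaced by one of the dates fetched earlier. The final predicate keeps only the spans whose nearest preceding date span is that date, which is what stops the collection at the next InfoClass block.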

Finally, you build the dictionary. This code is written for lxml. To make it work with Selenium, replace tree.xpath with the Selenium equivalent: driver.find_elements_by_xpath, or driver.find_elements(By.XPATH, ...) in Selenium 4, where the older helper has been removed.

Output (dates, dataclasses, dictionary):

['2 August 2020', '15 August 2020']
[['1', '2', '3', '4'], ['5', '6']]
{'2 August 2020': ['1', '2', '3', '4'], '15 August 2020': ['5', '6']}

EDIT: If you need to print the dictionary, you can use:

for keys,values in dictionary.items():
    print(keys)
    print(*values,sep='\n')

Output as requested :

2 August 2020
1
2
3
4
15 August 2020
5
6
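
As an alternative to building one XPath per date, the same grouping can be sketched as a single pass over the children of MainClass: each InfoClass block starts a new group, and every DataClass block that follows is added to it until the next InfoClass appears. This is a minimal sketch, not the answer's method. It uses the standard library's xml.etree.ElementTree instead of lxml, which only works because the sample markup happens to be well-formed; SAMPLE and group_by_date are illustrative names, and SAMPLE is a shortened copy of the data above.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample markup (assumed well-formed, so the XML parser accepts it).
SAMPLE = """<div class="MainClass">
  <div class="InfoClass"><div class="left-wrap"><span class="date">2 August 2020</span></div></div>
  <div class="DataClass"><em class="Code"><span>1</span></em></div>
  <div class="DataClass"><em class="Code"><span>2</span></em></div>
  <div class="InfoClass"><div class="left-wrap"><span class="date">15 August 2020</span></div></div>
  <div class="DataClass"><em class="Code"><span>5</span></em></div>
</div>"""


def group_by_date(markup):
    """Walk the children of MainClass in order; each InfoClass opens a new group."""
    root = ET.fromstring(markup)
    grouped = {}
    current = None  # date of the most recent InfoClass block
    for child in root:
        css_class = child.get("class")
        if css_class == "InfoClass":
            current = child.find(".//span[@class='date']").text
            grouped.setdefault(current, [])
        elif css_class == "DataClass" and current is not None:
            grouped[current].extend(span.text for span in child.iter("span"))
    return grouped


result = group_by_date(SAMPLE)
print(result)
```

Because the loop only looks at direct children, it does not need to embed the date text in an XPath string, which also sidesteps quoting problems if a date ever contained an apostrophe.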

Solution 2:

You can use the same simple code as for the previous question, but with a list to collect the values, since a code (.Code) is not necessarily unique. This also works if 2 August 2020 and 15 August 2020 contain the same code.

codes = list()
for e in driver.find_elements_by_class_name('Code'):
    code = e.text
    date = e.find_element_by_xpath("(./preceding::span[@class='date'])[last()]").text
    codes.append({"date": date, "code": code})

for c in codes:
    print(f'date: {c["date"]}, code: {c["code"]}')

The output:

date: 2 August 2020, code: 1
date: 2 August 2020, code: 2
date: 2 August 2020, code: 3
date: 2 August 2020, code: 4
date: 15 August 2020, code: 5
date: 15 August 2020, code: 6

If you want dict with date as a key and codes as values:

codes = dict()
for e in driver.find_elements_by_class_name('Code'):
    code = e.text
    date = e.find_element_by_xpath("(./preceding::span[@class='date'])[last()]").text
    if date in codes:
        codes[date].append(code)
    else:
        codes.update({date: [code]})

for k, v in codes.items():
    print(f'{k} : {v}')

With output:

2 August 2020 : ['1', '2', '3', '4']
15 August 2020 : ['5', '6']
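
The if/else that creates or extends each list can also be written with collections.defaultdict. The sketch below shows only that grouping step: pairs is a hypothetical stand-in for the (date, code) values that the Selenium loop above reads from the page, so no browser is needed to run it.

```python
from collections import defaultdict

# Hypothetical stand-in for the (date, code) values read from the page.
pairs = [
    ("2 August 2020", "1"),
    ("2 August 2020", "2"),
    ("2 August 2020", "3"),
    ("2 August 2020", "4"),
    ("15 August 2020", "5"),
    ("15 August 2020", "6"),
]


def group_pairs(pairs):
    """Collect codes under their date; a missing key starts as an empty list."""
    grouped = defaultdict(list)
    for date, code in pairs:
        grouped[date].append(code)
    return dict(grouped)  # plain dict, keys in first-seen order


codes = group_pairs(pairs)
for date, values in codes.items():
    print(f"{date} : {values}")
```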

Solution 3:

I have found a way to display the text you want.

mainClassText = driver.find_element_by_xpath("//div[@class='MainClass']").text
print(mainClassText)

If you want, you can also turn this into a list.

mainClassTextList = mainClassText.split("\n")
for ele in mainClassTextList:
    print(ele)

In both cases the output is:

2 August 2020
1
2
3
4
15 August 2020
5
6
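
If the flat list of lines then needs to be grouped by date, one possible follow-up is to treat every line that parses as a date as the start of a new group. This is only a sketch under assumptions: the "%d %B %Y" format is inferred from the sample dates, month names are assumed to be in English (the default C locale), and sample_text, is_date and group_lines are illustrative names rather than part of the answer.

```python
from datetime import datetime


def is_date(line, fmt="%d %B %Y"):
    """Return True if the line parses with the assumed date format."""
    try:
        datetime.strptime(line, fmt)
    except ValueError:
        return False
    return True


def group_lines(text):
    """Group non-date lines under the most recent date line."""
    grouped = {}
    current = None
    for line in text.split("\n"):
        line = line.strip()
        if not line:
            continue
        if is_date(line):
            current = line
            grouped.setdefault(current, [])
        elif current is not None:
            grouped[current].append(line)
    return grouped


# Stand-in for mainClassText as printed above.
sample_text = "2 August 2020\n1\n2\n3\n4\n15 August 2020\n5\n6"
grouped = group_lines(sample_text)
print(grouped)
```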

Solution 4:

All the divs containing dates and data are at the same level under the MainClass div, so we can get the desired result using one generic XPath for all the spans containing dates and data.

from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://bilalzamel.htmlsave.net/")

mainClass = driver.find_elements_by_xpath("//div[@class='MainClass']//span")
for mc in mainClass:
    kDate = mc.text
    print(kDate)
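
The same list of spans can be turned into a dictionary by checking each span's class attribute. The sketch below assumes a Selenium 4 driver, where find_elements(by, value) replaces the removed find_elements_by_xpath, and assumes the spans come back in document order, as the outputs above suggest. group_spans is an illustrative name, and the literal "xpath" is the value of By.XPATH, used here so the function has no import of its own.

```python
def group_spans(driver):
    """Group span texts under the most recent span whose class is 'date'.

    driver is assumed to be a Selenium 4 WebDriver (or any object with a
    compatible find_elements(by, value) method).
    """
    by_xpath = "xpath"  # same value as selenium.webdriver.common.by.By.XPATH
    grouped = {}
    current = None  # text of the most recent date span
    for span in driver.find_elements(by_xpath, "//div[@class='MainClass']//span"):
        if span.get_attribute("class") == "date":
            current = span.text
            grouped.setdefault(current, [])
        elif current is not None:
            grouped[current].append(span.text)
    return grouped


# Usage, with the driver opened as above:
# print(group_spans(driver))
```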
