Following Siblings Selenium Python With Conditions
Solution 1:
Regarding your expected output, why not simply extract the text from all span elements, since they are already in document order? For example, with LXML (tree is the parsed document, built as in the code further down):
data=tree.xpath("//span/text()")
print(*data, sep="\n")
Output :
2 August 2020
1
2
3
4
15 August 2020
5
6
If you really want to use loops and create a dictionary, here is a proposal. First, the data:
data = """<div class="MainClass">
<div class="InfoClass">
<div class="left-wrap">
<span class="date">2 August 2020</span>
</div>
</div>
<div class="DataClass">
<em class="Code">
<span>1</span>
</em>
</div>
<div class="DataClass">
<em class="Code">
<span>2</span>
</em>
</div>
<div class="DataClass">
<em class="Code">
<span>3</span>
</em>
</div>
<div class="DataClass">
<em class="Code">
<span>4</span>
</em>
</div>
<div class="InfoClass">
<div class="left-wrap">
<span class="date">15 August 2020</span>
</div>
</div>
<div class="DataClass">
<em class="Code">
<span>5</span>
</em>
</div>
<div class="DataClass">
<em class="Code">
<span>6</span>
</em>
</div>
</div>"""
Then, the code :
import lxml.html
tree = lxml.html.fromstring(data)
dates = [el.text for el in tree.xpath("//span[@class='date']")]
print(dates)
dc=[]
for els in dates:
    lists=[el.text for el in tree.xpath("//div[span[text()='"+els+"']]/../following-sibling::div[@class='DataClass']//span[preceding::span[@class='date'][1][.='"+els+"']]")]
    dc.append(lists)
print(dc)
dictionary = dict(zip(dates,dc))
print(dictionary)
Comments :
First, extract the dates into a list. Then everything relies on the following XPath (the one you were looking for?) to get the DataClass values that belong to each date:
//div[span[text()='"+els+"']]/../following-sibling::div[@class='DataClass']//span[preceding::span[@class='date'][1][.='"+els+"']]
Here +els+ inserts each of the dates fetched previously. Finally, you build the dictionary. This code is written for LXML. To make it work with Selenium, replace tree.xpath with the Selenium equivalent (driver.find_elements_by_xpath, or driver.find_elements(By.XPATH, ...) in Selenium 4).
Output (dates, dataclasses, dictionary):
['2 August 2020', '15 August 2020']
[['1', '2', '3', '4'], ['5', '6']]
{'2 August 2020': ['1', '2', '3', '4'], '15 August 2020': ['5', '6']}
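The XPath above is assembled by string concatenation, which is easy to mistype. As a minimal sketch, it could be wrapped in a small helper. The name xpath_for_date is hypothetical (it belongs to neither LXML nor Selenium), and the sketch assumes the date text never contains a single quote:

```python
def xpath_for_date(date_text):
    """Build the XPath that selects the data spans belonging to one date."""
    # Hypothetical helper for illustration only.
    # The date is embedded in a single-quoted XPath string literal,
    # so a single quote inside it would produce an invalid expression.
    if "'" in date_text:
        raise ValueError("date text must not contain a single quote")
    return (
        f"//div[span[text()='{date_text}']]/.."
        "/following-sibling::div[@class='DataClass']"
        f"//span[preceding::span[@class='date'][1][.='{date_text}']]"
    )


print(xpath_for_date("2 August 2020"))
```

The returned string can then be passed to tree.xpath(...) inside the loop in place of the concatenated expression.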
EDIT: If you need to print the dictionary, you can use:
for keys,values in dictionary.items():
    print(keys)
    print(*values,sep='\n')
Output as requested :
2 August 2020
1
2
3
4
15 August 2020
5
6
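Since the spans are already in document order, the same dictionary can probably also be built in a single pass, without one XPath query per date. The sketch below swaps LXML for the standard library's xml.etree.ElementTree, which only works here because the sample markup happens to be well-formed XML. The markup is a shortened copy of the data above, and the variable names are illustrative:

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample markup (assumed to be well-formed XML).
html = """<div class="MainClass">
<div class="InfoClass"><div class="left-wrap"><span class="date">2 August 2020</span></div></div>
<div class="DataClass"><em class="Code"><span>1</span></em></div>
<div class="DataClass"><em class="Code"><span>2</span></em></div>
<div class="InfoClass"><div class="left-wrap"><span class="date">15 August 2020</span></div></div>
<div class="DataClass"><em class="Code"><span>5</span></em></div>
</div>"""

root = ET.fromstring(html)
grouped = {}
current = None
# iter() walks the tree in document order, so every data span
# follows the date span it belongs to.
for span in root.iter("span"):
    if span.get("class") == "date":
        current = span.text
        grouped.setdefault(current, [])
    elif current is not None:
        grouped[current].append(span.text)

print(grouped)
```

With LXML, the same loop should work over tree.iter("span"), since LXML elements expose the same get and text interface.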
Solution 2:
You can use the same simple code as for the previous question, but with a list to collect the values, in case the .Code values are not unique. It also works if 2 August 2020 and 15 August 2020 share the same code:
codes = list()
for e in driver.find_elements_by_class_name('Code'):
    code = e.text
    date = e.find_element_by_xpath("(./preceding::span[@class='date'])[last()]").text
    codes.append({"date": date, "code": code})
for c in codes:
    print(f'date: {c["date"]}, code: {c["code"]}')
The output:
date: 2 August 2020, code: 1
date: 2 August 2020, code: 2
date: 2 August 2020, code: 3
date: 2 August 2020, code: 4
date: 15 August 2020, code: 5
date: 15 August 2020, code: 6
If you want a dict with the date as key and the codes as values:
codes = dict()
for e in driver.find_elements_by_class_name('Code'):
    code = e.text
    date = e.find_element_by_xpath("(./preceding::span[@class='date'])[last()]").text
    if date in codes:
        codes[date].append(code)
    else:
        codes.update({date: [code]})
for k, v in codes.items():
    print(f'{k} : {v}')
With output:
2 August 2020 : ['1', '2', '3', '4']
15 August 2020 : ['5', '6']
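The if/else grouping step can also be written with dict.setdefault. The sketch below runs on plain (date, code) tuples standing in for the values read from the page, so it needs no browser. The pairs list is made-up illustration data, including a repeated code to show that duplicates are kept:

```python
# Plain (date, code) pairs standing in for the values Selenium would return.
pairs = [
    ("2 August 2020", "1"),
    ("2 August 2020", "2"),
    ("15 August 2020", "5"),
    ("15 August 2020", "5"),  # a repeated code is appended, not overwritten
]

codes = {}
for date, code in pairs:
    # setdefault returns the existing list for this date,
    # or stores and returns a new empty list the first time.
    codes.setdefault(date, []).append(code)

for k, v in codes.items():
    print(f"{k} : {v}")
```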
Solution 3:
I have found a way to display the text the way you want it:
mainClassText = driver.find_element_by_xpath("//div[@class='MainClass']").text
print(mainClassText)
If you want, you can also turn this into a list:
mainClassTextList = mainClassText.split("\n")
for ele in mainClassTextList:
    print(ele)
Both cases print:
2 August 2020
1
2
3
4
15 August 2020
5
6
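If the flat list of lines should be grouped by date afterwards, one possible approach is to test each line against the date format. This is only a sketch: it assumes every date looks like "2 August 2020" (day, English month name, year) and that no code value matches that format. The helper is_date is hypothetical:

```python
from datetime import datetime


def is_date(line, fmt="%d %B %Y"):
    """Return True if the line parses as a date such as '2 August 2020'."""
    # %B matches the full month name in the current locale
    # (English month names are assumed here).
    try:
        datetime.strptime(line, fmt)
        return True
    except ValueError:
        return False


# Same text as the .text of the MainClass div shown above.
text = "2 August 2020\n1\n2\n3\n4\n15 August 2020\n5\n6"

grouped = {}
current = None
for line in text.split("\n"):
    if is_date(line):
        current = line
        grouped.setdefault(current, [])
    elif current is not None:
        grouped[current].append(line)

print(grouped)
```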
Solution 4:
Since all the divs containing dates and data are at the same level under the MainClass div, we can get the desired result using one generic XPath for all spans containing a date or data.
from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://bilalzamel.htmlsave.net/")
mainClass = driver.find_elements_by_xpath("//div[@class='MainClass']//span")
for mc in mainClass:
    kDate = mc.text
    print(kDate)