No Nested Nodes. How To Get One Piece Of Information And Then To Get Additional Info Respectively?
For the code below, I need to get each date together with its corresponding times, hrefs, formats, and so on (not shown).
The Little Prince
Solution 1:
It turns out that lxml supports referencing a Python variable from an XPath expression, which proves useful in this case: for every date div, you can get the following sibling span elements whose nearest preceding sibling div is the current date div, where the reference to the current date div is stored in the Python variable dates. Note that XPath's = compares the two nodes by their string values rather than by identity, so this relies on each date div having distinct text:
# movie is the element for a single movie, as in the question's code
for dates in movie.xpath('.//div[@class="showstimes"]/div[@class="date"]'):
    date = dates.xpath('normalize-space()')
    # following spans whose nearest preceding div matches the current date div
    for times in dates.xpath('following-sibling::span[preceding-sibling::div[1]=$current]', current=dates):
        time = times.xpath('a/text()')[0]
        url = times.xpath('a/@href')[0]
        format_type = times.xpath('span/text()')[0]
        print(date, time, url, format_type)
Output:
'9 December, Wednesday', '12:30', 'http://www.test.com', '3D'
'9 December, Wednesday', '15:30', 'http://www.test.com', '3D'
'9 December, Wednesday', '18:30', 'http://www.test.com', '3D'
'10 December, Thursday', '12:30', 'http://www.test.com', '2D'
'10 December, Thursday', '15:30', 'http://www.test.com', '3D'
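To try the approach end to end, here is a minimal self-contained sketch. The HTML in SAMPLE is an assumption: the question's markup is not shown here, so it was reconstructed from the XPath expressions and the output above (date divs and showtime spans as flat siblings inside div.showstimes). The function name extract_showtimes and the outer div class "movie" are also made up for illustration.

```python
from lxml import html

# Assumed markup: reconstructed from the XPath expressions in the answer,
# not copied from the original question.
SAMPLE = """
<div class="movie">
  <h2>The Little Prince</h2>
  <div class="showstimes">
    <div class="date">9 December, Wednesday</div>
    <span><a href="http://www.test.com">12:30</a><span>3D</span></span>
    <span><a href="http://www.test.com">15:30</a><span>3D</span></span>
    <div class="date">10 December, Thursday</div>
    <span><a href="http://www.test.com">12:30</a><span>2D</span></span>
  </div>
</div>
"""


def extract_showtimes(movie):
    """Return (date, time, url, format) tuples for one movie element."""
    rows = []
    for date_div in movie.xpath('.//div[@class="showstimes"]/div[@class="date"]'):
        date = date_div.xpath('normalize-space()')
        # $current is bound to the element passed as a keyword argument.
        # The "=" test compares string values, so date texts must be distinct.
        spans = date_div.xpath(
            'following-sibling::span[preceding-sibling::div[1]=$current]',
            current=date_div,
        )
        for span in spans:
            rows.append((
                date,
                span.xpath('a/text()')[0],
                span.xpath('a/@href')[0],
                span.xpath('span/text()')[0],
            ))
    return rows


movie = html.fromstring(SAMPLE)
rows = extract_showtimes(movie)
for row in rows:
    print(row)
```

Collecting tuples instead of printing inside the loop makes the result easier to check or to pass on to other code. If two date divs could carry identical text, the string-value comparison would no longer tell them apart and a different grouping strategy would be needed.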