Extract Information From Website Using Xpath, Python
Trying to extract some useful information from a website. I came a bit now im stuck and in need of your help! I need the information from this table http://gbgfotboll.se/serier/?sc
Solution 1:
I think is it what you want:
#coding: utf-8
from lxml import etree
import lxml.html
collected = [] #list-tuple of [(col1, col2...), (col1, col2...)]
dom = lxml.html.parse("http://gbgfotboll.se/serier/?scr=scorers&ftid=57700")
#all table rows
xpatheval = etree.XPathDocumentEvaluator(dom)
rows = xpatheval('//div[@id="content-primary"]/div/table[1]/tbody/tr')
# If there are less than 12 rows (or <=12): Take all the rows except the last.
if len(rows) <= 12:
rows.pop()
else:
# If there are more than 12 rows: Simply take the first 12 rows.
rows = rows[0:12]
for row in rows:
# all columns of current table row (Spelare, Lag, Mal, straffmal)
columns = row.findall("td")
# pick textual data from each <td>
collected.append([column.text for column in columns])
for i in collected: print i
Output:
Solution 2:
This is how you can get the rows you need based on what you described in your post. This is just the logic based on concept that rows
is a list, you have to incorporate this into your code as needed.
if len(rows) <=12:
print rows[0:-1]
elif len(rows) > 12:
print rows[0:12]
Post a Comment for "Extract Information From Website Using Xpath, Python"