
Extract Information From Website Using Xpath, Python

I'm trying to extract some useful information from a website. I got partway, but now I'm stuck and need your help. I need the information from this table: http://gbgfotboll.se/serier/?sc

Solution 1:

I think this is what you want:

#coding: utf-8
from lxml import etree
import lxml.html

collected = []  # list of rows, each a list of cell texts: [[col1, col2, ...], ...]
dom = lxml.html.parse("http://gbgfotboll.se/serier/?scr=scorers&ftid=57700")
xpatheval = etree.XPathDocumentEvaluator(dom)
# all table rows
rows = xpatheval('//div[@id="content-primary"]/div/table[1]/tbody/tr')
if len(rows) <= 12:
    # 12 rows or fewer: take every row except the last.
    # Slicing (unlike pop()) does not raise if no rows matched.
    rows = rows[:-1]
else:
    # More than 12 rows: take only the first 12.
    rows = rows[:12]

for row in rows:
    # all columns of the current table row (Spelare, Lag, Mal, straffmal)
    columns = row.findall("td")
    # pick the text of each <td>; .text is None when a cell starts with a child tag
    collected.append([column.text for column in columns])

for i in collected:
    print(i)

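To try the row and cell extraction without hitting the network, here is a minimal offline sketch. It swaps lxml for the standard library's xml.etree.ElementTree, which supports only a subset of XPath, and runs against a small inline snippet. The SAMPLE markup and the helper names cell_text and extract_rows are made up for illustration; the real page's structure may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the scorers page; the real markup may differ.
SAMPLE = """
<html><body>
<div id="content-primary"><div>
<table>
<tbody>
<tr><td>Player A</td><td>Team X</td><td>10</td><td>2</td></tr>
<tr><td><a href="#">Player B</a></td><td>Team Y</td><td>7</td><td>0</td></tr>
<tr><td>Total</td><td></td><td>17</td><td>2</td></tr>
</tbody>
</table>
</div></div>
</body></html>
"""


def cell_text(cell):
    # itertext() also collects text nested inside child tags such as <a>,
    # which a plain .text lookup would report as None.
    return "".join(cell.itertext()).strip()


def extract_rows(markup):
    # ElementTree needs well-formed markup; lxml.html is more forgiving.
    root = ET.fromstring(markup)
    rows = root.findall('.//div[@id="content-primary"]/div/table[1]/tbody/tr')
    return [[cell_text(td) for td in row.findall("td")] for row in rows]


table = extract_rows(SAMPLE)
for row in table:
    print(row)
```

Joining itertext() is one way to avoid None values for cells whose text sits inside a link. With lxml.html elements, text_content() plays a similar role.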


Solution 2:

This is how you can get the rows you need, based on what you described in your post. It only shows the selection logic and assumes that rows is a list; incorporate it into your code as needed.

if len(rows) <= 12:
    print(rows[:-1])
else:
    print(rows[:12])
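The same rule can be wrapped in a small helper so it is easy to reuse and check. This is only a sketch: the name select_rows and the limit parameter are made up here, not part of either answer.

```python
def select_rows(rows, limit=12):
    """Return the rows to keep, following the rule from the question.

    - limit rows or fewer: every row except the last one
    - more than limit rows: only the first limit rows
    """
    if len(rows) <= limit:
        # Slicing an empty list just gives an empty list, so no error here.
        return rows[:-1]
    return rows[:limit]


short = select_rows(["r1", "r2", "r3"])
long = select_rows(list(range(20)))
print(short)
print(len(long))
```

Returning a new list instead of printing or mutating rows in place keeps the original list intact, so the caller can still inspect the dropped rows if needed.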
