Beautiful Soup Throws `IndexError`
I am scraping a website using Python 2.7 and Beautiful Soup 3.2. I am new to both, but the documentation got me started. I am reading the following documentation: h
Solution 1:
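The problem is that the column header cell ("Away-team") is also a td with the "away" class, so it is matched along with the data cells. It has only one child, so tag.contents[1] raises IndexError: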
<thead class="title">
    ...
    <tr class="sub">
        ...
        <td>Home-team</td>
        <td></td>
        <td class="away">Away-team</td>
        <td class="broadcast">Broadcast</td>
    </tr>
</thead>
Skip that first match by slicing:
awayteamsTd = soup.findAll('td', { "class" : "away" })[1:]
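To see why the first match breaks the comprehension, here is a minimal sketch that uses plain Python lists as stand-ins for each tag's `contents`. The cell values (`'<img>'`, `'Ajax'`, `'PSV'`) and the helper name `second_child` are made up for illustration and are not taken from the page. The only assumption carried over from the answer is that the header cell holds a single child while data cells hold at least two.

```python
# Stand-ins for tag.contents of each <td class="away"> match, in document order.
# Hypothetical data: the header cell has one child, data cells have two.
away_cells = [
    ['Away-team'],        # header cell from <thead>, matched because of class="away"
    ['<img>', 'Ajax'],    # data cell: contents[1] is the team name
    ['<img>', 'PSV'],
]


def second_child(cells):
    """Mimic [tag.contents[1] for tag in cells]."""
    return [contents[1] for contents in cells]


try:
    second_child(away_cells)
    failed = False
except IndexError:
    # The header cell has no contents[1], so the whole comprehension fails.
    failed = True

# Dropping the first match, as the [1:] slice does, leaves only data cells.
awayteams = second_child(away_cells[1:])
```

Note that slicing only works as long as the header cell is the first match in document order.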
Also, if you want to exclude "Dutch KNVB Beker" from the list of home teams, add a condition to the list comprehension:
hometeams = [tag.contents[1] for tag in hometeamsTd if tag.contents[1] != 'Dutch KNVB Beker']
Solution 2:
# Keep only cells that have a second child; this skips the header cell.
awayteams = []
for tag in awayteamsTd:
    if len(tag.contents) > 1:
        awayteams.append(tag.contents[1])
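The same length check also fits in a list comprehension and can be combined with an exclusion filter like the one in Solution 1. This is a sketch only, again using plain lists in place of `tag.contents`. The sample values and the helper name `team_names` are hypothetical.

```python
def team_names(cells, exclude=()):
    """Return contents[1] of every cell that has one, skipping excluded names."""
    return [
        contents[1]
        for contents in cells
        if len(contents) > 1 and contents[1] not in exclude
    ]


# Hypothetical stand-ins for tag.contents; the single-child cell need not come first.
home_cells = [
    ['<img>', 'Feyenoord'],
    ['Home-team'],                  # single-child cell, skipped by the length check
    ['<img>', 'Dutch KNVB Beker'],  # removed by the exclusion filter
    ['<img>', 'Twente'],
]

hometeams = team_names(home_cells, exclude=('Dutch KNVB Beker',))
```

Unlike slicing, the length check does not depend on the single-child cell being the first match.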