Beautiful Soup Throws `IndexError`

I am scraping a website using Python 2.7 and Beautiful Soup 3.2. I am new to both, but the documentation got me started. I am reading the following documentation: h

Solution 1:

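The problem is that the "Away-team" column header is also a td with the "away" class, so findAll matches it together with the data cells. That header cell contains nothing but its label, so it has no element at contents[1] and the lookup raises the IndexError: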

<thead class="title">
  ...
  <tr class="sub">
    ...
    <td>Home-team</td>
    <td></td>
    <td class="away">Away-team</td>
    <td class="broadcast">Broadcast</td>
  </tr>
</thead>

Skip the header cell by slicing off the first match:

awayteamsTd = soup.findAll('td', { "class" : "away" })[1:]
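
To see why the slice helps, here is a minimal sketch that uses plain Python lists as stand-ins for the contents of each matched cell. It does not call Beautiful Soup at all. The names away_cells and second_items, and the sample team data, are made up for illustration; only the header label comes from the snippet above.

```python
# Stand-ins for the `contents` of each matched <td class="away"> cell.
# The first entry models the header cell, which holds only its label.
# The other entries (some markup plus a team name) are invented sample data.
away_cells = [
    ["Away-team"],
    ["<img/>", "Ajax"],
    ["<img/>", "PSV"],
]


def second_items(cells):
    """Return element 1 of every cell, mirroring tag.contents[1]."""
    return [cell[1] for cell in cells]


try:
    second_items(away_cells)
    header_failed = False
except IndexError:
    # The header cell has a single child, so index 1 does not exist.
    header_failed = True

# Dropping the first match, as [1:] does, leaves only the data cells.
away_teams = second_items(away_cells[1:])
print(header_failed, away_teams)
```

Note that the slice assumes the header is always the first match. If the page can contain other short cells further down, a length check is the safer choice.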

Also, if you want to exclude "Dutch KNVB Beker" from the list of home teams, add a condition to the list comprehension:

hometeams = [tag.contents[1] for tag in hometeamsTd if tag.contents[1] != 'Dutch KNVB Beker']

Solution 2:

# Keep only cells that have a second child; the header cell does not.
awayteams = []
for tag in awayteamsTd:
    if len(tag.contents) > 1:
        awayteams.append(tag.contents[1])
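
The length check can be tried out without a live page. The sketch below again uses plain lists in place of tag.contents; the function name team_names and the sample rows are hypothetical. It suggests one practical difference from slicing: short cells are skipped wherever they occur, not only at the start.

```python
def team_names(cells):
    """Collect element 1 of each cell, skipping cells too short to have one."""
    names = []
    for contents in cells:
        if len(contents) > 1:
            names.append(contents[1])
    return names


# Invented sample data: a header cell, an empty cell, and two data cells.
cells = [
    ["Away-team"],        # header cell, one child only
    ["<img/>", "Ajax"],
    [],                   # empty cell in the middle of the table
    ["<img/>", "PSV"],
]

names = team_names(cells)
print(names)
```

The trade-off is that a guard like this also hides cells that are unexpectedly short, so it may be worth logging skipped rows while developing the scraper.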
