
Scraping Paginated Web Table With Python Pandas & Beautifulsoup

I am a beginner in Python pandas. I am trying to scrape a paginated table using the Beautiful Soup package. The data is scraped, but the content of each cell comes in a single row, i co

Solution 1:

I did some cleanup. First, why the bytes type? You're writing text. And why ASCII? Please use Unicode; if you really need ASCII later in your code, encode to ASCII at that point. findAll is deprecated, so use find_all instead. You also had a possible issue with commas in the surface value. Finally, always use context managers when possible (here, when working with files).

Now for your question. You had two problems:

  1. Your test if len(saverec)!=0: was inside the for-loop, generating lots of useless data.
  2. You were not stripping the unneeded whitespace from the data.
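To make the two fixes concrete, here is a minimal sketch that works on plain strings instead of parsed HTML. The helper name build_row and the sample cell values are hypothetical and only stand in for the text of the td elements of one table row.

```python
def build_row(cells):
    """Join the text of one table row into a CSV line.

    `cells` is a list of raw cell strings (hypothetical stand-in for the
    text of each <td>). Returns None for rows without any data cell,
    such as a header row that only contains <th> elements.
    """
    # Fix 2: strip the surrounding whitespace from every cell.
    cleaned = [cell.strip() for cell in cells]
    # Fix 1: decide once per row, after all cells have been collected.
    return ",".join(cleaned) if cleaned else None


rows = [[], ["  Arras \n", " ARRAS ", "Ouvert"]]
lines = [line for line in map(build_row, rows) if line is not None]
```

Here lines ends up holding one entry, because the empty row is skipped instead of producing an empty line.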


import urllib.request
from bs4 import BeautifulSoup
import os


def make_soup(url):
    thepage=urllib.request.urlopen(url)
    soupdata=BeautifulSoup(thepage,"html.parser")
    return soupdata


save=""
for num in range(0, 22):
    soup=make_soup("http://www.ceetrus.com/fr/implantations-sites-commerciaux?page="+str(num))
    for rec in soup.find_all('tr'):
        saverec=""
        for data in rec.find_all('td'):
            data = data.text.strip()
            if "," in data:
                data = data.replace(",", "")
            saverec=saverec+","+data
        if len(saverec)!=0:
            save=save+"\n"+saverec[1:]
    print('#%d done' % num)

headers="Nom_commercial_du_Site,Ville,Etat,Surface_GLA,Nombre_de_boutique,Contact"
with open(os.path.expanduser("sites_commerciaux.csv"), "w", encoding="utf-8") as csv_file:
    csv_file.write(headers)
    csv_file.write(save)

which outputs for the first page:

Nom_commercial_du_Site,Ville,Etat,Surface_GLA,Nombre_de_boutique,Contact
ALCORCÓN,ALCORCÓN - MADRID,Ouvert,4298 m²,40,José Carlos GARCIA
Alegro Alfragide,CARNAXIDE,Ouvert,11461 m²,122,
Alegro Castelo Branco,CASTELO BRANCO,Ouvert,6830 m²,55,
Alegro Setúbal,Setúbal,Ouvert,27000 m²,114,
Ancona,Ancona,Ouvert,7644 m²,41,Ettore PAPPONETTI
Angoulême La Couronne,LA COURONNE,Ouvert,6141 m²,45,Juliette GALLOUEDEC
Annecy Grand Epagny,EPAGNY,Ouvert,20808 m²,61,Delphine BENISTY
Anping,Tainan,Ouvert,969 m²,21,Roman LEE
АКВАРЕЛЬ,Volgograd,Ouvert,94025 m²,182,Viktoria ZAITSEVA
Arras,ARRAS,Ouvert,4000 m²,26,Anais NIZON
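As a possible alternative to deleting commas from the values, the standard csv module quotes any field that contains the delimiter, so the original text can be kept. This is only a sketch: the helper name rows_to_csv and the sample value "4,298 m²" are assumptions for illustration, not data taken from the site.

```python
import csv
import io


def rows_to_csv(rows, header):
    """Serialize a header and rows to CSV text.

    Fields that contain a comma are quoted automatically by csv.writer,
    so they do not break the column layout.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical row whose surface value contains a comma.
sample = rows_to_csv([["Alcorcón", "4,298 m²"]], ["Nom", "Surface_GLA"])
```

To write straight to disk, the same writer can wrap a file opened with open(path, "w", newline="", encoding="utf-8") inside a with block.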
