
My Job Scrapes Only The Last Page Instead Of All Of Them

My scraping job only seems to write the last page of the website to the CSV. I assume this is because it loops through all the pages and only then writes to the CSV. It does scrape the e

Solution 1:

The problem is that the CSV is being overwritten on every iteration, so only the rows from the last page remain when the script ends. That is what happens when the file is opened in write mode ('w') inside the page loop.

Change

with open('vtg12.csv', 'a', newline='') as outfile:
    writer = csv.writer(outfile)
    for row in zip(langs1_text, langs_text, elem_href):
        writer.writerow(row)

to

with open('vtg12.csv', 'a+', newline='') as outfile:
    writer = csv.writer(outfile)
    for row in zip(langs1_text, langs_text, elem_href):
        writer.writerow(row)

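Mode 'a+' opens the file for appending (and reading), so each page's rows are added to the end of the file instead of replacing what is already there. Plain 'a' appends as well; it is 'w' that truncates the file every time it is opened.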
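To see the difference between the modes, here is a minimal sketch that reopens a file once per page, as a loop like the one in the question would. The `pages` data, the `write_pages` helper, and the file names are made up for illustration; only the `open` modes matter.

```python
import csv
import os
import tempfile

# Hypothetical stand-in for scraped data: one list of rows per page.
pages = [
    [("a1", "b1", "/p1")],
    [("a2", "b2", "/p2")],
    [("a3", "b3", "/p3")],
]

def write_pages(path, mode):
    """Reopen `path` with `mode` once per page, then read the file back."""
    for rows in pages:
        with open(path, mode, newline='') as outfile:
            writer = csv.writer(outfile)
            writer.writerows(rows)
    with open(path, newline='') as infile:
        return list(csv.reader(infile))

tmpdir = tempfile.mkdtemp()
overwritten = write_pages(os.path.join(tmpdir, "w_mode.csv"), "w")
appended = write_pages(os.path.join(tmpdir, "a_mode.csv"), "a")

print(len(overwritten))  # 1: only the last page survives with 'w'
print(len(appended))     # 3: every page is kept with 'a'
```

Note that append mode also keeps rows from earlier runs of the script, so delete the file first if you want a fresh CSV each time.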

Solution 2:

At the very top of the script, define a helper:

import csv

def append_to_csv(csv_list, output_filename):
    # Open in append mode so rows written earlier are preserved.
    with open(output_filename, 'a', newline='') as fp:
        writer = csv.writer(fp)
        writer.writerow(csv_list)

Then replace

with open('vtg12.csv', 'a', newline='') as outfile:
    writer = csv.writer(outfile)
    for row in zip(langs1_text, langs_text, elem_href):
        writer.writerow(row)

with:

for row in zip(langs1_text, langs_text, elem_href):
    append_to_csv(row, 'vtg12.csv')
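The helper above reopens the file for every row. Another option, sketched here as an assumption about how the script could be restructured, is to open the file once before the page loop and write each page's rows inside it. `scrape_page` and `scrape_to_csv` are hypothetical names; `scrape_page` stands in for the real scraping code and just returns three parallel lists like `langs1_text`, `langs_text` and `elem_href`.

```python
import csv
import os
import tempfile

def scrape_page(page_number):
    """Hypothetical placeholder for the real scraping code."""
    langs1_text = [f"title-{page_number}"]
    langs_text = [f"price-{page_number}"]
    elem_href = [f"/item/{page_number}"]
    return langs1_text, langs_text, elem_href

def scrape_to_csv(output_filename, page_numbers):
    # Open the file once, before the loop, so 'w' truncates it a single time.
    with open(output_filename, 'w', newline='') as outfile:
        writer = csv.writer(outfile)
        for page_number in page_numbers:
            langs1_text, langs_text, elem_href = scrape_page(page_number)
            writer.writerows(zip(langs1_text, langs_text, elem_href))

# Usage: write three pages to a temporary file and read them back.
path = os.path.join(tempfile.mkdtemp(), 'vtg12.csv')
scrape_to_csv(path, range(1, 4))
with open(path, newline='') as infile:
    rows = list(csv.reader(infile))

print(len(rows))  # 3: one row from each page
```

With this layout each run starts from an empty file, and the file is opened only once however many pages are scraped.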
