My Job Scrapes Only The Last Page Instead Of All Of Them
My scraping job only seems to write the last page of the website to the CSV. I assume this is because it loops through all the pages and only then writes to the CSV. It does scrape the e
Solution 1:
The problem is that you are overwriting the CSV on every iteration, so only the last write remains when the script ends.
Change
with open('vtg12.csv', 'a', newline='') as outfile:
    writer = csv.writer(outfile)
    for row in zip(langs1_text, langs_text, elem_href):
        writer.writerow(row)
to
with open('vtg12.csv', 'a+', newline='') as outfile:
    writer = csv.writer(outfile)
    for row in zip(langs1_text, langs_text, elem_href):
        writer.writerow(row)
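'a+' opens the file in append mode (reading is also allowed), so each write is added to the end of the file instead of replacing its contents. Plain 'a' appends as well; it is mode 'w' that truncates the file each time it is opened.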
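To make the difference between the modes concrete, here is a minimal sketch that reopens a file once per page, the way a scraping loop does. The `pages` data, the helper names, and the temporary file paths are all made up for illustration; they are not from the original script.

```python
import csv
import os
import tempfile

# Hypothetical stand-in for the rows scraped from three pages.
pages = [
    [("a1", "b1", "/link1")],
    [("a2", "b2", "/link2")],
    [("a3", "b3", "/link3")],
]

def write_pages(path, mode):
    """Reopen `path` with `mode` once per page, as a per-page write would."""
    for rows in pages:
        with open(path, mode, newline='') as outfile:
            writer = csv.writer(outfile)
            for row in rows:
                writer.writerow(row)

def read_rows(path):
    """Read the whole CSV back as a list of rows."""
    with open(path, newline='') as infile:
        return list(csv.reader(infile))

tmp_dir = tempfile.mkdtemp()
w_path = os.path.join(tmp_dir, 'mode_w.csv')
a_path = os.path.join(tmp_dir, 'mode_a.csv')

write_pages(w_path, 'w')  # truncates on every open
write_pages(a_path, 'a')  # adds to the end on every open

rows_w = read_rows(w_path)
rows_a = read_rows(a_path)
print(len(rows_w))  # 1: only the last page survived
print(len(rows_a))  # 3: one row per page
```

If the real script shows the "last page only" symptom, it is worth checking whether the file is opened with 'w' inside the page loop.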
Solution 2:
At the very top:
import csv

def append_to_csv(csv_list, output_filename):
    # Append one row to the CSV, creating the file if it does not exist yet.
    with open(output_filename, 'a', newline='') as fp:
        a = csv.writer(fp)
        data = [csv_list]
        a.writerows(data)
Then replace
with open('vtg12.csv', 'a', newline='') as outfile:
    writer = csv.writer(outfile)
    for row in zip(langs1_text, langs_text, elem_href):
        writer.writerow(row)
with:
for row in zip(langs_text, langs2_text, langs_text, elem_href):
    append_to_csv(row, 'vtg12.csv')
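Reopening the file for every row works, but it is not the only option. As a rough sketch of an alternative, the file can be opened once before the page loop and kept open while every page is written. This assumes the page loop can be wrapped in a single with block. `scrape_page`, `scrape_to_csv`, and the sample values are hypothetical stand-ins for the real scraping code.

```python
import csv
import os
import tempfile

def scrape_page(page_number):
    """Hypothetical stand-in for the real scraper; returns three parallel lists."""
    langs1_text = [f"title {page_number}"]
    langs_text = [f"price {page_number}"]
    elem_href = [f"/item/{page_number}"]
    return langs1_text, langs_text, elem_href

def scrape_to_csv(output_filename, page_numbers):
    # Open once, before the loop: 'w' truncates only at the start of the run,
    # and every page is then written through the same open file.
    with open(output_filename, 'w', newline='') as outfile:
        writer = csv.writer(outfile)
        for page_number in page_numbers:
            langs1_text, langs_text, elem_href = scrape_page(page_number)
            writer.writerows(zip(langs1_text, langs_text, elem_href))

# Write to a temporary directory so the sketch does not touch real files.
output_path = os.path.join(tempfile.mkdtemp(), 'vtg12.csv')
scrape_to_csv(output_path, range(1, 4))

with open(output_path, newline='') as infile:
    saved_rows = list(csv.reader(infile))
print(saved_rows)
```

One design difference: with append mode, running the script twice leaves both runs' rows in the file, whereas opening once with 'w' starts from an empty file on each run.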