Scrapy Shell Works But Actual Script Returns 404 Error
scrapy shell http://www.zara.com/us
Returns a correct 200 code.
2017-01-05 18:34:20 [scrapy.utils.log] INFO: Scrapy 1.3.0 started (bot: zara)
2017-01-05 18:34:20 [scrapy.utils.log]
Solution 1:
By default, every new Scrapy project sets ROBOTSTXT_OBEY to True in its settings.py file. This means that before your spider scrapes anything, Scrapy fetches the website's robots.txt file and checks what is allowed and disallowed to be scraped.
To disable this, either delete the ROBOTSTXT_OBEY setting from settings.py (Scrapy's built-in default is False) or set it to False explicitly.
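As a minimal sketch of that change, the relevant line in the project's settings.py might look like the following. The surrounding comments are illustrative and not taken from the original project.

```python
# settings.py (the project-level settings module)

# The setting is spelled ROBOTSTXT_OBEY, with no underscore between
# ROBOTS and TXT. The template created by `scrapy startproject` sets it
# to True; False tells Scrapy not to fetch or honour robots.txt.
ROBOTSTXT_OBEY = False
```

If only one spider should ignore robots.txt, the same key can instead be placed in that spider's custom_settings dictionary, leaving the project-wide value unchanged.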