
Scrapy Shell Works But Actual Script Returns 404 Error

Running scrapy shell http://www.zara.com/us returns a correct 200 code:

2017-01-05 18:34:20 [scrapy.utils.log] INFO: Scrapy 1.3.0 started (bot: zara)
2017-01-05 18:34:20 [scrapy.utils.log]

Solution 1:

By default, Scrapy sets ROBOTSTXT_OBEY to True in the settings.py of every new project. This means that before your spider scrapes anything, Scrapy checks the website's robots.txt file for what is allowed and disallowed to be scraped.

To disable this, delete the ROBOTSTXT_OBEY setting from the project's settings.py file, or set it to False.
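As a minimal sketch of the change described above, the relevant line in the project's settings.py could look like the following. This assumes a standard project layout where settings.py is the active settings module; the rest of the file is left as generated.

```python
# settings.py (project settings module; assumed standard project layout)

# Do not fetch or honour robots.txt before crawling.
# New projects ship with this set to True; removing the line entirely
# has the same effect as False, because Scrapy's built-in default is False.
ROBOTSTXT_OBEY = False
```

Keeping the line with an explicit False, rather than deleting it, makes the choice visible to anyone reading the settings later.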

See the ROBOTSTXT_OBEY entry in the Scrapy settings documentation for more.
