Scrapy Shell Works But Actual Script Returns 404 Error

January 21, 2024

scrapy shell http://www.zara.com/us returns a correct 200 code:

2017-01-05 18:34:20 [scrapy.utils.log] INFO: Scrapy 1.3.0 started (bot: zara)
2017-01-05 18:34:20 [scrapy.utils.log]

Solution 1:

By default, every new Scrapy project is generated with ROBOTSTXT_OBEY set to True in settings.py. This means that before your spider scrapes anything, it fetches the website's robots.txt file and checks what is allowed and disallowed to be scraped.
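To make the idea of a robots.txt check concrete, here is a rough sketch using Python's standard urllib.robotparser module. It is not Scrapy's own implementation. The rules in ROBOTS_LINES, the example.com URLs, and the is_allowed helper are all hypothetical and exist only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_LINES = [
    "User-agent: *",
    "Disallow: /private/",
]


def is_allowed(url, user_agent="*"):
    """Return True if the hypothetical rules above permit fetching url."""
    parser = RobotFileParser()
    # parse() takes the robots.txt lines directly, so no network access is needed.
    parser.parse(ROBOTS_LINES)
    return parser.can_fetch(user_agent, url)


# A path under /private/ is disallowed; anything else is allowed.
print(is_allowed("http://example.com/private/page"))
print(is_allowed("http://example.com/public/page"))
```

A crawler that obeys robots.txt skips any request for which a check like this fails, which is why a spider can behave differently from a one-off request to the same URL.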

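To disable this, simply delete the ROBOTSTXT_OBEY setting from the settings.py file, or set it to False explicitly. Scrapy's built-in default for this setting is False, so removing the line has the same effect.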
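As a minimal sketch, the relevant line in the project's settings.py could look like the following. Note that the setting is spelled ROBOTSTXT_OBEY in Scrapy. The rest of the generated settings file is assumed to stay unchanged.

```python
# settings.py (fragment)

# The project template generated by "scrapy startproject" sets this to True.
# Setting it to False (or deleting the line) stops the spider from
# consulting robots.txt before crawling.
ROBOTSTXT_OBEY = False
```

Setting the value explicitly, rather than deleting the line, keeps the choice visible to anyone reading the settings file later.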

