Scrapy And Selenium : Only Scrape Two Pages
I want to crawl a website that has more than 10 pages, and every page contains 10 links. In parse() the spider collects those links and follows each one to scrape further data. How can I make the spider crawl only the first two pages?
Solution 1:
Make the page counter persistent by storing it on the spider instance:
from scrapy import Request, signals
from scrapy.xlib.pydispatch import dispatcher
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException

def __init__(self):
    self.page_num = 0  # persists across parse() calls
    self.driver = webdriver.Firefox()
    dispatcher.connect(self.spider_closed, signals.spider_closed)

def parse(self, response):
    # ... scrape the current page here ...
    # Only follow the "next" link while fewer than 2 pages have been visited.
    if self.page_num < 2:
        try:
            next_link = self.driver.find_element_by_xpath("//li[@class='p_next'][1]")
            if next_link.text == "next_page":
                next_link.click()
                self.driver.refresh()
                self.page_num += 1
                yield Request(self.driver.current_url, callback=self.parse)
        except NoSuchElementException:
            print("page not found")
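Stripped of Selenium and Scrapy, the counter idea can be checked in isolation. The sketch below is only illustrative: `TwoPageSpider`, `LIMIT`, and the `pages` list are hypothetical stand-ins for the real driver, site, and pagination; the point is that a counter stored on the instance survives across successive `parse()` calls and cuts the crawl off after two pages.

```python
class TwoPageSpider:
    LIMIT = 2  # stop after this many pages

    def __init__(self):
        self.page_num = 0   # instance attribute, so it persists between calls
        self.scraped = []

    def parse(self, pages, index=0):
        # Scrape the current page, then count it.
        self.scraped.append(pages[index])
        self.page_num += 1
        # Follow the "next page" link only while under the limit.
        if self.page_num < self.LIMIT and index + 1 < len(pages):
            self.parse(pages, index + 1)

spider = TwoPageSpider()
spider.parse(["page1", "page2", "page3", "page4"])
# spider.scraped is now ["page1", "page2"]
```

If the counter were a local variable inside `parse()` instead, it would reset to zero on every call and the limit would never trigger.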