
Scrapy And Selenium : Only Scrape Two Pages

I want to crawl a website that has more than 10 pages, and every page has 10 links. The spider collects the links in `parse()` and follows each one to scrape further data, but I only want it to scrape the first two pages.

Solution 1:

Make the page counter persistent by storing it on the spider instance, and do the pagination check in `parse()` rather than in `__init__()`:

from scrapy import Request, signals
from scrapy.xlib.pydispatch import dispatcher
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException


def __init__(self):
    self.page_num = 0
    self.driver = webdriver.Firefox()
    dispatcher.connect(self.spider_closed, signals.spider_closed)

def parse(self, response):
    # ... scrape the current page here ...
    # Only follow the "next" link while fewer than 2 pages have been visited.
    if self.page_num < 2:
        try:
            next_link = self.driver.find_element_by_xpath("//li[@class='p_next'][1]")
            if next_link.text == "next_page":
                next_link.click()
                self.driver.refresh()
                self.page_num += 1
                yield Request(self.driver.current_url, callback=self.parse)
        except NoSuchElementException:
            print("page not found")
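The reason this works is that `self.page_num` lives on the spider instance, so it survives across successive `parse()` calls, whereas a local variable would reset to zero on every call. A minimal sketch of that pattern, stripped of Scrapy and Selenium (the `PagerSketch` class and its `pages` list are hypothetical stand-ins for the spider and the site's paginated results):

```python
class PagerSketch:
    """Illustrates the persistent-counter pattern: the counter is stored
    on the instance, so it survives across successive parse() calls."""

    MAX_PAGES = 2  # stop after this many pages, like `if self.page_num < 2`

    def __init__(self, pages):
        self.page_num = 0   # persistent counter, like the spider's attribute
        self.pages = pages  # stand-in for the site's paginated results

    def parse(self):
        """Yield items from each page until MAX_PAGES pages are visited."""
        while self.page_num < self.MAX_PAGES and self.page_num < len(self.pages):
            yield from self.pages[self.page_num]
            self.page_num += 1


# With three pages available, only the first two are scraped:
pager = PagerSketch([[1, 2], [3, 4], [5, 6]])
items = list(pager.parse())  # [1, 2, 3, 4]
```

In the real spider the loop is driven by Scrapy re-invoking `parse()` for each yielded `Request`, but the gating logic is the same: check the counter before following the next link, and increment it once per page.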
