Crawling Dynamic Content With Scrapy
I am trying to get the latest reviews from the Google Play Store. I'm following this question for getting the latest reviews. The method specified in that question's answer works fine with
Solution 1:
It seems you aren't changing the id in the form data for each app.
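import json
import re
from time import sleep

from scrapy import FormRequest, Selector

def parseApp(self, response):
    # Collect the unique app detail links from the listing page.
    apps = list(set(response.xpath('//a[@class="card-click-target"]/@href').extract()))
    url = "https://play.google.com/store/getreviews"
    for app in apps:
        # Keep what follows "id="; str.strip() removes a set of characters, not a prefix.
        _id = app.split('id=')[-1]
        form_data = {"id": _id, "reviewType": '0', "reviewSortOrder": '0', "pageNum": '0'}
        sleep(5)  # crude throttle; Scrapy's DOWNLOAD_DELAY setting is the non-blocking option
        yield FormRequest(url=url, formdata=form_data, callback=self.parse_data)

def parse_data(self, response):
    # Match from the first "[[" to the end of that line.
    response_data = re.findall(r"\[\[.*", response.text)
    if response_data:
        try:
            # Re-append the closing bracket that the single-line match leaves off.
            text = json.loads(response_data[0] + ']')
            sell = Selector(text=text[0][2])
        except (ValueError, IndexError, TypeError):
            return
        # do whatever you want to extract using sell.xpath('YOUR_XPATH_HERE')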
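Pulling the app id out of each link and decoding the review payload are the two steps most likely to go wrong, and both can be tried without running a spider. The sketch below uses only the standard library. The helper names `extract_app_id` and `extract_review_html`, the `com.example.app` id, and the sample body are hypothetical; the body's layout (a junk prefix, then a `[[...` line whose third element is the HTML) is an assumption inferred from the parsing code above, not a documented format.

```python
import json
import re
from urllib.parse import parse_qs, urlparse


def extract_app_id(href):
    """Return the value of the id query parameter, or None if it is missing."""
    values = parse_qs(urlparse(href).query).get("id")
    return values[0] if values else None


def extract_review_html(body):
    """Return the HTML fragment from a getreviews-style body, or None."""
    match = re.search(r"\[\[.*", body)  # '.' stops at the end of the line
    if match is None:
        return None
    try:
        payload = json.loads(match.group(0) + "]")  # restore the outer bracket
        return payload[0][2]
    except (ValueError, IndexError, TypeError):
        return None


# str.strip() treats its argument as a set of characters, so it can eat
# part of the package name; parsing the query string does not.
href = "/store/apps/details?id=com.example.app"  # hypothetical app id
print(href.strip("/store/apps/details?id="))
print(extract_app_id(href))

# Hypothetical body, shaped the way the parsing code expects.
fragment = '<div class="single-review"></div>'
sample_body = ")]}'\n\n" + json.dumps([["ecr", 1, fragment, 2]])[:-1] + "\n]"
print(extract_review_html(sample_body))
```

With an id taken from the query string, each `FormRequest` carries a different `id` value, which is the point of the answer.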
After cleaning the data, a sample review will look something like this:
<div class="single-review">
  <a href="/store/people/details?id=106726831005267540508"><img class="author-image" alt="Lorence Gerona avatar image" src="https://lh3.googleusercontent.com/uFp_tsTJboUY7kue5XAsGA=w48-c-h48"></a>
  <div class="review-header" data-expand-target="" data-reviewid="gp:AOqpTOHnsExa_P6JFRJD6HF5h71fpY91tNaEODjtfiTu-zPFki9ZnYsNp1HEcGFpGEfu9xqwJL_j-03Tx0e9lw">
    <div class="review-info">
      <span class="author-name"><a href="/store/people/details?id=106726831005267540508">Lorence Gerona</a></span>
      <span class="review-date">3 June 2015</span>
      <a class="reviews-permalink" href="/store/apps/details?id=com.supercell.boombeach&reviewId=Z3A6QU9xcFRPSG5zRXhhX1A2SkZSSkQ2SEY1aDcxZnBZOTF0TmFFT0RqdGZpVHUtelBGa2k5Wm5Zc05wMUhFY0dGcEdFZnU5eHF3Skxfai0wM1R4MGU5bHc" title="Link to this review"></a>
      <div class="review-source" style="display:none"></div>
      <div class="review-info-star-rating">
        <div class="tiny-star star-rating-non-editable-container" aria-label="Rated 5 stars out of five stars">
          <div class="current-rating" style="width: 100%;"></div>
        </div>
      </div>
    </div>
    <div class="rate-review-wrapper">
      <div class="play-button icon-button small rate-review" title="Spam" data-rating="SPAM"><div class="icon spam-flag"></div></div>
      <div class="play-button icon-button small rate-review" title="Helpful" data-rating="HELPFUL"><div class="icon thumbs-up"></div></div>
      <div class="play-button icon-button small rate-review" title="Unhelpful" data-rating="UNHELPFUL"><div class="icon thumbs-down"></div></div>
    </div>
  </div>
  <div class="review-body">
    <span class="review-title">Team BOOM BEACH</span>
    Amazing game I can defeat hammerman
    <div class="review-link" style="display:none"><a class="id-no-nav play-button tiny" href="#" target="_blank">Full Review</a></div>
  </div>
</div>
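To show which fields such a fragment carries, here is a minimal sketch that reads the author, date, title, and star rating out of a trimmed copy of the sample. It deliberately swaps in the standard library's `html.parser` so it runs without Scrapy; the `ReviewParser` class and its attribute names are hypothetical. Inside the spider, the counterpart would be XPath calls on the selector, for example `sell.xpath('//span[@class="review-date"]/text()')`, using the class names visible in the sample.

```python
from html.parser import HTMLParser

VOID_TAGS = {"img", "br", "hr", "input", "meta", "link"}


class ReviewParser(HTMLParser):
    """Collect the text of a few spans, keyed by their class attribute."""

    TARGETS = ("author-name", "review-date", "review-title")

    def __init__(self):
        super().__init__()
        self.fields = {}
        self.rating = None
        self._current = None  # class of the span being captured, if any
        self._depth = 0       # open tags counted from that span inward

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "tiny-star" in classes:
            self.rating = attrs.get("aria-label")
        if tag in VOID_TAGS:
            return  # void tags have no end tag, so they never change the depth
        if self._current is not None:
            self._depth += 1
        elif tag == "span":
            for name in self.TARGETS:
                if name in classes:
                    self._current, self._depth = name, 1
                    self.fields[name] = ""
                    break

    def handle_endtag(self, tag):
        if self._current is not None:
            self._depth -= 1
            if self._depth == 0:
                self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self.fields[self._current] += data


# Trimmed copy of the sample review.
sample = (
    '<div class="single-review"><div class="review-info">'
    '<span class="author-name"><a href="#">Lorence Gerona</a></span>'
    '<span class="review-date">3 June 2015</span>'
    '<div class="tiny-star" aria-label="Rated 5 stars out of five stars"></div>'
    '</div><div class="review-body">'
    '<span class="review-title">Team BOOM BEACH</span>'
    ' Amazing game I can defeat hammerman</div></div>'
)

parser = ReviewParser()
parser.feed(sample)
parser.close()
print(parser.fields, parser.rating)
```

The review text itself sits directly inside the `review-body` div rather than in its own span, so it needs a separate rule; this sketch leaves it out.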