Question about crawling comments
Source: 6-18 Implementing the Zhihu spider crawl logic and extracting answers - 2
EnzoLiu
2018-09-25
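About the limit and offset parameters in the API endpoint used to fetch comments: I noticed that bobby (the instructor) hard-coded them to 20 and 0.
If there are many comments, won't crawling this way miss some of them?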
start_answer_url = "https://www.zhihu.com/api/v4/questions/{0}/answers?sort_by=default&include=data%5B%2A%5D.is_normal%2Cadmin_closed_comment%2Creward_info%2Cis_collapsed%2Cannotation_action%2Cannotation_detail%2Ccollapse_reason%2Cis_sticky%2Ccollapsed_by%2Csuggest_edit%2Ccomment_count%2Ccan_comment%2Ccontent%2Ceditable_content%2Cvoteup_count%2Creshipment_settings%2Ccomment_permission%2Ccreated_time%2Cupdated_time%2Creview_info%2Crelevant_info%2Cquestion%2Cexcerpt%2Crelationship.is_authorized%2Cis_author%2Cvoting%2Cis_thanked%2Cis_nothelp%3Bdata%5B%2A%5D.mark_infos%5B%2A%5D.url%3Bdata%5B%2A%5D.author.follower_count%2Cbadge%5B%2A%5D.topics&limit={1}&offset={2}"
...
yield scrapy.Request(url=self.start_answer_url.format(question_id, 20, 0), headers=self.headers, callback=self.parse_answer)
1 answer
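For this part of the logic, I only need to build the request for the first page, and the first page returns 20 records. What the next page is and how many records it fetches does not have to be computed by hand, because Zhihu's response already contains the URL of the next page.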
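As a rough illustration of that idea, here is a minimal sketch of how the callback could pick up the next-page URL from the response instead of computing limit and offset itself. The helper name `next_page_url` is hypothetical, and the JSON field names `paging`, `is_end`, and `next` are assumptions about the shape of the response, not something stated in this thread; check them against a real response before relying on them.

```python
import json
from typing import Optional


def next_page_url(response_text: str) -> Optional[str]:
    """Return the URL of the next page, or None if this is the last page.

    Assumed (not confirmed here) response shape:
    {"paging": {"is_end": bool, "next": str}, "data": [...]}
    """
    payload = json.loads(response_text)
    paging = payload.get("paging", {})
    # Treat a missing "is_end" flag as the end, so the crawl stops safely.
    if paging.get("is_end", True):
        return None
    return paging.get("next")


# Inside the spider, parse_answer could then follow the chain (sketch only):
#
#     def parse_answer(self, response):
#         ...  # extract the answers on the current page
#         next_url = next_page_url(response.text)
#         if next_url:
#             yield scrapy.Request(next_url, headers=self.headers,
#                                  callback=self.parse_answer)
```

With this pattern the hard-coded 20 and 0 only seed the first request; every later page is requested with whatever URL the previous response supplied, until the response says there are no more pages.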
2018-09-26