Cannot enter def parse_question(self, response):
Source: 6-14 Extracting question with item loader - 1
weixin_慕勒4383646
2020-06-07
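Teacher Bobby:
My parse function is written as shown below, but at runtime it never enters the parse_question function. I tried passing cookies, passing headers, and adding a referer to the headers, and none of it worked. Could you point out where the problem is? Thanks.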
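import re
from urllib.parse import urljoin

import scrapy

def parse(self, response):
    # Cookies handed over from the previous request via meta (empty dict if absent)
    cookie_dict = response.meta.get("cookies", {})
    # Collect every link on the page and make it absolute
    all_urls = response.css("a::attr(href)").extract()
    all_urls = [urljoin(response.url, url) for url in all_urls]
    all_urls = [url for url in all_urls if url.startswith("https")]
    for url in all_urls:
        # group(1): question URL without trailing path, group(2): question id
        match_obj = re.match(r"(.*zhihu\.com/question/(\d+))(/|$).*", url)
        if match_obj:
            request_url = match_obj.group(1)
            question_id = match_obj.group(2)
            # Copy the shared headers so the referer is set per request
            headers = dict(self.headers, referer=url)
            yield scrapy.Request(request_url, headers=headers,
                                 callback=self.parse_question, cookies=cookie_dict)
            break  # stop after the first question URL (debugging aid)

def parse_question(self, response):
    pass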
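One thing that can be checked without running the spider is whether the regular expression in parse matches any of the extracted links at all. If it matches nothing, no Request is ever yielded and parse_question cannot be reached. The following is a minimal sketch using only the standard library; the helper name extract_question and the sample hrefs are made up for illustration and are not part of the original spider.

```python
import re
from urllib.parse import urljoin

# Same pattern as in parse(), written as a raw string with the dot escaped
QUESTION_URL = re.compile(r"(.*zhihu\.com/question/(\d+))(/|$).*")


def extract_question(url):
    """Return (request_url, question_id), or None if url is not a question link."""
    match_obj = QUESTION_URL.match(url)
    if match_obj is None:
        return None
    return match_obj.group(1), match_obj.group(2)


if __name__ == "__main__":
    base = "https://www.zhihu.com/"
    # Hypothetical href values, as they might come out of a::attr(href)
    hrefs = ["/question/12345/answer/678", "/people/someone"]
    for href in hrefs:
        full_url = urljoin(base, href)
        print(full_url, "->", extract_question(full_url))
```

If every link prints None, the problem is in link extraction or in the pattern rather than in cookies or headers.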
1 answer
bobby
2020-06-09
Check the console: was a request for a question-related URL actually issued, and was the returned status code 200?
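To make that check less manual, the console output can be saved to a file and scanned for question URLs. The sketch below assumes Scrapy's default "Crawled (status) <GET url>" debug messages; the function name question_statuses and the sample log lines are invented for illustration and are not real output from this spider.

```python
import re

# Matches the status, method and URL in Scrapy's default "Crawled" debug lines
CRAWLED = re.compile(r"Crawled \((\d{3})\) <(\w+) (\S+)>")


def question_statuses(log_lines):
    """Return [(status, url)] for every crawled URL that contains '/question/'."""
    results = []
    for line in log_lines:
        match_obj = CRAWLED.search(line)
        if match_obj and "/question/" in match_obj.group(3):
            results.append((int(match_obj.group(1)), match_obj.group(3)))
    return results


if __name__ == "__main__":
    # Hypothetical log lines that only imitate the shape of Scrapy's output
    sample_log = [
        "2020-06-09 10:00:00 [scrapy.core.engine] DEBUG: Crawled (200) "
        "<GET https://www.zhihu.com/> (referer: None)",
        "2020-06-09 10:00:01 [scrapy.core.engine] DEBUG: Crawled (403) "
        "<GET https://www.zhihu.com/question/12345> (referer: https://www.zhihu.com/)",
    ]
    for status, url in question_statuses(sample_log):
        verdict = "ok" if status == 200 else "callback is likely skipped"
        print(status, url, verdict)
```

An empty result means no question request was downloaded at all, which points at the link matching or at request filtering. A non-200 status points at the site rejecting the request: by default Scrapy only passes successful (2xx) responses to the callback, so parse_question would not run for such a response unless that status is explicitly allowed, for example through the spider's handle_httpstatus_list attribute.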