Why doesn't yield Request() jump to the other method (breakpoint 2) in debug mode?

Source: 4-24 Image download errors during large-scale crawling

情深酒烈

2020-05-21

[screenshot]

The URL and the method name are both correct, but execution never enters the parse_topic_detail method.

# Imports needed at module level:
#   import re
#   from datetime import datetime
#   from urllib import parse
#   from scrapy import Request
#   from ..items import TopicspiderItem  # adjust to this project's items module

def parse_topic(self, response):
    """Extract the fields from the topic list page."""
    all_tr = response.xpath("//tbody//tr")[:1]
    for item in all_tr:
        # Create a fresh item per row: a single shared item would be mutated
        # by later iterations while earlier requests are still pending.
        topic_item = TopicspiderItem()
        status = item.xpath(".//td[@class='forums_topic_flag']//span/text()").extract_first("")
        score = item.xpath(".//td[@class='forums_score']//em/text()").extract_first(0)
        title = item.xpath(".//td[@class='forums_topic']//a[last()]/text()").extract_first("")
        author_name = item.xpath(".//td[@class='forums_author']//a/text()").extract_first("")
        create_time = item.xpath(".//td[@class='forums_author']//em/text()").extract_first("")
        last_time = item.xpath(".//td[@class='forums_last_pub']//em/text()").extract_first("")

        answer_check_str = item.xpath(".//td[@class='forums_reply']//span/text()").extract_first("")
        if answer_check_str:
            answer_nums = answer_check_str.split("/")[0]
            check_nums = answer_check_str.split("/")[1]
        else:
            answer_nums = 0
            check_nums = 0

        topic_item["status"] = status
        topic_item["score"] = int(score)
        topic_item["title"] = title
        topic_item["author_name"] = author_name
        if create_time:
            create_time = datetime.strptime(create_time, "%Y-%m-%d %H:%M")
        topic_item["create_time"] = create_time

        if last_time:
            last_time = datetime.strptime(last_time, "%Y-%m-%d %H:%M")
        topic_item["last_time"] = last_time

        topic_item["answer_nums"] = int(answer_nums)
        topic_item["check_nums"] = int(check_nums)

        topic_detail_url = item.xpath(".//td[@class='forums_topic']//a[last()]/@href").extract_first("")
        detail_url = parse.urljoin(self.domain, topic_detail_url)
        yield Request(url=detail_url, meta={"topic_item": topic_item}, callback=self.parse_topic_detail)

def parse_topic_detail(self, response):
    """Extract the fields from the topic detail page."""
    url = response.url
    # The original pattern r".*?(d+)" was missing the backslash in "\d+"
    # and re.match was never given the string to match against.
    match_path = re.match(r".*?(\d+)", url)
    if match_path:
        topic_id = match_path.group(1)
        # The item travels in response.meta, not on the response itself.
        topic_item = response.meta.get("topic_item")
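The id extraction can be sanity-checked outside the spider. A minimal sketch with a made-up URL (note the pattern needs the escaped `\d` and must be handed the string to match):

```python
import re

# Made-up detail URL for illustration; real forum URLs will differ.
url = "https://example.com/forum/topic/12345"

# Lazy ".*?" stops at the first digit, then "(\d+)" greedily captures the run.
match_path = re.match(r".*?(\d+)", url)
topic_id = match_path.group(1) if match_path else None
# topic_id == "12345"
```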

1 Answer

bobby

2020-05-22

This is not a failure to jump to parse_topic_detail. Scrapy is not a synchronous-IO framework, so you cannot assume that the next function called after a yield will be the callback. Try removing the breakpoint above and running it again.
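bobby's point can be illustrated with a toy model in plain Python (not real Scrapy; the Request class and "engine" loop below are made up for the demonstration): yielding a Request only hands an object back to whoever drives the generator, and the callback fires whenever that driver later chooses to run it, not at the yield itself.

```python
# Toy stand-in for scrapy.Request: just carries a URL and a callback.
class Request:
    def __init__(self, url, callback):
        self.url = url
        self.callback = callback

calls = []

def parse_topic():
    calls.append("before yield")
    # Stepping over this line in a debugger does NOT enter parse_detail;
    # it merely produces a Request object for the engine to schedule.
    yield Request("https://example.com/topic/1", callback=parse_detail)
    calls.append("after yield")  # runs when the engine resumes the generator

def parse_detail(response):
    calls.append("parse_detail")

# Highly simplified "engine": drain the spider generator first, then
# pretend each download finished and fire its callback afterwards.
queue = list(parse_topic())
for req in queue:
    req.callback(response=req.url)

# calls == ["before yield", "after yield", "parse_detail"]
```

Note the callback runs only after the whole generator was drained; with real downloads in flight, the delay (and ordering) is even less predictable, which is why a breakpoint inside the callback is not hit immediately after the yield.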


Build a Search Engine with Scrapy: a Python distributed-crawler course, a bestseller for 4 years

Master Scrapy thoroughly and build a search engine with Django + Elasticsearch

5796 learners · 6290 questions
