Running scrapy crawl jobbole -s JOBDIR=job_info/001 throws an error

Source: 9-6 Pausing and restarting Scrapy

愚墨

2020-04-13

```
Unhandled error in Deferred:
2020-04-13 16:21:35 [twisted] CRITICAL: Unhandled error in Deferred:

2020-04-13 16:21:35 [twisted] CRITICAL:
Traceback (most recent call last):
  File "H:\anaconda3.4\anaconda_\lib\site-packages\twisted\internet\task.py", line 517, in _oneWorkUnit
    result = next(self._iterator)
  File "H:\anaconda3.4\anaconda_\lib\site-packages\scrapy\utils\defer.py", line 63, in <genexpr>
    work = (callable(elem, *args, **named) for elem in iterable)
  File "H:\anaconda3.4\anaconda_\lib\site-packages\scrapy\core\scraper.py", line 183, in _process_spidermw_output
    self.crawler.engine.crawl(request=output, spider=spider)
  File "H:\anaconda3.4\anaconda_\lib\site-packages\scrapy\core\engine.py", line 210, in crawl
    self.schedule(request, spider)
  File "H:\anaconda3.4\anaconda_\lib\site-packages\scrapy\core\engine.py", line 216, in schedule
    if not self.slot.scheduler.enqueue_request(request):
  File "H:\anaconda3.4\anaconda_\lib\site-packages\scrapy\core\scheduler.py", line 57, in enqueue_request
    dqok = self._dqpush(request)
  File "H:\anaconda3.4\anaconda_\lib\site-packages\scrapy\core\scheduler.py", line 86, in _dqpush
    self.dqs.push(reqd, -request.priority)
  File "H:\anaconda3.4\anaconda_\lib\site-packages\queuelib\pqueue.py", line 35, in push
    q.push(obj) # this may fail (eg. serialization error)
  File "H:\anaconda3.4\anaconda_\lib\site-packages\scrapy\squeues.py", line 15, in push
    s = serialize(obj)
  File "H:\anaconda3.4\anaconda_\lib\site-packages\scrapy\squeues.py", line 27, in _pickle_serialize
    return pickle.dumps(obj, protocol=2)
  File "H:\anaconda3.4\anaconda_\lib\site-packages\parsel\selector.py", line 204, in __getstate__
    raise TypeError("can't pickle Selector objects")
TypeError: can't pickle Selector objects
```
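For context: the bottom frame is parsel's Selector refusing to be serialized. With JOBDIR set, Scrapy pushes pending Requests onto a pickle-based disk queue, so any unpicklable value inside request.meta aborts scheduling with exactly this error. A minimal sketch (my addition, not part of the original post) reproducing just the failing pickle step:

```python
# Minimal reproduction of the failing step, independent of Scrapy's queue:
# parsel's Selector defines __getstate__ to refuse pickling, so a meta
# dict containing one fails the same way the disk queue does.
import pickle
from parsel import Selector

meta = {'article_item': Selector(text='<html><h1>demo</h1></html>')}
try:
    pickle.dumps(meta, protocol=2)
except TypeError as err:
    print(err)  # -> can't pickle Selector objects
```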


2 Answers

bobby

2020-04-16

//img1.sycdn.imooc.com/szimg/5e980eb6095bdd5509690451.jpg First try it without putting this in and see whether it still errors.

bobby
replied to 愚墨:
Leave your QQ and I'll add you to take a look. If it works the way the course demonstrates it, then it should be fine. Which Scrapy version are you on?
2020-04-17

bobby

2020-04-14

When you yield a Request object, you can't put a Selector object into the meta attribute: with JOBDIR those meta values get pickled, which throws this exception. Put the response's HTML into meta instead, and rebuild the Selector object once you are in the next callback.
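A minimal sketch of this suggestion (the spider skeleton, the article_html key, and the hard-coded contentId are illustrative assumptions, not verified course code):

```python
import scrapy
from urllib import parse
from parsel import Selector


class JobboleSpider(scrapy.Spider):
    name = 'jobbole'

    def parse_detail(self, response):
        # Put the raw HTML (a plain str, safe to pickle) into meta
        # instead of a Selector or an ItemLoader that wraps one.
        yield scrapy.Request(
            url=parse.urljoin(response.url, '/NewsAjax/GetAjaxNewsInfo?contentId=1'),
            meta={'article_html': response.text, 'url': response.url},
            callback=self.parse_nums,
        )

    def parse_nums(self, response):
        # Rebuild the Selector on this side of the (possibly disk-backed) queue.
        article_sel = Selector(text=response.meta['article_html'])
        title = article_sel.xpath('//div[@id="news_title"]/a/text()').get()
        self.logger.info('title: %s', title)
```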

愚墨
Here is my parsing code:

```python
def parse_detail(self, response):
    match_re = re.match('.*?(\d+)', response.url)
    if match_re:
        post_id = match_re.group(1)
        item_loder = ArticleItemLoder(item=JobBoleArticleItem(), response=response)
        item_loder.add_xpath('title', '//div[@id="news_main"]/div[@id="news_title"]/a/text()')
        # ........
        yield Request(
            url=parse.urljoin(response.url, "/NewsAjax/GetAjaxNewsInfo?contentId={}".format(post_id)),
            meta={'article_item': item_loder, 'url': response.url},
            callback=self.parse_nums,
        )
```
2020-04-14
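One more note on the snippet above: ArticleItemLoder is built with response=response, so the loader holds a Selector internally, and passing the loader through meta is exactly what fails to pickle. A hypothetical rework in the spirit of bobby's answer, assuming parse_nums only needs the values collected so far, would pass plain data instead:

```python
import re
from urllib import parse
from scrapy import Request
# ArticleItemLoder and JobBoleArticleItem come from the asker's project.

# Inside the spider class:
def parse_detail(self, response):
    match_re = re.match('.*?(\d+)', response.url)
    if match_re:
        post_id = match_re.group(1)
        item_loder = ArticleItemLoder(item=JobBoleArticleItem(), response=response)
        item_loder.add_xpath('title', '//div[@id="news_main"]/div[@id="news_title"]/a/text()')
        # load_item() returns a plain Item, which pickles fine as long as
        # its field values are strings/numbers/lists, unlike the loader.
        yield Request(
            url=parse.urljoin(response.url, "/NewsAjax/GetAjaxNewsInfo?contentId={}".format(post_id)),
            meta={'article_item': item_loder.load_item(), 'url': response.url},
            callback=self.parse_nums,
        )
```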
