ERROR: Error caught on signal handler when starting the ScrapyRedisTest jobbole spider

Source: 10-4 Writing distributed crawler code with scrapy-redis

杜小牧

2018-08-26

2018-08-26 10:36:41 [scrapy.utils.signal] ERROR: Error caught on signal handler: <bound method RedisMixin.spider_idle of <JobboleSpider 'jobbole' at 0x54fa4f0>>
Traceback (most recent call last):
  File "D:\360Downloaded\Project\aiticle_spider\lib\site-packages\scrapy\utils\signal.py", line 30, in send_catch_log
    *arguments, **named)
  File "D:\360Downloaded\Project\aiticle_spider\lib\site-packages\pydispatch\robustapply.py", line 55, in robustApply
    return receiver(*arguments, **named)
  File "D:/360Downloaded/Project/ScrapyRedisTest\scrapy_redis\spiders.py", line 121, in spider_idle
    self.schedule_next_requests()
  File "D:/360Downloaded/Project/ScrapyRedisTest\scrapy_redis\spiders.py", line 115, in schedule_next_requests
    for req in self.next_requests():
  File "D:/360Downloaded/Project/ScrapyRedisTest\scrapy_redis\spiders.py", line 87, in next_requests
    req = self.make_request_from_data(data)
  File "D:/360Downloaded/Project/ScrapyRedisTest\scrapy_redis\spiders.py", line 110, in make_request_from_data
    return self.make_requests_from_url(url)
  File "D:\360Downloaded\Project\aiticle_spider\lib\site-packages\scrapy\spiders\__init__.py", line 87, in make_requests_from_url
    return Request(url, dont_filter=True)
  File "D:\360Downloaded\Project\aiticle_spider\lib\site-packages\scrapy\http\request\__init__.py", line 25, in __init__
    self._set_url(url)
  File "D:\360Downloaded\Project\aiticle_spider\lib\site-packages\scrapy\http\request\__init__.py", line 62, in _set_url
    raise ValueError('Missing scheme in request url: %s' % self._url)
ValueError: Missing scheme in request url: blog.jobbole.com
2018-08-26 10:36:41 [scrapy.core.engine] INFO: Closing spider (finished)
2018-08-26 10:36:41 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'finish_reason': 'finished',
 'finish_time': datetime.datetime(2018, 8, 26, 2, 36, 41, 335233),
 'log_count/DEBUG': 1,
 'log_count/ERROR': 1,
 'log_count/INFO': 8,
 'start_time': datetime.datetime(2018, 8, 26, 2, 36, 26, 330813)}
2018-08-26 10:36:41 [scrapy.core.engine] INFO: Spider closed (finished)
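Reading the traceback: the failing call path is make_request_from_data -> make_requests_from_url -> Request(url), and the URL being rejected is the bare host blog.jobbole.com, which has no http:// prefix. One possible guard, sketched below, is to normalize a start URL before it is pushed to Redis. The helper name ensure_scheme and the default scheme are assumptions made for illustration; they are not part of scrapy or scrapy-redis.

```python
def ensure_scheme(url, default_scheme="http"):
    """Return the URL unchanged if it already has a scheme, else prefix one.

    Hypothetical helper for illustration only; not a scrapy or scrapy-redis API.
    """
    url = url.strip()
    if "://" in url:
        # Already looks like scheme://host/..., leave it alone.
        return url
    # Bare host such as "blog.jobbole.com": add the assumed default scheme.
    return f"{default_scheme}://{url.lstrip('/')}"


fixed = ensure_scheme("blog.jobbole.com")
unchanged = ensure_scheme("http://blog.jobbole.com")
print(fixed)
print(unchanged)
```

The normalized value is what would then be pushed to the list the spider reads from, for example with redis-cli: lpush jobbole:start_urls http://blog.jobbole.com (this assumes the spider's redis_key is jobbole:start_urls, which the log above does not show).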



1 answer

bobby

2018-08-27

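When you assign a value to the image-related field in your item, it has to be a list; you cannot assign a plain string directly. This was covered in the course.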
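A minimal sketch of that advice, assuming the item has an image field named front_image_url (a hypothetical name here, as is the image URL). Scrapy's image pipeline iterates over the field's value to build requests, so a plain string would be walked character by character, and each fragment would fail with the same "Missing scheme" error. Wrapping the value in a list avoids that:

```python
def as_url_list(value):
    """Wrap a single URL string in a list; return lists and tuples as a list.

    Hypothetical helper for illustration only; not part of Scrapy.
    """
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    return list(value)


# A plain dict stands in for the scrapy Item to keep the sketch self-contained.
item = {}
item["front_image_url"] = as_url_list("http://blog.jobbole.com/cover.jpg")
print(item["front_image_url"])
```

In a real spider the same wrapping would be applied where the item field is filled, before the item reaches the image pipeline.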
