scrapyd deployment of a distributed spider to a server errors with NotADirectoryError; running without scrapyd works fine
Source: 16-1 Deploying a Scrapy project with scrapyd
defau
2017-10-22
I packaged my local code and deployed it to scrapyd. This is a scrapy-redis distributed spider; running it on the same server with scrapy runspider works without any problem (so the environment should be fine), but under scrapyd it errors out within a few seconds of every run. Neither Baidu nor Google turned up a solution for this error.
I found the following possibly useful note:
In fact, if settings for something like distributed or large-scale crawling are added to settings.py but the configuration is left incomplete, with follow-up steps missing, errors like this occur (the scrapy-redis settings involved here are sketched below).
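For orientation, these are the scrapy-redis settings visible in the "Overridden settings" line of the log that follows; a minimal sketch, where the REDIS_URL value is my assumption since the actual value is not logged:

# settings.py -- scrapy-redis settings matching the log's "Overridden settings"
SCHEDULER = 'scrapy_redis.scheduler.Scheduler'
DUPEFILTER_CLASS = 'scrapy_redis.dupefilter.RFPDupeFilter'
REDIS_URL = 'redis://127.0.0.1:6379'  # hypothetical; the real value is not shown in the log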
The spider's run log records the following:
2017-10-21 23:44:33 [scrapy.utils.log] INFO: Scrapy 1.4.0 started (bot: Blog)
2017-10-21 23:44:33 [scrapy.utils.log] INFO: Overridden settings: {'BOT_NAME': 'Blog', 'DOWNLOAD_DELAY': 0.5, 'DUPEFILTER_CLASS': 'scrapy_redis.dupefilter.RFPDupeFilter', 'LOG_FILE': 'logs/Blog/csdn/b9b36972b67611e79c6100163e001058.log', 'NEWSPIDER_MODULE': 'Blog.spiders', 'SCHEDULER': 'scrapy_redis.scheduler.Scheduler', 'SPIDER_MODULES': ['Blog.spiders']}
2017-10-21 23:44:33 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats']
2017-10-21 23:44:33 [csdn] INFO: Reading start URLs from redis key 'csdn:start_urls' (batch size: 16, encoding: utf-8)
2017-10-21 23:44:33 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'Blog.middlewares.RandomUserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2017-10-21 23:44:33 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2017-10-21 23:44:33 [twisted] CRITICAL: Unhandled error in Deferred:
2017-10-21 23:44:33 [twisted] CRITICAL:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/Twisted-17.9.1.dev0-py3.6-linux-x86_64.egg/twisted/internet/defer.py", line 1386, in _inlineCallbacks
result = g.send(result)
File "/usr/local/lib/python3.6/site-packages/scrapy/crawler.py", line 77, in crawl
self.engine = self._create_engine()
File "/usr/local/lib/python3.6/site-packages/scrapy/crawler.py", line 102, in _create_engine
return ExecutionEngine(self, lambda _: self.stop())
File "/usr/local/lib/python3.6/site-packages/scrapy/core/engine.py", line 70, in __init__
self.scraper = Scraper(crawler)
File "/usr/local/lib/python3.6/site-packages/scrapy/core/scraper.py", line 71, in __init__
self.itemproc = itemproc_cls.from_crawler(crawler)
File "/usr/local/lib/python3.6/site-packages/scrapy/middleware.py", line 58, in from_crawler
return cls.from_settings(crawler.settings, crawler)
File "/usr/local/lib/python3.6/site-packages/scrapy/middleware.py", line 36, in from_settings
mw = mwcls.from_crawler(crawler)
File "/usr/local/lib/python3.6/site-packages/scrapy/pipelines/media.py", line 68, in from_crawler
pipe = cls.from_settings(crawler.settings)
File "/usr/local/lib/python3.6/site-packages/scrapy/pipelines/images.py", line 95, in from_settings
return cls(store_uri, settings=settings)
File "/usr/local/lib/python3.6/site-packages/scrapy/pipelines/images.py", line 52, in __init__
download_func=download_func)
File "/usr/local/lib/python3.6/site-packages/scrapy/pipelines/files.py", line 234, in __init__
self.store = self._get_store(store_uri)
File "/usr/local/lib/python3.6/site-packages/scrapy/pipelines/files.py", line 270, in _get_store
return store_cls(uri)
File "/usr/local/lib/python3.6/site-packages/scrapy/pipelines/files.py", line 48, in __init__
self._mkdir(self.basedir)
File "/usr/local/lib/python3.6/site-packages/scrapy/pipelines/files.py", line 77, in _mkdir
os.makedirs(dirname)
File "/usr/local/lib/python3.6/os.py", line 210, in makedirs
makedirs(head, mode, exist_ok)
File "/usr/local/lib/python3.6/os.py", line 220, in makedirs
mkdir(name, mode)
NotADirectoryError: [Errno 20] Not a directory: '/tmp/Blog-1508600650-duu8m0jk.egg/Blog'
The Blog at the end of that last line is my project's name (nothing in my code refers to such a folder).
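Reading the traceback: the crash happens while ImagesPipeline's filesystem store calls os.makedirs() on the IMAGES_STORE directory, and the path it tries to create lies inside the packed egg, which is a file, not a directory. A minimal sketch of the kind of settings.py pattern that would reproduce exactly this, assuming the project derives its storage path from __file__:

# settings.py -- hypothetical pattern that reproduces the error above
import os

# Under "scrapy runspider" __file__ sits in a real directory, so this works.
# Under scrapyd the project runs out of /tmp/Blog-xxxx.egg, which is a file,
# so any path derived from __file__ points "inside" the egg and
# os.makedirs() fails with NotADirectoryError.
BASE_DIR = os.path.dirname(os.path.abspath(__file__))
IMAGES_STORE = os.path.join(BASE_DIR, 'images')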
The scrapyd directory is /package/scrapyd.
Below is the scrapyd run log:
2017-10-22T15:30:38+0800 [-] Loading /usr/local/lib/python3.6/site-packages/scrapyd/txapp.py...
2017-10-22T15:30:40+0800 [-] Scrapyd web console available at http://127.0.0.1:6800/
2017-10-22T15:30:40+0800 [-] Loaded.
2017-10-22T15:30:40+0800 [twisted.scripts._twistd_unix.UnixAppLogger#info] twistd 17.9.1dev0 (/usr/local/bin/python3.6 3.6.2) starting up.
2017-10-22T15:30:40+0800 [twisted.scripts._twistd_unix.UnixAppLogger#info] reactor class: twisted.internet.epollreactor.EPollReactor.
2017-10-22T15:30:40+0800 [-] Site starting on 6800
2017-10-22T15:30:40+0800 [twisted.web.server.Site#info] Starting factory <twisted.web.server.Site object at 0x7fbb7cb72898>
2017-10-22T15:30:40+0800 [Launcher] Scrapyd 1.2.0 started: max_proc=4, runner='scrapyd.runner'
2017-10-22T15:37:09+0800 [twisted.python.log#info] "127.0.0.1" - - [22/Oct/2017:07:37:04 +0000] "POST /addversion.json HTTP/1.1" 200 108 "-" "Python-urllib/3.6"
2017-10-22T15:37:25+0800 [twisted.python.log#info] "127.0.0.1" - - [22/Oct/2017:07:37:21 +0000] "POST /schedule.json HTTP/1.1" 200 95 "-" "curl/7.47.0"
2017-10-22T15:37:25+0800 [-] Process started: project='Blog' spider='csdn' job='d8f46e28b6fb11e7ba4d525400db770c' pid=32734 log='logs/Blog/csdn/d8f46e28b6fb11e7ba4d525400db770c.log' items=None
2017-10-22T15:37:29+0800 [Launcher,32734/stderr] Unhandled error in Deferred:
2017-10-22T15:37:29+0800 [-] Process finished: project='Blog' spider='csdn' job='d8f46e28b6fb11e7ba4d525400db770c' pid=32734 log='logs/Blog/csdn/d8f46e28b6fb11e7ba4d525400db770c.log' items=None
^C2017-10-22T16:00:59+0800 [-] Received SIGINT, shutting down.
2017-10-22T16:00:59+0800 [-] (TCP Port 6800 Closed)
2017-10-22T16:00:59+0800 [twisted.web.server.Site#info] Stopping factory <twisted.web.server.Site object at 0x7fbb7cb72898>
2017-10-22T16:00:59+0800 [-] Main loop terminated.
2017-10-22T16:00:59+0800 [twisted.scripts._twistd_unix.UnixAppLogger#info] Server Shut Down.
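For context, the POST /schedule.json entry in the log above was issued with curl; the same request can be made from Python, a sketch assuming the default scrapyd endpoint on port 6800:

# Hypothetical reproduction of the POST /schedule.json request in the log
import urllib.parse
import urllib.request

data = urllib.parse.urlencode({'project': 'Blog', 'spider': 'csdn'}).encode()
with urllib.request.urlopen('http://127.0.0.1:6800/schedule.json', data=data) as resp:
    print(resp.read().decode())  # e.g. {"status": "ok", "jobid": "..."}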
2 Answers
-
Send me a QQ message and I'll take a look.
14 · 2018-07-16
-
慕粉3691223
2018-07-11
Most likely, when the project is deployed via scrapyd, BASE_DIR gets overridden, so many of the paths we assemble from it (the logs included) resolve to the wrong place and errors appear. Don't use BASE_DIR to assemble paths in the project; create a new LOG_DIR and use it in place of BASE_DIR when building the various paths (see the sketch after the link below).
For reference:
https://stackoverflow.com/questions/50453479/valueerror-while-deploying-scrapy
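Following that advice, a minimal sketch of the fix, where /data/Blog is a hypothetical writable absolute path on the server:

# settings.py -- build storage paths from a fixed absolute directory rather
# than from the project location, so they stay valid inside scrapyd's egg
import os

LOG_DIR = '/data/Blog'                         # hypothetical absolute server path
IMAGES_STORE = os.path.join(LOG_DIR, 'images')
os.makedirs(IMAGES_STORE, exist_ok=True)       # safe if the directory already exists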
23 · 2019-02-11