Error when packaging with scrapyd-deploy
Source: 16-1 Deploying a Scrapy project with scrapyd
慕函数7358036
2020-04-20
scrapyd-deploy reports an error when packaging; the returned error message is as follows:
{"node_name": "localhost.localdomain", "status": "error", "message": "
/root/PycharmProjects/Graduation_design/venv/lib/python3.7/site-packages/scrapy/utils/project.py:94: ScrapyDeprecationWarning: Use of environment variables prefixed with SCRAPY_ to override settings is deprecated. The following environment variables are currently defined: EGG_VERSION
  ScrapyDeprecationWarning
Traceback (most recent call last):
  File "/usr/local/python3/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/python3/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/root/PycharmProjects/Graduation_design/venv/lib/python3.7/site-packages/scrapyd/runner.py", line 40, in <module>
    main()
  File "/root/PycharmProjects/Graduation_design/venv/lib/python3.7/site-packages/scrapyd/runner.py", line 37, in main
    execute()
  File "/root/PycharmProjects/Graduation_design/venv/lib/python3.7/site-packages/scrapy/cmdline.py", line 144, in execute
    cmd.crawler_process = CrawlerProcess(settings)
  File "/root/PycharmProjects/Graduation_design/venv/lib/python3.7/site-packages/scrapy/crawler.py", line 265, in __init__
    super(CrawlerProcess, self).__init__(settings)
  File "/root/PycharmProjects/Graduation_design/venv/lib/python3.7/site-packages/scrapy/crawler.py", line 137, in __init__
    self.spider_loader = _get_spider_loader(settings)
  File "/root/PycharmProjects/Graduation_design/venv/lib/python3.7/site-packages/scrapy/crawler.py", line 345, in _get_spider_loader
    return loader_cls.from_settings(settings.frozencopy())
  File "/root/PycharmProjects/Graduation_design/venv/lib/python3.7/site-packages/scrapy/spiderloader.py", line 60, in from_settings
    return cls(settings)
  File "/root/PycharmProjects/Graduation_design/venv/lib/python3.7/site-packages/scrapy/spiderloader.py", line 24, in __init__
    self._load_all_spiders()
  File "/root/PycharmProjects/Graduation_design/venv/lib/python3.7/site-packages/scrapy/spiderloader.py", line 46, in _load_all_spiders
    for module in walk_modules(name):
  File "/root/PycharmProjects/Graduation_design/venv/lib/python3.7/site-packages/scrapy/utils/misc.py", line 77, in walk_modules
    submod = import_module(fullpath)
  File "/usr/local/python3/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "
But I have no idea which module is missing. I have also set up the environment variables in settings, and scrapy list runs without any problem.
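One way to narrow this down (a minimal sketch, not part of the course steps; the filename project.egg is arbitrary): scrapyd-deploy can build the egg locally instead of uploading it, and listing the egg's contents shows which project files were actually packaged.

# First build the egg locally instead of uploading it:
#   scrapyd-deploy --build-egg project.egg
import zipfile

with zipfile.ZipFile("project.egg") as egg:
    # These are the only files scrapyd sees at run time; if a module the
    # project imports is not listed here, the import fails on the server.
    for name in egg.namelist():
        print(name)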
The directory structure is as follows (screenshot):
The settings configuration is as follows:
# -*- coding: utf-8 -*-

# Scrapy settings for master_spider project
#
# For simplicity, this file contains only settings considered important or
# commonly used. You can find more settings consulting the documentation:

import sys
import os

BASE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
sys.path.insert(0, os.path.join(BASE_DIR, 'master_spider'))

BOT_NAME = 'master_spider'

LOG_LEVEL = "INFO"

SPIDER_MODULES = ['master_spider.spiders']
NEWSPIDER_MODULE = 'master_spider.spiders'

SCHEDULER = "scrapy_redis.scheduler.Scheduler"
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"

SPLASH_URL = 'http://192.168.126.128:8050'
HTTPCACHE_ENABLED = True
HTTPCACHE_STORAGE = 'scrapy_splash.SplashAwareFSCacheStorage'

# Crawl responsibly by identifying yourself (and your website) on the user-agent
USER_AGENT = 'master_spider (+http://www.yourdomain.com)'

# Obey robots.txt rules
ROBOTSTXT_OBEY = False

# Configure maximum concurrent requests performed by Scrapy (default: 16)
CONCURRENT_REQUESTS = 32

# Configure a delay for requests for the same website (default: 0)
# See also autothrottle settings and docs
DOWNLOAD_DELAY = 3
# The download delay setting will honor only one of:
CONCURRENT_REQUESTS_PER_DOMAIN = 16
CONCURRENT_REQUESTS_PER_IP = 16

# Disable cookies (enabled by default)
COOKIES_ENABLED = False

# Disable Telnet Console (enabled by default)
TELNETCONSOLE_ENABLED = False

# Override the default request headers:
DEFAULT_REQUEST_HEADERS = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en',
}

# Enable or disable spider middlewares
SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
    # 'master_spider.middlewares.MasterSpiderSpiderMiddleware': 543,
}

# Enable or disable downloader middlewares
DOWNLOADER_MIDDLEWARES = {
    # 'master_spider.middlewares.MasterSpiderDownloaderMiddleware': 543,
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
    'master_spider.middlewares.RandomUserAgentMiddleware': 410,
    'master_spider.middlewares.RandomDelayMiddleware': 999,
}

# Enable or disable extensions
EXTENSIONS = {
    'scrapy.extensions.telnet.TelnetConsole': None,
}

# Configure item pipelines
ITEM_PIPELINES = {
    # 'master_spider.pipelines.MasterSpiderPipeline': 300,
    # 'scrapy_redis.pipelines.RedisPipeline': 300,
    'master_spider.pipelines.SaveToES': 310
}

# Enable and configure the AutoThrottle extension (disabled by default)
AUTOTHROTTLE_ENABLED = True
# The initial download delay
AUTOTHROTTLE_START_DELAY = 5
# The maximum download delay to be set in case of high latencies
AUTOTHROTTLE_MAX_DELAY = 60
# The average number of requests Scrapy should be sending in parallel to
# each remote server
AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0
# Enable showing throttling stats for every response received:
AUTOTHROTTLE_DEBUG = False

# Enable and configure HTTP caching (disabled by default)
# See https://docs.scrapy.org/en/latest/topics/downloader-middleware.html#httpcache-middleware-settings
HTTPCACHE_ENABLED = True
HTTPCACHE_EXPIRATION_SECS = 0
HTTPCACHE_DIR = 'httpcache'
HTTPCACHE_IGNORE_HTTP_CODES = []
HTTPCACHE_STORAGE = 'scrapy.extensions.httpcache.FilesystemCacheStorage'

REDIS_HOST = 'localhost'
REDIS_PORT = 6379

DATETIME_FORMAT = "%Y-%m-%d %H:%M:%S"

RANDOM_DELAY = 2
It probably isn't because there are several spider files in the spiders folder either, since deleting them didn't help.
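For reference, the traceback above dies inside walk_modules() while importing SPIDER_MODULES, so the same import failure can usually be reproduced locally without scrapyd. A minimal sketch, assuming it is run from the directory containing scrapy.cfg and that the settings module is master_spider.settings as shown above:

import os
os.environ.setdefault("SCRAPY_SETTINGS_MODULE", "master_spider.settings")

from scrapy.utils.misc import walk_modules
from scrapy.utils.project import get_project_settings

settings = get_project_settings()
for name in settings.getlist("SPIDER_MODULES"):
    # Imports every module under master_spider.spiders, the same call
    # scrapy.spiderloader makes; the first broken import raises here
    # and names the missing module in full.
    for module in walk_modules(name):
        print(module.__name__)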
1 Answer
bobby
2020-04-21
Could you format the code? It is very messy like this and the key information can't be seen.