Error when packaging with scrapyd-deploy

Source: 16-1 Deploying a Scrapy project with scrapyd

慕函数7358036

2020-04-20

I get an error when packaging with scrapyd-deploy; the error message returned is as follows:

{"node_name": "localhost.localdomain", "status": "error", "message": "/root/PycharmProjects/Graduation_design/venv/lib/python3.7/site-packages/scrapy/utils/project.py:94: ScrapyDeprecationWarning: Use of environment variables prefixed with SCRAPY_ to override settings is deprecated. The following environment variables are currently defined: EGG_VERSION
  ScrapyDeprecationWarning
Traceback (most recent call last):
  File "/usr/local/python3/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/python3/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/root/PycharmProjects/Graduation_design/venv/lib/python3.7/site-packages/scrapyd/runner.py", line 40, in <module>
    main()
  File "/root/PycharmProjects/Graduation_design/venv/lib/python3.7/site-packages/scrapyd/runner.py", line 37, in main
    execute()
  File "/root/PycharmProjects/Graduation_design/venv/lib/python3.7/site-packages/scrapy/cmdline.py", line 144, in execute
    cmd.crawler_process = CrawlerProcess(settings)
  File "/root/PycharmProjects/Graduation_design/venv/lib/python3.7/site-packages/scrapy/crawler.py", line 265, in __init__
    super(CrawlerProcess, self).__init__(settings)
  File "/root/PycharmProjects/Graduation_design/venv/lib/python3.7/site-packages/scrapy/crawler.py", line 137, in __init__
    self.spider_loader = _get_spider_loader(settings)
  File "/root/PycharmProjects/Graduation_design/venv/lib/python3.7/site-packages/scrapy/crawler.py", line 345, in _get_spider_loader
    return loader_cls.from_settings(settings.frozencopy())
  File "/root/PycharmProjects/Graduation_design/venv/lib/python3.7/site-packages/scrapy/spiderloader.py", line 60, in from_settings
    return cls(settings)
  File "/root/PycharmProjects/Graduation_design/venv/lib/python3.7/site-packages/scrapy/spiderloader.py", line 24, in __init__
    self._load_all_spiders()
  File "/root/PycharmProjects/Graduation_design/venv/lib/python3.7/site-packages/scrapy/spiderloader.py", line 46, in _load_all_spiders
    for module in walk_modules(name):
  File "/root/PycharmProjects/Graduation_design/venv/lib/python3.7/site-packages/scrapy/utils/misc.py", line 77, in walk_modules
    submod = import_module(fullpath)
  File "/usr/local/python3/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "

But I have no idea which module is missing. I have also set up the environment variables in settings, and scrapy list runs without any problem.
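The ScrapyDeprecationWarning at the top is only a warning; the real failure is the import error below it, and the message is cut off right at the frame where import_module() is called, before the name of the missing module appears. What the visible frames do show is that it dies while the spider loader walks SPIDER_MODULES. A minimal sketch to reproduce that exact step outside scrapyd, assuming it is run from the project root with the same virtualenv (master_spider.spiders is taken from the settings below):

# Reproduce the step that fails in the traceback: the spider loader calls
# walk_modules() on every entry in SPIDER_MODULES, and the ImportError
# surfaces from whichever submodule fails to import.
from scrapy.utils.misc import walk_modules

for module in walk_modules('master_spider.spiders'):
    print(module.__name__)

If this lists every spider module locally but the deploy still fails, the problem is in what ends up inside the egg rather than in the code itself.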

The directory structure is as follows:
[screenshot of the project directory]

The settings configuration is as follows:

# -*- coding: utf-8 -*-

# Scrapy settings for master_spider project
#
# For simplicity, this file contains only settings considered important or
# commonly used. You can find more settings consulting the documentation:

import sys
import os

BASE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
sys.path.insert(0, os.path.join(BASE_DIR, 'master_spider'))

BOT_NAME = 'master_spider'
LOG_LEVEL = "INFO"

SPIDER_MODULES = ['master_spider.spiders']
NEWSPIDER_MODULE = 'master_spider.spiders'

SCHEDULER = "scrapy_redis.scheduler.Scheduler"
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"

SPLASH_URL = 'http://192.168.126.128:8050'

HTTPCACHE_ENABLED = True
HTTPCACHE_STORAGE = 'scrapy_splash.SplashAwareFSCacheStorage'

# Crawl responsibly by identifying yourself (and your website) on the user-agent
#USER_AGENT = 'master_spider (+http://www.yourdomain.com)'

# Obey robots.txt rules
ROBOTSTXT_OBEY = False

# Configure maximum concurrent requests performed by Scrapy (default: 16)
CONCURRENT_REQUESTS = 32

# Configure a delay for requests for the same website (default: 0)
# See also autothrottle settings and docs
DOWNLOAD_DELAY = 3

# The download delay setting will honor only one of:
#CONCURRENT_REQUESTS_PER_DOMAIN = 16
#CONCURRENT_REQUESTS_PER_IP = 16

# Disable cookies (enabled by default)
COOKIES_ENABLED = False

# Disable Telnet Console (enabled by default)
TELNETCONSOLE_ENABLED = False

# Override the default request headers:
#DEFAULT_REQUEST_HEADERS = {
#    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
#    'Accept-Language': 'en',
#}

# Enable or disable spider middlewares
SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
    # 'master_spider.middlewares.MasterSpiderSpiderMiddleware': 543,
}

# Enable or disable downloader middlewares
DOWNLOADER_MIDDLEWARES = {
    # 'master_spider.middlewares.MasterSpiderDownloaderMiddleware': 543,
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
    'master_spider.middlewares.RandomUserAgentMiddleware': 410,
    'master_spider.middlewares.RandomDelayMiddleware': 999,
}

# Enable or disable extensions
#EXTENSIONS = {
#    'scrapy.extensions.telnet.TelnetConsole': None,
#}

# Configure item pipelines
ITEM_PIPELINES = {
    # 'master_spider.pipelines.MasterSpiderPipeline': 300,
    # 'scrapy_redis.pipelines.RedisPipeline': 300,
    'master_spider.pipelines.SaveToES': 310
}

# Enable and configure the AutoThrottle extension (disabled by default)
AUTOTHROTTLE_ENABLED = True
# The initial download delay
AUTOTHROTTLE_START_DELAY = 5
# The maximum download delay to be set in case of high latencies
AUTOTHROTTLE_MAX_DELAY = 60
# The average number of requests Scrapy should be sending in parallel to
# each remote server
AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0
# Enable showing throttling stats for every response received:
AUTOTHROTTLE_DEBUG = False

# Enable and configure HTTP caching (disabled by default)
#HTTPCACHE_ENABLED = True
#HTTPCACHE_EXPIRATION_SECS = 0
#HTTPCACHE_DIR = 'httpcache'
#HTTPCACHE_IGNORE_HTTP_CODES = []
#HTTPCACHE_STORAGE = 'scrapy.extensions.httpcache.FilesystemCacheStorage'

REDIS_HOST = 'localhost'
REDIS_PORT = 6379

DATETIME_FORMAT = "%Y-%m-%d %H:%M:%S"

RANDOM_DELAY = 2
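One caveat about the first two lines: BASE_DIR is derived from __file__, and on the scrapyd side the project runs out of the packaged egg, so that path will not point at the development directory tree even though scrapy list works locally. A hedged way to check what scrapyd actually receives (the egg filename here is arbitrary) is to build the egg locally and list its contents:

# Build the egg locally without deploying:
#     scrapyd-deploy --build-egg=master_spider.egg
# An egg is just a zip archive, so its contents are easy to inspect:
import zipfile

with zipfile.ZipFile('master_spider.egg') as egg:
    for name in sorted(egg.namelist()):
        print(name)  # a spider or middleware module missing here would explain the import error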

It probably isn't the fact that there are several spider files in the spiders folder either, because deleting them didn't help.
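For completeness, scrapyd-deploy reads its deploy target from scrapy.cfg in the project root. A minimal sketch of that file (the target name vm and the URL are illustrative; scrapyd listens on port 6800 by default):

[settings]
default = master_spider.settings

[deploy:vm]
url = http://192.168.126.128:6800/
project = master_spider

With that in place, the deploy command is: scrapyd-deploy vm -p master_spider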


1 Answer

bobby

2020-04-21

Could you format the code? It is very messy like this and the key information is hard to pick out.

bobby replied to 慕函数7358036 on 2020-04-22: OK.
