Redis problems
Source: 6-5 Index Optimization Strategies (Part 2)

h5
2021-05-27
Caching/queue problems in a scrapy-redis distributed crawler
Hello, teacher. While working on my crawler I have run into two problems with scrapy-redis.
- If I kill the service partway through a crawl and then restart it, the requests popped back off the Redis queue no longer carry the callback I assigned. The callbacks then error out, and debugging shows that the callback being invoked is not the one I specified (see the sketch after this list).
- The queue/connection problem: not long after the crawl starts, Redis begins refusing connections. After a restart I can connect again, but only for a short while before it happens again. I have searched for a long time and found no solution to either problem, which is very frustrating.
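For reference, here is my current understanding of how a request round-trips through the Redis queue, as a minimal sketch (assuming Scrapy 2.5, where scrapy.utils.reqser is still available and is what scrapy-redis's queue serializes with before pickling; DemoSpider, parse_detail and the URL are made-up names): the callback is stored by method name and looked up on the spider again when the request is popped, so it has to be a method defined on the spider class.

from scrapy import Request, Spider
from scrapy.utils.reqser import request_to_dict, request_from_dict

class DemoSpider(Spider):
    name = 'demo'

    def parse_detail(self, response):
        pass

spider = DemoSpider()
req = Request('http://www.doc88.com/p-1.html', callback=spider.parse_detail)
d = request_to_dict(req, spider)          # d['callback'] == 'parse_detail' (stored by name)
restored = request_from_dict(d, spider)   # callback re-bound via lookup on the spider instance
print(d['callback'], restored.callback)

If the popped request really comes back with a different callback, I suspect something about how my spider builds the request or about the spider the request is restored against, but I have not been able to pin it down.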
Because your course videos are fairly old, I could no longer find the matching library versions when I watched them.
scrapy-redis
My configuration and code are below.
- settings file
# Scrapy settings for document_spider project
#
# For simplicity, this file contains only settings considered important or
# commonly used. You can find more settings consulting the documentation:
#
# https://docs.scrapy.org/en/latest/topics/settings.html
# https://docs.scrapy.org/en/latest/topics/downloader-middleware.html
# https://docs.scrapy.org/en/latest/topics/spider-middleware.html
import datetime
BOT_NAME = 'document_spider'
SPIDER_MODULES = ['document_spider.spiders']
NEWSPIDER_MODULE = 'document_spider.spiders'
SCHEDULER = "scrapy_redis.scheduler.Scheduler"
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"
SCHEDULER_QUEUE_CLASS = 'scrapy_redis.queue.PriorityQueue'
STATS_CLASS = "scrapy_redis.stats.RedisStatsCollector"
# Keep the queue persistent (do not clear it when the spider closes)
SCHEDULER_PERSIST = True
# Crawl responsibly by identifying yourself (and your website) on the user-agent
#USER_AGENT = 'document_spider (+http://www.yourdomain.com)'
# Obey robots.txt rules
ROBOTSTXT_OBEY = False
# Configure maximum concurrent requests performed by Scrapy (default: 16)
# CONCURRENT_REQUESTS = 50
# CONCURRENT_REQUESTS_PER_DOMAIN = 25
# Configure a delay for requests for the same website (default: 0)
# See https://docs.scrapy.org/en/latest/topics/settings.html#download-delay
# See also autothrottle settings and docs
# Interval between requests
DOWNLOAD_DELAY = 3
# The download delay setting will honor only one of:
#CONCURRENT_REQUESTS_PER_DOMAIN = 16
#CONCURRENT_REQUESTS_PER_IP = 16
# Disable cookies (enabled by default)
#COOKIES_ENABLED = False
# Disable Telnet Console (enabled by default)
#TELNETCONSOLE_ENABLED = False
# Override the default request headers:
# DEFAULT_REQUEST_HEADERS = {
# 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
# 'Accept-Language': 'en',
# 'Host': 'www.doc88.com',
# 'Accept-Encoding': 'gzip, deflate, br',
# 'Accept-Language': 'zh,zh-TW;q=0.9,en-US;q=0.8,en;q=0.7,zh-CN;q=0.6,eo;q=0.5',
# 'Sec-Fetch-Dest': 'document',
# 'Sec-Fetch-Mode': 'navigate',
# 'Sec-Fetch-Site': 'none',
# 'Sec-Fetch-User': '?1',
# 'Upgrade-Insecure-Request': '1',
# }
# Enable or disable spider middlewares
# See https://docs.scrapy.org/en/latest/topics/spider-middleware.html
#SPIDER_MIDDLEWARES = {
# 'document_spider.middlewares.DocumentSpiderSpiderMiddleware': 543,
#}
# Enable or disable downloader middlewares
# See https://docs.scrapy.org/en/latest/topics/downloader-middleware.html
# Use a random User-Agent
RANDOM_UA_TYPE = 'random'
DOWNLOADER_MIDDLEWARES = {
    'document_spider.middlewares.RandomProxyMiddleware': 200,
    'document_spider.middlewares.RandomUserAgentMiddleware': 543,
    # 'document_spider.middlewares.RedisRetryMiddleware': 545,
    'document_spider.middlewares.DocumentSpiderDownloaderMiddleware': None,
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
    # 'scrapy.downloadermiddlewares.retry.RetryMiddleware': None,
}
# Enable or disable extensions
# See https://docs.scrapy.org/en/latest/topics/extensions.html
#EXTENSIONS = {
# 'scrapy.extensions.telnet.TelnetConsole': None,
#}
# Configure item pipelines
# See https://docs.scrapy.org/en/latest/topics/item-pipeline.html
ITEM_PIPELINES = {
    # 'document_spider.pipelines.DocumentSpiderPipeline': 300,
    # 'document_spider.pipelines.MysqlPipeline': 300,
    # 'scrapy_redis.pipelines.RedisPipeline': 300,
    'document_spider.pipelines.MysqlTwistedPipeline': 300,
}
RETRY_TIMES = 5
RETRY_HTTP_CODES = [404, 400, 301]
# Enable and configure the AutoThrottle extension (disabled by default)
# See https://docs.scrapy.org/en/latest/topics/autothrottle.html
AUTOTHROTTLE_ENABLED = True
# The initial download delay
#AUTOTHROTTLE_START_DELAY = 5
# The maximum download delay to be set in case of high latencies
# AUTOTHROTTLE_MAX_DELAY = 60
# The average number of requests Scrapy should be sending in parallel to
# each remote server
#AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0
# Enable showing throttling stats for every response received:
AUTOTHROTTLE_DEBUG = False
# Enable and configure HTTP caching (disabled by default)
# See https://docs.scrapy.org/en/latest/topics/downloader-middleware.html#httpcache-middleware-settings
#HTTPCACHE_ENABLED = True
#HTTPCACHE_EXPIRATION_SECS = 0
#HTTPCACHE_DIR = 'httpcache'
#HTTPCACHE_IGNORE_HTTP_CODES = []
#HTTPCACHE_STORAGE = 'scrapy.extensions.httpcache.FilesystemCacheStorage'
# Mysql
MYSQL_HOST = '47.95.13.216'
MYSQL_PORT = 33066
MYSQL_DBNAME = 'doc_spider'
MYSQL_USER = 'maming'
MYSQL_PASSWORD = 'FTp1X#ZxFePLJ^b2'
# REDIS_HOST = '8.140.182.77'
# REDIS_PORT = 6378
REDIS_HOST = 'localhost'
REDIS_PORT = 6379
# LOG_LEVEL = 'INFO'
# to_day = datetime.datetime.today()
# log_file_path = 'logs/scrapy_{}_{}_{}.log'.format(to_day.year, to_day.month, to_day.day)
# LOG_FILE = log_file_path
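For the connection drops I am also experimenting with tightening the Redis client options. This is only a sketch, assuming the installed scrapy-redis release reads a REDIS_PARAMS setting and passes it through to redis-py (health_check_interval needs redis-py >= 3.3):

# Passed straight to the redis-py client by scrapy-redis (if the version supports it)
REDIS_PARAMS = {
    'socket_timeout': 30,
    'socket_connect_timeout': 30,
    'retry_on_timeout': True,
    'health_check_interval': 30,
}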
- BloomFilter file
import mmh3
import redis
import math
import time
import settings
class BloomFilter:
    # 100 built-in random seeds
    SEEDS = [543, 460, 171, 876, 796, 607, 650, 81, 837, 545, 591, 946, 846, 521, 913, 636, 878, 735, 414, 372,
             344, 324, 223, 180, 327, 891, 798, 933, 493, 293, 836, 10, 6, 544, 924, 849, 438, 41, 862, 648, 338,
             465, 562, 693, 979, 52, 763, 103, 387, 374, 349, 94, 384, 680, 574, 480, 307, 580, 71, 535, 300, 53,
             481, 519, 644, 219, 686, 236, 424, 326, 244, 212, 909, 202, 951, 56, 812, 901, 926, 250, 507, 739, 371,
             63, 584, 154, 7, 284, 617, 332, 472, 140, 605, 262, 355, 526, 647, 923, 199, 518]

    # capacity: estimated number of items to deduplicate
    # error_rate: acceptable false-positive rate
    # conn: the redis client connection
    # key: prefix for the key names used in redis
    def __init__(self, capacity=1000000000, error_rate=0.00000001, conn=None, key='BloomFilter'):
        self.m = math.ceil(capacity*math.log2(math.e)*math.log2(1/error_rate))  # total number of bits needed
        self.k = math.ceil(math.log(2)*self.m/capacity)  # minimum number of hash functions (k = m/n * ln 2)
        self.mem = math.ceil(self.m/8/1024/1024)  # memory needed, in MB
        self.blocknum = math.ceil(self.mem/512)  # number of 512 MB blocks; the first character of the value must be ASCII, so at most 256 blocks
        self.seeds = self.SEEDS[0:self.k]
        self.key = key
        self.N = 2**31-1
        self.redis = conn
    def add(self, value):
        name = self.key + "_" + str(ord(value[0]) % self.blocknum)
        hashs = self.get_hashs(value)
        for hash in hashs:
            self.redis.setbit(name, hash, 1)

    def is_exist(self, value):
        name = self.key + "_" + str(ord(value[0]) % self.blocknum)
        hashs = self.get_hashs(value)
        exist = True
        for hash in hashs:
            exist = exist & self.redis.getbit(name, hash)
        return exist

    def get_hashs(self, value):
        hashs = list()
        for seed in self.seeds:
            hash = mmh3.hash(value, seed)
            if hash >= 0:
                hashs.append(hash)
            else:
                hashs.append(self.N - hash)
        return hashs
# quick module-level self-test (runs on import)
pool = redis.ConnectionPool(host=settings.REDIS_HOST,
                            port=settings.REDIS_PORT)
conn = redis.StrictRedis(connection_pool=pool)
start = time.time()
bf = BloomFilter(conn=conn)
bf.add('www.jobbole.com')
bf.add('www.zhihu.com')
print(bf.is_exist('www.zhihu.com'))
print(bf.is_exist('www.lagou.com'))
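A quick back-of-the-envelope check of what the default arguments imply, using the same formulas as __init__ (approximate numbers only):

import math

capacity, error_rate = 1_000_000_000, 0.00000001
m = math.ceil(capacity * math.log2(math.e) * math.log2(1 / error_rate))  # ~3.8e10 bits
k = math.ceil(math.log(2) * m / capacity)                                # ~27 hash functions
mem = math.ceil(m / 8 / 1024 / 1024)                                     # ~4600 MB of bitmaps
print(m, k, mem)

So with the defaults the filter alone can grow to a few GB of bitmaps in Redis, which I try to keep in mind whenever the Redis instance starts misbehaving.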
- ProxyPool
import math
import time
import uuid
import redis
import requests
import settings

class Proxy:
    PROXY_LOCK_NAME = 'lock_ips:set'

    def __init__(self):
        self.redis = redis.Redis(host=settings.REDIS_HOST,
                                 port=settings.REDIS_PORT)
    def get_lock(self, PROXY_LOCK_NAME=None):
        identifier = str(uuid.uuid4())
        lock_timeout = int(math.ceil(3))
        acquire_timeout = 3
        end = time.time() + acquire_timeout
        while time.time() < end:
            # if the lock does not exist, acquire it and set an expiry to avoid deadlock
            if self.redis.set(PROXY_LOCK_NAME, identifier, ex=lock_timeout, nx=True):
                return identifier
        return False
    def get_proxy_ip(self):
        if not self.redis.exists('ips:set'):
            self.fetch_proxy_ip()
            time.sleep(2)
        proxy = self.redis.srandmember('ips:set', 1)[0]
        proxy = str(proxy, encoding='utf-8')
        return proxy
    def set_ip_store(self, proxies):
        flag = self.get_lock(PROXY_LOCK_NAME=Proxy.PROXY_LOCK_NAME)
        if flag is False:
            return  # lock not acquired; let another worker refresh the pool
        for proxy in proxies:
            self.redis.sadd('ips:set', proxy)
        self.redis.expire('ips:set', 55)
        self.redis.delete(Proxy.PROXY_LOCK_NAME)
    def fetch_proxy_ip(self):
        proxies = list()
        # resp = ...  # call the proxy provider API and fill `proxies` (about 800 IPs per batch)
        self.set_ip_store(proxies)
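While pasting this I noticed that set_ip_store deletes the lock unconditionally, even though by then it might belong to another worker. A sketch of a safer release that only deletes the lock when the identifier still matches (the usual Lua check-and-delete pattern; release_lock is my own helper name, not from the course code):

RELEASE_LOCK_LUA = """
if redis.call('get', KEYS[1]) == ARGV[1] then
    return redis.call('del', KEYS[1])
else
    return 0
end
"""

def release_lock(conn, lock_name, identifier):
    # register_script compiles the script once; the compare and delete run atomically on the server
    release = conn.register_script(RELEASE_LOCK_LUA)
    return release(keys=[lock_name], args=[identifier])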
Full error output for the redis connection:
Connected to pydev debugger (build 211.7142.13)
/Users/martin/space/python/document_spider/document_spider/items.py:11: ScrapyDeprecationWarning: scrapy.loader.processors.TakeFirst is deprecated, instantiate itemloaders.processors.TakeFirst instead.
default_output_processor = TakeFirst()
/Users/martin/space/python/document_spider/document_spider/items.py:35: ScrapyDeprecationWarning: scrapy.loader.processors.MapCompose is deprecated, instantiate itemloaders.processors.MapCompose instead.
input_processor=MapCompose(data_strip)
2021-05-25 20:04:30 [scrapy.utils.log] INFO: Scrapy 2.5.0 started (bot: document_spider)
2021-05-25 20:04:30 [scrapy.utils.log] INFO: Versions: lxml 4.6.3.0, libxml2 2.9.10, cssselect 1.1.0, parsel 1.6.0, w3lib 1.22.0, Twisted 21.2.0, Python 3.9.2 (v3.9.2:1a79785e3e, Feb 19 2021, 09:06:10) - [Clang 6.0 (clang-600.0.57)], pyOpenSSL 20.0.1 (OpenSSL 1.1.1k 25 Mar 2021), cryptography 3.4.7, Platform macOS-10.16-x86_64-i386-64bit
2021-05-25 20:04:30 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.selectreactor.SelectReactor
2021-05-25 20:04:30 [scrapy.crawler] INFO: Overridden settings:
{'AUTOTHROTTLE_ENABLED': True,
'BOT_NAME': 'document_spider',
'DOWNLOAD_DELAY': 3,
'DUPEFILTER_CLASS': 'scrapy_redis.dupefilter.RFPDupeFilter',
'NEWSPIDER_MODULE': 'document_spider.spiders',
'RETRY_HTTP_CODES': [404, 400, 301],
'RETRY_TIMES': 5,
'SCHEDULER': 'scrapy_redis.scheduler.Scheduler',
'SPIDER_MODULES': ['document_spider.spiders'],
'STATS_CLASS': 'scrapy_redis.stats.RedisStatsCollector'}
2021-05-25 20:04:30 [scrapy.extensions.telnet] INFO: Telnet Password: 8ab6397159859bf4
2021-05-25 20:04:30 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats',
'scrapy.extensions.throttle.AutoThrottle']
2021-05-25 20:04:30 [doc88] INFO: Reading start URLs from redis key 'doc88:start_urls' (batch size: 16, encoding: utf-8
2021-05-25 20:04:30 [scrapy.middleware] INFO: Enabled downloader middlewares:
['document_spider.middlewares.RandomProxyMiddleware',
'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'document_spider.middlewares.RandomUserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2021-05-25 20:04:30 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2021-05-25 20:04:30 [scrapy.middleware] INFO: Enabled item pipelines:
['document_spider.pipelines.MysqlTwistedPipeline']
2021-05-25 20:04:30 [scrapy.core.engine] INFO: Spider opened
1
0
2021-05-25 20:04:31 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2021-05-25 20:04:31 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2021-05-25 20:04:31 [py.warnings] WARNING: /Users/martin/envs/python/crawler/lib/python3.9/site-packages/scrapy/spiders/__init__.py:81: UserWarning: Spider.make_requests_from_url method is deprecated: it will be removed and not be called by the default Spider.start_requests method in future Scrapy releases. Please override Spider.start_requests method instead.
warnings.warn(
2021-05-25 20:04:31 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): 127.0.0.1:8889
2021-05-25 20:04:31 [urllib3.connectionpool] DEBUG: http://127.0.0.1:8889 "GET http://tunnel-api.apeyun.com/h?id=2010210006785134619&secret=tL0LJKX8c8nMUmlu&limit=1000&format=json&auth_mode=basic HTTP/1.1" 200 None
2021-05-25 20:04:35 [doc88] DEBUG: Read 1 requests from 'doc88:start_urls'
2021-05-25 20:04:35 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://www.doc88.com/list.html> (referer: None)
2021-05-25 20:04:42 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://www.doc88.com/list-593-1.html> (referer: http://www.doc88.com/list.html)
2021-05-25 20:04:44 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://www.doc88.com/list-442-1.html> (referer: http://www.doc88.com/list.html)
2021-05-25 20:04:49 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://www.doc88.com/list-441-1.html> (referer: http://www.doc88.com/list.html)
2021-05-25 20:04:52 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://www.doc88.com/list-440-1.html> (referer: http://www.doc88.com/list.html)
2021-05-25 20:04:57 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://www.doc88.com/list-443-1.html> (referer: http://www.doc88.com/list.html)
2021-05-25 20:05:00 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://www.doc88.com/list-444-1.html> (referer: http://www.doc88.com/list.html)
2021-05-25 20:05:05 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://www.doc88.com/list-445-1.html> (referer: http://www.doc88.com/list.html)
2021-05-25 20:05:09 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://www.doc88.com/list-446-1.html> (referer: http://www.doc88.com/list.html)
2021-05-25 20:05:13 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://www.doc88.com/list-447-1.html> (referer: http://www.doc88.com/list.html)
2021-05-25 20:05:17 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://www.doc88.com/list-448-1.html> (referer: http://www.doc88.com/list.html)
2021-05-25 20:05:21 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://www.doc88.com/list-449-1.html> (referer: http://www.doc88.com/list.html)
2021-05-25 20:05:24 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://www.doc88.com/list-574-1.html> (referer: http://www.doc88.com/list.html)
2021-05-25 20:05:27 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://www.doc88.com/list-687-1.html> (referer: http://www.doc88.com/list.html)
2021-05-25 20:05:31 [scrapy.extensions.logstats] INFO: Crawled 14 pages (at 14 pages/min), scraped 0 items (at 0 items/min)
2021-05-25 20:05:35 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://www.doc88.com/list-702-1.html> (referer: http://www.doc88.com/list.html)
2021-05-25 20:05:35 [urllib3.connectionpool] DEBUG: Starting new HTTP connection (1): 127.0.0.1:8889
2021-05-25 20:05:36 [urllib3.connectionpool] DEBUG: http://127.0.0.1:8889 "GET http://tunnel-api.apeyun.com/h?id=2010210006785134619&secret=tL0LJKX8c8nMUmlu&limit=1000&format=json&auth_mode=basic HTTP/1.1" 200 None
2021-05-25 20:05:40 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://www.doc88.com/list-730-1.html> (referer: http://www.doc88.com/list.html)
2021-05-25 20:05:43 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://www.doc88.com/list-450-1.html> (referer: http://www.doc88.com/list.html)
2021-05-25 20:05:47 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://www.doc88.com/list-594-1.html> (referer: http://www.doc88.com/list-593-1.html)
2021-05-25 20:05:48 [doc88] INFO: 行业列表页面上一个请求: http://www.doc88.com/list-594-1.html
2021-05-25 20:05:48 [doc88] INFO: 解析的request url: /p-07016065979874.html
2021-05-25 20:05:48 [doc88] INFO: 行业列表页面上一个请求: http://www.doc88.com/list-594-1.html
2021-05-25 20:05:48 [doc88] INFO: 解析的request url: /p-05629292755318.html
2021-05-25 20:05:48 [doc88] INFO: 行业列表页面上一个请求: http://www.doc88.com/list-594-1.html
2021-05-25 20:05:48 [doc88] INFO: 解析的request url: /p-38773171411803.html
Unhandled error in Deferred:
Temporarily disabling observer LegacyLogObserverWrapper(<bound method PythonLoggingObserver.emit of <twisted.python.log.PythonLoggingObserver object at 0x7fc067baee80>>) due to exception: [Failure instance: Traceback: <class 'redis.exceptions.ConnectionError'>: Connection closed by server.
/Users/martin/envs/python/crawler/lib/python3.9/site-packages/twisted/internet/defer.py:580:_startRunCallbacks
/Users/martin/envs/python/crawler/lib/python3.9/site-packages/twisted/internet/defer.py:989:__del__
/Users/martin/envs/python/crawler/lib/python3.9/site-packages/twisted/logger/_logger.py:266:critical
/Users/martin/envs/python/crawler/lib/python3.9/site-packages/twisted/logger/_logger.py:143:emit
--- <exception caught here> ---
/Users/martin/envs/python/crawler/lib/python3.9/site-packages/twisted/logger/_observer.py:82:__call__
/Users/martin/envs/python/crawler/lib/python3.9/site-packages/twisted/logger/_legacy.py:90:__call__
/Users/martin/envs/python/crawler/lib/python3.9/site-packages/twisted/python/log.py:584:emit
/Users/martin/envs/python/crawler/lib/python3.9/site-packages/twisted/logger/_legacy.py:147:publishToNewObserver
/Users/martin/envs/python/crawler/lib/python3.9/site-packages/twisted/logger/_stdlib.py:114:__call__
/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/logging/__init__.py:1508:log
/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/logging/__init__.py:1585:_log
/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/logging/__init__.py:1595:handle
/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/logging/__init__.py:1657:callHandlers
/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/logging/__init__.py:948:handle
/Users/martin/envs/python/crawler/lib/python3.9/site-packages/scrapy/utils/log.py:194:emit
/Users/martin/space/python/document_spider/scrapy_redis/stats.py:60:inc_value
/Users/martin/envs/python/crawler/lib/python3.9/site-packages/redis/client.py:3006:hexists
/Users/martin/envs/python/crawler/lib/python3.9/site-packages/redis/client.py:901:execute_command
/Users/martin/envs/python/crawler/lib/python3.9/site-packages/redis/client.py:915:parse_response
/Users/martin/envs/python/crawler/lib/python3.9/site-packages/redis/connection.py:739:read_response
/Users/martin/envs/python/crawler/lib/python3.9/site-packages/redis/connection.py:324:read_response
/Users/martin/envs/python/crawler/lib/python3.9/site-packages/redis/connection.py:256:readline
/Users/martin/envs/python/crawler/lib/python3.9/site-packages/redis/connection.py:201:_read_from_socket
]
Traceback (most recent call last):
File "/Users/martin/envs/python/crawler/lib/python3.9/site-packages/twisted/internet/defer.py", line 580, in _startRunCallbacks
self._runCallbacks()
File "/Users/martin/envs/python/crawler/lib/python3.9/site-packages/twisted/internet/defer.py", line 989, in __del__
log.critical("Unhandled error in Deferred:", isError=True)
File "/Users/martin/envs/python/crawler/lib/python3.9/site-packages/twisted/logger/_logger.py", line 266, in critical
self.emit(LogLevel.critical, format, **kwargs)
File "/Users/martin/envs/python/crawler/lib/python3.9/site-packages/twisted/logger/_logger.py", line 143, in emit
self.observer(event)
--- <exception caught here> ---
File "/Users/martin/envs/python/crawler/lib/python3.9/site-packages/twisted/logger/_observer.py", line 82, in __call__
observer(event)
File "/Users/martin/envs/python/crawler/lib/python3.9/site-packages/twisted/logger/_legacy.py", line 90, in __call__
self.legacyObserver(event)
File "/Users/martin/envs/python/crawler/lib/python3.9/site-packages/twisted/python/log.py", line 584, in emit
_publishNew(self._newObserver, eventDict, textFromEventDict)
File "/Users/martin/envs/python/crawler/lib/python3.9/site-packages/twisted/logger/_legacy.py", line 147, in publishToNewObserver
observer(eventDict)
File "/Users/martin/envs/python/crawler/lib/python3.9/site-packages/twisted/logger/_stdlib.py", line 114, in __call__
self.logger.log(stdlibLevel, StringifiableFromEvent(event), exc_info=excInfo)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/logging/__init__.py", line 1508, in log
self._log(level, msg, args, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/logging/__init__.py", line 1585, in _log
self.handle(record)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/logging/__init__.py", line 1595, in handle
self.callHandlers(record)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/logging/__init__.py", line 1657, in callHandlers
hdlr.handle(record)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/logging/__init__.py", line 948, in handle
self.emit(record)
File "/Users/martin/envs/python/crawler/lib/python3.9/site-packages/scrapy/utils/log.py", line 194, in emit
self.crawler.stats.inc_value(sname)
File "/Users/martin/space/python/document_spider/scrapy_redis/stats.py", line 60, in inc_value
if not self.server.hexists(self._get_key(spider), key):
File "/Users/martin/envs/python/crawler/lib/python3.9/site-packages/redis/client.py", line 3006, in hexists
return self.execute_command('HEXISTS', name, key)
File "/Users/martin/envs/python/crawler/lib/python3.9/site-packages/redis/client.py", line 901, in execute_command
return self.parse_response(conn, command_name, **options)
File "/Users/martin/envs/python/crawler/lib/python3.9/site-packages/redis/client.py", line 915, in parse_response
response = connection.read_response()
File "/Users/martin/envs/python/crawler/lib/python3.9/site-packages/redis/connection.py", line 739, in read_response
response = self._parser.read_response()
File "/Users/martin/envs/python/crawler/lib/python3.9/site-packages/redis/connection.py", line 324, in read_response
raw = self._buffer.readline()
File "/Users/martin/envs/python/crawler/lib/python3.9/site-packages/redis/connection.py", line 256, in readline
self._read_from_socket()
File "/Users/martin/envs/python/crawler/lib/python3.9/site-packages/redis/connection.py", line 201, in _read_from_socket
raise ConnectionError(SERVER_CLOSED_CONNECTION_ERROR)
redis.exceptions.ConnectionError: Connection closed by server.
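What I plan to check next on the server side, as a small diagnostic sketch with plain redis-py (my assumption is that "Connection closed by server" may come from the server-side timeout setting, maxclients, or memory pressure, so I want to rule those out first):

import redis

conn = redis.Redis(host='localhost', port=6379)
print(conn.config_get('timeout'))        # idle-connection timeout in seconds (0 = never close)
print(conn.config_get('maxclients'))
print(conn.info('clients'))              # connected_clients, blocked_clients, ...
print(conn.info('memory')['maxmemory'], conn.info('memory')['maxmemory_policy'])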
1 answer
- sqlercn
2021-07-09
Did you perhaps post this question to the wrong course?