No errors seem to be reported, so why are no images downloaded into the image directory?

Source: 4-16 Configuring image downloads in Scrapy

keannen

2020-02-02

G:\Evns\ArticleSpider\venv\Scripts\python.exe G:/Evns/ArticleSpider/main.py
2020-02-02 17:04:24 [scrapy.utils.log] INFO: Scrapy 1.8.0 started (bot: ArticleSpider)
2020-02-02 17:04:24 [scrapy.utils.log] INFO: Versions: lxml 4.4.2.0, libxml2 2.9.5, cssselect 1.1.0, parsel 1.5.2, w3lib 1.21.0, Twisted 19.10.0, Python 3.7.0 (v3.7.0:1bf9cc5093, Jun 27 2018, 04:59:51) [MSC v.1914 64 bit (AMD64)], pyOpenSSL 19.1.0 (OpenSSL 1.1.1d  10 Sep 2019), cryptography 2.8, Platform Windows-10-10.0.17134-SP0
2020-02-02 17:04:24 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'ArticleSpider', 'NEWSPIDER_MODULE': 'ArticleSpider.spiders', 'SPIDER_MODULES': ['ArticleSpider.spiders']}
2020-02-02 17:04:24 [scrapy.extensions.telnet] INFO: Telnet Password: 7d8596b87dbde55a
2020-02-02 17:04:24 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.logstats.LogStats']
2020-02-02 17:04:25 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2020-02-02 17:04:25 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2020-02-02 17:04:25 [scrapy.middleware] INFO: Enabled item pipelines:
['scrapy.pipelines.images.ImagesPipeline',
 'ArticleSpider.pipelines.ArticlespiderPipeline']
2020-02-02 17:04:25 [scrapy.core.engine] INFO: Spider opened
2020-02-02 17:04:25 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2020-02-02 17:04:25 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2020-02-02 17:04:26 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://news.cnblogs.com/> from <GET http://news.cnblogs.com/>
2020-02-02 17:04:28 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://news.cnblogs.com/> (referer: None)
2020-02-02 17:04:28 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://news.cnblogs.com/n/654241/> (referer: https://news.cnblogs.com/)
2020-02-02 17:04:28 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://news.cnblogs.com/NewsAjax/GetAjaxNewsInfo?contentId=654241> (referer: https://news.cnblogs.com/n/654241/)
2020-02-02 17:04:28 [scrapy.core.scraper] DEBUG: Scraped from <200 https://news.cnblogs.com/NewsAjax/GetAjaxNewsInfo?contentId=654241>
{'comment_nums': 0,
 'content': '<div id="news_content">\n'
            '            <div id="news_body">\n'
            '                <a href="/n/topic_1423.htm" title="韩国"><img '
            'src="https://img2018.cnblogs.com/news_topic/20190528183002826-593205698.png" '
            'class="topic_img" alt=""></a>\n'
            '<p>\u3000\u3000<strong>飞象网讯</strong>\xa0'
            '(一飞/文)根据韩国电信监管机构的最新数据,在推出新技术不到一年的时间里,韩国的 5G '
            '网络现在承载着该国所有无线网络流量的近四分之一。</p>\r\n'
            '<p>\u3000\u3000研究和咨询公司 Strategy Analytics 的分析师 Phil Kendall '
            '明确介绍了自 2019 年初韩国运营商推出 5G 技术以来,整个韩国的 5G 增长情况。</p>\r\n'
            '<p>\u3000\u300012 月数据显示韩国共有 467 万个 5G 连接,当月增加了 31.3 万。自 8 '
            '月以来,每月净增加量持续下降。11 月份平均 5G 使用量为 27.3GB 月,而 4G 为 9.7GB,3G 为 '
            '0.16GB,2G 为 4MB。5G 现在占移动连接流量的 21%。</p>\r\n'
            '<p>\u3000\u3000机构指出,该国 5G 网络大流量的部分原因是,韩国的 5G '
            '客户大多报名参加无限数据计划。目前,韩国超过三分之二的 5G 用户采用了无限制计划,远远超过 4G 和 3G 用户,并且韩国 5G '
            '用户在 2019 年第四季度平均每月消费 33GB。</p>\r\n'
            '<p>\u3000\u3000目前,韩国的三个无线网络运营商 SK Telecom,KT 和 LG Uplus 都提供 5G '
            '服务,尽管 SK Telecom 在 5G 市场份额方面占据了领先地位。</p>\r\n'
            '<p>\u3000\u3000韩国长期以来一直是一种技术试验场,这要归功于大量热衷于技术的公民以及政府对创新的重视。在 5G '
            '中,该国受益于广泛的用于回传的有线网络以及大量未使用的灵活中频带,这对于 5G '
            '覆盖范围和容量都是理想的选择。此外,该国的面积约为美国的1%,与美国田纳西州的面积大致相同,这使韩国运营商更容易用 5G '
            '信号覆盖广大人口。</p>\r\n'
            '<p>\u3000\u3000话虽如此,但鉴于 Verizon 和 AT&T等美国运营商并未披露此类统计数据,因此很难评估美国在 '
            '5G 客户和网络流量方面如何与韩国抗衡。</p>\r\n'
            '<p>\u3000\u3000尽管韩国在 5G 方面早已领先,但中国对 5G 的推动无疑将使韩国黯然失色。尽管 5G '
            '在中国还只有几个月的时间,但截至 2019 年底,中国三大移动网络运营商已在 50 个城市提供 5G 服务,部署了 13 '
            '万个基站,约有 1000 万用户签约,出货了 1380 万部手机。</p>            </div><!--end: '
            'news_body -->\n'
            '            <div id="news_otherinfo">\n'
            '                <div id="up_down">\n'
            '                    <div class="diggit" '
            'onclick="VoteNews(654241,\'agree\')">\n'
            '                        <span class="diggnum" '
            'id="digg_num_654241"></span>\n'
            '                    </div>\n'
            '                    <div class="buryit" '
            'onclick="VoteNews(654241,\'anti\')">\n'
            '                        <span class="burynum" '
            'id="bury_num_654241"></span>\n'
            '                    </div>\n'
            '                    <div class="clear"></div>\n'
            '                    <div id="digg_tip_654241" '
            'class="digg_tip_detail">\xa0</div>\n'
            '                </div>\n'
            '                <div id="come_from">\n'
            '                        来自:\n'
            '                        <a id="link_source2" target="_blank" '
            'href="http://www.cctime.com/html/2020-2-2/1497881.htm">飞象网</a>\n'
            '                </div><!--end: come_from -->\n'
            '                <div class="clear"></div>\n'
            '                <div id="article_A4area">\n'
            '                    <span id="shareA4" class="fl">\n'
            '                            <a href="https://q.cnblogs.com" '
            'target="_blank"><b>程序员问答平台,解决您的技术难题</b></a>\n'
            '                    </span>\n'
            '                    <span id="sharebox">\n'
            '                        <a onclick="PutInWz();return false;" '
            'href="javascript:void(0);"> <img border="0" title="收藏至网摘" '
            'src="/Images/icon_wz.png" alt="收藏"></a>\n'
            '                        <a rel="nofollow" '
            'onclick="ShareToTsina();return false;" '
            'href="javascript:void(0)"><img border="0" title="转发至新浪微博" '
            'src="/Images/icon_sina.gif" alt="新浪微博"></a>\n'
            '                        <a rel="nofollow" '
            'onclick="ShareToTweixin(654241);return false;" '
            'href="javascript:void(0)"><img border="0" title="分享至微信" '
            'src="/Images/icon_weixin.gif" alt="分享至微信"></a>\n'
            '                    </span>\n'
            '                    <div class="clear">\n'
            '                    </div>\n'
            '                </div><!--end: share block-->\n'
            '                <div class="clear"></div>\n'
            '                <div id="e4">\n'
            '                    <div id="div-gpt-ad-1533633736227-3" '
            'class="e4-dfp" style="height:60px; width:468px;">\n'
            '                        <script>\n'
            '                            googletag.cmd.push(function () { '
            "googletag.display('div-gpt-ad-1533633736227-3'); });\n"
            '                        </script>\n'
            '                    </div>\n'
            '                </div>\n'
            '                <div id="news_more_info">\n'
            '                        <div class="news_tags">标签:  <a '
            'href="/n/tag/5G/" class="catalink">5G</a></div>\n'
            '                    <input type="hidden" name="tagsId" '
            'id="tagsId" value="5G">\n'
            '                </div>\n'
            '            </div><!--end: news_otherinfo -->\n'
            '        </div>',
 'create_date': '2020-02-02 12:38',
 'fav_nums': 117,
 'front_image_url': ['https://img2018.cnblogs.com/news_topic/20190528183002826-593205698.png'],
 'praise_nums': 0,
 'tags': '5G',
 'title': '韩国5G流量已占全部移动流量的21%',
 'url': 'https://news.cnblogs.com/n/654241/',
 'url_object_id': 'a92fc0c18771fe92aaa0afd67b4a3de7'}
2020-02-02 17:04:28 [scrapy.core.engine] INFO: Closing spider (finished)
2020-02-02 17:04:28 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 991,
 'downloader/request_count': 4,
 'downloader/request_method_count/GET': 4,
 'downloader/response_bytes': 20918,
 'downloader/response_count': 4,
 'downloader/response_status_count/200': 3,
 'downloader/response_status_count/301': 1,
 'elapsed_time_seconds': 3.300118,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2020, 2, 2, 9, 4, 28, 817186),
 'item_scraped_count': 1,
 'log_count/DEBUG': 5,
 'log_count/INFO': 10,
 'request_depth_max': 2,
 'response_received_count': 3,
 'scheduler/dequeued': 4,
 'scheduler/dequeued/memory': 4,
 'scheduler/enqueued': 4,
 'scheduler/enqueued/memory': 4,
 'start_time': datetime.datetime(2020, 2, 2, 9, 4, 25, 517068)}
2020-02-02 17:04:28 [scrapy.core.engine] INFO: Spider closed (finished)

Process finished with exit code 0


5 answers

keannen

Asker

2020-02-03

settings.py

//img.mukewang.com/szimg/5e37fffb0943632309260488.jpg

//img.mukewang.com/szimg/5e38001109b2026d08570495.jpg

//img.mukewang.com/szimg/5e38001f0919907007790490.jpg


keannen

Asker

2020-02-03

pipelines.py

//img1.sycdn.imooc.com/szimg/5e37fd54096ec17207950321.jpg

//img.mukewang.com/szimg/5e37fcfd09c6cb2807910483.jpg

bobby
Your problem has been solved by now, right?
2020-02-05

keannen

Asker

2020-02-03

test


keannen

Asker

2020-02-03

//img.mukewang.com/szimg/5e37e22509d5db8313430681.jpg

You can see front_image_url here.


bobby

2020-02-03

Your item and image URL here both look fine, so please post a screenshot so I can check whether the relevant settings in your settings.py are correct.
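One way to see why the settings matter even when the item looks fine: Scrapy's stock ImagesPipeline reads image URLs from the item field named by the IMAGES_URLS_FIELD setting, which defaults to image_urls. The sketch below imitates that lookup in plain Python. The helper urls_seen_by_pipeline is a hypothetical name used only for illustration (it is not a Scrapy API), and the item is trimmed from the log in the question.

```python
# Default name of the item field that ImagesPipeline reads URLs from.
DEFAULT_IMAGES_URLS_FIELD = "image_urls"


def urls_seen_by_pipeline(item, images_urls_field=DEFAULT_IMAGES_URLS_FIELD):
    """Hypothetical helper: return the URLs the pipeline would try to download.

    Imitates the lookup item.get(field, []), so a missing field gives no URLs.
    """
    return list(item.get(images_urls_field, []))


# Item trimmed from the log in the question.
item = {
    "title": "韩国5G流量已占全部移动流量的21%",
    "front_image_url": [
        "https://img2018.cnblogs.com/news_topic/20190528183002826-593205698.png"
    ],
}

# With the default field name the pipeline finds nothing to download.
print(urls_seen_by_pipeline(item))
# With the field name pointed at front_image_url it finds the image URL.
print(urls_seen_by_pipeline(item, "front_image_url"))
```

If the first case applies, no image request is issued and no error is logged. That would be consistent with the stats in the question, where the 4 downloader requests are all accounted for by the redirect and the three crawled pages.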

bobby
replying to
keannen
Then please post a screenshot of the image-download settings in your settings.py so I can take a look.
2020-02-03
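For reference, the image-related part of settings.py usually comes down to three settings. The following is a minimal sketch, not the asker's actual file: the pipeline names and the front_image_url field are taken from the log in the question, while the priority numbers and the images directory name are assumptions.

```python
import os

# Enable the stock image pipeline alongside the project pipeline.
# The numbers are priorities (lower runs first); 1 and 300 are assumed values.
ITEM_PIPELINES = {
    "scrapy.pipelines.images.ImagesPipeline": 1,
    "ArticleSpider.pipelines.ArticlespiderPipeline": 300,
}

# Item field holding the list of image URLs (Scrapy's default is "image_urls").
IMAGES_URLS_FIELD = "front_image_url"

# Directory where downloaded images are stored. "images" is an assumed name;
# abspath resolves it against the working directory the crawl starts from.
IMAGES_STORE = os.path.abspath("images")
```

The setting names must be spelled exactly and in upper case. A misspelled IMAGES_URLS_FIELD is silently ignored, and the pipeline falls back to the default field name.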
