Can't match Lagou URLs at all anymore

Source: 7-4 Using Rule and LinkExtractor

梅小钰

2018-08-08

All I get is output like this:

2018-08-08 21:42:57 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://passport.lagou.com/login/login.html?msg=validation&uStatus=2&clientIp=117.159.15.221> from <GET https://www.lagou.com/gongsi/395045.html>

2018-08-08 21:42:57 [scrapy.dupefilters] DEBUG: Filtered duplicate request: <GET https://passport.lagou.com/login/login.html?msg=validation&uStatus=2&clientIp=117.159.15.221> - no more duplicates will be shown (see DUPEFILTER_DEBUG to show all duplicates)

2018-08-08 21:42:57 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://passport.lagou.com/login/login.html?msg=validation&uStatus=2&clientIp=117.159.15.221> from <GET https://www.lagou.com/gongsi/164989.html>

2018-08-08 21:42:57 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://passport.lagou.com/login/login.html?msg=validation&uStatus=2&clientIp=117.159.15.221> from <GET https://www.lagou.com/gongsi/53.html>

2018-08-08 21:42:57 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://passport.lagou.com/login/login.html?msg=validation&uStatus=2&clientIp=117.159.15.221> from <GET https://www.lagou.com/gongsi/76066.html>

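It's all redirects like these, so `Rule(LinkExtractor(allow=r'jobs/'))` never matches anything and nothing works. I don't know where to pick this course back up, I can't follow along with the code at all, and I'm losing my mind.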


1 answer

bobby

2018-08-10

Lagou is pretty aggressive: it limits your crawl rate by IP. For now, you need to throttle your crawl speed, then restart your home router to get a new IP.
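As a rough sketch of what "throttle your crawl speed" could look like in a Scrapy project's `settings.py`: the setting names below are standard Scrapy settings, but every value is an assumption picked for illustration, not something from the course or known to satisfy Lagou's limits.

```python
# settings.py (fragment) -- illustrative values only, tune them yourself

# Wait between consecutive requests to the same site (seconds).
DOWNLOAD_DELAY = 5

# Vary the actual wait between 0.5x and 1.5x of DOWNLOAD_DELAY.
RANDOMIZE_DOWNLOAD_DELAY = True

# Send only one request at a time to any single domain.
CONCURRENT_REQUESTS_PER_DOMAIN = 1

# Let Scrapy adapt the delay to how quickly the server responds.
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 5
AUTOTHROTTLE_MAX_DELAY = 60
AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0
```

If the log still shows 302 redirects to `passport.lagou.com/login` after slowing down, the current IP is probably still flagged, which is where changing the IP comes in.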

bobby (in reply to 三肥牛元气)
The videos will be updated within a week to fix this problem.
2018-11-23
