On lagou's /jobs/allCity.html, the tags for Beijing, Shanghai, Guangzhou, Shenzhen, and Hangzhou have no -zhaopin in their links. Is this anti-scraping?
Source: 2-1 Analyzing the job site structure and parsing the site's city list

慕无忌4207111
2019-10-14
<ul class="city_list">
<li >
<a href="https://www.lagou.com/shenzhen/">深圳</a>
<input class="dn" value="https://www.lagou.com/jobs/list_webrtc?&px=default&city=深圳#filterBox"/>
</li>
<li >
<a href="https://www.lagou.com/shanghai/">上海</a>
<input class="dn" value="https://www.lagou.com/jobs/list_webrtc?&px=default&city=上海#filterBox"/>
</li>
<li >
<a href=" https://www.lagou.com/suzhou-zhaopin/">苏州</a>
<input class="dn" value="https://www.lagou.com/jobs/list_webrtc?&px=default&city=苏州#filterBox"/>
</li>
<li >
<a href=" https://www.lagou.com/shenyang-zhaopin/">沈阳</a>
<input class="dn" value="https://www.lagou.com/jobs/list_webrtc?&px=default&city=沈阳#filterBox"/>
It feels like these cities were singled out, but a small side path was left open. Is there any way to crawl them?
After all, these cities account for the bulk of the listings...
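One workaround, sketched below under the assumption that the markup shown above is representative: each li also carries a hidden input with class "dn" whose value attribute holds the full job-list URL, and that value is present even for cities whose a href lacks the -zhaopin suffix. So instead of following the hrefs, pull the input values with xpath:

```python
from lxml import etree

# Sample markup copied from the question; in practice this would be the
# response body of https://www.lagou.com/jobs/allCity.html
html = '''
<ul class="city_list">
  <li>
    <a href="https://www.lagou.com/shenzhen/">深圳</a>
    <input class="dn" value="https://www.lagou.com/jobs/list_webrtc?&px=default&city=深圳#filterBox"/>
  </li>
  <li>
    <a href="https://www.lagou.com/suzhou-zhaopin/">苏州</a>
    <input class="dn" value="https://www.lagou.com/jobs/list_webrtc?&px=default&city=苏州#filterBox"/>
  </li>
</ul>
'''

tree = etree.HTML(html)
# City names come from the visible <a> text ...
names = tree.xpath('//ul[@class="city_list"]/li/a/text()')
# ... while the hidden <input class="dn"> carries a usable job-list URL
# even when the <a> href has no -zhaopin suffix.
urls = tree.xpath('//ul[@class="city_list"]/li/input[@class="dn"]/@value')
for name, url in zip(names, urls):
    print(name, url)
```

This only shows how to extract the fallback URLs; whether lagou serves those URLs to a scraper still depends on its usual cookie/header checks.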
2 Answers
two10
2019-11-05
Also, I'm on a Linux system, so my request headers differ from the instructor's. Use your own headers.
two10
2019-11-05
# This is easy to grab with xpath
import requests
from lxml import etree


class HandleLagou(object):
    def __init__(self):
        # The session keeps the cookie information across requests
        self.lagou_session = requests.session()
        self.header = {
            'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36'
        }
        self.city_list = ''

    def handle_request(self):
        # Fetch the city-list page
        headers = self.header
        city_url = 'https://www.lagou.com/jobs/allCity.html'
        response = self.lagou_session.get(city_url, headers=headers)
        if response.status_code == 200:
            return response.content.decode('utf-8')
        return None

    def handle_city(self, html):
        # Parse the city names out of the page
        etree_html = etree.HTML(html)
        city_search = etree_html.xpath('//ul[@class="city_list"]/li/a/text()')
        self.city_list = city_search
        print(self.city_list)

    def main(self):
        html = self.handle_request()
        self.handle_city(html)


# Entry point
if __name__ == '__main__':
    lagou = HandleLagou()
    lagou.main()
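A further sketch, not an official lagou API: the hidden input values in the question all follow one URL pattern, so a job-list URL for any city can be rebuilt from it. The keyword "webrtc" and the pattern itself are taken from the quoted markup above; the city name is percent-encoded with the standard library since the raw attribute holds it unencoded.

```python
from urllib.parse import quote

def job_list_url(city, keyword='webrtc'):
    # Pattern observed in the hidden <input class="dn"> values above;
    # both the keyword and the city name are percent-encoded (UTF-8).
    return ('https://www.lagou.com/jobs/list_%s?&px=default&city=%s#filterBox'
            % (quote(keyword), quote(city)))

print(job_list_url('深圳'))
```

This way the five big cities can be addressed directly without relying on their a hrefs at all, assuming the pattern stays stable.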