How do I send a POST request with form data in scrapy?
Source: 8-3 Introduction to Requests and Response
wa666
2017-09-24
import scrapy


class Lagouspider1Spider(scrapy.Spider):
    name = 'lagouspider1'
    allowed_domains = ['www.lagou.com']
    city = '深圳'
    keyword = 'python'
    start_urls = 'https://www.lagou.com/jobs/positionAjax.json?city={0}&needAddtionalResult=false&isSchoolJob=0'.format(city)
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:55.0) Gecko/20100101 Firefox/55.0',
        'Accept': 'application/json, text/javascript, */*; q=0.01',
        'Accept-Language': 'zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3',
        'Accept-Encoding': 'gzip, deflate, br',
        'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
        'X-Requested-With': 'XMLHttpRequest',
        'X-Anit-Forge-Token': 'None',
        'X-Anit-Forge-Code': '0',
        'Referer': 'https://www.lagou.com/jobs/list_python?labelWords=&fromSearch=true&suginput=',
        'Content-Length': '25',
        'Connection': 'keep-alive'
    }
    cookie = {
        'user_trace_token': '20170904131442-e5b6ef64-0911-440a-af99-d44c86945775',
        'Hm_lvt_4233e74dff0ae5bd0a3d81c6ccf756e6': '1505144480,1505264923,1505435823,1505461844',
        '_ga': 'GA1.2.1576896716.1504502078',
        'LGUID': '20170904131444-f650ed3a-912f-11e7-90ff-5254005c3644',
        'index_location_city': '%E6%B7%B1%E5%9C%B3',
        '_gid': 'GA1.2.656103528.1505435823',
        'SEARCH_ID': 'c0b695cc832c46539e184e3d1bbcb749',
        'JSESSIONID': 'ABAAABAACDBAAIAC1242F13CEF66601A0E07BBDD95D4052',
        '_gat': '1',
        'Hm_lpvt_4233e74dff0ae5bd0a3d81c6ccf756e6': '1505461844',
        'LGSID': '20170915155107-a1b78d78-99ea-11e7-9193-5254005c3644',
        'PRE_UTM': '',
        'PRE_HOST': 'www.baidu.com',
        'PRE_SITE': 'https%3A%2F%2Fwww.baidu.com%2Flink%3Furl%3D3c2mseUwlcwN7Vuc9G8__Qnf8W_acQU5-oMJAx33AcG%26wd%3D%26eqid%3Dc28bcc520000797b0000000659bb8667',
        'PRE_LAND': 'https%3A%2F%2Fwww.lagou.com%2F',
        'LGRID': '20170915155107-a1b78f70-99ea-11e7-9193-5254005c3644',
        'TG-TRACK-CODE': 'index_search'
    }

    def parse(self, response):
        pass

    def start_requests(self):
        formdata = {'first': 'true', 'pn': '1', 'kd': self.keyword}
        # note: parse_detail is referenced as the callback but not defined in this snippet
        yield scrapy.FormRequest(self.start_urls, headers=self.headers,
                                 formdata=formdata,
                                 callback=self.parse_detail,
                                 cookies=self.cookie)
With the FormRequest POST above, the server responds with 400 or 404 in scrapy.
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:55.0) Gecko/20100101 Firefox/55.0',
    'Accept': 'application/json, text/javascript, */*; q=0.01',
    'Accept-Language': 'zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3',
    'Accept-Encoding': 'gzip, deflate, br',
    'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'X-Requested-With': 'XMLHttpRequest',
    'X-Anit-Forge-Token': 'None',
    'X-Anit-Forge-Code': '0',
    'Referer': 'https://www.lagou.com/jobs/list_python?labelWords=&fromSearch=true&suginput=',
    'Content-Length': '25',
    'Cookie': 'user_trace_token=20170904131442-e5b6ef64-0911-440a-af99-d44c86945775; Hm_lvt_4233e74dff0ae5bd0a3d81c6ccf756e6=1505144480,1505264923,1505435823,1505461844; _ga=GA1.2.1576896716.1504502078; LGUID=20170904131444-f650ed3a-912f-11e7-90ff-5254005c3644; index_location_city=%E6%B7%B1%E5%9C%B3; _gid=GA1.2.656103528.1505435823; SEARCH_ID=c0b695cc832c46539e184e3d1bbcb749; JSESSIONID=ABAAABAACDBAAIAC1242F13CEF66601A0E07BBDD95D4052; _gat=1; Hm_lpvt_4233e74dff0ae5bd0a3d81c6ccf756e6=1505461844; LGSID=20170915155107-a1b78d78-99ea-11e7-9193-5254005c3644; PRE_UTM=; PRE_HOST=www.baidu.com; PRE_SITE=https%3A%2F%2Fwww.baidu.com%2Flink%3Furl%3D3c2mseUwlcwN7Vuc9G8__Qnf8W_acQU5-oMJAx33AcG%26wd%3D%26eqid%3Dc28bcc520000797b0000000659bb8667; PRE_LAND=https%3A%2F%2Fwww.lagou.com%2F; LGRID=20170915155107-a1b78f70-99ea-11e7-9193-5254005c3644; TG-TRACK-CODE=index_search',
    'Connection': 'keep-alive'
}
keyword = 'python'
city = '深圳'
post_data = {
    'first': 'true',
    'pn': '1',
    'kd': keyword
}
url = 'https://www.lagou.com/jobs/positionAjax.json?city={0}&needAddtionalResult=false&isSchoolJob=0'.format(city)
rp = requests.post(url, headers=headers, data=post_data)
print(rp.text)
With the same data and parameters, switching to requests for the POST returns the JSON data normally.
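For reference, the body that these three form fields encode to can be checked with the standard library alone. For kd=python it happens to be exactly 25 bytes, which matches the 'Content-Length': '25' value copied from the browser; with any other keyword the hard-coded value would be wrong (a stdlib sketch, not part of the original post):

```python
from urllib.parse import urlencode

# The same form fields used in both the scrapy and requests versions
form = {'first': 'true', 'pn': '1', 'kd': 'python'}
body = urlencode(form)
print(body)       # first=true&pn=1&kd=python
print(len(body))  # 25 -- happens to match the hard-coded 'Content-Length': '25'

# A different keyword produces a different body length,
# so a copied Content-Length value is fragile at best
print(len(urlencode({'first': 'true', 'pn': '1', 'kd': 'java'})))
```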
I've been at this for a whole day... no idea why scrapy can't get the data; when I debug, execution reaches the FormRequest line and goes no further.
How exactly should this be written in scrapy? I've searched all day without getting it right.
1 answer
wa666
(original poster)
2017-09-24
Solved: deleting 'Content-Length': '25' from the headers in the scrapy version fixes it.
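The fix above can be sketched as follows: when reusing headers captured from the browser's developer tools, strip Content-Length (and similar connection-managed headers) before handing the dict to FormRequest, since scrapy encodes the form body itself and a stale hard-coded length can make the request malformed. The header values below are a subset taken from the question; the filtering helper is an illustration, not the course's code:

```python
# Headers captured from the browser (subset from the question).
# 'Content-Length' is what the *browser* computed for its own request;
# the HTTP client should compute it from the body it actually sends.
browser_headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:55.0) Gecko/20100101 Firefox/55.0',
    'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'X-Requested-With': 'XMLHttpRequest',
    'Content-Length': '25',
    'Connection': 'keep-alive',
}

# Drop the headers the client manages itself before passing to FormRequest
headers = {k: v for k, v in browser_headers.items()
           if k.lower() not in ('content-length', 'connection')}

print(sorted(headers))  # Content-Length and Connection are gone
```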
2017-09-25