How do I send a POST request with form data in Scrapy?
Source: 8-3 Introduction to Requests and Response
wa666
2017-09-24
import scrapy


class Lagouspider1Spider(scrapy.Spider):
    name = 'lagouspider1'
    allowed_domains = ['www.lagou.com']
    city = '深圳'
    keyword = 'python'
    start_urls = 'https://www.lagou.com/jobs/positionAjax.json?city={0}&needAddtionalResult=false&isSchoolJob=0'.format(city)
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:55.0) Gecko/20100101 Firefox/55.0',
        'Accept': 'application/json, text/javascript, */*; q=0.01',
        'Accept-Language': 'zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3',
        'Accept-Encoding': 'gzip, deflate, br',
        'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
        'X-Requested-With': 'XMLHttpRequest',
        'X-Anit-Forge-Token': 'None',
        'X-Anit-Forge-Code': '0',
        'Referer': 'https://www.lagou.com/jobs/list_python?labelWords=&fromSearch=true&suginput=',
        'Content-Length': '25',
        'Connection': 'keep-alive'
    }
    cookie = {
        'user_trace_token': '20170904131442-e5b6ef64-0911-440a-af99-d44c86945775',
        'Hm_lvt_4233e74dff0ae5bd0a3d81c6ccf756e6': '1505144480,1505264923,1505435823,1505461844',
        '_ga': 'GA1.2.1576896716.1504502078',
        'LGUID': '20170904131444-f650ed3a-912f-11e7-90ff-5254005c3644',
        'index_location_city': '%E6%B7%B1%E5%9C%B3',
        '_gid': 'GA1.2.656103528.1505435823',
        'SEARCH_ID': 'c0b695cc832c46539e184e3d1bbcb749',
        'JSESSIONID': 'ABAAABAACDBAAIAC1242F13CEF66601A0E07BBDD95D4052',
        '_gat': '1',
        'Hm_lpvt_4233e74dff0ae5bd0a3d81c6ccf756e6': '1505461844',
        'LGSID': '20170915155107-a1b78d78-99ea-11e7-9193-5254005c3644',
        'PRE_UTM': '',
        'PRE_HOST': 'www.baidu.com',
        'PRE_SITE': 'https%3A%2F%2Fwww.baidu.com%2Flink%3Furl%3D3c2mseUwlcwN7Vuc9G8__Qnf8W_acQU5-oMJAx33AcG%26wd%3D%26eqid%3Dc28bcc520000797b0000000659bb8667',
        'PRE_LAND': 'https%3A%2F%2Fwww.lagou.com%2F',
        'LGRID': '20170915155107-a1b78f70-99ea-11e7-9193-5254005c3644',
        'TG-TRACK-CODE': 'index_search'
    }

    def parse(self, response):
        pass

    def start_requests(self):
        formdata = {'first': 'true', 'pn': '1', 'kd': self.keyword}
        yield scrapy.FormRequest(self.start_urls, headers=self.headers,
                                 formdata=formdata, callback=self.parse_detail,
                                 cookies=self.cookie)

When I POST this with Scrapy's FormRequest, the server returns a 400 or 404.
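A side note, separate from the 400 error: interpolating the non-ASCII city name straight into `start_urls` leaves the percent-encoding up to the downloader. Building the query string explicitly with the standard library removes that ambiguity (a small sketch, not part of the original spider):

```python
from urllib.parse import urlencode

city = '深圳'
params = {'city': city, 'needAddtionalResult': 'false', 'isSchoolJob': '0'}
url = 'https://www.lagou.com/jobs/positionAjax.json?' + urlencode(params)
print(url)  # the city is percent-encoded: city=%E6%B7%B1%E5%9C%B3
```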
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:55.0) Gecko/20100101 Firefox/55.0',
    'Accept': 'application/json, text/javascript, */*; q=0.01',
    'Accept-Language': 'zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3',
    'Accept-Encoding': 'gzip, deflate, br',
    'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'X-Requested-With': 'XMLHttpRequest',
    'X-Anit-Forge-Token': 'None',
    'X-Anit-Forge-Code': '0',
    'Referer': 'https://www.lagou.com/jobs/list_python?labelWords=&fromSearch=true&suginput=',
    'Content-Length': '25',
    'Cookie': 'user_trace_token=20170904131442-e5b6ef64-0911-440a-af99-d44c86945775; Hm_lvt_4233e74dff0ae5bd0a3d81c6ccf756e6=1505144480,1505264923,1505435823,1505461844; _ga=GA1.2.1576896716.1504502078; LGUID=20170904131444-f650ed3a-912f-11e7-90ff-5254005c3644; index_location_city=%E6%B7%B1%E5%9C%B3; _gid=GA1.2.656103528.1505435823; SEARCH_ID=c0b695cc832c46539e184e3d1bbcb749; JSESSIONID=ABAAABAACDBAAIAC1242F13CEF66601A0E07BBDD95D4052; _gat=1; Hm_lpvt_4233e74dff0ae5bd0a3d81c6ccf756e6=1505461844; LGSID=20170915155107-a1b78d78-99ea-11e7-9193-5254005c3644; PRE_UTM=; PRE_HOST=www.baidu.com; PRE_SITE=https%3A%2F%2Fwww.baidu.com%2Flink%3Furl%3D3c2mseUwlcwN7Vuc9G8__Qnf8W_acQU5-oMJAx33AcG%26wd%3D%26eqid%3Dc28bcc520000797b0000000659bb8667; PRE_LAND=https%3A%2F%2Fwww.lagou.com%2F; LGRID=20170915155107-a1b78f70-99ea-11e7-9193-5254005c3644; TG-TRACK-CODE=index_search',
    'Connection': 'keep-alive'
}
keyword = 'python'
city = '深圳'
post_data = {
    'first': 'true',
    'pn': '1',
    'kd': keyword
}
url = 'https://www.lagou.com/jobs/positionAjax.json?city={0}&needAddtionalResult=false&isSchoolJob=0'.format(city)
rp = requests.post(url, headers=headers, data=post_data)
print(rp.text)

With exactly the same parameters, switching to requests for the POST returns the JSON data normally.
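One plausible reason the requests version succeeds despite carrying the same copied 'Content-Length': '25' header: requests recomputes Content-Length from the actual body when it prepares the request, overwriting whatever value was passed in. That can be checked offline with a PreparedRequest (no network traffic; the wrong length below is deliberate):

```python
import requests

post_data = {'first': 'true', 'pn': '1', 'kd': 'python'}
req = requests.Request(
    'POST',
    'https://www.lagou.com/jobs/positionAjax.json',
    headers={'Content-Length': '9999'},  # deliberately wrong
    data=post_data,
)
prepared = req.prepare()  # builds the request without sending it
print(prepared.body)                       # first=true&pn=1&kd=python
print(prepared.headers['Content-Length'])  # recomputed from the body: 25
```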
I've been stuck on this for a whole day. I don't understand why Scrapy can't get the data; stepping through in the debugger, execution stops at the FormRequest line and never goes any further.
What is the correct way to write this in Scrapy? I've been searching all day without figuring it out.
1 Answer
-
wa666
(original poster)
2017-09-24
Solved: deleting 'Content-Length': '25' from the headers in the Scrapy version fixed it.
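That fix makes sense: a hardcoded Content-Length can only match the body by coincidence, because the urlencoded body length depends on the form values, and the HTTP client normally derives the header from the body it actually sends, so a conflicting copy in the headers can break the request. A standard-library sketch of how the length moves (the second keyword is only an illustration):

```python
from urllib.parse import urlencode

# Body for the original form data: exactly 25 characters,
# matching the Content-Length copied from the browser.
body = urlencode({'first': 'true', 'pn': '1', 'kd': 'python'})
print(body, len(body))  # first=true&pn=1&kd=python 25

# A keyword containing non-ASCII characters percent-encodes to
# three characters per UTF-8 byte, so the body length changes.
body2 = urlencode({'first': 'true', 'pn': '1', 'kd': '深圳python'})
print(len(body2))  # 43: the hardcoded 25 would now be wrong
```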