Symptom:

The response headers contain GBK-encoded Chinese text, so requests fails to decode the header when reading it.

The HTTP packet is shown in the figure:

Python 3.4.3 (default, Aug 25 2017, 16:49:50)
 [GCC 4.8.5 20150623 (Red Hat 4.8.5-11)] on linux
 Type "help", "copyright", "credits" or "license" for more information.
 >>> import requests
 >>> res = requests.get('http://down.chinaz.com/download.asp?id=35&dp=1&fid=22&f=yes',headers={'Referer':'http://down.chinaz.com/soft/12162.htm'},allow_redirects=False)
 Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
 File "/usr/local/lib/python3.4/site-packages/requests/api.py", line 72, in get
 return request('get', url, params=params, **kwargs)
 File "/usr/local/lib/python3.4/site-packages/requests/api.py", line 58, in request
 return session.request(method=method, url=url, **kwargs)
 File "/usr/local/lib/python3.4/site-packages/requests/sessions.py", line 510, in request
 resp = self.send(prep, **send_kwargs)
 File "/usr/local/lib/python3.4/site-packages/requests/sessions.py", line 655, in send
 r._next = next(self.resolve_redirects(r, request, yield_requests=True, **kwargs))
 File "/usr/local/lib/python3.4/site-packages/requests/sessions.py", line 125, in resolve_redirects
 url = self.get_redirect_target(resp)
 File "/usr/local/lib/python3.4/site-packages/requests/sessions.py", line 116, in get_redirect_target
 return to_native_string(location, 'utf8')
 File "/usr/local/lib/python3.4/site-packages/requests/_internal_utils.py", line 25, in to_native_string
 out = string.decode(encoding)
 UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc6 in position 28: invalid continuation byte
 >>>

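As a result, the request cannot be made at all. A Google search turned up nothing relevant, because most people run into encoding problems in a response that was fetched successfully, whereas here the error is raised while the request itself is being made.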

Testing shows that Python 2.7 does not have this problem.

ipython shows that the error comes from this section:

usr/local/lib/python3.4/site-packages/requests/sessions.py in get_redirect_target(self, resp)
 114 if is_py3:
 115 location = location.encode('latin1')
 --> 116 return to_native_string(location, 'utf8')
 117 #return location
 118

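Comparing the underlying requests code under Python 2.7 and Python 3.4 shows the difference:

The code in requests under Python 3.4 that reads the Location of the response:

Everything is decoded as UTF-8 by default.

The Python 2.7 code: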

 

Now look at the get_redirect_target function:

 

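This largely confirms that under Python 3.4 the Location is decoded as UTF-8 by default. If the Location contains GBK-encoded Chinese, the decode fails with the error shown at the beginning of this post.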
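The failure can be reproduced without any network access. The sketch below is only an illustration: the path `/soft/下载.zip` is a made-up example rather than the actual Location sent by the site, and the steps mirror the `location.encode('latin1')` followed by a UTF-8 decode seen in the traceback.

```python
# Hypothetical redirect path containing Chinese, as a GBK server might send it.
original = "/soft/下载.zip"
raw_bytes = original.encode("gbk")

# On Python 3, http.client decodes header bytes as iso-8859-1 (latin-1),
# so the header value reaches requests as a str of latin-1 characters.
header_value = raw_bytes.decode("latin1")

# get_redirect_target first recovers the raw bytes...
location = header_value.encode("latin1")

# ...and then assumes they are UTF-8, which GBK bytes usually are not.
try:
    location.decode("utf8")
    utf8_ok = True
except UnicodeDecodeError as exc:
    utf8_ok = False
    print("UTF-8 decode failed:", exc)

# Decoding the same bytes as GBK recovers the original text.
print(location.decode("gbk"))
```

The latin-1 round trip is lossless for every byte value, so the raw GBK bytes are still intact at the point where the UTF-8 decode is attempted. Only the choice of codec is wrong.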

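A temporary workaround is to change utf-8 to GBK in that call. In addition, two other places, which are used to request the Location address, also need to be changed.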
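Hard-coding GBK would break servers that do send UTF-8, so a gentler variant of the same idea is to try UTF-8 first and fall back to GBK. The following is a minimal sketch under that assumption, not the patch from this post: `decode_location` is a hypothetical helper name, the encoding list is a guess suited to Chinese sites, and the sample path is made up.

```python
from urllib.parse import quote


def decode_location(header_value, encodings=("utf8", "gbk")):
    """Decode a Location header value, trying each encoding in turn.

    header_value is the str produced by http.client (latin-1 decoded bytes).
    Returns (decoded_text, encoding_used).
    """
    raw = header_value.encode("latin1")  # recover the bytes sent by the server
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    # Nothing matched: hand back the value unchanged.
    return header_value, "latin1"


# Hypothetical GBK-encoded header, as it would arrive on Python 3.
gbk_header = "/soft/下载.zip".encode("gbk").decode("latin1")
text, used = decode_location(gbk_header)
print(text, used)

# When following the redirect, percent-encode with the same encoding so the
# server receives the bytes it originally sent.
next_url = quote(text, safe="/:?=&", encoding=used)
print(next_url)
```

One way to use this without editing site-packages would be to call the helper from an overridden `get_redirect_target` in a `requests.Session` subclass. Whether the rest of the redirect handling then percent-encodes the URL the way the server expects is something to verify against the installed requests version.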
