Python HTTP Library: requests

Published: 2025-04-20

Introduction

Requests is an elegant and simple HTTP library for Python, built for human beings.




Installation

pip install requests




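Basic Concepts

RESTful API

Each URL represents a resource, and the HTTP verb states which operation to perform on it:

  • GET (SELECT): retrieve one or more resources
  • POST (CREATE): create a new resource
  • PUT (UPDATE): update a resource and return the complete resource
  • PATCH (UPDATE): update a resource and return only the attributes that changed
  • DELETE (DELETE): delete a resource
  • HEAD: retrieve a resource's metadata
  • OPTIONS: retrieve information about a resource, such as which attributes the client may change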
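
To make the verb list concrete, the sketch below builds one request per verb with requests.Request and prepares it without sending anything. The URL https://api.example.com/articles/1 is a hypothetical resource chosen for illustration only.

```python
import requests

# Hypothetical resource URL, used only for illustration; nothing is sent.
url = 'https://api.example.com/articles/1'

verbs = ['GET', 'POST', 'PUT', 'PATCH', 'DELETE', 'HEAD', 'OPTIONS']

# Request(...).prepare() builds the exact request that would go on the wire.
prepared = [requests.Request(verb, url).prepare() for verb in verbs]

for p in prepared:
    print(p.method, p.url)
```

The same resource URL is reused for every verb; only the method changes, which is the core idea of the RESTful style described above.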



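OAuth 2.0

A temporary authorization mechanism: the client receives a limited-lifetime access token instead of the user's credentials.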



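Cookies and Sessions

HTTP is stateless: every HTTP request is independent of the others.

To keep state across requests, the server stores a Session and the client (the browser) stores Cookies.

The browser attaches its Cookies to every request, and the server identifies the user from them.

A Session is a series of actions with a defined beginning and end; a phone call, from picking up and dialing to hanging up, is one Session.

On the Web, a Session stores a user's attributes and configuration.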
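
The interplay between the two can be sketched with the standard library alone. This is a simplified model, not how any particular web framework does it: the names sessions, login, and identify are hypothetical, and the "server" is just a dictionary in memory.

```python
import secrets
from http.cookies import SimpleCookie

# Server-side store (hypothetical): session id -> user attributes
sessions = {}


def login(username):
    """Create a server-side session and return the cookie the browser would keep."""
    session_id = secrets.token_hex(16)
    sessions[session_id] = {'user': username}
    cookie = SimpleCookie()
    cookie['session_id'] = session_id
    return cookie['session_id'].OutputString()  # "session_id=<hex>"


def identify(cookie_header):
    """Find the user behind the Cookie header attached to a later request."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get('session_id')
    if morsel is None:
        return None
    session = sessions.get(morsel.value)
    return session['user'] if session else None


set_cookie = login('alice')
print(identify(set_cookie))            # alice
print(identify('session_id=unknown'))  # None
```

The cookie carries only an opaque identifier; everything the server knows about the user stays in the server-side store.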




Quick Start

A GET request with HTTP Basic authentication (simulating a login)

import requests

r = requests.get('https://api.github.com/user', auth=('user', 'pass'))  # HTTP Basic authentication
print(r.status_code)
print(r.headers['content-type'])
print(r.encoding)
print(r.text)
print(r.json())




GET Requests

import requests

r = requests.get('https://api.github.com/events')
print(r.status_code)
print(r.headers['content-type'])
print(r.encoding)
print(r.text)
print(r.json())
# 200
# application/json; charset=utf-8
# utf-8
# ...




POST Requests

import requests

r = requests.post('https://httpbin.org/post', data={'key': 'value'})
print(r.json())

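Uploading a file stream together with form fields

import requests

# Entries of the form (None, value) are sent as plain form fields, not as files
with open('test.txt', 'rb') as f:
    files = {
        'file': f,
        'key0': (None, 'value0'),
        'key1': (None, 'value1'),
    }
    response = requests.post('http://httpbin.org/post', files=files)
print(response.json())

The same upload with MultipartEncoder from requests_toolbelt

import requests
from requests_toolbelt.multipart.encoder import MultipartEncoder

with open('test.txt', 'rb') as f:
    encoder = MultipartEncoder(fields={
        'file': ('test.txt', f, 'text/plain'),
        'key0': 'value0',
        'key1': 'value1',
    })
    # The encoder generates the multipart boundary, so its Content-Type must be sent explicitly
    response = requests.post('http://httpbin.org/post', data=encoder,
                             headers={'Content-Type': encoder.content_type})
print(response.json())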




PUT Requests

import requests

r = requests.put('https://httpbin.org/put', data={'key': 'value'})
print(r.json())




DELETE Requests

import requests

r = requests.delete('https://httpbin.org/delete')
print(r.json())




HEAD Requests

import requests

r = requests.head('https://httpbin.org/get')
print(r.status_code)
print(r.headers['content-type'])
print(r.encoding)
print(r.text)




OPTIONS Requests

import requests

r = requests.options('https://httpbin.org/get')
print(r.status_code)
print(r.headers['content-type'])
print(r.encoding)
print(r.text)




Passing Query Parameters

Query parameters are passed in the URL, e.g. http://httpbin.org/get?key=val

The params argument builds the query string from a dict

import requests

payload = {'key1': 'value1', 'key2': 'value2'}
r = requests.get('https://httpbin.org/get', params=payload)
print(r.url)  # https://httpbin.org/get?key1=value1&key2=value2
print(r.json())

payload = {'key1': 'value1', 'key2': ['value2', 'value3']}
r = requests.get('https://httpbin.org/get', params=payload)
print(r.url)  # https://httpbin.org/get?key1=value1&key2=value2&key2=value3
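
A related detail can be checked without any network access: a key whose value is None is left out of the query string. The sketch below prepares the request instead of sending it; the key names are arbitrary.

```python
import requests

payload = {'key1': 'value1', 'key2': None, 'key3': ['a', 'b']}

# prepare() builds the final URL without contacting the server
prepared = requests.Request('GET', 'https://httpbin.org/get', params=payload).prepare()
print(prepared.url)  # https://httpbin.org/get?key1=value1&key3=a&key3=b
```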




Response Content

import requests

r = requests.get('https://api.github.com/events')
print(r.text)
print(r.json())
print(r.encoding)  # utf-8




Custom Request Headers

import requests

url = 'https://api.github.com/some/endpoint'
headers = {'user-agent': 'my-app/0.0.1'}
r = requests.get(url, headers=headers)
print(r.json())




Passing Form Data

The data argument

import requests

payload = {'key1': 'value1', 'key2': 'value2'}
r = requests.post('https://httpbin.org/post', data=payload)
print(r.json())

payload_tuples = [('key1', 'value1'), ('key1', 'value2')]
r = requests.post('https://httpbin.org/post', data=payload_tuples)
print(r.json())

payload_dict = {'key1': ['value1', 'value2']}
r = requests.post('https://httpbin.org/post', data=payload_dict)
print(r.json())

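Passing JSON-encoded data. Both calls send the same JSON body, but only json= also sets the Content-Type header to application/json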

import json
import requests

payload = {'key1': 'value1', 'key2': 'value2'}
url = 'https://api.github.com/some/endpoint'
r = requests.post(url, data=json.dumps(payload))
r = requests.post(url, json=payload)
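
The difference between the two calls can be inspected offline by preparing the requests instead of sending them. This is a small sketch that reuses the httpbin URL from the earlier examples; nothing is transmitted.

```python
import json

import requests

payload = {'key1': 'value1', 'key2': 'value2'}
url = 'https://httpbin.org/post'

via_data = requests.Request('POST', url, data=json.dumps(payload)).prepare()
via_json = requests.Request('POST', url, json=payload).prepare()

# A string passed through data= gets no Content-Type header
print(via_data.headers.get('Content-Type'))  # None
# json= serializes the payload and labels it as JSON
print(via_json.headers.get('Content-Type'))  # application/json
```

When the server inspects Content-Type before parsing the body, json= is usually the safer choice; with data= the header has to be added by hand.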




Uploading Files

import requests

with open('1.txt', mode='w') as f:
    f.write('123')

url = 'https://httpbin.org/post'
files = {'file': open('1.txt', 'rb')}
r = requests.post(url, files=files)
print(r.json())

files = {'file': ('1.txt', open('1.txt', 'rb'), 'text/plain', {'Expires': '0'})}  # set filename, content_type, and headers
r = requests.post(url, files=files)
print(r.json())

files = {'file': ('1.csv', 'some,data,to,send\nanother,row,to,send\n')}  # a string sent as the file content
r = requests.post(url, files=files)
print(r.json())




Response Status Codes

import requests

r = requests.get('https://httpbin.org/get')
print(r.status_code)  # 200
print(r.status_code == requests.codes.ok)  # True
r.raise_for_status()

bad_r = requests.get('https://httpbin.org/status/404')
print(bad_r.status_code)  # 404
print(bad_r.status_code == requests.codes.not_found)  # True
try:
    bad_r.raise_for_status()
except Exception as e:
    print(e)  # 404 Client Error: NOT FOUND for url: https://httpbin.org/status/404




Response Headers

import requests

r = requests.get('https://api.github.com/events')
print(r.headers)  # {'Server': 'GitHub.com', 'Date': 'Mon, 05 Sep 2022 10:35:42 GMT', ...}
print(r.headers['content-type'])  # application/json; charset=utf-8
print(r.headers.get('content-type'))  # application/json; charset=utf-8




Cookies

import requests

url = 'https://httpbin.org/cookies'
cookies = dict(cookies_are='working')
r = requests.get(url, cookies=cookies)
print(r.json())
print(r.cookies)

jar = requests.cookies.RequestsCookieJar()
jar.set('tasty_cookie', 'yum', domain='httpbin.org', path='/cookies')
jar.set('gross_cookie', 'blech', domain='httpbin.org', path='/elsewhere')
url = 'https://httpbin.org/cookies'
r = requests.get(url, cookies=jar)
print(r.json())
print(r.cookies)




Redirection and History

Use the history attribute of the response object to trace redirects

import requests

r = requests.get('http://github.com/')
print(r.url)  # 'https://github.com/'
print(r.status_code)  # 200
print(r.history)  # [<Response [301]>]

r = requests.get('http://github.com/', allow_redirects=False)  # disable redirects
print(r.status_code)  # 301
print(r.history)  # []

r = requests.head('http://github.com/', allow_redirects=True)
print(r.url)  # 'https://github.com/'
print(r.history)  # [<Response [301]>]




Timeouts

The timeout argument sets how many seconds to wait for the server to respond before giving up

import requests

try:
    requests.get('https://github.com/', timeout=0.001)
except Exception as e:
    print(e)




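Errors and Exceptions

A network problem, such as a DNS failure or a refused connection, raises ConnectionError.

If the HTTP request returned an unsuccessful status code, Response.raise_for_status() raises HTTPError.

A request that times out raises Timeout.

A request that exceeds the maximum number of redirects raises TooManyRedirects.

All exceptions that Requests explicitly raises inherit from requests.exceptions.RequestException.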
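
One way to act on this hierarchy is to test for the most specific exception first and fall back to RequestException. The helper describe below is a hypothetical name, and the 404 response is built by hand so that the sketch runs without a network connection.

```python
import requests
from requests.exceptions import (
    ConnectionError,
    HTTPError,
    RequestException,
    Timeout,
    TooManyRedirects,
)


def describe(exc):
    """Map a Requests exception to a short label, most specific class first."""
    if isinstance(exc, Timeout):
        return 'timeout'
    if isinstance(exc, TooManyRedirects):
        return 'too many redirects'
    if isinstance(exc, HTTPError):
        return 'bad status'
    if isinstance(exc, ConnectionError):
        return 'connection problem'
    return 'other request error'


# Hand-built response standing in for a real 404 reply
response = requests.Response()
response.status_code = 404

try:
    response.raise_for_status()
except RequestException as exc:
    label = describe(exc)

print(label)  # bad status
```

Catching RequestException at the outermost level is enough to handle every case listed above, while the isinstance checks allow a more specific reaction where it matters.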




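Session Objects

  • A Session object persists certain parameters across requests, such as Cookies
  • When several requests go to the same host, reusing the underlying TCP connection can significantly improve performance
  • A Session object has all the API methods shown above and can also supply default data for requests
  • Even with a Session object, method-level parameters are not persisted across requests
  • To add Cookies manually, use Session.cookies
  • A Session object can be used as a context manager
  • To omit a session-level value for one request, set that key to None in the method-level parameter

import requests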

# A Session object persists certain parameters across requests, such as Cookies
s = requests.Session()
s.get('https://httpbin.org/cookies/set/sessioncookie/123456789')
r = s.get('https://httpbin.org/cookies')
print(r.json())  # {'cookies': {'sessioncookie': '123456789'}}

# A Session object has all the API methods and can supply default data for requests
s = requests.Session()
s.auth = ('user', 'pass')
s.headers.update({'x-test': 'true'})
s.get('https://httpbin.org/headers', headers={'x-test2': 'true'})  # sends both x-test and x-test2

# Even with a Session object, method-level parameters are not persisted across requests
s = requests.Session()
r = s.get('https://httpbin.org/cookies', cookies={'from-my': 'browser'})
print(r.json())  # {'cookies': {'from-my': 'browser'}}
r = s.get('https://httpbin.org/cookies')
print(r.json())  # {'cookies': {}}

# A Session object can be used as a context manager
with requests.Session() as s:
    s.get('https://httpbin.org/cookies/set/sessioncookie/123456789')
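
The examples above do not cover dropping a session-level default for a single request. A minimal sketch, using Session.prepare_request so that nothing is sent: the header x-test is the same illustrative header used earlier.

```python
import requests

s = requests.Session()
s.headers.update({'x-test': 'true'})
url = 'https://httpbin.org/headers'

# The session default is merged into the request
with_default = s.prepare_request(requests.Request('GET', url))

# Setting the key to None at method level removes it for this request only
without_default = s.prepare_request(requests.Request('GET', url, headers={'x-test': None}))

print('x-test' in with_default.headers)     # True
print('x-test' in without_default.headers)  # False
```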

Keeping a Session alive is like making every request from a single open browser

import requests

with requests.Session() as s:
    s.get('https://httpbin.org/cookies/set/sessioncookie/123456789')
    r = s.get('http://httpbin.org/cookies')
    print(r.json())




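Request and Response Objects

A call to requests.get() actually does two things:

  1. It constructs a Request object, which is sent to the server to request a resource
  2. Once the server responds, it generates a Response object

The Response object holds all the information returned by the server, along with the Request object that was originally created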

import requests

r = requests.get('https://en.wikipedia.org/wiki/Monty_Python')
print(r.headers)
print(r.request.headers)




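Prepared Requests

However a request is issued, what is actually sent is a PreparedRequest

To modify the body or the headers before sending a request, see the original Requests documentation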
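
A minimal sketch of that workflow, assuming the goal is to add one header before sending: the header name X-Custom is hypothetical, and the final send call is left commented out so that the example runs offline.

```python
import requests

s = requests.Session()
req = requests.Request('POST', 'https://httpbin.org/post', data={'key': 'value'})

# prepare_request applies session-level state (cookies, default headers) as well
prepped = s.prepare_request(req)

# Adjust the prepared request before it goes out
prepped.headers['X-Custom'] = 'demo'  # hypothetical header, for illustration

print(prepped.method, prepped.url)
print(prepped.body)  # key=value

# resp = s.send(prepped, timeout=5)  # uncomment to actually send the request
```

Using Session.prepare_request rather than Request.prepare keeps the session's cookies and defaults in the prepared request.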




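SSL Certificate Verification

Requests verifies the SSL certificate of an HTTPS request, as a browser does, and raises SSLError if verification fails

  • The verify argument can point to a CA bundle file
  • The list of trusted CAs can also be set through the REQUESTS_CA_BUNDLE environment variable. If REQUESTS_CA_BUNDLE is not set, CURL_CA_BUNDLE is used as a fallback
  • Setting verify to False disables SSL certificate verification: Requests then accepts any TLS certificate the server presents and ignores hostname mismatches and expired certificates, which leaves the application open to man-in-the-middle (MitM) attacks
  • verify defaults to True, and it applies only to host certificates

import requests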

r = requests.get('https://requestb.in')
print(r.text)

r = requests.get('https://github.com')
print(r.text)

r = requests.get('https://github.com', verify='/path/to/certfile')
# The call above and the session-level setting below are similar
s = requests.Session()
s.verify = '/path/to/certfile'

r = requests.get('https://kennethreitz.org', verify=False)
print(r)  # <Response [200]>




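Client-Side Certificates

A local certificate can be specified as the client certificate, either as a single file (containing the private key and the certificate) or as a tuple of the two file paths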

import requests

requests.get('https://kennethreitz.org', cert=('/path/client.cert', '/path/client.key'))
# or
s = requests.Session()
s.cert = '/path/client.cert'

The private key of the local certificate must be unencrypted




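CA Certificates

Requests uses the certificates from the certifi package, which lets users update their trusted certificates without upgrading Requests

Before version 2.16, Requests bundled a set of root CAs sourced from Mozilla, and the certificates were updated only with each Requests release

With those older versions of Requests, if certifi is not installed the bundled certificates can be badly out of date

For security, update certifi frequently!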




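Body Content Workflow

  • By default, the response body is downloaded as soon as the request is made. With stream=True, the download is deferred until Response.content is accessed
  • With stream=True, the connection is not released back to the pool until all the data has been consumed or Response.close() is called, which can tie up connections inefficiently; using a context manager is recommended

import requests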

tarball_url = 'https://github.com/psf/requests/tarball/main'
r = requests.get(tarball_url, stream=True)  # only the headers have been downloaded and the connection is still open, so the body can be fetched conditionally

TOO_LONG = 1024
if int(r.headers['content-length']) < TOO_LONG:
    content = r.content
    ...

with requests.get('https://httpbin.org/get', stream=True) as r:
    ...
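
For large bodies, a common pattern is to copy the stream to a file in chunks with Response.iter_content. The helper save_stream below is a hypothetical name, and the response is a hand-built stand-in so that the sketch runs without a network; with a real download it would come from requests.get(url, stream=True).

```python
import io

import requests


def save_stream(response, fileobj, chunk_size=8192):
    """Copy a streamed response body into fileobj chunk by chunk; return bytes written."""
    written = 0
    for chunk in response.iter_content(chunk_size=chunk_size):
        fileobj.write(chunk)
        written += len(chunk)
    return written


# Offline stand-in for a streamed response (assumption for illustration)
fake = requests.Response()
fake.status_code = 200
fake.raw = io.BytesIO(b'0123456789')

buffer = io.BytesIO()
total = save_stream(fake, buffer, chunk_size=4)
print(total)  # 10
```

In real use, fileobj would typically be a file opened with open(path, 'wb'), and the request would sit inside a with block so that the connection is released afterwards.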




Keep-Alive

Requests made within a Session use keep-alive connections, and a suitable connection is reused automatically




Streaming Uploads

import requests

with open('massive-body', 'rb') as f:
    requests.post('http://some.url/streamed', data=f)




Chunk-Encoded Requests

import requests


def gen():
    yield 'hi'
    yield 'there'


requests.post('http://some.url/chunked', data=gen())




POSTing Multiple Multipart-Encoded Files

import requests

url = 'https://httpbin.org/post'
multiple_files = [
    ('images', ('foo.png', open('foo.png', 'rb'), 'image/png')),
    ('images', ('bar.png', open('bar.png', 'rb'), 'image/png'))]
r = requests.post(url, files=multiple_files)
print(r.text)




Event Hooks

import requests


def print_url(r, *args, **kwargs):
    print(r.url)


def record_hook(r, *args, **kwargs):
    r.hook_called = True
    return r


r = requests.get('https://httpbin.org/', hooks={'response': print_url})
print(r)
# https://httpbin.org/
# <Response [200]>

r = requests.get('https://httpbin.org/', hooks={'response': [print_url, record_hook]})
print(r.hook_called)
# https://httpbin.org/
# True

s = requests.Session()
s.hooks['response'].append(print_url)
print(s.get('https://httpbin.org/'))
# https://httpbin.org/
# <Response [200]>




Custom Authentication

import requests
from requests.auth import AuthBase


class PizzaAuth(AuthBase):
    """Attaches HTTP Pizza Authentication to the given Request object."""

    def __init__(self, username):
        # setup any auth-related data here
        self.username = username

    def __call__(self, r):
        # modify and return the request
        r.headers['X-Pizza'] = self.username
        return r


print(requests.get('http://pizzabin.org/admin', auth=PizzaAuth('kenneth')))
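
To see what an auth class does to a request without contacting any server, the request can be prepared and inspected. HeaderAuth below is a hypothetical class written for this sketch, and https://example.org/admin is a placeholder URL.

```python
import requests
from requests.auth import AuthBase


class HeaderAuth(AuthBase):
    """Hypothetical auth class: attaches one fixed header to every request."""

    def __init__(self, name, value):
        self.name = name
        self.value = value

    def __call__(self, r):
        # Modify and return the prepared request
        r.headers[self.name] = self.value
        return r


# Build the request without sending it, then inspect the result
prepared = requests.Request(
    'GET', 'https://example.org/admin', auth=HeaderAuth('X-Pizza', 'kenneth')
).prepare()
print(prepared.headers['X-Pizza'])  # kenneth
```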




Streaming Requests

import json
import requests

r = requests.get('https://httpbin.org/stream/20', stream=True)

for line in r.iter_lines():
    if line:
        decoded_line = line.decode('utf-8')
        print(json.loads(decoded_line))

r = requests.get('https://httpbin.org/stream/20', stream=True)
if r.encoding is None:
    r.encoding = 'utf-8'
for line in r.iter_lines(decode_unicode=True):
    if line:
        print(json.loads(line))




Proxies

The proxies argument configures proxies for a request

import requests

proxies = {
    'http': 'http://10.10.1.10:3128',
    'https': 'http://10.10.1.10:1080',
}
requests.get('http://example.org', proxies=proxies)

# or configure them once for the whole Session
proxies = {
    'http': 'http://10.10.1.10:3128',
    'https': 'http://10.10.1.10:1080',
}
session = requests.Session()
session.proxies.update(proxies)
session.get('http://example.org')

When proxies are not set per request as above, Requests falls back to the proxy configuration in these environment variables

export HTTP_PROXY="http://10.10.1.10:3128"
export HTTPS_PROXY="http://10.10.1.10:1080"
export ALL_PROXY="socks5://10.10.1.10:3434"




SOCKS Protocol




HTTP Verbs




Custom Verbs




Link Headers




Transport Adapters




OAuth Authentication

Installation

pip install requests-oauthlib

Code

from requests_oauthlib import OAuth1Session

twitter = OAuth1Session('client_key',
                        client_secret='client_secret',
                        resource_owner_key='resource_owner_key',
                        resource_owner_secret='resource_owner_secret')
url = 'https://api.twitter.com/1/account/settings.json'
r = twitter.get(url)




Downloading Images




Disabling Parameter Escaping

import requests

params = {
    'username': 'abc',
    'password': '%'
}
params = '&'.join('{}={}'.format(k, v) for k, v in params.items())
response = requests.get('https://httpbin.org/get', params=params)
print(response.json())
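
The effect of this trick can be compared offline by preparing both variants. In the sketch below, a dict passed to params is percent-encoded, while a pre-built string is used as the query string as given; nothing is sent.

```python
import requests

url = 'https://httpbin.org/get'
fields = {'username': 'abc', 'password': '%'}

# Passing a dict: Requests percent-encodes the values ('%' becomes '%25')
encoded = requests.Request('GET', url, params=fields).prepare()

# Passing a pre-built string: it is used as the query string as given
raw_query = '&'.join('{}={}'.format(k, v) for k, v in fields.items())
unescaped = requests.Request('GET', url, params=raw_query).prepare()

print(encoded.url)    # https://httpbin.org/get?username=abc&password=%25
print(unescaped.url)  # https://httpbin.org/get?username=abc&password=%
```

A bare % is not a valid escape sequence in a URL, so how the server interprets the second form depends on the server.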




Converting to curl

Installation

pip install curlify

Quick Start

import curlify
import requests

response = requests.get("http://google.ru")
print(curlify.to_curl(response.request))
# curl -X 'GET' -H 'Accept: */*' -H 'Accept-Encoding: gzip, deflate' -H 'Connection: keep-alive' -H 'User-Agent: python-requests/2.18.4' 'http://www.google.ru/'

print(curlify.to_curl(response.request, compressed=True))
# curl -X 'GET' -H 'Accept: */*' -H 'Accept-Encoding: gzip, deflate' -H 'Connection: keep-alive' -H 'User-Agent: python-requests/2.18.4' --compressed 'http://www.google.ru/'




Encapsulation




References

  1. Requests Documentation
  2. Requests-OAuth Documentation
  3. requests - Liao Xuefeng's official website
  4. RESTful API Design Guide - Ruan Yifeng's blog
  5. Web Crawling Made Easy in 52 Lessons
  6. How to send a multipart/form-data with requests in python
  7. How to prevent python requests from percent encoding my URLs
