urllib.request: Extensible library for opening URLs
Source code: Lib/urllib/request.py
The urllib.request module defines functions and classes which help in opening URLs (mostly HTTP) in a complex world: basic and digest authentication, redirections, cookies and more.
See also
The Requests package is recommended for a higher-level HTTP client interface.
The urllib.request module defines the following functions:
-
urllib.request.urlopen(url, data=None, [timeout, ]*, cafile=None, capath=None, cadefault=False, context=None)
Open the URL url, which can be either a string or a Request object.
data must be an object specifying additional data to be sent to the server, or None if no such data is needed. See Request for details.
The urllib.request module uses HTTP/1.1 and includes a Connection:close header in its HTTP requests.
The optional timeout parameter specifies a timeout in seconds for blocking operations like the connection attempt (if not specified, the global default timeout setting will be used). This only works for HTTP, HTTPS and FTP connections.
If context is specified, it must be an ssl.SSLContext instance describing the various SSL options. See HTTPSConnection for more details.
The optional cafile and capath parameters specify a set of trusted CA certificates for HTTPS requests. cafile should point to a single file containing a bundle of CA certificates, whereas capath should point to a directory of hashed certificate files. More information can be found in ssl.SSLContext.load_verify_locations().
The cadefault parameter is ignored.
This function always returns an object which can work as a context manager and has the properties url, headers, and status. See urllib.response.addinfourl for more detail on these properties.
For HTTP and HTTPS URLs, this function returns an http.client.HTTPResponse object, slightly modified. In addition to the three properties above, the msg attribute contains the same information as the reason attribute (the reason phrase returned by the server) instead of the response headers as specified in the documentation for HTTPResponse.
For FTP, file, and data URLs and requests explicitly handled by the legacy URLopener and FancyURLopener classes, this function returns a urllib.response.addinfourl object.
Raises URLError on protocol errors.
Note that None may be returned if no handler handles the request (though the default installed global OpenerDirector uses UnknownHandler to ensure this never happens).
In addition, if proxy settings are detected (for example, when a *_proxy environment variable like http_proxy is set), ProxyHandler is installed by default and makes sure the requests are handled through the proxy.
The legacy urllib.urlopen function from Python 2.6 and earlier has been discontinued; urllib.request.urlopen() corresponds to the old urllib2.urlopen. Proxy handling, which was done by passing a dictionary parameter to urllib.urlopen, can be obtained by using ProxyHandler objects.
The default opener raises an auditing event urllib.Request with arguments fullurl, data, headers, method taken from the request object.
Changed in version 3.2: cafile and capath were added.
Changed in version 3.2: HTTPS virtual hosts are now supported if possible (that is, if ssl.HAS_SNI is true).
New in version 3.2: data can be an iterable object.
Changed in version 3.3: cadefault was added.
Changed in version 3.4.3: context was added.
Changed in version 3.10: HTTPS connections now send an ALPN extension with protocol indicator http/1.1 when no context is given. A custom context should set ALPN protocols with set_alpn_protocols().
Deprecated since version 3.6: cafile, capath and cadefault are deprecated in favor of context. Please use ssl.SSLContext.load_cert_chain() instead, or let ssl.create_default_context() select the system’s trusted CA certificates for you.
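As a minimal sketch of the context-manager usage of urlopen(), the following opens a data: URL so that it runs without network access. The helper name fetch and the sample URL are illustrative assumptions; for an HTTP or HTTPS URL the returned object would be an http.client.HTTPResponse rather than a urllib.response.addinfourl.

```python
import urllib.request


def fetch(url, timeout=10):
    """Hypothetical helper: open url and return (final_url, content_type, body)."""
    # urlopen() returns an object that can be used as a context manager.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        content_type = response.headers.get_content_type()
        return response.url, content_type, response.read()


# A data: URL is decoded locally, so no connection is attempted.
final_url, content_type, body = fetch("data:text/plain;charset=utf-8,hello%20world")
print(final_url, content_type, body)
```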
-
urllib.request.install_opener(opener)
Install an OpenerDirector instance as the default global opener. Installing an opener is only necessary if you want urlopen to use that opener; otherwise, simply call OpenerDirector.open() instead of urlopen(). The code does not check for a real OpenerDirector, and any class with the appropriate interface will work.
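The two options can be contrasted in a short sketch. The User-agent value is an assumed placeholder, and data: URLs are used only so that the example runs offline.

```python
import urllib.request

opener = urllib.request.build_opener()
opener.addheaders = [("User-agent", "demo-client/0.1")]  # assumed header value

# Option 1: call the opener directly; the global default stays untouched.
with opener.open("data:,direct") as response:
    direct_body = response.read()

# Option 2: install it, so that plain urlopen() calls go through this opener.
urllib.request.install_opener(opener)
with urllib.request.urlopen("data:,installed") as response:
    installed_body = response.read()

print(direct_body, installed_body)
```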
-
urllib.request.build_opener([handler, ...])
Return an OpenerDirector instance, which chains the handlers in the order given. handlers can be either instances of BaseHandler, or subclasses of BaseHandler (in which case it must be possible to call the constructor without any parameters). Instances of the following classes will be in front of the handlers, unless the handlers contain them, instances of them or subclasses of them: ProxyHandler (if proxy settings are detected), UnknownHandler, HTTPHandler, HTTPDefaultErrorHandler, HTTPRedirectHandler, FTPHandler, FileHandler, HTTPErrorProcessor.
If the Python installation has SSL support (i.e., if the ssl module can be imported), HTTPSHandler will also be added.
A BaseHandler subclass may also change its handler_order attribute to modify its position in the handlers list.
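A small sketch of how handler_order affects placement, assuming a hypothetical EarlyHandler class. It inspects opener.handlers, an attribute of the CPython implementation that is not part of the interface described here, purely to make the ordering visible.

```python
import urllib.request


class EarlyHandler(urllib.request.BaseHandler):
    # Lower values sort earlier; BaseHandler's default is 500.
    handler_order = 50

    def default_open(self, req):
        # Returning None lets the remaining handlers try the request.
        return None


# A class is instantiated by build_opener(); an instance is used as given.
opener = urllib.request.build_opener(
    EarlyHandler, urllib.request.HTTPCookieProcessor()
)
handler_names = [type(h).__name__ for h in opener.handlers]
print(handler_names)
```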
-
urllib.request.pathname2url(path)
Convert the pathname path from the local syntax for a path to the form used in the path component of a URL. This does not produce a complete URL. The return value will already be quoted using the quote() function.
-
urllib.request.url2pathname(path)
Convert the path component path from a percent-encoded URL to the local syntax for a path. This does not accept a complete URL. This function uses unquote() to decode path.
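The round trip between the two functions can be sketched as follows. The path is an assumed POSIX-style example; on Windows the local syntax, and therefore the result, differs.

```python
import urllib.request

local_path = "/tmp/my report.txt"  # assumed POSIX-style path

url_path = urllib.request.pathname2url(local_path)  # percent-encoded path component
round_trip = urllib.request.url2pathname(url_path)  # back to the local syntax
file_url = "file://" + url_path                     # the caller adds the scheme

print(url_path, round_trip, file_url)
```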
-
urllib.request.getproxies()
This helper function returns a dictionary of scheme to proxy server URL mappings. It first scans the environment, on all operating systems, for variables named <scheme>_proxy, matching case-insensitively. When it cannot find any, it looks for proxy information in System Configuration on macOS and in the Windows Systems Registry on Windows. If both lowercase and uppercase environment variables exist (and disagree), lowercase is preferred.
Note
If the environment variable REQUEST_METHOD is set, which usually indicates your script is running in a CGI environment, the environment variable HTTP_PROXY (uppercase _PROXY) will be ignored. This is because that variable can be injected by a client using the “Proxy:” HTTP header. If you need to use an HTTP proxy in a CGI environment, either use ProxyHandler explicitly, or make sure the variable name is in lowercase (or at least the _proxy suffix).
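The lowercase preference and the CGI rule can be illustrated with a temporarily patched environment. This is a sketch that assumes a POSIX platform (where environment variable names are case-sensitive); the proxy host names are made up.

```python
import os
import urllib.request
from unittest import mock

# Lowercase wins when both spellings are set and disagree.
both = {
    "http_proxy": "http://lower.example:3128",
    "HTTP_PROXY": "http://upper.example:3128",
}
with mock.patch.dict(os.environ, both, clear=True):
    preferred = urllib.request.getproxies()

# With REQUEST_METHOD set (CGI-like), the uppercase HTTP_PROXY is ignored.
cgi = {
    "REQUEST_METHOD": "GET",
    "HTTP_PROXY": "http://upper.example:3128",
    "https_proxy": "http://lower.example:3128",
}
with mock.patch.dict(os.environ, cgi, clear=True):
    cgi_proxies = urllib.request.getproxies()

print(preferred, cgi_proxies)
```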
The following classes are provided:
-
class urllib.request.Request(url, data=None, headers={}, origin_req_host=None, unverifiable=False, method=None)
This class is an abstraction of a URL request.
url should be a string containing a valid URL.
data must be an object specifying additional data to send to the server, or None if no such data is needed. Currently HTTP requests are the only ones that use data. The supported object types include bytes, file-like objects, and iterables of bytes-like objects. If neither a Content-Length nor a Transfer-Encoding header field has been provided, HTTPHandler will set these headers according to the type of data. Content-Length will be used to send bytes objects, while Transfer-Encoding: chunked as specified in RFC 7230, Section 3.3.1 will be used to send files and other iterables.
For an HTTP POST request method, data should be a buffer in the standard application/x-www-form-urlencoded format. The urllib.parse.urlencode() function takes a mapping or sequence of 2-tuples and returns an ASCII string in this format. It should be encoded to bytes before being used as the data parameter.
headers should be a dictionary, and will be treated as if add_header() was called with each key and value as arguments. This is often used to “spoof” the User-Agent header value, which is used by a browser to identify itself: some HTTP servers only allow requests coming from common browsers as opposed to scripts. For example, Mozilla Firefox may identify itself as "Mozilla/5.0 (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11", while urllib’s default user agent string is "Python-urllib/2.6" (on Python 2.6). All header keys are sent in camel case.
An appropriate Content-Type header should be included if the data argument is present. If this header has not been provided and data is not None, Content-Type: application/x-www-form-urlencoded will be added as a default.
The next two arguments are only of interest for correct handling of third-party HTTP cookies:
origin_req_host should be the request-host of the origin transaction, as defined by RFC 2965. It defaults to http.cookiejar.request_host(self). This is the host name or IP address of the original request that was initiated by the user. For example, if the request is for an image in an HTML document, this should be the request-host of the request for the page containing the image.
unverifiable should indicate whether the request is unverifiable, as defined by RFC 2965. It defaults to False. An unverifiable request is one whose URL the user did not have the option to approve. For example, if the request is for an image in an HTML document, and the user had no option to approve the automatic fetching of the image, this should be true.
method should be a string that indicates the HTTP request method that will be used (e.g. 'HEAD'). If provided, its value is stored in the method attribute and is used by get_method(). The default is 'GET' if data is None or 'POST' otherwise. Subclasses may indicate a different default method by setting the method attribute in the class itself.
Note
The request will not work as expected if the data object is unable to deliver its content more than once (e.g. a file or an iterable that can produce the content only once) and the request is retried for HTTP redirects or authentication. The data is sent to the HTTP server right away after the headers. There is no support for a 100-continue expectation in the library.
Changed in version 3.3: The Request.method argument was added to the Request class.
Changed in version 3.4: Default Request.method may be indicated at the class level.
Changed in version 3.6: Do not raise an error if the Content-Length has not been provided and data is neither None nor a bytes object. Fall back to use chunked transfer encoding instead.
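A minimal sketch of building a POST request with form data and a custom User-Agent. The URL, form fields and agent string are illustrative assumptions, and nothing is sent over the network. Note that Request stores header names using str.capitalize(), so the lookup key is "User-agent".

```python
import urllib.parse
import urllib.request

# urlencode() returns a str; encode it to bytes before passing it as data.
form = urllib.parse.urlencode({"q": "python", "page": 2}).encode("ascii")

req = urllib.request.Request(
    "http://www.example.com/search",
    data=form,
    headers={"User-Agent": "demo-client/0.1"},
)

print(req.get_method())              # POST, because data is not None
print(req.data)
print(req.get_header("User-agent"))  # header names are stored capitalized
```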
-
class urllib.request.OpenerDirector
The OpenerDirector class opens URLs via BaseHandlers chained together. It manages the chaining of handlers, and recovery from errors.
-
class urllib.request.BaseHandler
This is the base class for all registered handlers, and handles only the simple mechanics of registration.
-
class urllib.request.HTTPDefaultErrorHandler
A class which defines a default handler for HTTP error responses; all responses are turned into HTTPError exceptions.
-
class urllib.request.HTTPRedirectHandler
A class to handle redirections.
-
class urllib.request.HTTPCookieProcessor(cookiejar=None)
A class to handle HTTP Cookies.
-
class urllib.request.ProxyHandler(proxies=None)
Cause requests to go through a proxy. If proxies is given, it must be a dictionary mapping protocol names to URLs of proxies. The default is to read the list of proxies from the environment variables <protocol>_proxy. If no proxy environment variables are set, then in a Windows environment proxy settings are obtained from the registry’s Internet Settings section, and in a macOS environment proxy information is retrieved from the System Configuration Framework.
To disable an autodetected proxy, pass an empty dictionary.
The no_proxy environment variable can be used to specify hosts which shouldn’t be reached via proxy; if set, it should be a comma-separated list of hostname suffixes, optionally with :port appended, for example cern.ch,ncsa.uiuc.edu,some.host:8080.
Note
HTTP_PROXY will be ignored if a variable REQUEST_METHOD is set; see the documentation on getproxies().
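A short sketch of the two common configurations. The proxy address is an assumed placeholder, and the has_proxy helper is hypothetical; it relies on opener.handlers, an implementation attribute, only to show whether a ProxyHandler ended up registered.

```python
import urllib.request

# Explicit mapping from scheme to proxy URL (the host name is an assumption).
proxied = urllib.request.build_opener(
    urllib.request.ProxyHandler({"http": "http://proxy.example.com:3128"})
)

# An empty dictionary disables autodetected proxies for this opener.
direct = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def has_proxy(opener):
    """Hypothetical helper: report whether a ProxyHandler is registered."""
    return any(isinstance(h, urllib.request.ProxyHandler) for h in opener.handlers)


print(has_proxy(proxied), has_proxy(direct))
```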
-
class urllib.request.HTTPPasswordMgr
Keep a database of (realm, uri) -> (user, password) mappings.
-
class urllib.request.HTTPPasswordMgrWithDefaultRealm
Keep a database of (realm, uri) -> (user, password) mappings. A realm of None is considered a catch-all realm, which is searched if no other realm fits.
-
class urllib.request.HTTPPasswordMgrWithPriorAuth
A variant of HTTPPasswordMgrWithDefaultRealm that also has a database of uri -> is_authenticated mappings. Can be used by a BasicAuth handler to determine when to send authentication credentials immediately instead of waiting for a 401 response first.
New in version 3.5.
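The lookup behaviour of the password managers can be tried without any network traffic. The URLs and credentials below are made-up placeholders.

```python
import urllib.request

mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
# realm=None registers catch-all credentials for this URI prefix.
mgr.add_password(None, "http://www.example.com/private/", "alice", "s3cret")

found = mgr.find_user_password("Any Realm", "http://www.example.com/private/page.html")
missing = mgr.find_user_password("Any Realm", "http://other.example.com/")

# The prior-auth variant additionally records whether to send credentials up front.
prior = urllib.request.HTTPPasswordMgrWithPriorAuth()
prior.add_password(None, "http://www.example.com/api/", "alice", "s3cret",
                   is_authenticated=True)
send_first = prior.is_authenticated("http://www.example.com/api/items")

# Either manager can be handed to a Basic auth handler.
opener = urllib.request.build_opener(urllib.request.HTTPBasicAuthHandler(prior))
print(found, missing, send_first)
```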
-
class urllib.request.AbstractBasicAuthHandler(password_mgr=None)
This is a mixin class that helps with HTTP authentication, both to the remote host and to a proxy. password_mgr, if given, should be something that is compatible with HTTPPasswordMgr; refer to section HTTPPasswordMgr Objects for information on the interface that must be supported. If password_mgr also provides is_authenticated and update_authenticated methods (see HTTPPasswordMgrWithPriorAuth Objects), then the handler will use the is_authenticated result for a given URI to determine whether or not to send authentication credentials with the request. If is_authenticated returns True for the URI, credentials are sent. If is_authenticated is False, credentials are not sent, and then if a 401 response is received the request is re-sent with the authentication credentials. If authentication succeeds, update_authenticated is called to set is_authenticated True for the URI, so that subsequent requests to the URI or any of its super-URIs will automatically include the authentication credentials.
New in version 3.5: Added is_authenticated support.
-
class urllib.request.HTTPBasicAuthHandler(password_mgr=None)
Handle authentication with the remote host. password_mgr, if given, should be something that is compatible with HTTPPasswordMgr; refer to section HTTPPasswordMgr Objects for information on the interface that must be supported. HTTPBasicAuthHandler will raise a ValueError when presented with a wrong Authentication scheme.
-
class urllib.request.ProxyBasicAuthHandler(password_mgr=None)
Handle authentication with the proxy. password_mgr, if given, should be something that is compatible with HTTPPasswordMgr; refer to section HTTPPasswordMgr Objects for information on the interface that must be supported.
-
class urllib.request.AbstractDigestAuthHandler(password_mgr=None)
This is a mixin class that helps with HTTP authentication, both to the remote host and to a proxy. password_mgr, if given, should be something that is compatible with HTTPPasswordMgr; refer to section HTTPPasswordMgr Objects for information on the interface that must be supported.
-
class urllib.request.HTTPDigestAuthHandler(password_mgr=None)
Handle authentication with the remote host. password_mgr, if given, should be something that is compatible with HTTPPasswordMgr; refer to section HTTPPasswordMgr Objects for information on the interface that must be supported. When both a Digest Authentication Handler and a Basic Authentication Handler are added, Digest Authentication is always tried first. If the Digest Authentication returns a 40x response again, it is sent to the Basic Authentication Handler to handle. This handler method will raise a ValueError when presented with an authentication scheme other than Digest or Basic.
Changed in version 3.3: Raise ValueError on unsupported Authentication Scheme.
-
class urllib.request.ProxyDigestAuthHandler(password_mgr=None)
Handle authentication with the proxy. password_mgr, if given, should be something that is compatible with HTTPPasswordMgr; refer to section HTTPPasswordMgr Objects for information on the interface that must be supported.
-
class urllib.request.HTTPHandler
A class to handle opening of HTTP URLs.
-
class urllib.request.HTTPSHandler(debuglevel=0, context=None, check_hostname=None)
A class to handle opening of HTTPS URLs. context and check_hostname have the same meaning as in http.client.HTTPSConnection.
Changed in version 3.2: context and check_hostname were added.
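A sketch of passing an explicit SSL context to HTTPSHandler, which is the usual replacement for the deprecated cafile and capath arguments of urlopen(). It assumes the interpreter was built with SSL and ALPN support; no connection is made.

```python
import ssl
import urllib.request

context = ssl.create_default_context()    # uses the system's trusted CA certificates
context.set_alpn_protocols(["http/1.1"])  # a custom context sets ALPN itself

opener = urllib.request.build_opener(urllib.request.HTTPSHandler(context=context))

# Implementation attribute, inspected only to confirm the handler was registered.
https_handlers = [
    h for h in opener.handlers if isinstance(h, urllib.request.HTTPSHandler)
]
print(len(https_handlers), context.check_hostname)
```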
-
class urllib.request.FileHandler
Open local files.
-
class urllib.request.DataHandler
Open data URLs.
New in version 3.4.
-
class urllib.request.FTPHandler
Open FTP URLs.
-
class urllib.request.CacheFTPHandler
Open FTP URLs, keeping a cache of open FTP connections to minimize delays.
-
class urllib.request.UnknownHandler
A catch-all class to handle unknown URLs.
-
class urllib.request.HTTPErrorProcessor
Process HTTP error responses.
Request Objects
The following methods describe Request’s public interface, and so all may be overridden in subclasses. It also defines several public attributes that can be used by clients to inspect the parsed request.
-
Request.full_url
The original URL passed to the constructor.
Changed in version 3.4.
Request.full_url is a property with setter, getter and a deleter. Getting full_url returns the original request URL with the fragment, if it was present.
-
Request.type
The URI scheme.
-
Request.host
The URI authority, typically a host, but may also contain a port separated by a colon.
-
Request.origin_req_host
The original host for the request, without port.
-
Request.selector
The URI path. If the Request uses a proxy, then selector will be the full URL that is passed to the proxy.
-
Request.data
The entity body for the request, or None if not specified.
Changed in version 3.4: Changing value of Request.data now deletes “Content-Length” header if it was previously set or calculated.
-
Request.unverifiable
boolean, indicates whether the request is unverifiable as defined by RFC 2965.
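The attributes above can be inspected on a freshly constructed request. The URL is an arbitrary example; nothing is fetched.

```python
import urllib.request

req = urllib.request.Request(
    "http://www.example.com:8080/docs/index.html?lang=en#intro"
)

print(req.full_url)         # the original URL, including the fragment
print(req.type)             # http
print(req.host)             # www.example.com:8080
print(req.origin_req_host)  # www.example.com
print(req.selector)         # /docs/index.html?lang=en
print(req.data)             # None
print(req.unverifiable)     # False
```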
-
Request.method
The HTTP request method to use. By default its value is None, which means that get_method() will do its normal computation of the method to be used. Its value can be set (thus overriding the default computation in get_method()) either by providing a default value by setting it at the class level in a Request subclass, or by passing a value in to the Request constructor via the method argument.
New in version 3.3.
Changed in version 3.4: A default value can now be set in subclasses; previously it could only be set via the constructor argument.
-
Request.get_method()
Return a string indicating the HTTP request method. If Request.method is not None, return its value, otherwise return 'GET' if Request.data is None, or 'POST' if it’s not. This is only meaningful for HTTP requests.
Changed in version 3.3: get_method now looks at the value of Request.method.
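The rules for choosing the method can be sketched as follows; HeadRequest is a hypothetical subclass used to show the class-level default.

```python
import urllib.request


class HeadRequest(urllib.request.Request):
    # Class-level default, honoured by get_method() since Python 3.4.
    method = "HEAD"


url = "http://www.example.com/"
methods = [
    urllib.request.Request(url).get_method(),               # no data
    urllib.request.Request(url, data=b"x=1").get_method(),  # data given
    urllib.request.Request(url, data=b"x=1", method="PUT").get_method(),
    HeadRequest(url).get_method(),
]
print(methods)
```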
-
Request.add_header(key, val)
Add another header to the request. Headers are currently ignored by all handlers except HTTP handlers, where they are added to the list of headers sent to the server. Note that there cannot be more than one header with the same name, and later calls will overwrite previous calls in case the key collides. Currently, this is no loss of HTTP functionality, since all headers which have meaning when used more than once have a (header-specific) way of gaining the same functionality using only one header.
-
Request.add_unredirected_header(key, header)
Add a header that will not be added to a redirected request.
-
Request.has_header(header)
Return whether the instance has the named header (checks both regular and unredirected).
-
Request.remove_header(header)
Remove named header from the request instance (both from regular and unredirected headers).
New in version 3.4.
-
Request.get_full_url()
Return the URL given in the constructor.
Changed in version 3.4.
Returns Request.full_url
-
Request.set_proxy(host, type)
Prepare the request by connecting to a proxy server. The host and type will replace those of the instance, and the instance’s selector will be the original URL given in the constructor.
-
Request.get_header(header_name, default=None)
Return the value of the given header. If the header is not present, return the default value.
-
Request.header_items()
Return a list of tuples (header_name, header_value) of the Request headers.
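A brief sketch exercising the header methods together. Header names and values are placeholders; note that names are stored using str.capitalize(), so lookups use that spelling.

```python
import urllib.request

req = urllib.request.Request("http://www.example.com/")
req.add_header("X-Token", "abc")
req.add_header("X-Token", "xyz")  # same name: overwrites the earlier value
req.add_unredirected_header("Authorization", "Basic ZGVtbzpkZW1v")

token = req.get_header("X-token")              # stored as "X-token"
fallback = req.get_header("X-missing", "n/a")  # default for an absent header
has_auth = req.has_header("Authorization")     # checks unredirected headers too
items = dict(req.header_items())

req.remove_header("X-token")
still_there = req.has_header("X-token")

print(token, fallback, has_auth, items, still_there)
```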
Changed in version 3.4: The request methods add_data, has_data, get_data, get_type, get_host, get_selector, get_origin_req_host and is_unverifiable that were deprecated since 3.3 have been removed.
OpenerDirector Objects
OpenerDirector instances have the following methods:
-
OpenerDirector.add_handler(handler)
handler should be an instance of BaseHandler. The following methods are searched, and added to the possible chains (note that HTTP errors are a special case). Note that, in the following, protocol should be replaced with the actual protocol to handle, for example http_response() would be the HTTP protocol response handler. Also type should be replaced with the actual HTTP code, for example http_error_404() would handle HTTP 404 errors.
<protocol>_open(): signal that the handler knows how to open protocol URLs. See BaseHandler.<protocol>_open() for more information.
http_error_<type>(): signal that the handler knows how to handle HTTP errors with HTTP error code type. See BaseHandler.http_error_<nnn>() for more information.
<protocol>_error(): signal that the handler knows how to handle errors from (non-http) protocol.
<protocol>_request(): signal that the handler knows how to pre-process protocol requests. See BaseHandler.<protocol>_request() for more information.
<protocol>_response(): signal that the handler knows how to post-process protocol responses. See BaseHandler.<protocol>_response() for more information.
-
OpenerDirector.open(url, data=None[, timeout])
Open the given url (which can be a request object or a string), optionally passing the given data. Arguments, return values and exceptions raised are the same as those of urlopen() (which simply calls the open() method on the currently installed global OpenerDirector). The optional timeout parameter specifies a timeout in seconds for blocking operations like the connection attempt (if not specified, the global default timeout setting will be used). The timeout feature works only for HTTP, HTTPS and FTP connections.
-
OpenerDirector.error(proto, *args)
Handle an error of the given protocol. This will call the registered error handlers for the given protocol with the given arguments (which are protocol specific). The HTTP protocol is a special case which uses the HTTP response code to determine the specific error handler; refer to the http_error_<type>() methods of the handler classes.
Return values and exceptions raised are the same as those of urlopen().
OpenerDirector objects open URLs in three stages:
The order in which these methods are called within each stage is determined by sorting the handler instances.
1. Every handler with a method named like <protocol>_request() has that method called to pre-process the request.
2. Handlers with a method named like <protocol>_open() are called to handle the request. This stage ends when a handler either returns a non-None value (i.e. a response), or raises an exception (usually URLError). Exceptions are allowed to propagate.
In fact, the above algorithm is first tried for methods named default_open(). If all such methods return None, the algorithm is repeated for methods named like <protocol>_open(). If all such methods return None, the algorithm is repeated for methods named unknown_open().
Note that the implementation of these methods may involve calls of the parent OpenerDirector instance’s open() and error() methods.
3. Every handler with a method named like <protocol>_response() has that method called to post-process the response.
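The three stages can be observed with a toy protocol. Everything named demo here (the scheme, DemoProcessor, DemoHandler, the X-stage header and the post_processed attribute) is a hypothetical construction for illustration; the response is built in memory, so no network access occurs.

```python
import email
import io
import urllib.request
import urllib.response


class DemoProcessor(urllib.request.BaseHandler):
    def demo_request(self, req):             # stage 1: pre-process the request
        req.add_header("X-stage", "pre-processed")
        return req

    def demo_response(self, req, response):  # stage 3: post-process the response
        response.post_processed = True
        return response


class DemoHandler(urllib.request.BaseHandler):
    def demo_open(self, req):                # stage 2: produce a response
        body = f"{req.selector} {req.get_header('X-stage')}".encode("ascii")
        headers = email.message_from_string("Content-Type: text/plain\n")
        return urllib.response.addinfourl(io.BytesIO(body), headers, req.full_url)


opener = urllib.request.OpenerDirector()
opener.add_handler(DemoProcessor())
opener.add_handler(DemoHandler())

with opener.open("demo://example/resource") as response:
    body = response.read()
    flagged = response.post_processed

print(body, flagged)
```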
BaseHandler Objects
BaseHandler objects provide a couple of methods that are directly useful, and others that are meant to be used by derived classes. These are intended for direct use:
-
BaseHandler.add_parent(director)
Add a director as parent.
-
BaseHandler.close()
Remove any parents.
The following attribute and methods should only be used by classes derived from BaseHandler.
Note
The convention has been adopted that subclasses defining <protocol>_request() or <protocol>_response() methods are named *Processor; all others are named *Handler.
-
BaseHandler.parent
A valid OpenerDirector, which can be used to open using a different protocol, or handle errors.
-
BaseHandler.default_open(req)
This method is not defined in BaseHandler, but subclasses should define it if they want to catch all URLs.
This method, if implemented, will be called by the parent OpenerDirector. It should return a file-like object as described in the return value of the open() of OpenerDirector, or None. It should raise URLError, unless a truly exceptional thing happens (for example, MemoryError should not be mapped to URLError).
This method will be called before any protocol-specific open method.
-
BaseHandler.<protocol>_open(req)
This method is not defined in BaseHandler, but subclasses should define it if they want to handle URLs with the given protocol.
This method, if defined, will be called by the parent OpenerDirector. Return values should be the same as for default_open().
-
BaseHandler.unknown_open(req)
This method is not defined in BaseHandler, but subclasses should define it if they want to catch all URLs with no specific registered handler to open it.
This method, if implemented, will be called by the parent OpenerDirector. Return values should be the same as for default_open().
-
BaseHandler.http_error_default(req, fp, code, msg, hdrs)
This method is not defined in BaseHandler, but subclasses should override it if they intend to provide a catch-all for otherwise unhandled HTTP errors. It will be called automatically by the OpenerDirector getting the error, and should not normally be called in other circumstances.
req will be a Request object, fp will be a file-like object with the HTTP error body, code will be the three-digit code of the error, msg will be the user-visible explanation of the code and hdrs will be a mapping object with the headers of the error.
Return values and exceptions raised should be the same as those of urlopen().
-
BaseHandler.http_error_<nnn>(req, fp, code, msg, hdrs)
nnn should be a three-digit HTTP error code. This method is also not defined in BaseHandler, but will be called, if it exists, on an instance of a subclass, when an HTTP error with code nnn occurs.
Subclasses should override this method to handle specific HTTP errors.
Arguments, return values and exceptions raised should be the same as for http_error_default().
-
BaseHandler.<protocol>_request(req)
This method is not defined in BaseHandler, but subclasses should define it if they want to pre-process requests of the given protocol.
This method, if defined, will be called by the parent OpenerDirector. req will be a Request object. The return value should be a Request object.
-
BaseHandler.<protocol>_response(req, response)
This method is not defined in BaseHandler, but subclasses should define it if they want to post-process responses of the given protocol.
This method, if defined, will be called by the parent OpenerDirector. req will be a Request object. response will be an object implementing the same interface as the return value of urlopen(). The return value should implement the same interface as the return value of urlopen().
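As a rough illustration of the hooks described in this section, the sketch below wires two handlers into an opener. CannedHandler, TaggingProcessor, the X-Demo header and the example.invalid URL are hypothetical names invented for this sketch, not part of urllib.request; the canned response is there only so that nothing touches the network.

```python
import email.message
import io
import urllib.request
import urllib.response


class CannedHandler(urllib.request.BaseHandler):
    """Hypothetical handler: answers every URL with a fixed in-memory body."""

    def default_open(self, req):
        headers = email.message.Message()
        headers["Content-Type"] = "text/plain; charset=utf-8"
        response = urllib.response.addinfourl(
            io.BytesIO(b"canned body"), headers, req.full_url, code=200)
        # The default HTTPErrorProcessor reads response.msg, so set one.
        response.msg = "OK"
        return response


class TaggingProcessor(urllib.request.BaseHandler):
    """Hypothetical *Processor: pre-processes every http request."""

    def http_request(self, req):
        req.add_header("X-Demo", "1")  # illustrative header name
        return req  # a <protocol>_request() method returns a Request


opener = urllib.request.build_opener(CannedHandler(), TaggingProcessor())
request = urllib.request.Request("http://example.invalid/")
with opener.open(request) as response:
    body = response.read()

print(body)
# Request normalizes header names with str.capitalize().
print(request.get_header("X-demo"))
```

Because default_open() is tried before any protocol-specific open method, the canned handler answers before the real HTTP handler gets a chance to connect.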
HTTPRedirectHandler Objects
Note
Some HTTP redirections require action from this module's client code. If this is the case, HTTPError is raised. See RFC 2616 for details of the precise meanings of the various redirection codes.
An HTTPError exception is raised as a security consideration if the HTTPRedirectHandler is presented with a redirected URL which is not an HTTP, HTTPS or FTP URL.
-
HTTPRedirectHandler.redirect_request(req, fp, code, msg, hdrs, newurl)
Return a Request or None in response to a redirect. This is called by the default implementations of the http_error_30*() methods when a redirection is received from the server. If a redirection should take place, return a new Request to allow http_error_30*() to perform the redirect to newurl. Otherwise, raise HTTPError if no other handler should try to handle this URL, or return None if you can't but another handler might.
Note
The default implementation of this method does not strictly follow RFC 2616, which says that 301 and 302 responses to POST requests must not be automatically redirected without confirmation by the user. In reality, browsers do allow automatic redirection of these responses, changing the POST to a GET, and the default implementation reproduces this behavior.
-
HTTPRedirectHandler.http_error_301(req, fp, code, msg, hdrs)
Redirect to the Location: or URI: URL. This method is called by the parent OpenerDirector when getting an HTTP 'moved permanently' response.
-
HTTPRedirectHandler.http_error_302(req, fp, code, msg, hdrs)
The same as http_error_301(), but called for the 'found' response.
-
HTTPRedirectHandler.http_error_303(req, fp, code, msg, hdrs)
The same as http_error_301(), but called for the 'see other' response.
-
HTTPRedirectHandler.http_error_307(req, fp, code, msg, hdrs)
The same as http_error_301(), but called for the 'temporary redirect' response.
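One way to customize redirect handling is to override redirect_request(). The following is a minimal sketch, assuming a policy of only following redirects that stay on the same host; SameHostRedirectHandler and the example URLs are hypothetical. The method is called directly here purely to show its return values, whereas in normal use the http_error_30*() methods call it.

```python
import urllib.parse
import urllib.request


class SameHostRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Hypothetical policy: follow a redirect only if the host is unchanged."""

    def redirect_request(self, req, fp, code, msg, hdrs, newurl):
        old_host = urllib.parse.urlsplit(req.full_url).hostname
        new_host = urllib.parse.urlsplit(newurl).hostname
        if old_host != new_host:
            # Decline; another handler might still deal with the response.
            return None
        # Otherwise defer to the default implementation.
        return super().redirect_request(req, fp, code, msg, hdrs, newurl)


handler = SameHostRedirectHandler()
original = urllib.request.Request("http://www.example.com/old")

followed = handler.redirect_request(
    original, None, 302, "Found", {}, "http://www.example.com/new")
declined = handler.redirect_request(
    original, None, 302, "Found", {}, "http://other.example.org/new")

# Passing the instance to build_opener() replaces the default redirect handler.
opener = urllib.request.build_opener(handler)
```

When redirect_request() returns None and no other handler takes over, the redirect response is left to the default error handling, which typically surfaces it as an HTTPError.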
ProxyHandler Objects
-
ProxyHandler.<protocol>_open(request)
The ProxyHandler will have a method <protocol>_open() for every protocol which has a proxy in the proxies dictionary given in the constructor. The method will modify requests to go through the proxy, by calling request.set_proxy(), and call the next handler in the chain to actually execute the protocol.
HTTPPasswordMgr Objects
These methods are available on HTTPPasswordMgr and HTTPPasswordMgrWithDefaultRealm objects.
-
HTTPPasswordMgr.add_password(realm, uri, user, passwd)
uri can be either a single URI, or a sequence of URIs. realm, user and passwd must be strings. This causes (user, passwd) to be used as authentication tokens when authentication for realm and a super-URI of any of the given URIs is given.
-
HTTPPasswordMgr.find_user_password(realm, authuri)
Get user/password for given realm and URI, if any. This method will return (None, None) if there is no matching user/password.
For HTTPPasswordMgrWithDefaultRealm objects, the realm None will be searched if the given realm has no matching user/password.
HTTPPasswordMgrWithPriorAuth Objects
This password manager extends HTTPPasswordMgrWithDefaultRealm to support tracking URIs for which authentication credentials should always be sent.
-
HTTPPasswordMgrWithPriorAuth.add_password(realm, uri, user, passwd, is_authenticated=False)
realm, uri, user, passwd are as for HTTPPasswordMgr.add_password(). is_authenticated sets the initial value of the is_authenticated flag for the given URI or list of URIs. If is_authenticated is specified as True, realm is ignored.
-
HTTPPasswordMgrWithPriorAuth.find_user_password(realm, authuri)
Same as for HTTPPasswordMgrWithDefaultRealm objects.
-
HTTPPasswordMgrWithPriorAuth.update_authenticated(self, uri, is_authenticated=False)
Update the is_authenticated flag for the given uri or list of URIs.
-
HTTPPasswordMgrWithPriorAuth.is_authenticated(self, authuri)
Returns the current state of the is_authenticated flag for the given URI.
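A possible use of this manager is to send Basic credentials on the first request instead of waiting for a 401 response. The sketch below assumes a made-up host and credentials, and calls the handler's http_request() hook directly only to show its effect; opener.open() would normally invoke it.

```python
import urllib.request

manager = urllib.request.HTTPPasswordMgrWithPriorAuth()
manager.add_password(None, "http://api.example.com/", "alice", "s3cret",
                     is_authenticated=True)

handler = urllib.request.HTTPBasicAuthHandler(manager)
opener = urllib.request.build_opener(handler)

# The flag also applies to URIs below the registered one.
flag_before = manager.is_authenticated("http://api.example.com/v1/items")

# With the flag set, the handler adds an Authorization header up front.
request = urllib.request.Request("http://api.example.com/v1/items")
handler.http_request(request)
auth_header = request.get_header("Authorization")

# Clearing the flag returns to the usual challenge/response behavior.
manager.update_authenticated("http://api.example.com/", False)
flag_after = manager.is_authenticated("http://api.example.com/v1/items")
```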
AbstractBasicAuthHandler Objects
-
AbstractBasicAuthHandler.http_error_auth_reqed(authreq, host, req, headers)
Handle an authentication request by getting a user/password pair, and re-trying the request. authreq should be the name of the header where the information about the realm is included in the request, host specifies the URL and path to authenticate for, req should be the (failed) Request object, and headers should be the error headers.
host is either an authority (e.g. "python.org") or a URL containing an authority component (e.g. "http://python.org/"). In either case, the authority must not contain a userinfo component (so, "python.org" and "python.org:80" are fine, "joe:password@python.org" is not).
HTTPBasicAuthHandler Objects
-
HTTPBasicAuthHandler.http_error_401(req, fp, code, msg, hdrs)
Retry the request with authentication information, if available.
ProxyBasicAuthHandler Objects
-
ProxyBasicAuthHandler.http_error_407(req, fp, code, msg, hdrs)
Retry the request with authentication information, if available.
AbstractDigestAuthHandler Objects
-
AbstractDigestAuthHandler.http_error_auth_reqed(authreq, host, req, headers)
authreq should be the name of the header where the information about the realm is included in the request, host should be the host to authenticate to, req should be the (failed) Request object, and headers should be the error headers.
HTTPDigestAuthHandler Objects
-
HTTPDigestAuthHandler.http_error_401(req, fp, code, msg, hdrs)
Retry the request with authentication information, if available.
ProxyDigestAuthHandler Objects
-
ProxyDigestAuthHandler.http_error_407(req, fp, code, msg, hdrs)
Retry the request with authentication information, if available.
HTTPHandler Objects
-
HTTPHandler.http_open(req)
Send an HTTP request, which can be either GET or POST, depending on req.has_data().
HTTPSHandler Objects
-
HTTPSHandler.https_open(req)
Send an HTTPS request, which can be either GET or POST, depending on req.has_data().
FileHandler Objects
-
FileHandler.file_open(req)
Open the file locally, if there is no host name, or the host name is 'localhost'.
DataHandler Objects
-
DataHandler.data_open(req)
Read a data URL. This kind of URL contains the content encoded in the URL itself. The data URL syntax is specified in RFC 2397. This implementation ignores white spaces in base64 encoded data URLs so the URL may be wrapped in whatever source file it comes from. Although some browsers tolerate missing padding at the end of a base64 encoded data URL, this implementation will raise a ValueError in that case.
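Since a data URL carries its own content, opening one needs no network access. A short sketch, in which the payload text is arbitrary:

```python
import base64
import urllib.request

# b64encode() always emits the padding that DataHandler requires.
payload = base64.b64encode(b"Hello, data URL!").decode("ascii")
url = "data:text/plain;charset=utf-8;base64," + payload

with urllib.request.urlopen(url) as response:
    content_type = response.headers["Content-Type"]
    body = response.read()

print(content_type)
print(body)
```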
FTPHandler Objects
-
FTPHandler.ftp_open(req)
Open the FTP file indicated by req. The login is always done with empty username and password.
CacheFTPHandler Objects
CacheFTPHandler objects are FTPHandler objects with the following additional methods:
-
CacheFTPHandler.setTimeout(t)
Set timeout of connections to t seconds.
-
CacheFTPHandler.setMaxConns(m)
Set maximum number of cached connections to m.
UnknownHandler Objects
HTTPErrorProcessor Objects
-
HTTPErrorProcessor.http_response(request, response)
Process HTTP error responses.
For 2xx status codes, which indicate success, the response object is returned immediately.
For all other status codes, this simply passes the job on to the http_error_<type>() handler methods, via OpenerDirector.error(). Eventually, HTTPDefaultErrorHandler will raise an HTTPError if no other handler handles the error.
-
HTTPErrorProcessor.https_response(request, response)
Process HTTPS error responses.
The behavior is the same as http_response().
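The effect of HTTPErrorProcessor can be observed offline by faking the response. In this sketch, FixedStatusHandler, fetch_status() and the example.invalid URL are hypothetical helpers made up for the demonstration; only the status handling itself comes from the standard handlers.

```python
import email.message
import io
import urllib.error
import urllib.request
import urllib.response


class FixedStatusHandler(urllib.request.BaseHandler):
    """Hypothetical handler that fakes a response with a chosen status."""

    def __init__(self, status, reason):
        self.status = status
        self.reason = reason

    def default_open(self, req):
        response = urllib.response.addinfourl(
            io.BytesIO(b""), email.message.Message(), req.full_url,
            code=self.status)
        response.msg = self.reason  # HTTPErrorProcessor reads .code and .msg
        return response


def fetch_status(status, reason):
    """Open a fake URL and report whether the result was returned or raised."""
    opener = urllib.request.build_opener(FixedStatusHandler(status, reason))
    try:
        with opener.open("http://example.invalid/") as response:
            return "returned", response.status
    except urllib.error.HTTPError as err:
        return "raised", err.code


ok = fetch_status(200, "OK")
not_found = fetch_status(404, "Not Found")
print(ok, not_found)
```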
Examples
In addition to the examples below, more examples are given in HOWTO Fetch Internet Resources Using The urllib Package.
This example gets the python.org main page and displays the first 300 bytes of it.
>>> import urllib.request
>>> with urllib.request.urlopen('http://www.python.org/') as f:
...     print(f.read(300))
...
b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n\n\n<html
xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">\n\n<head>\n
<meta http-equiv="content-type" content="text/html; charset=utf-8" />\n
<title>Python Programming '
Note that urlopen returns a bytes object. This is because there is no way for urlopen to automatically determine the encoding of the byte stream it receives from the HTTP server. In general, a program will decode the returned bytes object to string once it determines or guesses the appropriate encoding.
The following W3C document, https://www.w3.org/International/O-charset, lists the various ways in which an (X)HTML or an XML document could have specified its encoding information.
As the python.org website uses utf-8 encoding as specified in its meta tag, we will use the same for decoding the bytes object.
>>> with urllib.request.urlopen('http://www.python.org/') as f:
...     print(f.read(100).decode('utf-8'))
...
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtm
It is also possible to achieve the same result without using the context manager approach.
>>> import urllib.request
>>> f = urllib.request.urlopen('http://www.python.org/')
>>> print(f.read(100).decode('utf-8'))
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtm
In the following example, we are sending a data-stream to the stdin of a CGI and reading the data it returns to us. Note that this example will only work when the Python installation supports SSL.
>>> import urllib.request
>>> req = urllib.request.Request(url='https://localhost/cgi-bin/test.cgi',
...                       data=b'This data is passed to stdin of the CGI')
>>> with urllib.request.urlopen(req) as f:
...     print(f.read().decode('utf-8'))
...
Got Data: "This data is passed to stdin of the CGI"
The code for the sample CGI used in the above example is:
#!/usr/bin/env python
import sys
data = sys.stdin.read()
print('Content-type: text/plain\n\nGot Data: "%s"' % data)
Here is an example of doing a PUT request using Request:
import urllib.request
DATA = b'some data'
req = urllib.request.Request(url='http://localhost:8080', data=DATA, method='PUT')
with urllib.request.urlopen(req) as f:
    pass
print(f.status)
print(f.reason)
Use of Basic HTTP Authentication:
import urllib.request
# Create an OpenerDirector with support for Basic HTTP Authentication...
auth_handler = urllib.request.HTTPBasicAuthHandler()
auth_handler.add_password(realm='PDQ Application',
                          uri='https://mahler:8092/site-updates.py',
                          user='klem',
                          passwd='kadidd!ehopper')
opener = urllib.request.build_opener(auth_handler)
# ...and install it globally so it can be used with urlopen.
urllib.request.install_opener(opener)
urllib.request.urlopen('http://www.example.com/login.html')
build_opener() provides many handlers by default, including a ProxyHandler. By default, ProxyHandler uses the environment variables named <scheme>_proxy, where <scheme> is the URL scheme involved. For example, the http_proxy environment variable is read to obtain the HTTP proxy's URL.
This example replaces the default ProxyHandler with one that uses programmatically-supplied proxy URLs, and adds proxy authorization support with ProxyBasicAuthHandler.
proxy_handler = urllib.request.ProxyHandler({'http': 'http://www.example.com:3128/'})
proxy_auth_handler = urllib.request.ProxyBasicAuthHandler()
proxy_auth_handler.add_password('realm', 'host', 'username', 'password')
opener = urllib.request.build_opener(proxy_handler, proxy_auth_handler)
# This time, rather than install the OpenerDirector, we use it directly:
opener.open('http://www.example.com/login.html')
Adding HTTP headers:
Use the headers argument to the Request constructor, or:
import urllib.request
req = urllib.request.Request('http://www.example.com/')
req.add_header('Referer', 'http://www.python.org/')
# Customize the default User-Agent header value:
req.add_header('User-Agent', 'urllib-example/0.1 (Contact: . . .)')
r = urllib.request.urlopen(req)
OpenerDirector automatically adds a User-Agent header to every Request. To change this:
import urllib.request
opener = urllib.request.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
opener.open('http://www.example.com/')
Also, remember that a few standard headers (Content-Length, Content-Type and Host) are added when the Request is passed to urlopen() (or OpenerDirector.open()).
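The headers argument mentioned above can be sketched as follows. The header values are placeholders, and nothing is sent because the request is never opened.

```python
import urllib.request

req = urllib.request.Request(
    "http://www.example.com/",
    headers={
        "Referer": "http://www.python.org/",
        "User-Agent": "urllib-example/0.1",
    },
)
# add_header() can still be used afterwards for further headers.
req.add_header("Accept", "text/html")

# Request stores header names normalized with str.capitalize(),
# so "User-Agent" is kept as "User-agent".
items = dict(req.header_items())
print(items)
```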
Here is an example session that uses the GET method to retrieve a URL containing parameters:
>>> import urllib.request
>>> import urllib.parse
>>> params = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
>>> url = "http://www.musi-cal.com/cgi-bin/query?%s" % params
>>> with urllib.request.urlopen(url) as f:
...     print(f.read().decode('utf-8'))
...
The following example uses the POST method instead. Note that params output from urlencode is encoded to bytes before it is sent to urlopen as data:
>>> import urllib.request
>>> import urllib.parse
>>> data = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
>>> data = data.encode('ascii')
>>> with urllib.request.urlopen("http://requestb.in/xrbl82xr", data) as f:
...     print(f.read().decode('utf-8'))
...
The following example uses an explicitly specified HTTP proxy, overriding environment settings:
>>> import urllib.request
>>> proxies = {'http': 'http://proxy.example.com:8080/'}
>>> opener = urllib.request.FancyURLopener(proxies)
>>> with opener.open("http://www.python.org") as f:
...     f.read().decode('utf-8')
...
The following example uses no proxies at all, overriding environment settings:
>>> import urllib.request
>>> opener = urllib.request.FancyURLopener({})
>>> with opener.open("http://www.python.org/") as f:
...     f.read().decode('utf-8')
...
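Because FancyURLopener is deprecated, the two proxy examples above can also be expressed with ProxyHandler and build_opener(). This is a sketch of that alternative rather than a drop-in replacement; the proxy URL is a placeholder and no request is actually sent.

```python
import urllib.request

# Route http:// requests through an explicitly chosen proxy.
explicit = urllib.request.ProxyHandler(
    {"http": "http://proxy.example.com:8080/"})
opener_with_proxy = urllib.request.build_opener(explicit)

# An empty mapping disables proxies, including those from the environment.
disabled = urllib.request.ProxyHandler({})
opener_without_proxy = urllib.request.build_opener(disabled)

# ProxyHandler creates one <protocol>_open() method per configured scheme.
has_http_open = hasattr(explicit, "http_open")
still_proxied = any(isinstance(h, urllib.request.ProxyHandler)
                    for h in opener_without_proxy.handlers)
print(has_http_open, still_proxied)
```

Either opener can then be used through its open() method, or installed globally with install_opener().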
Legacy interface
The following functions and classes are ported from the Python 2 module urllib (as opposed to urllib2). They might become deprecated at some point in the future.
-
urllib.request.urlretrieve(url, filename=None, reporthook=None, data=None)
Copy a network object denoted by a URL to a local file. If the URL points to a local file, the object will not be copied unless filename is supplied. Return a tuple (filename, headers) where filename is the local file name under which the object can be found, and headers is whatever the info() method of the object returned by urlopen() returned (for a remote object). Exceptions are the same as for urlopen().
The second argument, if present, specifies the file location to copy to (if absent, the location will be a tempfile with a generated name). The third argument, if present, is a callable that will be called once on establishment of the network connection and once after each block read thereafter. The callable will be passed three arguments: a count of blocks transferred so far, a block size in bytes, and the total size of the file. The total size passed to the callable may be -1 on older FTP servers which do not return a file size in response to a retrieval request.
The following example illustrates the most common usage scenario:
>>> import urllib.request
>>> local_filename, headers = urllib.request.urlretrieve('http://python.org/')
>>> html = open(local_filename)
>>> html.close()
If the url uses the http: scheme identifier, the optional data argument may be given to specify a POST request (normally the request type is GET). The data argument must be a bytes object in standard application/x-www-form-urlencoded format; see the urllib.parse.urlencode() function.
urlretrieve() will raise ContentTooShortError when it detects that the amount of data available was less than the expected amount (which is the size reported by a Content-Length header). This can occur, for example, when the download is interrupted.
The Content-Length is treated as a lower bound: if there's more data to read, urlretrieve reads more data, but if less data is available, it raises the exception.
You can still retrieve the downloaded data in this case; it is stored in the content attribute of the exception instance.
If no Content-Length header was supplied, urlretrieve cannot check the size of the data it has downloaded, and just returns it. In this case you just have to assume that the download was successful.
-
urllib.request.urlcleanup()
Cleans up temporary files that may have been left behind by previous calls to urlretrieve().
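The reporthook callable of urlretrieve() can be tried without a network connection by copying from a file: URL. In this sketch the file names, the 20000-byte payload and the report() function are arbitrary choices for the demonstration.

```python
import pathlib
import tempfile
import urllib.request

progress = []


def report(block_count, block_size, total_size):
    """Record every progress callback made by urlretrieve()."""
    progress.append((block_count, block_size, total_size))


with tempfile.TemporaryDirectory() as tmp:
    source = pathlib.Path(tmp) / "source.txt"
    source.write_bytes(b"x" * 20000)
    target = pathlib.Path(tmp) / "copy.txt"

    # A local file is only copied because a filename is supplied.
    filename, headers = urllib.request.urlretrieve(
        source.as_uri(), str(target), reporthook=report)
    copied = target.read_bytes()

# Remove temporary files left behind by urlretrieve() calls, if any.
urllib.request.urlcleanup()

print(len(copied), progress[0], progress[-1])
```

The first callback arrives with a block count of 0, before any data is read; the later ones follow each block.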
-
class urllib.request.URLopener(proxies=None, **x509)
Deprecated since version 3.3.
Base class for opening and reading URLs. Unless you need to support opening objects using schemes other than http:, ftp:, or file:, you probably want to use FancyURLopener.
By default, the URLopener class sends a User-Agent header of urllib/VVV, where VVV is the urllib version number. Applications can define their own User-Agent header by subclassing URLopener or FancyURLopener and setting the class attribute version to an appropriate string value in the subclass definition.
The optional proxies parameter should be a dictionary mapping scheme names to proxy URLs, where an empty dictionary turns proxies off completely. Its default value is None, in which case environmental proxy settings will be used if present, as discussed in the definition of urlopen(), above.
Additional keyword parameters, collected in x509, may be used for authentication of the client when using the https: scheme. The keywords key_file and cert_file are supported to provide an SSL key and certificate; both are needed to support client authentication.
URLopener objects will raise an OSError exception if the server returns an error code.
-
open(fullurl, data=None)
Open fullurl using the appropriate protocol. This method sets up cache and proxy information, then calls the appropriate open method with its input arguments. If the scheme is not recognized, open_unknown() is called. The data argument has the same meaning as the data argument of urlopen().
This method always quotes fullurl using quote().
-
open_unknown(fullurl, data=None)
Overridable interface to open unknown URL types.
-
retrieve(url, filename=None, reporthook=None, data=None)
Retrieves the contents of url and places it in filename. The return value is a tuple consisting of a local filename and either an email.message.Message object containing the response headers (for remote URLs) or None (for local URLs). The caller must then open and read the contents of filename. If filename is not given and the URL refers to a local file, the input filename is returned. If the URL is non-local and filename is not given, the filename is the output of tempfile.mktemp() with a suffix that matches the suffix of the last path component of the input URL. If reporthook is given, it must be a function accepting three numeric parameters: a chunk number, the maximum size chunks are read in and the total size of the download (-1 if unknown). It will be called once at the start and after each chunk of data is read from the network. reporthook is ignored for local URLs.
If the url uses the http: scheme identifier, the optional data argument may be given to specify a POST request (normally the request type is GET). The data argument must be in standard application/x-www-form-urlencoded format; see the urllib.parse.urlencode() function.
-
version
Variable that specifies the user agent of the opener object. To get urllib to tell servers that it is a particular user agent, set this in a subclass as a class variable or in the constructor before calling the base constructor.
-
class urllib.request.FancyURLopener(...)
Deprecated since version 3.3.
FancyURLopener subclasses URLopener providing default handling for the following HTTP response codes: 301, 302, 303, 307 and 401. For the 30x response codes listed above, the Location header is used to fetch the actual URL. For 401 response codes (authentication required), basic HTTP authentication is performed. For the 30x response codes, recursion is bounded by the value of the maxtries attribute, which defaults to 10.
For all other response codes, the method http_error_default() is called which you can override in subclasses to handle the error appropriately.
Note
According to the letter of RFC 2616, 301 and 302 responses to POST requests must not be automatically redirected without confirmation by the user. In reality, browsers do allow automatic redirection of these responses, changing the POST to a GET, and urllib reproduces this behaviour.
The parameters to the constructor are the same as those for URLopener.
Note
When performing basic authentication, a FancyURLopener instance calls its prompt_user_passwd() method. The default implementation asks the users for the required information on the controlling terminal. A subclass may override this method to support more appropriate behavior if needed.
The FancyURLopener class offers one additional method that should be overloaded to provide the appropriate behavior:
-
prompt_user_passwd(host, realm)
Return information needed to authenticate the user at the given host in the specified security realm. The return value should be a tuple, (user, password), which can be used for basic authentication.
The implementation prompts for this information on the terminal; an application should override this method to use an appropriate interaction model in the local environment.
urllib.request Restrictions
-
Currently, only the following protocols are supported: HTTP (versions 0.9 and 1.0), FTP, local files, and data URLs.
Changed in version 3.4: Added support for data URLs.
-
The caching feature of urlretrieve() has been disabled until someone finds the time to hack proper processing of Expiration time headers.
-
There should be a function to query whether a particular URL is in the cache.
-
For backward compatibility, if a URL appears to point to a local file but the file can't be opened, the URL is re-interpreted using the FTP protocol. This can sometimes cause confusing error messages.
-
The urlopen() and urlretrieve() functions can cause arbitrarily long delays while waiting for a network connection to be set up. This means that it is difficult to build an interactive web client using these functions without using threads.
-
The data returned by urlopen() or urlretrieve() is the raw data returned by the server. This may be binary data (such as an image), plain text or (for example) HTML. The HTTP protocol provides type information in the reply header, which can be inspected by looking at the Content-Type header. If the returned data is HTML, you can use the module html.parser to parse it.
-
The code handling the FTP protocol cannot differentiate between a file and a directory. This can lead to unexpected behavior when attempting to read a URL that points to a file that is not accessible. If the URL ends in a /, it is assumed to refer to a directory and will be handled accordingly. But if an attempt to read a file leads to a 550 error (meaning the URL cannot be found or is not accessible, often for permission reasons), then the path is treated as a directory in order to handle the case when a directory is specified by a URL but the trailing / has been left off. This can cause misleading results when you try to fetch a file whose read permissions make it inaccessible; the FTP code will try to read it, fail with a 550 error, and then perform a directory listing for the unreadable file. If fine-grained control is needed, consider using the ftplib module, subclassing FancyURLopener, or changing _urlopener to meet your needs.
urllib.response — Response classes used by urllib
The urllib.response module defines functions and classes which define a minimal file-like interface, including read() and readline(). Functions defined by this module are used internally by the urllib.request module. The typical response object is a urllib.response.addinfourl instance:
-
class urllib.response.addinfourl
-
url
URL of the resource retrieved, commonly used to determine if a redirect was followed.
-
headers
Returns the headers of the response in the form of an EmailMessage instance.
-
status
New in version 3.9.
Status code returned by server.
-
geturl()
-
info()
-
code
-
getstatus()
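To get a feel for the attributes listed above, an addinfourl object can be built by hand. This is only a sketch: responses are normally created by the handlers, and the body, headers and URL used here are invented for the demonstration.

```python
import email.message
import io
import urllib.response

headers = email.message.Message()
headers["Content-Type"] = "text/html; charset=utf-8"

response = urllib.response.addinfourl(
    io.BytesIO("<p>caf\u00e9</p>".encode("utf-8")),
    headers,
    "http://www.example.com/",
    code=200,
)

with response:
    # Use the charset declared in Content-Type, falling back to utf-8.
    charset = response.headers.get_content_charset() or "utf-8"
    text = response.read().decode(charset)
    final_url = response.url
    status = response.status  # available since Python 3.9

print(status, final_url, text)
```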