urllib.request: Extensible library for opening URLs

Source code: Lib/urllib/request.py


The urllib.request module defines functions and classes which help in opening URLs (mostly HTTP) in a complex world: basic and digest authentication, redirections, cookies and more.

See also

The Requests package is recommended for a higher-level HTTP client interface.

The urllib.request module defines the following functions:

urllib.request.urlopen(url, data=None, [timeout, ]*, cafile=None, capath=None, cadefault=False, context=None)

Open the URL url, which can be either a string or a Request object.

data must be an object specifying additional data to be sent to the server, or None if no such data is needed. See Request for details.

The urllib.request module uses HTTP/1.1 and includes a Connection: close header in its HTTP requests.

The optional timeout parameter specifies a timeout in seconds for blocking operations like the connection attempt (if not specified, the global default timeout setting will be used). This only works for HTTP, HTTPS and FTP connections.

If context is specified, it must be an ssl.SSLContext instance describing the various SSL options. See HTTPSConnection for more details.

The optional cafile and capath parameters specify a set of trusted CA certificates for HTTPS requests. cafile should point to a single file containing a bundle of CA certificates, whereas capath should point to a directory of hashed certificate files. More information can be found in ssl.SSLContext.load_verify_locations().

The cadefault parameter is ignored.

This function always returns an object which can work as a context manager and has the properties url, headers, and status. See urllib.response.addinfourl for more detail on these properties.

For HTTP and HTTPS URLs, this function returns a slightly modified http.client.HTTPResponse object. In addition to the three properties above, the msg attribute contains the same information as the reason attribute (the reason phrase returned by the server) instead of the response headers that the HTTPResponse documentation specifies for msg.

For FTP, file, and data URLs, and for requests explicitly handled by the legacy URLopener and FancyURLopener classes, this function returns a urllib.response.addinfourl object.

Raises URLError on protocol errors.

Note that None may be returned if no handler handles the request (though the default installed global OpenerDirector uses UnknownHandler to ensure this never happens).

In addition, if proxy settings are detected (for example, when a *_proxy environment variable like http_proxy is set), ProxyHandler is installed by default and makes sure the requests are handled through the proxy.

The legacy urllib.urlopen function from Python 2.6 and earlier has been discontinued; urllib.request.urlopen() corresponds to the old urllib2.urlopen. Proxy handling, which was done by passing a dictionary parameter to urllib.urlopen, can be obtained by using ProxyHandler objects.

The default opener raises an auditing event urllib.Request with arguments fullurl, data, headers, method taken from the request object.
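A minimal sketch of observing that audit event, assuming Python 3.8 or later (when sys.addaudithook() was added). The hook filters for the urllib.Request event and records its arguments, and a data: URL keeps the example off the network. Audit hooks cannot be removed once installed, so this pattern suits a short script or test rather than library code.

```python
import sys
import urllib.request

seen = []  # argument tuples of every urllib.Request audit event


def record_urllib_requests(event, args):
    # Audit hooks receive every event raised in the process; keep urllib's only.
    if event == "urllib.Request":
        seen.append(args)


sys.addaudithook(record_urllib_requests)  # cannot be uninstalled later

with urllib.request.urlopen("data:,audited") as resp:
    resp.read()

fullurl, data, headers, method = seen[-1]
print(fullurl, data, headers, method)  # data:,audited None {} GET
```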

Changed in version 3.2: cafile and capath were added.

Changed in version 3.2: HTTPS virtual hosts are now supported if possible (that is, if ssl.HAS_SNI is true).

New in version 3.2: data can be an iterable object.

Changed in version 3.3: cadefault was added.

Changed in version 3.4.3: context was added.

Changed in version 3.10: HTTPS connections now send an ALPN extension with protocol indicator http/1.1 when no context is given. A custom context should set ALPN protocols with ssl.SSLContext.set_alpn_protocols().

Deprecated since version 3.6: cafile, capath and cadefault are deprecated in favor of context. Please use ssl.SSLContext.load_verify_locations() instead, or let ssl.create_default_context() select the system’s trusted CA certificates for you.
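The following offline illustration of the returned object opens data: URLs, which the default opener handles through DataHandler. The helper name read_data_url is invented for this example. For non-HTTP URLs the status property may be None, so the sketch relies only on url, headers and the body.

```python
import urllib.request


def read_data_url(url, timeout=5.0):
    """Open *url* with urlopen() and return (body, content type, final URL)."""
    # The response works as a context manager, so it is closed on exit.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read(), resp.headers.get_content_type(), resp.url


body, ctype, final_url = read_data_url("data:text/plain;charset=utf-8,hello%20world")
print(body, ctype)  # b'hello world' text/plain

# Base64-encoded payloads are decoded as well.
print(read_data_url("data:;base64,SGVsbG8=")[0])  # b'Hello'
```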

urllib.request.install_opener(opener)

Install an OpenerDirector instance as the default global opener. Installing an opener is only necessary if you want urlopen to use that opener; otherwise, simply call OpenerDirector.open() instead of urlopen(). The code does not check for a real OpenerDirector, and any class with the appropriate interface will work.
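The sketch below shows a case where installing an opener is worthwhile. A handler for an invented echo: scheme is registered with build_opener(), and install_opener() then makes plain urlopen() calls use it. Both EchoHandler and the scheme are hypothetical and not part of urllib. The installation is process-wide, so real code usually does it once at start-up.

```python
import email.message
import io
import urllib.request
import urllib.response


class EchoHandler(urllib.request.BaseHandler):
    """Hypothetical handler: echo:<text> responds with <text> as the body."""

    def echo_open(self, req):
        # For "echo:hello" the selector is "hello" (there is no //host part).
        body = req.selector.encode("utf-8")
        headers = email.message.Message()
        headers["Content-Type"] = "text/plain; charset=utf-8"
        return urllib.response.addinfourl(io.BytesIO(body), headers, req.full_url)


# build_opener() keeps the default handlers and appends EchoHandler().
urllib.request.install_opener(urllib.request.build_opener(EchoHandler))

with urllib.request.urlopen("echo:hello") as resp:
    print(resp.read())  # b'hello'
```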

urllib.request.build_opener([handler, ...])

Return an OpenerDirector instance, which chains the handlers in the order given. handlers can be either instances of BaseHandler, or subclasses of BaseHandler (in which case it must be possible to call the constructor without any parameters). Instances of the following classes will be in front of the handlers, unless the handlers contain them, instances of them or subclasses of them: ProxyHandler (if proxy settings are detected), UnknownHandler, HTTPHandler, HTTPDefaultErrorHandler, HTTPRedirectHandler, FTPHandler, FileHandler, HTTPErrorProcessor.

If the Python installation has SSL support (i.e., if the ssl module can be imported), HTTPSHandler will also be added.

A BaseHandler subclass may also change its handler_order attribute to modify its position in the handlers list.
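The sketch below illustrates the ordering rule with two hypothetical handlers for an invented demo: scheme. Both classes are passed uninstantiated, so build_opener() calls their no-argument constructors. LateHandler is listed first, but EarlyHandler has the smaller handler_order (BaseHandler's default is 500) and is therefore asked first.

```python
import email.message
import io
import urllib.request
import urllib.response


def text_response(text, url):
    """Wrap *text* in a minimal file-like response object."""
    headers = email.message.Message()
    return urllib.response.addinfourl(io.BytesIO(text.encode()), headers, url)


class LateHandler(urllib.request.BaseHandler):
    handler_order = 600  # larger value: consulted later

    def demo_open(self, req):
        return text_response("late", req.full_url)


class EarlyHandler(urllib.request.BaseHandler):
    handler_order = 100  # smaller value: consulted first

    def demo_open(self, req):
        return text_response("early", req.full_url)


opener = urllib.request.build_opener(LateHandler, EarlyHandler)
with opener.open("demo://example/") as resp:
    print(resp.read().decode())  # early
```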

urllib.request.pathname2url(path)

Convert the pathname path from the local syntax for a path to the form used in the path component of a URL. This does not produce a complete URL. The return value will already be quoted using the quote() function.

urllib.request.url2pathname(path)

Convert the path component path from a percent-encoded URL to the local syntax for a path. This does not accept a complete URL. This function uses unquote() to decode path.
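Here is a quick round trip through both helpers, assuming a POSIX system. Windows paths use drive letters and backslashes, and the exact URL form produced there may differ between Python versions, so no Windows output is shown. The path is only an example, and neither function touches the file system.

```python
import os
import urllib.request

path = "/tmp/my file.txt"  # example path; it does not need to exist

url_path = urllib.request.pathname2url(path)  # percent-encoded path component
back = urllib.request.url2pathname(url_path)  # decoded local path again

if os.name == "posix":
    print(url_path)  # /tmp/my%20file.txt
    print(back)      # /tmp/my file.txt
```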

urllib.request.getproxies()

This helper function returns a dictionary of scheme to proxy server URL mappings. On all operating systems, it first scans the environment for variables named <scheme>_proxy, case-insensitively. When it cannot find any, it looks for proxy information in the System Configuration on macOS and in the Windows Systems Registry on Windows. If both lowercase and uppercase environment variables exist (and disagree), lowercase is preferred.

Note

If the environment variable REQUEST_METHOD is set, which usually indicates your script is running in a CGI environment, the environment variable HTTP_PROXY (uppercase _PROXY) will be ignored. This is because that variable can be injected by a client using the “Proxy:” HTTP header. If you need to use an HTTP proxy in a CGI environment, either use ProxyHandler explicitly, or make sure the variable name is in lowercase (or at least the _proxy suffix).
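The sketch below shows both behaviours offline by temporarily replacing the whole environment with unittest.mock.patch.dict. The helper proxies_for and the proxy host names are invented for the example. A non-empty result from the environment is returned before any macOS or Windows system settings are consulted, so the output should not depend on the machine's own proxy configuration.

```python
import os
import urllib.request
from unittest import mock


def proxies_for(env):
    """Return getproxies() as computed if *env* were the entire environment."""
    with mock.patch.dict(os.environ, env, clear=True):
        return urllib.request.getproxies()


# Ordinary script: the variable is picked up.
print(proxies_for({"http_proxy": "http://proxy.example:3128"}))
# {'http': 'http://proxy.example:3128'}

# CGI-like environment: uppercase HTTP_PROXY is dropped, other schemes survive.
print(proxies_for({
    "HTTP_PROXY": "http://attacker.example:8080",
    "HTTPS_PROXY": "http://proxy.example:3128",
    "REQUEST_METHOD": "GET",
}))
# {'https': 'http://proxy.example:3128'}
```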

The following classes are provided:

class urllib.request.Request(url, data=None, headers={}, origin_req_host=None, unverifiable=False, method=None)

This class is an abstraction of a URL request.

url should be a string containing a valid URL.

data must be an object specifying additional data to send to the server, or None if no such data is needed. Currently HTTP requests are the only ones that use data. The supported object types include bytes, file-like objects, and iterables of bytes-like objects. If neither a Content-Length nor a Transfer-Encoding header field has been provided, HTTPHandler will set these headers according to the type of data. Content-Length will be used to send bytes objects, while Transfer-Encoding: chunked as specified in RFC 7230, Section 3.3.1 will be used to send files and other iterables.

For an HTTP POST request method, data should be a buffer in the standard application/x-www-form-urlencoded format. The urllib.parse.urlencode() function takes a mapping or sequence of 2-tuples and returns an ASCII string in this format. It should be encoded to bytes before being used as the data parameter.

headers should be a dictionary, and will be treated as if add_header() was called with each key and value as arguments. This is often used to “spoof” the User-Agent header value, which is used by a browser to identify itself; some HTTP servers only allow requests coming from common browsers as opposed to scripts. For example, Mozilla Firefox may identify itself as "Mozilla/5.0 (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11", while urllib’s default user agent string is "Python-urllib/2.6" (on Python 2.6). All header keys are normalized with str.capitalize() before being sent (for example, User-Agent becomes User-agent).

An appropriate Content-Type header should be included if the data argument is present. If this header has not been provided and data is not None, Content-Type: application/x-www-form-urlencoded will be added as a default.
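The following minimal sketch builds a form POST and then runs only HTTPHandler's request pre-processing step. That step is its http_request() method, the <protocol>_request() hook described under OpenerDirector objects, so the default headers can be inspected without sending anything. The URLs and the agent string are placeholders. The header keys come back capitalized (User-agent, Content-type).

```python
import urllib.parse
import urllib.request

# urlencode() returns str; Request wants bytes for data.
form = urllib.parse.urlencode({"q": "urllib", "page": 2}).encode("ascii")
req = urllib.request.Request(
    "http://example.com/search",                   # placeholder URL, never contacted
    data=form,
    headers={"User-Agent": "example-client/0.1"},  # hypothetical agent string
)

# Attach an HTTPHandler to an opener (it reads opener.addheaders), then run only
# its pre-processing step; no connection is made.
handler = urllib.request.HTTPHandler()
urllib.request.build_opener(handler)
handler.http_request(req)

print(req.get_method())                  # POST
print(req.get_header("Content-type"))    # application/x-www-form-urlencoded
print(req.get_header("Content-length"))  # 15

# An iterable body has no known length, so chunked encoding is chosen instead.
chunked = urllib.request.Request("http://example.com/upload", data=[b"part1", b"part2"])
handler.http_request(chunked)
print(chunked.get_header("Transfer-encoding"))  # chunked
```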

The next two arguments are only of interest for correct handling of third-party HTTP cookies:

origin_req_host should be the request-host of the origin transaction, as defined by RFC 2965. It defaults to http.cookiejar.request_host(self). This is the host name or IP address of the original request that was initiated by the user. For example, if the request is for an image in an HTML document, this should be the request-host of the request for the page containing the image.

unverifiable should indicate whether the request is unverifiable, as defined by RFC 2965. It defaults to False. An unverifiable request is one whose URL the user did not have the option to approve. For example, if the request is for an image in an HTML document, and the user had no option to approve the automatic fetching of the image, this should be true.

method should be a string that indicates the HTTP request method that will be used (e.g. 'HEAD'). If provided, its value is stored in the method attribute and is used by get_method(). The default is 'GET' if data is None or 'POST' otherwise. Subclasses may indicate a different default method by setting the method attribute in the class itself.

Note

The request will not work as expected if the data object is unable to deliver its content more than once (e.g. a file or an iterable that can produce the content only once) and the request is retried for HTTP redirects or authentication. The data is sent to the HTTP server right away after the headers. There is no support for a 100-continue expectation in the library.

Changed in version 3.3: The Request.method argument was added to the Request class.

Changed in version 3.4: Default Request.method may be indicated at the class level.

Changed in version 3.6: No error is raised if the Content-Length has not been provided and data is neither None nor a bytes object; chunked transfer encoding is used instead.

class urllib.request.OpenerDirector

The OpenerDirector class opens URLs via BaseHandlers chained together. It manages the chaining of handlers, and recovery from errors.

class urllib.request.BaseHandler

This is the base class for all registered handlers, and handles only the simple mechanics of registration.

class urllib.request.HTTPDefaultErrorHandler

A class which defines a default handler for HTTP error responses; all responses are turned into HTTPError exceptions.

class urllib.request.HTTPRedirectHandler

A class to handle redirections.

class urllib.request.HTTPCookieProcessor(cookiejar=None)

A class to handle HTTP Cookies.

class urllib.request.ProxyHandler(proxies=None)

Cause requests to go through a proxy. If proxies is given, it must be a dictionary mapping protocol names to URLs of proxies. The default is to read the list of proxies from the environment variables <protocol>_proxy. If no proxy environment variables are set, then in a Windows environment proxy settings are obtained from the registry’s Internet Settings section, and in a macOS environment proxy information is retrieved from the System Configuration Framework.

To disable autodetected proxy, pass an empty dictionary.

The no_proxy environment variable can be used to specify hosts which shouldn’t be reached via proxy; if set, it should be a comma-separated list of hostname suffixes, optionally with :port appended, for example cern.ch,ncsa.uiuc.edu,some.host:8080.

Note

HTTP_PROXY will be ignored if a variable REQUEST_METHOD is set; see the documentation on getproxies().

class urllib.request.HTTPPasswordMgr

Keep a database of (realm, uri) -> (user, password) mappings.

class urllib.request.HTTPPasswordMgrWithDefaultRealm

Keep a database of (realm, uri) -> (user, password) mappings. A realm of None is considered a catch-all realm, which is searched if no other realm fits.
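This short sketch demonstrates the catch-all realm using the add_password() and find_user_password() methods described under HTTPPasswordMgr Objects. The URLs and credentials are placeholders. The credentials are stored under realm None, so they are returned for whatever realm name the server announces, as long as the requested URI falls under the registered prefix.

```python
import urllib.request

mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
# realm=None: use these credentials for any realm below this URI prefix.
mgr.add_password(None, "https://example.com/private/", "alice", "s3cret")

print(mgr.find_user_password("Staff Area", "https://example.com/private/report.pdf"))
# ('alice', 's3cret')
print(mgr.find_user_password("Staff Area", "https://other.example/"))
# (None, None)
```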

class urllib.request.HTTPPasswordMgrWithPriorAuth

A variant of HTTPPasswordMgrWithDefaultRealm that also has a database of uri -> is_authenticated mappings. Can be used by a BasicAuth handler to determine when to send authentication credentials immediately instead of waiting for a 401 response first.

New in version 3.5.

class urllib.request.AbstractBasicAuthHandler(password_mgr=None)

This is a mixin class that helps with HTTP authentication, both to the remote host and to a proxy. password_mgr, if given, should be something that is compatible with HTTPPasswordMgr; refer to section HTTPPasswordMgr Objects for information on the interface that must be supported. If password_mgr also provides is_authenticated and update_authenticated methods (see HTTPPasswordMgrWithPriorAuth Objects), then the handler will use the is_authenticated result for a given URI to determine whether or not to send authentication credentials with the request. If is_authenticated returns True for the URI, credentials are sent. If is_authenticated is False, credentials are not sent, and then if a 401 response is received the request is re-sent with the authentication credentials. If authentication succeeds, update_authenticated is called to set is_authenticated True for the URI, so that subsequent requests to the URI or any of its super-URIs will automatically include the authentication credentials.

New in version 3.5: Added is_authenticated support.
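The sketch below follows the prior-authentication path, assuming a password manager whose entry was added with is_authenticated=True. It calls the handler's https_request() pre-processing method directly, which the opener normally does, to show the Authorization header being attached before any 401 has been seen. The host and credentials are placeholders.

```python
import base64
import urllib.request

mgr = urllib.request.HTTPPasswordMgrWithPriorAuth()
mgr.add_password(None, "https://api.example.com/", "bob", "t0ken", is_authenticated=True)
handler = urllib.request.HTTPBasicAuthHandler(mgr)

req = urllib.request.Request("https://api.example.com/v1/items")
handler.https_request(req)  # the pre-processing step an opener would run
print(req.get_header("Authorization"))  # Basic Ym9iOnQwa2Vu

# The header value is just "user:password" in Base64.
expected = "Basic " + base64.b64encode(b"bob:t0ken").decode("ascii")
print(req.get_header("Authorization") == expected)  # True
```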

class urllib.request.HTTPBasicAuthHandler(password_mgr=None)

Handle authentication with the remote host. password_mgr, if given, should be something that is compatible with HTTPPasswordMgr; refer to section HTTPPasswordMgr Objects for information on the interface that must be supported. HTTPBasicAuthHandler will raise a ValueError when presented with a wrong authentication scheme.

class urllib.request.ProxyBasicAuthHandler(password_mgr=None)

Handle authentication with the proxy. password_mgr, if given, should be something that is compatible with HTTPPasswordMgr; refer to section HTTPPasswordMgr Objects for information on the interface that must be supported.

class urllib.request.AbstractDigestAuthHandler(password_mgr=None)

This is a mixin class that helps with HTTP authentication, both to the remote host and to a proxy. password_mgr, if given, should be something that is compatible with HTTPPasswordMgr; refer to section HTTPPasswordMgr Objects for information on the interface that must be supported.

class urllib.request.HTTPDigestAuthHandler(password_mgr=None)

Handle authentication with the remote host. password_mgr, if given, should be something that is compatible with HTTPPasswordMgr; refer to section HTTPPasswordMgr Objects for information on the interface that must be supported. When both a Digest Authentication Handler and a Basic Authentication Handler are added, Digest Authentication is always tried first. If the Digest Authentication returns a 40x response again, it is sent to the Basic Authentication handler to handle. This Handler method will raise a ValueError when presented with an authentication scheme other than Digest or Basic.

Changed in version 3.3: Raise ValueError on unsupported Authentication Scheme.

class urllib.request.ProxyDigestAuthHandler(password_mgr=None)

Handle authentication with the proxy. password_mgr, if given, should be something that is compatible with HTTPPasswordMgr; refer to section HTTPPasswordMgr Objects for information on the interface that must be supported.

class urllib.request.HTTPHandler

A class to handle opening of HTTP URLs.

class urllib.request.HTTPSHandler(debuglevel=0, context=None, check_hostname=None)

A class to handle opening of HTTPS URLs. context and check_hostname have the same meaning as in http.client.HTTPSConnection.

Changed in version 3.2: context and check_hostname were added.
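This minimal sketch builds a custom HTTPS opener, assuming the ssl module is available. ssl.create_default_context() loads the system's trusted CA certificates and enables hostname checking. The ALPN call mirrors what urllib itself sends when no context is given (see the 3.10 note under urlopen()). Nothing is fetched here; a later opener.open("https://...") call would use this context.

```python
import ssl
import urllib.request

ctx = ssl.create_default_context()  # system CA store, hostname checking on
if ssl.HAS_ALPN:
    ctx.set_alpn_protocols(["http/1.1"])  # advertise HTTP/1.1 via ALPN

https_handler = urllib.request.HTTPSHandler(context=ctx)
opener = urllib.request.build_opener(https_handler)  # replaces the default HTTPSHandler
```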

class urllib.request.FileHandler

Open local files.
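The following small sketch shows FileHandler at work through the default opener. A temporary file is written and read back through its file: URL, built with pathlib.Path.as_uri(). The helper name read_via_file_url is invented for the example.

```python
import pathlib
import tempfile
import urllib.request


def read_via_file_url(text):
    """Write *text* to a temporary file and read it back through a file: URL."""
    with tempfile.TemporaryDirectory() as tmp:
        path = pathlib.Path(tmp, "note.txt")
        path.write_text(text, encoding="utf-8")
        url = path.as_uri()  # e.g. file:///tmp/tmpabc123/note.txt on POSIX
        # Close the response before the directory is removed (matters on Windows).
        with urllib.request.urlopen(url) as resp:
            return resp.read().decode("utf-8"), resp.headers["Content-length"]


print(read_via_file_url("hello from disk"))  # ('hello from disk', '15')
```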

class urllib.request.DataHandler

Open data URLs.

New in version 3.4.

class urllib.request.FTPHandler

Open FTP URLs.

class urllib.request.CacheFTPHandler

Open FTP URLs, keeping a cache of open FTP connections to minimize delays.

class urllib.request.UnknownHandler

A catch-all class to handle unknown URLs.

class urllib.request.HTTPErrorProcessor

Process HTTP error responses.

Request Objects

The following methods describe Request’s public interface, and so all may be overridden in subclasses. It also defines several public attributes that can be used by clients to inspect the parsed request.

Request.full_url

The original URL passed to the constructor.

Changed in version 3.4: Request.full_url is a property with setter, getter and a deleter. Getting full_url returns the original request URL with the fragment, if it was present.

Request.type

The URI scheme.

Request.host

The URI authority, typically a host, but may also contain a port separated by a colon.

Request.origin_req_host

The original host for the request, without port.

Request.selector

The URI path. If the Request uses a proxy, then selector will be the full URL that is passed to the proxy.
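This quick sketch shows how the constructor's URL is split into these attributes. The URL is a placeholder, and nothing is fetched. The fragment is kept in full_url but is not part of selector.

```python
import urllib.request

req = urllib.request.Request("https://example.com:8443/docs/page.html?lang=en#intro")

print(req.full_url)         # https://example.com:8443/docs/page.html?lang=en#intro
print(req.type)             # https
print(req.host)             # example.com:8443
print(req.selector)         # /docs/page.html?lang=en
print(req.origin_req_host)  # example.com
```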

Request.data

The entity body for the request, or None if not specified.

Changed in version 3.4: Changing the value of Request.data now deletes the “Content-Length” header if it was previously set or calculated.

Request.unverifiable

Boolean, indicates whether the request is unverifiable as defined by RFC 2965.

Request.method

The HTTP request method to use. By default its value is None, which means that get_method() will do its normal computation of the method to be used. Its value can be set (thus overriding the default computation in get_method()) either by providing a default value by setting it at the class level in a Request subclass, or by passing a value in to the Request constructor via the method argument.

New in version 3.3.

Changed in version 3.4: A default value can now be set in subclasses; previously it could only be set via the constructor argument.

Request.get_method()

Return a string indicating the HTTP request method. If Request.method is not None, return its value, otherwise return 'GET' if Request.data is None, or 'POST' if it’s not. This is only meaningful for HTTP requests.

Changed in version 3.3: get_method now looks at the value of Request.method.
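The sketch below shows how get_method() resolves the method in each case described above. HeadRequest is a hypothetical subclass that sets a class-level default, and the URL is a placeholder.

```python
import urllib.request


class HeadRequest(urllib.request.Request):
    method = "HEAD"  # class-level default for every instance


url = "http://example.com/"
print(urllib.request.Request(url).get_method())                             # GET
print(urllib.request.Request(url, data=b"x=1").get_method())                # POST
print(urllib.request.Request(url, data=b"x=1", method="PUT").get_method())  # PUT
print(HeadRequest(url).get_method())                                        # HEAD
print(HeadRequest(url, method="GET").get_method())  # GET (constructor argument wins)
```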

Request.add_header(key, val)

Add another header to the request. Headers are currently ignored by all handlers except HTTP handlers, where they are added to the list of headers sent to the server. Note that there cannot be more than one header with the same name, and later calls will overwrite previous calls in case the key collides. Currently, this is no loss of HTTP functionality, since all headers which have meaning when used more than once have a (header-specific) way of gaining the same functionality using only one header.

Request.add_unredirected_header(key, header)

Add a header that will not be added to a redirected request.

Request.has_header(header)

Return whether the instance has the named header (checks both regular and unredirected).

Request.remove_header(header)

Remove named header from the request instance (both from regular and unredirected headers).

New in version 3.4.

Request.get_full_url()

Return the URL given in the constructor.

Changed in version 3.4: Returns Request.full_url.

Request.set_proxy(host, type)

Prepare the request by connecting to a proxy server. The host and type will replace those of the instance, and the instance’s selector will be the original URL given in the constructor.
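The following sketch shows the effect on a plain http: request. The proxy address is a placeholder. For https: requests the behaviour differs, since urllib tunnels through the proxy instead of rewriting the selector, so only http: is shown.

```python
import urllib.request

req = urllib.request.Request("http://example.com/index.html")
req.set_proxy("proxy.example:3128", "http")

print(req.host)      # proxy.example:3128  (connect to the proxy...)
print(req.selector)  # http://example.com/index.html  (...and ask it for the full URL)
print(req.type)      # http
```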

Request.get_header(header_name, default=None)

Return the value of the given header. If the header is not present, return the default value.

Request.header_items()

Return a list of tuples (header_name, header_value) of the Request headers.

Changed in version 3.4: The request methods add_data, has_data, get_data, get_type, get_host, get_selector, get_origin_req_host and is_unverifiable that were deprecated since 3.3 have been removed.
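The sketch below exercises these header methods together. The header names and values are made up. add_header() and add_unredirected_header() store keys via str.capitalize(), but has_header(), get_header() and remove_header() look names up exactly as given, so lookups should use the capitalized form. The last lines show the Request.data behaviour noted above: assigning new data drops a stale Content-length.

```python
import urllib.request

req = urllib.request.Request("http://example.com/")
req.add_header("x-trace-id", "abc123")                    # stored as "X-trace-id"
req.add_unredirected_header("Authorization", "Bearer t")  # not sent on redirects

print(sorted(req.header_items()))
# [('Authorization', 'Bearer t'), ('X-trace-id', 'abc123')]
print(req.has_header("X-trace-id"), req.has_header("x-trace-id"))  # True False
print(req.get_header("X-missing", "n/a"))  # n/a

req.remove_header("X-trace-id")
print(req.has_header("X-trace-id"))  # False

# Changing data invalidates a previously set Content-Length.
req.add_header("Content-Length", "3")  # stored as "Content-length"
req.data = b"abcd"
print(req.has_header("Content-length"))  # False
```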

OpenerDirector Objects

OpenerDirector instances have the following methods:

OpenerDirector.add_handler(handler)

handler should be an instance of BaseHandler. The following methods are searched, and added to the possible chains (note that HTTP errors are a special case). Note that, in the following, protocol should be replaced with the actual protocol to handle, for example http_response() would be the HTTP protocol response handler. Also type should be replaced with the actual HTTP code, for example http_error_404() would handle HTTP 404 errors.

  • <protocol>_open(): signal that the handler knows how to open protocol URLs.

    See BaseHandler.<protocol>_open() for more information.

  • http_error_<type>(): signal that the handler knows how to handle HTTP errors with HTTP error code type.

    See BaseHandler.http_error_<nnn>() for more information.

  • <protocol>_error(): signal that the handler knows how to handle errors from (non-http) protocol.

  • <protocol>_request(): signal that the handler knows how to pre-process protocol requests.

    See BaseHandler.<protocol>_request() for more information.

  • <protocol>_response(): signal that the handler knows how to post-process protocol responses.

    See BaseHandler.<protocol>_response() for more information.

OpenerDirector.open(url, data=None[, timeout])

Open the given url (which can be a request object or a string), optionally passing the given data. Arguments, return values and exceptions raised are the same as those of urlopen() (which simply calls the open() method on the currently installed global OpenerDirector). The optional timeout parameter specifies a timeout in seconds for blocking operations like the connection attempt (if not specified, the global default timeout setting will be used). The timeout feature actually works only for HTTP, HTTPS and FTP connections.

OpenerDirector.error(proto, *args)

Handle an error of the given protocol. This will call the registered error handlers for the given protocol with the given arguments (which are protocol specific). The HTTP protocol is a special case which uses the HTTP response code to determine the specific error handler; refer to the http_error_<type>() methods of the handler classes.

Return values and exceptions raised are the same as those of urlopen().

OpenerDirector objects open URLs in three stages:

The order in which these methods are called within each stage is determined by sorting the handler instances.

  1. Every handler with a method named like <protocol>_request() has that method called to pre-process the request.

  2. Handlers with a method named like <protocol>_open() are called to handle the request. This stage ends when a handler either returns a non-None value (i.e. a response), or raises an exception (usually URLError). Exceptions are allowed to propagate.

    In fact, the above algorithm is first tried for methods named default_open(). If all such methods return None, the algorithm is repeated for methods named like <protocol>_open(). If all such methods return None, the algorithm is repeated for methods named unknown_open().

    Note that the implementation of these methods may involve calls of the parent OpenerDirector instance’s open() and error() methods.

  3. Every handler with a method named like <protocol>_response() has that method called to post-process the response.
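The three stages can be traced with two small hypothetical classes for an invented demo: scheme, registered on a bare OpenerDirector via add_handler(). Following the naming convention noted under BaseHandler objects, the class with the _request/_response hooks is called a Processor. The calls list records the order in which the opener invokes them.

```python
import email.message
import io
import urllib.request
import urllib.response

calls = []  # order in which the opener invokes our methods


class DemoProcessor(urllib.request.BaseHandler):
    """Pre- and post-processing for the made-up demo: scheme."""

    def demo_request(self, req):  # stage 1
        calls.append("request")
        req.add_header("X-demo", "1")
        return req

    def demo_response(self, req, response):  # stage 3
        calls.append("response")
        return response


class DemoHandler(urllib.request.BaseHandler):
    def demo_open(self, req):  # stage 2
        calls.append("open")
        body = req.get_header("X-demo", "missing").encode()
        return urllib.response.addinfourl(
            io.BytesIO(body), email.message.Message(), req.full_url)


opener = urllib.request.OpenerDirector()
opener.add_handler(DemoProcessor())
opener.add_handler(DemoHandler())

with opener.open("demo://local/thing") as resp:
    body = resp.read()

print(calls, body)  # ['request', 'open', 'response'] b'1'
```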

BaseHandler Objects

BaseHandler objects provide a couple of methods that are directly useful, and others that are meant to be used by derived classes. These are intended for direct use:

BaseHandler.add_parent(director)

Add a director as parent.

BaseHandler.close()

Remove any parents.

The following attribute and methods should only be used by classes derived from BaseHandler.

Note

The convention has been adopted that subclasses defining <protocol>_request() or <protocol>_response() methods are named *Processor; all others are named *Handler.

BaseHandler.parent

A valid OpenerDirector, which can be used to open using a different protocol, or handle errors.

BaseHandler.default_open(req)

This method is not defined in BaseHandler, but subclasses should define it if they want to catch all URLs.

This method, if implemented, will be called by the parent OpenerDirector. It should return a file-like object as described in the return value of the open() method of OpenerDirector, or None. On failure it should raise URLError, unless a truly exceptional thing happens (for example, MemoryError should not be mapped to URLError).

This method will be called before any protocol-specific open method.
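As a sketch of where default_open() sits in the open algorithm, the hypothetical LoggingHandler below records every URL it sees and then returns None, so the OpenerDirector falls through to the protocol-specific data_open() of the standard DataHandler. A data: URL is used only so the example runs without network access.

```python
import urllib.request

class LoggingHandler(urllib.request.BaseHandler):
    """Hypothetical handler: records each URL, then declines to open it."""

    def __init__(self):
        self.seen = []

    def default_open(self, req):
        self.seen.append(req.full_url)
        return None  # None lets OpenerDirector try the <protocol>_open() methods next

logger = LoggingHandler()
opener = urllib.request.build_opener(logger)
with opener.open("data:,hello") as resp:
    body = resp.read()
print(logger.seen, body)
```

Returning a response object instead of None would end the lookup early, and no <protocol>_open() method would be consulted.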

BaseHandler.<protocol>_open(req)

This method is not defined in BaseHandler, but subclasses should define it if they want to handle URLs with the given protocol.

This method, if defined, will be called by the parent OpenerDirector. Return values should be the same as for default_open().

BaseHandler.unknown_open(req)

This method is not defined in BaseHandler, but subclasses should define it if they want to catch all URLs with no specific registered handler to open it.

This method, if implemented, will be called by the parent OpenerDirector. Return values should be the same as for default_open().

BaseHandler.http_error_default(req, fp, code, msg, hdrs)

This method is not defined in BaseHandler, but subclasses should override it if they intend to provide a catch-all for otherwise unhandled HTTP errors. It will be called automatically by the OpenerDirector getting the error, and should not normally be called in other circumstances.

req will be a Request object, fp will be a file-like object with the HTTP error body, code will be the three-digit code of the error, msg will be the user-visible explanation of the code and hdrs will be a mapping object with the headers of the error.

Return values and exceptions raised should be the same as those of urlopen().

BaseHandler.http_error_<nnn>(req, fp, code, msg, hdrs)

nnn should be a three-digit HTTP error code. This method is also not defined in BaseHandler, but will be called, if it exists, on an instance of a subclass, when an HTTP error with code nnn occurs.

Subclasses should override this method to handle specific HTTP errors.

Arguments, return values and exceptions raised should be the same as for http_error_default().
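The following sketch shows an http_error_404() method being dispatched. Several pieces are assumptions for illustration only: NotFoundHandler and FallbackOn404 are hypothetical, a throwaway local http.server instance stands in for a real site, and building a urllib.response.addinfourl directly with (fp, headers, url, code) follows CPython's implementation rather than a documented constructor. Because the 404 is handled here, HTTPErrorProcessor's call to OpenerDirector.error() gets the fallback response back and HTTPDefaultErrorHandler never raises HTTPError.

```python
import http.server
import io
import threading
import urllib.request
from urllib.response import addinfourl

class NotFoundHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical test server handler: every GET gets a 404."""
    def do_GET(self):
        self.send_error(404)

    def log_message(self, *args):  # keep the example quiet
        pass

class FallbackOn404(urllib.request.BaseHandler):
    """Hypothetical handler that turns a 404 into a small fallback response."""
    def http_error_404(self, req, fp, code, msg, hdrs):
        fp.close()  # discard the server's error body
        # The 404 response headers are passed through unchanged.
        return addinfourl(io.BytesIO(b"fallback"), hdrs, req.full_url, code)

server = http.server.HTTPServer(("127.0.0.1", 0), NotFoundHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/missing" % server.server_port

# An empty ProxyHandler keeps environment proxy settings out of the way.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}), FallbackOn404())
with opener.open(url) as resp:
    body, status = resp.read(), resp.status  # .status needs Python 3.9+
server.shutdown()
server.server_close()
print(status, body)
```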

BaseHandler.<protocol>_request(req)

This method is not defined in BaseHandler, but subclasses should define it if they want to pre-process requests of the given protocol.

This method, if defined, will be called by the parent OpenerDirector. req will be a Request object. The return value should be a Request object.

BaseHandler.<protocol>_response(req, response)

This method is not defined in BaseHandler, but subclasses should define it if they want to post-process responses of the given protocol.

This method, if defined, will be called by the parent OpenerDirector. req will be a Request object. response will be an object implementing the same interface as the return value of urlopen(). The return value should implement the same interface as the return value of urlopen().

HTTPRedirectHandler Objects

Note

Some HTTP redirections require action from this module’s client code. If this is the case, HTTPError is raised. See RFC 2616 for details of the precise meanings of the various redirection codes.

An HTTPError exception is raised as a security consideration if the HTTPRedirectHandler is presented with a redirected URL which is not an HTTP, HTTPS or FTP URL.

HTTPRedirectHandler.redirect_request(req, fp, code, msg, hdrs, newurl)

Return a Request or None in response to a redirect. This is called by the default implementations of the http_error_30*() methods when a redirection is received from the server. If a redirection should take place, return a new Request to allow http_error_30*() to perform the redirect to newurl. Otherwise, raise HTTPError if no other handler should try to handle this URL, or return None if you can’t but another handler might.

Note

The default implementation of this method does not strictly follow RFC 2616, which says that 301 and 302 responses to POST requests must not be automatically redirected without confirmation by the user. In reality, browsers do allow automatic redirection of these responses, changing the POST to a GET, and the default implementation reproduces this behavior.
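The behavior described in the note can be observed by calling redirect_request() directly, without any network traffic (the example.com URLs are placeholders). A POST answered with 302 comes back as a GET request without data, while a POST answered with 307 is refused with HTTPError. NoRedirect is a hypothetical subclass sketching how to stop redirects altogether: when redirect_request() returns None, the redirect is not followed and OpenerDirector falls back to http_error_default(), which by default raises HTTPError.

```python
import urllib.error
import urllib.request

handler = urllib.request.HTTPRedirectHandler()
post = urllib.request.Request("http://example.com/form", data=b"a=1")

# 302 after a POST: a new GET request for the target URL is returned.
new_req = handler.redirect_request(post, None, 302, "Found", {}, "http://example.com/done")
print(post.get_method(), "->", new_req.get_method(), new_req.full_url)

# 307 after a POST: not redirected automatically.
try:
    handler.redirect_request(post, None, 307, "Temporary Redirect", {},
                             "http://example.com/done")
    refused_307 = False
except urllib.error.HTTPError:
    refused_307 = True

class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Hypothetical handler that never follows redirects."""
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None

# Passing the subclass replaces the default HTTPRedirectHandler.
opener = urllib.request.build_opener(NoRedirect)
```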

HTTPRedirectHandler.http_error_301(req, fp, code, msg, hdrs)

Redirect to the Location: or URI: URL. This method is called by the parent OpenerDirector when getting an HTTP ‘moved permanently’ response.

HTTPRedirectHandler.http_error_302(req, fp, code, msg, hdrs)

The same as http_error_301(), but called for the ‘found’ response.

HTTPRedirectHandler.http_error_303(req, fp, code, msg, hdrs)

The same as http_error_301(), but called for the ‘see other’ response.

HTTPRedirectHandler.http_error_307(req, fp, code, msg, hdrs)

The same as http_error_301(), but called for the ‘temporary redirect’ response.

HTTPCookieProcessor Objects

HTTPCookieProcessor instances have one attribute:

HTTPCookieProcessor.cookiejar

The http.cookiejar.CookieJar in which cookies are stored.
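A short sketch of wiring a cookie jar into an opener. No request is made here, so the jar starts empty; once opener.open() is used, cookies from Set-Cookie headers accumulate in the processor's cookiejar and matching cookies are sent back on later requests. When no jar is passed, HTTPCookieProcessor creates its own CookieJar.

```python
import http.cookiejar
import urllib.request

jar = http.cookiejar.CookieJar()
cookie_processor = urllib.request.HTTPCookieProcessor(jar)
opener = urllib.request.build_opener(cookie_processor)

# The attribute refers to the jar passed in, so it can be inspected directly.
print(cookie_processor.cookiejar is jar, len(jar))
for cookie in cookie_processor.cookiejar:
    print(cookie.name, cookie.value, cookie.domain)

# Without an argument, a fresh CookieJar is created.
default_processor = urllib.request.HTTPCookieProcessor()
```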

ProxyHandler Objects

ProxyHandler.<protocol>_open(request)

The ProxyHandler will have a method <protocol>_open() for every protocol which has a proxy in the proxies dictionary given in the constructor. The method will modify requests to go through the proxy, by calling request.set_proxy(), and call the next handler in the chain to actually execute the protocol.
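A small offline sketch of the paragraph above; the proxy address is a placeholder. Only the keys of the mapping produce <protocol>_open() methods, and calling Request.set_proxy() by hand shows the effect the generated method has on a plain http request: the connection host becomes the proxy, while the selector becomes the full original URL.

```python
import urllib.request

proxies = {"http": "http://proxy.example.com:3128/"}  # placeholder proxy
handler = urllib.request.ProxyHandler(proxies)

# One <protocol>_open() method per key in the mapping.
print(hasattr(handler, "http_open"), hasattr(handler, "ftp_open"))  # True False

req = urllib.request.Request("http://www.example.com/index.html")
req.set_proxy("proxy.example.com:3128", "http")
print(req.host, req.selector)
```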

HTTPPasswordMgr Objects

These methods are available on HTTPPasswordMgr and HTTPPasswordMgrWithDefaultRealm objects.

HTTPPasswordMgr.add_password(realm, uri, user, passwd)

uri can be either a single URI, or a sequence of URIs. realm, user and passwd must be strings. This causes (user, passwd) to be used as authentication tokens when authentication for realm and a super-URI of any of the given URIs is given.

HTTPPasswordMgr.find_user_password(realm, authuri)

Get user/password for given realm and URI, if any. This method will return (None, None) if there is no matching user/password.

For HTTPPasswordMgrWithDefaultRealm objects, the realm None will be searched if the given realm has no matching user/password.
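The lookup rules can be checked without any network access. This sketch reuses the realm and host from the Basic Authentication example further down; the user names and passwords are made up. A URI below a registered one matches, an unknown realm falls back to the None realm, and an unrelated host gets (None, None).

```python
import urllib.request

mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
mgr.add_password("PDQ Application", "https://mahler:8092/site-updates/", "klem", "secret")
mgr.add_password(None, "https://mahler:8092/", "fallback", "pw")  # default realm

hit = mgr.find_user_password("PDQ Application", "https://mahler:8092/site-updates/x.py")
fallback = mgr.find_user_password("Other realm", "https://mahler:8092/anything")
miss = mgr.find_user_password("PDQ Application", "https://elsewhere/")
print(hit, fallback, miss)
```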

HTTPPasswordMgrWithPriorAuth Objects

This password manager extends HTTPPasswordMgrWithDefaultRealm to support tracking URIs for which authentication credentials should always be sent.

HTTPPasswordMgrWithPriorAuth.add_password(realm, uri, user, passwd, is_authenticated=False)

realm, uri, user, passwd are as for HTTPPasswordMgr.add_password(). is_authenticated sets the initial value of the is_authenticated flag for the given URI or list of URIs. If is_authenticated is specified as True, realm is ignored.

HTTPPasswordMgrWithPriorAuth.find_user_password(realm, authuri)

Same as for HTTPPasswordMgrWithDefaultRealm objects.

HTTPPasswordMgrWithPriorAuth.update_authenticated(self, uri, is_authenticated=False)

Update the is_authenticated flag for the given uri or list of URIs.

HTTPPasswordMgrWithPriorAuth.is_authenticated(self, authuri)

Returns the current state of the is_authenticated flag for the given URI.
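A sketch of the is_authenticated flag in use, with a placeholder URL and credentials. To stay offline it calls the basic-auth handler's https_request() pre-processing hook directly, which is what an opener would do before sending; relying on that hook is an assumption based on CPython, where a flagged URI gets an Authorization header up front instead of waiting for a 401.

```python
import urllib.request

mgr = urllib.request.HTTPPasswordMgrWithPriorAuth()
mgr.add_password(None, "https://api.example.com/", "user", "token",
                 is_authenticated=True)
before = mgr.is_authenticated("https://api.example.com/v1/items")

auth = urllib.request.HTTPBasicAuthHandler(mgr)
req = urllib.request.Request("https://api.example.com/v1/items")
req = auth.https_request(req)  # pre-processing hook normally run by the opener
auth_header = req.get_header("Authorization")

# Clear the flag: later requests would again wait for a 401 challenge.
mgr.update_authenticated("https://api.example.com/", False)
after = mgr.is_authenticated("https://api.example.com/v1/items")
print(before, auth_header, after)
```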

AbstractBasicAuthHandler Objects

AbstractBasicAuthHandler.http_error_auth_reqed(authreq, host, req, headers)

Handle an authentication request by getting a user/password pair, and re-trying the request. authreq should be the name of the header where the information about the realm is included in the request, host specifies the URL and path to authenticate for, req should be the (failed) Request object, and headers should be the error headers.

host is either an authority (e.g. "python.org") or a URL containing an authority component (e.g. "http://python.org/"). In either case, the authority must not contain a userinfo component (so, "python.org" and "python.org:80" are fine, "joe:password@python.org" is not).

HTTPBasicAuthHandler Objects

HTTPBasicAuthHandler.http_error_401(req, fp, code, msg, hdrs)

Retry the request with authentication information, if available.

ProxyBasicAuthHandler Objects

ProxyBasicAuthHandler.http_error_407(req, fp, code, msg, hdrs)

Retry the request with authentication information, if available.

AbstractDigestAuthHandler Objects

AbstractDigestAuthHandler.http_error_auth_reqed(authreq, host, req, headers)

authreq should be the name of the header where the information about the realm is included in the request, host should be the host to authenticate to, req should be the (failed) Request object, and headers should be the error headers.

HTTPDigestAuthHandler Objects

HTTPDigestAuthHandler.http_error_401(req, fp, code, msg, hdrs)

Retry the request with authentication information, if available.

ProxyDigestAuthHandler Objects

ProxyDigestAuthHandler.http_error_407(req, fp, code, msg, hdrs)

Retry the request with authentication information, if available.

HTTPHandler Objects

HTTPHandler.http_open(req)

Send an HTTP request using the method returned by req.get_method(): GET or POST depending on whether req has data, unless another method was set explicitly.

HTTPSHandler Objects

HTTPSHandler.https_open(req)

Send an HTTPS request using the method returned by req.get_method(): GET or POST depending on whether req has data, unless another method was set explicitly.

FileHandler Objects

FileHandler.file_open(req)

Open the file locally, if there is no host name, or the host name is 'localhost'.

Changed in version 3.2: This method is applicable only for local hostnames. When a remote hostname is given, a URLError is raised.

DataHandler Objects

DataHandler.data_open(req)

Read a data URL. This kind of URL contains the content encoded in the URL itself. The data URL syntax is specified in RFC 2397. This implementation ignores whitespace in base64-encoded data URLs, so the URL may be wrapped in whatever source file it comes from. However, even though some browsers tolerate missing padding at the end of a base64-encoded data URL, this implementation will raise a ValueError in that case.
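A few data URLs illustrating the rules above. "SGVsbG8sIHdvcmxkIQ==" is the base64 encoding of "Hello, world!"; the %20 in the second URL decodes to a space inside the base64 payload, which is ignored. Removing the "==" padding makes urlopen() raise ValueError (in CPython the concrete exception is binascii.Error, a ValueError subclass).

```python
import urllib.request

# Percent-encoded data URL; the media type defaults to text/plain.
with urllib.request.urlopen("data:,A%20brief%20note") as f:
    plain = f.read()

# Base64 data URL with a space in the middle of the payload.
with urllib.request.urlopen("data:text/plain;base64,SGVsbG8s%20IHdvcmxkIQ==") as f:
    decoded = f.read()

# The same payload without its padding is rejected.
try:
    urllib.request.urlopen("data:text/plain;base64,SGVsbG8sIHdvcmxkIQ")
    padding_error = False
except ValueError:
    padding_error = True
print(plain, decoded, padding_error)
```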

FTPHandler Objects

FTPHandler.ftp_open(req)

Open the FTP file indicated by req. Unless credentials are embedded in the URL (as in ftp://user:password@host/path), the login is done with an empty username and password.

CacheFTPHandler Objects

CacheFTPHandler objects are FTPHandler objects with the following additional methods:

CacheFTPHandler.setTimeout(t)

Set timeout of connections to t seconds.

CacheFTPHandler.setMaxConns(m)

Set maximum number of cached connections to m.
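A configuration sketch; the values 30 and 4 are arbitrary. Because CacheFTPHandler subclasses FTPHandler, passing it to build_opener() replaces the default FTPHandler, so FTP URLs opened through this opener reuse cached connections. No connection is made here; the FTP URL in the comment is a placeholder.

```python
import urllib.request

ftp_handler = urllib.request.CacheFTPHandler()
ftp_handler.setTimeout(30)  # connection timeout, in seconds
ftp_handler.setMaxConns(4)  # keep at most 4 cached connections
opener = urllib.request.build_opener(ftp_handler)

# opener.open("ftp://ftp.example.com/pub/README") would go through ftp_handler.
```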

UnknownHandler Objects

UnknownHandler.unknown_open()

Raise a URLError exception.

HTTPErrorProcessor Objects

HTTPErrorProcessor.http_response(request, response)

Process HTTP error responses.

For 2xx (success) status codes, the response object is returned immediately.

For all other status codes, this simply passes the job on to the http_error_<type>() handler methods, via OpenerDirector.error(). Eventually, HTTPDefaultErrorHandler will raise an HTTPError if no other handler handles the error.

HTTPErrorProcessor.https_response(request, response)

Process HTTPS error responses.

The behavior is the same as http_response().

Examples

In addition to the examples below, more examples are given in HOWTO Fetch Internet Resources Using The urllib Package.

This example gets the python.org main page and displays the first 300 bytes of it.本例获取pythonorg主页并显示其前300个字节。

>>> import urllib.request
>>> with urllib.request.urlopen('http://www.python.org/') as f:
...     print(f.read(300))
...
b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n\n\n<html
xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">\n\n<head>\n
<meta http-equiv="content-type" content="text/html; charset=utf-8" />\n
<title>Python Programming '

Note that reading from the object returned by urlopen yields a bytes object. This is because there is no way for urlopen to automatically determine the encoding of the byte stream it receives from the HTTP server. In general, a program will decode the returned bytes object to string once it determines or guesses the appropriate encoding.

The following W3C document, https://www.w3.org/International/O-charset, lists the various ways in which an (X)HTML or an XML document could have specified its encoding information.

As the python.org website uses utf-8 encoding as specified in its meta tag, we will use the same for decoding the bytes object.

>>> with urllib.request.urlopen('http://www.python.org/') as f:
...     print(f.read(100).decode('utf-8'))
...
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtm
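When the server declares a charset in its Content-Type header, it can be read from the response headers instead of being hard-coded. This sketch uses a data: URL so it runs offline; falling back to utf-8 when no charset is declared is an assumption, not something urlopen does for you. For http: URLs, f.headers is an http.client.HTTPMessage, which offers the same get_content_charset() method.

```python
import urllib.request

url = "data:text/plain;charset=utf-8,caf%C3%A9"
with urllib.request.urlopen(url) as f:
    charset = f.headers.get_content_charset() or "utf-8"  # fallback is an assumption
    text = f.read().decode(charset)
print(charset, text)
```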

It is also possible to achieve the same result without using the context manager approach.

>>> import urllib.request
>>> f = urllib.request.urlopen('http://www.python.org/')
>>> print(f.read(100).decode('utf-8'))
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtm

In the following example, we are sending a data-stream to the stdin of a CGI and reading the data it returns to us. Note that this example will only work when the Python installation supports SSL.

>>> import urllib.request
>>> req = urllib.request.Request(url='https://localhost/cgi-bin/test.cgi',
... data=b'This data is passed to stdin of the CGI')
>>> with urllib.request.urlopen(req) as f:
...     print(f.read().decode('utf-8'))
...
Got Data: "This data is passed to stdin of the CGI"

The code for the sample CGI used in the above example is:

#!/usr/bin/env python
import sys
data = sys.stdin.read()
print('Content-type: text/plain\n\nGot Data: "%s"' % data)

Here is an example of doing a PUT request using Request:

import urllib.request
DATA = b'some data'
req = urllib.request.Request(url='http://localhost:8080', data=DATA, method='PUT')
with urllib.request.urlopen(req) as f:
    pass
print(f.status)
print(f.reason)

Use of Basic HTTP Authentication:

import urllib.request
# Create an OpenerDirector with support for Basic HTTP Authentication...
auth_handler = urllib.request.HTTPBasicAuthHandler()
auth_handler.add_password(realm='PDQ Application',
                          uri='https://mahler:8092/site-updates.py',
                          user='klem',
                          passwd='kadidd!ehopper')
opener = urllib.request.build_opener(auth_handler)
# ...and install it globally so it can be used with urlopen.
urllib.request.install_opener(opener)
urllib.request.urlopen('http://www.example.com/login.html')

build_opener() provides many handlers by default, including a ProxyHandler. By default, ProxyHandler uses the environment variables named <scheme>_proxy, where <scheme> is the URL scheme involved. For example, the http_proxy environment variable is read to obtain the HTTP proxy’s URL.

This example replaces the default ProxyHandler with one that uses programmatically-supplied proxy URLs, and adds proxy authorization support with ProxyBasicAuthHandler.

proxy_handler = urllib.request.ProxyHandler({'http': 'http://www.example.com:3128/'})
proxy_auth_handler = urllib.request.ProxyBasicAuthHandler()
proxy_auth_handler.add_password('realm', 'host', 'username', 'password')
opener = urllib.request.build_opener(proxy_handler, proxy_auth_handler)
# This time, rather than install the OpenerDirector, we use it directly:
opener.open('http://www.example.com/login.html')

Adding HTTP headers:

Use the headers argument to the Request constructor, or:

import urllib.request
req = urllib.request.Request('http://www.example.com/')
req.add_header('Referer', 'http://www.python.org/')
# Customize the default User-Agent header value:
req.add_header('User-Agent', 'urllib-example/0.1 (Contact: . . .)')
r = urllib.request.urlopen(req)
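The same headers can be passed through the headers argument of the Request constructor. This sketch builds the request without sending it; the User-Agent value is a placeholder. Header names are stored using str.capitalize(), so "User-Agent" is kept as "User-agent", which matters when reading headers back with get_header().

```python
import urllib.request

req = urllib.request.Request(
    "http://www.example.com/",
    headers={"Referer": "http://www.python.org/",
             "User-Agent": "urllib-example/0.1"},
)
print(req.get_header("User-agent"))  # stored under the capitalized name
print(req.get_header("User-Agent"))  # None: lookups are case-sensitive
print(sorted(req.header_items()))
```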

OpenerDirector automatically adds a User-Agent header to every Request. To change this:

import urllib.request
opener = urllib.request.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
opener.open('http://www.example.com/')

Also, remember that a few standard headers (Content-Length, Content-Type and Host) are added when the Request is passed to urlopen() (or OpenerDirector.open()).

Here is an example session that uses the GET method to retrieve a URL containing parameters:

>>> import urllib.request
>>> import urllib.parse
>>> params = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
>>> url = "http://www.musi-cal.com/cgi-bin/query?%s" % params
>>> with urllib.request.urlopen(url) as f:
...     print(f.read().decode('utf-8'))
...

The following example uses the POST method instead. Note that params output from urlencode is encoded to bytes before it is sent to urlopen as data:

>>> import urllib.request
>>> import urllib.parse
>>> data = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
>>> data = data.encode('ascii')
>>> with urllib.request.urlopen("http://requestb.in/xrbl82xr", data) as f:
...     print(f.read().decode('utf-8'))
...
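The difference between the two sessions above comes down to where the encoded parameters go. This offline sketch (www.example.com is a placeholder) builds both requests without sending them: in the query string the method stays GET, while passing bytes as data makes Request.get_method() report POST.

```python
import urllib.parse
import urllib.request

params = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})

get_req = urllib.request.Request("http://www.example.com/query?" + params)
post_req = urllib.request.Request("http://www.example.com/query",
                                  data=params.encode('ascii'))

print(params)                                   # spam=1&eggs=2&bacon=0
print(get_req.get_method(), get_req.selector)   # GET /query?spam=1&eggs=2&bacon=0
print(post_req.get_method(), post_req.data)     # POST b'spam=1&eggs=2&bacon=0'
```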

The following example uses an explicitly specified HTTP proxy, overriding environment settings:

>>> import urllib.request
>>> proxies = {'http': 'http://proxy.example.com:8080/'}
>>> opener = urllib.request.FancyURLopener(proxies)
>>> with opener.open("http://www.python.org") as f:
...     f.read().decode('utf-8')
...

The following example uses no proxies at all, overriding environment settings:

>>> import urllib.request
>>> opener = urllib.request.FancyURLopener({})
>>> with opener.open("http://www.python.org/") as f:
...     f.read().decode('utf-8')
...

Legacy interface

The following functions and classes are ported from the Python 2 module urllib (as opposed to urllib2). They might become deprecated at some point in the future.

urllib.request.urlretrieve(url, filename=None, reporthook=None, data=None)

Copy a network object denoted by a URL to a local file. If the URL points to a local file, the object will not be copied unless filename is supplied. Return a tuple (filename, headers) where filename is the local file name under which the object can be found, and headers is whatever the info() method of the object returned by urlopen() returned (for a remote object). Exceptions are the same as for urlopen().

The second argument, if present, specifies the file location to copy to (if absent, the location will be a tempfile with a generated name). The third argument, if present, is a callable that will be called once on establishment of the network connection and once after each block read thereafter. The callable will be passed three arguments: a count of blocks transferred so far, a block size in bytes, and the total size of the file. The total size (the callable's third argument) may be -1 on older FTP servers which do not return a file size in response to a retrieval request.

The following example illustrates the most common usage scenario:

>>> import urllib.request
>>> local_filename, headers = urllib.request.urlretrieve('http://python.org/')
>>> html = open(local_filename)
>>> html.close()

If the url uses the http: scheme identifier, the optional data argument may be given to specify a POST request (normally the request type is GET). The data argument must be a bytes object in standard application/x-www-form-urlencoded format; see the urllib.parse.urlencode() function.

urlretrieve() will raise ContentTooShortError when it detects that the amount of data available was less than the expected amount (which is the size reported by a Content-Length header). This can occur, for example, when the download is interrupted.

The Content-Length is treated as a lower bound: if there’s more data to read, urlretrieve reads more data, but if less data is available, it raises the exception.

You can still retrieve the downloaded data in this case; it is stored in the content attribute of the exception instance.

If no Content-Length header was supplied, urlretrieve cannot check the size of the data it has downloaded, and just returns it. In this case you just have to assume that the download was successful.
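A sketch of the reporthook callable, using a data: URL so it runs offline. The hook name report and the calls list are hypothetical. For this URL the total size comes from the Content-Length that DataHandler reports (11 bytes); the block size passed in is an implementation detail and is not asserted on.

```python
import urllib.request

calls = []

def report(blocknum, blocksize, totalsize):
    # Called once before the first block and once after each block read;
    # totalsize is taken from Content-Length, or is -1 when unknown.
    calls.append((blocknum, blocksize, totalsize))

filename, headers = urllib.request.urlretrieve("data:,hello%20world", reporthook=report)
with open(filename, "rb") as f:
    content = f.read()
urllib.request.urlcleanup()  # delete the temporary file created above
print(content, calls)
```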

urllib.request.urlcleanup()

Cleans up temporary files that may have been left behind by previous calls to urlretrieve().

class urllib.request.URLopener(proxies=None, **x509)

Deprecated since version 3.3.

Base class for opening and reading URLs. Unless you need to support opening objects using schemes other than http:, ftp:, or file:, you probably want to use FancyURLopener.

By default, the URLopener class sends a User-Agent header of urllib/VVV, where VVV is the urllib version number. Applications can define their own User-Agent header by subclassing URLopener or FancyURLopener and setting the class attribute version to an appropriate string value in the subclass definition.

The optional proxies parameter should be a dictionary mapping scheme names to proxy URLs, where an empty dictionary turns proxies off completely. Its default value is None, in which case environmental proxy settings will be used if present, as discussed in the definition of urlopen(), above.

Additional keyword parameters, collected in x509, may be used for authentication of the client when using the https: scheme. The keywords key_file and cert_file are supported to provide an SSL key and certificate; both are needed to support client authentication.

URLopener objects will raise an OSError exception if the server returns an error code.

open(fullurl, data=None)

Open fullurl using the appropriate protocol. This method sets up cache and proxy information, then calls the appropriate open method with its input arguments. If the scheme is not recognized, open_unknown() is called. The data argument has the same meaning as the data argument of urlopen().

This method always quotes fullurl using quote().

open_unknown(fullurl, data=None)

Overridable interface to open unknown URL types.

retrieve(url, filename=None, reporthook=None, data=None)

Retrieves the contents of url and places it in filename. The return value is a tuple consisting of a local filename and either an email.message.Message object containing the response headers (for remote URLs) or None (for local URLs). The caller must then open and read the contents of filename. If filename is not given and the URL refers to a local file, the input filename is returned. If the URL is non-local and filename is not given, the filename is the output of tempfile.mktemp() with a suffix that matches the suffix of the last path component of the input URL. If reporthook is given, it must be a function accepting three numeric parameters: a chunk number, the maximum size chunks are read in and the total size of the download (-1 if unknown). It will be called once at the start and after each chunk of data is read from the network. reporthook is ignored for local URLs.

If the url uses the http: scheme identifier, the optional data argument may be given to specify a POST request (normally the request type is GET). The data argument must be in standard application/x-www-form-urlencoded format; see the urllib.parse.urlencode() function.

version

Variable that specifies the user agent of the opener object. To get urllib to tell servers that it is a particular user agent, set this in a subclass as a class variable or in the constructor before calling the base constructor.

class urllib.request.FancyURLopener(...)

Deprecated since version 3.3.

FancyURLopener subclasses URLopener providing default handling for the following HTTP response codes: 301, 302, 303, 307 and 401. For the 30x response codes listed above, the Location header is used to fetch the actual URL. For 401 response codes (authentication required), basic HTTP authentication is performed. For the 30x response codes, recursion is bounded by the value of the maxtries attribute, which defaults to 10.

For all other response codes, the method http_error_default() is called which you can override in subclasses to handle the error appropriately.

Note

According to the letter of RFC 2616, 301 and 302 responses to POST requests must not be automatically redirected without confirmation by the user. In reality, browsers do allow automatic redirection of these responses, changing the POST to a GET, and urllib reproduces this behaviour.

The parameters to the constructor are the same as those for URLopener.

Note

When performing basic authentication, a FancyURLopener instance calls its prompt_user_passwd() method. The default implementation asks the users for the required information on the controlling terminal. A subclass may override this method to support more appropriate behavior if needed.

The FancyURLopener class offers one additional method that should be overridden to provide the appropriate behavior:

prompt_user_passwd(host, realm)

Return information needed to authenticate the user at the given host in the specified security realm. The return value should be a tuple, (user, password), which can be used for basic authentication.

The implementation prompts for this information on the terminal; an application should override this method to use an appropriate interaction model in the local environment.

urllib.request Restrictions

  • Currently, only the following protocols are supported: HTTP (versions 0.9 and 1.0), FTP, local files, and data URLs.

    Changed in version 3.4: Added support for data URLs.

  • The caching feature of urlretrieve() has been disabled until someone finds the time to hack proper processing of Expiration time headers.

  • There should be a function to query whether a particular URL is in the cache.

  • For backward compatibility, if a URL appears to point to a local file but the file can’t be opened, the URL is re-interpreted using the FTP protocol. This can sometimes cause confusing error messages.

  • The urlopen() and urlretrieve() functions can cause arbitrarily long delays while waiting for a network connection to be set up. This means that it is difficult to build an interactive web client using these functions without using threads.

  • The data returned by urlopen() or urlretrieve() is the raw data returned by the server. This may be binary data (such as an image), plain text or (for example) HTML. The HTTP protocol provides type information in the reply header, which can be inspected by looking at the Content-Type header. If the returned data is HTML, you can use the module html.parser to parse it.

  • The code handling the FTP protocol cannot differentiate between a file and a directory. This can lead to unexpected behavior when attempting to read a URL that points to a file that is not accessible. If the URL ends in a /, it is assumed to refer to a directory and will be handled accordingly. But if an attempt to read a file leads to a 550 error (meaning the URL cannot be found or is not accessible, often for permission reasons), then the path is treated as a directory in order to handle the case when a directory is specified by a URL but the trailing / has been left off. This can cause misleading results when you try to fetch a file whose read permissions make it inaccessible; the FTP code will try to read it, fail with a 550 error, and then perform a directory listing for the unreadable file. If fine-grained control is needed, consider using the ftplib module, subclassing FancyURLopener, or changing _urlopener to meet your needs.

urllib.response: Response classes used by urllib

The urllib.response module defines functions and classes which define a minimal file-like interface, including read() and readline(). Functions defined by this module are used internally by the urllib.request module. The typical response object is a urllib.response.addinfourl instance:

class urllib.response.addinfourl
url

URL of the resource retrieved, commonly used to determine if a redirect was followed.

headers

Returns the headers of the response in the form of an EmailMessage instance.

status

New in version 3.9.

Status code returned by server.

geturl()

Deprecated since version 3.9: Deprecated in favor of url.

info()

Deprecated since version 3.9: Deprecated in favor of headers.

code

Deprecated since version 3.9: Deprecated in favor of status.

getstatus()

Deprecated since version 3.9: Deprecated in favor of status.
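The preferred attributes can be inspected offline by opening a data: URL; in CPython, DataHandler returns an addinfourl instance. Such a response carries no HTTP status code, so status is None here. For http: URLs, urlopen() returns an http.client.HTTPResponse, which offers the same url, headers and status attributes.

```python
import urllib.request

with urllib.request.urlopen("data:text/plain,hi") as resp:
    url = resp.url                            # the URL actually retrieved
    ctype = resp.headers.get_content_type()   # headers is an email-style message
    status = resp.status                      # None for data: URLs (Python 3.9+)
print(url, ctype, status)
```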