xml.etree.ElementTree — The ElementTree XML API

Source code: Lib/xml/etree/ElementTree.py


The xml.etree.ElementTree module implements a simple and efficient API for parsing and creating XML data.

Changed in version 3.3: This module will use a fast implementation whenever available.

Deprecated since version 3.3: The xml.etree.cElementTree module is deprecated.

Warning

The xml.etree.ElementTree module is not secure against maliciously constructed data. If you need to parse untrusted or unauthenticated data, see XML vulnerabilities.

Tutorial

This is a short tutorial for using xml.etree.ElementTree (ET for short). The goal is to demonstrate some of the building blocks and basic concepts of the module.

XML tree and elements

XML is an inherently hierarchical data format, and the most natural way to represent it is with a tree. ET has two classes for this purpose: ElementTree represents the whole XML document as a tree, and Element represents a single node in this tree. Interactions with the whole document (reading from and writing to files) are usually done on the ElementTree level. Interactions with a single XML element and its sub-elements are done on the Element level.

Parsing XML

We’ll be using the following XML document as the sample data for this section:

<?xml version="1.0"?>
<data>
<country name="Liechtenstein">
<rank>1</rank>
<year>2008</year>
<gdppc>141100</gdppc>
<neighbor name="Austria" direction="E"/>
<neighbor name="Switzerland" direction="W"/>
</country>
<country name="Singapore">
<rank>4</rank>
<year>2011</year>
<gdppc>59900</gdppc>
<neighbor name="Malaysia" direction="N"/>
</country>
<country name="Panama">
<rank>68</rank>
<year>2011</year>
<gdppc>13600</gdppc>
<neighbor name="Costa Rica" direction="W"/>
<neighbor name="Colombia" direction="E"/>
</country>
</data>

We can import this data by reading from a file:

import xml.etree.ElementTree as ET
tree = ET.parse('country_data.xml')
root = tree.getroot()

Or directly from a string:

root = ET.fromstring(country_data_as_string)

fromstring() parses XML from a string directly into an Element, which is the root element of the parsed tree. Other parsing functions may create an ElementTree. Check the documentation to be sure.
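The difference between the two return types can be checked directly. The following is a minimal sketch; the sample string and the variable names are made up for illustration, and an in-memory io.StringIO stands in for a file on disk.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample data, used only for this illustration.
xml_text = "<data><country name='Panama'/></data>"

# fromstring() returns the root Element directly.
root_from_string = ET.fromstring(xml_text)

# parse() accepts a filename or file object and returns an ElementTree;
# getroot() is needed to reach the root Element.
tree = ET.parse(io.StringIO(xml_text))
root_from_file = tree.getroot()

print(type(root_from_string).__name__)  # Element
print(type(tree).__name__)              # ElementTree
print(root_from_file.tag)               # data
```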

As an Element, root has a tag and a dictionary of attributes:

>>> root.tag
'data'
>>> root.attrib
{}

It also has child nodes over which we can iterate:

>>> for child in root:
...     print(child.tag, child.attrib)
...
country {'name': 'Liechtenstein'}
country {'name': 'Singapore'}
country {'name': 'Panama'}

Children are nested, and we can access specific child nodes by index:

>>> root[0][1].text
'2008'

Note

Not all elements of the XML input will end up as elements of the parsed tree. Currently, this module skips over any XML comments, processing instructions, and document type declarations in the input. Nevertheless, trees built using this module’s API rather than parsing from XML text can have comments and processing instructions in them; they will be included when generating XML output. A document type declaration may be accessed by passing a custom TreeBuilder instance to the XMLParser constructor.
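A small sketch of the behaviour described in this note, using made-up input: a comment in parsed text is dropped by the default parser, while a comment added through the API is kept and serialized.

```python
import xml.etree.ElementTree as ET

# Parsing: the comment in the input is skipped by the default parser.
parsed = ET.fromstring("<root><!-- a note --><item/></root>")
print([child.tag for child in parsed])  # ['item']

# Building: a comment inserted through the API is kept and serialized.
built = ET.Element("root")
built.append(ET.Comment(" a note "))
ET.SubElement(built, "item")
print(ET.tostring(built, encoding="unicode"))
# <root><!-- a note --><item /></root>
```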

Pull API for non-blocking parsing

Most parsing functions provided by this module require the whole document to be read at once before returning any result. It is possible to use an XMLParser and feed data into it incrementally, but it is a push API that calls methods on a callback target, which is too low-level and inconvenient for most needs. Sometimes what the user really wants is to be able to parse XML incrementally, without blocking operations, while enjoying the convenience of fully constructed Element objects.

The most powerful tool for doing this is XMLPullParser. It does not require a blocking read to obtain the XML data, and is instead fed with data incrementally with XMLPullParser.feed() calls. To get the parsed XML elements, call XMLPullParser.read_events(). Here is an example:

>>> parser = ET.XMLPullParser(['start', 'end'])
>>> parser.feed('<mytag>sometext')
>>> list(parser.read_events())
[('start', <Element 'mytag' at 0x7fa66db2be58>)]
>>> parser.feed(' more text</mytag>')
>>> for event, elem in parser.read_events():
...     print(event)
...     print(elem.tag, 'text=', elem.text)
...
end
mytag text= sometext more text

The obvious use case is applications that operate in a non-blocking fashion where the XML data is being received from a socket or read incrementally from some storage device. In such cases, blocking reads are unacceptable.

Because it’s so flexible, XMLPullParser can be inconvenient to use for simpler use-cases. If you don’t mind your application blocking on reading XML data but would still like to have incremental parsing capabilities, take a look at iterparse(). It can be useful when you’re reading a large XML document and don’t want to hold it wholly in memory.
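As a rough sketch of the iterparse() approach, the snippet below processes each country element as soon as it is complete and then clears it. The in-memory io.BytesIO document is an assumption standing in for a large file, and calling Element.clear() is one common way to limit memory use, not the only one.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical in-memory document standing in for a large file.
source = io.BytesIO(
    b"<data>"
    b"<country name='Liechtenstein'><rank>1</rank></country>"
    b"<country name='Singapore'><rank>4</rank></country>"
    b"</data>"
)

names = []
# With no events argument, only "end" events are reported, so each
# element is fully populated when it is yielded.
for event, elem in ET.iterparse(source):
    if elem.tag == "country":
        names.append(elem.get("name"))
        # Drop the element's children, text and attributes once
        # processed, to keep memory use low.
        elem.clear()

print(names)  # ['Liechtenstein', 'Singapore']
```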

Finding interesting elements

Element has some useful methods that help iterate recursively over all the sub-tree below it (its children, their children, and so on). For example, Element.iter():

>>> for neighbor in root.iter('neighbor'):
...     print(neighbor.attrib)
...
{'name': 'Austria', 'direction': 'E'}
{'name': 'Switzerland', 'direction': 'W'}
{'name': 'Malaysia', 'direction': 'N'}
{'name': 'Costa Rica', 'direction': 'W'}
{'name': 'Colombia', 'direction': 'E'}

Element.findall() finds only elements with a tag which are direct children of the current element. Element.find() finds the first child with a particular tag, and Element.text accesses the element’s text content. Element.get() accesses the element’s attributes:

>>> for country in root.findall('country'):
...     rank = country.find('rank').text
...     name = country.get('name')
...     print(name, rank)
...
Liechtenstein 1
Singapore 4
Panama 68

More sophisticated specification of which elements to look for is possible by using XPath.

Modifying an XML File

ElementTree provides a simple way to build XML documents and write them to files. The ElementTree.write() method serves this purpose.

Once created, an Element object may be manipulated by directly changing its fields (such as Element.text), adding and modifying attributes (Element.set() method), as well as adding new children (for example with Element.append()).

Let’s say we want to add one to each country’s rank, and add an updated attribute to the rank element:

>>> for rank in root.iter('rank'):
...     new_rank = int(rank.text) + 1
...     rank.text = str(new_rank)
...     rank.set('updated', 'yes')
...
>>> tree.write('output.xml')

Our XML now looks like this:

<?xml version="1.0"?>
<data>
<country name="Liechtenstein">
<rank updated="yes">2</rank>
<year>2008</year>
<gdppc>141100</gdppc>
<neighbor name="Austria" direction="E"/>
<neighbor name="Switzerland" direction="W"/>
</country>
<country name="Singapore">
<rank updated="yes">5</rank>
<year>2011</year>
<gdppc>59900</gdppc>
<neighbor name="Malaysia" direction="N"/>
</country>
<country name="Panama">
<rank updated="yes">69</rank>
<year>2011</year>
<gdppc>13600</gdppc>
<neighbor name="Costa Rica" direction="W"/>
<neighbor name="Colombia" direction="E"/>
</country>
</data>

We can remove elements using Element.remove(). Let’s say we want to remove all countries with a rank higher than 50:

>>> for country in root.findall('country'):
...     # using root.findall() to avoid removal during traversal
...     rank = int(country.find('rank').text)
...     if rank > 50:
...         root.remove(country)
...
>>> tree.write('output.xml')

Note that concurrent modification while iterating can lead to problems, just like when iterating and modifying Python lists or dicts. Therefore, the example first collects all matching elements with root.findall(), and only then iterates over the list of matches.

Our XML now looks like this:

<?xml version="1.0"?>
<data>
<country name="Liechtenstein">
<rank updated="yes">2</rank>
<year>2008</year>
<gdppc>141100</gdppc>
<neighbor name="Austria" direction="E"/>
<neighbor name="Switzerland" direction="W"/>
</country>
<country name="Singapore">
<rank updated="yes">5</rank>
<year>2011</year>
<gdppc>59900</gdppc>
<neighbor name="Malaysia" direction="N"/>
</country>
</data>

Building XML documents

The SubElement() function also provides a convenient way to create new sub-elements for a given element:

>>> a = ET.Element('a')
>>> b = ET.SubElement(a, 'b')
>>> c = ET.SubElement(a, 'c')
>>> d = ET.SubElement(c, 'd')
>>> ET.dump(a)
<a><b /><c><d /></c></a>
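Attributes and text can be set while building, too. The following sketch assembles a fragment shaped like the country data used earlier; the element and attribute values are chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical data, built from scratch rather than parsed.
data = ET.Element("data")
# Keyword arguments to SubElement() become attributes.
country = ET.SubElement(data, "country", name="Panama")
rank = ET.SubElement(country, "rank")
rank.text = "68"

xml_out = ET.tostring(data, encoding="unicode")
print(xml_out)
# <data><country name="Panama"><rank>68</rank></country></data>
```

To write a whole document to a file, the root can be wrapped with ET.ElementTree(data) and saved with its write() method.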

Parsing XML with Namespaces

If the XML input has namespaces, tags and attributes with prefixes in the form prefix:sometag get expanded to {uri}sometag where the prefix is replaced by the full URI. Also, if there is a default namespace, that full URI gets prepended to all of the non-prefixed tags.

Here is an XML example that incorporates two namespaces, one with the prefix “fictional” and the other serving as the default namespace:

<?xml version="1.0"?>
<actors xmlns:fictional="http://characters.example.com"
xmlns="http://people.example.com">
<actor>
<name>John Cleese</name>
<fictional:character>Lancelot</fictional:character>
<fictional:character>Archie Leach</fictional:character>
</actor>
<actor>
<name>Eric Idle</name>
<fictional:character>Sir Robin</fictional:character>
<fictional:character>Gunther</fictional:character>
<fictional:character>Commander Clement</fictional:character>
</actor>
</actors>

One way to search and explore this XML example is to manually add the URI to every tag or attribute in the xpath of a find() or findall():

root = ET.fromstring(xml_text)
for actor in root.findall('{http://people.example.com}actor'):
    name = actor.find('{http://people.example.com}name')
    print(name.text)
    for char in actor.findall('{http://characters.example.com}character'):
        print(' |-->', char.text)

A better way to search the namespaced XML example is to create a dictionary with your own prefixes and use those in the search functions:

ns = {'real_person': 'http://people.example.com',
      'role': 'http://characters.example.com'}
for actor in root.findall('real_person:actor', ns):
    name = actor.find('real_person:name', ns)
    print(name.text)
    for char in actor.findall('role:character', ns):
        print(' |-->', char.text)

Both approaches produce the same output:

John Cleese
 |--> Lancelot
 |--> Archie Leach
Eric Idle
 |--> Sir Robin
 |--> Gunther
 |--> Commander Clement
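To see the expansion itself, it can help to print the tags after parsing. The sketch below uses a shortened version of the actors document; it also shows that an unqualified name matches nothing, and uses the {*} wildcard, which assumes Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

# Shortened version of the actors example, for illustration only.
xml_text = (
    '<actors xmlns:fictional="http://characters.example.com" '
    'xmlns="http://people.example.com">'
    '<actor><name>John Cleese</name>'
    '<fictional:character>Lancelot</fictional:character></actor>'
    '</actors>'
)
root = ET.fromstring(xml_text)
actor = root[0]
print(root.tag)      # {http://people.example.com}actors
print(actor[1].tag)  # {http://characters.example.com}character

# An unqualified name does not match elements that are in a namespace.
print(root.findall("actor"))  # []

# The {*} wildcard (Python 3.8+) matches the tag in any namespace.
print(len(root.findall("{*}actor")))  # 1
```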

Additional resources

See http://effbot.org/zone/element-index.htm for tutorials and links to other docs.

XPath support

This module provides limited support for XPath expressions for locating elements in a tree. The goal is to support a small subset of the abbreviated syntax; a full XPath engine is outside the scope of the module.

Example

Here’s an example that demonstrates some of the XPath capabilities of the module. We’ll be using the countrydata XML document from the Parsing XML section:

import xml.etree.ElementTree as ET
root = ET.fromstring(countrydata)

# Top-level elements
root.findall(".")

# All 'neighbor' grand-children of 'country' children of the top-level
# elements
root.findall("./country/neighbor")

# Nodes with name='Singapore' that have a 'year' child
root.findall(".//year/..[@name='Singapore']")

# 'year' nodes that are children of nodes with name='Singapore'
root.findall(".//*[@name='Singapore']/year")

# All 'neighbor' nodes that are the second child of their parent
root.findall(".//neighbor[2]")

For XML with namespaces, use the usual qualified {namespace}tag notation:

# All dublin-core "title" tags in the document
root.findall(".//{http://purl.org/dc/elements/1.1/}title")

Supported XPath syntax

Syntax

Meaning

tag

Selects all child elements with the given tag. For example, spam selects all child elements named spam, and spam/egg selects all grandchildren named egg in all children named spam. {namespace}* selects all tags in the given namespace, {*}spam selects tags named spam in any (or no) namespace, and {}* only selects tags that are not in a namespace.

Changed in version 3.8: Support for star-wildcards was added.

*

Selects all child elements, including comments and processing instructions. For example, */egg selects all grandchildren named egg.

.

Selects the current node. This is mostly useful at the beginning of the path, to indicate that it’s a relative path.

//

Selects all subelements, on all levels beneath the current element. For example, .//egg selects all egg elements in the entire tree.

..

Selects the parent element. Returns None if the path attempts to reach the ancestors of the start element (the element find was called on).

[@attrib]

Selects all elements that have the given attribute.

[@attrib='value']

Selects all elements for which the given attribute has the given value. The value cannot contain quotes.

[@attrib!='value']

Selects all elements for which the given attribute does not have the given value. The value cannot contain quotes.

New in version 3.10.

[tag]

Selects all elements that have a child named tag. Only immediate children are supported.

[.='text']

Selects all elements whose complete text content, including descendants, equals the given text.

New in version 3.7.

[.!='text']

Selects all elements whose complete text content, including descendants, does not equal the given text.

New in version 3.10.

[tag='text']

Selects all elements that have a child named tag whose complete text content, including descendants, equals the given text.

[tag!='text']

Selects all elements that have a child named tag whose complete text content, including descendants, does not equal the given text.

New in version 3.10.

[position]

Selects all elements that are located at the given position. The position can be either an integer (1 is the first position), the expression last() (for the last position), or a position relative to the last position (e.g. last()-1).

Predicates (expressions within square brackets) must be preceded by a tag name, an asterisk, or another predicate. Position predicates must be preceded by a tag name.
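A few of the predicates listed above, tried on a cut-down version of the country data. This is only a sketch; the document string is abbreviated for illustration and covers just three of the predicate forms.

```python
import xml.etree.ElementTree as ET

# Abbreviated country data, for illustration only.
root = ET.fromstring(
    "<data>"
    "<country name='Liechtenstein'><rank>1</rank>"
    "<neighbor name='Austria'/><neighbor name='Switzerland'/></country>"
    "<country name='Singapore'><rank>4</rank>"
    "<neighbor name='Malaysia'/></country>"
    "</data>"
)

# [@attrib='value']: filter on an attribute value.
by_attr = [c.get("name") for c in root.findall("country[@name='Singapore']")]
print(by_attr)  # ['Singapore']

# [tag='text']: filter on the text content of a child element.
by_child_text = [c.get("name") for c in root.findall("country[rank='1']")]
print(by_child_text)  # ['Liechtenstein']

# [position]: last() selects the last matching child of each parent.
last_neighbors = [n.get("name") for n in root.findall(".//neighbor[last()]")]
print(last_neighbors)  # ['Switzerland', 'Malaysia']
```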

Reference

Functions

xml.etree.ElementTree.canonicalize(xml_data=None, *, out=None, from_file=None, **options)

C14N 2.0 transformation function.

Canonicalization is a way to normalise XML output in a way that allows byte-by-byte comparisons and digital signatures. It reduces the freedom that XML serializers have and instead generates a more constrained XML representation. The main restrictions regard the placement of namespace declarations, the ordering of attributes, and ignorable whitespace.

This function takes an XML data string (xml_data) or a file path or file-like object (from_file) as input, converts it to the canonical form, and writes it out using the out file(-like) object, if provided, or returns it as a text string if not. The output file receives text, not bytes. It should therefore be opened in text mode with utf-8 encoding.

Typical uses:

from xml.etree.ElementTree import canonicalize

xml_data = "<root>...</root>"
print(canonicalize(xml_data))

with open("c14n_output.xml", mode='w', encoding='utf-8') as out_file:
    canonicalize(xml_data, out=out_file)

with open("c14n_output.xml", mode='w', encoding='utf-8') as out_file:
    canonicalize(from_file="inputfile.xml", out=out_file)

The configuration options are as follows:

  • with_comments: set to true to include comments (default: false)

  • strip_text: set to true to strip whitespace before and after text content (default: false)

  • rewrite_prefixes: set to true to replace namespace prefixes by “n{number}” (default: false)

  • qname_aware_tags: a set of qname aware tag names in which prefixes should be replaced in text content (default: empty)

  • qname_aware_attrs: a set of qname aware attribute names in which prefixes should be replaced in text content (default: empty)

  • exclude_attrs: a set of attribute names that should not be serialised

  • exclude_tags: a set of tag names that should not be serialised

In the option list above, “a set” refers to any collection or iterable of strings; no ordering is expected.

New in version 3.8.
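As an illustration of why canonical form is useful for comparisons, the sketch below canonicalizes two inputs that differ only in attribute order, quoting and empty-element syntax, then tries the strip_text option. The input strings are made up for this example.

```python
from xml.etree.ElementTree import canonicalize

# Same content, different surface syntax.
first = canonicalize('<root b="2" a="1"><item/></root>')
second = canonicalize("<root a='1' b='2'><item></item></root>")
print(first)
print(first == second)  # True

# strip_text removes whitespace before and after text content.
stripped = canonicalize("<root>  <item> x </item>  </root>", strip_text=True)
print(stripped)  # <root><item>x</item></root>
```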

xml.etree.ElementTree.Comment(text=None)

Comment element factory. This factory function creates a special element that will be serialized as an XML comment by the standard serializer. The comment string can be either a bytestring or a Unicode string. text is a string containing the comment string. Returns an element instance representing a comment.

Note that XMLParser skips over comments in the input instead of creating comment objects for them. An ElementTree will only contain comment nodes if they have been inserted into the tree using one of the Element methods.

xml.etree.ElementTree.dump(elem)

Writes an element tree or element structure to sys.stdout. This function should be used for debugging only.

The exact output format is implementation dependent. In this version, it’s written as an ordinary XML file.

elem is an element tree or an individual element.

Changed in version 3.8: The dump() function now preserves the attribute order specified by the user.

xml.etree.ElementTree.fromstring(text, parser=None)

Parses an XML section from a string constant. Same as XML(). text is a string containing XML data. parser is an optional parser instance. If not given, the standard XMLParser parser is used. Returns an Element instance.

xml.etree.ElementTree.fromstringlist(sequence, parser=None)

Parses an XML document from a sequence of string fragments. sequence is a list or other sequence containing XML data fragments. parser is an optional parser instance. If not given, the standard XMLParser parser is used. Returns an Element instance.

New in version 3.2.
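A brief sketch of fromstringlist() with made-up fragments. The fragments are fed to the parser one after another, so they do not need to be split at tag boundaries.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragments, e.g. chunks collected from a network read.
fragments = ["<root><it", "em>text</item>", "</root>"]
root = ET.fromstringlist(fragments)

print(root.tag)      # root
print(root[0].text)  # text
```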

xml.etree.ElementTree.indent(tree, space='  ', level=0)

Appends whitespace to the subtree to indent the tree visually. This can be used to generate pretty-printed XML output. tree can be an Element or ElementTree. space is the whitespace string that will be inserted for each indentation level, two space characters by default. For indenting partial subtrees inside of an already indented tree, pass the initial indentation level as level.

New in version 3.9.
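A minimal sketch of indent() on a small hand-written tree, assuming Python 3.9 or later. The function modifies the tree in place by adjusting the text and tail whitespace of its elements.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<a><b/><c><d/></c></a>")
ET.indent(root)  # modifies the tree in place
pretty = ET.tostring(root, encoding="unicode")
print(pretty)
# <a>
#   <b />
#   <c>
#     <d />
#   </c>
# </a>
```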

xml.etree.ElementTree.iselement(element)

Checks if an object appears to be a valid element object. element is an element instance. Returns True if this is an element object.

xml.etree.ElementTree.iterparse(source, events=None, parser=None)

Parses an XML section into an element tree incrementally, and reports what’s going on to the user. source is a filename or file object containing XML data. events is a sequence of events to report back. The supported events are the strings "start", "end", "comment", "pi", "start-ns" and "end-ns" (the “ns” events are used to get detailed namespace information). If events is omitted, only "end" events are reported. parser is an optional parser instance. If not given, the standard XMLParser parser is used. parser must be a subclass of XMLParser and can only use the default TreeBuilder as a target. Returns an iterator providing (event, elem) pairs.

Note that while iterparse() builds the tree incrementally, it issues blocking reads on source (or the file it names). As such, it’s unsuitable for applications where blocking reads can’t be made. For fully non-blocking parsing, see XMLPullParser.

Note

iterparse() only guarantees that it has seen the “>” character of a starting tag when it emits a “start” event, so the attributes are defined, but the contents of the text and tail attributes are undefined at that point. The same applies to the element children; they may or may not be present.

If you need a fully populated element, look for “end” events instead.
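The order in which events arrive can be seen with a short sketch. The in-memory document here is an assumption for illustration; only the tags are recorded, since text is not guaranteed to be available at “start” time.

```python
import io
import xml.etree.ElementTree as ET

source = io.BytesIO(b"<root><item>text</item></root>")

# Request both event types explicitly; by default only "end" is reported.
events = [(event, elem.tag)
          for event, elem in ET.iterparse(source, events=("start", "end"))]
print(events)
# [('start', 'root'), ('start', 'item'), ('end', 'item'), ('end', 'root')]
```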

Deprecated since version 3.4: The parser argument.

Changed in version 3.8: The comment and pi events were added.

xml.etree.ElementTree.parse(source, parser=None)

Parses an XML section into an element tree. source is a filename or file object containing XML data. parser is an optional parser instance. If not given, the standard XMLParser parser is used. Returns an ElementTree instance.

xml.etree.ElementTree.ProcessingInstruction(target, text=None)

PI element factory. This factory function creates a special element that will be serialized as an XML processing instruction. target is a string containing the PI target. text is a string containing the PI contents, if given. Returns an element instance, representing a processing instruction.

Note that XMLParser skips over processing instructions in the input instead of creating PI objects for them. An ElementTree will only contain processing instruction nodes if they have been inserted into the tree using one of the Element methods.

xml.etree.ElementTree.register_namespace(prefix, uri)

Registers a namespace prefix. The registry is global, and any existing mapping for either the given prefix or the namespace URI will be removed. prefix is a namespace prefix. uri is a namespace URI. Tags and attributes in this namespace will be serialized with the given prefix, if at all possible.

New in version 3.2.
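A short sketch of how a registered prefix shows up in serialized output. The prefix "ex" and the URI http://example.com/ns are placeholders chosen for this example. Because the registry is global, the registration affects all later serialization in the same process.

```python
import xml.etree.ElementTree as ET

# Hypothetical prefix and namespace URI, chosen for this example.
ET.register_namespace("ex", "http://example.com/ns")

title = ET.Element("{http://example.com/ns}title")
title.text = "Example"
serialized = ET.tostring(title, encoding="unicode")
print(serialized)
# <ex:title xmlns:ex="http://example.com/ns">Example</ex:title>
```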

xml.etree.ElementTree.SubElement(parent, tag, attrib={}, **extra)

Subelement factory. This function creates an element instance, and appends it to an existing element.

The element name, attribute names, and attribute values can be either bytestrings or Unicode strings. parent is the parent element. tag is the subelement name. attrib is an optional dictionary, containing element attributes. extra contains additional attributes, given as keyword arguments. Returns an element instance.

xml.etree.ElementTree.tostring(element, encoding='us-ascii', method='xml', *, xml_declaration=None, default_namespace=None, short_empty_elements=True)

Generates a string representation of an XML element, including all subelements. element is an Element instance. encoding is the output encoding (default is US-ASCII). Use encoding="unicode" to generate a Unicode string (otherwise, a bytestring is generated). method is either "xml", "html" or "text" (default is "xml"). xml_declaration, default_namespace and short_empty_elements have the same meaning as in ElementTree.write(). Returns an (optionally) encoded string containing the XML data.

New in version 3.4: The short_empty_elements parameter.

New in version 3.8: The xml_declaration and default_namespace parameters.

Changed in version 3.8: The tostring() function now preserves the attribute order specified by the user.
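The effect of the encoding and short_empty_elements arguments can be seen in a small sketch. The element names and the text value are made up for illustration.

```python
import xml.etree.ElementTree as ET

elem = ET.Element("greeting")
elem.text = "caf\u00e9"

# Default encoding is US-ASCII: the result is bytes, and non-ASCII
# characters are written as character references.
as_bytes = ET.tostring(elem)
print(as_bytes)  # b'<greeting>caf&#233;</greeting>'

# encoding="unicode" returns a str instead of bytes.
as_text = ET.tostring(elem, encoding="unicode")
print(as_text)  # <greeting>café</greeting>

# short_empty_elements=False writes empty elements as a tag pair.
empty = ET.tostring(ET.Element("a"), encoding="unicode",
                    short_empty_elements=False)
print(empty)  # <a></a>
```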

xml.etree.ElementTree.tostringlist(element, encoding='us-ascii', method='xml', *, xml_declaration=None, default_namespace=None, short_empty_elements=True)

Generates a string representation of an XML element, including all subelements. element is an Element instance. encoding is the output encoding (default is US-ASCII). Use encoding="unicode" to generate a Unicode string (otherwise, a bytestring is generated). method is either "xml", "html" or "text" (default is "xml"). xml_declaration, default_namespace and short_empty_elements have the same meaning as in ElementTree.write(). Returns a list of (optionally) encoded strings containing the XML data. It does not guarantee any specific sequence, except that b"".join(tostringlist(element)) == tostring(element).

New in version 3.2.

New in version 3.4: The short_empty_elements parameter.

New in version 3.8: The xml_declaration and default_namespace parameters.

Changed in version 3.8: The tostringlist() function now preserves the attribute order specified by the user.

xml.etree.ElementTree.XML(text, parser=None)

Parses an XML section from a string constant. This function can be used to embed “XML literals” in Python code. text is a string containing XML data. parser is an optional parser instance. If not given, the standard XMLParser parser is used. Returns an Element instance.

xml.etree.ElementTree.XMLID(text, parser=None)

Parses an XML section from a string constant, and also returns a dictionary which maps from element id:s to elements. text is a string containing XML data. parser is an optional parser instance. If not given, the standard XMLParser parser is used. Returns a tuple containing an Element instance and a dictionary.
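A minimal sketch of XMLID() on a made-up document. The dictionary is keyed by the value of each element's id attribute.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with two elements carrying an id attribute.
root, ids = ET.XMLID('<doc><p id="intro">Hi</p><p id="end">Bye</p></doc>')

print(root.tag)           # doc
print(sorted(ids))        # ['end', 'intro']
print(ids["intro"].text)  # Hi
```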

XInclude support

This module provides limited support for XInclude directives, via the xml.etree.ElementInclude helper module. This module can be used to insert subtrees and text strings into element trees, based on information in the tree.

Example

Here’s an example that demonstrates use of the XInclude module. To include an XML document in the current document, use the {http://www.w3.org/2001/XInclude}include element and set the parse attribute to "xml", and use the href attribute to specify the document to include.

<?xml version="1.0"?>
<document xmlns:xi="http://www.w3.org/2001/XInclude">
<xi:include href="source.xml" parse="xml" />
</document>

By default, the href attribute is treated as a file name. You can use custom loaders to override this behaviour. Also note that the standard helper does not support XPointer syntax.

To process this file, load it as usual, and pass the root element to the xml.etree.ElementInclude module:

from xml.etree import ElementTree, ElementInclude
tree = ElementTree.parse("document.xml")
root = tree.getroot()

ElementInclude.include(root)

The ElementInclude module replaces the {http://www.w3.org/2001/XInclude}include element with the root element from the source.xml document. ElementInclude模块将替换{http://www.w3.org/2001/XInclude}include元素与source.xml文档中的根元素。The result might look something like this:结果可能如下:

<document xmlns:xi="http://www.w3.org/2001/XInclude">
<para>This is a paragraph.</para>
</document>

If the parse attribute is omitted, it defaults to “xml”. 如果省略了parse属性,则默认为“xml”。The href attribute is required.href属性是必需的。

To include a text document, use the {http://www.w3.org/2001/XInclude}include element, and set the parse attribute to “text”:要包含文本文档,请使用{http://www.w3.org/2001/XInclude}include元素,并将parse属性设置为“text”:

<?xml version="1.0"?>
<document xmlns:xi="http://www.w3.org/2001/XInclude">
Copyright (c) <xi:include href="year.txt" parse="text" />.
</document>

The result might look something like:

<document xmlns:xi="http://www.w3.org/2001/XInclude">
Copyright (c) 2003.
</document>
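The custom loader hook mentioned earlier can be sketched as follows. This is only an illustration under stated assumptions: RESOURCES and memory_loader are hypothetical names, and the loader serves includes from an in-memory dictionary instead of the file system. It follows the default_loader(href, parse, encoding=None) interface.

```python
from xml.etree import ElementTree, ElementInclude

# Hypothetical in-memory "files", keyed by the href used in the document.
RESOURCES = {
    "source.xml": "<para>This is a paragraph.</para>",
    "year.txt": "2003",
}

def memory_loader(href, parse, encoding=None):
    """Illustrative loader with the same interface as default_loader()."""
    data = RESOURCES[href]
    if parse == "xml":
        # For XML includes, hand back a parsed element.
        return ElementTree.fromstring(data)
    # For text includes, hand back a plain string.
    return data

root = ElementTree.fromstring(
    '<document xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<xi:include href="source.xml" parse="xml" />'
    '<note>Copyright (c) <xi:include href="year.txt" parse="text" />.</note>'
    '</document>'
)

# Expand both include directives using the custom loader.
ElementInclude.include(root, loader=memory_loader)

included = root[0]          # the <para> element that replaced xi:include
note_text = root[1].text    # text include merged into the surrounding text
print(included.tag, "|", note_text)
```

A loader like this can be handy in tests, because no files need to exist on disk.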

Reference

Functions

xml.etree.ElementInclude.default_loader(href, parse, encoding=None)

Default loader. This default loader reads an included resource from disk. href is a URL. parse is the parse mode, either "xml" or "text". encoding is an optional text encoding. If not given, the encoding is utf-8. Returns the expanded resource. If the parse mode is "xml", this is an Element instance (the root element of the parsed document). If the parse mode is "text", this is a Unicode string. If the loader fails, it can return None or raise an exception.

xml.etree.ElementInclude.include(elem, loader=None, base_url=None, max_depth=6)

This function expands XInclude directives. elem is the root element. loader is an optional resource loader. If omitted, it defaults to default_loader(). If given, it should be a callable that implements the same interface as default_loader(). base_url is the base URL of the original file, used to resolve relative include file references. max_depth is the maximum number of recursive inclusions. It is limited to reduce the risk of malicious content explosion. Pass a negative value to disable the limitation.

The directives are expanded in place in the tree rooted at elem. The loader returns the expanded resource as described for default_loader(); if the loader fails, it can return None or raise an exception.

New in version 3.9: The base_url and max_depth parameters.

Element Objects

class xml.etree.ElementTree.Element(tag, attrib={}, **extra)

Element class. This class defines the Element interface, and provides a reference implementation of this interface.

The element name, attribute names, and attribute values can be either bytestrings or Unicode strings. tag is the element name. attrib is an optional dictionary, containing element attributes. extra contains additional attributes, given as keyword arguments.

tag

A string identifying what kind of data this element represents (the element type, in other words).

text
tail

These attributes can be used to hold additional data associated with the element. Their values are usually strings but may be any application-specific object. If the element is created from an XML file, the text attribute holds either the text between the element’s start tag and its first child or end tag, or None, and the tail attribute holds either the text between the element’s end tag and the next tag, or None. For the XML data

<a><b>1<c>2<d/>3</c></b>4</a>

the a element has None for both text and tail attributes, the b element has text "1" and tail "4", the c element has text "2" and tail None, and the d element has text None and tail "3".

To collect the inner text of an element, see itertext(), for example "".join(element.itertext()).

Applications may store arbitrary objects in these attributes.
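The text and tail values described above can be checked directly. The following is a small sketch that parses the same sample data; the variable names are chosen only for illustration.

```python
import xml.etree.ElementTree as ET

a = ET.fromstring("<a><b>1<c>2<d/>3</c></b>4</a>")
b = a[0]
c = b[0]
d = c[0]

# Print tag, text and tail for each element in the sample.
for el in (a, b, c, d):
    print(el.tag, repr(el.text), repr(el.tail))

# Collect all inner text of <a>, in document order.
inner_text = "".join(a.itertext())
print(inner_text)
```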

attrib

A dictionary containing the element’s attributes. Note that while the attrib value is always a real mutable Python dictionary, an ElementTree implementation may choose to use another internal representation, and create the dictionary only if someone asks for it. To take advantage of such implementations, use the dictionary methods below whenever possible.

The following dictionary-like methods work on the element attributes.

clear()

Resets an element. This function removes all subelements, clears all attributes, and sets the text and tail attributes to None.

get(key, default=None)

Gets the element attribute named key.

Returns the attribute value, or default if the attribute was not found.

items()

Returns the element attributes as a sequence of (name, value) pairs. The attributes are returned in an arbitrary order.

keys()

Returns the element’s attribute names as a list. The names are returned in an arbitrary order.

set(key, value)

Set the attribute key on the element to value.
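As a brief sketch of the dictionary-like methods, the following creates an element with attributes and then reads, changes, and clears them. The element and attribute names are made up for the example.

```python
import xml.etree.ElementTree as ET

# Attributes can come from the attrib dict and from keyword arguments.
el = ET.Element("neighbor", {"name": "Austria"}, direction="E")
el.text = "some text"

name = el.get("name")                  # existing attribute
missing = el.get("population", "n/a")  # falls back to the default

el.set("direction", "W")               # overwrite an attribute
names = sorted(el.keys())              # sorted, since order should not be relied on
pairs = dict(el.items())

el.clear()                             # drops attributes, children, text and tail
print(name, missing, names, pairs, el.attrib, el.text)
```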

The following methods work on the element’s children (subelements).

append(subelement)

Adds the element subelement to the end of this element’s internal list of subelements. Raises TypeError if subelement is not an Element.

extend(subelements)

Appends subelements from a sequence object with zero or more elements. Raises TypeError if a subelement is not an Element.

New in version 3.2.

find(match, namespaces=None)

Finds the first subelement matching match. match may be a tag name or a path. Returns an element instance or None. namespaces is an optional mapping from namespace prefix to full name. Pass '' as prefix to move all unprefixed tag names in the expression into the given namespace.

findall(match, namespaces=None)

Finds all matching subelements, by tag name or path. Returns a list containing all matching elements in document order. namespaces is an optional mapping from namespace prefix to full name. Pass '' as prefix to move all unprefixed tag names in the expression into the given namespace.

findtext(match, default=None, namespaces=None)

Finds text for the first subelement matching match. match may be a tag name or a path. Returns the text content of the first matching element, or default if no element was found. Note that if the matching element has no text content an empty string is returned. namespaces is an optional mapping from namespace prefix to full name. Pass '' as prefix to move all unprefixed tag names in the expression into the given namespace.
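The three lookup methods can be compared side by side. This is a minimal sketch; the namespace URI http://example.com/geo and the prefix g are invented for the example.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<data xmlns:g="http://example.com/geo">'
    '<g:country name="Liechtenstein"><g:rank>1</g:rank><g:note/></g:country>'
    '<g:country name="Singapore"><g:rank>4</g:rank></g:country>'
    '</data>'
)

# Mapping from prefix (as used in the path expressions) to namespace URI.
ns = {"g": "http://example.com/geo"}

first = root.find("g:country", ns)                        # first match or None
names = [c.get("name") for c in root.findall("g:country", ns)]
rank = root.findtext("g:country/g:rank", namespaces=ns)   # text of first match

# An element that exists but has no text yields '', not the default.
empty = first.findtext("g:note", default="n/a", namespaces=ns)
# An element that does not exist yields the default.
absent = root.findtext("g:capital", default="n/a", namespaces=ns)

print(first.get("name"), names, rank, repr(empty), absent)
```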

insert(index, subelement)

Inserts subelement at the given position in this element. Raises TypeError if subelement is not an Element.
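A short sketch of how append(), extend() and insert() combine when building a list of children, including the TypeError raised for a non-Element argument. The tag names are arbitrary.

```python
import xml.etree.ElementTree as ET

root = ET.Element("root")
root.append(ET.Element("b"))                       # add one child at the end
root.extend([ET.Element("c"), ET.Element("d")])    # add several children
root.insert(0, ET.Element("a"))                    # add a child at position 0

tags = [child.tag for child in root]
print(tags)

# Only Element instances are accepted as children.
try:
    root.append("not an element")
    rejected = False
except TypeError:
    rejected = True
print(rejected)
```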

iter(tag=None)

Creates a tree iterator with the current element as the root. The iterator iterates over this element and all elements below it, in document (depth first) order. If tag is not None or '*', only elements whose tag equals tag are returned from the iterator. If the tree structure is modified during iteration, the result is undefined.

New in version 3.2.

iterfind(match, namespaces=None)

Finds all matching subelements, by tag name or path. Returns an iterable yielding all matching elements in document order. namespaces is an optional mapping from namespace prefix to full name.

New in version 3.2.

itertext()

Creates a text iterator. The iterator loops over this element and all subelements, in document order, and returns all inner text.

New in version 3.2.
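The three iterators differ in what they yield. The following sketch uses a cut-down version of the country data from the tutorial to show each of them.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<data>"
    "<country name='Liechtenstein'><rank>1</rank><neighbor name='Austria'/></country>"
    "<country name='Singapore'><rank>4</rank><neighbor name='Malaysia'/></country>"
    "</data>"
)

# iter() walks the element itself and everything below it, depth first.
all_tags = [el.tag for el in root.iter()]

# iter(tag) filters by tag at any depth.
neighbors = [el.get("name") for el in root.iter("neighbor")]

# iterfind() takes a path and yields matches lazily.
ranks = [el.text for el in root.iterfind("country/rank")]

# itertext() yields only the text pieces, in document order.
text = "".join(root.itertext())

print(all_tags, neighbors, ranks, text)
```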

makeelement(tag, attrib)

Creates a new element object of the same type as this element. Do not call this method; use the SubElement() factory function instead.

remove(subelement)

Removes subelement from the element. Unlike the find* methods this method compares elements based on the instance identity, not on tag value or contents.
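Because remove() works on instance identity, the usual pattern is to look elements up first and then pass the found objects to remove(). A minimal sketch, with made-up data and an arbitrary threshold of 50:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<data>"
    "<country name='Liechtenstein'><rank>1</rank></country>"
    "<country name='Panama'><rank>68</rank></country>"
    "</data>"
)

# findall() returns a list, so removing children while looping over it
# does not disturb the iteration.
for country in root.findall("country"):
    if int(country.findtext("rank")) > 50:
        root.remove(country)

remaining = [c.get("name") for c in root]
print(remaining)

# An equal-looking but distinct element is not considered a child.
lookalike = ET.fromstring("<country name='Liechtenstein'><rank>1</rank></country>")
try:
    root.remove(lookalike)
    removed_lookalike = True
except ValueError:
    removed_lookalike = False
print(removed_lookalike)
```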

Element objects also support the following sequence type methods for working with subelements: __delitem__(), __getitem__(), __setitem__(), __len__().

Caution: Elements with no subelements will test as False. This behavior will change in future versions. Use a specific len(elem) or elem is None test instead.

element = root.find('foo')

if not element:  # careful!
    print("element not found, or element has no subelements")

if element is None:
    print("element not found")

Prior to Python 3.8, the serialisation order of the XML attributes of elements was artificially made predictable by sorting the attributes by their name. Based on the now guaranteed ordering of dicts, this arbitrary reordering was removed in Python 3.8 to preserve the order in which attributes were originally parsed or created by user code.

In general, user code should try not to depend on a specific ordering of attributes, given that the XML Information Set explicitly excludes the attribute order from conveying information. Code should be prepared to deal with any ordering on input. In cases where deterministic XML output is required, e.g. for cryptographic signing or test data sets, canonical serialisation is available with the canonicalize() function.
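As a small illustration of canonical serialisation, the sketch below passes two spellings of the same element through canonicalize() (available from Python 3.8) and compares the results. The input strings are invented for the example.

```python
import xml.etree.ElementTree as ET

# Same element, different attribute order, quoting and empty-element form.
first = ET.canonicalize('<item b="2" a="1"/>')
second = ET.canonicalize("<item a='1'   b='2'></item>")

print(first)
print(first == second)
```

Since both inputs produce the same canonical string, the output can be compared or hashed without depending on how the source happened to be written.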

In cases where canonical output is not applicable but a specific attribute order is still desirable on output, code should aim for creating the attributes directly in the desired order, to avoid perceptual mismatches for readers of the code. In cases where this is difficult to achieve, a recipe like the following can be applied prior to serialisation to enforce an order independently from the Element creation:

def reorder_attributes(root):
    for el in root.iter():
        attrib = el.attrib
        if len(attrib) > 1:
            # adjust attribute order, e.g. by sorting
            attribs = sorted(attrib.items())
            attrib.clear()
            attrib.update(attribs)

ElementTree Objects

class xml.etree.ElementTree.ElementTree(element=None, file=None)

ElementTree wrapper class. This class represents an entire element hierarchy, and adds some extra support for serialization to and from standard XML.

element is the root element. The tree is initialized with the contents of the XML file if given.

_setroot(element)

Replaces the root element for this tree. This discards the current contents of the tree, and replaces it with the given element. Use with care. element is an element instance.

find(match, namespaces=None)

Same as Element.find(), starting at the root of the tree.

findall(match, namespaces=None)

Same as Element.findall(), starting at the root of the tree.

findtext(match, default=None, namespaces=None)

Same as Element.findtext(), starting at the root of the tree.

getroot()

Returns the root element for this tree.

iter(tag=None)

Creates and returns a tree iterator for the root element. The iterator loops over all elements in this tree, in document order. tag is the tag to look for (default is to return all elements).

iterfind(match, namespaces=None)

Same as Element.iterfind(), starting at the root of the tree.

New in version 3.2.

parse(source, parser=None)

Loads an external XML section into this element tree. source is a file name or file object. parser is an optional parser instance. If not given, the standard XMLParser parser is used. Returns the section root element.
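Since source may be a file object, parse() can also read from an in-memory stream. The following sketch uses io.BytesIO in place of a real file; the XML content is invented for the example.

```python
import io
import xml.etree.ElementTree as ET

# A binary stream standing in for a file opened with open(..., "rb").
source = io.BytesIO(b"<?xml version='1.0'?><data><item>1</item></data>")

tree = ET.ElementTree()
root = tree.parse(source)      # returns the root element of the loaded document

print(root.tag, root is tree.getroot(), tree.findtext("item"))
```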

write(file, encoding='us-ascii', xml_declaration=None, default_namespace=None, method='xml', *, short_empty_elements=True)

Writes the element tree to a file, as XML. file is a file name, or a file object opened for writing. encoding [1] is the output encoding (default is US-ASCII). xml_declaration controls if an XML declaration should be added to the file. Use False for never, True for always, None for only if not US-ASCII or UTF-8 or Unicode (default is None). default_namespace sets the default XML namespace (for "xmlns"). method is either "xml", "html" or "text" (default is "xml"). The keyword-only short_empty_elements parameter controls the formatting of elements that contain no content. If True (the default), they are emitted as a single self-closed tag, otherwise they are emitted as a pair of start/end tags.

The output is either a string (str) or binary (bytes). This is controlled by the encoding argument. If encoding is "unicode", the output is a string; otherwise, it’s binary. Note that this may conflict with the type of file if it’s an open file object; make sure you do not try to write a string to a binary stream and vice versa.

New in version 3.4: The short_empty_elements parameter.

Changed in version 3.8: The write() method now preserves the attribute order specified by the user.
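The interaction between encoding and the type of the target stream can be sketched with in-memory streams. io.BytesIO and io.StringIO stand in for files opened in binary and text mode; the document content is made up.

```python
import io
import xml.etree.ElementTree as ET

tree = ET.ElementTree(ET.fromstring("<root><empty/></root>"))

# A real encoding produces bytes, so the target must be a binary stream.
binary_out = io.BytesIO()
tree.write(binary_out, encoding="utf-8", xml_declaration=True)

# encoding="unicode" produces str, so the target must be a text stream.
text_out = io.StringIO()
tree.write(text_out, encoding="unicode", short_empty_elements=False)

print(binary_out.getvalue())
print(text_out.getvalue())
```

With short_empty_elements=False, the empty element is written as a start/end tag pair instead of a self-closed tag.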

This is the XML file that is going to be manipulated:

<html>
<head>
<title>Example page</title>
</head>
<body>
<p>Moved to <a href="http://example.org/">example.org</a>
or <a href="http://example.com/">example.com</a>.</p>
</body>
</html>

Example of changing the attribute "target" of every link in the first paragraph:

>>> from xml.etree.ElementTree import ElementTree
>>> tree = ElementTree()
>>> tree.parse("index.xhtml")
<Element 'html' at 0xb77e6fac>
>>> p = tree.find("body/p") # Finds first occurrence of tag p in body
>>> p
<Element 'p' at 0xb77ec26c>
>>> links = list(p.iter("a")) # Returns list of all links
>>> links
[<Element 'a' at 0xb77ec2ac>, <Element 'a' at 0xb77ec1cc>]
>>> for i in links: # Iterates through all found links
...     i.attrib["target"] = "blank"
>>> tree.write("output.xhtml")

QName Objects

class xml.etree.ElementTree.QName(text_or_uri, tag=None)

QName wrapper. This can be used to wrap a QName attribute value, in order to get proper namespace handling on output. text_or_uri is a string containing the QName value, in the form {uri}local, or, if the tag argument is given, the URI part of a QName. If tag is given, the first argument is interpreted as a URI, and this argument is interpreted as a local name. QName instances are opaque.
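A brief sketch of both ways to construct a QName, and of using one as an element tag. The namespace URI http://example.com/ns is invented for the example, and the prefix chosen on output is left to the serialiser.

```python
import xml.etree.ElementTree as ET

uri = "http://example.com/ns"

# Two equivalent spellings: URI plus local name, or the combined {uri}local form.
q1 = ET.QName(uri, "item")
q2 = ET.QName("{http://example.com/ns}item")
print(str(q1), str(q1) == str(q2))

# A QName can be used as a tag; the serialiser declares a prefix for its URI.
elem = ET.Element(q1)
serialised = ET.tostring(elem, encoding="unicode")
print(serialised)
```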

TreeBuilder Objects

class xml.etree.ElementTree.TreeBuilder(element_factory=None, *, comment_factory=None, pi_factory=None, insert_comments=False, insert_pis=False)

Generic element structure builder. This builder converts a sequence of start, data, end, comment and pi method calls to a well-formed element structure. You can use this class to build an element structure using a custom XML parser, or a parser for some other XML-like format.

element_factory, when given, must be a callable accepting two positional arguments: a tag and a dict of attributes. It is expected to return a new element instance.

The comment_factory and pi_factory functions, when given, should behave like the Comment() and ProcessingInstruction() functions to create comments and processing instructions. When not given, the default factories will be used. When insert_comments and/or insert_pis is true, comments/pis will be inserted into the tree if they appear within the root element (but not outside of it).

close()

Flushes the builder buffers, and returns the toplevel document element. Returns an Element instance.

data(data)

Adds text to the current element. data is a string. This should be either a bytestring, or a Unicode string.

end(tag)

Closes the current element. tag is the element name. Returns the closed element.

start(tag, attrs)

Opens a new element. tag is the element name. attrs is a dictionary containing element attributes. Returns the opened element.

comment(text)

Creates a comment with the given text. If insert_comments is true, this will also add it to the tree.

New in version 3.8.

pi(target, text)

Creates a processing instruction with the given target name and text. If insert_pis is true, this will also add it to the tree.

New in version 3.8.
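The builder can also be driven by hand, which is what a custom parser would do. A minimal sketch of the call sequence, with arbitrary tag names and text:

```python
import xml.etree.ElementTree as ET

builder = ET.TreeBuilder()

# Each start() must be balanced by a matching end().
builder.start("root", {"version": "1"})
builder.start("item", {})
builder.data("hello")
builder.end("item")
builder.end("root")

# close() hands back the finished top-level element.
root = builder.close()
result = ET.tostring(root, encoding="unicode")
print(result)
```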

In addition, a custom TreeBuilder object can provide the following methods:

doctype(name, pubid, system)

Handles a doctype declaration. name is the doctype name. pubid is the public identifier. system is the system identifier. This method does not exist on the default TreeBuilder class.

New in version 3.2.

start_ns(prefix, uri)

Is called whenever the parser encounters a new namespace declaration, before the start() callback for the opening element that defines it. prefix is '' for the default namespace and the declared namespace prefix name otherwise. uri is the namespace URI.

New in version 3.8.

end_ns(prefix)

Is called after the end() callback of an element that declared a namespace prefix mapping, with the name of the prefix that went out of scope.

New in version 3.8.
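To see when the namespace callbacks fire, a custom target can simply record every call it receives. The class below is a hypothetical illustration (NamespaceLogger is not part of the module), the namespace URIs are made up, and the sketch assumes Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

class NamespaceLogger:
    """Hypothetical parser target that records the callbacks it receives."""

    def __init__(self):
        self.events = []

    def start_ns(self, prefix, uri):
        self.events.append(("start-ns", prefix, uri))

    def end_ns(self, prefix):
        self.events.append(("end-ns", prefix))

    def start(self, tag, attrib):
        self.events.append(("start", tag))

    def end(self, tag):
        self.events.append(("end", tag))

    def close(self):
        return self.events

parser = ET.XMLParser(target=NamespaceLogger())
parser.feed(
    '<root xmlns="http://example.com/d" xmlns:x="http://example.com/x">'
    '<x:a/></root>'
)
events = parser.close()
for event in events:
    print(event)
```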

class xml.etree.ElementTree.C14NWriterTarget(write, *, with_comments=False, strip_text=False, rewrite_prefixes=False, qname_aware_tags=None, qname_aware_attrs=None, exclude_attrs=None, exclude_tags=None)

A C14N 2.0 writer. Arguments are the same as for the canonicalize() function. This class does not build a tree but translates the callback events directly into a serialised form using the write function.

New in version 3.8.
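One way to use this class is as the target of an XMLParser, with the write function of a text stream collecting the output. A small sketch, assuming Python 3.8 or later and an invented input document:

```python
import io
import xml.etree.ElementTree as ET

out = io.StringIO()

# The target writes canonical XML through out.write as parsing proceeds;
# strip_text=True drops the whitespace around the text content.
target = ET.C14NWriterTarget(out.write, strip_text=True)
parser = ET.XMLParser(target=target)
parser.feed('<doc b="2" a="1">  <item/>  </doc>')
parser.close()

canonical = out.getvalue()
print(canonical)
```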

XMLParser Objects

class xml.etree.ElementTree.XMLParser(*, target=None, encoding=None)

This class is the low-level building block of the module. It uses xml.parsers.expat for efficient, event-based parsing of XML. It can be fed XML data incrementally with the feed() method, and parsing events are translated to a push API by invoking callbacks on the target object. If target is omitted, the standard TreeBuilder is used. If encoding [1] is given, the value overrides the encoding specified in the XML file.

Changed in version 3.8: Parameters are now keyword-only. The html argument is no longer supported.

close()

Finishes feeding data to the parser. Returns the result of calling the close() method of the target passed during construction; by default, this is the toplevel document element.

feed(data)

Feeds data to the parser. data is encoded data.

XMLParser.feed() calls target's start(tag, attrs_dict) method for each opening tag, its end(tag) method for each closing tag, and data is processed by method data(data). For further supported callback methods, see the TreeBuilder class. XMLParser.close() calls target's method close(). XMLParser can be used not only for building a tree structure. This is an example of counting the maximum depth of an XML file:

>>> from xml.etree.ElementTree import XMLParser
>>> class MaxDepth:                     # The target object of the parser
...     maxDepth = 0
...     depth = 0
...     def start(self, tag, attrib):   # Called for each opening tag.
...         self.depth += 1
...         if self.depth > self.maxDepth:
...             self.maxDepth = self.depth
...     def end(self, tag):             # Called for each closing tag.
...         self.depth -= 1
...     def data(self, data):
...         pass            # We do not need to do anything with data.
...     def close(self):    # Called when all data has been parsed.
...         return self.maxDepth
...
>>> target = MaxDepth()
>>> parser = XMLParser(target=target)
>>> exampleXml = """
... <a>
...   <b>
...   </b>
...   <b>
...     <c>
...       <d>
...       </d>
...     </c>
...   </b>
... </a>"""
>>> parser.feed(exampleXml)
>>> parser.close()
4
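When no target is given, the same feed() and close() calls build an ordinary tree. The sketch below feeds a document in arbitrary pieces, as might happen when reading from a socket; the chunk boundaries are chosen only to show that they need not align with tags.

```python
import xml.etree.ElementTree as ET

parser = ET.XMLParser()   # default target: a TreeBuilder

# The chunks split the document in the middle of a tag and of the text.
for chunk in ("<root><it", "em>te", "xt</item></root>"):
    parser.feed(chunk)

root = parser.close()     # the top-level element built by the TreeBuilder
item_text = root.find("item").text
print(root.tag, item_text)
```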

XMLPullParser Objects

class xml.etree.ElementTree.XMLPullParser(events=None)

A pull parser suitable for non-blocking applications. Its input-side API is similar to that of XMLParser, but instead of pushing calls to a callback target, XMLPullParser collects an internal list of parsing events and lets the user read from it. events is a sequence of events to report back. The supported events are the strings "start", "end", "comment", "pi", "start-ns" and "end-ns" (the "ns" events are used to get detailed namespace information). If events is omitted, only "end" events are reported.

feed(data)

Feed the given bytes data to the parser.

close()

Signal the parser that the data stream is terminated. Unlike XMLParser.close(), this method always returns None. Any events not yet retrieved when the parser is closed can still be read with read_events().

read_events()

Return an iterator over the events which have been encountered in the data fed to the parser. The iterator yields (event, elem) pairs, where event is a string representing the type of event (e.g. "end") and elem is the encountered Element object, or other context value as follows.

  • start, end: the current Element.

  • comment, pi: the current comment / processing instruction

  • start-ns: a tuple (prefix, uri) naming the declared namespace mapping.

  • end-ns: None (this may change in a future version)

Events provided in a previous call to read_events() will not be yielded again. Events are consumed from the internal queue only when they are retrieved from the iterator, so multiple readers iterating in parallel over iterators obtained from read_events() will have unpredictable results.

Note

XMLPullParser only guarantees that it has seen the ">" character of a starting tag when it emits a "start" event, so the attributes are defined, but the contents of the text and tail attributes are undefined at that point. The same applies to the element children; they may or may not be present.

If you need a fully populated element, look for "end" events instead.

New in version 3.4.

Changed in version 3.8: The comment and pi events were added.
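The pull model can be sketched as a loop that alternates between feeding data and draining whatever events are ready. The chunks and the seen list are illustrative; how many events are available after each individual feed() may vary, so the sketch only relies on the overall sequence.

```python
import xml.etree.ElementTree as ET

parser = ET.XMLPullParser(["start", "end"])
seen = []

for chunk in ("<mytag>sometext", " more text</mytag>"):
    parser.feed(chunk)
    # Drain the events that are available so far; this never blocks.
    for event, elem in parser.read_events():
        seen.append((event, elem.tag))
        if event == "end":
            # Only at "end" is the element guaranteed to be fully populated.
            final_text = elem.text

parser.close()
# Pick up anything that only became available when the stream was closed.
for event, elem in parser.read_events():
    seen.append((event, elem.tag))
    if event == "end":
        final_text = elem.text

print(seen, final_text)
```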

Exceptions

class xml.etree.ElementTree.ParseError

XML parse error, raised by the various parsing methods in this module when parsing fails. The string representation of an instance of this exception will contain a user-friendly error message. In addition, it will have the following attributes available:

code

A numeric error code from the expat parser. See the documentation of xml.parsers.expat for the list of error codes and their meanings.

position

A tuple of line, column numbers, specifying where the error occurred.
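A short sketch of catching the exception and reading its attributes. The malformed input is invented for the example, and the lookup of the error description through xml.parsers.expat.ErrorString() is one possible way to interpret the code.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat

try:
    ET.fromstring("<a><b></a>")      # <b> is never closed
    failed = False
except ET.ParseError as err:
    failed = True
    message = str(err)               # user-friendly message
    error_code = err.code            # numeric expat error code
    line, column = err.position      # where the error occurred
    description = expat.ErrorString(error_code)

print(failed, message, error_code, (line, column), description)
```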

Footnotes

[1] The encoding string included in XML output should conform to the appropriate standards. For example, "UTF-8" is valid, but "UTF8" is not. See https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-EncodingDecl and https://www.iana.org/assignments/character-sets/character-sets.xhtml.