2. Lexical analysis

A Python program is read by a parser. Input to the parser is a stream of tokens, generated by the lexical analyzer. This chapter describes how the lexical analyzer breaks a file into tokens.

Python reads program text as Unicode code points; the encoding of a source file can be given by an encoding declaration and defaults to UTF-8 (see PEP 3120 for details). If the source file cannot be decoded, a SyntaxError is raised.

2.1. Line structure

A Python program is divided into a number of logical lines.

2.1.1. Logical lines

The end of a logical line is represented by the token NEWLINE. Statements cannot cross logical line boundaries except where NEWLINE is allowed by the syntax (e.g., between statements in compound statements). A logical line is constructed from one or more physical lines by following the explicit or implicit line joining rules.

2.1.2. Physical lines

A physical line is a sequence of characters terminated by an end-of-line sequence. In source files and strings, any of the standard platform line termination sequences can be used: the Unix form using ASCII LF (linefeed), the Windows form using the ASCII sequence CR LF (return followed by linefeed), or the old Macintosh form using the ASCII CR (return) character. All of these forms can be used equally, regardless of platform. The end of input also serves as an implicit terminator for the final physical line.

When embedding Python, source code strings should be passed to Python APIs using the standard C conventions for newline characters (the \n character, representing ASCII LF, is the line terminator).

2.1.3. Comments

A comment starts with a hash character (#) that is not part of a string literal, and ends at the end of the physical line. A comment signifies the end of the logical line unless the implicit line joining rules are invoked. Comments are ignored by the syntax.

2.1.4. Encoding declarations

If a comment in the first or second line of the Python script matches the regular expression coding[=:]\s*([-\w.]+), this comment is processed as an encoding declaration; the first group of this expression names the encoding of the source code file. The encoding declaration must appear on a line of its own. If it is the second line, the first line must also be a comment-only line. The recommended forms of an encoding expression are

# -*- coding: <encoding-name> -*-

which is also recognized by GNU Emacs, and

# vim:fileencoding=<encoding-name>

which is recognized by Bram Moolenaar's VIM.

If no encoding declaration is found, the default encoding is UTF-8. In addition, if the first bytes of the file are the UTF-8 byte-order mark (b'\xef\xbb\xbf'), the declared file encoding is UTF-8 (this is supported, among others, by Microsoft's notepad).

If an encoding is declared, the encoding name must be recognized by Python (see Standard Encodings). The encoding is used for all lexical analysis, including string literals, comments and identifiers.
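As an illustration of the declaration rule, the following sketch applies the regular expression quoted in this section to a few candidate comment lines. The helper name declared_encoding is hypothetical, and the sketch inspects a single line at a time; it does not reproduce the interpreter's full first-or-second-line logic or its byte-order-mark handling.

```python
import re

# The pattern quoted in this section; group 1 names the encoding.
CODING_RE = re.compile(r"coding[=:]\s*([-\w.]+)")


def declared_encoding(comment_line):
    """Return the encoding named in a comment-only line, or None.

    Hypothetical helper for illustration; not part of the standard library.
    """
    if not comment_line.lstrip().startswith("#"):
        return None  # the declaration must sit on a comment-only line
    match = CODING_RE.search(comment_line)
    return match.group(1) if match else None


emacs_style = declared_encoding("# -*- coding: latin-1 -*-")
vim_style = declared_encoding("# vim:fileencoding=utf-8")
no_declaration = declared_encoding("# just a comment")
```

For real files, the standard library's tokenize.detect_encoding() function performs this detection, including the byte-order-mark check.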

2.1.5. Explicit line joining

Two or more physical lines may be joined into logical lines using backslash characters (\), as follows: when a physical line ends in a backslash that is not part of a string literal or comment, it is joined with the following forming a single logical line, deleting the backslash and the following end-of-line character. For example:

if 1900 < year < 2100 and 1 <= month <= 12 \
   and 1 <= day <= 31 and 0 <= hour < 24 \
   and 0 <= minute < 60 and 0 <= second < 60:   # Looks like a valid date
        return 1

A line ending in a backslash cannot carry a comment. A backslash does not continue a comment. A backslash does not continue a token except for string literals (i.e., tokens other than string literals cannot be split across physical lines using a backslash). A backslash is illegal elsewhere on a line outside a string literal.
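The following sketch checks two of these rules with the built-in compile() and the ast module: a trailing backslash joins two physical lines into one statement, and a comment placed after the backslash is rejected. The variable names are illustrative only.

```python
import ast

# Two physical lines joined by a trailing backslash form one logical line.
joined = "total = 1 + \\\n        2\n"
statements = ast.parse(joined).body
namespace = {}
exec(joined, namespace)

# With a comment after it, the backslash no longer ends the physical line,
# so it is an illegal character "elsewhere on a line".
broken = "total = 1 + \\ # not allowed\n        2\n"
try:
    compile(broken, "<example>", "exec")
    comment_rejected = False
except SyntaxError:
    comment_rejected = True
```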

2.1.6. Implicit line joining

Expressions in parentheses, square brackets or curly braces can be split over more than one physical line without using backslashes. For example:

month_names = ['Januari', 'Februari', 'Maart',      # These are the
               'April',   'Mei',      'Juni',       # Dutch names
               'Juli',    'Augustus', 'September',  # for the months
               'Oktober', 'November', 'December']   # of the year

Implicitly continued lines can carry comments. The indentation of the continuation lines is not important. Blank continuation lines are allowed. There is no NEWLINE token between implicit continuation lines. Implicitly continued lines can also occur within triple-quoted strings (see below); in that case they cannot carry comments.
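One way to observe this is with the standard tokenize module. The sketch below tokenizes a bracketed expression spread over four physical lines, one of them blank and one carrying a comment, and counts the NEWLINE tokens. Note that tokenize reports the line breaks inside the brackets with its own NL token type, a convention of that module and not a token of the grammar described here.

```python
import io
import tokenize

source = (
    "values = [1,   # first\n"
    "\n"                      # blank continuation line
    "        2,\n"
    "   3]\n"                 # continuation indentation is not significant
)

tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

# Only the end of the whole logical line produces a NEWLINE token.
newline_count = sum(1 for tok in tokens if tok.type == tokenize.NEWLINE)
comment_count = sum(1 for tok in tokens if tok.type == tokenize.COMMENT)
```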

2.1.7. Blank lines

A logical line that contains only spaces, tabs, formfeeds and possibly a comment, is ignored (i.e., no NEWLINE token is generated). During interactive input of statements, handling of a blank line may differ depending on the implementation of the read-eval-print loop. In the standard interactive interpreter, an entirely blank logical line (i.e. one containing not even whitespace or a comment) terminates a multi-line statement.

2.1.8. Indentation

Leading whitespace (spaces and tabs) at the beginning of a logical line is used to compute the indentation level of the line, which in turn is used to determine the grouping of statements.

Tabs are replaced (from left to right) by one to eight spaces such that the total number of characters up to and including the replacement is a multiple of eight (this is intended to be the same rule as used by Unix). The total number of spaces preceding the first non-blank character then determines the line's indentation. Indentation cannot be split over multiple physical lines using backslashes; the whitespace up to the first backslash determines the indentation.
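The tab rule can be sketched as a small function. The name indentation_width and its tab_size parameter are hypothetical, chosen for illustration; the sketch ignores formfeed handling and is not the tokenizer's actual implementation.

```python
def indentation_width(line, tab_size=8):
    """Compute the indentation column of a line under the tab rule above."""
    column = 0
    for char in line:
        if char == " ":
            column += 1
        elif char == "\t":
            # Advance to the next multiple of tab_size (one to eight spaces).
            column = (column // tab_size + 1) * tab_size
        else:
            break  # first non-blank character ends the indentation
    return column


tab_only = indentation_width("\tx")
spaces_then_tab = indentation_width("  \tx")
tab_then_spaces = indentation_width("\t  x")
```

Here two spaces followed by a tab yield the same column as a single tab, which is why mixing the two can make indentation ambiguous.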

Indentation is rejected as inconsistent if a source file mixes tabs and spaces in a way that makes the meaning dependent on the worth of a tab in spaces; a TabError is raised in that case.

Cross-platform compatibility note: because of the nature of text editors on non-UNIX platforms, it is unwise to use a mixture of spaces and tabs for the indentation in a single source file. It should also be noted that different platforms may explicitly limit the maximum indentation level.

A formfeed character may be present at the start of the line; it will be ignored for the indentation calculations above. Formfeed characters occurring elsewhere in the leading whitespace have an undefined effect (for instance, they may reset the space count to zero).

The indentation levels of consecutive lines are used to generate INDENT and DEDENT tokens, using a stack, as follows.

Before the first line of the file is read, a single zero is pushed on the stack; this will never be popped off again. The numbers pushed on the stack will always be strictly increasing from bottom to top. At the beginning of each logical line, the line's indentation level is compared to the top of the stack. If it is equal, nothing happens. If it is larger, it is pushed on the stack, and one INDENT token is generated. If it is smaller, it must be one of the numbers occurring on the stack; all numbers on the stack that are larger are popped off, and for each number popped off a DEDENT token is generated. At the end of the file, a DEDENT token is generated for each number remaining on the stack that is larger than zero.
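The stack procedure described above can be sketched as follows. The function name indent_dedent_tokens, the "LINE" placeholder standing in for each logical line's own tokens, and the error message are assumptions made for illustration; the real tokenizer works on source text, not on a list of precomputed indentation levels.

```python
def indent_dedent_tokens(levels):
    """Turn per-line indentation levels into INDENT/DEDENT markers."""
    stack = [0]          # the initial zero is never popped
    tokens = []
    for level in levels:
        if level > stack[-1]:
            stack.append(level)
            tokens.append("INDENT")
        elif level < stack[-1]:
            if level not in stack:
                raise IndentationError(
                    "unindent does not match any outer indentation level"
                )
            while stack[-1] > level:
                stack.pop()
                tokens.append("DEDENT")
        tokens.append("LINE")  # placeholder for the line's own tokens
    # End of file: close every level that is still open.
    while stack[-1] > 0:
        stack.pop()
        tokens.append("DEDENT")
    return tokens


nested = indent_dedent_tokens([0, 4, 8, 4, 0])
open_at_eof = indent_dedent_tokens([0, 4, 8])
```

A sequence such as [0, 8, 4] raises IndentationError in this sketch, because 4 was never pushed on the stack.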

Here is an example of a correctly (though confusingly) indented piece of Python code:

def perm(l):
        # Compute the list of all permutations of l
    if len(l) <= 1:
                  return [l]
    r = []
    for i in range(len(l)):
             s = l[:i] + l[i+1:]
             p = perm(s)
             for x in p:
              r.append(l[i:i+1] + x)
    return r

The following example shows various indentation errors:

 def perm(l):                       # error: first line indented
for i in range(len(l)):             # error: not indented
    s = l[:i] + l[i+1:]
        p = perm(l[:i] + l[i+1:])   # error: unexpected indent
        for x in p:
                r.append(l[i:i+1] + x)
            return r                # error: inconsistent dedent

(Actually, the first three errors are detected by the parser; only the last error is found by the lexical analyzer: the indentation of return r does not match a level popped off the stack.)

2.1.9. Whitespace between tokens

Except at the beginning of a logical line or in string literals, the whitespace characters space, tab and formfeed can be used interchangeably to separate tokens. Whitespace is needed between two tokens only if their concatenation could otherwise be interpreted as a different token (e.g., ab is one token, but a b is two tokens).

2.2. Other tokens

Besides NEWLINE, INDENT and DEDENT, the following categories of tokens exist: identifiers, keywords, literals, operators, and delimiters. Whitespace characters (other than line terminators, discussed earlier) are not tokens, but serve to delimit tokens. Where ambiguity exists, a token comprises the longest possible string that forms a legal token, when read from left to right.
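The longest-match rule and the role of whitespace can be observed with the standard tokenize module. In this sketch the helper token_strings is hypothetical; it keeps only name, operator and number tokens and discards the bookkeeping tokens that tokenize also emits.

```python
import io
import tokenize


def token_strings(source):
    """Return the text of the NAME, OP and NUMBER tokens in source."""
    wanted = (tokenize.NAME, tokenize.OP, tokenize.NUMBER)
    readline = io.StringIO(source).readline
    return [
        tok.string
        for tok in tokenize.generate_tokens(readline)
        if tok.type in wanted
    ]


one_name = token_strings("ab\n")            # no whitespace: a single identifier
two_names = token_strings("a b\n")          # whitespace separates two identifiers
longest_operator = token_strings("a<<=b\n") # "<<=" wins over "<", "<<" or "<="
```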

2.3. Identifiers and keywords

Identifiers (also referred to as names) are described by the following lexical definitions.

The syntax of identifiers in Python is based on the Unicode standard annex UAX-31, with elaboration and changes as defined below; see also PEP 3131 for further details.

Within the ASCII range (U+0001..U+007F), the valid characters for identifiers are the same as in Python 2.x: the uppercase and lowercase letters A through Z, the underscore _ and, except for the first character, the digits 0 through 9.

Python 3.0 introduces additional characters from outside the ASCII range (see PEP 3131). For these characters, the classification uses the version of the Unicode Character Database as included in the unicodedata module.

Identifiers are unlimited in length. Case is significant.


identifier ::= xid_start xid_continue*
id_start ::= <all characters in general categories Lu, Ll, Lt, Lm, Lo, Nl, the underscore, and characters with the Other_ID_Start property>
id_continue ::= <all characters in id_start, plus characters in the categories Mn, Mc, Nd, Pc and others with the Other_ID_Continue property>
xid_start ::= <all characters in id_start whose NFKC normalization is in "id_start xid_continue*">
xid_continue ::= <all characters in id_continue whose NFKC normalization is in "id_continue*">

The Unicode category codes mentioned above stand for:

  • Lu - uppercase letters

  • Ll - lowercase letters

  • Lt - titlecase letters

  • Lm - modifier letters

  • Lo - other letters

  • Nl - letter numbers

  • Mn - nonspacing marks

  • Mc - spacing combining marks

  • Nd - decimal numbers

  • Pc - connector punctuations

  • Other_ID_Start - explicit list of characters in PropList.txt to support backwards compatibility

  • Other_ID_Continue - likewise

All identifiers are converted into the normal form NFKC while parsing; comparison of identifiers is based on NFKC.
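A short sketch of what NFKC normalization means in practice, using str.isidentifier() and unicodedata.normalize() from the standard library. The example identifier, which begins with U+FB01 (LATIN SMALL LIGATURE FI), is chosen purely for illustration.

```python
import unicodedata

ligature_name = "\ufb01le"   # begins with U+FB01 LATIN SMALL LIGATURE FI
normalized = unicodedata.normalize("NFKC", ligature_name)

# Assigning through the ligature spelling binds the normalized name.
namespace = {}
exec(ligature_name + " = 1", namespace)

valid = "caf\u00e9".isidentifier()   # non-ASCII letters are allowed
invalid = "1abc".isidentifier()      # a digit cannot start an identifier
```

After the exec() call, the namespace contains the key "file", not the ligature spelling, because the identifier was normalized while parsing.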

A non-normative HTML file listing all valid identifier characters for Unicode 4.1 can be found at https://www.unicode.org/Public/13.0.0/ucd/DerivedCoreProperties.txt

2.3.1. Keywords

The following identifiers are used as reserved words, or keywords of the language, and cannot be used as ordinary identifiers. They must be spelled exactly as written here:

False      await      else       import     pass
None       break      except     in         raise
True       class      finally    is         return
and        continue   for        lambda     try
as         def        from       nonlocal   while
assert     del        global     not        with
async      elif       if         or         yield

2.3.2. Soft Keywords

New in version 3.10.

Some identifiers are only reserved under specific contexts. These are known as soft keywords. The identifiers match, case and _ can syntactically act as keywords in contexts related to the pattern matching statement, but this distinction is done at the parser level, not when tokenizing.

As soft keywords, their use with pattern matching is possible while still preserving compatibility with existing code that uses match, case and _ as identifier names.
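The difference between reserved words and soft keywords can be checked with the standard keyword module and the built-in compile(). This is a minimal sketch; it deliberately avoids the match statement itself so that it does not depend on running under Python 3.10 or later.

```python
import keyword

hard = keyword.iskeyword("lambda")          # a reserved word
soft_is_hard = keyword.iskeyword("match")   # soft keywords are not reserved

# Soft keywords remain usable as ordinary identifiers.
namespace = {}
exec("match = 1\ncase = 2\n_ = match + case", namespace)

# A reserved word cannot be used as an identifier.
try:
    compile("lambda = 1", "<example>", "exec")
    reserved_rejected = False
except SyntaxError:
    reserved_rejected = True
```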

2.3.3. Reserved classes of identifiers

Certain classes of identifiers (besides keywords) have special meanings. These classes are identified by the patterns of leading and trailing underscore characters:

_*

Not imported by from module import *.

_

In a case pattern within a match statement, _ is a soft keyword that denotes a wildcard.

Separately, the interactive interpreter makes the result of the last evaluation available in the variable _. (It is stored in the builtins module, alongside built-in functions like print.)

Elsewhere, _ is a regular identifier. It is often used to name "special" items, but it is not special to Python itself.

Note

The name _ is often used in conjunction with internationalization; refer to the documentation for the gettext module for more information on this convention.

It is also commonly used for unused variables.

__*__

System-defined names, informally known as "dunder" names. These names are defined by the interpreter and its implementation (including the standard library). Current system names are discussed in the Special method names section and elsewhere. More will likely be defined in future versions of Python. Any use of __*__ names, in any context, that does not follow explicitly documented use, is subject to breakage without warning.

__*

Class-private names. Names in this category, when used within the context of a class definition, are re-written to use a mangled form to help avoid name clashes between "private" attributes of base and derived classes. See section Identifiers (Names).
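As a small illustration of class-private name mangling, the sketch below defines a class with a __balance attribute and inspects the instance dictionary. The class and attribute names are hypothetical.

```python
class Account:
    def __init__(self):
        self.__balance = 100      # stored as _Account__balance

    def balance(self):
        return self.__balance     # rewritten the same way inside the class


account = Account()
mangled_present = "_Account__balance" in vars(account)
plain_present = "__balance" in vars(account)
```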

2.4. Literals

Literals are notations for constant values of some built-in types.

2.4.1. String and Bytes literals

String literals are described by the following lexical definitions:


stringliteral ::= [stringprefix](shortstring | longstring)
stringprefix ::= "r" | "u" | "R" | "U" | "f" | "F"
                 | "fr" | "Fr" | "fR" | "FR" | "rf" | "rF" | "Rf" | "RF"
shortstring ::= "'" shortstringitem* "'" | '"' shortstringitem* '"'
longstring ::= "'''" longstringitem* "'''" | '"""' longstringitem* '"""'
shortstringitem ::= shortstringchar | stringescapeseq
longstringitem ::= longstringchar | stringescapeseq
shortstringchar ::= <any source character except "\" or newline or the quote>
longstringchar ::= <any source character except "\">
stringescapeseq ::= "\" <any source character>

bytesliteral ::= bytesprefix(shortbytes | longbytes)
bytesprefix ::= "b" | "B" | "br" | "Br" | "bR" | "BR" | "rb" | "rB" | "Rb" | "RB"
shortbytes ::= "'" shortbytesitem* "'" | '"' shortbytesitem* '"'
longbytes ::= "'''" longbytesitem* "'''" | '"""' longbytesitem* '"""'
shortbytesitem ::= shortbyteschar | bytesescapeseq
longbytesitem ::= longbyteschar | bytesescapeseq
shortbyteschar ::= <any ASCII character except "\" or newline or the quote>
longbyteschar ::= <any ASCII character except "\">
bytesescapeseq ::= "\" <any ASCII character>

One syntactic restriction not indicated by these productions is that whitespace is not allowed between the stringprefix or bytesprefix and the rest of the literal. The source character set is defined by the encoding declaration; it is UTF-8 if no encoding declaration is given in the source file; see section Encoding declarations.

In plain English: Both types of literals can be enclosed in matching single quotes (') or double quotes ("). They can also be enclosed in matching groups of three single or double quotes (these are generally referred to as triple-quoted strings). The backslash (\) character is used to escape characters that otherwise have a special meaning, such as newline, backslash itself, or the quote character.

Bytes literals are always prefixed with 'b' or 'B'; they produce an instance of the bytes type instead of the str type. They may only contain ASCII characters; bytes with a numeric value of 128 or greater must be expressed with escapes.

Both string and bytes literals may optionally be prefixed with a letter 'r' or 'R'; such strings are called raw strings and treat backslashes as literal characters. As a result, in string literals, '\U' and '\u' escapes in raw strings are not treated specially. Given that Python 2.x's raw unicode literals behave differently than Python 3.x's, the 'ur' syntax is not supported.
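The following sketch contrasts the literal kinds described above: a bytes literal whose non-ASCII bytes are written with escapes, and raw literals in which the backslash is kept as an ordinary character. The sample values are illustrative only.

```python
text = "caf\u00e9"
data = b"caf\xc3\xa9"      # non-ASCII bytes must be written with escapes

raw = r"\n"                # two characters: a backslash and the letter n
cooked = "\n"              # one character: a line feed
raw_bytes = rb"\x41"       # four bytes; the escape is not interpreted
```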

New in version 3.3: The 'rb' prefix of raw bytes literals has been added as a synonym of 'br'.

New in version 3.3: Support for the unicode legacy literal (u'value') was reintroduced to simplify the maintenance of dual Python 2.x and 3.x codebases. See PEP 414 for more information.

A string literal with 'f' or 'F' in its prefix is a formatted string literal; see Formatted string literals. The 'f' may be combined with 'r', but not with 'b' or 'u', therefore raw formatted strings are possible, but formatted bytes literals are not.

In triple-quoted literals, unescaped newlines and quotes are allowed (and are retained), except that three unescaped quotes in a row terminate the literal. (A "quote" is the character used to open the literal, i.e. either ' or ".)

Unless an 'r' or 'R' prefix is present, escape sequences in string and bytes literals are interpreted according to rules similar to those used by Standard C. The recognized escape sequences are:

Escape Sequence   Meaning                           Notes
\newline          Backslash and newline ignored
\\                Backslash (\)
\'                Single quote (')
\"                Double quote (")
\a                ASCII Bell (BEL)
\b                ASCII Backspace (BS)
\f                ASCII Formfeed (FF)
\n                ASCII Linefeed (LF)
\r                ASCII Carriage Return (CR)
\t                ASCII Horizontal Tab (TAB)
\v                ASCII Vertical Tab (VT)
\ooo              Character with octal value ooo    (1,3)
\xhh              Character with hex value hh       (2,3)

Escape sequences only recognized in string literals are:

Escape Sequence   Meaning                                         Notes
\N{name}          Character named name in the Unicode database    (4)
\uxxxx            Character with 16-bit hex value xxxx            (5)
\Uxxxxxxxx        Character with 32-bit hex value xxxxxxxx        (6)

Notes:

  1. As in Standard C, up to three octal digits are accepted.

  2. Unlike in Standard C, exactly two hex digits are required.

  3. In a bytes literal, hexadecimal and octal escapes denote the byte with the given value. In a string literal, these escapes denote a Unicode character with the given value.

  4. Changed in version 3.3: Support for name aliases has been added.

  5. Exactly four hex digits are required.

  6. Any Unicode character can be encoded this way. Exactly eight hex digits are required.
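A few of the escape sequences from the tables above, written out as a sketch. The particular characters are chosen for illustration; the last two lines show note 3, where the same \xhh escape denotes a byte in a bytes literal and a Unicode character in a string literal.

```python
octal = "\101"                                  # octal 101 is "A"
hexadecimal = "\x41"                            # hex 41 is also "A"
short_unicode = "\u00e9"                        # exactly four hex digits
named = "\N{LATIN SMALL LETTER E WITH ACUTE}"   # the same character, by name
long_unicode = "\U0001F600"                     # exactly eight hex digits

byte_hex = b"\xe9"    # a single byte with value 233
text_hex = "\xe9"     # the Unicode character with code point 233
```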

Unlike Standard C, all unrecognized escape sequences are left in the string unchanged, i.e., the backslash is left in the result. (This behavior is useful when debugging: if an escape sequence is mistyped, the resulting output is more easily recognized as broken.) It is also important to note that the escape sequences only recognized in string literals fall into the category of unrecognized escapes for bytes literals.

Changed in version 3.6: Unrecognized escape sequences produce a DeprecationWarning. In a future Python version they will be a SyntaxWarning and eventually a SyntaxError.

Even in a raw literal, quotes can be escaped with a backslash, but the backslash remains in the result; for example, r"\"" is a valid string literal consisting of two characters: a backslash and a double quote; r"\" is not a valid string literal (even a raw string cannot end in an odd number of backslashes). Specifically, a raw literal cannot end in a single backslash (since the backslash would escape the following quote character). Note also that a single backslash followed by a newline is interpreted as those two characters as part of the literal, not as a line continuation.

2.4.2. String literal concatenation

Multiple adjacent string or bytes literals (delimited by whitespace), possibly using different quoting conventions, are allowed, and their meaning is the same as their concatenation. Thus, "hello" 'world' is equivalent to "helloworld". This feature can be used to reduce the number of backslashes needed, to split long strings conveniently across long lines, or even to add comments to parts of strings, for example:

re.compile("[A-Za-z_]"       # letter or underscore
           "[A-Za-z0-9_]*"   # letter, digit or underscore
          )

Note that this feature is defined at the syntactical level, but implemented at compile time. The '+' operator must be used to concatenate string expressions at run time. Also note that literal concatenation can use different quoting styles for each component (even mixing raw strings and triple quoted strings), and formatted string literals may be concatenated with plain string literals.
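That adjacent literals are merged before run time can be seen by parsing an expression with the ast module: the result is a single constant, with no addition involved. This is a sketch; the sample strings are illustrative.

```python
import ast

# Two adjacent literals parse to one constant node, not a "+" expression.
expr = ast.parse("\"hello\" 'world'", mode="eval").body

combined = "hello" 'world'
mixed = r"\d+" "\n" '''end'''   # raw, ordinary and triple-quoted pieces
```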

2.4.3. Formatted string literals

New in version 3.6.

A formatted string literal or f-string is a string literal that is prefixed with 'f' or 'F'. These strings may contain replacement fields, which are expressions delimited by curly braces {}. While other string literals always have a constant value, formatted strings are really expressions evaluated at run time.

Escape sequences are decoded like in ordinary string literals (except when a literal is also marked as a raw string). After decoding, the grammar for the contents of the string is:


f_string ::= (literal_char | "{{" | "}}" | replacement_field)*
replacement_field ::= "{" f_expression ["="] ["!" conversion] [":" format_spec] "}"
f_expression      ::= (conditional_expression | "*" or_expr)
                      ("," conditional_expression | "," "*" or_expr)* [","]
                      | yield_expression
conversion ::= "s" | "r" | "a"
format_spec ::= (literal_char | NULL | replacement_field)*
literal_char ::= <any code point except "{", "}" or NULL>

The parts of the string outside curly braces are treated literally, except that any doubled curly braces '{{' or '}}' are replaced with the corresponding single curly brace. A single opening curly bracket '{' marks a replacement field, which starts with a Python expression. To display both the expression text and its value after evaluation (useful in debugging), an equal sign '=' may be added after the expression. A conversion field, introduced by an exclamation point '!', may follow. A format specifier may also be appended, introduced by a colon ':'. A replacement field ends with a closing curly bracket '}'.

Expressions in formatted string literals are treated like regular Python expressions surrounded by parentheses, with a few exceptions. An empty expression is not allowed, and both lambda and assignment expressions := must be surrounded by explicit parentheses. Replacement expressions can contain line breaks (e.g. in triple-quoted strings), but they cannot contain comments. Each expression is evaluated in the context where the formatted string literal appears, in order from left to right.

Changed in version 3.7: Prior to Python 3.7, an await expression and comprehensions containing an async for clause were illegal in the expressions in formatted string literals due to a problem with the implementation.

When the equal sign '=' is provided, the output will have the expression text, the '=' and the evaluated value. Spaces after the opening brace '{', within the expression and after the '=' are all retained in the output. By default, the '=' causes the repr() of the expression to be provided, unless there is a format specified. When a format is specified it defaults to the str() of the expression unless a conversion '!r' is declared.

New in version 3.8: The equal sign '='.

If a conversion is specified, the result of evaluating the expression is converted before formatting. Conversion '!s' calls str() on the result, '!r' calls repr(), and '!a' calls ascii().

The result is then formatted using the format() protocol. The format specifier is passed to the __format__() method of the expression or conversion result. An empty string is passed when the format specifier is omitted. The formatted result is then included in the final value of the whole string.
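To make the format() protocol concrete, the sketch below defines a hypothetical class Probe whose __format__() method simply reports the specifier it receives. It shows that an omitted specifier arrives as an empty string, and that a conversion is applied before formatting, so the specifier then goes to the converted str object.

```python
class Probe:
    """Hypothetical class that reveals the format specifier it is given."""

    def __format__(self, spec):
        return f"<spec={spec!r}>"

    def __str__(self):
        return "probe"


no_spec = f"{Probe()}"          # __format__ receives the empty string
with_spec = f"{Probe():>10}"    # __format__ receives ">10"
converted = f"{Probe()!s:>8}"   # str() first, then ">8" is applied to "probe"
```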

Top-level format specifiers may include nested replacement fields. These nested fields may include their own conversion fields and format specifiers, but may not include more deeply-nested replacement fields. The format specifier mini-language is the same as that used by the str.format() method.

Formatted string literals may be concatenated, but replacement fields cannot be split across literals.

Some examples of formatted string literals:

>>> name = "Fred"
>>> f"He said his name is {name!r}."
"He said his name is 'Fred'."
>>> f"He said his name is {repr(name)}." # repr() is equivalent to !r
"He said his name is 'Fred'."
>>> width = 10
>>> precision = 4
>>> import decimal
>>> value = decimal.Decimal("12.34567")
>>> f"result: {value:{width}.{precision}}" # nested fields
'result:      12.35'
>>> from datetime import datetime
>>> today = datetime(year=2017, month=1, day=27)
>>> f"{today:%B %d, %Y}" # using date format specifier
'January 27, 2017'
>>> f"{today=:%B %d, %Y}" # using date format specifier and debugging
'today=January 27, 2017'
>>> number = 1024
>>> f"{number:#0x}" # using integer format specifier
'0x400'
>>> foo = "bar"
>>> f"{ foo = }" # preserves whitespace
" foo = 'bar'"
>>> line = "The mill's closed"
>>> f"{line = }"
'line = "The mill\'s closed"'
>>> f"{line = :20}"
"line = The mill's closed   "
>>> f"{line = !r:20}"
'line = "The mill\'s closed" '

A consequence of sharing the same syntax as regular string literals is that characters in the replacement fields must not conflict with the quoting used in the outer formatted string literal:

f"abc {a["x"]} def"    # error: outer string literal ended prematurely
f"abc {a['x']} def" # workaround: use different quoting

Backslashes are not allowed in format expressions and will raise an error:

f"newline: {ord('\n')}"  # raises SyntaxError

To include a value in which a backslash escape is required, create a temporary variable.

>>> newline = ord('\n')
>>> f"newline: {newline}"
'newline: 10'

Formatted string literals cannot be used as docstrings, even if they do not include expressions.

>>> def foo():
...     f"Not a docstring"
...
>>> foo.__doc__ is None
True

See also PEP 498 for the proposal that added formatted string literals, and str.format(), which uses a related format string mechanism.

2.4.4. Numeric literals

There are three types of numeric literals: integers, floating point numbers, and imaginary numbers. There are no complex literals (complex numbers can be formed by adding a real number and an imaginary number).

Note that numeric literals do not include a sign; a phrase like -1 is actually an expression composed of the unary operator '-' and the literal 1.
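
One way to observe this is to run the standard tokenize module over the text -1. The helper name significant_tokens below is made up for this sketch; it simply drops layout tokens such as NEWLINE so that only the operator and the literal remain.

```python
import io
import tokenize


def significant_tokens(source):
    """Return (token name, text) pairs, skipping layout-only tokens."""
    skip = {tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER}
    readline = io.StringIO(source).readline
    return [
        (tokenize.tok_name[tok.exact_type], tok.string)
        for tok in tokenize.generate_tokens(readline)
        if tok.type not in skip
    ]


# The minus sign and the digit arrive as two separate tokens.
print(significant_tokens("-1\n"))
```

The output lists an operator token for the minus sign followed by a NUMBER token for 1, which matches the description of -1 as an expression rather than a single literal.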

2.4.5. Integer literals

Integer literals are described by the following lexical definitions:


integer      ::=  decinteger | bininteger | octinteger | hexinteger
decinteger   ::=  nonzerodigit (["_"] digit)* | "0"+ (["_"] "0")*
bininteger   ::=  "0" ("b" | "B") (["_"] bindigit)+
octinteger   ::=  "0" ("o" | "O") (["_"] octdigit)+
hexinteger   ::=  "0" ("x" | "X") (["_"] hexdigit)+
nonzerodigit ::=  "1"..."9"
digit        ::=  "0"..."9"
bindigit     ::=  "0" | "1"
octdigit     ::=  "0"..."7"
hexdigit     ::=  digit | "a"..."f" | "A"..."F"

There is no limit for the length of integer literals apart from what can be stored in available memory.

Underscores are ignored for determining the numeric value of the literal. They can be used to group digits for enhanced readability. One underscore can occur between digits, and after base specifiers like 0x.
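
The underscore rules can be checked with a short sketch. The helper compiles_as_expression is a hypothetical name introduced here; it reports whether a piece of source text is accepted by the built-in compile().

```python
# Underscores do not change the value of a literal.
grouped = 100_000_000_000
after_prefix = 0x_FF
binary_groups = 0b_1110_0101

print(grouped == 100000000000)
print(after_prefix == 255)
print(binary_groups == 0b11100101)


def compiles_as_expression(source):
    """Return True if compile() accepts source as an expression."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True


# Only a single underscore between digits is allowed: doubled or
# trailing underscores are rejected.
print(compiles_as_expression("1_000"))
print(compiles_as_expression("1__000"))
print(compiles_as_expression("1000_"))
```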

Note that leading zeros in a non-zero decimal number are not allowed. This is for disambiguation with C-style octal literals, which Python used before version 3.0.
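
A small sketch of the leading-zero rule. The helper is_valid_expression is a hypothetical name for this example; source strings are passed to compile() so that the rejected form does not stop the script itself.

```python
def is_valid_expression(source):
    """Return True if compile() accepts source as an expression."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True


# A non-zero decimal literal may not start with 0 ...
print(is_valid_expression("012"))

# ... but literals that consist only of zeros are fine.
print(is_valid_expression("0"))
print(is_valid_expression("000"))

# Octal values need the explicit 0o prefix.
octal_twelve = 0o12
print(octal_twelve)
```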

Some examples of integer literals:

7     2147483647                        0o177    0b100110111
3     79228162514264337593543950336     0o377    0xdeadbeef
      100_000_000_000                   0b_1110_0101

Changed in version 3.6: Underscores are now allowed for grouping purposes in literals.

2.4.6. Floating point literals

Floating point literals are described by the following lexical definitions:


floatnumber   ::=  pointfloat | exponentfloat
pointfloat    ::=  [digitpart] fraction | digitpart "."
exponentfloat ::=  (digitpart | pointfloat) exponent
digitpart     ::=  digit (["_"] digit)*
fraction      ::=  "." digitpart
exponent      ::=  ("e" | "E") ["+" | "-"] digitpart

Note that the integer and exponent parts are always interpreted using radix 10. For example, 077e010 is legal, and denotes the same number as 77e10. The allowed range of floating point literals is implementation-dependent. As in integer literals, underscores are supported for digit grouping.
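
The radix and grouping rules above can be illustrated with a few comparisons. The variable names are made up for this sketch, and sys.float_info is printed only to show where an implementation reports its own range.

```python
import sys

# Leading zeros are legal in the integer and exponent parts of a float
# literal, and both parts are read in base 10.
padded = 077e010
plain = 77e10
print(padded == plain)

# Underscores group digits without changing the value.
grouped = 3.14_15_93
print(grouped == 3.141593)

# A digit part followed by a bare "." and a bare fraction are both floats.
print(10. == 10.0, .001 == 0.001)

# The representable range depends on the implementation.
print(sys.float_info.max)
```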

Some examples of floating point literals:

3.14    10.    .001    1e100    3.14e-10    0e0    3.14_15_93

Changed in version 3.6: Underscores are now allowed for grouping purposes in literals.

2.4.7. Imaginary literals

Imaginary literals are described by the following lexical definitions:


imagnumber ::= (floatnumber | digitpart) ("j" | "J")

An imaginary literal yields a complex number with a real part of 0.0. Complex numbers are represented as a pair of floating point numbers and have the same restrictions on their range. To create a complex number with a nonzero real part, add a floating point number to it, e.g., (3+4j).

Some examples of imaginary literals:

3.14j   10.j    10j     .001j   1e100j   3.14e-10j   3.14_15_93j
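
As a brief illustration of the description above, with variable names chosen only for this sketch: an imaginary literal on its own has a real part of 0.0, and adding a real number to it produces a complex value whose parts are both floats.

```python
# An imaginary literal is already a complex number with real part 0.0.
pure_imaginary = 10j
print(type(pure_imaginary).__name__, pure_imaginary.real)

# 3+4j is an expression: the integer 3 added to the imaginary literal 4j.
z = 3 + 4j
print(z.real, z.imag)

# Both components are stored as floating point numbers.
print(isinstance(z.real, float), isinstance(z.imag, float))
```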

2.5. Operators

The following tokens are operators:

+       -       *       **      /       //      %      @
<<      >>      &       |       ^       ~       :=
<       >       <=      >=      ==      !=

2.6. Delimiters

The following tokens serve as delimiters in the grammar:

(       )       [       ]       {       }
,       :       .       ;       @       =       ->
+=      -=      *=      /=      //=     %=      @=
&=      |=      ^=      >>=     <<=     **=

The period can also occur in floating-point and imaginary literals. A sequence of three periods has a special meaning as an ellipsis literal. The second half of the list, the augmented assignment operators, serve lexically as delimiters, but also perform an operation.
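
A sketch of how these cases look to the standard tokenize module, which reports operators and delimiters alike under the generic OP token type. The helper name op_tokens is hypothetical.

```python
import io
import tokenize


def op_tokens(source):
    """Return the text of every OP token found in source."""
    readline = io.StringIO(source).readline
    return [
        tok.string
        for tok in tokenize.generate_tokens(readline)
        if tok.type == tokenize.OP
    ]


# The period inside 1.5 belongs to the NUMBER token; += is one token.
print(op_tokens("x += 1.5\n"))

# Three periods form a single token, the ellipsis literal.
print(op_tokens("y = ...\n"))
print(... is Ellipsis)
```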

The following printing ASCII characters have special meaning as part of other tokens or are otherwise significant to the lexical analyzer:

'       "       #       \

The following printing ASCII characters are not used in Python. Their occurrence outside string literals and comments is an unconditional error:

$       ?       `
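
The rule can be checked with compile(). The helper name is_syntax_error is hypothetical; it reports whether a snippet is rejected before any code runs.

```python
def is_syntax_error(source):
    """Return True if compile() rejects source with a SyntaxError."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return True
    return False


# Each unused character is rejected when it appears in ordinary code.
rejected = {ch: is_syntax_error(f"a {ch} b") for ch in "$?`"}
print(rejected)

# Inside a string literal or a comment the same characters are harmless.
print(is_syntax_error('s = "$ ? `"  # $ ? `'))
```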

Footnotes

1

https://www.unicode.org/Public/11.0.0/ucd/NameAliases.txt