This chapter specifies the lexical structure of the Java programming language.
Programs are written in Unicode (3.1), but lexical translations are provided (3.2) so that Unicode escapes (3.3) can be used to include any Unicode character using only ASCII characters. Line terminators are defined (3.4) to support the different conventions of existing host systems while maintaining consistent line numbers.
The Unicode characters resulting from the lexical translations are reduced to a sequence of input elements (3.5), which are white space (3.6), comments (3.7), and tokens. The tokens are the identifiers (3.8), keywords (3.9), literals (3.10), separators (3.11), and operators (3.12) of the syntactic grammar.
Programs are written using the Unicode character set (1.7). Information about this character set and its associated character encodings may be found at https://www.unicode.org/.
The Java SE Platform tracks the Unicode Standard as it evolves. The precise version of Unicode used by a given release is specified in the documentation of the class Character.
Versions of the Java programming language prior to JDK 1.1 used Unicode 1.1.5. Upgrades to newer versions of the Unicode Standard occurred in JDK 1.1 (to Unicode 2.0), JDK 1.1.7 (to Unicode 2.1), Java SE 1.4 (to Unicode 3.0), Java SE 5.0 (to Unicode 4.0), Java SE 7 (to Unicode 6.0), Java SE 8 (to Unicode 6.2), Java SE 9 (to Unicode 8.0), Java SE 11 (to Unicode 10.0), Java SE 12 (to Unicode 11.0), Java SE 13 (to Unicode 12.1), and Java SE 15 (to Unicode 13.0).
The Unicode standard was originally designed as a fixed-width 16-bit character encoding. It has since been changed to allow for characters whose representation requires more than 16 bits. The range of legal code points is now U+0000 to U+10FFFF, using the hexadecimal U+n notation. Characters whose code points are greater than U+FFFF are called supplementary characters. To represent the complete range of characters using only 16-bit units, the Unicode standard defines an encoding called UTF-16. In this encoding, supplementary characters are represented as pairs of 16-bit code units, the first from the high-surrogates range (U+D800 to U+DBFF), and the second from the low-surrogates range (U+DC00 to U+DFFF). For characters in the range U+0000 to U+FFFF, the values of code points and UTF-16 code units are the same.
The Java programming language represents text in sequences of 16-bit code units, using the UTF-16 encoding.
Some APIs of the Java SE Platform, primarily in the Character class, use 32-bit integers to represent code points as individual entities. The Java SE Platform provides methods to convert between 16-bit and 32-bit representations.
This specification uses the terms code point and UTF-16 code unit where the representation is relevant, and the generic term character where the representation is irrelevant to the discussion.
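The distinction between code points and UTF-16 code units can be illustrated with a few methods of Character and String. This is only a sketch: the class name CodePointDemo, the helper toUtf16, and the choice of U+1D482 as the sample supplementary character are assumptions made for illustration.

```java
class CodePointDemo {
    // A supplementary character: U+1D482 MATHEMATICAL BOLD ITALIC SMALL A.
    static final int SAMPLE = 0x1D482;

    // Encodes a code point as UTF-16: one code unit up to U+FFFF,
    // otherwise a high surrogate followed by a low surrogate.
    static char[] toUtf16(int codePoint) {
        return Character.toChars(codePoint);
    }

    public static void main(String[] args) {
        char[] units = toUtf16(SAMPLE);
        String s = new String(units);
        System.out.println(units.length);                         // UTF-16 code units used
        System.out.println(Character.isHighSurrogate(units[0]));
        System.out.println(Character.isLowSurrogate(units[1]));
        System.out.println(s.length());                           // length counts code units
        System.out.println(s.codePointCount(0, s.length()));      // counts code points
        System.out.println(Integer.toHexString(s.codePointAt(0)));
    }
}
```

String.length() counts code units, so code that needs to treat a supplementary character as one entity would typically iterate by code point instead.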
Except for comments (3.7), identifiers (3.8), and the contents of character literals, string literals, and text blocks (3.10.4, 3.10.5, 3.10.6), all input elements (3.5) in a program are formed only from ASCII characters (or Unicode escapes (3.3) which result in ASCII characters).
ASCII (ANSI X3.4) is the American Standard Code for Information Interchange. The first 128 characters of the Unicode UTF-16 encoding are the ASCII characters.
A raw Unicode character stream is translated into a sequence of tokens, using the following three lexical translation steps, which are applied in turn:
1. A translation of Unicode escapes (3.3) in the raw stream of Unicode characters to the corresponding Unicode character. A Unicode escape of the form \uxxxx, where xxxx is a hexadecimal value, represents the UTF-16 code unit whose encoding is xxxx. This translation step allows any program to be expressed using only ASCII characters.
2. A translation of the Unicode stream resulting from step 1 into a stream of input characters and line terminators (3.4).
3. A translation of the stream of input characters and line terminators resulting from step 2 into a sequence of input elements (3.5) which, after white space (3.6) and comments (3.7) are discarded, comprise the tokens that are the terminal symbols of the syntactic grammar (2.3).
The longest possible translation is used at each step, even if the result does not ultimately make a correct program while another lexical translation would. There are two exceptions to account for situations that need more granular translation: in step 1, for the processing of contiguous \ characters (3.3), and in step 3, for the processing of contextual keywords and adjacent > characters (3.5).
The input characters a--b are tokenized as a, --, and b, which is not part of any grammatically correct program, even though the tokenization a, -, -, b could be part of a grammatically correct program. The tokenization a, -, -, b can be realized with the input characters a- -b (with an ASCII SP character between the two - characters).
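The effect of the "longest possible translation" rule on the two - characters can be seen in a short sketch. The class and method names below are hypothetical and exist only to make the example compilable.

```java
class LongestMatchDemo {
    static int minusNegated(int a, int b) {
        // Tokens: a, -, -, b. The space keeps the two - characters
        // from being combined into the single token --.
        return a - -b;
    }

    // Writing a--b instead would be tokenized as a, --, b,
    // which the compiler rejects.

    public static void main(String[] args) {
        System.out.println(minusNegated(5, 3)); // evaluates 5 - (-3)
    }
}
```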
It might be supposed that the raw input \\u1234 is translated to a \ character and (following the "longest possible" rule) a Unicode escape of the form \u1234. In fact, the leading \ character causes this raw input to be translated to seven distinct characters: \ \ u 1 2 3 4.
A compiler for the Java programming language ("Java compiler") first recognizes Unicode escapes in its raw input, translating the ASCII characters \u followed by four hexadecimal digits to a raw input character which denotes the UTF-16 code unit (3.1) for the indicated hexadecimal value. One Unicode escape can represent characters in the range U+0000 to U+FFFF; representing supplementary characters in the range U+010000 to U+10FFFF requires two consecutive Unicode escapes. All other characters in the compiler's raw input are recognized as raw input characters and passed unchanged.
This translation step results in a sequence of Unicode input characters, all of which are raw input characters (any Unicode escapes having been reduced to raw input characters).
The \, u, and hexadecimal digits here are all ASCII characters.
The UnicodeInputCharacter production is ambiguous because an ASCII \ character in the compiler's raw input could be reduced to either a RawInputCharacter or the \ of a UnicodeEscape (to be followed by an ASCII u). To avoid ambiguity, for each ASCII \ character in the compiler's raw input, input processing must consider the most recent raw input characters that resulted from this translation step:
If the most recent raw input character in the result was itself translated from a Unicode escape in the compiler's raw input, then the ASCII \ character is eligible to begin a Unicode escape.
For example, if the most recent raw input character in the result was a backslash that arose from a Unicode escape \u005c in the raw input, then an ASCII \ character appearing next in the raw input is eligible to begin another Unicode escape.
Otherwise, consider how many backslashes appeared contiguously as raw input characters in the result, back to a non-backslash character or the start of the result. (It is immaterial whether any such backslash arose from an ASCII \ character in the compiler's raw input or from a Unicode escape \u005c in the compiler's raw input.) If this number is even, then the ASCII \ character is eligible to begin a Unicode escape; if the number is odd, then the ASCII \ character is not eligible to begin a Unicode escape.
For example, the raw input "\\u2122=\u2122" results in the eleven characters " \ \ u 2 1 2 2 = ™ " because while the second ASCII \ character in the raw input is not eligible to begin a Unicode escape, the third ASCII \ character is eligible, and \u2122 is the Unicode encoding of the character ™.
If an eligible \ is not followed by u, then it is treated as a RawInputCharacter and remains part of the escaped Unicode stream.
If an eligible \ is followed by u, or more than one u, and the last u is not followed by four hexadecimal digits, then a compile-time error occurs.
The character produced by a Unicode escape does not participate in further Unicode escapes.
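The eligibility rule for \ characters can be sketched as a small translator over a raw input string. This is an illustrative reading of the rule as stated in this section, not a description of how any particular compiler is implemented. The names UnicodeEscapeSketch, translate, isEligible, and hexValue are hypothetical, and an IllegalArgumentException stands in for the compile-time error.

```java
class UnicodeEscapeSketch {

    // Applies the first lexical translation step to raw input.
    static String translate(String raw) {
        StringBuilder out = new StringBuilder();
        boolean lastFromEscape = false; // did the latest result character come from an escape?
        int i = 0;
        while (i < raw.length()) {
            char c = raw.charAt(i);
            boolean startsEscape = c == '\\'
                    && isEligible(out, lastFromEscape)
                    && i + 1 < raw.length()
                    && raw.charAt(i + 1) == 'u';
            if (!startsEscape) {
                out.append(c);          // passed through unchanged
                lastFromEscape = false;
                i++;
                continue;
            }
            int j = i + 1;
            while (j < raw.length() && raw.charAt(j) == 'u') {
                j++;                    // one or more u characters are allowed
            }
            if (j + 4 > raw.length()) {
                throw new IllegalArgumentException("incomplete escape at index " + i);
            }
            int value = 0;
            for (int k = j; k < j + 4; k++) {
                int digit = hexValue(raw.charAt(k));
                if (digit < 0) {
                    throw new IllegalArgumentException("bad hex digit at index " + k);
                }
                value = value * 16 + digit;
            }
            out.append((char) value);   // the escape denotes one UTF-16 code unit
            lastFromEscape = true;
            i = j + 4;
        }
        return out.toString();
    }

    // A backslash may begin an escape if the previous result character came from
    // an escape, or if an even number of backslashes ends the result so far.
    static boolean isEligible(CharSequence result, boolean lastFromEscape) {
        if (lastFromEscape) {
            return true;
        }
        int backslashes = 0;
        for (int k = result.length() - 1; k >= 0 && result.charAt(k) == '\\'; k--) {
            backslashes++;
        }
        return backslashes % 2 == 0;
    }

    // Value of an ASCII hexadecimal digit, or -1 if c is not one.
    static int hexValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    public static void main(String[] args) {
        // Backslashes are doubled here only because these are Java string literals.
        System.out.println(translate("\\\\u005a is Z"));
        System.out.println(translate("\\u005cu005a"));
    }
}
```

Tracking whether the most recent result character came from an escape is kept separate from counting backslashes, because the rule treats those two cases differently.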
For example, the raw input \u005cu005a results in the six characters \ u 0 0 5 a, because 005c is the Unicode value for a backslash. It does not result in the character Z, which is Unicode value 005a, because the backslash that resulted from processing the Unicode escape \u005c is not interpreted as the start of a further Unicode escape.
Note that \u005cu005a cannot be written in a string literal to denote the six characters \ u 0 0 5 a. This is because the first two characters resulting from translation, \ and u, are interpreted in a string literal as an illegal escape sequence (3.10.7).
Fortunately, the rule about contiguous backslash characters helps programmers to craft raw inputs that denote Unicode escapes in a string literal. Denoting the six characters \ u 0 0 5 a in a string literal simply requires another \ to be placed adjacent to the existing \, such as "\\u005a is Z". This works because the second \ in the raw input \\u005a is not eligible to begin a Unicode escape, so the first \ and the second \ are preserved as raw input characters, as are the next five characters u 0 0 5 a. The two \ characters are subsequently interpreted in a string literal as the escape sequence for a backslash, resulting in a string with the desired six characters \ u 0 0 5 a. Without the rule, the raw input \\u005a would be processed as a raw input character \ followed by a Unicode escape \u005a which becomes a raw input character Z; this would be unhelpful because \Z is an illegal escape sequence in a string literal. (Note that the rule translates \u005c\u005c to \\ because the translation of the first Unicode escape to a raw input character \ does not prevent the translation of the second Unicode escape to another raw input character \.)
The rule also allows programmers to craft raw inputs that denote escape sequences in a string literal. For example, the raw input \\\u006e results in the three characters \ \ n because the first \ and the second \ are preserved as raw input characters, while the third \ is eligible to begin a Unicode escape and thus \u006e is translated to a raw input character n. The three characters \ \ n are subsequently interpreted in a string literal as \n which denotes the escape sequence for a linefeed. (Note that \\\u006e may be written as \u005c\u005c\u006e because each Unicode escape \u005c is translated to a raw input character \ and so the remaining raw input \u006e is preceded by an even number of backslashes and processed as the Unicode escape for n.)
The Java programming language specifies a standard way of transforming a program written in Unicode into ASCII that changes a program into a form that can be processed by ASCII-based tools. The transformation involves converting any Unicode escapes in the source text of the program to ASCII by adding an extra u - for example, \uxxxx becomes \uuxxxx - while simultaneously converting non-ASCII characters in the source text to Unicode escapes containing a single u each.
This transformed version is equally acceptable to a Java compiler and represents the exact same program. The exact Unicode source can later be restored from this ASCII form by converting each escape sequence where multiple u's are present to a sequence of Unicode characters with one fewer u, while simultaneously converting each escape sequence with a single u to the corresponding single Unicode character.
A Java compiler should use the \uxxxx notation as an output format to display Unicode characters when a suitable font is not available.
A Java compiler next divides the sequence of Unicode input characters into lines by recognizing line terminators.
Lines are terminated by the ASCII characters CR, or LF, or CR LF. The two characters CR immediately followed by LF are counted as one line terminator, not two.
A line terminator specifies the termination of the // form of a comment (3.7).
The lines defined by line terminators may determine the line numbers produced by a Java compiler.
The result is a sequence of line terminators and input characters, which are the terminal symbols for the third step in the tokenization process.
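The counting rule for CR, LF, and CR LF can be sketched as a single pass over the input. The class name LineTerminatorDemo and the method countLineTerminators are hypothetical, and the sketch only counts terminators; it does not split the input into lines.

```java
class LineTerminatorDemo {
    // Counts line terminators, treating CR immediately followed by LF as one terminator.
    static int countLineTerminators(String input) {
        int count = 0;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '\r') {
                count++;
                if (i + 1 < input.length() && input.charAt(i + 1) == '\n') {
                    i++; // consume the LF of a CR LF pair
                }
            } else if (c == '\n') {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countLineTerminators("a\r\nb\nc\rd"));
    }
}
```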
The input characters and line terminators that result from Unicode escape processing (3.3) and then input line recognition (3.4) are reduced to a sequence of input elements.
Those input elements that are not white space or comments are tokens. The tokens are the terminal symbols of the syntactic grammar (2.3).
White space (3.6) and comments (3.7) can serve to separate tokens that, if adjacent, might be tokenized in another manner.
For example, the input characters - and = can form the operator token -= (3.12) only if there is no intervening white space or comment. As another example, the ten input characters staticvoid form a single identifier token while the eleven input characters static void (with an ASCII SP character between c and v) form a pair of keyword tokens, static and void, separated by white space.
As a special concession for compatibility with certain operating systems, the ASCII SUB character (\u001a, or control-Z) is ignored if it is the last character in the escaped input stream.
The Input production is ambiguous, meaning that for some sequences of input characters, there is more than one way to reduce the input characters to input elements (that is, to tokenize the input characters). Ambiguities are resolved as follows:
A sequence of input characters that could be reduced to either an identifier token or a literal token is always reduced to a literal token.
A sequence of input characters that could be reduced to either an identifier token or a reserved keyword token (3.9) is always reduced to a reserved keyword token.
A sequence of input characters that could be reduced to either a contextual keyword token or to other (non-keyword) tokens is reduced according to context, as specified in 3.9.
If the input character > appears in a type context (4.11), that is, as part of a Type or an UnannType in the syntactic grammar (4.1, 8.3), it is always reduced to the numerical comparison operator >, even when it could be combined with an adjacent > character to form a different operator.
Without this rule for > characters, two consecutive > brackets in a type such as List<List<String>> would be tokenized as the signed right shift operator >>, while three consecutive > brackets in a type such as List<List<List<String>>> would be tokenized as the unsigned right shift operator >>>. Worse, the tokenization of four or more consecutive > brackets in a type such as List<List<List<List<String>>>> would be ambiguous, as various combinations of >, >>, and >>> tokens could represent the > > > > characters.
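The two readings of adjacent > characters can be placed side by side in a short sketch. The class and method names are hypothetical; List.of assumes Java SE 9 or later.

```java
import java.util.ArrayList;
import java.util.List;

class AngleBracketDemo {
    static List<List<String>> nested() {
        // In this type context the closing >> is two > tokens,
        // not the signed right shift operator.
        List<List<String>> outer = new ArrayList<>();
        outer.add(new ArrayList<>(List.of("a", "b")));
        return outer;
    }

    static int shift(int value) {
        // In an expression, the same two characters form the single token >>.
        return value >> 2;
    }

    public static void main(String[] args) {
        System.out.println(nested().get(0).size());
        System.out.println(shift(-16));
    }
}
```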
Consider two tokens x and y in the resulting input stream. If x precedes y, then we say that x is to the left of y and that y is to the right of x.
For example, in this simple piece of code:
    class Empty {
    }
we say that the } token is to the right of the { token, even though it appears, in this two-dimensional representation, downward and to the left of the { token. This convention about the use of the words left and right allows us to speak, for example, of the right-hand operand of a binary operator or of the left-hand side of an assignment.
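White space is defined as the ASCII space character, horizontal tab character, form feed character, and line terminator characters (3.4).
There are two kinds of comments:
/* text */ is a traditional comment: all the text from the ASCII characters /* to the ASCII characters */ is ignored (as in C and C++).
// text is an end-of-line comment: all the text from the ASCII characters // to the end of the line is ignored (as in C++).
These productions imply all of the following properties:
Comments do not nest.
/* and */ have no special meaning in comments that begin with //.
// has no special meaning in comments that begin with /* or /**.
As a result, the following text is a single complete comment:
/* this comment /* // /** ends here: */
The lexical grammar implies that comments do not occur within character literals, string literals, or text blocks (3.10.4, 3.10.5, 3.10.6).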
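A brief sketch shows the single complete comment from the text in a compilable setting, together with a string literal whose contents merely look like a comment. The class name CommentDemo and the method literalText are hypothetical.

```java
class CommentDemo {
    /* this comment /* // /** ends here: */

    static String literalText() {
        // a /* in an end-of-line comment does not open a traditional comment
        return "/* not a comment */";
    }

    public static void main(String[] args) {
        System.out.println(literalText());
    }
}
```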
An identifier is an unlimited-length sequence of Java letters and Java digits, the first of which must be a Java letter.
A "Java letter" is a character for which the method Character.isJavaIdentifierStart(int) returns true.
A "Java letter-or-digit" is a character for which the method Character.isJavaIdentifierPart(int) returns true.
The "Java letters" include uppercase and lowercase ASCII Latin letters A-Z (\u0041-\u005a), and a-z (\u0061-\u007a), and, for historical reasons, the ASCII dollar sign ($, or \u0024) and underscore (_, or \u005f). The dollar sign should be used only in mechanically generated source code or, rarely, to access pre-existing names on legacy systems. The underscore may be used in identifiers formed of two or more characters, but it cannot be used as a one-character identifier due to being a keyword.
The "Java digits" include the ASCII digits 0-9 (\u0030-\u0039).
Letters and digits may be drawn from the entire Unicode character set, which supports most writing scripts in use in the world today, including the large sets for Chinese, Japanese, and Korean. This allows programmers to use identifiers in their programs that are written in their native languages.
Two identifiers are the same only if, after ignoring characters that are ignorable, the identifiers have the same Unicode character for each letter or digit. An ignorable character is a character for which the method Character.isIdentifierIgnorable(int) returns true. Identifiers that have the same external appearance may yet be different.
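The definitions of "Java letter" and "Java letter-or-digit" can be turned into a small check over code points. This is a sketch only: the class name IdentifierCheck and the method hasIdentifierShape are hypothetical, and the check looks at shape alone, so it does not reject keywords or the boolean and null literals.

```java
class IdentifierCheck {
    // Returns true if s is a Java letter followed by zero or more Java letters or digits.
    static boolean hasIdentifierShape(String s) {
        if (s.isEmpty()) {
            return false;
        }
        int first = s.codePointAt(0);
        if (!Character.isJavaIdentifierStart(first)) {
            return false;
        }
        // Advance by code point so that supplementary characters are handled as one unit.
        for (int i = Character.charCount(first); i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!Character.isJavaIdentifierPart(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(hasIdentifierShape("MAX_VALUE"));
        System.out.println(hasIdentifierShape("3i"));
    }
}
```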
For example, the identifiers consisting of the single letters LATIN CAPITAL LETTER A (A, \u0041), LATIN SMALL LETTER A (a, \u0061), GREEK CAPITAL LETTER ALPHA (Α, \u0391), CYRILLIC SMALL LETTER A (а, \u0430) and MATHEMATICAL BOLD ITALIC SMALL A (𝒂, \ud835\udc82) are all different.
Unicode composite characters are different from their canonical equivalent decomposed characters. For example, a LATIN CAPITAL LETTER A ACUTE (Á, \u00c1) is different from a LATIN CAPITAL LETTER A (A, \u0041) immediately followed by a NON-SPACING ACUTE (´, \u0301) in identifiers. See The Unicode Standard, Section 3.11 "Normalization Forms".
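The difference between a composite character and its decomposed form can be observed on strings with java.text.Normalizer. This sketch only illustrates canonical equivalence; it does not suggest that a compiler normalizes identifiers, and the class and member names are hypothetical.

```java
import java.text.Normalizer;

class CompositeDemo {
    static final String COMPOSED = "\u00c1";    // LATIN CAPITAL LETTER A ACUTE
    static final String DECOMPOSED = "A\u0301"; // A followed by a combining acute accent

    // Compared code unit by code unit, the two spellings differ.
    static boolean sameSpelling() {
        return COMPOSED.equals(DECOMPOSED);
    }

    // After normalization to NFC, both spellings become the same sequence.
    static boolean sameAfterNfc() {
        String a = Normalizer.normalize(COMPOSED, Normalizer.Form.NFC);
        String b = Normalizer.normalize(DECOMPOSED, Normalizer.Form.NFC);
        return a.equals(b);
    }

    public static void main(String[] args) {
        System.out.println(sameSpelling());
        System.out.println(sameAfterNfc());
    }
}
```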
Examples of identifiers are:
String
i3
αρετη
MAX_VALUE
isLetterOrDigit
An identifier never has the same spelling (Unicode character sequence) as a reserved keyword (3.9), a boolean literal (3.10.3) or the null literal (3.10.8), due to the rules of tokenization (3.5). However, an identifier may have the same spelling as a contextual keyword, because the tokenization of a sequence of input characters as an identifier or a contextual keyword depends on where the sequence appears in the program.
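As a hedged illustration of an identifier sharing its spelling with a contextual keyword, the following sketch declares local variables with such names. The class name ContextualNames and the method sum are hypothetical, and the particular names were chosen only for illustration.

```java
class ContextualNames {
    static int sum() {
        // Each of these names is also the spelling of a contextual keyword,
        // but here each is tokenized as an identifier.
        int record = 1;
        int sealed = 2;
        int permits = 3;
        int module = 4;
        int var = 5; // allowed as a variable name, though not as a type name
        return record + sealed + permits + module + var;
    }

    public static void main(String[] args) {
        System.out.println(sum());
    }
}
```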
To facilitate the recognition of contextual keywords, the syntactic grammar (2.3) sometimes disallows certain identifiers by defining a production to accept only a subset of identifiers. The subsets are as follows:
TypeIdentifier is used in the declaration of classes, interfaces, and type parameters (8.1, 9.1, 4.4), and when referring to types (6.5). For example, the name of a class must be a TypeIdentifier, so it is illegal to declare a class named permits, record, sealed, var, or yield.
UnqualifiedMethodIdentifier is used when a method invocation expression refers to a method by its simple name (6.5.7.1). Since the term yield is excluded from UnqualifiedMethodIdentifier, any invocation of a method named yield must be qualified, thus distinguishing the invocation from a yield statement (14.21).
51 character sequences, formed from ASCII characters, are reserved for use as keywords and cannot be used as identifiers (3.8). Another 16 character sequences, also formed from ASCII characters, may be interpreted as keywords or as other tokens, depending on the context in which they appear.
abstract continue for new switch
assert default if package synchronized
boolean do goto private this
break double implements protected throw
byte else import public throws
case enum instanceof return transient
catch extends int short try
char final interface static void
class finally long strictfp volatile
const float native super while
_
(underscore)
The keywords const and goto are reserved, even though they are not currently used. This may allow a Java compiler to produce better error messages if these C++ keywords incorrectly appear in programs.
The keyword strictfp is obsolete and should not be used in new code.
The keyword _ (underscore) is reserved for possible future use in parameter declarations.
true and false are not keywords, but rather boolean literals (3.10.3).
null is not a keyword, but rather the null literal (3.10.8).
During the reduction of input characters to input elements (3.5), a sequence of input characters that notionally matches a contextual keyword is reduced to a contextual keyword if and only if both of the following conditions hold:
1. The sequence is recognized as a terminal specified in a suitable context of the syntactic grammar (2.3), as follows:
   - For module and open, when recognized as a terminal in a ModuleDeclaration (7.7).
   - For exports, opens, provides, requires, to, uses, and with, when recognized as a terminal in a ModuleDirective.
   - For transitive, when recognized as a terminal in a RequiresModifier. For example, recognizing the sequence requires transitive ; does not make use of RequiresModifier, so the term transitive is reduced here to an identifier and not a contextual keyword.
   - For var, when recognized as a terminal in a LocalVariableType (14.4) or a LambdaParameterType (15.27.1). In other contexts, attempting to use var as an identifier will cause an error, because var is not a TypeIdentifier (3.8).
   - For yield, when recognized as a terminal in a YieldStatement (14.21). In other contexts, attempting to use yield as an identifier will cause an error, because yield is neither a TypeIdentifier nor an UnqualifiedMethodIdentifier.
   - For record, when recognized as a terminal in a RecordDeclaration (8.10).
   - For non-sealed, permits, and sealed, when recognized as a terminal in a NormalClassDeclaration (8.1) or a NormalInterfaceDeclaration (9.1).
2. The sequence is not immediately preceded or immediately followed by an input character that matches JavaLetterOrDigit.
In general, accidentally omitting white space in source code will cause a sequence of input characters to be tokenized as an identifier, due to the "longest possible translation" rule (3.2). For example, the sequence of twelve input characters p u b l i c s t a t i c is always tokenized as the identifier publicstatic, rather than as the reserved keywords public and static. If two tokens are intended, they must be separated by white space or a comment.
The rule above works in tandem with the "longest possible translation" rule to produce an intuitive result in contexts where contextual keywords may appear. For example, the sequence of eleven input characters v a r f i l e n a m e is usually tokenized as the identifier varfilename, but in a local variable declaration, the first three input characters are tentatively recognized as the contextual keyword var by the first condition of the rule above. However, it would be confusing to overlook the lack of white space in the sequence by recognizing the next eight input characters as the identifier filename. (This would mean that the sequence undergoes different tokenization in different contexts: an identifier in most contexts, but a contextual keyword and an identifier in local variable declarations.) Accordingly, the second condition prevents recognition of the contextual keyword var on the grounds that the immediately following input character f is a JavaLetterOrDigit. The sequence v a r f i l e n a m e is therefore tokenized as the identifier varfilename in a local variable declaration.
As another example of the careful recognition of contextual keywords, consider the sequence of 15 input characters n o n - s e a l e d c l a s s. This sequence is usually translated to three tokens - the identifier non, the operator -, and the identifier sealedclass - but in a normal class declaration, where the first condition holds, the first ten input characters are tentatively recognized as the contextual keyword non-sealed. To avoid translating the sequence to two keyword tokens (non-sealed and class) rather than three non-keyword tokens, and to avoid rewarding the programmer for omitting white space before class, the second condition prevents recognition of the contextual keyword. The sequence n o n - s e a l e d c l a s s is therefore tokenized as three tokens in a class declaration.
In the rule above, the first condition depends on details of the syntactic grammar, but a compiler for the Java programming language can implement the rule without fully parsing the input program. For example, a heuristic could be used to track the contextual state of the tokenizer, as long as the heuristic guarantees that valid uses of contextual keywords are tokenized as keywords, and valid uses of identifiers are tokenized as identifiers. Alternatively, a compiler could always tokenize a contextual keyword as an identifier, leaving it to a later phase to recognize special uses of these identifiers.
A literal is the source code representation of a value of a primitive type (4.2), the String type (4.3.3), or the null type (4.1).
An integer literal may be expressed in decimal (base 10), hexadecimal (base 16), octal (base 8), or binary (base 2).
An integer literal is of type long if it is suffixed with an ASCII letter L or l (ell); otherwise it is of type int (4.2.1).
The suffix L is preferred, because the letter l (ell) is often hard to distinguish from the digit 1 (one).
Underscores are allowed as separators between digits that denote the integer.
In a hexadecimal or binary literal, the integer is only denoted by the digits after the 0x or 0b characters and before any type suffix. Therefore, underscores may not appear immediately after 0x or 0b, or after the last digit in the numeral.
In a decimal or octal literal, the integer is denoted by all the digits in the literal before any type suffix. Therefore, underscores may not appear before the first digit or after the last digit in the numeral. Underscores may appear after the initial 0 in an octal numeral (since 0 is a digit that denotes part of the integer) and after the initial non-zero digit in a non-zero decimal literal.
A decimal numeral is either the single ASCII digit 0
, representing the integer zero, or consists of an ASCII digit from 1
to 9
optionally followed by one or more ASCII digits from 0
to 9
interspersed with underscores, representing a positive integer.
A hexadecimal numeral consists of the leading ASCII characters 0x
or 0X
followed by one or more ASCII hexadecimal digits interspersed with underscores, and can represent a positive, zero, or negative integer.
Hexadecimal digits with values 10 through 15 are represented by the ASCII letters a
through f
or A
through F
, respectively;
each letter used as a hexadecimal digit may be uppercase or lowercase.
The HexDigit production above comes from 3.3.
An octal numeral consists of an ASCII digit 0 followed by one or more of the ASCII digits 0 through 7 interspersed with underscores, and can represent a positive, zero, or negative integer.
Note that octal numerals always consist of two or more digits, as 0 alone is always considered to be a decimal numeral - not that it matters much in practice, for the numerals 0, 00, and 0x0 all represent exactly the same integer value.
A binary numeral consists of the leading ASCII characters 0b or 0B followed by one or more of the ASCII digits 0 or 1 interspersed with underscores, and can represent a positive, zero, or negative integer.
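A small sketch of the four radixes, using hypothetical constant names. It also shows a common pitfall: a leading 0 silently switches a numeral to octal.

```java
final class RadixDemo {
    static final int DEC = 255;
    static final int HEX = 0xFF;
    static final int OCT = 0377;
    static final int BIN = 0b1111_1111;

    // Looks like ten, but the leading 0 makes it an octal numeral with value eight.
    static final int LOOKS_LIKE_TEN = 010;
}
```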
The largest decimal literal of type int is 2147483648 (2^31).
All decimal literals from 0 to 2147483647 may appear anywhere an int literal may appear. The decimal literal 2147483648 may appear only as the operand of the unary minus operator - (15.15.4).
It is a compile-time error if the decimal literal 2147483648 appears anywhere other than as the operand of the unary minus operator; or if a decimal literal of type int is larger than 2147483648 (2^31).
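The largest positive hexadecimal, octal, and binary literals of type int - each of which represents the decimal value 2147483647 (2^31-1) - are respectively:
0x7fff_ffff
0177_7777_7777
0b0111_1111_1111_1111_1111_1111_1111_1111
The most negative hexadecimal, octal, and binary literals of type int - each of which represents the decimal value -2147483648 (-2^31) - are respectively:
0x8000_0000
0200_0000_0000
0b1000_0000_0000_0000_0000_0000_0000_0000
The following hexadecimal, octal, and binary literals represent the decimal value -1:
0xffff_ffff
0377_7777_7777
0b1111_1111_1111_1111_1111_1111_1111_1111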
It is a compile-time error if a hexadecimal, octal, or binary int literal does not fit in 32 bits.
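The int limits can be checked against the constants of class Integer. This is a minimal sketch with hypothetical names; the two rejected declarations appear only as comments.

```java
final class IntLiteralBounds {
    // 2147483648 is legal here only because it is the operand of unary minus.
    static final int MIN_DEC = -2147483648;
    static final int MAX_HEX = 0x7fff_ffff;
    static final int MIN_HEX = 0x8000_0000;
    static final int MINUS_ONE_HEX = 0xffff_ffff;
    static final int MINUS_ONE_OCT = 0377_7777_7777;

    // int tooBig = 2147483648;      compile-time error: not the operand of unary minus
    // int tooWide = 0x1_0000_0000;  compile-time error: does not fit in 32 bits
}
```

Non-decimal literals describe a 32-bit pattern, which is why they can denote negative values without a minus sign.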
The largest decimal literal of type long is 9223372036854775808L (2^63).
All decimal literals from 0L to 9223372036854775807L may appear anywhere a long literal may appear. The decimal literal 9223372036854775808L may appear only as the operand of the unary minus operator - (15.15.4).
It is a compile-time error if the decimal literal 9223372036854775808L appears anywhere other than as the operand of the unary minus operator; or if a decimal literal of type long is larger than 9223372036854775808L (2^63).
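The largest positive hexadecimal, octal, and binary literals of type long - each of which represents the decimal value 9223372036854775807L (2^63-1) - are respectively:
0x7fff_ffff_ffff_ffffL
07_7777_7777_7777_7777_7777L
0b0111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111L
The most negative hexadecimal, octal, and binary literals of type long - each of which represents the decimal value -9223372036854775808L (-2^63) - are respectively:
0x8000_0000_0000_0000L
010_0000_0000_0000_0000_0000L
0b1000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000_0000L
The following hexadecimal, octal, and binary literals represent the decimal value -1L:
0xffff_ffff_ffff_ffffL
017_7777_7777_7777_7777_7777L
0b1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111_1111L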
It is a compile-time error if a hexadecimal, octal, or binary long literal does not fit in 64 bits.
Examples of int literals:
0
2
0372
0xDada_Cafe
1996
0x00_FF__00_FF
Examples of long literals:
0l
0777L
0x100000000L
2_147_483_648L
0xC0B0L
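The type suffix changes more than the declared type: it decides how many bits the digits are read into. The sketch below, with hypothetical names, contrasts the same hexadecimal digits with and without the L suffix.

```java
final class LongSuffixDemo {
    static final int AS_INT = 0xffff_ffff;     // 32 bits, all set: -1
    static final long AS_LONG = 0xffff_ffffL;  // same digits read as a long: 4294967295
    static final long WIDENED = 0xffff_ffff;   // int literal -1, then widened to long: still -1
    static final long BEYOND_INT = 2_147_483_648L;
    static final long MIN = -9223372036854775808L;
}
```

The WIDENED case is the one that tends to surprise: the literal is typed before the assignment, so the missing suffix yields -1 rather than 4294967295.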
A floating-point literal has the following parts: a whole-number part, a decimal or hexadecimal point (represented by an ASCII period character), a fraction part, an exponent, and a type suffix.
A floating-point literal may be expressed in decimal (base 10) or hexadecimal (base 16).
For decimal floating-point literals, at least one digit (in either the whole number or the fraction part) and either a decimal point, an exponent, or a float type suffix are required. All other parts are optional. The exponent, if present, is indicated by the ASCII letter e or E followed by an optionally signed integer.
For hexadecimal floating-point literals, at least one digit is required (in either the whole number or the fraction part), the exponent is mandatory, and the float type suffix is optional. The exponent is indicated by the ASCII letter p or P followed by an optionally signed integer.
Underscores are allowed as separators between digits that denote the whole-number part, and between digits that denote the fraction part, and between digits that denote the exponent.
A floating-point literal is of type float if it is suffixed with an ASCII letter F or f; otherwise its type is double and it can optionally be suffixed with an ASCII letter D or d.
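The optional and mandatory parts described above can be sketched with a few literals. The names are hypothetical; in the hexadecimal forms the exponent after p is a power of two.

```java
final class FloatLiteralForms {
    static final double EXPONENT_ONLY = 1e1;   // digits plus exponent
    static final double TRAILING_POINT = 2.;   // whole-number part plus decimal point
    static final double LEADING_POINT = .5;    // decimal point plus fraction part
    static final float WITH_SUFFIX = 1e1f;     // the f suffix makes the type float
    static final double HEX = 0x1.8p1;         // 1.5 times 2 to the power 1
    static final double HEX_FRACTION = 0x.8p0; // fraction part only
    static final double GROUPED = 1_000.000_1e1_0; // underscores between digits of each part

    // Reports whether a boxed literal value is a Float, to make the literal's type visible.
    static boolean isFloat(Object boxed) {
        return boxed instanceof Float;
    }
}
```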
The elements of the types float and double are those values that can be represented using the IEEE 754 binary32 and IEEE 754 binary64 floating-point formats, respectively (4.2.3).
The details of proper input conversion from a Unicode string representation of a floating-point number to the internal IEEE 754 binary floating-point representation are described for the methods valueOf of class Float and class Double of the package java.lang.
The largest and smallest positive literals of type float are as follows:
The largest positive finite float value is numerically equal to (2 - 2^-23) ⋅ 2^127.
The shortest decimal literal which rounds to this value is 3.4028235e38f.
The smallest positive finite non-zero float value is numerically equal to 2^-149.
The shortest decimal literal which rounds to this value is 1.4e-45f.
Two hexadecimal literals for this value are 0x0.000002P-126f and 0x1.0P-149f.
The largest and smallest positive literals of type double are as follows:
The largest positive finite double value is numerically equal to (2 - 2^-52) ⋅ 2^1023.
The shortest decimal literal which rounds to this value is 1.7976931348623157e308.
A hexadecimal literal for this value is 0x1.f_ffff_ffff_ffffP+1023.
The smallest positive finite non-zero double value is numerically equal to 2^-1074.
The shortest decimal literal which rounds to this value is 4.9e-324.
Two hexadecimal literals for this value are 0x0.0_0000_0000_0001P-1022 and 0x1.0P-1074.
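The extreme literals can be compared with the MAX_VALUE and MIN_VALUE constants of classes Float and Double. This is a sketch with hypothetical method names. The hexadecimal form 0x1.fffffeP+127f is not given in the text above; it is taken from the documentation of Float.MAX_VALUE.

```java
final class FloatExtremes {
    static boolean floatMaxMatches() {
        return 3.4028235e38f == Float.MAX_VALUE
            && 0x1.fffffeP+127f == Float.MAX_VALUE;
    }

    static boolean floatMinMatches() {
        return 1.4e-45f == Float.MIN_VALUE
            && 0x0.000002P-126f == Float.MIN_VALUE
            && 0x1.0P-149f == Float.MIN_VALUE;
    }

    static boolean doubleMaxMatches() {
        return 1.7976931348623157e308 == Double.MAX_VALUE
            && 0x1.f_ffff_ffff_ffffP+1023 == Double.MAX_VALUE;
    }

    static boolean doubleMinMatches() {
        return 4.9e-324 == Double.MIN_VALUE
            && 0x0.0_0000_0000_0001P-1022 == Double.MIN_VALUE
            && 0x1.0P-1074 == Double.MIN_VALUE;
    }
}
```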
It is a compile-time error if a non-zero floating-point literal is too large, so that on rounded conversion to its internal representation, it becomes an IEEE 754 infinity.
A program can represent infinities without producing a compile-time error by using constant expressions such as 1f/0f or -1d/0d or by using the predefined constants POSITIVE_INFINITY and NEGATIVE_INFINITY of the classes Float and Double.
It is a compile-time error if a non-zero floating-point literal is too small, so that, on rounded conversion to its internal representation, it becomes a zero.
A compile-time error does not occur if a non-zero floating-point literal has a small value that, on rounded conversion to its internal representation, becomes a non-zero subnormal number.
Predefined constants representing Not-a-Number values are defined in the classes Float and Double as Float.NaN and Double.NaN.
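A minimal sketch of the rules for out-of-range literals, using hypothetical names. The rejected literals are shown only in comments; 1e-320 is chosen as an example of a value that is subnormal for double yet still non-zero.

```java
final class SpecialFloatValues {
    static final float POS_INF = 1f / 0f;    // constant expression, not a literal
    static final double NEG_INF = -1d / 0d;
    static final double SUBNORMAL = 1e-320;  // rounds to a non-zero subnormal: accepted

    // double tooLarge = 1e309;   compile-time error: rounds to infinity
    // double tooSmall = 1e-400;  compile-time error: rounds to zero

    static boolean nanIsUnequalToItself() {
        double nan = Double.NaN;
        return nan != nan;
    }
}
```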
Examples of float literals:
1e1f
2.f
.3f
0f
3.14f
6.022137e+23f
Examples of double literals:
1e1
2.
.3
0.0
3.14
1e-9d
1e137
The boolean type has two values, represented by the boolean literals true and false, formed from ASCII letters.
A boolean literal is always of type boolean (4.2.5).
A character literal is expressed as a character or an escape sequence (3.10.7), enclosed in ASCII single quotes. (The single-quote, or apostrophe, character is \u0027.)
A character literal is always of type char (4.2.1).
The content of a character literal is the SingleCharacter or the EscapeSequence which follows the opening '.
It is a compile-time error for the character following the content to be other than a '.
It is a compile-time error for a line terminator (3.4) to appear after the opening ' and before the closing '.
The characters CR and LF are never an InputCharacter; each is recognized as constituting a LineTerminator, so may not appear in a character literal, even in the escape sequence \ LineTerminator.
The character represented by a character literal is the content of the character literal with any escape sequence interpreted, as if by execution of String.translateEscapes on the content.
Character literals can only represent UTF-16 code units (3.1), i.e., they are limited to values from \u0000 to \uffff. Supplementary characters must be represented either as a surrogate pair within a char sequence, or as an integer, depending on the API they are used with.
The following are examples of char literals:
'a'
'%'
'\t'
'\\'
'\''
'\u03a9'
'\uFFFF'
'\177'
'™'
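The sketch below shows a few char literals next to one supplementary character. The names are hypothetical, and U+1F600 is an arbitrary choice of a code point outside the range of char.

```java
final class CharLiteralDemo {
    static final char OMEGA = '\u03a9';
    static final char DEL = '\177';  // octal escape with value 127
    static final char TAB = '\t';

    // U+1F600 does not fit in a char, so a String holds it as a surrogate pair...
    static final String GRINNING = "\uD83D\uDE00";
    // ...and code point APIs take it as an int.
    static final int GRINNING_CODE_POINT = 0x1F600;
}
```

Methods such as String.codePointAt and Character.charCount work on the int form and report that this character occupies two char values.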
Because Unicode escapes are processed very early, it is not correct to write '\u000a' for a character literal whose value is linefeed (LF); the Unicode escape \u000a is transformed into an actual linefeed in translation step 1 (3.3) and the linefeed becomes a LineTerminator in step 2 (3.4), so the character literal is not valid in step 3. Instead, one should use the escape sequence '\n'. Similarly, it is not correct to write '\u000d' for a character literal whose value is carriage return (CR). Instead, use '\r'. Finally, it is not possible to write '\u0027' for a character literal containing an apostrophe (').
In C and C++, a character literal may contain representations of more than one character, but the value of such a character literal is implementation-defined. In the Java programming language, a character literal always represents exactly one character.
A string literal consists of zero or more characters enclosed in double quotes. Characters such as newlines may be represented by escape sequences (3.10.7).
A string literal is always of type String (4.3.3).
The content of a string literal is the sequence of characters that begins immediately after the opening " and ends immediately before the matching closing ".
It is a compile-time error for a line terminator (3.4) to appear after the opening " and before the matching closing ".
The characters CR and LF are never an InputCharacter; each is recognized as constituting a LineTerminator, so may not appear in a string literal, even in the escape sequence \ LineTerminator.
The string represented by a string literal is the content of the string literal with every escape sequence interpreted, as if by execution of String.translateEscapes on the content.
The following are examples of string literals:
""                    // the empty string
"\""                  // a string containing " alone
"This is a string"    // a string containing 16 characters
"This is a " +        // actually a string-valued constant expression,
    "two-line string" // formed from two string literals
Because Unicode escapes are processed very early, it is not correct to write "\u000a" for a string literal containing a single linefeed (LF); the Unicode escape \u000a is transformed into an actual linefeed in translation step 1 (3.3) and the linefeed becomes a LineTerminator in step 2 (3.4), so the string literal is not valid in step 3. Instead, one should use the escape sequence "\n". Similarly, it is not correct to write "\u000d" for a string literal containing a single carriage return (CR). Instead, use "\r". Finally, it is not possible to write "\u0022" for a string literal containing a double quotation mark (").
A long string literal can always be broken up into shorter pieces and written as a (possibly parenthesized) expression using the string concatenation operator + (15.18.1).
At run time, a string literal is a reference to an instance of class String (4.3.3) that denotes the string represented by the string literal.
Moreover, a string literal always refers to the same instance of class String. This is because string literals - or, more generally, strings that are the values of constant expressions (15.29) - are "interned" so as to share unique instances, as if by execution of the method String.intern (12.5).
Example 3.10.5-1. String Literals
The program consisting of the compilation unit (7.3):

package testPackage;
class Test {
    public static void main(String[] args) {
        String hello = "Hello", lo = "lo";
        System.out.println(hello == "Hello");
        System.out.println(Other.hello == hello);
        System.out.println(other.Other.hello == hello);
        System.out.println(hello == ("Hel"+"lo"));
        System.out.println(hello == ("Hel"+lo));
        System.out.println(hello == ("Hel"+lo).intern());
    }
}
class Other { static String hello = "Hello"; }

and the compilation unit:

package other;
public class Other { public static String hello = "Hello"; }

produces the output:

true
true
true
true
false
true
This example illustrates six points:
String literals in the same class and package represent references to the same String object (4.3.1).
String literals in different classes in the same package represent references to the same String object.
String literals in different classes in different packages likewise represent references to the same String object.
Strings concatenated from constant expressions (15.29) are computed at compile time and then treated as if they were literals.
Strings computed by concatenation at run time are newly created and therefore distinct.
The result of explicitly interning a computed string is the same String object as any pre-existing string literal with the same contents.
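One consequence of the fourth and fifth points is that declaring a variable final can change the result of ==. The following single-file sketch, with hypothetical names, relies on the rule that a final variable initialized with a constant expression is itself usable in constant expressions.

```java
final class ConstantConcatDemo {
    static boolean[] compare() {
        String hello = "Hello";
        String lo = "lo";             // ordinary variable: "Hel" + lo is computed at run time
        final String finalLo = "lo";  // constant variable: "Hel" + finalLo is a constant expression
        return new boolean[] {
            hello == ("Hel" + lo),
            hello == ("Hel" + finalLo),
            hello == ("Hel" + lo).intern()
        };
    }

    public static void main(String[] args) {
        for (boolean same : compare()) {
            System.out.println(same);
        }
    }
}
```

In ordinary code, equals is the appropriate way to compare string contents; == is used here only to make object identity visible.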
A text block consists of zero or more characters enclosed by opening and closing delimiters. Characters may be represented by escape sequences (3.10.7), but the newline and double quote characters that must be represented with escape sequences in a string literal (3.10.5) may be represented directly in a text block.
The following productions from 3.3, 3.4, and 3.6 are shown here for convenience:
A text block is always of type String (4.3.3).
The opening delimiter is a sequence that starts with three double quote characters ("""), continues with zero or more space, tab, and form feed characters, and concludes with a line terminator.
The closing delimiter is a sequence of three double quote characters.
The content of a text block is the sequence of characters that begins immediately after the line terminator of the opening delimiter, and ends immediately before the first double quote of the closing delimiter.
Unlike in a string literal (3.10.5), it is not a compile-time error for a line terminator to appear in the content of a text block.
Example 3.10.6-1. Text Blocks
When multi-line strings are desired, a text block is usually more readable than a concatenation of string literals. For example, compare these alternative representations of a snippet of HTML:
String html = "<html>\n" +
              "    <body>\n" +
              "        <p>Hello, world</p>\n" +
              "    </body>\n" +
              "</html>\n";

String html = """
              <html>
                  <body>
                      <p>Hello, world</p>
                  </body>
              </html>
              """;
The following are examples of text blocks:
class Test {
    public static void main(String[] args) {
        // The six characters w i n t e r
        String season = """
                        winter""";

        // The seven characters w i n t e r LF
        String period = """
                        winter
                        """;

        // The ten characters H i , SP " B o b " LF
        String greeting = """
                          Hi, "Bob"
                          """;

        // The eleven characters H i , LF SP " B o b " LF
        String salutation = """
                            Hi,
                             "Bob"
                            """;

        // The empty string (zero length)
        String empty = """
                       """;

        // The two characters " LF
        String quote = """
                       "
                       """;

        // The two characters \ LF
        String backslash = """
                           \\
                           """;
    }
}
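The character counts stated in the comments of these examples can be checked with String.length. The sketch below repeats a few of the text blocks as constants with hypothetical names, and assumes a compiler for Java 15 or later.

```java
final class TextBlockLengths {
    static final String SEASON = """
        winter""";

    static final String PERIOD = """
        winter
        """;

    static final String SALUTATION = """
        Hi,
         "Bob"
        """;

    static final String EMPTY = """
        """;

    static final String BACKSLASH = """
        \\
        """;

    public static void main(String[] args) {
        System.out.println(SEASON.length() + " " + PERIOD.length() + " "
            + SALUTATION.length() + " " + EMPTY.length() + " " + BACKSLASH.length());
    }
}
```

Whether the closing delimiter sits on its own line decides whether the string ends with LF, which accounts for the difference between the first two constants.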
Using the escape sequences \n and \" to represent a newline character and a double quote character, respectively, is permitted in a text block, though not usually necessary. The exception is where three consecutive double quote characters appear that are not intended to be the closing delimiter """ - in this case, it is necessary to escape at least one of the double quote characters in order to avoid mimicking the closing delimiter.
Example 3.10.6-2. Escape sequences in text blocks
In the following program, the value of the story variable would be less readable if individual double quote characters were escaped:

class Story1 {
    public static void main(String[] args) {
        String story = """
    "When I use a word," Humpty Dumpty said,
    in rather a scornful tone, "it means just what I
    choose it to mean - neither more nor less."
    "The question is," said Alice, "whether you
    can make words mean so many different things."
    "The question is," said Humpty Dumpty,
    "which is to be master - that's all."
    """;
    }
}
If the program is modified to place the closing delimiter on the last line of the content, then an error occurs because the first three consecutive double quote characters on the last line are translated (3.2) into the closing delimiter """ and thus a stray double quote character remains:

class Story2 {
    public static void main(String[] args) {
        String story = """
    "When I use a word," Humpty Dumpty said,
    in rather a scornful tone, "it means just what I
    choose it to mean - neither more nor less."
    "The question is," said Alice, "whether you
    can make words mean so many different things."
    "The question is," said Humpty Dumpty,
    "which is to be master - that's all.""""; // error
    }
}
The error can be avoided by escaping the final double quote character in the content:
class Story3 {
    public static void main(String[] args) {
        String story = """
    "When I use a word," Humpty Dumpty said,
    in rather a scornful tone, "it means just what I
    choose it to mean - neither more nor less."
    "The question is," said Alice, "whether you
    can make words mean so many different things."
    "The question is," said Humpty Dumpty,
    "which is to be master - that's all.\""""; // OK
    }
}
If a text block is intended to denote another text block, then it is recommended to escape the first double quote character of the embedded opening and closing delimiters:
class Code {
    public static void main(String[] args) {
        String text = """
            The quick brown fox jumps over the lazy dog
        """;

        String code = """
        String text = \"""
            The quick brown fox jumps over the lazy dog
        \""";
        """;
    }
}
The string represented by a text block is not the literal sequence of characters in the content. Instead, the string represented by a text block is the result of applying the following transformations to the content, in order:
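1. Line terminators are normalized to the ASCII LF character, as follows:
   An ASCII CR character followed by an ASCII LF character is translated to an ASCII LF character.
   An ASCII CR character is translated to an ASCII LF character.
2. Incidental white space is removed, as if by execution of String.stripIndent on the characters resulting from step 1.
3. Escape sequences are interpreted, as if by execution of String.translateEscapes on the characters resulting from step 2.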
When this specification says that a text block contains a particular character or sequence of characters, or that a particular character or sequence of characters is in a text block, it means that the string represented by the text block (as opposed to the literal sequence of characters in the content) contains the character or sequence of characters.
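The three transformations can be approximated at run time with ordinary String methods. This is a sketch under stated assumptions, not how a compiler is required to work: RAW is a hypothetical piece of text block content, as it might appear in a source file with CR LF line endings, and the normalization in step 1 is written out by hand with String.replace.

```java
final class TextBlockTransformDemo {
    // Hypothetical raw content: an indented line ending in the two characters backslash and r,
    // a more deeply indented line, and the indentation in front of the closing delimiter.
    static final String RAW = "    <p>\\r\r\n      hi\r\n    ";

    static String transform(String content) {
        // Step 1: normalize CR LF pairs, then any remaining CR, to LF.
        String normalized = content.replace("\r\n", "\n").replace('\r', '\n');
        // Step 2: remove incidental white space.
        String stripped = normalized.stripIndent();
        // Step 3: interpret escape sequences.
        return stripped.translateEscapes();
    }

    public static void main(String[] args) {
        String result = transform(RAW);
        System.out.println(result.equals("<p>\r\n  hi\n"));
    }
}
```

Because the escape sequence is interpreted last, the CR it produces survives in the result, while the CR characters that were part of the original line terminators do not.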
Example 3.10.6-3. Order of transformations on text block content
Interpreting escape sequences last allows programmers to use \n, \f, and \r for vertical formatting of a string without affecting the normalization of line terminators, and to use \b and \t for horizontal formatting of a string without affecting the removal of incidental white space. For example, consider this text block that mentions the escape sequence \r (CR):

String html = """
              <html>\r
                  <body>\r
                      <p>Hello, world</p>\r
                  </body>\r
              </html>\r
              """;
The \r escape sequences are not interpreted until after the line terminators have been normalized to LF. Using Unicode escapes to visualize LF (\u000A) and CR (\u000D), and using | to visualize the left margin, the string represented by the text block is:

|<html>\u000D\u000A
|    <body>\u000D\u000A
|        <p>Hello, world</p>\u000D\u000A
|    </body>\u000D\u000A
|</html>\u000D\u000A
At run time, a text block is a reference to an instance of class String that denotes the string represented by the text block.
Moreover, a text block always refers to the same instance of class String. This is because the strings represented by text blocks - or, more generally, strings that are the values of constant expressions (15.29) - are "interned" so as to share unique instances, as if by execution of the method String.intern (12.5).
Example 3.10.6-4. Text blocks evaluate to String
Text blocks can be used wherever an expression of type String is allowed, such as in string concatenation (15.18.1), in the invocation of methods on instances of String, and in annotations with String elements:

System.out.println("ab" + """
                   cde
                   """);

String cde = """
             abcde""".substring(2);

String math = """
              1+1 equals \
              """ + String.valueOf(2);

@Preconditions("""
               rate > 0 &&
               rate <= MAX_REFRESH_RATE
               """)
public void setRefreshRate(int rate) { ... }
In character literals, string literals, and text blocks (3.10.4, 3.10.5, 3.10.6), the escape sequences allow for the representation of some nongraphic characters without using Unicode escapes (3.3), as well as the single quote, double quote, and backslash characters.
EscapeSequence:
\ b (backspace BS, Unicode \u0008)
\ s (space SP, Unicode \u0020)
\ t (horizontal tab HT, Unicode \u0009)
\ n (linefeed LF, Unicode \u000a)
\ f (form feed FF, Unicode \u000c)
\ r (carriage return CR, Unicode \u000d)
\ LineTerminator (line continuation, no Unicode representation)
\ " (double quote ", Unicode \u0022)
\ ' (single quote ', Unicode \u0027)
\ \ (backslash \, Unicode \u005c)
OctalEscape (octal value, Unicode \u0000 to \u00ff)
OctalDigit: (one of) 0 1 2 3 4 5 6 7
The OctalDigit production above comes from 3.10.1. Octal escapes are provided for compatibility with C, but can express only Unicode values \u0000 through \u00FF, so Unicode escapes are usually preferred.
It is a compile-time error if the character following a backslash in an escape sequence is not a LineTerminator or an ASCII b, s, t, n, f, r, ", ', \, 0, 1, 2, 3, 4, 5, 6, or 7.
An escape sequence in the content of a character literal, string literal, or text block is interpreted by replacing its \ and trailing character(s) with the single character denoted by the Unicode escape in the EscapeSequence grammar. The line continuation escape sequence has no corresponding Unicode escape, so is interpreted by replacing it with nothing.
The line continuation escape sequence can appear in a text block, but cannot appear in a character literal or a string literal because each disallows a LineTerminator.
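A few of the escape sequences behave in ways that are easy to misread, so a short sketch may help. The names are hypothetical, and the \s and line continuation escapes assume a compiler for Java 15 or later.

```java
final class EscapeDemo {
    static final String SPACE = "\s";        // a single space character
    static final String OCTAL_MAX = "\377";  // one character with value 0xFF
    static final String SPLIT = "\400";      // octal escape for a space, followed by the digit 0

    // The backslash at the end of the first content line is a line continuation,
    // so no LF appears between the two words.
    static final String JOINED = """
        one \
        two""";

    // translateEscapes applies the same interpretation to a string at run time.
    static String interpret(String content) {
        return content.translateEscapes();
    }
}
```

The SPLIT case follows from the limit on octal escapes: a three-digit escape must start with a digit from 0 to 3, so a leading 4 ends the escape after two digits.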
The null type has one value, the null reference, represented by the null literal null, which is formed from ASCII characters.
A null literal is always of the null type (4.1).