The Java Tutorials have been written for JDK 8. Examples and practices described in this page don't take advantage of improvements introduced in later releases and might use technology no longer available. See Java Language Changes for a summary of updated language features in Java SE 9 and subsequent releases.
See JDK Release Notes for information about new features, enhancements, and removed or deprecated options for all JDK releases.
As of the JDK 7 release, regular expression pattern matching has expanded functionality to support Unicode 6.0.
You can match a specific Unicode code point using an escape sequence of the form \uFFFF, where FFFF is the hexadecimal value of the code point you want to match. For example, \u6771 matches the Han character for east.
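As a minimal sketch of the escape form described above, the following class compiles a pattern for U+6771 and searches a string for it. The class name CodePointEscapeDemo and the method containsEast are hypothetical names chosen for illustration. The input strings use Java's own Unicode escapes so that the example does not depend on the source file encoding.

```java
import java.util.regex.Pattern;

class CodePointEscapeDemo {
    // The backslash is doubled in the string literal so that the regex
    // engine, rather than the Java compiler, interprets the escape.
    static final Pattern EAST = Pattern.compile("\\u6771");

    // Returns true if the text contains the Han character for east.
    static boolean containsEast(String text) {
        return EAST.matcher(text).find();
    }

    public static void main(String[] args) {
        // U+6771 U+4EAC spells "Tokyo" in Han characters.
        System.out.println(containsEast("\u6771\u4eac")); // prints true
        System.out.println(containsEast("Tokyo"));        // prints false
    }
}
```

Writing the pattern with a single backslash would also work here, because the compiler would then substitute the character itself before the regex engine sees the string.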
Alternatively, you can specify a code point using Perl-style hex notation, \x{...}. For example:
// The backslash must be doubled: "\x" alone is not a valid Java string escape.
String hexPattern = "\\x{" + Integer.toHexString(codePoint) + "}";
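The hex notation is most useful for code points above U+FFFF, which do not fit in four hex digits. The sketch below wraps the construction in a helper and tries it on a supplementary code point; the names HexNotationDemo and forCodePoint are assumptions made for this example, not part of any API.

```java
import java.util.regex.Pattern;

class HexNotationDemo {
    // Builds a pattern that matches exactly one occurrence of the given
    // code point, using the Perl-style hex notation.
    static Pattern forCodePoint(int codePoint) {
        return Pattern.compile("\\x{" + Integer.toHexString(codePoint) + "}");
    }

    public static void main(String[] args) {
        // U+1F600 lies outside the Basic Multilingual Plane, so it is
        // stored as a surrogate pair (two char values) in a String.
        int codePoint = 0x1F600;
        String text = new String(Character.toChars(codePoint));

        System.out.println(text.length());                                   // prints 2
        System.out.println(forCodePoint(codePoint).matcher(text).matches()); // prints true
    }
}
```

The pattern matches the whole two-char string because the regex engine treats the surrogate pair as a single code point.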
Each Unicode character, in addition to its value, has certain attributes, or properties. You can match a single character belonging to a particular category with the expression \p{prop}. You can match a single character not belonging to a particular category with the expression \P{prop}.
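To illustrate how \p{prop} and \P{prop} complement each other, this sketch uses the letter category L to split a string into its letters and its non-letters. The class and method names (PropertyFilterDemo, lettersOnly, nonLettersOnly) are hypothetical.

```java
class PropertyFilterDemo {
    // Removes every character that is NOT a letter, keeping the letters.
    static String lettersOnly(String text) {
        return text.replaceAll("\\P{L}", "");
    }

    // Removes every letter, keeping everything else.
    static String nonLettersOnly(String text) {
        return text.replaceAll("\\p{L}", "");
    }

    public static void main(String[] args) {
        System.out.println(lettersOnly("a1b2-c3"));    // prints abc
        System.out.println(nonLettersOnly("a1b2-c3")); // prints 12-3
    }
}
```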
The three supported property types are scripts, blocks, and a "general" category.
To determine if a code point belongs to a specific script, you can either use the script keyword, or the sc short form, for example, \p{script=Hiragana}. Alternatively, you can prefix the script name with the string Is, such as \p{IsHiragana}.
Valid script names supported by Pattern are those accepted by UnicodeScript.forName.
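The following sketch checks that the three spellings of a script property behave the same way for a Hiragana letter and reject a Katakana one. The names ScriptMatchDemo, HIRAGANA_FORMS, matchesAllForms, and matchesAnyForm are assumptions made for this example.

```java
import java.util.regex.Pattern;

class ScriptMatchDemo {
    // Keyword form, short form, and Is-prefixed form of the same property.
    static final String[] HIRAGANA_FORMS = {
        "\\p{script=Hiragana}", "\\p{sc=Hiragana}", "\\p{IsHiragana}"
    };

    // True only if every form matches the whole input.
    static boolean matchesAllForms(String s) {
        for (String regex : HIRAGANA_FORMS) {
            if (!Pattern.matches(regex, s)) {
                return false;
            }
        }
        return true;
    }

    // True if at least one form matches the whole input.
    static boolean matchesAnyForm(String s) {
        for (String regex : HIRAGANA_FORMS) {
            if (Pattern.matches(regex, s)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(matchesAllForms("\u3042")); // HIRAGANA LETTER A: prints true
        System.out.println(matchesAnyForm("\u30a2"));  // KATAKANA LETTER A: prints false
        // The same name resolves through the lookup that Pattern relies on.
        System.out.println(Character.UnicodeScript.forName("Hiragana")); // prints HIRAGANA
    }
}
```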
A block can be specified using the block keyword, or the blk short form, for example, \p{block=Mongolian}. Alternatively, you can prefix the block name with the string In, such as \p{InMongolian}.
Valid block names supported by Pattern are those accepted by UnicodeBlock.forName.
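Block properties can be exercised in the same way as scripts. This sketch, with the hypothetical names BlockMatchDemo, MONGOLIAN_FORMS, and inMongolianBlock, tests a single character against all three block spellings.

```java
import java.util.regex.Pattern;

class BlockMatchDemo {
    // Keyword form, short form, and In-prefixed form of the same block.
    static final String[] MONGOLIAN_FORMS = {
        "\\p{block=Mongolian}", "\\p{blk=Mongolian}", "\\p{InMongolian}"
    };

    // True only if every form matches the whole input.
    static boolean inMongolianBlock(String s) {
        for (String regex : MONGOLIAN_FORMS) {
            if (!Pattern.matches(regex, s)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(inMongolianBlock("\u1820")); // MONGOLIAN LETTER A: prints true
        System.out.println(inMongolianBlock("a"));      // prints false
        // The block name resolves to the predefined constant.
        System.out.println(
            Character.UnicodeBlock.forName("Mongolian") == Character.UnicodeBlock.MONGOLIAN); // prints true
    }
}
```

Note that a block is a contiguous range of code points, whereas a script is a property of individual characters, so the two kinds of test can give different answers for the same character.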
Categories can be specified with the optional prefix Is. For example, \p{IsL} matches the category of Unicode letters, just as \p{L} does. Categories can also be specified by using the general_category keyword, or the short form gc. For example, an uppercase letter can be matched using \p{general_category=Lu} or \p{gc=Lu}.
Supported categories are those of The Unicode Standard in the version specified by the Character class.
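As a rough illustration of the category forms, the sketch below compares the bare, Is-prefixed, and keyword spellings on a few single-character inputs. The class name CategoryMatchDemo and its helper methods are hypothetical.

```java
import java.util.regex.Pattern;

class CategoryMatchDemo {
    // Bare form and Is-prefixed form of the letter category.
    static final String[] LETTER_FORMS = { "\\p{L}", "\\p{IsL}" };

    // Bare form, keyword form, and short keyword form of uppercase letter.
    static final String[] UPPERCASE_FORMS = {
        "\\p{Lu}", "\\p{general_category=Lu}", "\\p{gc=Lu}"
    };

    // True only if every regex in the array matches the whole input.
    static boolean matchesAll(String[] forms, String s) {
        for (String regex : forms) {
            if (!Pattern.matches(regex, s)) {
                return false;
            }
        }
        return true;
    }

    static boolean isLetter(String s) {
        return matchesAll(LETTER_FORMS, s);
    }

    static boolean isUppercase(String s) {
        return matchesAll(UPPERCASE_FORMS, s);
    }

    public static void main(String[] args) {
        System.out.println(isUppercase("A"));      // prints true
        System.out.println(isUppercase("\u0416")); // CYRILLIC CAPITAL LETTER ZHE: prints true
        System.out.println(isUppercase("a"));      // prints false
        System.out.println(isLetter("a"));         // prints true
        System.out.println(isLetter("1"));         // prints false
    }
}
```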