Trail: Essential Java Classes
Lesson: Regular Expressions


The Java Tutorials have been written for JDK 8. Examples and practices described in this page don't take advantage of improvements introduced in later releases and might use technology no longer available.
See Java Language Changes for a summary of updated language features in Java SE 9 and subsequent releases.
See JDK Release Notes for information about new features, enhancements, and removed or deprecated options for all JDK releases.

Unicode Support

As of the JDK 7 release, regular expression pattern matching has expanded functionality to support Unicode 6.0.

Matching a Specific Code Point
Unicode Character Properties

Matching a Specific Code Point

You can match a specific Unicode code point using an escape sequence of the form \uFFFF, where FFFF is the hexadecimal value of the code point you want to match. For example, \u6771 matches the Han character for east.

Alternatively, you can specify a code point using Perl-style hex notation, \x{...}. For example:

// The backslash is doubled so that the regex engine, not the compiler, sees \x{...}.
String hexPattern = "\\x{" + Integer.toHexString(codePoint) + "}";
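As a rough illustration of both escape forms, the sketch below matches U+6771 and a code point outside the Basic Multilingual Plane. The class name `CodePointMatch` and its helper methods are hypothetical, chosen only for this example; the backslash is doubled in each Java string literal so that the escape reaches the regex engine intact.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, used only for this illustration.
class CodePointMatch {
    // Builds a pattern that matches exactly one occurrence of the given code point,
    // using the Perl-style hex notation.
    static Pattern forCodePoint(int codePoint) {
        return Pattern.compile("\\x{" + Integer.toHexString(codePoint) + "}");
    }

    // True if the whole input consists of the single given code point.
    static boolean matchesCodePoint(String input, int codePoint) {
        return forCodePoint(codePoint).matcher(input).matches();
    }

    public static void main(String[] args) {
        // U+6771, the Han character for east, written as a Java Unicode escape.
        String east = "\u6771";
        // Four-digit regex escape form.
        System.out.println(Pattern.matches("\\u6771", east));
        // Perl-style hex form.
        System.out.println(matchesCodePoint(east, 0x6771));

        // U+1F600 needs more than four hex digits, so only the braces form fits.
        String supplementary = new String(Character.toChars(0x1F600));
        System.out.println(matchesCodePoint(supplementary, 0x1F600));
    }
}
```

The braces form is probably the more convenient choice when the code point is only known at run time, since it accepts any number of hex digits.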

Unicode Character Properties

Each Unicode character, in addition to its value, has certain attributes, or properties. You can match a single character belonging to a particular category with the expression \p{prop}. You can match a single character not belonging to a particular category with the expression \P{prop}.

The three supported property types are scripts, blocks, and a "general" category.
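To make the difference between \p and \P concrete, here is a minimal sketch that uses the general category L (letters, described under General Category) as the property. The class name `PropertyComplement` and its methods are hypothetical names for this example only.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, used only for this illustration.
class PropertyComplement {
    // Lowercase p: one character that has the property (here, any letter).
    private static final Pattern LETTER = Pattern.compile("\\p{L}");
    // Uppercase P: one character that does not have the property.
    private static final Pattern NON_LETTER = Pattern.compile("\\P{L}");

    // Removes every character that is not a letter.
    static String keepLetters(String input) {
        return NON_LETTER.matcher(input).replaceAll("");
    }

    // Removes every character that is a letter.
    static String dropLetters(String input) {
        return LETTER.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        String sample = "R2-D2 & C-3PO";
        System.out.println(keepLetters(sample)); // RDCPO
        System.out.println(dropLetters(sample)); // 2-2 & -3
    }
}
```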

Scripts

To determine if a code point belongs to a specific script, you can either use the script keyword, or the sc short form, for example, \p{script=Hiragana}. Alternatively, you can prefix the script name with the string Is, such as \p{IsHiragana}.

Valid script names supported by Pattern are those accepted by UnicodeScript.forName.
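The three spellings of a script property can be compared side by side, as in the sketch below. The class `ScriptMatch` and the method `allHiragana` are hypothetical names; the sample strings are written as Unicode escapes to avoid depending on the source file encoding.

```java
import java.lang.Character.UnicodeScript;
import java.util.regex.Pattern;

// Hypothetical helper class, used only for this illustration.
class ScriptMatch {
    // Keyword form, short form, and Is-prefixed form of the same property.
    static final String[] HIRAGANA_FORMS = {
        "\\p{script=Hiragana}", "\\p{sc=Hiragana}", "\\p{IsHiragana}"
    };

    // True only if all three spellings match the entire (non-empty) input.
    static boolean allHiragana(String input) {
        for (String form : HIRAGANA_FORMS) {
            if (!Pattern.matches(form + "+", input)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String hiragana = "\u3072\u3089\u304C\u306A"; // four Hiragana letters
        String katakana = "\u30AB\u30BF\u30AB\u30CA"; // four Katakana letters
        System.out.println(allHiragana(hiragana));
        System.out.println(allHiragana(katakana));

        // The same name is accepted by UnicodeScript.forName.
        System.out.println(UnicodeScript.forName("Hiragana")
                == UnicodeScript.of(hiragana.codePointAt(0)));
    }
}
```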

Blocks

A block can be specified using the block keyword, or the blk short form, for example, \p{block=Mongolian}. Alternatively, you can prefix the block name with the string In, such as \p{InMongolian}.

Valid block names supported by Pattern are those accepted by UnicodeBlock.forName.
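A small sketch of the block forms follows, assuming U+1820 (MONGOLIAN LETTER A) as the sample code point. The class `BlockMatch` and the method `inMongolianBlock` are hypothetical names introduced for this example.

```java
import java.lang.Character.UnicodeBlock;
import java.util.regex.Pattern;

// Hypothetical helper class, used only for this illustration.
class BlockMatch {
    // Keyword form, short form, and In-prefixed form of the same property.
    static final String[] MONGOLIAN_FORMS = {
        "\\p{block=Mongolian}", "\\p{blk=Mongolian}", "\\p{InMongolian}"
    };

    // True only if all three spellings match the given code point.
    static boolean inMongolianBlock(int codePoint) {
        String single = new String(Character.toChars(codePoint));
        for (String form : MONGOLIAN_FORMS) {
            if (!Pattern.matches(form, single)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(inMongolianBlock(0x1820)); // MONGOLIAN LETTER A
        System.out.println(inMongolianBlock('A'));    // Basic Latin

        // The same name is accepted by UnicodeBlock.forName.
        System.out.println(UnicodeBlock.forName("Mongolian") == UnicodeBlock.of(0x1820));
    }
}
```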

General Category

Categories can be specified with optional prefix Is. For example, IsL matches the category of Unicode letters. Categories can also be specified by using the general_category keyword, or the short form gc. For example, an uppercase letter can be matched using general_category=Lu or gc=Lu.

Supported categories are those of The Unicode Standard in the version specified by the Character class.
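The different ways of naming a general category can be checked against each other, as in this sketch for the Lu (uppercase letter) category. The class `CategoryMatch` and the method `isUppercaseLetter` are hypothetical names for this example.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, used only for this illustration.
class CategoryMatch {
    // Bare form, Is-prefixed form, short keyword form, and long keyword form.
    static final String[] UPPERCASE_FORMS = {
        "\\p{Lu}", "\\p{IsLu}", "\\p{gc=Lu}", "\\p{general_category=Lu}"
    };

    // True only if all four spellings match the given code point.
    static boolean isUppercaseLetter(int codePoint) {
        String single = new String(Character.toChars(codePoint));
        for (String form : UPPERCASE_FORMS) {
            if (!Pattern.matches(form, single)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isUppercaseLetter('A'));
        System.out.println(isUppercaseLetter('a'));
        System.out.println(isUppercaseLetter(0x00C9)); // LATIN CAPITAL LETTER E WITH ACUTE

        // The Character class reports the same category for 'A'.
        System.out.println(Character.getType('A') == Character.UPPERCASE_LETTER);
    }
}
```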
