Trail: Essential Java Classes
Lesson: Regular Expressions


The Java Tutorials have been written for JDK 8. Examples and practices described in this page don't take advantage of improvements introduced in later releases and might use technology no longer available.
See Java Language Changes for a summary of updated language features in Java SE 9 and subsequent releases.
See JDK Release Notes for information about new features, enhancements, and removed or deprecated options for all JDK releases.

Methods of the Pattern Class

Until now, we've only used the test harness to create Pattern objects in their most basic form. This section explores advanced techniques such as creating patterns with flags and using embedded flag expressions. It also explores some additional useful methods that we haven't yet discussed.

Creating a Pattern with Flags

The Pattern class defines an alternate compile method that accepts a set of flags affecting the way the pattern is matched. The flags parameter is a bit mask that may include any of the following public static fields:

Pattern.CANON_EQ: Enables canonical equivalence. When this flag is specified, two characters will be considered to match if, and only if, their full canonical decompositions match. The expression "a\u030A", for example, will match the string "\u00E5" when this flag is specified. By default, matching does not take canonical equivalence into account. Specifying this flag may impose a performance penalty.
Pattern.CASE_INSENSITIVE: Enables case-insensitive matching. By default, case-insensitive matching assumes that only characters in the US-ASCII charset are being matched. Unicode-aware case-insensitive matching can be enabled by specifying the UNICODE_CASE flag in conjunction with this flag. Case-insensitive matching can also be enabled via the embedded flag expression (?i). Specifying this flag may impose a slight performance penalty.
Pattern.COMMENTS: Permits whitespace and comments in the pattern. In this mode, whitespace is ignored, and embedded comments starting with # are ignored until the end of a line. Comments mode can also be enabled via the embedded flag expression (?x).
Pattern.DOTALL: Enables dotall mode. In dotall mode, the expression . matches any character, including a line terminator. By default this expression does not match line terminators. Dotall mode can also be enabled via the embedded flag expression (?s). (The s is a mnemonic for "single-line" mode, which is what this is called in Perl.)
Pattern.LITERAL: Enables literal parsing of the pattern. When this flag is specified, the input string that specifies the pattern is treated as a sequence of literal characters. Metacharacters or escape sequences in the input sequence will be given no special meaning. The flags CASE_INSENSITIVE and UNICODE_CASE retain their impact on matching when used in conjunction with this flag. The other flags become superfluous. There is no embedded flag character for enabling literal parsing.
Pattern.MULTILINE: Enables multiline mode. In multiline mode the expressions ^ and $ match just after or just before, respectively, a line terminator or the end of the input sequence. By default these expressions only match at the beginning and the end of the entire input sequence. Multiline mode can also be enabled via the embedded flag expression (?m).
Pattern.UNICODE_CASE: Enables Unicode-aware case folding. When this flag is specified, case-insensitive matching, when enabled by the CASE_INSENSITIVE flag, is done in a manner consistent with the Unicode Standard. By default, case-insensitive matching assumes that only characters in the US-ASCII charset are being matched. Unicode-aware case folding can also be enabled via the embedded flag expression (?u). Specifying this flag may impose a performance penalty.
Pattern.UNIX_LINES: Enables UNIX lines mode. In this mode, only the '\n' line terminator is recognized in the behavior of ., ^, and $. UNIX lines mode can also be enabled via the embedded flag expression (?d).
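
To make a few of these flags concrete, here is a minimal sketch that counts matches with and without MULTILINE, DOTALL, COMMENTS, and LITERAL. The class name FlagEffectsDemo, the helper countMatches, and the sample inputs are illustrative assumptions, not part of the tutorial's test harness.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names and inputs are chosen for illustration only.
class FlagEffectsDemo {

    // Counts how many times the regex, compiled with the given flags,
    // is found in the input.
    static int countMatches(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String input = "one\ntwo\nthree";

        // Default: ^ matches only at the beginning of the whole input.
        System.out.println(countMatches("^\\w+", 0, input));                 // 1
        // MULTILINE: ^ also matches just after each line terminator.
        System.out.println(countMatches("^\\w+", Pattern.MULTILINE, input)); // 3

        // Default: . does not match line terminators, so .* cannot span lines.
        System.out.println(countMatches("one.*three", 0, input));              // 0
        // DOTALL: . matches line terminators as well.
        System.out.println(countMatches("one.*three", Pattern.DOTALL, input)); // 1

        // COMMENTS: whitespace and the trailing # comment are ignored.
        System.out.println(
            countMatches("\\d+  # one or more digits", Pattern.COMMENTS, "a1b22")); // 2

        // LITERAL: the dot is an ordinary character, so "abc" no longer matches.
        System.out.println(countMatches("a.c", Pattern.LITERAL, "abc a.c")); // 1
    }
}
```

Passing 0 as the flags argument is the same as using the single-argument compile method, which makes the default behavior easy to compare against each flag.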

In the following steps we will modify the test harness, RegexTestHarness.java, to create a pattern with case-insensitive matching.

First, modify the code to invoke the alternate version of compile:

Pattern pattern =
    Pattern.compile(console.readLine("%nEnter your regex: "),
                    Pattern.CASE_INSENSITIVE);

Then compile and run the test harness to get the following results:

Enter your regex: dog
Enter input string to search: DoGDOg
I found the text "DoG" starting at index 0 and ending at index 3.
I found the text "DOg" starting at index 3 and ending at index 6.

As you can see, the string literal "dog" matches both occurrences, regardless of case. To compile a pattern with multiple flags, separate the flags to be included using the bitwise OR operator "|". For clarity, the following code samples hardcode the regular expression instead of reading it from the Console:

pattern = Pattern.compile("[az]$", Pattern.MULTILINE | Pattern.UNIX_LINES);

You could also specify an int variable instead:

final int flags = Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE;
Pattern pattern = Pattern.compile("aa", flags);
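
The two flag combinations above can be exercised with a short, self-contained sketch. The class name MultipleFlagsDemo, the helper countMatches, and the sample inputs (a Windows-style line break and an accented letter) are assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; inputs are chosen to make each flag's effect visible.
class MultipleFlagsDemo {

    // Counts the matches of the regex, compiled with the given flags, in the input.
    static int countMatches(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Two lines separated by a carriage return followed by a newline.
        String lines = "a\r\nz";

        // MULTILINE alone: $ matches before the line break and at the end of input.
        System.out.println(countMatches("[az]$", Pattern.MULTILINE, lines)); // 2

        // MULTILINE | UNIX_LINES: only the newline is a line terminator, so the
        // 'a' that is followed by a carriage return is not at the end of a line.
        final int lineFlags = Pattern.MULTILINE | Pattern.UNIX_LINES;
        System.out.println(countMatches("[az]$", lineFlags, lines)); // 1

        String lower = "\u00e9"; // lowercase e with acute accent
        String upper = "\u00c9"; // uppercase E with acute accent

        // CASE_INSENSITIVE alone only folds US-ASCII characters.
        System.out.println(countMatches(lower, Pattern.CASE_INSENSITIVE, upper)); // 0

        // Adding UNICODE_CASE extends case folding beyond US-ASCII.
        final int caseFlags = Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE;
        System.out.println(countMatches(lower, caseFlags, upper)); // 1
    }
}
```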

Embedded Flag Expressions

It's also possible to enable various flags using embedded flag expressions. Embedded flag expressions are an alternative to the two-argument version of compile, and are specified in the regular expression itself. The following example uses the original test harness, RegexTestHarness.java, with the embedded flag expression (?i) to enable case-insensitive matching.

Enter your regex: (?i)foo
Enter input string to search: FOOfooFoOfoO
I found the text "FOO" starting at index 0 and ending at index 3.
I found the text "foo" starting at index 3 and ending at index 6.
I found the text "FoO" starting at index 6 and ending at index 9.
I found the text "foO" starting at index 9 and ending at index 12.

Once again, all matches succeed regardless of case.

The embedded flag expressions that correspond to Pattern's publicly accessible fields are presented in the following table:

Constant	Equivalent Embedded Flag Expression
`Pattern.CANON_EQ`	None
`Pattern.CASE_INSENSITIVE`	`(?i)`
`Pattern.COMMENTS`	`(?x)`
`Pattern.MULTILINE`	`(?m)`
`Pattern.DOTALL`	`(?s)`
`Pattern.LITERAL`	None
`Pattern.UNICODE_CASE`	`(?u)`
`Pattern.UNIX_LINES`	`(?d)`
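
As a rough check of the equivalence shown in the table, the sketch below compares (?i) with the CASE_INSENSITIVE constant on the same input used above. The class name EmbeddedFlagsDemo and the helper findAll are hypothetical. The last call also relies on the fact that several flag letters may share one embedded expression, such as (?im), which the table does not cover.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class standing in for the interactive test harness.
class EmbeddedFlagsDemo {

    // Collects every substring of the input that the pattern finds.
    static List<String> findAll(Pattern pattern, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "FOOfooFoOfoO";

        // Embedded flag expression: the flag is part of the regex itself.
        System.out.println(findAll(Pattern.compile("(?i)foo"), input));
        // prints [FOO, foo, FoO, foO]

        // Equivalent two-argument compile call.
        System.out.println(findAll(Pattern.compile("foo", Pattern.CASE_INSENSITIVE), input));
        // prints [FOO, foo, FoO, foO]

        // Without either form, matching is case-sensitive.
        System.out.println(findAll(Pattern.compile("foo"), input));
        // prints [foo]

        // Combining case-insensitive and multiline mode in one expression.
        System.out.println(findAll(Pattern.compile("(?im)^foo$"), "FOO\nfoo"));
        // prints [FOO, foo]
    }
}
```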

Using the `matches(String,CharSequence)` Method

The Pattern class defines a convenient matches method that allows you to quickly check whether a given input string, taken in its entirety, matches a pattern. As with all public static methods, you should invoke matches by its class name, such as Pattern.matches("\\d","1");. In this example, the method returns true, because the digit "1" matches the regular expression \d.
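
The following sketch illustrates that matches tests the entire input rather than searching inside it. The class name MatchesDemo and the two helper methods are assumptions for illustration; containsMatch uses Matcher.find for contrast.

```java
import java.util.regex.Pattern;

// Hypothetical demo class contrasting whole-input matching with searching.
class MatchesDemo {

    // True only if the entire input matches the regex.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // True if the regex is found anywhere in the input.
    static boolean containsMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole("\\d", "1"));    // true
        System.out.println(matchesWhole("\\d", "12"));   // false: one digit expected, two given
        System.out.println(matchesWhole("\\d+", "12"));  // true
        System.out.println(matchesWhole("\\d", "a1"));   // false: the 'a' is not a digit
        System.out.println(containsMatch("\\d", "a1"));  // true: a digit occurs in the input
    }
}
```

Because Pattern.matches compiles the expression on every call, compiling the Pattern once is usually preferable when the same expression is applied repeatedly.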

Using the `split(CharSequence)` Method

The split method is a great tool for gathering the text that lies on either side of the pattern that's been matched. As shown below in SplitDemo.java, the split method can extract the words "one", "two", "three", "four", and "five" from the string "one:two:three:four:five":

import java.util.regex.Pattern;

public class SplitDemo {

    private static final String REGEX = ":";
    private static final String INPUT =
        "one:two:three:four:five";
    
    public static void main(String[] args) {
        Pattern p = Pattern.compile(REGEX);
        String[] items = p.split(INPUT);
        for(String s : items) {
            System.out.println(s);
        }
    }
}

OUTPUT:

one
two
three
four
five

For simplicity, we've matched a string literal, the colon (:), instead of a complex regular expression. Since we're still using a compiled Pattern object, you can use split to get the text that falls on either side of any regular expression. Here's the same example, SplitDemo2.java, modified to split on digits instead:

import java.util.regex.Pattern;

public class SplitDemo2 {

    private static final String REGEX = "\\d";
    private static final String INPUT =
        "one9two4three7four1five";

    public static void main(String[] args) {
        Pattern p = Pattern.compile(REGEX);
        String[] items = p.split(INPUT);
        for(String s : items) {
            System.out.println(s);
        }
    }
}

OUTPUT:

one
two
three
four
five

Other Utility Methods

You may find the following methods to be of some use as well:

public static String quote(String s): Returns a literal pattern String for the specified String. This method produces a String that can be used to create a Pattern that would match String s as if it were a literal pattern. Metacharacters or escape sequences in the input sequence will be given no special meaning.
public String toString(): Returns the String representation of this pattern. This is the regular expression from which this pattern was compiled.
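
A small sketch of both methods follows. The class name QuoteDemo, the helper matchesLiterally, and the sample text "1+1=2" are illustrative assumptions.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for Pattern.quote and Pattern.toString.
class QuoteDemo {

    // True if the whole input equals the given text, with metacharacters
    // in the text treated as ordinary characters.
    static boolean matchesLiterally(String text, String input) {
        return Pattern.compile(Pattern.quote(text)).matcher(input).matches();
    }

    public static void main(String[] args) {
        String text = "1+1=2";

        // Unquoted, the + is a quantifier, so the text does not match itself.
        System.out.println(Pattern.matches(text, text));   // false
        // Unquoted, the pattern matches "11=2" instead.
        System.out.println(Pattern.matches(text, "11=2")); // true

        // Quoted, the text matches only itself.
        System.out.println(matchesLiterally(text, text));   // true
        System.out.println(matchesLiterally(text, "11=2")); // false

        // toString returns the regular expression the pattern was compiled from.
        Pattern p = Pattern.compile("a*b", Pattern.CASE_INSENSITIVE);
        System.out.println(p.toString()); // a*b
    }
}
```

Quoting is most useful when part of a regular expression comes from user input or another source that should not be interpreted as regex syntax.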

Pattern Method Equivalents in `java.lang.String`

Regular expression support also exists in java.lang.String through several methods that mimic the behavior of java.util.regex.Pattern. For convenience, key excerpts from their API are presented below.

public boolean matches(String regex): Tells whether or not this string matches the given regular expression. An invocation of this method of the form str.matches(regex) yields exactly the same result as the expression Pattern.matches(regex, str).
public String[] split(String regex, int limit): Splits this string around matches of the given regular expression. An invocation of this method of the form str.split(regex, n) yields the same result as the expression Pattern.compile(regex).split(str, n).
public String[] split(String regex): Splits this string around matches of the given regular expression. This method works the same as if you invoked the two-argument split method with the given expression and a limit argument of zero. Trailing empty strings are not included in the resulting array.
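
The equivalences listed above can be checked with a short sketch. The class name StringRegexDemo, the helper sameAsPattern, and the sample string are assumptions chosen so that the trailing empty strings are visible.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class comparing String methods with their Pattern equivalents.
class StringRegexDemo {

    // True if str.split(regex, limit) and Pattern.compile(regex).split(str, limit)
    // produce the same array.
    static boolean sameAsPattern(String str, String regex, int limit) {
        return Arrays.equals(str.split(regex, limit),
                             Pattern.compile(regex).split(str, limit));
    }

    public static void main(String[] args) {
        String str = "a:b:c::";

        // str.matches(regex) agrees with Pattern.matches(regex, str).
        System.out.println("1".matches("\\d") == Pattern.matches("\\d", "1")); // true

        // split(regex) uses a limit of zero and drops trailing empty strings.
        System.out.println(Arrays.toString(str.split(":")));    // [a, b, c]

        // A limit of 2 stops after the first split; the remainder is kept whole.
        System.out.println(Arrays.toString(str.split(":", 2))); // [a, b:c::]

        // Both forms agree with the Pattern-based equivalents.
        System.out.println(sameAsPattern(str, ":", 0)); // true
        System.out.println(sameAsPattern(str, ":", 2)); // true
    }
}
```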

There is also a replace method that replaces one CharSequence with another:

public String replace(CharSequence target,CharSequence replacement): Replaces each substring of this string that matches the literal target sequence with the specified literal replacement sequence. The replacement proceeds from the beginning of the string to the end; for example, replacing "aa" with "b" in the string "aaa" will result in "ba" rather than "ab".
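
The sketch below reproduces the "aaa" example and shows that the target is treated literally. The class name ReplaceDemo is hypothetical. The last line uses String.replaceAll, a regex-based method that the excerpts above do not cover, purely as a contrast.

```java
// Hypothetical demo class for the literal String.replace method.
class ReplaceDemo {

    public static void main(String[] args) {
        // Replacement proceeds left to right, so the first "aa" is consumed first.
        System.out.println("aaa".replace("aa", "b"));     // ba

        // The target is a literal sequence: the dot is an ordinary character.
        System.out.println("a.b.c".replace(".", "-"));    // a-b-c

        // By contrast, replaceAll treats its first argument as a regular
        // expression, where the dot matches any character.
        System.out.println("a.b.c".replaceAll(".", "-")); // -----
    }
}
```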