Trail: Essential Java Classes
Lesson: Regular Expressions


The Java Tutorials have been written for JDK 8. Examples and practices described in this page don't take advantage of improvements introduced in later releases and might use technology no longer available.
See Java Language Changes for a summary of updated language features in Java SE 9 and subsequent releases.
See JDK Release Notes for information about new features, enhancements, and removed or deprecated options for all JDK releases.

Quantifiers

Quantifiers allow you to specify the number of occurrences to match against. For convenience, the three sections of the Pattern API specification describing greedy, reluctant, and possessive quantifiers are presented below. At first glance it may appear that the quantifiers X?, X?? and X?+ do exactly the same thing, since they all promise to match "X, once or not at all". There are subtle implementation differences, which are explained near the end of this section.

Greedy	Reluctant	Possessive	Meaning
`X?`	`X??`	`X?+`	`X`, once or not at all
`X*`	`X*?`	`X*+`	`X`, zero or more times
`X+`	`X+?`	`X++`	`X`, one or more times
`X{n}`	`X{n}?`	`X{n}+`	`X`, exactly `n` times
`X{n,}`	`X{n,}?`	`X{n,}+`	`X`, at least `n` times
`X{n,m}`	`X{n,m}?`	`X{n,m}+`	`X`, at least `n` but not more than `m` times

Let's start our look at greedy quantifiers by creating three different regular expressions: the letter "a" followed by either ?, *, or +. Let's see what happens when these expressions are tested against an empty input string "":

Enter your regex: a?
Enter input string to search: 
I found the text "" starting at index 0 and ending at index 0.

Enter your regex: a*
Enter input string to search: 
I found the text "" starting at index 0 and ending at index 0.

Enter your regex: a+
Enter input string to search: 
No match found.

Zero-Length Matches

In the above example, the match is successful in the first two cases because the expressions a? and a* both allow for zero occurrences of the letter a. You'll also notice that the start and end indices are both zero, which is unlike any of the examples we've seen so far. The empty input string "" has no length, so the test simply matches nothing at index 0. Matches of this sort are known as zero-length matches. A zero-length match can occur in several cases: in an empty input string, at the beginning of an input string, after the last character of an input string, or in between any two characters of an input string. Zero-length matches are easily identifiable because they always start and end at the same index position.
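
The rule that a zero-length match starts and ends at the same index can be checked directly with the standard `java.util.regex` API. The following is a minimal sketch, not the tutorial's test harness; the class name `ZeroLengthDemo` and the helper `zeroLengthStarts` are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the tutorial's test harness.
class ZeroLengthDemo {

    // Returns the index of every zero-length match of regex in input.
    // A match is zero-length when it starts and ends at the same index.
    static List<Integer> zeroLengthStarts(String regex, String input) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            if (m.start() == m.end()) {
                starts.add(m.start());
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        System.out.println(zeroLengthStarts("a?", "")); // prints [0]: a? allows zero occurrences
        System.out.println(zeroLengthStarts("a*", "")); // prints [0]: so does a*
        System.out.println(zeroLengthStarts("a+", "")); // prints []: a+ needs at least one "a"
    }
}
```

Comparing `start()` with `end()` avoids relying on the matched text being an empty string, although for these expressions both checks give the same answer.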

Let's explore zero-length matches with a few more examples. Change the input string to a single letter "a" and you'll notice something interesting:

Enter your regex: a?
Enter input string to search: a
I found the text "a" starting at index 0 and ending at index 1.
I found the text "" starting at index 1 and ending at index 1.

Enter your regex: a*
Enter input string to search: a
I found the text "a" starting at index 0 and ending at index 1.
I found the text "" starting at index 1 and ending at index 1.

Enter your regex: a+
Enter input string to search: a
I found the text "a" starting at index 0 and ending at index 1.

All three quantifiers found the letter "a", but the first two also found a zero-length match at index 1; that is, after the last character of the input string. Remember, the matcher sees the character "a" as sitting in the cell between index 0 and index 1, and our test harness loops until it can no longer find a match. Depending on the quantifier used, the presence of "nothing" at the index after the last character may or may not trigger a match.

Now change the input string to the letter "a" five times in a row and you'll get the following:

Enter your regex: a?
Enter input string to search: aaaaa
I found the text "a" starting at index 0 and ending at index 1.
I found the text "a" starting at index 1 and ending at index 2.
I found the text "a" starting at index 2 and ending at index 3.
I found the text "a" starting at index 3 and ending at index 4.
I found the text "a" starting at index 4 and ending at index 5.
I found the text "" starting at index 5 and ending at index 5.

Enter your regex: a*
Enter input string to search: aaaaa
I found the text "aaaaa" starting at index 0 and ending at index 5.
I found the text "" starting at index 5 and ending at index 5.

Enter your regex: a+
Enter input string to search: aaaaa
I found the text "aaaaa" starting at index 0 and ending at index 5.

The expression a? finds an individual match for each character, since it matches when "a" appears zero or one times. The expression a* finds two separate matches: all five occurrences of the letter "a" in the first match, then the zero-length match after the last character at index 5. And finally, a+ matches all occurrences of the letter "a", ignoring the presence of "nothing" at the last index.

At this point, you might be wondering what the results would be if the first two quantifiers encounter a letter other than "a". For example, what happens if they encounter the letter "b", as in "ababaaaab"?

Let's find out:

Enter your regex: a?
Enter input string to search: ababaaaab
I found the text "a" starting at index 0 and ending at index 1.
I found the text "" starting at index 1 and ending at index 1.
I found the text "a" starting at index 2 and ending at index 3.
I found the text "" starting at index 3 and ending at index 3.
I found the text "a" starting at index 4 and ending at index 5.
I found the text "a" starting at index 5 and ending at index 6.
I found the text "a" starting at index 6 and ending at index 7.
I found the text "a" starting at index 7 and ending at index 8.
I found the text "" starting at index 8 and ending at index 8.
I found the text "" starting at index 9 and ending at index 9.

Enter your regex: a*
Enter input string to search: ababaaaab
I found the text "a" starting at index 0 and ending at index 1.
I found the text "" starting at index 1 and ending at index 1.
I found the text "a" starting at index 2 and ending at index 3.
I found the text "" starting at index 3 and ending at index 3.
I found the text "aaaa" starting at index 4 and ending at index 8.
I found the text "" starting at index 8 and ending at index 8.
I found the text "" starting at index 9 and ending at index 9.

Enter your regex: a+
Enter input string to search: ababaaaab
I found the text "a" starting at index 0 and ending at index 1.
I found the text "a" starting at index 2 and ending at index 3.
I found the text "aaaa" starting at index 4 and ending at index 8.

Even though the letter "b" appears in cells 1, 3, and 8, the output reports a zero-length match at those locations. The regular expression a? is not specifically looking for the letter "b"; it's merely looking for the presence (or lack thereof) of the letter "a". If the quantifier allows for a match of "a" zero times, anything in the input string that's not an "a" will show up as a zero-length match. The remaining a's are matched according to the rules discussed in the previous examples.
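
The three result lists above can be reproduced without the interactive harness. The sketch below uses only `Pattern` and `Matcher` from `java.util.regex`; the class name `MatchSpansDemo` and the helper `spans` are hypothetical, and the "start-end" string format is an arbitrary choice made for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative only.
class MatchSpansDemo {

    // Lists each match as "start-end"; a zero-length match has start == end.
    static List<String> spans(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            result.add(m.start() + "-" + m.end());
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "ababaaaab";
        // a? and a* report a zero-length match wherever no "a" is available
        // (in front of each "b" and at the end of the input); a+ does not.
        System.out.println("a? " + spans("a?", input));
        System.out.println("a* " + spans("a*", input));
        System.out.println("a+ " + spans("a+", input));
    }
}
```

Entries such as 1-1, 3-3, 8-8, and 9-9 in the first two lists are the zero-length matches; the third list contains none.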

To match a pattern exactly n times, simply specify the number inside a set of braces:

Enter your regex: a{3}
Enter input string to search: aa
No match found.

Enter your regex: a{3}
Enter input string to search: aaa
I found the text "aaa" starting at index 0 and ending at index 3.

Enter your regex: a{3}
Enter input string to search: aaaa
I found the text "aaa" starting at index 0 and ending at index 3.

Here, the regular expression a{3} is searching for three occurrences of the letter "a" in a row. The first test fails because the input string does not have enough a's to match against. The second test contains exactly 3 a's in the input string, which triggers a match. The third test also triggers a match because the first 3 a's of the input string satisfy the expression. Anything following that is irrelevant to the first match. If the pattern should appear again after that point, it would trigger subsequent matches:

Enter your regex: a{3}
Enter input string to search: aaaaaaaaa
I found the text "aaa" starting at index 0 and ending at index 3.
I found the text "aaa" starting at index 3 and ending at index 6.
I found the text "aaa" starting at index 6 and ending at index 9.

To require a pattern to appear at least n times, add a comma after the number:

Enter your regex: a{3,}
Enter input string to search: aaaaaaaaa
I found the text "aaaaaaaaa" starting at index 0 and ending at index 9.

With the same input string, this test finds only one match, because the 9 a's in a row satisfy the need for "at least" 3 a's.

Finally, to specify an upper limit on the number of occurrences, add a second number inside the braces:

Enter your regex: a{3,6} // find at least 3 (but no more than 6) a's in a row
Enter input string to search: aaaaaaaaa
I found the text "aaaaaa" starting at index 0 and ending at index 6.
I found the text "aaa" starting at index 6 and ending at index 9.

Here the first match is forced to stop at the upper limit of 6 characters. The second match includes whatever is left over, which happens to be three a's, the minimum number of characters allowed for this match. If the input string were one character shorter, there would not be a second match since only two a's would remain.
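
The three brace forms can be compared side by side on the same nine-character input. This is a minimal sketch using only the standard `java.util.regex` classes; `CountedQuantifierDemo` and `matches` are hypothetical names introduced for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the tutorial's test harness.
class CountedQuantifierDemo {

    // Returns the text of every match, in the order the matcher finds them.
    static List<String> matches(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String nine = "aaaaaaaaa"; // nine a's
        System.out.println(matches("a{3}", nine));   // prints [aaa, aaa, aaa]
        System.out.println(matches("a{3,}", nine));  // prints [aaaaaaaaa]
        System.out.println(matches("a{3,6}", nine)); // prints [aaaaaa, aaa]

        // One character shorter: only two a's remain after the first match,
        // which is below the minimum of three, so there is no second match.
        System.out.println(matches("a{3,6}", "aaaaaaaa")); // prints [aaaaaa]
    }
}
```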

Capturing Groups and Character Classes with Quantifiers

Until now, we've only applied quantifiers to a single character. In fact, a quantifier by itself attaches only to the one character that precedes it, so the regular expression "abc+" would mean "a, followed by b, followed by c one or more times". It would not mean "abc" one or more times. However, quantifiers can also attach to Character Classes and Capturing Groups, such as [abc]+ (a or b or c, one or more times) or (abc)+ (the group "abc", one or more times).

Let's illustrate by specifying the group (dog) three times in a row.

Enter your regex: (dog){3}
Enter input string to search: dogdogdogdogdogdog
I found the text "dogdogdog" starting at index 0 and ending at index 9.
I found the text "dogdogdog" starting at index 9 and ending at index 18.

Enter your regex: dog{3}
Enter input string to search: dogdogdogdogdogdog
No match found.

Here the first example finds two matches, each made up of three consecutive occurrences of "dog", since the quantifier applies to the entire capturing group. Remove the parentheses, however, and the match fails because the quantifier {3} now applies only to the letter "g".
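
To see that dog{3} quantifies only the final letter, it helps to try it on an input that does contain three g's in a row. The sketch below assumes nothing beyond the standard `java.util.regex` API; the class `GroupQuantifierDemo`, the helper `countMatches`, and the extra input "doggg" are illustrative additions, not part of the tutorial's examples.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the tutorial's test harness.
class GroupQuantifierDemo {

    // Counts how many times regex matches in input.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String input = "dogdogdogdogdogdog";
        System.out.println(countMatches("(dog){3}", input)); // prints 2: {3} repeats the whole group
        System.out.println(countMatches("dog{3}", input));   // prints 0: {3} repeats only "g"
        System.out.println(countMatches("dog{3}", "doggg")); // prints 1: "do" followed by three g's
    }
}
```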

Similarly, we can apply a quantifier to an entire character class:

Enter your regex: [abc]{3}
Enter input string to search: abccabaaaccbbbc
I found the text "abc" starting at index 0 and ending at index 3.
I found the text "cab" starting at index 3 and ending at index 6.
I found the text "aaa" starting at index 6 and ending at index 9.
I found the text "ccb" starting at index 9 and ending at index 12.
I found the text "bbc" starting at index 12 and ending at index 15.

Enter your regex: abc{3}
Enter input string to search: abccabaaaccbbbc
No match found.

Here the quantifier {3} applies to the entire character class in the first example, but only to the letter "c" in the second.
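
The same scoping rule explains the difference between abc+, (abc)+, and [abc]+ mentioned at the start of this section. As a rough sketch, the following checks which of a few sample strings each expression matches in full; the class `QuantifierScopeDemo`, the helper `accepted`, and the sample strings are all hypothetical choices made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the tutorial's test harness.
class QuantifierScopeDemo {

    // Returns the candidates that regex matches in their entirety.
    static List<String> accepted(String regex, String... candidates) {
        Pattern p = Pattern.compile(regex);
        List<String> result = new ArrayList<>();
        for (String candidate : candidates) {
            if (p.matcher(candidate).matches()) {
                result.add(candidate);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] samples = {"abccc", "abcabc", "cabbac"};
        System.out.println(accepted("abc+", samples));   // prints [abccc]: + repeats only "c"
        System.out.println(accepted("(abc)+", samples)); // prints [abcabc]: + repeats the group
        System.out.println(accepted("[abc]+", samples)); // prints [abccc, abcabc, cabbac]
    }
}
```

Here `Matcher.matches()` is used instead of `find()` so that each sample must be matched from its first character to its last.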

Differences Among Greedy, Reluctant, and Possessive Quantifiers

There are subtle differences among greedy, reluctant, and possessive quantifiers.

Greedy quantifiers are considered "greedy" because they force the matcher to read in, or eat, the entire input string prior to attempting the first match. If the first match attempt (the entire input string) fails, the matcher backs off the input string by one character and tries again, repeating the process until a match is found or there are no more characters left to back off from. Depending on the quantifier used in the expression, the last thing it will try matching against is 1 or 0 characters.

The reluctant quantifiers, however, take the opposite approach: They start at the beginning of the input string, then reluctantly eat one character at a time looking for a match. The last thing they try is the entire input string.

Finally, the possessive quantifiers always eat the entire input string, trying once (and only once) for a match. Unlike the greedy quantifiers, possessive quantifiers never back off, even if doing so would allow the overall match to succeed.

To illustrate, consider the input string xfooxxxxxxfoo.

Enter your regex: .*foo  // greedy quantifier
Enter input string to search: xfooxxxxxxfoo
I found the text "xfooxxxxxxfoo" starting at index 0 and ending at index 13.

Enter your regex: .*?foo  // reluctant quantifier
Enter input string to search: xfooxxxxxxfoo
I found the text "xfoo" starting at index 0 and ending at index 4.
I found the text "xxxxxxfoo" starting at index 4 and ending at index 13.

Enter your regex: .*+foo // possessive quantifier
Enter input string to search: xfooxxxxxxfoo
No match found.

The first example uses the greedy quantifier .* to find "anything", zero or more times, followed by the letters "f" "o" "o". Because the quantifier is greedy, the .* portion of the expression first eats the entire input string. At this point, the overall expression cannot succeed, because the last three letters ("f" "o" "o") have already been consumed. So the matcher slowly backs off one letter at a time until the rightmost occurrence of "foo" has been regurgitated, at which point the match succeeds and the search ends.

The second example, however, is reluctant, so it starts by first consuming "nothing". Because "foo" doesn't appear at the beginning of the string, it's forced to swallow the first letter (an "x"), which triggers the first match, from index 0 to index 4. Our test harness continues the process until the input string is exhausted. It finds another match from index 4 to index 13.

The third example fails to find a match because the quantifier is possessive. In this case, the entire input string is consumed by .*+, leaving nothing left over to satisfy the "foo" at the end of the expression. Use a possessive quantifier for situations where you want to seize all of something without ever backing off; because the matcher never backtracks into it, it can outperform the equivalent greedy quantifier in cases where the match is not immediately found.
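
The three behaviors can also be observed programmatically. The sketch below runs the expressions from this section and then applies the same idea to X?, X?? and X?+, which only differ visibly when more of the pattern follows the quantifier. It uses only the standard `java.util.regex` API; the class `QuantifierModesDemo`, the helper `matches`, and the a?a family of expressions are illustrative assumptions, not examples from the tutorial.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the tutorial's test harness.
class QuantifierModesDemo {

    // Returns the text of every match, in the order the matcher finds them.
    static List<String> matches(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";
        System.out.println(matches(".*foo", input));  // greedy: prints [xfooxxxxxxfoo]
        System.out.println(matches(".*?foo", input)); // reluctant: prints [xfoo, xxxxxxfoo]
        System.out.println(matches(".*+foo", input)); // possessive: prints []

        // The optional quantifier followed by a literal "a", on the input "a":
        System.out.println(matches("a?a", "a"));  // greedy gives the "a" back: prints [a]
        System.out.println(matches("a??a", "a")); // reluctant tries zero a's first: prints [a]
        System.out.println(matches("a?+a", "a")); // possessive keeps the "a": prints []
    }
}
```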
