The Java™ Tutorials

Capturing Groups

In the previous section, we saw how quantifiers attach to one character, character class, or capturing group at a time. But until now, we have not discussed the notion of capturing groups in any detail.

Capturing groups are a way to treat multiple characters as a single unit. They are created by placing the characters to be grouped inside a set of parentheses. For example, the regular expression (dog) creates a single group containing the letters "d", "o", and "g". The portion of the input string that matches the capturing group is saved in memory for later recall via backreferences (as discussed below in the section Backreferences).
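As a minimal sketch of the idea, the following program treats (dog) as a single unit and then reads back the captured text. The class name CapturingGroupDemo and the helper firstCapture are hypothetical names chosen for this illustration; only Pattern and Matcher come from java.util.regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CapturingGroupDemo {

    // Returns the text captured by group 1 of the first match,
    // or null if the regex does not match anywhere in the input.
    static String firstCapture(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        // The parentheses make "dog" one unit, so + repeats the whole group.
        System.out.println(Pattern.matches("(dog)+", "dogdogdog")); // true

        // The matched portion of the input is saved and can be read back.
        System.out.println(firstCapture("(dog)", "hotdog stand")); // dog
    }
}
```

Without the parentheses, dog+ would apply the quantifier only to the final "g".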

Numbering

As described in the Pattern API, capturing groups are numbered by counting their opening parentheses from left to right. In the expression ((A)(B(C))), for example, there are four such groups:

  1. ((A)(B(C)))
  2. (A)
  3. (B(C))
  4. (C)

To find out how many groups are present in the expression, call the groupCount method on a matcher object. The groupCount method returns an int showing the number of capturing groups present in the matcher's pattern. In this example, groupCount would return the number 4, showing that the pattern contains 4 capturing groups.
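The numbering can be checked with a short program. This is a sketch only: the class GroupNumberingDemo and its helpers countGroups and groupsOf are hypothetical names, and the input "ABC" is an assumed test string chosen so that every group participates in the match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumberingDemo {

    // Number of capturing groups in the pattern (group 0 is not counted).
    static int countGroups(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    // Text captured by groups 1..groupCount when the whole input matches;
    // an empty array if it does not match.
    static String[] groupsOf(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        if (!matcher.matches()) {
            return new String[0];
        }
        String[] captured = new String[matcher.groupCount()];
        for (int i = 1; i <= matcher.groupCount(); i++) {
            captured[i - 1] = matcher.group(i);
        }
        return captured;
    }

    public static void main(String[] args) {
        String regex = "((A)(B(C)))";
        System.out.println(countGroups(regex)); // 4

        String[] captured = groupsOf(regex, "ABC");
        for (int i = 0; i < captured.length; i++) {
            // Prints: group 1 = ABC, group 2 = A, group 3 = BC, group 4 = C
            System.out.println("group " + (i + 1) + " = " + captured[i]);
        }
    }
}
```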

There is also a special group, group 0, which always represents the entire expression. This group is not included in the total reported by groupCount. Groups beginning with (? are either pure, non-capturing groups that do not capture text and do not count towards the group total, or named capturing groups of the form (?<name>X), which do capture text and are counted. (You'll see examples of non-capturing groups later in the section Methods of the Pattern Class.)
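A small sketch of how group 0 and a non-capturing group behave, assuming the illustrative regex (?:ab)+(c) and input "xxababc". The class GroupZeroDemo and helper firstMatch are hypothetical names used only here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupZeroDemo {

    // Returns a matcher positioned on the first match, or throws if none exists.
    static Matcher firstMatch(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        if (!matcher.find()) {
            throw new IllegalArgumentException("No match found.");
        }
        return matcher;
    }

    public static void main(String[] args) {
        // (?:ab) groups "ab" for the quantifier but is not numbered;
        // (c) is therefore capturing group 1.
        Matcher matcher = firstMatch("(?:ab)+(c)", "xxababc");

        System.out.println(matcher.groupCount()); // 1
        System.out.println(matcher.group(0));     // ababc
        System.out.println(matcher.group());      // ababc (same as group 0)
        System.out.println(matcher.group(1));     // c
    }
}
```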

It's important to understand how groups are numbered because some Matcher methods accept an int specifying a particular group number as a parameter, for example start(int group), end(int group), and group(int group).
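To illustrate group numbers used as arguments, the sketch below calls the Matcher methods group(int), start(int), and end(int). The regex (\d+)-(\d+), the input "range 10-25", and the names GroupIndexDemo and describeGroup are assumptions made for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupIndexDemo {

    // Describes one group of the first match: its text plus its
    // start index (inclusive) and end index (exclusive) in the input.
    static String describeGroup(String regex, String input, int group) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        if (!matcher.find()) {
            return "No match found.";
        }
        return String.format("group %d = \"%s\" [%d, %d)",
                group, matcher.group(group), matcher.start(group), matcher.end(group));
    }

    public static void main(String[] args) {
        String regex = "(\\d+)-(\\d+)";
        String input = "range 10-25";

        System.out.println(describeGroup(regex, input, 0)); // group 0 = "10-25" [6, 11)
        System.out.println(describeGroup(regex, input, 1)); // group 1 = "10" [6, 8)
        System.out.println(describeGroup(regex, input, 2)); // group 2 = "25" [9, 11)
    }
}
```

Passing a group number larger than groupCount() throws an IndexOutOfBoundsException, which is one reason the numbering rules matter.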

Backreferences

The section of the input string matching the capturing group(s) is saved in memory for later recall via backreference. A backreference is specified in the regular expression as a backslash (\) followed by a digit indicating the number of the group to be recalled. For example, the expression (\d\d) defines one capturing group matching two digits in a row, which can be recalled later in the expression via the backreference \1.

To match any two digits, followed by the exact same two digits, you would use (\d\d)\1 as the regular expression:

Enter your regex: (\d\d)\1
Enter input string to search: 1212
I found the text "1212" starting at index 0 and ending at index 4.

If you change the last two digits, the match will fail:

Enter your regex: (\d\d)\1
Enter input string to search: 1234
No match found.
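
The two runs above can be reproduced directly in Java. This is a sketch rather than the tutorial's own test harness: the class BackreferenceDemo and the method search are hypothetical stand-ins that format their result the same way. Note that inside a Java string literal each backslash must be doubled, so the regex (\d\d)\1 is written "(\\d\\d)\\1".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BackreferenceDemo {

    // Searches the input for the first match and reports where it was found.
    static String search(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        if (matcher.find()) {
            return String.format(
                    "I found the text \"%s\" starting at index %d and ending at index %d.",
                    matcher.group(), matcher.start(), matcher.end());
        }
        return "No match found.";
    }

    public static void main(String[] args) {
        // \1 must repeat exactly what group 1 captured.
        String regex = "(\\d\\d)\\1";

        // I found the text "1212" starting at index 0 and ending at index 4.
        System.out.println(search(regex, "1212"));

        // No match found.
        System.out.println(search(regex, "1234"));
    }
}
```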

For nested capturing groups, backreferencing works in exactly the same way: specify a backslash followed by the number of the group to be recalled.
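
As a hedged illustration with nested groups, consider the assumed regex ((\d)(\d)), where group 1 is the two-digit pair, group 2 the first digit, and group 3 the second digit. The class NestedBackreferenceDemo and the helper isMatch are hypothetical names for this sketch.

```java
import java.util.regex.Pattern;

class NestedBackreferenceDemo {

    // True if the entire input matches the regex.
    static boolean isMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // \1 recalls the outer group: the same two digits again.
        String repeated = "((\\d)(\\d))\\1";
        System.out.println(isMatch(repeated, "1212")); // true
        System.out.println(isMatch(repeated, "1221")); // false

        // \3 then \2 recall the inner groups in reverse order,
        // so the pair must be followed by its mirror image.
        String mirrored = "((\\d)(\\d))\\3\\2";
        System.out.println(isMatch(mirrored, "1221")); // true
        System.out.println(isMatch(mirrored, "1212")); // false
    }
}
```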

