The Java™ Tutorials

Character Classes

If you browse through the Pattern class specification, you'll see tables summarizing the supported regular expression constructs. In the "Character Classes" section you'll find the following:

Construct        Description
[abc]            a, b, or c (simple class)
[^abc]           Any character except a, b, or c (negation)
[a-zA-Z]         a through z, or A through Z, inclusive (range)
[a-d[m-p]]       a through d, or m through p: [a-dm-p] (union)
[a-z&&[def]]     d, e, or f (intersection)
[a-z&&[^bc]]     a through z, except for b and c: [ad-z] (subtraction)
[a-z&&[^m-p]]    a through z, and not m through p: [a-lq-z] (subtraction)

The left-hand column specifies the regular expression constructs, while the right-hand column describes the conditions under which each construct will match.
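Several rows of the table state that a nested construct is equivalent to a flat one (for example, [a-z&&[^bc]] and [ad-z]). The following is a minimal sketch for checking those claims, assuming it is enough to compare the two patterns over single ASCII characters; the class name TableEquivalenceCheck and the helper sameOverAscii are hypothetical and not part of the tutorial.

```java
import java.util.regex.Pattern;

class TableEquivalenceCheck {
    // True when both regexes accept exactly the same single ASCII characters.
    // (Assumption: ASCII coverage is sufficient for the classes in the table.)
    static boolean sameOverAscii(String first, String second) {
        Pattern a = Pattern.compile(first);
        Pattern b = Pattern.compile(second);
        for (char c = 0; c < 128; c++) {
            String s = String.valueOf(c);
            if (a.matcher(s).matches() != b.matcher(s).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Each pair is taken from a row of the table above.
        String[][] pairs = {
            {"[a-d[m-p]]", "[a-dm-p]"},
            {"[a-z&&[^bc]]", "[ad-z]"},
            {"[a-z&&[^m-p]]", "[a-lq-z]"},
        };
        for (String[] pair : pairs) {
            System.out.println(pair[0] + " vs " + pair[1] + ": "
                    + sameOverAscii(pair[0], pair[1]));
        }
    }
}
```

Each pair should report true, which is what the table's shorthand forms imply.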


Note: The word "class" in the phrase "character class" does not refer to a .class file. In the context of regular expressions, a character class is a set of characters enclosed within square brackets. It specifies the characters that will successfully match a single character from a given input string.

Simple Classes

The most basic form of a character class is a set of characters placed side by side within square brackets. For example, the regular expression [bcr]at will match the words "bat", "cat", or "rat" because it defines a character class (accepting either "b", "c", or "r") as its first character.

Enter your regex: [bcr]at
Enter input string to search: bat
I found the text "bat" starting at index 0 and ending at index 3.

Enter your regex: [bcr]at
Enter input string to search: cat
I found the text "cat" starting at index 0 and ending at index 3.

Enter your regex: [bcr]at
Enter input string to search: rat
I found the text "rat" starting at index 0 and ending at index 3.

Enter your regex: [bcr]at
Enter input string to search: hat
No match found.

In the above examples, the overall match succeeds only when the first letter matches one of the characters defined by the character class.
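The transcripts above come from an interactive test harness. As a rough stand-in, the sketch below reproduces the same report with java.util.regex directly. The class name SimpleClassDemo and the helper describe are hypothetical, and unlike a harness that loops over every match, this version reports only the first one.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SimpleClassDemo {
    // Builds a harness-style report for the first match of regex in input.
    static String describe(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (m.find()) {
            return String.format(
                    "I found the text \"%s\" starting at index %d and ending at index %d.",
                    m.group(), m.start(), m.end());
        }
        return "No match found.";
    }

    public static void main(String[] args) {
        // Same inputs as the transcript: three accepted words and one rejected.
        for (String word : new String[] {"bat", "cat", "rat", "hat"}) {
            System.out.println(word + ": " + describe("[bcr]at", word));
        }
    }
}
```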

Negation

To match all characters except those listed, insert the "^" metacharacter at the beginning of the character class. This technique is known as negation.

Enter your regex: [^bcr]at
Enter input string to search: bat
No match found.

Enter your regex: [^bcr]at
Enter input string to search: cat
No match found.

Enter your regex: [^bcr]at
Enter input string to search: rat
No match found.

Enter your regex: [^bcr]at
Enter input string to search: hat
I found the text "hat" starting at index 0 and ending at index 3.

The match is successful only if the first character of the input string is not one of the characters defined by the character class.
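A point that is easy to miss is that a negated class still has to consume exactly one character. The sketch below (hypothetical class NegationDemo and helper found) repeats the transcript inputs and adds the input "at", which has no leading character for [^bcr] to match.

```java
import java.util.regex.Pattern;

class NegationDemo {
    // True when regex matches somewhere in input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // "at" is included to show that [^bcr] cannot match an absent character.
        for (String input : new String[] {"bat", "cat", "rat", "hat", "at"}) {
            System.out.println(input + " -> " + found("[^bcr]at", input));
        }
    }
}
```

Only "hat" is accepted. The input "at" is rejected because the negated class matches "any one character other than b, c, or r", not "nothing".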

Ranges

Sometimes you'll want to define a character class that includes a range of values, such as the letters "a through h" or the numbers "1 through 5". To specify a range, simply insert the "-" metacharacter between the first and last character to be matched, such as [1-5] or [a-h]. You can also place different ranges beside each other within the class to further expand the match possibilities. For example, [a-zA-Z] will match any letter of the English alphabet: a to z (lowercase) or A to Z (uppercase).

Here are some examples of ranges and negation:

Enter your regex: [a-c]
Enter input string to search: a
I found the text "a" starting at index 0 and ending at index 1.

Enter your regex: [a-c]
Enter input string to search: b
I found the text "b" starting at index 0 and ending at index 1.

Enter your regex: [a-c]
Enter input string to search: c
I found the text "c" starting at index 0 and ending at index 1.

Enter your regex: [a-c]
Enter input string to search: d
No match found.

Enter your regex: foo[1-5]
Enter input string to search: foo1
I found the text "foo1" starting at index 0 and ending at index 4.

Enter your regex: foo[1-5]
Enter input string to search: foo5
I found the text "foo5" starting at index 0 and ending at index 4.

Enter your regex: foo[1-5]
Enter input string to search: foo6
No match found.

Enter your regex: foo[^1-5]
Enter input string to search: foo1
No match found.

Enter your regex: foo[^1-5]
Enter input string to search: foo6
I found the text "foo6" starting at index 0 and ending at index 4.
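The range examples can also be checked with whole-string matching. This is a sketch under the assumption that Pattern.matches (which requires the entire input to match) is an acceptable substitute for the harness's search; for these short inputs the two agree. The names RangeDemo and matchesWhole are hypothetical.

```java
import java.util.regex.Pattern;

class RangeDemo {
    // True when the entire input matches regex.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // A range and its negation give opposite answers for the trailing digit.
        for (String input : new String[] {"foo1", "foo5", "foo6"}) {
            System.out.println(input
                    + "  foo[1-5]: " + matchesWhole("foo[1-5]", input)
                    + "  foo[^1-5]: " + matchesWhole("foo[^1-5]", input));
        }
        // Two ranges side by side inside one class.
        System.out.println("Q  [a-zA-Z]: " + matchesWhole("[a-zA-Z]", "Q"));
        System.out.println("7  [a-zA-Z]: " + matchesWhole("[a-zA-Z]", "7"));
    }
}
```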

Unions

You can also use unions to create a single character class composed of two or more separate character classes. To create a union, simply nest one class inside the other, such as [0-4[6-8]]. This particular union creates a single character class that matches the digits 0, 1, 2, 3, 4, 6, 7, and 8.

Enter your regex: [0-4[6-8]]
Enter input string to search: 0
I found the text "0" starting at index 0 and ending at index 1.

Enter your regex: [0-4[6-8]]
Enter input string to search: 5
No match found.

Enter your regex: [0-4[6-8]]
Enter input string to search: 6
I found the text "6" starting at index 0 and ending at index 1.

Enter your regex: [0-4[6-8]]
Enter input string to search: 8
I found the text "8" starting at index 0 and ending at index 1.

Enter your regex: [0-4[6-8]]
Enter input string to search: 9
No match found.
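To see the whole set a union accepts at once, the sketch below filters the ten digits through the class and compares the nested form with the flat form [0-46-8]. The class UnionDemo and the helper accepted are hypothetical names introduced for illustration.

```java
import java.util.regex.Pattern;

class UnionDemo {
    // Returns the characters of candidates that the single-character regex accepts.
    static String accepted(String regex, String candidates) {
        Pattern p = Pattern.compile(regex);
        StringBuilder kept = new StringBuilder();
        for (char c : candidates.toCharArray()) {
            if (p.matcher(String.valueOf(c)).matches()) {
                kept.append(c);
            }
        }
        return kept.toString();
    }

    public static void main(String[] args) {
        String digits = "0123456789";
        // Nested union and the equivalent flat class with two ranges.
        System.out.println(accepted("[0-4[6-8]]", digits)); // 01234678
        System.out.println(accepted("[0-46-8]", digits));   // 01234678
    }
}
```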

Intersections

To create a single character class matching only the characters common to all of its nested classes, use &&, as in [0-9&&[345]]. This particular intersection creates a single character class matching only the numbers common to both character classes: 3, 4, and 5.

Enter your regex: [0-9&&[345]]
Enter input string to search: 3
I found the text "3" starting at index 0 and ending at index 1.

Enter your regex: [0-9&&[345]]
Enter input string to search: 4
I found the text "4" starting at index 0 and ending at index 1.

Enter your regex: [0-9&&[345]]
Enter input string to search: 5
I found the text "5" starting at index 0 and ending at index 1.

Enter your regex: [0-9&&[345]]
Enter input string to search: 2
No match found.

Enter your regex: [0-9&&[345]]
Enter input string to search: 6
No match found.

And here's an example that shows the intersection of two ranges:

Enter your regex: [2-8&&[4-6]]
Enter input string to search: 3
No match found.

Enter your regex: [2-8&&[4-6]]
Enter input string to search: 4
I found the text "4" starting at index 0 and ending at index 1.

Enter your regex: [2-8&&[4-6]]
Enter input string to search: 5
I found the text "5" starting at index 0 and ending at index 1.

Enter your regex: [2-8&&[4-6]]
Enter input string to search: 6
I found the text "6" starting at index 0 and ending at index 1.

Enter your regex: [2-8&&[4-6]]
Enter input string to search: 7
No match found.
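Both intersection examples can be run against one input by looping over Matcher.find() and collecting every match. This is a minimal sketch; IntersectionDemo and allMatches are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IntersectionDemo {
    // Concatenates every match of regex found in input, in order.
    static String allMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuilder found = new StringBuilder();
        while (m.find()) {
            found.append(m.group());
        }
        return found.toString();
    }

    public static void main(String[] args) {
        String digits = "0123456789";
        // Intersection of a range with an explicit list.
        System.out.println(allMatches("[0-9&&[345]]", digits)); // 345
        // Intersection of two ranges.
        System.out.println(allMatches("[2-8&&[4-6]]", digits)); // 456
    }
}
```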

Subtraction

Finally, you can use subtraction to negate one or more nested character classes, such as [0-9&&[^345]]. This example creates a single character class that matches everything from 0 to 9, except the numbers 3, 4, and 5.

Enter your regex: [0-9&&[^345]]
Enter input string to search: 2
I found the text "2" starting at index 0 and ending at index 1.

Enter your regex: [0-9&&[^345]]
Enter input string to search: 3
No match found.

Enter your regex: [0-9&&[^345]]
Enter input string to search: 4
No match found.

Enter your regex: [0-9&&[^345]]
Enter input string to search: 5
No match found.

Enter your regex: [0-9&&[^345]]
Enter input string to search: 6
I found the text "6" starting at index 0 and ending at index 1.

Enter your regex: [0-9&&[^345]]
Enter input string to search: 9
I found the text "9" starting at index 0 and ending at index 1.
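Subtraction is an intersection with a negated class, so it can be visualized by masking whatever the class matches. The sketch below uses String.replaceAll for that; the class SubtractionDemo, the helper mask, and the underscore placeholder are assumptions made for illustration.

```java
class SubtractionDemo {
    // Replaces every character matched by regex with an underscore.
    static String mask(String regex, String input) {
        return input.replaceAll(regex, "_");
    }

    public static void main(String[] args) {
        // Digits other than 3, 4, and 5 are masked; 3, 4, and 5 survive.
        System.out.println(mask("[0-9&&[^345]]", "0123456789")); // ___345____
        // Letters are outside 0-9, so they are never matched.
        System.out.println(mask("[0-9&&[^345]]", "a1b3c5d7"));   // a_b3c5d_
    }
}
```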

Now that we've covered how character classes are created, you may want to review the Character Classes table before continuing with the next section.

