Table 12.14 Regular Expression Functions and Operators正则表达式函数和运算符
NOT REGEXP | |
REGEXP | |
REGEXP_INSTR() | |
REGEXP_LIKE() | |
REGEXP_REPLACE() | |
REGEXP_SUBSTR() | |
RLIKE |
A regular expression is a powerful way of specifying a pattern for a complex search. 正则表达式是为复杂搜索指定模式的强大方法。This section discusses the functions and operators available for regular expression matching and illustrates, with examples, some of the special characters and constructs that can be used for regular expression operations. 本节讨论可用于正则表达式匹配的函数和运算符,并举例说明可用于正则表达式操作的一些特殊字符和构造。See also Section 3.3.4.7, “Pattern Matching”.另请参阅第3.3.4.7节,“模式匹配”。
MySQL implements regular expression support using International Components for Unicode (ICU), which provides full Unicode support and is multibyte safe. MySQL使用国际Unicode组件(ICU)实现正则表达式支持,ICU提供完全的Unicode支持,并且是多字节安全的。(Prior to MySQL 8.0.4, MySQL used Henry Spencer's implementation of regular expressions, which operates in byte-wise fashion and is not multibyte safe. (在MySQL 8.0.4之前,MySQL使用了Henry Spencer的正则表达式实现,它以字节方式运行,并且不是多字节安全的。For information about ways in which applications that use regular expressions may be affected by the implementation change, see Regular Expression Compatibility Considerations.)有关使用正则表达式的应用程序受实现更改影响的方式的信息,请参阅正则表达式兼容性注意事项。)
, expr
NOT REGEXP pat
expr
NOT RLIKE pat
This is the same as 这相当于NOT (
.expr
REGEXP pat
)NOT (
。expr
REGEXP pat
)
, expr
REGEXP pat
expr
RLIKE pat
Returns 1 if the string 如果字符串expr
matches the regular expression specified by the pattern pat
, 0 otherwise. expr
与模式pat
指定的正则表达式匹配,则返回1,否则返回0。If 如果expr
or pat
is NULL
, the return value is NULL
.expr
或pat
为NULL
,则返回值为NULL
。
REGEXP
and RLIKE
are synonyms for REGEXP_LIKE()
.REGEXP
和RLIKE
是REGEXP_LIKE()
的同义词。
For additional information about how matching occurs, see the description for 有关如何进行匹配的其他信息,请参阅REGEXP_LIKE()
.REGEXP_LIKE()
的说明。
mysql>SELECT 'Michael!' REGEXP '.*';
+------------------------+ | 'Michael!' REGEXP '.*' | +------------------------+ | 1 | +------------------------+ mysql>SELECT 'new*\n*line' REGEXP 'new\\*.\\*line';
+---------------------------------------+ | 'new*\n*line' REGEXP 'new\\*.\\*line' | +---------------------------------------+ | 0 | +---------------------------------------+ mysql>SELECT 'a' REGEXP '^[a-d]';
+---------------------+ | 'a' REGEXP '^[a-d]' | +---------------------+ | 1 | +---------------------+ mysql>SELECT 'a' REGEXP 'A', 'a' REGEXP BINARY 'A';
+----------------+-----------------------+ | 'a' REGEXP 'A' | 'a' REGEXP BINARY 'A' | +----------------+-----------------------+ | 1 | 0 | +----------------+-----------------------+
REGEXP_INSTR(
expr
, pat
[, pos
[, occurrence
[, return_option
[, match_type
]]]])
Returns the starting index of the substring of the string 返回与expr
that matches the regular expression specified by the pattern pat
, 0 if there is no match. pat
模式指定的正则表达式匹配的字符串expr
的子字符串的起始索引,如果不匹配,则返回0。If 如果expr
or pat
is NULL
, the return value is NULL
. expr
或pat
为NULL
,则返回值为NULL
。Character indexes begin at 1.字符索引从1开始。
REGEXP_INSTR()
takes these optional arguments:REGEXP_INSTR()
接受以下可选参数:
pos
: The position in :expr
at which to start the search. expr
中开始搜索的位置。If omitted, the default is 1.如果省略,则默认值为1。
occurrence
: Which occurrence of a match to search for. :要搜索匹配项的哪次出现。If omitted, the default is 1.如果省略,则默认值为1。
return_option
: Which type of position to return. :要返回的位置类型。If this value is 0, 如果此值为0,则REGEXP_INSTR()
returns the position of the matched substring's first character. REGEXP_INSTR()
返回匹配子字符串的第一个字符的位置。If this value is 1, 如果此值为1,则REGEXP_INSTR()
returns the position following the matched substring. REGEXP_INSTR()
返回匹配子字符串后面的位置。If omitted, the default is 0.如果省略,则默认值为0。
match_type
: A string that specifies how to perform matching. :指定如何执行匹配的字符串。The meaning is as described for 其含义如REGEXP_LIKE()
.REGEXP_LIKE()
所述。
For additional information about how matching occurs, see the description for 有关如何进行匹配的其他信息,请参阅REGEXP_LIKE()
.REGEXP_LIKE()
的说明。
mysql>SELECT REGEXP_INSTR('dog cat dog', 'dog');
+------------------------------------+ | REGEXP_INSTR('dog cat dog', 'dog') | +------------------------------------+ | 1 | +------------------------------------+ mysql>SELECT REGEXP_INSTR('dog cat dog', 'dog', 2);
+---------------------------------------+ | REGEXP_INSTR('dog cat dog', 'dog', 2) | +---------------------------------------+ | 9 | +---------------------------------------+ mysql>SELECT REGEXP_INSTR('aa aaa aaaa', 'a{2}');
+-------------------------------------+ | REGEXP_INSTR('aa aaa aaaa', 'a{2}') | +-------------------------------------+ | 1 | +-------------------------------------+ mysql>SELECT REGEXP_INSTR('aa aaa aaaa', 'a{4}');
+-------------------------------------+ | REGEXP_INSTR('aa aaa aaaa', 'a{4}') | +-------------------------------------+ | 8 | +-------------------------------------+
REGEXP_LIKE(
expr
, pat
[, match_type
])
Returns 1 if the string 如果字符串expr
matches the regular expression specified by the pattern pat
, 0 otherwise. expr
与模式pat
指定的正则表达式匹配,则返回1,否则返回0。If 如果expr
or pat
is NULL
, the return value is NULL
.expr
或pat
为NULL
,则返回值为NULL
。
The pattern can be an extended regular expression, the syntax for which is discussed in Regular Expression Syntax. 模式可以是一个扩展的正则表达式,其语法将在正则表达式语法中讨论。The pattern need not be a literal string. 模式不必是文本字符串。For example, it can be specified as a string expression or table column.例如,可以将其指定为字符串表达式或表列。
The optional 可选的match_type
argument is a string that may contain any or all the following characters specifying how to perform matching:match
类型参数是一个字符串,可以包含指定如何执行匹配的以下任何或所有字符:
c
: Case-sensitive matching.:区分大小写的匹配。
i
: Case-insensitive matching.:不区分大小写匹配。
m
: Multiple-line mode. Recognize line terminators within the string. :多行模式。识别字符串中的行终止符。The default behavior is to match line terminators only at the start and end of the string expression.默认行为是仅在字符串表达式的开始和结束处匹配行终止符。
n
: The :.
character matches line terminators. .
字符匹配行终止符。The default is for .
matching to stop at the end of a line..
的默认值是匹配到在一行文字的尽头停下来。
u
: Unix-only line endings. :仅Unix行尾。Only the newline character is recognized as a line ending by the 只有换行符被识别为以.
, ^
, and $
match operators..
、^
和$
匹配运算符结尾的行。
If characters specifying contradictory options are specified within 如果在match_type
, the rightmost one takes precedence.match_type
中指定了指定矛盾选项的字符,则最右边的字符优先。
By default, regular expression operations use the character set and collation of the 默认情况下,在确定字符类型和执行比较时,正则表达式操作使用字符集以及expr
and pat
arguments when deciding the type of a character and performing the comparison. expr
和pat
参数的排序规则。If the arguments have different character sets or collations, coercibility rules apply as described in Section 10.8.4, “Collation Coercibility in Expressions”. 如果参数具有不同的字符集或排序规则,则强制规则适用于第10.8.4节,“表达式中的排序规则强制性”。Arguments may be specified with explicit collation indicators to change comparison behavior.可以使用显式排序指示符指定参数以更改比较行为。
mysql>SELECT REGEXP_LIKE('CamelCase', 'CAMELCASE');
+---------------------------------------+ | REGEXP_LIKE('CamelCase', 'CAMELCASE') | +---------------------------------------+ | 1 | +---------------------------------------+ mysql>SELECT REGEXP_LIKE('CamelCase', 'CAMELCASE' COLLATE utf8mb4_0900_as_cs);
+------------------------------------------------------------------+ | REGEXP_LIKE('CamelCase', 'CAMELCASE' COLLATE utf8mb4_0900_as_cs) | +------------------------------------------------------------------+ | 0 | +------------------------------------------------------------------+
match_type
may be specified with the c
or i
characters to override the default case sensitivity. match_type
可以用c
或i
字符指定,以覆盖默认的大小写敏感度。Exception: If either argument is a binary string, the arguments are handled in case-sensitive fashion as binary strings, even if 例外:如果其中一个参数是二进制字符串,则参数将以区分大小写的方式作为二进制字符串处理,即使match_type
contains the i
character.match_type
包含i
字符。
MySQL uses C escape syntax in strings (for example, MySQL在字符串中使用C转义语法(例如,\n
to represent the newline character). \n
表示换行符)。If you want your 如果希望expr
or pat
argument to contain a literal \
, you must double it. expr
或pat
参数包含文字\
,则必须将其加倍。(Unless the (如果启用了NO_BACKSLASH_ESCAPES
SQL mode is enabled, in which case no escape character is used.)NO_BACKSLASH_ESCAPES
SQL模式,则不使用转义字符。)
mysql>SELECT REGEXP_LIKE('Michael!', '.*');
+-------------------------------+ | REGEXP_LIKE('Michael!', '.*') | +-------------------------------+ | 1 | +-------------------------------+ mysql>SELECT REGEXP_LIKE('new*\n*line', 'new\\*.\\*line');
+----------------------------------------------+ | REGEXP_LIKE('new*\n*line', 'new\\*.\\*line') | +----------------------------------------------+ | 0 | +----------------------------------------------+ mysql>SELECT REGEXP_LIKE('a', '^[a-d]');
+----------------------------+ | REGEXP_LIKE('a', '^[a-d]') | +----------------------------+ | 1 | +----------------------------+ mysql>SELECT REGEXP_LIKE('a', 'A'), REGEXP_LIKE('a', BINARY 'A');
+-----------------------+------------------------------+ | REGEXP_LIKE('a', 'A') | REGEXP_LIKE('a', BINARY 'A') | +-----------------------+------------------------------+ | 1 | 0 | +-----------------------+------------------------------+
mysql>SELECT REGEXP_LIKE('abc', 'ABC');
+---------------------------+ | REGEXP_LIKE('abc', 'ABC') | +---------------------------+ | 1 | +---------------------------+ mysql>SELECT REGEXP_LIKE('abc', 'ABC', 'c');
+--------------------------------+ | REGEXP_LIKE('abc', 'ABC', 'c') | +--------------------------------+ | 0 | +--------------------------------+
REGEXP_REPLACE(
expr
, pat
, repl
[, pos
[, occurrence
[, match_type
]]])
Replaces occurrences in the string 用替换字符串expr
that match the regular expression specified by the pattern pat
with the replacement string repl
, and returns the resulting string. repl
替换字符串expr
中与pat
模式指定的正则表达式匹配的匹配项,并返回结果字符串。If 如果expr
, pat
, or repl
is NULL
, the return value is NULL
.expr
、pat
或repl
为NULL
,则返回值为NULL
。
REGEXP_REPLACE()
takes these optional arguments:REGEXP_REPLACE()
接受以下可选参数:
pos
: The position in :expr
at which to start the search. expr
中开始搜索的位置。If omitted, the default is 1.如果省略,则默认值为1。
occurrence
: Which occurrence of a match to replace. :要替换的匹配项的哪次出现。If omitted, the default is 0 (which means “replace all occurrences”).如果省略,则默认值为0(表示“替换所有引用”)。
match_type
: A string that specifies how to perform matching. :指定如何执行匹配的字符串。The meaning is as described for 其含义如REGEXP_LIKE()
.REGEXP_LIKE()
所述。
Prior to MySQL 8.0.17, the result returned by this function used the 在MySQL8.0.17之前,此函数返回的结果使用UTF-16字符集;在MySQL8.0.17及更高版本中,使用搜索匹配项的表达式的字符集和排序规则。(Bug #94203, Bug #29308212)UTF-16
character set; in MySQL 8.0.17 and later, the character set and collation of the expression searched for matches is used.
For additional information about how matching occurs, see the description for 有关如何进行匹配的其他信息,请参阅REGEXP_LIKE()
.REGEXP_LIKE()
的说明。
mysql>SELECT REGEXP_REPLACE('a b c', 'b', 'X');
+-----------------------------------+ | REGEXP_REPLACE('a b c', 'b', 'X') | +-----------------------------------+ | a X c | +-----------------------------------+ mysql>SELECT REGEXP_REPLACE('abc def ghi', '[a-z]+', 'X', 1, 3);
+----------------------------------------------------+ | REGEXP_REPLACE('abc def ghi', '[a-z]+', 'X', 1, 3) | +----------------------------------------------------+ | abc def X | +----------------------------------------------------+
REGEXP_SUBSTR(
expr
, pat
[, pos
[, occurrence
[, match_type
]]])
Returns the substring of the string 返回与expr
that matches the regular expression specified by the pattern pat
, NULL
if there is no match. pat
模式指定的正则表达式匹配的字符串expr
的子字符串,如果不匹配,则返回NULL
。If 如果expr
or pat
is NULL
, the return value is NULL
.expr
或pat
为NULL
,则返回值为NULL
。
REGEXP_SUBSTR()
takes these optional arguments:REGEXP_SUBSTR()
接受以下可选参数:
pos
: The position in :expr
at which to start the search. expr
中开始搜索的位置。If omitted, the default is 1.如果省略,则默认值为1。
occurrence
: Which occurrence of a match to search for. :要搜索的匹配项的哪次出现。If omitted, the default is 1.如果省略,则默认值为1。
match_type
: A string that specifies how to perform matching. :指定如何执行匹配的字符串。The meaning is as described for 其含义如REGEXP_LIKE()
.REGEXP_LIKE()
所述。
Prior to MySQL 8.0.17, the result returned by this function used the 在MySQL8.0.17之前,此函数返回的结果使用UTF-16
character set; in MySQL 8.0.17 and later, the character set and collation of the expression searched for matches is used. UTF-16
字符集;在MySQL8.0.17及更高版本中,使用搜索匹配项的表达式的字符集和排序规则。(Bug #94203, Bug #29308212)
For additional information about how matching occurs, see the description for 有关如何进行匹配的其他信息,请参阅REGEXP_LIKE()
.REGEXP_LIKE()
的说明。
mysql>SELECT REGEXP_SUBSTR('abc def ghi', '[a-z]+');
+----------------------------------------+ | REGEXP_SUBSTR('abc def ghi', '[a-z]+') | +----------------------------------------+ | abc | +----------------------------------------+ mysql>SELECT REGEXP_SUBSTR('abc def ghi', '[a-z]+', 1, 3);
+----------------------------------------------+ | REGEXP_SUBSTR('abc def ghi', '[a-z]+', 1, 3) | +----------------------------------------------+ | ghi | +----------------------------------------------+
A regular expression describes a set of strings. 正则表达式描述一组字符串。The simplest regular expression is one that has no special characters in it. 最简单的正则表达式是没有特殊字符的正则表达式。For example, the regular expression 例如,正则表达式hello
matches hello
and nothing else.hello
匹配hello
,而不是其他。
Nontrivial regular expressions use certain special constructs so that they can match more than one string. 非平凡正则表达式使用某些特殊构造,以便它们可以匹配多个字符串。For example, the regular expression 例如,正则表达式hello|world
contains the |
alternation operator and matches either the hello
or world
.hello|world
包含|
交替运算符并匹配hello
或world
。
As a more complex example, the regular expression 作为一个更复杂的例子,正则表达式B[an]*s
matches any of the strings Bananas
, Baaaaas
, Bs
, and any other string starting with a B
, ending with an s
, and containing any number of a
or n
characters in between.B[an]*s
匹配任一字符串banana
、baaaas
、Bs
,以及任何其他以B
开头、以s
结尾、中间包含任意数量的a
或n
字符的字符串。
The following list covers some of the basic special characters and constructs that can be used in regular expressions. 下面的列表介绍了一些可以在正则表达式中使用的基本特殊字符和构造。For information about the full regular expression syntax supported by the ICU library used to implement regular expression support, visit the International Components for Unicode website.有关用于实现正则表达式支持的ICU库所支持的完整正则表达式语法的信息,请访问International Components For Unicode网站。
^
Match the beginning of a string.匹配字符串的开头。
mysql>SELECT REGEXP_LIKE('fo\nfo', '^fo$');
-> 0 mysql>SELECT REGEXP_LIKE('fofo', '^fo');
-> 1
$
Match the end of a string.匹配字符串的结尾。
mysql>SELECT REGEXP_LIKE('fo\no', '^fo\no$');
-> 1 mysql>SELECT REGEXP_LIKE('fo\no', '^fo$');
-> 0
.
Match any character (including carriage return and newline, although to match these in the middle of a string, the 匹配任何字符(包括回车符和换行符,但要在字符串中间匹配这些字符,必须指定m
(multiple line) match-control character or the (?m)
within-pattern modifier must be given).m
(多行)匹配控制字符或(?m)
模式内修饰符)。
mysql>SELECT REGEXP_LIKE('fofo', '^f.*$');
-> 1 mysql>SELECT REGEXP_LIKE('fo\r\nfo', '^f.*$');
-> 0 mysql>SELECT REGEXP_LIKE('fo\r\nfo', '^f.*$', 'm');
-> 1 mysql>SELECT REGEXP_LIKE('fo\r\nfo', '(?m)^f.*$');
-> 1
a*
Match any sequence of zero or more 匹配零个或多个a
characters.a
字符的任意序列。
mysql>SELECT REGEXP_LIKE('Ban', '^Ba*n');
-> 1 mysql>SELECT REGEXP_LIKE('Baaan', '^Ba*n');
-> 1 mysql>SELECT REGEXP_LIKE('Bn', '^Ba*n');
-> 1
a+
Match any sequence of one or more 匹配一个或多个a
characters.a
字符的任意序列。
mysql>SELECT REGEXP_LIKE('Ban', '^Ba+n');
-> 1 mysql>SELECT REGEXP_LIKE('Bn', '^Ba+n');
-> 0
a?
Match either zero or one 匹配零或一个a
character.a
字符。
mysql>SELECT REGEXP_LIKE('Bn', '^Ba?n');
-> 1 mysql>SELECT REGEXP_LIKE('Ban', '^Ba?n');
-> 1 mysql>SELECT REGEXP_LIKE('Baan', '^Ba?n');
-> 0
de|abc
Alternation; match either of the sequences 交替;匹配de
or abc
.de
或abc
序列。
mysql>SELECT REGEXP_LIKE('pi', 'pi|apa');
-> 1 mysql>SELECT REGEXP_LIKE('axe', 'pi|apa');
-> 0 mysql>SELECT REGEXP_LIKE('apa', 'pi|apa');
-> 1 mysql>SELECT REGEXP_LIKE('apa', '^(pi|apa)$');
-> 1 mysql>SELECT REGEXP_LIKE('pi', '^(pi|apa)$');
-> 1 mysql>SELECT REGEXP_LIKE('pix', '^(pi|apa)$');
-> 0
(abc)*
Match zero or more instances of the sequence 匹配序列abc
.abc
的零个或多个实例。
mysql>SELECT REGEXP_LIKE('pi', '^(pi)*$');
-> 1 mysql>SELECT REGEXP_LIKE('pip', '^(pi)*$');
-> 0 mysql>SELECT REGEXP_LIKE('pipi', '^(pi)*$');
-> 1
{1}
, {2,3}
Repetition; 重复,{
and n
}{
notation provide a more general way of writing regular expressions that match many occurrences of the previous atom (or “piece”) of the pattern. m
,n
}{
和n
}{
表示法提供了一种更通用的方法来编写正则表达式,这些正则表达式与模式的前一个原子(或“片段”)的许多匹配项相匹配。m
,n
}m
and n
are integers.m
和n
是整数。
a*
Can be written as 可以写为a{0,}
.a{0,}
。
a+
Can be written as 可以写为a{1,}
.a{1,}
。
a?
Can be written as 可以写为a{0,1}
.a{0,1}
。
To be more precise, 更准确地说,a{
matches exactly n
}n
instances of a
. a{
正好匹配n
}a
的n
个实例。a{
n
,} matches 匹配n
or more instances of a
. a
的n
个或多个实例。a{
m
,n
}matches 匹配m
through n
instances of a
, inclusive. a
的m
到n
个实例(含)。If both 如果m
and n
are given, m
must be less than or equal to n
.m
和n
都给定,则m
必须小于或等于n
。
mysql>SELECT REGEXP_LIKE('abcde', 'a[bcd]{2}e');
-> 0 mysql>SELECT REGEXP_LIKE('abcde', 'a[bcd]{3}e');
-> 1 mysql>SELECT REGEXP_LIKE('abcde', 'a[bcd]{1,10}e');
-> 1
[a-dX]
, [^a-dX]
Matches any character that is (or is not, if 匹配^
is used) either a
, b
, c
, d
or X
. a
、b
、c
、d
或X
中的任一字符(如果用了^
则是匹配不属于这些字符的字符)。A 其他两个字符之间的-
character between two other characters forms a range that matches all characters from the first character to the second. -
字符形成一个范围,该范围匹配从第一个字符到第二个字符的所有字符。For example, 例如,[0-9]
matches any decimal digit. [0-9]
匹配任何十进制数字。To include a literal 要包含文字]
character, it must immediately follow the opening bracket [
. ]
字符,它必须紧跟在左括号[
后面。To include a literal 要包含文字-
character, it must be written first or last. -
字符,必须写在最前面或最后面。Any character that does not have a defined special meaning inside a []
pair matches only itself.[]
对中没有定义特殊含义的任何字符只匹配自身。
mysql>SELECT REGEXP_LIKE('aXbc', '[a-dXYZ]');
-> 1 mysql>SELECT REGEXP_LIKE('aXbc', '^[a-dXYZ]$');
-> 0 mysql>SELECT REGEXP_LIKE('aXbc', '^[a-dXYZ]+$');
-> 1 mysql>SELECT REGEXP_LIKE('aXbc', '^[^a-dXYZ]+$');
-> 0 mysql>SELECT REGEXP_LIKE('gheis', '^[^a-dXYZ]+$');
-> 1 mysql>SELECT REGEXP_LIKE('gheisa', '^[^a-dXYZ]+$');
-> 0
[=character_class=]
Within a bracket expression (written using 在括号表达式(使用[
and ]
), [=character_class=]
represents an equivalence class. [
和]
编写)中,[=character_class=]
表示等价类。It matches all characters with the same collation value, including itself. 它匹配具有相同排序规则值的所有字符,包括其自身。For example, if 例如,如果o
and (+)
are the members of an equivalence class, [[=o=]]
, [[=(+)=]]
, and [o(+)]
are all synonymous. o
和(+)
是等价类的成员,[[=o=]]
、[[=(+)=]]
和[o(+)]
都是同义词。An equivalence class may not be used as an endpoint of a range.等价类不能用作范围的端点。
[:character_class:]
Within a bracket expression (written using 在方括号表达式(使用[
and ]
), [:character_class:]
represents a character class that matches all characters belonging to that class. [
和]
编写)中,[:character_class:]
表示与属于该类的所有字符匹配的字符类。The following table lists the standard class names. 下表列出了标准类名。These names stand for the character classes defined in the 这些名称代表ctype(3)
manual page. ctype(3)
手册页中定义的字符类。A particular locale may provide other class names. 特定的区域设置可以提供其他类名。A character class may not be used as an endpoint of a range.字符类不能用作范围的端点。
alnum | |
alpha | |
blank | |
cntrl | |
digit | |
graph | |
lower | |
print | |
punct | |
space | |
upper | |
xdigit |
mysql>SELECT REGEXP_LIKE('justalnums', '[[:alnum:]]+');
-> 1 mysql>SELECT REGEXP_LIKE('!!', '[[:alnum:]]+');
-> 0
To use a literal instance of a special character in a regular expression, precede it by two backslash (\) characters. 要在正则表达式中使用特殊字符的文本实例,请在其前面加两个反斜杠(\
)字符。The MySQL parser interprets one of the backslashes, and the regular expression library interprets the other. MySQL解析器解释其中一个反斜杠,正则表达式库解释另一个反斜杠。For example, to match the string 例如,要匹配包含特殊1+2
that contains the special +
character, only the last of the following regular expressions is the correct one:+
字符的字符串1+2
,只有以下正则表达式中的最后一个才是正确的:
mysql>SELECT REGEXP_LIKE('1+2', '1+2');
-> 0 mysql>SELECT REGEXP_LIKE('1+2', '1\+2');
-> 0 mysql>SELECT REGEXP_LIKE('1+2', '1\\+2');
-> 1
REGEXP_LIKE()
and similar functions use resources that can be controlled by setting system variables:REGEXP_LIKE()
和类似的函数使用的资源可以通过设置系统变量来控制:
The match engine uses memory for its internal stack. 匹配引擎使用内存作为其内部堆栈。To control the maximum available memory for the stack in bytes, set the 要控制堆栈的最大可用内存(字节),请设置regexp_stack_limit
system variable.regexp_stack_limit
系统变量。
The match engine operates in steps. 匹配引擎分步运行。To control the maximum number of steps performed by the engine (and thus indirectly the execution time), set the 要控制引擎执行的最大步数(从而间接控制执行时间),请设置regexp_time_limit
system variable. regexp_time_limit
系统变量。Because this limit is expressed as number of steps, it affects execution time only indirectly. 因为这个限制表示为步数,所以它只间接影响执行时间。Typically, it is on the order of milliseconds.通常,它是以毫秒为单位的。
Prior to MySQL 8.0.4, MySQL used the Henry Spencer regular expression library to support regular expression operations, rather than International Components for Unicode (ICU). 在MySQL 8.0.4之前,MySQL使用Henry Spencer正则表达式库来支持正则表达式操作,而不是Unicode(ICU)的国际组件。The following discussion describes differences between the Spencer and ICU libraries that may affect applications:以下讨论描述了Spencer和ICU库之间可能影响应用程序的差异:
With the Spencer library, the 在Spencer库中,REGEXP
and RLIKE
operators work in byte-wise fashion, so they are not multibyte safe and may produce unexpected results with multibyte character sets. REGEXP
和RLIKE
操作符以字节方式工作,因此它们不是多字节安全的,并且可能对多字节字符集产生意外的结果。In addition, these operators compare characters by their byte values and accented characters may not compare as equal even if a given collation treats them as equal.此外,这些运算符按字节值比较字符,即使给定的排序规则将重音字符视为相等,重音字符也可能不相等。
ICU has full Unicode support and is multibyte safe. ICU完全支持Unicode,并且是多字节安全的。Its regular expression functions treat all strings as 它的正则表达式函数将所有字符串视为UTF-16
. UTF-16
。You should keep in mind that positional indexes are based on 16-bit chunks and not on code points. 您应该记住,位置索引基于16位块,而不是代码点。This means that, when passed to such functions, characters using more than one chunk may produce unanticipated results, such as those shown here:这意味着,当传递给此类函数时,使用多个块的字符可能会产生意想不到的结果,如以下所示:
mysql>SELECT REGEXP_INSTR('🍣🍣b', 'b');
+--------------------------+ | REGEXP_INSTR('??b', 'b') | +--------------------------+ | 5 | +--------------------------+ 1 row in set (0.00 sec) mysql>SELECT REGEXP_INSTR('🍣🍣bxxx', 'b', 4);
+--------------------------------+ | REGEXP_INSTR('??bxxx', 'b', 4) | +--------------------------------+ | 5 | +--------------------------------+ 1 row in set (0.00 sec)
Characters within the Unicode Basic Multilingual Plane, which includes characters used by most modern languages, are safe in this regard:Unicode基本多语言平面中的字符(包括大多数现代语言使用的字符)在这方面是安全的:
mysql>SELECT REGEXP_INSTR('бжb', 'b');
+----------------------------+ | REGEXP_INSTR('бжb', 'b') | +----------------------------+ | 3 | +----------------------------+ 1 row in set (0.00 sec) mysql>SELECT REGEXP_INSTR('עבb', 'b');
+----------------------------+ | REGEXP_INSTR('עבb', 'b') | +----------------------------+ | 3 | +----------------------------+ 1 row in set (0.00 sec) mysql>SELECT REGEXP_INSTR('µå周çб', '周');
+------------------------------------+ | REGEXP_INSTR('µå周çб', '周') | +------------------------------------+ | 3 | +------------------------------------+ 1 row in set (0.00 sec)
Emoji, such as the “sushi” character Emoji,例如前两个示例中使用的“sushi”字符🍣
(U+1F363) used in the first two examples, are not included in the Basic Multilingual Plane, but rather in Unicode's Supplementary Multilingual Plane. 🍣
(U+1F363),不包含在基本的多语言平面中,而是包含在Unicode的补充多语言平面中。Another issue can arise with emoji and other 4-byte characters when 当REGEXP_SUBSTR()
or a similar function begins searching in the middle of a character. REGEXP_SUBSTR()
或类似函数开始在字符中间搜索时,emoji和其他4字节字符可能会出现另一个问题。Each of the two statements in the following example starts from the second 2-byte position in the first argument. 以下示例中的两个语句都从第一个参数中的第二个2字节位置开始。The first statement works on a string consisting solely of 2-byte (BMP) characters. 第一个语句处理仅由2字节(BMP)字符组成的字符串。The second statement contains 4-byte characters which are incorrectly interpreted in the result because the first two bytes are stripped off and so the remainder of the character data is misaligned.第二条语句包含的4字节字符在结果中被错误地解释,因为前两个字节被剥离,因此字符数据的其余部分没有对齐。
mysql>SELECT REGEXP_SUBSTR('周周周周', '.*', 2);
+----------------------------------------+ | REGEXP_SUBSTR('周周周周', '.*', 2) | +----------------------------------------+ | 周周周 | +----------------------------------------+ 1 row in set (0.00 sec) mysql>SELECT REGEXP_SUBSTR('🍣🍣🍣🍣', '.*', 2);
+--------------------------------+ | REGEXP_SUBSTR('????', '.*', 2) | +--------------------------------+ | ?㳟揘㳟揘㳟揘 | +--------------------------------+ 1 row in set (0.00 sec)
For the 对于.
operator, the Spencer library matches line-terminator characters (carriage return, newline) anywhere in string expressions, including in the middle. .
运算符,斯宾塞库匹配字符串终止符中任何地方的行终止符(回车符,换行符),包括中间。To match line terminator characters in the middle of strings with ICU, specify the 要将字符串中间的行结束符字符与ICU匹配,请指定m
match-control character.m
匹配控制字符。
The Spencer library supports word-beginning and word-end boundary markers (Spencer库支持单词开头和单词结尾边界标记([[:<:]]
and [[:>:]]
notation). [[:<:]]
和[[:>:]]
表示法)。ICU does not. ICU没有。For ICU, you can use 对于ICU,可以使用\b
to match word boundaries; double the backslash because MySQL interprets it as the escape character within strings.\b
来匹配单词边界;加倍反斜杠,因为MySQL将其解释为字符串中的转义字符。
The Spencer library supports collating element bracket expressions (Spencer库支持排序元素括号表达式([.characters.]
notation). [.characters.]
表示法)。ICU does not.ICU不是这样的。
For repetition counts (对于重复计数({n}
and {m,n}
notation), the Spencer library has a maximum of 255. {n}
和{m,n}
表示法),斯宾塞库的最大值为255。ICU has no such limit, although the maximum number of match engine steps can be limited by setting the ICU没有这样的限制,尽管可以通过设置regexp_time_limit
system variable.regexp_time_limit
系统变量来限制匹配引擎的最大步数。
ICU interprets parentheses as metacharacters. ICU将圆括号解释为元字符。To specify a literal open or close parenthesis 若要在正则表达式中指定文字开括号或闭括号(
in a regular expression, it must be escaped:(
,必须对其进行转义:
mysql>SELECT REGEXP_LIKE('(', '(');
ERROR 3692 (HY000): Mismatched parenthesis in regular expression. mysql>SELECT REGEXP_LIKE('(', '\\(');
+-------------------------+ | REGEXP_LIKE('(', '\\(') | +-------------------------+ | 1 | +-------------------------+ mysql>SELECT REGEXP_LIKE(')', ')');
ERROR 3692 (HY000): Mismatched parenthesis in regular expression. mysql>SELECT REGEXP_LIKE(')', '\\)');
+-------------------------+ | REGEXP_LIKE(')', '\\)') | +-------------------------+ | 1 | +-------------------------+
ICU also interprets square brackets as metacharacters, but only the opening square bracket need be escaped to be used as a literal character:ICU还将方括号解释为元字符,但只有开头的方括号需要转义才能用作文字字符:
mysql>SELECT REGEXP_LIKE('[', '[');
ERROR 3696 (HY000): The regular expression contains an unclosed bracket expression. mysql>SELECT REGEXP_LIKE('[', '\\[');
+-------------------------+ | REGEXP_LIKE('[', '\\[') | +-------------------------+ | 1 | +-------------------------+ mysql>SELECT REGEXP_LIKE(']', ']');
+-----------------------+ | REGEXP_LIKE(']', ']') | +-----------------------+ | 1 | +-----------------------+