$regexFind (aggregation)

Definition

$regexFind

New in version 4.2.

Provides regular expression (regex) pattern matching capability in aggregation expressions. If a match is found, returns a document that contains information on the first match. If a match is not found, returns null.

MongoDB uses Perl compatible regular expressions (i.e. "PCRE") version 8.41 with UTF-8 support.

Prior to MongoDB 4.2, an aggregation pipeline could use only the query operator $regex in the $match stage. For more information on using regex in a query, see $regex.

Syntax

The $regexFind operator has the following syntax:

{ $regexFind: { input: <expression> , regex: <expression>, options: <expression> } }

Operator Fields

Field    Description

input

The string on which you wish to apply the regex pattern. Can be a string or any valid expression that resolves to a string.

regex

The regex pattern to apply. Can be any valid expression that resolves to either a string or regex pattern /<pattern>/. When using the regex /<pattern>/, you can also specify the regex options i and m (but not the s or x options):

  • "pattern"
  • /<pattern>/
  • /<pattern>/<options>

Alternatively, you can specify the regex options with the options field. To specify the s or x options, you must use the options field.

You cannot specify options in both the regex and the options field.

options

Optional. The following <options> are available for use with regular expressions.

Note

You cannot specify options in both the regex and the options field.

Option    Description

i

Case insensitivity to match both upper and lower cases. You can specify the option in the options field or as part of the regex field.

m

For patterns that include anchors (i.e. ^ for the start, $ for the end), match at the beginning or end of each line for strings with multiline values. Without this option, these anchors match at the beginning or end of the string.

If the pattern contains no anchors or if the string value has no newline characters (e.g. \n), the m option has no effect.

x

"Extended" capability to ignore all white space characters in the pattern unless escaped or included in a character class.

Additionally, it ignores characters in-between and including an un-escaped hash/pound (#) character and the next new line, so that you may include comments in complicated patterns. This only applies to data characters; white space characters may never appear within special character sequences in a pattern.

The x option does not affect the handling of the VT character (i.e. code 11).

You can specify the option only in the options field.

s

Allows the dot character (i.e. .) to match all characters including newline characters.

You can specify the option only in the options field.
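
The option rules above can be checked on the client before a pipeline is sent. The following is a minimal sketch in plain JavaScript; the helper name checkRegexFindSpec is hypothetical, and the server performs its own validation with its own error messages.

```javascript
// Hypothetical client-side check of the option rules described above.
// It returns a list of problems; an empty list means no rule was violated.
function checkRegexFindSpec(spec) {
  const problems = [];
  // Flags attached to a regex literal, e.g. "i" for /line/i.
  const literalFlags = spec.regex instanceof RegExp ? spec.regex.flags : "";
  const optionsField = spec.options === undefined ? "" : spec.options;

  // Options may be given in the regex or in the options field, not both.
  if (literalFlags !== "" && optionsField !== "") {
    problems.push("options given in both the regex and the options field");
  }
  // Only i and m may be attached to the regex itself.
  for (const flag of literalFlags) {
    if (!"im".includes(flag)) {
      problems.push("flag '" + flag + "' must be given in the options field");
    }
  }
  // The options field accepts i, m, x, and s.
  for (const flag of optionsField) {
    if (!"imxs".includes(flag)) {
      problems.push("unknown option '" + flag + "'");
    }
  }
  return problems;
}

console.log(checkRegexFindSpec({ input: "$description", regex: /line/i, options: "m" }));
console.log(checkRegexFindSpec({ input: "$description", regex: "m.*line", options: "si" }));
```

The first call reports one problem (options in both places); the second reports none.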

Returns

If the operator does not find a match, the result of the operator is null.

If the operator finds a match, the result of the operator is a document that contains:

  • the first matching string in the input,
  • the code point index (not byte index) of the matching string in the input, and
  • an array of the strings that correspond to the groups captured by the matching string. Capturing groups are specified with unescaped parentheses () in the regex pattern.
{ "match" : <string>, "idx" : <num>, "captures" : <array of strings> }
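
The shape of the result can be approximated locally. The following is a minimal sketch in plain JavaScript, not MongoDB's implementation: the helper name regexFindSketch is hypothetical, and it relies on the JavaScript regex engine rather than PCRE, so it only illustrates the match, idx, and captures fields for simple patterns.

```javascript
// Hypothetical helper: mimics the shape of a $regexFind result using the
// JavaScript regex engine (not PCRE). Pass a regex without the g or y flag.
function regexFindSketch(input, regex) {
  const m = regex.exec(input);
  if (m === null) {
    return null; // no match
  }
  // m.index counts UTF-16 code units; count the code points before the match.
  const idx = Array.from(input.slice(0, m.index)).length;
  // Groups that took no part in the match are undefined in JavaScript;
  // report them as null.
  const captures = m.slice(1).map((g) => (g === undefined ? null : g));
  return { match: m[0], idx: idx, captures: captures };
}

console.log(regexFindSketch("Carol", /(C(ar)*)ol/));
console.log(regexFindSketch("Colleen", /(C(ar)*)ol/));
console.log(regexFindSketch("Daryl", /(C(ar)*)ol/));
```

The first call yields both captures, the second yields a null placeholder for the group that captured nothing, and the third yields null because there is no match.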

Behavior

$regexFind and Collation

$regexFind ignores the collation specified for the collection, db.collection.aggregate(), and the index, if used.

For example, create a sample collection with collation strength 1 (i.e. compare base character only and ignore other differences such as case and diacritics):

db.createCollection( "myColl", { collation: { locale: "fr", strength: 1 } } )

Insert the following documents:

db.myColl.insertMany([
   { _id: 1, category: "café" },
   { _id: 2, category: "cafe" },
   { _id: 3, category: "cafE" }
])

Using the collection's collation, the following operation performs a case-insensitive and diacritic-insensitive match:

db.myColl.aggregate( [ { $match: { category: "cafe" } } ] )

The operation returns the following 3 documents:

{ "_id" : 1, "category" : "café" }
{ "_id" : 2, "category" : "cafe" }
{ "_id" : 3, "category" : "cafE" }

However, the aggregation expression $regexFind ignores collation; that is, the following regular expression pattern matching examples are case-sensitive and diacritic-sensitive:

db.myColl.aggregate( [ { $addFields: { resultObject: { $regexFind: { input: "$category", regex: /cafe/ }  } } } ] )
db.myColl.aggregate(
   [ { $addFields: { resultObject: { $regexFind: { input: "$category", regex: /cafe/ }  } } } ],
   { collation: { locale: "fr", strength: 1 } }           // Ignored in the $regexFind
)

Both operations return the following:

{ "_id" : 1, "category" : "café", "resultObject" : null }
{ "_id" : 2, "category" : "cafe", "resultObject" : { "match" : "cafe", "idx" : 0, "captures" : [ ] } }
{ "_id" : 3, "category" : "cafE", "resultObject" : null }

To perform case-insensitive regex pattern matching, use the i option instead. See i Option for an example.

captures Output Behavior

If your regex pattern contains capture groups and the pattern finds a match in the input, the captures array in the results corresponds to the groups captured by the matching string. Capture groups are specified with unescaped parentheses () in the regex pattern. The length of the captures array equals the number of capture groups in the pattern and the order of the array matches the order in which the capture groups appear.

Create a sample collection named contacts with the following documents:

db.contacts.insertMany([
  { "_id": 1, "fname": "Carol", "lname": "Smith", "phone": "718-555-0113" },
  { "_id": 2, "fname": "Daryl", "lname": "Doe", "phone": "212-555-8832" },
  { "_id": 3, "fname": "Polly", "lname": "Andrews", "phone": "208-555-1932" },
  { "_id": 4, "fname": "Colleen", "lname": "Duncan", "phone": "775-555-0187" },
  { "_id": 5, "fname": "Luna", "lname": "Clarke", "phone": "917-555-4414" }
])

The following pipeline applies the regex pattern /(C(ar)*)ol/ to the fname field:

db.contacts.aggregate([
  {
    $project: {
      returnObject: {
        $regexFind: { input: "$fname", regex: /(C(ar)*)ol/ }
      }
    }
  }
])

The regex pattern finds a match with fname values Carol and Colleen:

{ "_id" : 1, "returnObject" : { "match" : "Carol", "idx" : 0, "captures" : [ "Car", "ar" ] } }
{ "_id" : 2, "returnObject" : null }
{ "_id" : 3, "returnObject" : null }
{ "_id" : 4, "returnObject" : { "match" : "Col", "idx" : 0, "captures" : [ "C", null ] } }
{ "_id" : 5, "returnObject" : null }

The pattern contains the capture group (C(ar)*), which contains the nested group (ar). The elements in the captures array correspond to the two capture groups. If a group does not capture anything in a matching document (e.g. Colleen and the group (ar)), $regexFind replaces the group with a null placeholder.

As shown in the previous example, the captures array contains an element for each capture group (using null for non-captures). Consider the following example, which searches for phone numbers with New York City area codes by applying a logical or (|) of capture groups to the phone field. Each group represents a New York City area code:

db.contacts.aggregate([
  {
    $project: {
      nycContacts: {
        $regexFind: { input: "$phone", regex: /^(718).*|^(212).*|^(917).*/ }
      }
    }
  }
])

For documents that are matched by the regex pattern, the captures array includes the matching capture group and replaces any groups that did not capture with null:

{ "_id" : 1, "nycContacts" : { "match" : "718-555-0113", "idx" : 0, "captures" : [ "718", null, null ] } }
{ "_id" : 2, "nycContacts" : { "match" : "212-555-8832", "idx" : 0, "captures" : [ null, "212", null ] } }
{ "_id" : 3, "nycContacts" : null }
{ "_id" : 4, "nycContacts" : null }
{ "_id" : 5, "nycContacts" : { "match" : "917-555-4414", "idx" : 0, "captures" : [ null, null, "917" ] } }
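
When only the group that actually captured is of interest, the null placeholders can be filtered out in a later stage. The following is a sketch under assumptions, not part of the original example: the output field name areaCode and the variable name nycAreaCodePipeline are illustrative, and it combines the $filter and $arrayElemAt operators to take the first non-null element of the captures array.

```javascript
// Sketch: keep only the capture group that took part in the match.
// The names nycAreaCodePipeline, nycContacts, and areaCode are illustrative.
const nycAreaCodePipeline = [
  {
    $project: {
      nycContacts: {
        $regexFind: { input: "$phone", regex: /^(718).*|^(212).*|^(917).*/ }
      }
    }
  },
  {
    $project: {
      areaCode: {
        // Take element 0 of the captures array after dropping null placeholders.
        $arrayElemAt: [
          {
            $filter: {
              input: "$nycContacts.captures",
              as: "group",
              cond: { $ne: [ "$$group", null ] }
            }
          },
          0
        ]
      }
    }
  }
];

// In mongosh, run it with: db.contacts.aggregate(nycAreaCodePipeline)
console.log(JSON.stringify(nycAreaCodePipeline[1], null, 2));
```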

Examples

$regexFind and Its Options

To illustrate the behavior of the $regexFind operator discussed in this example, create a sample collection products with the following documents:

db.products.insertMany([
   { _id: 1, description: "Single LINE description." },
   { _id: 2, description: "First lines\nsecond line" },
   { _id: 3, description: "Many spaces before     line" },
   { _id: 4, description: "Multiple\nline descriptions" },
   { _id: 5, description: "anchors, links and hyperlinks" },
   { _id: 6, description: "métier work vocation" }
])

By default, $regexFind performs a case-sensitive match. For example, the following aggregation performs a case-sensitive $regexFind on the description field. The regex pattern /line/ does not specify any grouping:

db.products.aggregate([
   { $addFields: { returnObject: { $regexFind: { input: "$description", regex: /line/ } } } }
])

The operation returns the following:

{ "_id" : 1, "description" : "Single LINE description.", "returnObject" : null }
{ "_id" : 2, "description" : "First lines\nsecond line", "returnObject" : { "match" : "line", "idx" : 6, "captures" : [ ] } }
{ "_id" : 3, "description" : "Many spaces before     line", "returnObject" : { "match" : "line", "idx" : 23, "captures" : [ ] } }
{ "_id" : 4, "description" : "Multiple\nline descriptions", "returnObject" : { "match" : "line", "idx" : 9, "captures" : [ ] } }
{ "_id" : 5, "description" : "anchors, links and hyperlinks", "returnObject" : null }
{ "_id" : 6, "description" : "métier work vocation", "returnObject" : null }

The following regex pattern /lin(e|k)/ specifies a grouping (e|k) in the pattern:

db.products.aggregate([
   { $addFields: { returnObject: { $regexFind: { input: "$description", regex: /lin(e|k)/ } } } }
])

The operation returns the following:

{ "_id" : 1, "description" : "Single LINE description.", "returnObject" : null }
{ "_id" : 2, "description" : "First lines\nsecond line", "returnObject" : { "match" : "line", "idx" : 6, "captures" : [ "e" ] } }
{ "_id" : 3, "description" : "Many spaces before     line", "returnObject" : { "match" : "line", "idx" : 23, "captures" : [ "e" ] } }
{ "_id" : 4, "description" : "Multiple\nline descriptions", "returnObject" : { "match" : "line", "idx" : 9, "captures" : [ "e" ] } }
{ "_id" : 5, "description" : "anchors, links and hyperlinks", "returnObject" : { "match" : "link", "idx" : 9, "captures" : [ "k" ] } }
{ "_id" : 6, "description" : "métier work vocation", "returnObject" : null }

In the returned document, the idx field is the code point index and not the byte index. To illustrate, consider the following example that uses the regex pattern /tier/:

db.products.aggregate([
   { $addFields: { returnObject: { $regexFind: { input: "$description", regex: /tier/ } } } }
])

The operation returns the following, where only the last record matches the pattern and the returned idx is 2 (instead of 3 if using a byte index):

{ "_id" : 1, "description" : "Single LINE description.", "returnObject" : null }
{ "_id" : 2, "description" : "First lines\nsecond line", "returnObject" : null }
{ "_id" : 3, "description" : "Many spaces before     line", "returnObject" : null }
{ "_id" : 4, "description" : "Multiple\nline descriptions", "returnObject" : null }
{ "_id" : 5, "description" : "anchors, links and hyperlinks", "returnObject" : null }
{ "_id" : 6, "description" : "métier work vocation",
             "returnObject" : { "match" : "tier", "idx" : 2, "captures" : [ ] } }
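
The difference between the two indexes can be reproduced locally. The following is a small sketch in plain JavaScript, assuming the é in the sample string is the single precomposed code point U+00E9 (two bytes in UTF-8); the variable names are illustrative.

```javascript
// "m\u00e9tier work vocation" is the description of the document with _id: 6.
const description = "m\u00e9tier work vocation";

// Position of "tier" in UTF-16 code units, as JavaScript strings count it.
const unitIndex = description.indexOf("tier");
const prefix = description.slice(0, unitIndex); // "mé"

// Code points before the match: this is what idx reports.
const codePointIndex = Array.from(prefix).length;

// UTF-8 bytes before the match: é takes two bytes, so this is one larger.
const byteIndex = new TextEncoder().encode(prefix).length;

console.log(codePointIndex, byteIndex); // prints: 2 3
```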

i Option

Note

You cannot specify options in both the regex and the options field.

To perform case-insensitive pattern matching, include the i option as part of the regex field or in the options field:

// Specify i as part of the regex field
{ $regexFind: { input: "$description", regex: /line/i } }

// Specify i in the options field
{ $regexFind: { input: "$description", regex: /line/, options: "i" } }
{ $regexFind: { input: "$description", regex: "line", options: "i" } }

For example, the following aggregation performs a case-insensitive $regexFind on the description field. The regex pattern /line/ does not specify any grouping:

db.products.aggregate([
   { $addFields: { returnObject: { $regexFind: { input: "$description", regex: /line/i } } } }
])

The operation returns the following documents:

{ "_id" : 1, "description" : "Single LINE description.", "returnObject" : { "match" : "LINE", "idx" : 7, "captures" : [ ] } }
{ "_id" : 2, "description" : "First lines\nsecond line", "returnObject" : { "match" : "line", "idx" : 6, "captures" : [ ] } }
{ "_id" : 3, "description" : "Many spaces before     line", "returnObject" : { "match" : "line", "idx" : 23, "captures" : [ ] } }
{ "_id" : 4, "description" : "Multiple\nline descriptions", "returnObject" : { "match" : "line", "idx" : 9, "captures" : [ ] } }
{ "_id" : 5, "description" : "anchors, links and hyperlinks", "returnObject" : null }
{ "_id" : 6, "description" : "métier work vocation", "returnObject" : null }

m Option

Note

You cannot specify options in both the regex and the options field.

To match the specified anchors (e.g. ^, $) for each line of a multiline string, include the m option as part of the regex field or in the options field:

// Specify m as part of the regex field
{ $regexFind: { input: "$description", regex: /line/m } }

// Specify m in the options field
{ $regexFind: { input: "$description", regex: /line/, options: "m" } }
{ $regexFind: { input: "$description", regex: "line", options: "m" } }

The following example includes both the i and the m options to match lines starting with either the letter s or S for multiline strings:

db.products.aggregate([
   { $addFields: { returnObject: { $regexFind: { input: "$description", regex: /^s/im } } } }
])

The operation returns the following:

{ "_id" : 1, "description" : "Single LINE description.", "returnObject" : { "match" : "S", "idx" : 0, "captures" : [ ] } }
{ "_id" : 2, "description" : "First lines\nsecond line", "returnObject" : { "match" : "s", "idx" : 12, "captures" : [ ] } }
{ "_id" : 3, "description" : "Many spaces before     line", "returnObject" : null }
{ "_id" : 4, "description" : "Multiple\nline descriptions", "returnObject" : null }
{ "_id" : 5, "description" : "anchors, links and hyperlinks", "returnObject" : null }
{ "_id" : 6, "description" : "métier work vocation", "returnObject" : null }
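
The effect of the m option on the ^ anchor can be checked locally. The following sketch uses the JavaScript regex engine rather than PCRE (the two agree for this simple pattern with \n line breaks) and the description of the document with _id: 2; the variable names are illustrative.

```javascript
const description = "First lines\nsecond line";

// Without m, ^ matches only at the start of the string, where "F" is not s or S.
const withoutM = /^s/i.exec(description);

// With m, ^ also matches right after the newline, where "second" begins.
const withM = /^s/im.exec(description);

console.log(withoutM);    // prints: null
console.log(withM.index); // prints: 12
```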

x Option

Note

You cannot specify options in both the regex and the options field.

To ignore all unescaped white space characters and comments (denoted by the un-escaped hash # character and the next new-line character) in the pattern, include the x option in the options field:

// Specify x in the options field
{ $regexFind: { input: "$description", regex: /line/, options: "x" } }
{ $regexFind: { input: "$description", regex: "line", options: "x" } }

The following example includes the x option to skip unescaped white spaces and comments:

db.products.aggregate([
   { $addFields: { returnObject: { $regexFind: { input: "$description", regex: /lin(e|k) # matches line or link/, options:"x" } } } }
])

The operation returns the following:

{ "_id" : 1, "description" : "Single LINE description.", "returnObject" : null }
{ "_id" : 2, "description" : "First lines\nsecond line", "returnObject" : { "match" : "line", "idx" : 6, "captures" : [ "e" ] } }
{ "_id" : 3, "description" : "Many spaces before     line", "returnObject" : { "match" : "line", "idx" : 23, "captures" : [ "e" ] } }
{ "_id" : 4, "description" : "Multiple\nline descriptions", "returnObject" : { "match" : "line", "idx" : 9, "captures" : [ "e" ] } }
{ "_id" : 5, "description" : "anchors, links and hyperlinks", "returnObject" : { "match" : "link", "idx" : 9, "captures" : [ "k" ] } }
{ "_id" : 6, "description" : "métier work vocation", "returnObject" : null }
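
Because a comment runs from an un-escaped # to the next new line, a longer pattern can be written as a multi-line string with one comment per line. The following is a sketch under assumptions: the variable names pattern and extendedPipeline are illustrative, and it supplies the regex as a string, which the regex field accepts, so that the line breaks are part of the pattern.

```javascript
// Each array element is one line of the pattern; with the x option the
// spaces and the "# ..." comments are ignored by the regex engine.
const pattern = [
  "lin      # literal prefix shared by line and link",
  "(e|k)    # capture group for the final letter"
].join("\n");

const extendedPipeline = [
  {
    $addFields: {
      returnObject: {
        $regexFind: { input: "$description", regex: pattern, options: "x" }
      }
    }
  }
];

// In mongosh, run it with: db.products.aggregate(extendedPipeline)
console.log(pattern);
```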

s Option

Note

You cannot specify options in both the regex and the options field.

To allow the dot character (i.e. .) in the pattern to match all characters including the new line character, include the s option in the options field:

// Specify s in the options field
{ $regexFind: { input: "$description", regex: /m.*line/, options: "s" } }
{ $regexFind: { input: "$description", regex: "m.*line", options: "s" } }

The following example includes the s option to allow the dot character (i.e. .) to match all characters including new line, as well as the i option to perform a case-insensitive match:

db.products.aggregate([
   { $addFields: { returnObject: { $regexFind: { input: "$description", regex:/m.*line/, options: "si"  } } } }
])

The operation returns the following:

{ "_id" : 1, "description" : "Single LINE description.", "returnObject" : null }
{ "_id" : 2, "description" : "First lines\nsecond line", "returnObject" : null }
{ "_id" : 3, "description" : "Many spaces before     line", "returnObject" : { "match" : "Many spaces before     line", "idx" : 0, "captures" : [ ] } }
{ "_id" : 4, "description" : "Multiple\nline descriptions", "returnObject" : { "match" : "Multiple\nline", "idx" : 0, "captures" : [ ] } }
{ "_id" : 5, "description" : "anchors, links and hyperlinks", "returnObject" : null }
{ "_id" : 6, "description" : "métier work vocation", "returnObject" : null }
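
The effect of the s option can be checked locally. The following sketch uses the JavaScript regex engine rather than PCRE (JavaScript accepts s as a flag on the regex itself, unlike the regex field described here) and the description of the document with _id: 4; the variable names are illustrative.

```javascript
const description = "Multiple\nline descriptions";

// Without s, the dot stops at the newline, so "M...line" cannot be bridged.
const withoutS = /m.*line/i.exec(description);

// With s, the dot also matches the newline.
const withS = /m.*line/is.exec(description);

console.log(withoutS);                 // prints: null
console.log(JSON.stringify(withS[0])); // prints: "Multiple\nline"
```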

Use $regexFind to Parse Email from String

Create a sample collection feedback with the following documents:

db.feedback.insertMany([
   { "_id" : 1, comment: "Hi, I'm just reading about MongoDB -- aunt.arc.tica@example.com"  },
   { "_id" : 2, comment: "I wanted to concatenate a string" },
   { "_id" : 3, comment: "I can't find how to convert a date to string. cam@mongodb.com" },
   { "_id" : 4, comment: "It's just me. I'm testing.  fred@MongoDB.com" }
])

The following aggregation uses $regexFind to extract the email from the comment field (case insensitive).

db.feedback.aggregate( [
    { $addFields: {
       "email": { $regexFind: { input: "$comment", regex: /[a-z0-9_.+-]+@[a-z0-9_.+-]+\.[a-z0-9_.+-]+/i } }
    } },
    { $set: { email: "$email.match"} }
] )
First Stage

The stage uses the $addFields stage to add a new field email to the document. The new field contains the result of performing the $regexFind on the comment field:

{ "_id" : 1, "comment" : "Hi, I'm just reading about MongoDB -- aunt.arc.tica@example.com", "email" : { "match" : "aunt.arc.tica@example.com", "idx" : 38, "captures" : [ ] } }
{ "_id" : 2, "comment" : "I wanted to concatenate a string", "email" : null }
{ "_id" : 3, "comment" : "I can't find how to convert a date to string. cam@mongodb.com", "email" : { "match" : "cam@mongodb.com", "idx" : 46, "captures" : [ ] } }
{ "_id" : 4, "comment" : "It's just me. I'm testing.  fred@MongoDB.com", "email" : { "match" : "fred@MongoDB.com", "idx" : 28, "captures" : [ ] } }
Second Stage

The stage uses the $set stage to reset the email to the current "$email.match" value. If the current value of email is null, "$email.match" does not resolve to a value, and the email field is omitted from the result, as shown for the document with _id: 2.

{ "_id" : 1, "comment" : "Hi, I'm just reading about MongoDB -- aunt.arc.tica@example.com", "email" : "aunt.arc.tica@example.com" }
{ "_id" : 2, "comment" : "I wanted to concatenate a string" }
{ "_id" : 3, "comment" : "I can't find how to convert a date to string. cam@mongodb.com", "email" : "cam@mongodb.com" }
{ "_id" : 4, "comment" : "It's just me. I'm testing.  fred@MongoDB.com", "email" : "fred@MongoDB.com" }
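
The intermediate result document can also be kept out of the output by binding it to a variable. The following is a sketch of that alternative, not the approach used above: it relies on the $let operator, and the names emailPipeline and found are illustrative.

```javascript
// Sketch: one stage instead of two, using $let to hold the $regexFind result.
// The names emailPipeline and found are illustrative.
const emailPipeline = [
  {
    $addFields: {
      email: {
        $let: {
          vars: {
            found: {
              $regexFind: {
                input: "$comment",
                regex: /[a-z0-9_.+-]+@[a-z0-9_.+-]+\.[a-z0-9_.+-]+/i
              }
            }
          },
          // Read the match field of the bound result document.
          in: "$$found.match"
        }
      }
    }
  }
];

// In mongosh, run it with: db.feedback.aggregate(emailPipeline)
console.log(Object.keys(emailPipeline[0].$addFields));
```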

Apply $regexFind to String Elements of an Array

Create a sample collection contacts with the following documents:

db.contacts.insertMany([
   { "_id" : 1, name: "Aunt Arc Tikka", details: [ "+672-19-9999", "aunt.arc.tica@example.com" ] },
   { "_id" : 2, name: "Belle Gium",  details: [ "+32-2-111-11-11", "belle.gium@example.com" ] },
   { "_id" : 3, name: "Cam Bo Dia",  details: [ "+855-012-000-0000", "cam.bo.dia@example.com" ] },
   { "_id" : 4, name: "Fred", details: [ "+1-111-222-3333" ] }
])

The following aggregation uses $regexFind to convert the details array into an embedded document with email and phone fields:

db.contacts.aggregate( [
   { $unwind: "$details" },
   { $addFields: {
      "regexemail": { $regexFind: { input: "$details", regex: /^[a-z0-9_.+-]+@[a-z0-9_.+-]+\.[a-z0-9_.+-]+$/, options: "i" } },
      "regexphone": { $regexFind: { input: "$details", regex: /^[+]{0,1}[0-9]*\-?[0-9_\-]+$/ } }
   } },
   { $project: { _id: 1, name: 1, details: { email: "$regexemail.match", phone: "$regexphone.match" } } },
   { $group: { _id: "$_id", name: { $first: "$name" }, details: { $mergeObjects: "$details"} } },
   { $sort: { _id: 1 } }
])
First Stage

The stage uses the $unwind stage to split the details array into separate documents:

{ "_id" : 1, "name" : "Aunt Arc Tikka", "details" : "+672-19-9999" }
{ "_id" : 1, "name" : "Aunt Arc Tikka", "details" : "aunt.arc.tica@example.com" }
{ "_id" : 2, "name" : "Belle Gium", "details" : "+32-2-111-11-11" }
{ "_id" : 2, "name" : "Belle Gium", "details" : "belle.gium@example.com" }
{ "_id" : 3, "name" : "Cam Bo Dia", "details" : "+855-012-000-0000" }
{ "_id" : 3, "name" : "Cam Bo Dia", "details" : "cam.bo.dia@example.com" }
{ "_id" : 4, "name" : "Fred", "details" : "+1-111-222-3333" }
Second Stage

The stage uses the $addFields stage to add new fields to the document that contain the result of the $regexFind for phone number and email:

{ "_id" : 1, "name" : "Aunt Arc Tikka", "details" : "+672-19-9999", "regexemail" : null, "regexphone" : { "match" : "+672-19-9999", "idx" : 0, "captures" : [ ] } }
{ "_id" : 1, "name" : "Aunt Arc Tikka", "details" : "aunt.arc.tica@example.com", "regexemail" : { "match" : "aunt.arc.tica@example.com", "idx" : 0, "captures" : [ ] }, "regexphone" : null }
{ "_id" : 2, "name" : "Belle Gium", "details" : "+32-2-111-11-11", "regexemail" : null, "regexphone" : { "match" : "+32-2-111-11-11", "idx" : 0, "captures" : [ ] } }
{ "_id" : 2, "name" : "Belle Gium", "details" : "belle.gium@example.com", "regexemail" : { "match" : "belle.gium@example.com", "idx" : 0, "captures" : [ ] }, "regexphone" : null }
{ "_id" : 3, "name" : "Cam Bo Dia", "details" : "+855-012-000-0000", "regexemail" : null, "regexphone" : { "match" : "+855-012-000-0000", "idx" : 0, "captures" : [ ] } }
{ "_id" : 3, "name" : "Cam Bo Dia", "details" : "cam.bo.dia@example.com", "regexemail" : { "match" : "cam.bo.dia@example.com", "idx" : 0, "captures" : [ ] }, "regexphone" : null }
{ "_id" : 4, "name" : "Fred", "details" : "+1-111-222-3333", "regexemail" : null, "regexphone" : { "match" : "+1-111-222-3333", "idx" : 0, "captures" : [ ] } }
Third Stage

The stage uses the $project stage to output documents with the _id field, the name field and the details field. The details field is set to a document with email and phone fields, whose values are determined from the regexemail and regexphone fields, respectively.

{ "_id" : 1, "name" : "Aunt Arc Tikka", "details" : { "phone" : "+672-19-9999" } }
{ "_id" : 1, "name" : "Aunt Arc Tikka", "details" : { "email" : "aunt.arc.tica@example.com" } }
{ "_id" : 2, "name" : "Belle Gium", "details" : { "phone" : "+32-2-111-11-11" } }
{ "_id" : 2, "name" : "Belle Gium", "details" : { "email" : "belle.gium@example.com" } }
{ "_id" : 3, "name" : "Cam Bo Dia", "details" : { "phone" : "+855-012-000-0000" } }
{ "_id" : 3, "name" : "Cam Bo Dia", "details" : { "email" : "cam.bo.dia@example.com" } }
{ "_id" : 4, "name" : "Fred", "details" : { "phone" : "+1-111-222-3333" } }
Fourth Stage

The stage uses the $group stage to group the input documents by their _id value. The stage uses the $mergeObjects expression to merge the details documents.

{ "_id" : 3, "name" : "Cam Bo Dia", "details" : { "phone" : "+855-012-000-0000", "email" : "cam.bo.dia@example.com" } }
{ "_id" : 4, "name" : "Fred", "details" : { "phone" : "+1-111-222-3333" } }
{ "_id" : 1, "name" : "Aunt Arc Tikka", "details" : { "phone" : "+672-19-9999", "email" : "aunt.arc.tica@example.com" } }
{ "_id" : 2, "name" : "Belle Gium", "details" : { "phone" : "+32-2-111-11-11", "email" : "belle.gium@example.com" } }
Fifth Stage

The stage uses the $sort stage to sort the documents by the _id field.

{ "_id" : 1, "name" : "Aunt Arc Tikka", "details" : { "phone" : "+672-19-9999", "email" : "aunt.arc.tica@example.com" } }
{ "_id" : 2, "name" : "Belle Gium", "details" : { "phone" : "+32-2-111-11-11", "email" : "belle.gium@example.com" } }
{ "_id" : 3, "name" : "Cam Bo Dia", "details" : { "phone" : "+855-012-000-0000", "email" : "cam.bo.dia@example.com" } }
{ "_id" : 4, "name" : "Fred", "details" : { "phone" : "+1-111-222-3333" } }

Use Captured Groupings to Parse User Name

Create a sample collection employees with the following documents:

db.employees.insertMany([
   { "_id" : 1, name: "Aunt Arc Tikka", "email" : "aunt.tica@example.com" },
   { "_id" : 2, name: "Belle Gium", "email" : "belle.gium@example.com" },
   { "_id" : 3, name: "Cam Bo Dia", "email" : "cam.dia@example.com" },
   { "_id" : 4, name: "Fred"  }
])

The employee email has the format <firstname>.<lastname>@example.com. Using the captures field returned in the $regexFind results, you can parse out user names for employees.

db.employees.aggregate( [
    { $addFields: {
       "username": { $regexFind: { input: "$email", regex: /^([a-z0-9_.+-]+)@[a-z0-9_.+-]+\.[a-z0-9_.+-]+$/, options: "i" } },
    } },
    { $set: { username: { $arrayElemAt:  [ "$username.captures", 0 ] } } }
] )
First Stage

The stage uses the $addFields stage to add a new field username to the document. The new field contains the result of performing the $regexFind on the email field:

{ "_id" : 1, "name" : "Aunt Arc Tikka", "email" : "aunt.tica@example.com", "username" : { "match" : "aunt.tica@example.com", "idx" : 0, "captures" : [ "aunt.tica" ] } }
{ "_id" : 2, "name" : "Belle Gium", "email" : "belle.gium@example.com", "username" : { "match" : "belle.gium@example.com", "idx" : 0, "captures" : [ "belle.gium" ] } }
{ "_id" : 3, "name" : "Cam Bo Dia", "email" : "cam.dia@example.com", "username" : { "match" : "cam.dia@example.com", "idx" : 0, "captures" : [ "cam.dia" ] } }
{ "_id" : 4, "name" : "Fred", "username" : null }
Second Stage

The stage uses the $set stage to reset the username to the zero-th element of the "$username.captures" array. If the current value of username is null, the new value of username is set to null.

{ "_id" : 1, "name" : "Aunt Arc Tikka", "email" : "aunt.tica@example.com", "username" : "aunt.tica" }
{ "_id" : 2, "name" : "Belle Gium", "email" : "belle.gium@example.com", "username" : "belle.gium" }
{ "_id" : 3, "name" : "Cam Bo Dia", "email" : "cam.dia@example.com", "username" : "cam.dia" }
{ "_id" : 4, "name" : "Fred", "username" : null }

See also

For more information on the behavior of the captures array and additional examples, see captures Output Behavior.