Collation

New in version 3.4.

Collation allows users to specify language-specific rules for string comparison, such as rules for letter case and accent marks.

You can specify collation for a collection or a view, an index, or specific operations that support collation.

Collation Document

A collation document has the following fields:

{
   locale: <string>,
   caseLevel: <boolean>,
   caseFirst: <string>,
   strength: <int>,
   numericOrdering: <boolean>,
   alternate: <string>,
   maxVariable: <string>,
   backwards: <boolean>,
   normalization: <boolean>
}

When specifying collation, the locale field is mandatory; all other collation fields are optional. For descriptions of the fields, see Collation Document.

Default collation parameter values vary depending on which locale you specify. For a complete list of default collation parameters and the locales they are associated with, see Collation Default Parameters.

Field Type Description
locale string

The ICU locale. See Supported Languages and Locales for a list of supported locales.

To specify simple binary comparison, specify a locale value of "simple".

strength integer

Optional. The level of comparison to perform. Corresponds to ICU Comparison Levels. Possible values are:

Value Description
1 Primary level of comparison. Collation compares base characters only, ignoring other differences such as diacritics and case.
2 Secondary level of comparison. Collation performs comparisons up to secondary differences, such as diacritics. That is, collation compares base characters (primary differences) and diacritics (secondary differences). Differences between base characters take precedence over secondary differences.
3 Tertiary level of comparison. Collation performs comparisons up to tertiary differences, such as case and letter variants. That is, collation compares base characters (primary differences), diacritics (secondary differences), and case and variants (tertiary differences). Differences between base characters take precedence over secondary differences, which take precedence over tertiary differences. This is the default level.
4 Quaternary level. Limited to specific use cases: considering punctuation when levels 1-3 ignore punctuation, or processing Japanese text.
5 Identical level. Limited to the specific use case of a tie breaker.

See ICU Collation: Comparison Levels for details.
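As a rough illustration of the first three strength levels, the sketch below uses JavaScript's built-in Intl.Collator in Node.js rather than MongoDB itself. Intl.Collator is ICU-based in Node.js, and its sensitivity option is assumed here to stand in for strength 1, 2, and 3; actual MongoDB results for a given locale may differ in detail.

```javascript
// Stand-ins for MongoDB strength levels (an assumption for illustration):
//   sensitivity "base"    ~ strength: 1 (base characters only)
//   sensitivity "accent"  ~ strength: 2 (base characters + diacritics)
//   sensitivity "variant" ~ strength: 3 (base characters + diacritics + case)
const primary = new Intl.Collator("en", { sensitivity: "base" });
const secondary = new Intl.Collator("en", { sensitivity: "accent" });
const tertiary = new Intl.Collator("en", { sensitivity: "variant" });

// compare() returns 0 when the collator treats the two strings as equal.
const equalAt = (collator, a, b) => collator.compare(a, b) === 0;

console.log(equalAt(primary, "cafe", "café"));   // true: diacritics ignored
console.log(equalAt(primary, "cafe", "Cafe"));   // true: case ignored
console.log(equalAt(secondary, "cafe", "café")); // false: diacritics differ
console.log(equalAt(secondary, "cafe", "Cafe")); // true: case still ignored
console.log(equalAt(tertiary, "cafe", "Cafe"));  // false: case differs
```

The same pairs of strings move from "equal" to "different" as the level rises, which is the precedence described in the table above.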

caseLevel boolean

Optional. Flag that determines whether to include case comparison at strength level 1 or 2.

If true, include case comparison; that is:

  • When used with strength:1, collation compares base characters and case.
  • When used with strength:2, collation compares base characters, diacritics (and possibly other secondary differences), and case.

If false, do not include case comparison at level 1 or 2. The default is false.

For more information, see ICU Collation: Case Level.
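The effect of combining strength 1 with caseLevel can be approximated outside MongoDB. This sketch uses Node.js's built-in Intl.Collator, where sensitivity "case" is assumed to correspond to { strength: 1, caseLevel: true }; it is an analogy, not MongoDB's own code path.

```javascript
// Assumption: sensitivity "case" approximates { strength: 1, caseLevel: true },
// i.e. base characters and case are compared, diacritics are not.
const baseAndCase = new Intl.Collator("en", { sensitivity: "case" });

const diacriticsIgnored = baseAndCase.compare("cafe", "café") === 0;
const caseDistinguished = baseAndCase.compare("cafe", "Cafe") !== 0;

console.log(diacriticsIgnored); // true
console.log(caseDistinguished); // true
```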

caseFirst string

Optional. A field that determines sort order of case differences during tertiary level comparisons.

Possible values are:

Value Description
"upper" Uppercase sorts before lowercase.
"lower" Lowercase sorts before uppercase.
"off" Default value. Similar to "lower" with slight differences. See http://userguide.icu-project.org/collation/customization for details of differences.

numericOrdering boolean

Optional. Flag that determines whether to compare numeric strings as numbers or as strings.

If true, compare as numbers; i.e. "10" is greater than "2".

If false, compare as strings; i.e. "10" is less than "2".

Default is false.
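The difference between the two settings can be seen with a small sort. The sketch below uses Node.js's Intl.Collator, whose numeric option is assumed to play the same role as numericOrdering; it illustrates the ordering only and does not call MongoDB.

```javascript
// Assumption: Intl.Collator's `numeric` option mirrors numericOrdering.
const asStrings = new Intl.Collator("en", { numeric: false });
const asNumbers = new Intl.Collator("en", { numeric: true });

const values = ["10", "2", "1"];

// Copy before sorting so the original array is left untouched.
const sortedAsStrings = [...values].sort(asStrings.compare);
const sortedAsNumbers = [...values].sort(asNumbers.compare);

console.log(sortedAsStrings); // [ '1', '10', '2' ]  ("10" is less than "2")
console.log(sortedAsNumbers); // [ '1', '2', '10' ]  ("10" is greater than "2")
```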

alternate string

Optional. Field that determines whether collation should consider whitespace and punctuation as base characters for purposes of comparison.

Possible values are:

Value Description
"non-ignorable" Whitespace and punctuation are considered base characters.
"shifted" Whitespace and punctuation are not considered base characters and are only distinguished at strength levels greater than 3.

See ICU Collation: Comparison Levels for more information.

Default is "non-ignorable".
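To get a feel for the two alternate settings, the following sketch again uses Node.js's Intl.Collator instead of MongoDB. Its ignorePunctuation option is assumed to behave like alternate: "shifted" when true and like "non-ignorable" when false, at the default (tertiary) strength.

```javascript
// Assumption: ignorePunctuation: true ~ alternate: "shifted",
//             ignorePunctuation: false ~ alternate: "non-ignorable".
const nonIgnorable = new Intl.Collator("en", { ignorePunctuation: false });
const shifted = new Intl.Collator("en", { ignorePunctuation: true });

// With "non-ignorable", the hyphen counts as a base character.
const hyphenMatters = nonIgnorable.compare("black-bird", "blackbird") !== 0;

// With "shifted", the hyphen is not distinguished at strength 3 or below.
const hyphenIgnored = shifted.compare("black-bird", "blackbird") === 0;

console.log(hyphenMatters); // true
console.log(hyphenIgnored); // true
```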

maxVariable string

Optional. Field that determines up to which characters are considered ignorable when alternate is "shifted". Has no effect if alternate is "non-ignorable".

Possible values are:

Value Description
"punct" Both whitespace and punctuation are "ignorable", that is, not considered base characters.
"space" Whitespace is "ignorable", that is, not considered a base character.

backwards boolean

Optional. Flag that determines whether strings with diacritics sort from the back of the string, as in some French dictionary orderings.

If true, compare from back to front.

If false, compare from front to back.

The default value is false.

normalization boolean

Optional. Flag that determines whether to check if text requires normalization and to perform normalization. Generally, most text does not require this normalization processing.

If true, check whether the text is fully normalized and perform normalization to compare text.

If false, do not check.

The default value is false.

See http://userguide.icu-project.org/collation/concepts#TOC-Normalization for details.

Operations that Support Collation

You can specify collation for the following operations:

Note

You cannot specify multiple collations for an operation. For example, you cannot specify different collations per field, or if performing a find with a sort, you cannot use one collation for the find and another for the sort.

Commands mongo Shell Methods
create
createIndexes [1] db.collection.createIndex() [1]
aggregate db.collection.aggregate()
distinct db.collection.distinct()
findAndModify
find cursor.collation() to specify collation for db.collection.find()
mapReduce db.collection.mapReduce()
delete
update
shardCollection
count
  Individual update, replace, and delete operations in db.collection.bulkWrite().

[1] Some index types do not support collation. See Collation and Unsupported Index Types for details.

Behavior

Locale Variants

Some collation locales have variants, which employ special language-specific rules. To specify a locale variant, use the following syntax:

{ "locale" : "<locale code>@collation=<variant>" }

For example, to use the unihan variant of the Chinese collation:

{ "locale" : "zh@collation=unihan" }

For a complete list of all collation locales and their variants, see Collation Locales.
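If collation documents are built programmatically, the variant syntax can be wrapped in a small helper. The function below is a hypothetical convenience (collationFor is not part of MongoDB or its shell); it only assembles the locale string in the form shown above.

```javascript
// Hypothetical helper: builds a collation document, appending the
// "@collation=<variant>" suffix only when a variant is given.
function collationFor(localeCode, variant) {
  const locale = variant ? `${localeCode}@collation=${variant}` : localeCode;
  return { locale };
}

console.log(collationFor("zh", "unihan")); // { locale: 'zh@collation=unihan' }
console.log(collationFor("fr"));           // { locale: 'fr' }
```

The resulting document can be passed wherever a collation document is accepted, for example as the collation option of db.myColl.createIndex().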

Collation and Views

  • You can specify a default collation for a view at creation time. If no collation is specified, the view's default collation is the "simple" binary comparison collator. That is, the view does not inherit the collection's default collation.
  • String comparisons on the view use the view's default collation. An operation that attempts to change or override a view's default collation will fail with an error.
  • If creating a view from another view, you cannot specify a collation that differs from the source view's collation.
  • If performing an aggregation that involves multiple views, such as with $lookup or $graphLookup, the views must have the same collation.

Collation and Index Use

To use an index for string comparisons, an operation must also specify the same collation. That is, an index with a collation cannot support an operation that performs string comparisons on the indexed fields if the operation specifies a different collation.

For example, the collection myColl has an index on a string field category with the collation locale "fr".

db.myColl.createIndex( { category: 1 }, { collation: { locale: "fr" } } )

The following query operation, which specifies the same collation as the index, can use the index:

db.myColl.find( { category: "cafe" } ).collation( { locale: "fr" } )

However, the following query operation, which by default uses the "simple" binary collator, cannot use the index:

db.myColl.find( { category: "cafe" } )

For a compound index where the index prefix keys are not strings, arrays, or embedded documents, an operation that specifies a different collation can still use the index to support comparisons on the index prefix keys.

For example, the collection myColl has a compound index on the numeric fields score and price and the string field category; the index is created with the collation locale "fr" for string comparisons:

db.myColl.createIndex(
   { score: 1, price: 1, category: 1 },
   { collation: { locale: "fr" } } )

The following operations, which use "simple" binary collation for string comparisons, can use the index:

db.myColl.find( { score: 5 } ).sort( { price: 1 } )
db.myColl.find( { score: 5, price: { $gt: NumberDecimal( "10" ) } } ).sort( { price: 1 } )

The following operation, which uses "simple" binary collation for string comparisons on the indexed category field, can use the index to fulfill only the score: 5 portion of the query:

db.myColl.find( { score: 5, category: "cafe" } )

Collation and Unsupported Index Types

The following indexes only support simple binary comparison and do not support collation:

  • text indexes
  • 2d indexes
  • geoHaystack indexes

Tip

To create a text, a 2d, or a geoHaystack index on a collection that has a non-simple collation, you must explicitly specify {collation: {locale: "simple"} } when creating the index.