Text Indexes

Overview

MongoDB provides text indexes to support text search queries on string content. A text index can include any field whose value is a string or an array of string elements.

Versions

  • Version 3: The default version of text indexes created in MongoDB 3.2 and later.
  • Version 2: Introduced in MongoDB 2.6. Version 2 is the default version of text indexes created in the MongoDB 2.6 and 3.0 series.
  • Version 1: Introduced in MongoDB 2.4. MongoDB 2.4 supports only version 1.

To override the default version, include the option { "textIndexVersion": <version> } when creating the index.
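As a minimal sketch, the override might look like the following. The reviews collection and comments field are borrowed from the examples on this page, and the version number 2 is only an illustration. The guard around db is an assumption that lets the snippet load outside mongosh, where no db object exists.

```javascript
// Index specification: a text index on the comments field.
const versionedSpec = { comments: "text" };

// Options: request version 2 instead of the server's default version.
const versionedOptions = { textIndexVersion: 2 };

// `db` exists only inside mongosh; the guard keeps the snippet loadable elsewhere.
if (typeof db !== "undefined") {
  db.reviews.createIndex(versionedSpec, versionedOptions);
}
```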

Create Text Index

Important

A collection can have at most one text index.

To create a text index, use the db.collection.createIndex() method. To index a field that contains a string or an array of string elements, include the field and specify the string literal "text" in the index document, as in the following example:

db.reviews.createIndex( { comments: "text" } )

A text index can cover multiple fields. The following example creates a text index on the fields subject and comments:

db.reviews.createIndex(
   {
     subject: "text",
     comments: "text"
   }
 )

A compound index can include text index keys in combination with ascending/descending index keys. For more information, see Compound Index.

To drop a text index, use the index name. See Use the Index Name to Drop a text Index for more information.

Specify Weights

For a text index, the weight of an indexed field denotes the significance of the field relative to the other indexed fields in terms of the text search score.

For each indexed field in the document, MongoDB multiplies the number of matches by the weight and sums the results. Using this sum, MongoDB then calculates the score for the document. See the $meta operator for details on returning and sorting by text scores.

The default weight is 1 for the indexed fields. To adjust the weights for the indexed fields, include the weights option in the db.collection.createIndex() method.

For more information on using weights to control the results of a text search, see Control Search Results with Weights.
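The weighted sum described in this section can be sketched as follows. The weight values (5 and 10) and the match counts are made up for illustration, and weightedMatchSum is a hypothetical helper: it models only the multiply-and-sum step, not how MongoDB turns that sum into the final score.

```javascript
// Index specification and weights: subject has weight 5, comments has weight 10.
const weightedSpec = { subject: "text", comments: "text" };
const weightedOptions = { weights: { subject: 5, comments: 10 } };

// Simplified model of the weighted sum: for each indexed field, multiply the
// number of matches by the field's weight, then add the products together.
function weightedMatchSum(matchesByField, weights) {
  let sum = 0;
  for (const [field, matches] of Object.entries(matchesByField)) {
    const weight = weights[field] ?? 1; // unlisted fields keep the default weight of 1
    sum += matches * weight;
  }
  return sum;
}

// Hypothetical document: 2 matches in subject, 1 match in comments.
const exampleSum = weightedMatchSum({ subject: 2, comments: 1 }, weightedOptions.weights);
console.log(exampleSum); // 2 * 5 + 1 * 10 = 20

// `db` exists only inside mongosh; the collection must not already have a text index.
if (typeof db !== "undefined") {
  db.reviews.createIndex(weightedSpec, weightedOptions);
}
```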

Wildcard Text Indexes

Note

Wildcard Text Indexes are distinct from Wildcard Indexes. Although both share the wildcard $** field pattern, they are different index types, and only Wildcard Text Indexes support the $text operator.

When creating a text index on multiple fields, you can also use the wildcard specifier ($**). With a wildcard text index, MongoDB indexes every field that contains string data for each document in the collection. The following example creates a text index using the wildcard specifier:

db.collection.createIndex( { "$**": "text" } )

This index allows for text search on all fields with string content. Such an index can be useful with highly unstructured data if it is unclear which fields to include in the text index, or for ad-hoc querying.

Wildcard text indexes are text indexes on multiple fields. As such, you can assign weights to specific fields during index creation to control the ranking of the results. For more information on using weights to control the results of a text search, see Control Search Results with Weights.

Wildcard text indexes, as with all text indexes, can be part of a compound index. For example, the following creates a compound index on the field a as well as the wildcard specifier:

db.collection.createIndex( { a: 1, "$**": "text" } )

As with all compound text indexes, because a precedes the text index key, the query predicate must include an equality match condition on a in order to perform a $text search with this index. For information on compound text indexes, see Compound Text Indexes.
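A query that satisfies this requirement might look like the sketch below. The value 5 for a and the search term "coffee" are placeholders, and the guard around db is an assumption so that the snippet also loads outside mongosh.

```javascript
// Compound index from the example above: key a first, then the wildcard text key.
const compoundWildcardSpec = { a: 1, "$**": "text" };

// A $text query that can use it: an equality match on a plus the search string.
const compoundWildcardQuery = { a: 5, $text: { $search: "coffee" } };

// `db` exists only inside mongosh.
if (typeof db !== "undefined") {
  db.collection.createIndex(compoundWildcardSpec);
  printjson(db.collection.find(compoundWildcardQuery).toArray());
}
```

A predicate that omits a, or that uses a range such as { a: { $gt: 5 } }, does not meet the equality requirement.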

Case Insensitivity

Changed in version 3.2.

The version 3 text index supports the common C, simple S, and, for Turkish languages, the special T case foldings as specified in Unicode 8.0 Character Database Case Folding.

The case folding expands the case insensitivity of the text index to include characters with diacritics, such as é and É, and characters from non-Latin alphabets, such as “И” and “и” in the Cyrillic alphabet.

Version 3 of the text index is also diacritic insensitive. As such, the index also does not distinguish between é, É, e, and E.

Previous versions of the text index are case insensitive for [A-Za-z] only; that is, for Latin characters without diacritics. Earlier versions treat all other characters as distinct.

Diacritic Insensitivity

Changed in version 3.2.

With version 3, the text index is diacritic insensitive. That is, the index does not distinguish between characters that contain diacritical marks and their unmarked counterparts, such as é, ê, and e. More specifically, the text index strips the characters categorized as diacritics in Unicode 8.0 Character Database Prop List.

Version 3 of the text index is also case insensitive to characters with diacritics. As such, the index also does not distinguish between é, É, e, and E.

Previous versions of the text index treat characters with diacritics as distinct.
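The combined effect of case and diacritic insensitivity can be approximated in plain JavaScript. This is an illustrative sketch, not MongoDB's implementation: foldForComparison is a hypothetical helper, JavaScript uses its engine's Unicode tables rather than Unicode 8.0, and toLowerCase() is not identical to Unicode case folding.

```javascript
// Rough approximation of version 3 folding: decompose accented characters,
// drop the characters Unicode classifies as Diacritic, then lowercase.
function foldForComparison(term) {
  return term
    .normalize("NFD")               // "é" becomes "e" followed by U+0301 (combining acute)
    .replace(/\p{Diacritic}/gu, "") // remove the combining mark
    .toLowerCase();                 // "E" becomes "e", "И" becomes "и"
}

const foldedVariants = ["é", "É", "e", "E"].map(foldForComparison);
console.log(foldedVariants); // every entry folds to "e"
console.log(foldForComparison("И") === foldForComparison("и")); // true
```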

Tokenization Delimiters

Changed in version 3.2.

For tokenization, the version 3 text index uses the delimiters categorized under Dash, Hyphen, Pattern_Syntax, Quotation_Mark, Terminal_Punctuation, and White_Space in Unicode 8.0 Character Database Prop List.

For example, given the string "Il a dit qu'il «était le meilleur joueur du monde»", the text index treats «, », and spaces as delimiters.

Previous versions of the index treat « as part of the term "«était" and » as part of the term "monde»".
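The delimiter categories can be approximated with Unicode property escapes in a JavaScript regular expression. This is a sketch under stated assumptions, not the server's tokenizer: JavaScript has no Hyphen property, so it is omitted; the engine's Unicode version is newer than 8.0; and this pattern also splits on the apostrophe in qu'il, which may not match how the server handles it. The tokenize helper is hypothetical.

```javascript
// Approximate delimiter set built from the Unicode properties named above
// (Hyphen omitted because JavaScript regular expressions do not support it).
const delimiterPattern =
  /[\p{Dash}\p{Pattern_Syntax}\p{Quotation_Mark}\p{Terminal_Punctuation}\p{White_Space}]+/u;

// Split on runs of delimiters and discard empty strings at the edges.
function tokenize(text) {
  return text.split(delimiterPattern).filter((token) => token.length > 0);
}

const frenchTokens = tokenize("Il a dit qu'il «était le meilleur joueur du monde»");
console.log(frenchTokens.includes("était")); // true: « is a delimiter, not part of the term
console.log(frenchTokens.includes("monde")); // true: » is a delimiter, not part of the term
```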

Index Entries

A text index tokenizes and stems the terms in the indexed fields for the index entries. It stores one index entry for each unique stemmed term in each indexed field for each document in the collection. The index uses simple language-specific suffix stemming.

Supported Languages and Stop Words

MongoDB supports text search for various languages. text indexes drop language-specific stop words (for example, in English: the, an, a, and) and use simple language-specific suffix stemming. For a list of the supported languages, see Text Search Languages.

If you specify a language value of "none", then the text index uses simple tokenization with no list of stop words and no stemming.

To specify a language for the text index, see Specify a Language for Text Index.
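As a minimal sketch, a text index without stop words or stemming might be created as follows. It assumes the default_language option of db.collection.createIndex(), which this page does not name explicitly; the reviews collection and comments field are reused from the earlier examples.

```javascript
// Index specification: a text index on the comments field.
const noLanguageSpec = { comments: "text" };

// "none" selects simple tokenization with no stop word list and no stemming.
const noLanguageOptions = { default_language: "none" };

// `db` exists only inside mongosh; the collection must not already have a text index.
if (typeof db !== "undefined") {
  db.reviews.createIndex(noLanguageSpec, noLanguageOptions);
}
```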

sparse Property

text indexes are always sparse and ignore the sparse option. If a document lacks a text index field (or the field is null or an empty array), MongoDB does not add an entry for the document to the text index. On insert, MongoDB stores the document but does not add it to the text index.

For a compound index that includes a text index key along with keys of other types, only the text index field determines whether the index references a document. The other keys have no effect on whether the index references the document.

Restrictions

One Text Index Per Collection

A collection can have at most one text index.

Text Search and Hints

You cannot use hint() if the query includes a $text query expression.

Text Index and Sort

Sort operations cannot obtain sort order from a text index, even from a compound text index; that is, sort operations cannot use the ordering in the text index.

Compound Index

A compound index can include a text index key in combination with ascending/descending index keys. However, these compound indexes have the following restrictions:

  • A compound text index cannot include any other special index types, such as multi-key or geospatial index fields.
  • If the compound text index includes keys preceding the text index key, to perform a $text search, the query predicate must include equality match conditions on the preceding keys.
  • When creating a compound text index, all text index keys must be listed adjacently in the index specification document.
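The adjacency rule in the last restriction can be illustrated with a small check over an index specification document. textKeysAreAdjacent is a hypothetical helper written for this sketch, and the group field is made up; the server performs its own validation when the index is created.

```javascript
// Returns true when every "text" key sits in one contiguous run of the
// index specification. Key order follows the order the keys were written in.
function textKeysAreAdjacent(indexSpec) {
  const isText = Object.values(indexSpec).map((value) => value === "text");
  const first = isText.indexOf(true);
  const last = isText.lastIndexOf(true);
  if (first === -1) return true; // no text keys at all
  return isText.slice(first, last + 1).every(Boolean);
}

// Text keys listed next to each other, after an ascending key.
const adjacentSpec = { group: 1, subject: "text", comments: "text" };
// Text keys separated by an ascending key.
const splitSpec = { subject: "text", group: 1, comments: "text" };

console.log(textKeysAreAdjacent(adjacentSpec)); // true
console.log(textKeysAreAdjacent(splitSpec));    // false
```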

See also Text Index and Sort for additional limitations.

For an example of a compound text index, see Limit the Number of Entries Scanned.

Drop a Text Index

To drop a text index, pass the name of the index to the db.collection.dropIndex() method. To get the name of the index, run the db.collection.getIndexes() method.

For information on the default naming scheme for text indexes, as well as overriding the default name, see Specify Name for text Index.
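The lookup-then-drop sequence might be sketched as follows. findTextIndexName is a hypothetical helper, and sampleIndexes is made-up data shaped like getIndexes() output for a collection with a single text index on comments. Note that the guarded block really drops the index when run in mongosh.

```javascript
// Picks the name of the text index out of getIndexes()-style output.
// A text index reports the value "text" in its key document.
function findTextIndexName(indexes) {
  const textIndex = indexes.find((ix) => Object.values(ix.key).includes("text"));
  return textIndex ? textIndex.name : null;
}

// Illustrative getIndexes() output, not captured from a real deployment.
const sampleIndexes = [
  { v: 2, key: { _id: 1 }, name: "_id_" },
  { v: 2, key: { _fts: "text", _ftsx: 1 }, name: "comments_text", weights: { comments: 1 } },
];
console.log(findTextIndexName(sampleIndexes)); // "comments_text"

// `db` exists only inside mongosh.
if (typeof db !== "undefined") {
  const textIndexName = findTextIndexName(db.reviews.getIndexes());
  if (textIndexName !== null) {
    db.reviews.dropIndex(textIndexName);
  }
}
```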

Collation Option

text indexes only support simple binary comparison and do not support collation.

To create a text index on a collection that has a non-simple collation, you must explicitly specify {collation: {locale: "simple"} } when creating the index.
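A minimal sketch of this case follows. The collection name localizedReviews and the French locale are hypothetical choices for illustration; the guard around db is an assumption so that the snippet also loads outside mongosh.

```javascript
// Hypothetical collection created with a non-simple (French) default collation.
const collectionName = "localizedReviews";
const collectionOptions = { collation: { locale: "fr" } };

// The text index must override that default with the simple collation.
const collationIndexSpec = { comments: "text" };
const collationIndexOptions = { collation: { locale: "simple" } };

// `db` exists only inside mongosh.
if (typeof db !== "undefined") {
  db.createCollection(collectionName, collectionOptions);
  db.getCollection(collectionName).createIndex(collationIndexSpec, collationIndexOptions);
}
```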

Storage Requirements and Performance Costs

text indexes carry storage and performance costs. Because a text index stores one entry for each unique stemmed term in each indexed field for each document, it can be large.

Text Search Support

The text index supports $text query operations. For examples of text search, see the $text reference page. For examples of $text operations in aggregation pipelines, see Text Search in the Aggregation Pipeline.
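For orientation, a basic $text query that also returns and sorts by the text score might look like the sketch below. It assumes the reviews collection already has a text index; the search term "coffee" and the projected field name score are placeholders.

```javascript
// Search the text index for a term.
const textQuery = { $text: { $search: "coffee" } };

// Project the relevance score into a field named score, and sort by it.
const scoreProjection = { score: { $meta: "textScore" } };
const scoreSort = { score: { $meta: "textScore" } };

// `db` exists only inside mongosh.
if (typeof db !== "undefined") {
  printjson(db.reviews.find(textQuery, scoreProjection).sort(scoreSort).toArray());
}
```

Sorting by the $meta text score orders results by relevance; it does not rely on the key ordering of the text index itself.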