Specify a Language for Text Index

This tutorial describes how to specify the default language associated with a text index and how to create text indexes for collections that contain documents in different languages.

Specify the Default Language for a text Index

The default language associated with the indexed data determines the rules for parsing word roots (i.e., stemming) and ignoring stop words. The default language for the indexed data is english.

To specify a different language, use the default_language option when creating the text index. See Text Search Languages for the languages available for default_language.
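To make the stop-word part of this concrete, here is a toy JavaScript sketch. The stop-word lists and the `indexableTerms` helper are hypothetical and deliberately tiny; they are not MongoDB's actual per-language lists, and the sketch skips stemming entirely. It only illustrates that the same sentence yields different indexable terms depending on the language chosen.

```javascript
// Toy illustration only: tiny hand-picked stop-word lists,
// NOT MongoDB's actual per-language lists. No stemming is done.
const TOY_STOP_WORDS = {
  english: new Set(["a", "the", "is", "this", "which", "i"]),
  spanish: new Set(["a", "la", "los", "de", "que", "es", "este", "un"]),
};

// Lowercase, split on anything that is not a letter, and drop
// empty tokens plus the chosen language's stop words.
function indexableTerms(text, language) {
  const stop = TOY_STOP_WORDS[language] ?? new Set();
  return text
    .toLowerCase()
    .split(/[^\p{L}]+/u)
    .filter((w) => w.length > 0 && !stop.has(w));
}

console.log(indexableTerms("La suerte protege a los audaces.", "spanish"));
// [ 'suerte', 'protege', 'audaces' ]
console.log(indexableTerms("La suerte protege a los audaces.", "english"));
// [ 'la', 'suerte', 'protege', 'los', 'audaces' ]
```

With the wrong language, words such as "la" and "los" end up as searchable terms instead of being ignored.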

The following example creates a text index on the content field of the quotes collection and sets the default_language to spanish:

db.quotes.createIndex(
   { content : "text" },
   { default_language: "spanish" }
)

Create a text Index for a Collection in Multiple Languages

Specify the Index Language within the Document

If a collection contains documents or embedded documents in different languages, include a field named language in those documents or embedded documents and set its value to the language of that document or embedded document.

When building the text index, MongoDB uses the specified language for that document or embedded document:

  • The language specified in a document overrides the default language for the text index.
  • The language specified in an embedded document overrides the language specified in the enclosing document or, if there is none, the default language for the index.

See Text Search Languages for a list of supported languages.
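The precedence rules above can be modeled as a small JavaScript function. `effectiveLanguage` is a hypothetical helper, not part of MongoDB or its drivers: it takes the chain of enclosing documents, outermost first, lets the innermost language value win, and otherwise falls back to the index default.

```javascript
// Hypothetical model of the precedence rules; not MongoDB source code.
// `chain` lists the enclosing documents, outermost first.
function effectiveLanguage(chain, indexDefault = "english") {
  let lang = indexDefault; // start from the index's default language
  for (const doc of chain) {
    // A more deeply nested `language` field overrides anything outside it.
    if (doc && typeof doc.language === "string") lang = doc.language;
  }
  return lang;
}

const outer = { language: "portuguese" };
console.log(effectiveLanguage([]));                               // english
console.log(effectiveLanguage([outer]));                          // portuguese
console.log(effectiveLanguage([outer, { language: "spanish" }])); // spanish
console.log(effectiveLanguage([outer, { quote: "..." }]));        // portuguese
console.log(effectiveLanguage([{}, { quote: "..." }]));           // english
```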

For example, the quotes collection contains multi-language documents that include the language field in the document and/or the embedded documents as needed:

{
   _id: 1,
   language: "portuguese",
   original: "A sorte protege os audazes.",
   translation:
     [
        {
           language: "english",
           quote: "Fortune favors the bold."
        },
        {
           language: "spanish",
           quote: "La suerte protege a los audaces."
        }
    ]
}
{
   _id: 2,
   language: "spanish",
   original: "Nada hay más surrealista que la realidad.",
   translation:
      [
        {
          language: "english",
          quote: "There is nothing more surreal than reality."
        },
        {
          language: "french",
          quote: "Il n'y a rien de plus surréaliste que la réalité."
        }
      ]
}
{
   _id: 3,
   original: "is this a dagger which I see before me.",
   translation:
   {
      language: "spanish",
      quote: "Es este un puñal que veo delante de mí."
   }
}

Suppose you create a text index on the original and translation.quote fields with the default language of English:

db.quotes.createIndex( { original: "text", "translation.quote": "text" } )

Then, for the documents and embedded documents that contain the language field, the text index uses that language to parse word stems and other linguistic characteristics.

For embedded documents that do not contain the language field:

  • If the enclosing document contains the language field, the index uses the enclosing document's language for the embedded document.
  • Otherwise, the index uses the default language for the embedded document.

For documents that do not contain the language field, the index uses the default language, which is English.
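Applying these rules to the three sample documents gives the assignment below. This JavaScript sketch uses hypothetical helpers (`effectiveLanguage`, `indexedStrings`) to model the rules described above; it is not how MongoDB builds the index internally, and it assumes the index created above on original and translation.quote with the default language of English.

```javascript
// Hypothetical model, not MongoDB's implementation: for each string covered by
// { original: "text", "translation.quote": "text" }, report the language
// the rules above would apply.
function effectiveLanguage(chain, indexDefault) {
  let lang = indexDefault;
  for (const doc of chain) {
    if (typeof doc.language === "string") lang = doc.language; // innermost wins
  }
  return lang;
}

function indexedStrings(doc, indexDefault = "english") {
  const out = [];
  if (typeof doc.original === "string") {
    out.push(`${doc._id} original: ${effectiveLanguage([doc], indexDefault)}`);
  }
  // translation is an array of embedded documents in _id 1 and 2,
  // and a single embedded document in _id 3.
  const t = doc.translation;
  const embedded = Array.isArray(t) ? t : t ? [t] : [];
  for (const e of embedded) {
    if (typeof e.quote === "string") {
      out.push(`${doc._id} translation.quote: ${effectiveLanguage([doc, e], indexDefault)}`);
    }
  }
  return out;
}

const docs = [
  { _id: 1, language: "portuguese", original: "A sorte protege os audazes.",
    translation: [
      { language: "english", quote: "Fortune favors the bold." },
      { language: "spanish", quote: "La suerte protege a los audaces." },
    ] },
  { _id: 2, language: "spanish", original: "Nada hay más surrealista que la realidad.",
    translation: [
      { language: "english", quote: "There is nothing more surreal than reality." },
      { language: "french", quote: "Il n'y a rien de plus surréaliste que la réalité." },
    ] },
  { _id: 3, original: "is this a dagger which I see before me.",
    translation: { language: "spanish", quote: "Es este un puñal que veo delante de mí." } },
];

for (const d of docs) indexedStrings(d).forEach((line) => console.log(line));
// 1 original: portuguese
// 1 translation.quote: english
// 1 translation.quote: spanish
// 2 original: spanish
// 2 translation.quote: english
// 2 translation.quote: french
// 3 original: english
// 3 translation.quote: spanish
```

Document 3 is the interesting case: its original quote has no language of its own and falls back to English, while its embedded translation is parsed as Spanish.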

Use any Field to Specify the Language for a Document

To use a field with a name other than language, include the language_override option when creating the index.

For example, run the following command to use idioma as the field name instead of language:

db.quotes.createIndex( { quote : "text" },
                       { language_override: "idioma" } )

The documents of the quotes collection may specify a language with the idioma field:

{ _id: 1, idioma: "portuguese", quote: "A sorte protege os audazes" }
{ _id: 2, idioma: "spanish", quote: "Nada hay más surrealista que la realidad." }
{ _id: 3, idioma: "english", quote: "is this a dagger which I see before me" }
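The lookup itself is unchanged; only the field name differs. In this JavaScript sketch, the hypothetical `documentLanguage` helper reads whichever field language_override names (falling back to language) and otherwise uses default_language (falling back to english). It models top-level documents only, and document `_id: 4` is a hypothetical addition, not part of the sample data, included to show the fallback.

```javascript
// Hypothetical model, not MongoDB internals: read the language from the
// field named by language_override instead of the default `language` field.
const indexOptions = { language_override: "idioma" }; // mirrors the createIndex options above

function documentLanguage(doc, options) {
  const field = options.language_override ?? "language";
  const fallback = options.default_language ?? "english";
  return typeof doc[field] === "string" ? doc[field] : fallback;
}

const quotes = [
  { _id: 1, idioma: "portuguese", quote: "A sorte protege os audazes" },
  { _id: 2, idioma: "spanish", quote: "Nada hay más surrealista que la realidad." },
  { _id: 3, idioma: "english", quote: "is this a dagger which I see before me" },
  { _id: 4, quote: "hypothetical document without an idioma field" },
];

for (const doc of quotes) console.log(doc._id, documentLanguage(doc, indexOptions));
// 1 portuguese
// 2 spanish
// 3 english
// 4 english
```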