Modeling application data for MongoDB should consider various operational factors that impact the performance of MongoDB. For instance, different data models can allow for more efficient queries, increase the throughput of insert and update operations, or distribute activity to a sharded cluster more effectively.
When developing a data model, analyze all of your application's read and write operations in conjunction with the following considerations.
In MongoDB, a write operation is atomic on the level of a single document, even if the operation modifies multiple embedded documents within a single document. When a single write operation modifies multiple documents (e.g. db.collection.updateMany()), the modification of each document is atomic, but the operation as a whole is not atomic.
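As a sketch of the distinction (collection and field names below are hypothetical, and the commands assume a running deployment):

```javascript
// Atomic: one document, even though several embedded fields change together.
db.orders.updateOne(
  { _id: 1 },
  { $set: { "shipping.status": "sent", "shipping.ts": new Date() } }
)

// Not atomic as a whole: each matched document is updated atomically,
// but other clients may observe the collection mid-update.
db.orders.updateMany({ status: "pending" }, { $set: { status: "sent" } })
```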
The embedded data model combines all related data in a single document instead of normalizing across multiple documents and collections. This data model facilitates atomic operations.
See Model Data for Atomic Operations for an example data model that provides atomic updates for a single document.
For data models that store references between related pieces of data, the application must issue separate read and write operations to retrieve and modify these related pieces of data.
For situations that require atomicity of reads and writes to multiple documents (in a single or multiple collections), MongoDB supports multi-document transactions.
For details regarding transactions in MongoDB, see the Transactions page.
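As a rough sketch of the API (database, collection, and account names are illustrative; see the Transactions page for authoritative usage), a multi-document transaction in mongosh looks roughly like:

```javascript
// Hypothetical example: debit one account and credit another atomically.
const session = db.getMongo().startSession();
session.withTransaction(() => {
  const accounts = session.getDatabase("bank").accounts;
  accounts.updateOne({ _id: "A" }, { $inc: { balance: -100 } });
  accounts.updateOne({ _id: "B" }, { $inc: { balance: 100 } });
});
session.endSession();
```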
Important
In most cases, multi-document transactions incur a greater performance cost than single-document writes, and the availability of multi-document transactions should not be a replacement for effective schema design. For many scenarios, the denormalized data model (embedded documents and arrays) will continue to be optimal for your data and use cases. That is, for many scenarios, modeling your data appropriately will minimize the need for multi-document transactions.
For additional transaction usage considerations (such as the runtime limit and oplog size limit), see also Production Considerations.
MongoDB uses sharding to provide horizontal scaling. These clusters support deployments with large data sets and high-throughput operations. Sharding allows users to partition a collection within a database to distribute the collection's documents across a number of mongod instances or shards.
To distribute data and application traffic in a sharded collection, MongoDB uses the shard key. Selecting the proper shard key has significant implications for performance, and can enable or prevent query isolation and increased write capacity. It is important to carefully consider the field or fields to use as the shard key.
See Sharding and Shard Keys for more information.
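For example, sharding a collection on a field that appears in most of its queries lets those queries be routed to a single shard (the namespace and field below are hypothetical):

```javascript
// Hypothetical: shard orders by customer so queries filtered on
// customer_id can be targeted to one shard instead of scattered to all.
sh.shardCollection("app.orders", { customer_id: 1 })
```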
Use indexes to improve performance for common queries. Build indexes on fields that appear often in queries and for all operations that return sorted results. MongoDB automatically creates a unique index on the _id field.
As you create indexes, consider the following behaviors of indexes:
See Indexing Strategies for more information on indexes, as well as Analyze Query Performance. Additionally, the MongoDB database profiler may help identify inefficient queries.
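For example, a query that filters on one field and sorts on another benefits from a compound index covering both (collection and field names are illustrative):

```javascript
// Supports find({ customer_id: ... }).sort({ created_at: -1 })
db.orders.createIndex({ customer_id: 1, created_at: -1 })
```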
In certain situations, you might choose to store related information in several collections rather than in a single collection.
Consider a sample collection logs that stores log documents for various environments and applications. The logs collection contains documents of the following form:
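The sample document itself did not survive extraction; a plausible shape, consistent with the logs_dev and logs_debug grouping discussed below, might be:

```javascript
// Illustrative only; field names are assumptions, not from the original page.
{ log: "dev", ts: new Date(), info: { ip: "127.0.0.1", host: "app01" } }
```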
If the total number of documents is low, you may group documents into collections by type. For logs, consider maintaining distinct log collections, such as logs_dev and logs_debug. The logs_dev collection would contain only the documents related to the dev environment.
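Under that scheme, writes go directly to the per-environment collection rather than filtering a shared one (names are illustrative):

```javascript
// One collection per environment/type:
db.logs_dev.insertOne({ ts: new Date(), info: { host: "app01" } })
db.logs_debug.insertOne({ ts: new Date(), info: { host: "app01" } })
```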
Generally, having a large number of collections has no significant performance penalty and results in very good performance. Distinct collections are very important for high-throughput batch processing.
When using models that have a large number of collections, consider the following behaviors:
Each index, including the index on _id, requires at least 8 kB of data space.
For each database, a single namespace file (i.e. <database>.ns) stores all meta-data for that database, and each index and collection has its own entry in the namespace file. MongoDB places limits on the size of namespace files.

If you have a collection with a large number of small documents, you should consider embedding for performance reasons. If you can group these small documents by some logical relationship and you frequently retrieve the documents by this grouping, you might consider "rolling up" the small documents into larger documents that contain an array of embedded documents.
"Rolling up" these small documents into logical groupings means that queries to retrieve a group of documents involve sequential reads and fewer random disk accesses. Additionally, "rolling up" documents and moving common fields to the larger document benefits the index on these fields. There would be fewer copies of the common fields, and fewer associated key entries in the corresponding index. See Indexes for more information on indexes.
However, if you often only need to retrieve a subset of the documents within the group, then "rolling up" the documents may not provide better performance. Furthermore, if small, separate documents represent the natural model for the data, you should maintain that model.
Each MongoDB document contains a certain amount of overhead. This overhead is normally insignificant but becomes significant if all documents are just a few bytes, as might be the case if the documents in your collection only have one or two fields.
Consider the following suggestions and strategies for optimizing storage utilization for these collections:
Use the _id field explicitly.
MongoDB clients automatically add an _id field to each document and generate a unique 12-byte ObjectId for the _id field. Furthermore, MongoDB always indexes the _id field. For smaller documents this may account for a significant amount of space.
To optimize storage use, users can specify a value for the _id field explicitly when inserting documents into the collection. This strategy allows applications to store a value in the _id field that would otherwise have occupied space in another portion of the document.
You can store any value in the _id field, but because this value serves as a primary key for documents in the collection, it must uniquely identify them. If the field's value is not unique, then it cannot serve as a primary key, as there would be collisions in the collection.
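For example, if every document already carries a unique username, that value can serve as the primary key instead of an auto-generated ObjectId (the collection and field names are hypothetical):

```javascript
// Reuses an existing unique value as _id, saving the 12-byte ObjectId
// and a separately stored (and indexed) username field.
db.users.insertOne({ _id: "alice", last_login: new Date() })
```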
Note
Shortening field names reduces expressiveness and does not provide considerable benefit for larger documents and where document overhead is not of significant concern. Shorter field names do not reduce the size of indexes, because indexes have a predefined structure.
In general, it is not necessary to use short field names.
MongoDB stores all field names in every document. For most documents, this represents a small fraction of the space used by a document; however, for small documents the field names may represent a proportionally large amount of space. Consider a collection of small documents that resemble the following:
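The sample document did not survive extraction; based on the field names used below, it plausibly resembles:

```javascript
// Illustrative only; values are assumptions, not from the original page.
{ last_name: "Smith", best_score: 3.9 }
```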
If you shorten the field named last_name to lname and the field named best_score to score, as follows, you could save 9 bytes per document.
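The 9-byte figure follows directly from the field-name lengths, since BSON stores each field name inside every document. A quick check (pure arithmetic, no MongoDB required):

```javascript
// Renaming shrinks each document by the characters saved per field name:
// last_name -> lname saves 4 bytes, best_score -> score saves 5 bytes.
const renames = { last_name: "lname", best_score: "score" };
const saved = Object.entries(renames)
  .reduce((sum, [oldName, newName]) => sum + oldName.length - newName.length, 0);
console.log(saved); // -> 9 bytes saved per document
```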
In some cases you may want to embed documents in other documents and save on the per-document overhead. See Collection Contains Large Number of Small Documents.
Data modeling decisions should take data lifecycle management into consideration.
The Time to Live or TTL feature of collections expires documents after a period of time. Consider using the TTL feature if your application requires some data to persist in the database for a limited period of time.
Additionally, if your application only uses recently inserted documents, consider Capped Collections. Capped collections provide first-in-first-out (FIFO) management of inserted documents and efficiently support operations that insert and read documents based on insertion order.
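As a sketch (collection and field names are hypothetical), both features are enabled declaratively:

```javascript
// TTL: documents expire one hour after their createdAt timestamp.
db.log_events.createIndex({ createdAt: 1 }, { expireAfterSeconds: 3600 })

// Capped: fixed-size collection with FIFO, insertion-ordered semantics.
db.createCollection("recent_events", { capped: true, size: 1048576 })
```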