Operational Factors and Data Models

Modeling application data for MongoDB should consider various operational factors that impact the performance of MongoDB. For instance, different data models can allow for more efficient queries, increase the throughput of insert and update operations, or distribute activity to a sharded cluster more effectively.

When developing a data model, analyze all of your application’s read and write operations in conjunction with the following considerations.

Atomicity

In MongoDB, a write operation is atomic on the level of a single document, even if the operation modifies multiple embedded documents within a single document. When a single write operation modifies multiple documents (e.g. db.collection.updateMany()), the modification of each document is atomic, but the operation as a whole is not atomic.
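The distinction can be sketched in mongosh; the accounts collection and its fields are hypothetical, and a running deployment is assumed:

```javascript
// A single updateOne() is atomic even though it modifies several
// embedded fields at once:
db.accounts.updateOne(
  { _id: 1 },
  { $inc: { "balance.available": -50, "balance.pending": 50 } }
)

// updateMany() applies an atomic modification to each matching document,
// but another client may observe a moment where some documents have been
// updated and others have not:
db.accounts.updateMany(
  { status: "inactive" },
  { $set: { archived: true } }
)
```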

Embedded Data Model

The embedded data model combines all related data in a single document instead of normalizing across multiple documents and collections. This data model facilitates atomic operations.

See Model Data for Atomic Operations for an example data model that provides atomic updates for a single document.

Multi-Document Transactions

For data models that store references between related pieces of data, the application must issue separate read and write operations to retrieve and modify these related pieces of data.

For situations that require atomicity of reads and writes to multiple documents (in a single or multiple collections), MongoDB supports multi-document transactions:

  • In version 4.0, MongoDB supports multi-document transactions on replica sets.
  • In version 4.2, MongoDB introduces distributed transactions, which add support for multi-document transactions on sharded clusters and incorporate the existing support for multi-document transactions on replica sets.
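A minimal mongosh sketch of such a transaction, assuming a replica set running MongoDB 4.0 or later (the bank database, accounts collection, and balances are hypothetical):

```javascript
// Transfer 100 between two documents; either both updates commit
// together or neither is applied.
const session = db.getMongo().startSession()
session.startTransaction()
try {
  const accounts = session.getDatabase("bank").accounts
  accounts.updateOne({ _id: "A" }, { $inc: { balance: -100 } })
  accounts.updateOne({ _id: "B" }, { $inc: { balance: 100 } })
  session.commitTransaction()   // both updates become visible together
} catch (e) {
  session.abortTransaction()    // neither update is applied
} finally {
  session.endSession()
}
```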

For details regarding transactions in MongoDB, see the Transactions page.

Important

In most cases, a multi-document transaction incurs a greater performance cost than single-document writes, and the availability of multi-document transactions should not be a replacement for effective schema design. For many scenarios, the denormalized data model (embedded documents and arrays) will continue to be optimal for your data and use cases. That is, for many scenarios, modeling your data appropriately will minimize the need for multi-document transactions.

For additional transactions usage considerations (such as runtime limit and oplog size limit), see also Production Considerations.

Sharding

MongoDB uses sharding to provide horizontal scaling. These clusters support deployments with large data sets and high-throughput operations. Sharding allows users to partition a collection within a database to distribute the collection’s documents across a number of mongod instances or shards.

To distribute data and application traffic in a sharded collection, MongoDB uses the shard key. Selecting the proper shard key has significant implications for performance, and can enable or prevent query isolation and increased write capacity. It is important to consider carefully the field or fields to use as the shard key.

See Sharding and Shard Keys for more information.
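As a hedged sketch in mongosh, sharding a hypothetical mydb.events collection on a compound shard key that appears in most of its queries:

```javascript
// Enable sharding for the database, then shard the collection
// (database, collection, and field names here are hypothetical):
sh.enableSharding("mydb")
sh.shardCollection("mydb.events", { customerId: 1, ts: 1 })

// Queries that include the shard key (or a prefix of it) can be routed
// to a single shard instead of being broadcast to all shards:
db.events.find({ customerId: 42, ts: { $gte: ISODate("2020-01-01") } })
```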

Indexes

Use indexes to improve performance for common queries. Build indexes on fields that appear often in queries and for all operations that return sorted results. MongoDB automatically creates a unique index on the _id field.
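For example, a compound index can serve both a query’s filter and its sort; the orders collection and its fields below are hypothetical:

```javascript
// Index the field used for equality matches first, then the sort field:
db.orders.createIndex({ customerId: 1, orderDate: -1 })

// This query can use the index for both the match and the sort:
db.orders.find({ customerId: 42 }).sort({ orderDate: -1 })
```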

As you create indexes, consider the behaviors and costs of indexes.

See Indexing Strategies for more information on indexes, as well as Analyze Query Performance. Additionally, the MongoDB database profiler may help identify inefficient queries.

Large Number of Collections

In certain situations, you might choose to store related information in several collections rather than in a single collection.

Consider a sample collection logs that stores log documents for various environments and applications. The logs collection contains documents of the following form:

{ log: "dev", ts: ..., info: ... }
{ log: "debug", ts: ..., info: ... }

If the total number of documents is low, you may group documents into collections by type. For logs, consider maintaining distinct log collections, such as logs_dev and logs_debug. The logs_dev collection would contain only the documents related to the dev environment.
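A sketch of the two approaches in mongosh (the document contents are illustrative):

```javascript
// One mixed collection, distinguished by a type field:
db.logs.insertOne({ log: "dev", ts: new Date(), info: "..." })

// Versus one collection per log type:
db.logs_dev.insertOne({ ts: new Date(), info: "..." })
db.logs_debug.insertOne({ ts: new Date(), info: "..." })

// Reads for a single environment then touch only that collection and
// need no filter on the type field:
db.logs_dev.find({ ts: { $gte: ISODate("2020-01-01") } })
```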

Generally, having a large number of collections has no significant performance penalty and results in very good performance. Distinct collections are very important for high-throughput batch processing.

When using models that have a large number of collections, consider their behaviors and overhead.

Collection Contains Large Number of Small Documents

You should consider embedding for performance reasons if you have a collection with a large number of small documents. If you can group these small documents by some logical relationship and you frequently retrieve the documents by this grouping, you might consider “rolling up” the small documents into larger documents that contain an array of embedded documents.

“Rolling up” these small documents into logical groupings means that queries to retrieve a group of documents involve sequential reads and fewer random disk accesses. Additionally, “rolling up” documents and moving common fields to the larger document benefits the indexes on these fields: there are fewer copies of the common fields, and fewer associated key entries in the corresponding index. See Indexes for more information on indexes.

However, if you often only need to retrieve a subset of the documents within the group, then “rolling up” the documents may not provide better performance. Furthermore, if small, separate documents represent the natural model for the data, you should maintain that model.
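The transformation itself can be illustrated in plain JavaScript, independent of the database; the sensor-reading document shape is a hypothetical example:

```javascript
// Many small per-reading documents become one larger document per
// sensor, with the readings embedded as an array.
const smallDocs = [
  { sensor: "a", ts: 1, value: 20 },
  { sensor: "a", ts: 2, value: 21 },
  { sensor: "b", ts: 1, value: 17 },
];

const bySensor = {};
for (const doc of smallDocs) {
  // The common field ("sensor") is stored once per parent document,
  // so an index on it would have fewer key entries.
  if (!bySensor[doc.sensor]) {
    bySensor[doc.sensor] = { sensor: doc.sensor, readings: [] };
  }
  bySensor[doc.sensor].readings.push({ ts: doc.ts, value: doc.value });
}
const rolledUp = Object.values(bySensor);

console.log(rolledUp.length); // 2 parent documents instead of 3 small ones
```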

Storage Optimization for Small Documents

Each MongoDB document contains a certain amount of overhead. This overhead is normally insignificant, but it becomes significant if all documents are just a few bytes, as might be the case if the documents in your collection only have one or two fields.

Consider suggestions and strategies for optimizing storage utilization for these collections, such as shortening field names and embedding documents.

Data Lifecycle Management

Data modeling decisions should take data lifecycle management into consideration.

The Time to Live (TTL) feature of collections expires documents after a period of time. Consider using the TTL feature if your application requires some data to persist in the database for a limited period of time.
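A hedged sketch of a TTL index in mongosh (the collection and field names are hypothetical): documents expire 3600 seconds after the value of their createdAt field.

```javascript
// A background task removes documents once createdAt is more than
// an hour in the past:
db.eventlog.createIndex({ createdAt: 1 }, { expireAfterSeconds: 3600 })
```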

Additionally, if your application only uses recently inserted documents, consider Capped Collections. Capped collections provide first-in-first-out (FIFO) management of inserted documents and efficiently support operations that insert and read documents based on insertion order.
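Creating a capped collection in mongosh (the collection name and size are hypothetical):

```javascript
// A fixed-size (here 100 MB) collection; once full, the oldest
// documents are overwritten by new inserts (FIFO):
db.createCollection("recent_events", { capped: true, size: 100 * 1024 * 1024 })

// Documents are returned in insertion order without an index:
db.recent_events.find().sort({ $natural: 1 })
```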