A replica set in MongoDB is a group of mongod processes that maintain the same data set. Replica sets provide redundancy and high availability, and are the basis for all production deployments. This section introduces replication in MongoDB as well as the components and architecture of replica sets. The section also provides tutorials for common tasks related to replica sets.
Replication provides redundancy and increases data availability. With multiple copies of data on different database servers, replication provides a level of fault tolerance against the loss of a single database server.

In some cases, replication can provide increased read capacity as clients can send read operations to different servers. Maintaining copies of data in different data centers can increase data locality and availability for distributed applications. You can also maintain additional copies for dedicated purposes, such as disaster recovery, reporting, or backup.
A replica set is a group of mongod instances that maintain the same data set. A replica set contains several data bearing nodes and optionally one arbiter node. Of the data bearing nodes, one and only one member is deemed the primary node, while the other nodes are deemed secondary nodes.
The primary node receives all write operations. A replica set can have only one primary capable of confirming writes with { w: "majority" } write concern; although in some circumstances, another mongod instance may transiently believe itself to also be primary. [1] The primary records all changes to its data sets in its operation log, i.e. oplog. For more information on primary node operation, see Replica Set Primary.
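As an illustrative sketch (the products collection name is a placeholder), a write that the primary must confirm with { w: "majority" } write concern might look like:

```javascript
// Insert a document and wait until a majority of data-bearing
// replica set members have acknowledged the write.
db.products.insertOne(
  { sku: "abc123", qty: 100 },
  { writeConcern: { w: "majority", wtimeout: 5000 } }
)
```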
The secondaries replicate the primary’s oplog and apply the operations to their data sets such that the secondaries’ data sets reflect the primary’s data set. If the primary is unavailable, an eligible secondary will hold an election to elect itself the new primary. For more information on secondary members, see Replica Set Secondary Members.
In some circumstances (such as when you have a primary and a secondary but cost constraints prohibit adding another secondary), you may choose to add a mongod instance to a replica set as an arbiter. An arbiter participates in elections but does not hold data (i.e. does not provide data redundancy). For more information on arbiters, see Replica Set Arbiter.
An arbiter will always be an arbiter, whereas a primary may step down and become a secondary, and a secondary may become the primary during an election.
Secondaries replicate the primary’s oplog and apply the operations to their data sets asynchronously. By having the secondaries’ data sets reflect the primary’s data set, the replica set can continue to function despite the failure of one or more members.

For more information on replication mechanics, see Replica Set Oplog and Replica Set Data Synchronization.
Starting in version 4.2 (also available starting in 4.0.6), secondary members of a replica set now log oplog entries that take longer than the slow operation threshold to apply. These slow oplog messages are logged for the secondaries in the diagnostic log under the REPL component with the text applied op: <oplog entry> took <num>ms. These slow oplog entries depend only on the slow operation threshold. They do not depend on the log levels (either at the system or component level), the profiling level, or the slow operation sample rate. The profiler does not capture slow oplog entries.
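As a sketch, the slow operation threshold that these messages depend on can be adjusted with db.setProfilingLevel; the 200 millisecond value below is illustrative:

```javascript
// Leave the profiler off (level 0) but set the slow operation
// threshold to 200 ms; oplog entries that take longer than this
// to apply are logged on secondaries under the REPL component.
db.setProfilingLevel(0, { slowms: 200 })
```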
Replication lag refers to the amount of time that it takes to copy (i.e. replicate) a write operation on the primary to a secondary. Some small delay period may be acceptable, but significant problems emerge as replication lag grows, including building cache pressure on the primary.
Starting in MongoDB 4.2, administrators can limit the rate at which the primary applies its writes with the goal of keeping the majority committed lag under a configurable maximum value flowControlTargetLagSeconds.

By default, flow control is enabled.
Note

For flow control to engage, the replica set/sharded cluster must have: featureCompatibilityVersion (FCV) of 4.2 and read concern majority enabled. That is, enabled flow control has no effect if FCV is not 4.2 or if read concern majority is disabled.
With flow control enabled, as the lag grows close to the flowControlTargetLagSeconds, writes on the primary must obtain tickets before taking locks to apply writes. By limiting the number of tickets issued per second, the flow control mechanism attempts to keep the lag under the target.
For more information, see Check the Replication Lag and Flow Control.
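One illustrative way to inspect and adjust the target lag at runtime (the 10-second value is an example; both commands run against the admin database):

```javascript
// Read the current flow control target lag.
db.adminCommand( { getParameter: 1, flowControlTargetLagSeconds: 1 } )

// Lower the target lag to 10 seconds.
db.adminCommand( { setParameter: 1, flowControlTargetLagSeconds: 10 } )
```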
When a primary does not communicate with the other members of the set for more than the configured electionTimeoutMillis period (10 seconds by default), an eligible secondary calls for an election to nominate itself as the new primary. The cluster attempts to complete the election of a new primary and resume normal operations.
The replica set cannot process write operations until the election completes successfully. The replica set can continue to serve read queries if such queries are configured to run on secondaries while the primary is offline.
The median time before a cluster elects a new primary should not typically exceed 12 seconds, assuming default replica configuration settings. This includes time required to mark the primary as unavailable and call and complete an election. You can tune this time period by modifying the settings.electionTimeoutMillis replication configuration option. Factors such as network latency may extend the time required for replica set elections to complete, which in turn affects the amount of time your cluster may operate without a primary. These factors are dependent on your particular cluster architecture.
Lowering the electionTimeoutMillis replication configuration option from the default 10000 (10 seconds) can result in faster detection of primary failure. However, the cluster may call elections more frequently due to factors such as temporary network latency even if the primary is otherwise healthy. This can result in increased rollbacks for w: 1 write operations.
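For example, a sketch of lowering the election timeout through a replica set reconfiguration (the 5000 millisecond value is illustrative):

```javascript
// Fetch the current configuration, lower the election timeout
// to 5 seconds, and reapply the configuration.
cfg = rs.conf()
cfg.settings.electionTimeoutMillis = 5000
rs.reconfig(cfg)
```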
Your application connection logic should include tolerance for automatic failovers and the subsequent elections. Starting in MongoDB 3.6, MongoDB drivers can detect the loss of the primary and automatically retry certain write operations a single time, providing additional built-in handling of automatic failovers and elections. Drivers can enable this behavior by including retryWrites=true in the connection string, as shown in the example below.

Starting in version 4.4, MongoDB provides mirrored reads to pre-warm electable secondary members’ cache with the most recently accessed data. Pre-warming the cache of a secondary can help restore performance more quickly after an election.
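For example, a connection string that enables retryable writes explicitly (the hostnames and replica set name are placeholders):

```
mongodb://host1.example.net:27017,host2.example.net:27017/?replicaSet=rs0&retryWrites=true
```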
To learn more about MongoDB’s failover process, see Replica Set Elections, Retryable Writes, and Rollbacks During Replica Set Failover.
By default, clients read from the primary [1]; however, clients can specify a read preference to send read operations to secondaries.
Asynchronous replication to secondaries means that reads from secondaries may return data that does not reflect the state of the data on the primary.
Multi-document transactions that contain read operations must use read preference primary. All operations in a given transaction must route to the same member.
For information on reading from replica sets, see Read Preference.
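As a brief sketch (the collection name is a placeholder), a query can be routed to secondary members with a read preference:

```javascript
// Route this query to secondary members; results may lag the
// primary because replication is asynchronous.
db.products.find().readPref("secondary")
```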
Depending on the read concern, clients can see the results of writes before the writes are durable:

- Clients using "local" or "available" read concern can see the result of a write operation before the write operation is acknowledged to the issuing client.
- Clients using "local" or "available" read concern can read data which may be subsequently rolled back during replica set failovers.

For operations in a multi-document transaction, when a transaction commits, all data changes made in the transaction are saved and visible outside the transaction. That is, a transaction will not commit some of its changes while rolling back others.
Until a transaction commits, the data changes made in the transaction are not visible outside the transaction.
However, when a transaction writes to multiple shards, not all outside read operations need to wait for the result of the committed transaction to be visible across the shards. For example, if a transaction is committed and write 1 is visible on shard A but write 2 is not yet visible on shard B, an outside read at read concern "local" can read the results of write 1 without seeing write 2.
For more information on read isolation, consistency, and recency for MongoDB, see Read Isolation, Consistency, and Recency.
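As an illustration (the collection name is a placeholder), a client can instead request "majority" read concern so that the data it reads will not be rolled back:

```javascript
// Return only data that has been acknowledged by a majority of
// replica set members; such data cannot be rolled back.
db.products.find().readConcern("majority")
```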
Starting in version 4.4, MongoDB provides mirrored reads to pre-warm the cache of electable secondary members (i.e. members with priority greater than 0). With mirrored reads (which are enabled by default), the primary can mirror a subset of operations that it receives and send them to a subset of electable secondaries. The size of the subset is configurable.
Note

The primary’s response to the client is not affected by mirrored reads. Mirrored reads are “fire-and-forget” operations by the primary; i.e., the primary does not await the response for the mirrored reads.
Mirrored reads are supported for the following operations:

- count
- distinct
- find
- findAndModify (specifically, the filter is sent as a mirrored read)
- update (specifically, the filter is sent as a mirrored read)

With MongoDB 4.4, mirrored reads are enabled by default and use a default sampling rate of 0.01. That is, the primary mirrors reads to each electable (i.e. priority greater than 0) secondary at the sampling rate of 1 percent.
For example, given a replica set with a primary and two electable secondaries and a sampling rate of 0.01, if the primary receives 100 operations that can be mirrored, the sampling may result in 1 read being mirrored to one secondary and 0 reads to the other, or 0 to each, etc.
To modify the sampling rate, use the mirrorReads parameter:

- A sampling rate of 0.0 disables mirrored reads.
- A sampling rate greater than 0.0 enables mirrored reads.
- The maximum sampling rate is 1.0.

For details, see mirrorReads.
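For example, a sketch of raising the sampling rate to 10 percent at runtime (the 0.10 value is illustrative; the command runs against the admin database):

```javascript
// Mirror 10% of eligible reads to each electable secondary.
db.adminCommand( { setParameter: 1, mirrorReads: { samplingRate: 0.10 } } )
```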
Starting in MongoDB 4.4, the command serverStatus and its corresponding mongo shell method db.serverStatus() return mirroredReads if you specify the field’s inclusion in the operation. For example,
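```javascript
// serverStatus omits mirroredReads by default; request it explicitly.
db.serverStatus( { mirroredReads: 1 } )
```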
Starting in MongoDB 4.0, multi-document transactions are available for replica sets.
Multi-document transactions that contain read operations must use read preference primary. All operations in a given transaction must route to the same member.
Until a transaction commits, the data changes made in the transaction are not visible outside the transaction.
However, when a transaction writes to multiple shards, not all outside read operations need to wait for the result of the committed transaction to be visible across the shards. For example, if a transaction is committed and write 1 is visible on shard A but write 2 is not yet visible on shard B, an outside read at read concern "local" can read the results of write 1 without seeing write 2.
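A minimal sketch of a multi-document transaction in the mongo shell (the test.orders namespace is illustrative):

```javascript
// A session is required for multi-document transactions.
const session = db.getMongo().startSession()
const orders = session.getDatabase("test").orders

session.startTransaction( { readConcern: { level: "snapshot" },
                            writeConcern: { w: "majority" } } )
try {
  // Neither change is visible outside the transaction until commit.
  orders.insertOne( { _id: 1, status: "new" } )
  orders.updateOne( { _id: 1 }, { $set: { status: "paid" } } )
  session.commitTransaction()
} catch (error) {
  session.abortTransaction()
  throw error
} finally {
  session.endSession()
}
```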
Starting in MongoDB 3.6, change streams are available for replica sets and sharded clusters. Change streams allow applications to access real-time data changes without the complexity and risk of tailing the oplog. Applications can use change streams to subscribe to all data changes on a collection or collections.
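As a brief sketch (the collection name is a placeholder), an application can subscribe to all changes on a collection:

```javascript
// Open a change stream on a collection and print each change
// event as it arrives.
const changeStream = db.products.watch()
while (changeStream.hasNext()) {
  printjson(changeStream.next())
}
```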
Replica sets provide a number of options to support application needs. For example, you may deploy a replica set with members in multiple data centers, or control the outcome of elections by adjusting the members[n].priority of some members. Replica sets also support dedicated members for reporting, disaster recovery, or backup functions.
See Priority 0 Replica Set Members, Hidden Replica Set Members and Delayed Replica Set Members for more information.
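As an illustration, a sketch of preventing a member from ever being elected primary by setting its priority to 0 (the member index is an example):

```javascript
// Fetch the current configuration, make the third member
// (array index 2) ineligible to become primary, and reapply.
cfg = rs.conf()
cfg.members[2].priority = 0
rs.reconfig(cfg)
```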
[1] (1, 2) In some circumstances, two nodes in a replica set may transiently believe that they are the primary, but at most one of them will be able to complete writes with { w: "majority" } write concern. The node that can complete { w: "majority" } writes is the current primary, and the other node is a former primary that has not yet recognized its demotion, typically due to a network partition. When this occurs, clients that connect to the former primary may observe stale data despite having requested read preference primary, and new writes to the former primary will eventually roll back.