When a joining member connects to an online existing member for state transfer during distributed recovery, the joining member acts as a client on the connection and the existing member acts as a server. When state transfer from the donor's binary log is in progress over this connection (using the asynchronous replication channel group_replication_recovery), the joining member acts as the replica and the existing member acts as the source. When a remote cloning operation is in progress over this connection, the joining member acts as a recipient and the existing member acts as a donor. Configuration settings that apply to those roles outside the Group Replication context can apply for Group Replication also, unless they are overridden by a Group Replication-specific configuration setting or behavior.
The connection that an existing member offers to a joining member for distributed recovery is not the same connection that is used by Group Replication for communication between online members of the group.
The connection used by the group communication engine for Group Replication (XCom, a Paxos variant) for TCP communication between remote XCom instances is specified by the group_replication_local_address system variable. This connection is used for TCP/IP messages between online members. Communication with the local instance is over an input channel using shared memory.
For distributed recovery, up to MySQL 8.0.20, group members offer their standard SQL client connection to joining members, as specified by MySQL Server's hostname and port system variables. If an alternative port number is specified by the report_port system variable, that one is used instead.
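To see which address a member would offer under this default behavior, you can read the relevant system variables on that member. The query below is a minimal sketch rather than an official procedure. It uses only the variables named above, and the column aliases are chosen here purely for illustration.

```sql
-- Values that determine the standard SQL client connection
-- a member offers for distributed recovery.
-- report_port takes the value of port unless it has been set explicitly.
SELECT @@hostname    AS server_hostname,
       @@port        AS server_port,
       @@report_port AS reported_port;
```

If reported_port differs from server_port, the reported value is the one that joining members are given.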
From MySQL 8.0.21, group members may advertise an alternative list of distributed recovery endpoints as dedicated client connections for joining members, allowing you to control distributed recovery traffic separately from connections by regular client users of the member. You specify this list using the group_replication_advertise_recovery_endpoints system variable, and each member transmits its list of distributed recovery endpoints to the group when it joins the group. The default is that the member continues to offer the standard SQL client connection as in earlier releases.
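As an illustration, the list can be given as a comma-separated string of host:port pairs. The addresses below are placeholders from documentation address ranges, not recommendations, and the statements assume MySQL 8.0.21 or later with the Group Replication plugin installed. Check the variable's reference entry for the validation rules that apply to each endpoint before using values of your own.

```sql
-- Hypothetical endpoints: replace them with addresses and ports on which
-- this server actually accepts SQL client connections.
-- IPv6 addresses are written in square brackets.
SET GLOBAL group_replication_advertise_recovery_endpoints =
    '198.51.100.21:3306,[2001:db8::21]:3306';

-- Confirm the value that the member now holds.
SELECT @@group_replication_advertise_recovery_endpoints;
```

Joining members try the endpoints in the order in which they appear in the list, so place the preferred endpoint first.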
Distributed recovery can fail if a joining member cannot correctly identify the other members using the host name as defined by MySQL Server's hostname system variable. It is recommended that operating systems running MySQL have a properly configured unique host name, either using DNS or local settings. The host name that the server is using for SQL client connections can be verified in the Member_host column of the Performance Schema table replication_group_members. If multiple group members externalize a default host name set by the operating system, there is a chance of the joining member not resolving it to the correct member address and not being able to connect for distributed recovery. In this situation you can use MySQL Server's report_host system variable to configure a unique host name to be externalized by each of the servers.
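The check described above can be run from any online member. The following query is a simple sketch; the selection of columns is one reasonable choice, not a required form.

```sql
-- List the host name and port that each member externalizes to the group.
SELECT MEMBER_ID, MEMBER_HOST, MEMBER_PORT, MEMBER_STATE
  FROM performance_schema.replication_group_members;
```

If two members on different machines show the same MEMBER_HOST value, that suggests both are externalizing a default operating system host name. Because report_host is not a dynamic variable, it is set in the server's option file and takes effect after a restart.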
The steps for a joining member to establish a connection for distributed recovery are as follows:
1. When the member joins the group, it connects with one of the seed members included in the list in its group_replication_group_seeds system variable, initially using the group_replication_local_address connection as specified in that list. The seed members might be a subset of the group.
2. Over this connection, the seed member uses Group Replication's membership service to provide the joining member with a list of all the members that are online in the group, in the form of a view. The membership information includes the details of the distributed recovery endpoints or standard SQL client connection offered by each member for distributed recovery.
3. The joining member selects a suitable group member from this list to be its donor for distributed recovery, following the behaviors described in Section 18.5.3.4, “Fault Tolerance for Distributed Recovery”.
4. The joining member then attempts to connect to the donor using the donor's advertised distributed recovery endpoints, trying each in turn in the order they are specified in the list. If the donor provides no endpoints, the joining member attempts to connect using the donor's standard SQL client connection. The SSL requirements for the connection are as specified by the group_replication_recovery_ssl_* options described in Section 18.5.3.1.4, “SSL and Authentication for Distributed Recovery”.
5. If the joining member is not able to connect to the selected donor, it retries with other suitable donors, following the behaviors described in Section 18.5.3.4, “Fault Tolerance for Distributed Recovery”. Note that if the joining member exhausts the list of advertised endpoints without making a connection, it does not fall back to the donor's standard SQL client connection, but switches to another donor.
6. When the joining member establishes a distributed recovery connection with a donor, it uses that connection for state transfer as described in Section 18.5.3, “Distributed Recovery”. The host and port for the connection that is used are shown in the joining member's log. Note that if a remote cloning operation is used, when the joining member has restarted at the end of the operation, it establishes a connection with a new donor for state transfer from the binary log. This might be a connection to a different member from the original donor used for the remote cloning operation, or it might be a different connection to the original donor. In any case, the distributed recovery process continues in the same way as it would have with the original donor.
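In addition to the log, the Performance Schema replication tables can give a view of the group_replication_recovery channel on the joining member while state transfer from the binary log is in progress. The query below is a sketch under that assumption. The tables and columns exist in MySQL 8.0, but whether HOST and PORT are populated at a given moment depends on the state of recovery, so the log remains the documented place to find the host and port. The query does not apply to a remote cloning operation, which does not use this channel.

```sql
-- Run on the joining member during state transfer from the binary log.
-- HOST and PORT are expected to be empty once recovery has finished
-- (an assumption to verify on your own deployment).
SELECT c.CHANNEL_NAME, c.HOST, c.PORT, s.SERVICE_STATE
  FROM performance_schema.replication_connection_configuration AS c
  JOIN performance_schema.replication_connection_status AS s
       USING (CHANNEL_NAME)
 WHERE c.CHANNEL_NAME = 'group_replication_recovery';
```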