8.9.1 Fencing Clusters in an InnoDB ClusterSet

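After an emergency failover, if there is a risk that the transaction sets differ between parts of the ClusterSet, you must fence the cluster from write traffic or from all traffic.

If a network partition occurs, a split-brain situation is possible, in which instances lose synchronization and cannot communicate correctly to determine the synchronization state. A split-brain can occur when a DBA forcibly elects a replica cluster to become the primary cluster, which results in more than one primary cluster.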

In this situation, a DBA can choose to fence the original primary cluster from:

  • Writes.

  • All traffic.

Three fencing operations are available:

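  • <Cluster>.fenceWrites(): Stops write traffic to the primary cluster of a ClusterSet. Replica clusters do not accept writes, so this operation has no effect on them.

    It can be used on INVALIDATED replica clusters. Also, if it is run against a replica cluster that has super_read_only disabled, it enables super_read_only.

  • <Cluster>.unfenceWrites(): Resumes write traffic. This operation can be run on a cluster that was previously fenced from write traffic using the <Cluster>.fenceWrites() operation.

    <Cluster>.unfenceWrites() cannot be used on a replica cluster.

  • <Cluster>.fenceAllTraffic(): Fences a cluster, and all read replicas in that cluster, from all traffic. If you have fenced a cluster from all traffic using <Cluster>.fenceAllTraffic(), you must reboot the cluster using the dba.rebootClusterFromCompleteOutage() MySQL Shell command.

    For more information on dba.rebootClusterFromCompleteOutage(), see Section 7.8.3, "Rebooting a Cluster from a Major Outage".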

Note: Issuing <Cluster>.fenceWrites() on a replica cluster returns an error:

ERROR: Unable to fence Cluster from write traffic: 
operation not permitted on REPLICA Clusters
Cluster.fenceWrites: The Cluster '<Cluster>' is a REPLICA Cluster 
of the ClusterSet '<ClusterSet>' (MYSQLSH 51616)

Although fencing is primarily used on clusters that belong to a ClusterSet, you can also fence a standalone cluster using <Cluster>.fenceAllTraffic().
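
Because fenceWrites() is rejected on REPLICA clusters, a script may want to check a cluster's role before calling it. The following is a minimal sketch rather than part of the MySQL Shell API: the helper name isPrimaryCluster is hypothetical, and it assumes a status object shaped like the ClusterSet status output, in which each entry under "clusters" carries a "clusterRole" field. It does not handle the INVALIDATED replica case mentioned above. The sample object is made-up data so that the sketch runs under Node.js without a live deployment.

```javascript
// Hypothetical helper (not a MySQL Shell function): returns true when the
// named cluster is reported as PRIMARY in a ClusterSet status object.
function isPrimaryCluster(status, clusterName) {
  const entry = status.clusters && status.clusters[clusterName];
  if (!entry) {
    throw new Error("Cluster '" + clusterName + "' not found in status");
  }
  return entry.clusterRole === "PRIMARY";
}

// Illustrative data only; in MySQL Shell this object would come from the
// ClusterSet status call.
const sampleStatus = {
  clusters: {
    primary: { clusterRole: "PRIMARY", globalStatus: "OK" },
    replica: { clusterRole: "REPLICA", globalStatus: "OK" }
  }
};

console.log(isPrimaryCluster(sampleStatus, "primary"));
console.log(isPrimaryCluster(sampleStatus, "replica"));
```

A caller would issue fenceWrites() only when the helper returns true, and otherwise report that the target is a replica cluster.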

  1. To fence a primary cluster from write traffic, use the <Cluster>.fenceWrites() command as follows:

            <Cluster>.fenceWrites()

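    After you run the command:

    • Automatic super_read_only management is disabled on the cluster.

    • super_read_only is enabled on all instances in the cluster.

    • All applications are blocked from performing writes on the cluster.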

            cluster.fenceWrites()
            The Cluster 'primary' will be fenced from write traffic

            * Disabling automatic super_read_only management on the Cluster...
            * Enabling super_read_only on '127.0.0.1:3311'...
            * Enabling super_read_only on '127.0.0.1:3312'...
            * Enabling super_read_only on '127.0.0.1:3313'...

            NOTE: Applications will now be blocked from performing writes on Cluster 'primary'.
            Use <Cluster>.unfenceWrites() to resume writes if you are certain a split-brain is not in effect.

            Cluster successfully fenced from write traffic

  2. To check that the primary cluster has been fenced from write traffic, use the <ClusterSet>.status() command as follows:

          <ClusterSet>.status()

    The output is similar to the following:

            clusterset.status()
            {
                "clusters": {
                    "primary": {
                        "clusterErrors": [
                            "WARNING: Cluster is fenced from Write traffic. Use cluster.unfenceWrites() to unfence the Cluster."
                        ],
                        "clusterRole": "PRIMARY",
                        "globalStatus": "OK_FENCED_WRITES",
                        "primary": null,
                        "status": "FENCED_WRITES",
                        "statusText": "Cluster is fenced from Write Traffic."
                    },
                    "replica": {
                        "clusterRole": "REPLICA",
                        "clusterSetReplicationStatus": "OK",
                        "globalStatus": "OK"
                    }
                },
                "domainName": "primary",
                "globalPrimaryInstance": null,
                "primaryCluster": "primary",
                "status": "UNAVAILABLE",
                "statusText": "Primary Cluster is fenced from write traffic."
            }

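  3. To unfence a cluster and resume write traffic to the primary cluster, use the <Cluster>.unfenceWrites() command as follows:

            <Cluster>.unfenceWrites()

    Automatic super_read_only management is re-enabled on the primary cluster, and super_read_only is disabled on the cluster's primary instance.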

            cluster.unfenceWrites()
            The Cluster 'primary' will be unfenced from write traffic
    
            * Enabling automatic super_read_only management on the Cluster...
            * Disabling super_read_only on the primary '127.0.0.1:3311'...
    
            Cluster successfully unfenced from write traffic
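  4. To fence a cluster from all traffic, use the <Cluster>.fenceAllTraffic() command as follows:

          <Cluster>.fenceAllTraffic()

    super_read_only is enabled on the cluster's primary instance, and then offline_mode is enabled on every instance in the cluster. As the output shows, Group Replication is also stopped on each instance.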

          cluster.fenceAllTraffic()
            The Cluster 'primary' will be fenced from all traffic
    
            * Enabling super_read_only on the primary '127.0.0.1:3311'...
            * Enabling offline_mode on the primary '127.0.0.1:3311'...
            * Enabling offline_mode on '127.0.0.1:3312'...
            * Stopping Group Replication on '127.0.0.1:3312'...
            * Enabling offline_mode on '127.0.0.1:3313'...
            * Stopping Group Replication on '127.0.0.1:3313'...
            * Stopping Group Replication on the primary '127.0.0.1:3311'...
    
            Cluster successfully fenced from all traffic
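  5. To unfence a cluster from all traffic, use the dba.rebootClusterFromCompleteOutage() MySQL Shell command. Once the cluster is restored, rejoin the instances to the cluster by selecting Y when asked whether you want to rejoin each instance: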

            cluster = dba.rebootClusterFromCompleteOutage()
            Restoring the cluster 'primary' from complete outage...

            The instance '127.0.0.1:3312' was part of the cluster configuration.
            Would you like to rejoin it to the cluster? [y/N]: Y

            The instance '127.0.0.1:3313' was part of the cluster configuration.
            Would you like to rejoin it to the cluster? [y/N]: Y

            * Waiting for seed instance to become ONLINE...
            127.0.0.1:3311 was restored.
            Rejoining '127.0.0.1:3312' to the cluster.
            Rejoining instance '127.0.0.1:3312' to cluster 'primary'...

            The instance '127.0.0.1:3312' was successfully rejoined to the cluster.

            Rejoining '127.0.0.1:3313' to the cluster.
            Rejoining instance '127.0.0.1:3313' to cluster 'primary'...

            The instance '127.0.0.1:3313' was successfully rejoined to the cluster.

            The cluster was successfully rebooted.

            <Cluster:primary>
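
The fence-and-verify sequence from steps 1 and 2 can be sketched as a single function. This is an illustration under stated assumptions, not an official recipe: fenceWritesAndVerify is a hypothetical name, the cluster and clusterSet arguments are assumed to be Cluster and ClusterSet objects already obtained in MySQL Shell, and the check relies on the per-cluster "status" value FENCED_WRITES that appears in the sample output of step 2. The stub objects stand in for a live deployment so that the sketch runs under Node.js.

```javascript
// Hypothetical helper: fence a primary cluster from writes, then confirm
// the result by reading the ClusterSet status.
function fenceWritesAndVerify(cluster, clusterSet, clusterName) {
  cluster.fenceWrites();
  const status = clusterSet.status();
  const entry = status.clusters[clusterName];
  return entry !== undefined && entry.status === "FENCED_WRITES";
}

// Stubs for illustration only; they imitate just the two calls used above.
const state = { fenced: false };
const stubCluster = {
  fenceWrites: () => { state.fenced = true; }
};
const stubClusterSet = {
  status: () => ({
    clusters: {
      primary: {
        clusterRole: "PRIMARY",
        status: state.fenced ? "FENCED_WRITES" : "OK"
      }
    }
  })
};

console.log(fenceWritesAndVerify(stubCluster, stubClusterSet, "primary"));
```

Returning a boolean instead of throwing lets the caller decide how to react when the status does not yet report the fence, for example by checking again or alerting an operator.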