Split Brain Syndrome in RAC

Split brain is a condition in which two or more instances each attempt to control a cluster database on their own.

The nodes in a RAC cluster are connected to each other through a private interconnect, and end users connect to the cluster through the public network. Split brain arises when the nodes are physically up and the database instance on each server is running, but the private interconnect fails between two or more nodes. The instances in the cluster can no longer ping or connect to each other, so each instance assumes that the instance it cannot reach is down, and both continue to work independently. Each node still runs fine and accepts user connections on its own.

To keep such a node from corrupting shared data, a failed node is prevented from accessing all the shared disk devices and disk groups. This methodology is called I/O fencing, disk fencing or failure fencing.

You will see the Oracle error ORA-29740 when a node is evicted in RAC.

There are many possible reasons for a node eviction, such as a heartbeat not being recorded in the control file or a node being unable to communicate with the Clusterware.

The CSS (Cluster Synchronization Service) daemon in the Clusterware maintains the heartbeat to the voting disk.
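The idea of a missed heartbeat can be sketched in a few lines of Python. This is only an illustration, not how CSSD is implemented: the function name `stale_nodes`, the timestamps, and the 30-second timeout are all assumptions made for the example.

```python
def stale_nodes(last_heartbeat, now, timeout_seconds=30.0):
    """Return the nodes whose last heartbeat is older than the timeout.

    last_heartbeat maps a node name to the time (in seconds) of its most
    recent heartbeat. The 30-second default is a hypothetical value.
    """
    return sorted(
        node
        for node, timestamp in last_heartbeat.items()
        if now - timestamp > timeout_seconds
    )


# Hypothetical heartbeat times for three nodes.
heartbeats = {"host01": 100.0, "host02": 98.5, "host03": 60.0}
print(stale_nodes(heartbeats, now=101.0))
```

A node that appears in the returned list would be a candidate for eviction in this simplified model.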

What is I/O fencing?

In clusters that use Veritas I/O fencing, it is provided by the kernel-based fencing module (vxfen) and performs identically on node failures and communication failures. A surviving node tries to eject the keys of the departed nodes from the coordinator disks using the preempt and abort command. When the node successfully ejects the departed nodes from the coordinator disks, it also ejects them from the data disks. In a split-brain scenario, both sides of the split race for control of the coordinator disks. The side that wins the majority of the coordinator disks wins the race and fences the loser. The loser then panics and restarts.
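The majority rule in that race can be illustrated with a small Python sketch. It is a toy model, not the vxfen algorithm: the function `decide_fencing` and the side labels "A" and "B" are hypothetical, and the input simply records which side won each coordinator disk.

```python
from collections import Counter


def decide_fencing(disk_winners):
    """Return the side that won a majority of coordinator disks, or None.

    disk_winners has one entry per coordinator disk, naming the side
    that won the race for that disk.
    """
    counts = Counter(disk_winners)
    majority = len(disk_winners) // 2 + 1
    side, disks_won = counts.most_common(1)[0]
    if disks_won < majority:
        return None  # no side holds a majority, so no winner is declared
    return side


# Side A wins two of the three coordinator disks and fences side B.
print(decide_fencing(["A", "B", "A"]))
```

An odd number of coordinator disks guarantees that one of two sides can always reach a majority, which is why the example uses three.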

Now, who decides which node survives and which node is fenced?

The answer is the voting disk.

In a split-brain situation, the voting disk is used to determine which node(s) survive and which node(s) are evicted.

Check the number of nodes in the RAC cluster and their status:

[root@rac1 bin ~]# olsnodes -s -n
host01  1       Active
host02  2       Active
host03  3       Inactive
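If you want to process this listing in a script, a minimal Python sketch could look like the following. It assumes the three-column layout shown above (node name, node number, status); the helper name `parse_olsnodes` is made up for the example, and the sample text is copied from the output above rather than read from a live cluster.

```python
def parse_olsnodes(output):
    """Parse 'name number status' lines into a list of dictionaries."""
    nodes = []
    for line in output.splitlines():
        fields = line.split()
        # Skip anything that does not match the expected three columns.
        if len(fields) != 3 or not fields[1].isdigit():
            continue
        name, number, status = fields
        nodes.append({"name": name, "number": int(number), "status": status})
    return nodes


sample = """host01  1       Active
host02  2       Active
host03  3       Inactive"""

inactive = [n["name"] for n in parse_olsnodes(sample) if n["status"] != "Active"]
print(inactive)
```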

Now we are in a state to understand the use of voting disk in case of heartbeat failure.

Suppose that, in a 3-node cluster with 3 voting disks, the network heartbeat fails between Node 1 and Node 3 and between Node 2 and Node 3, whereas Node 1 and Node 2 can still communicate over the interconnect. From the voting disks, CSSD notices that all the nodes are still able to write to the voting disks, which indicates a split brain. The healthy nodes, Node 1 and Node 2, therefore update the kill block for Node 3 in the voting disk. When the CSSD of Node 3 next reads the voting disk through its pread() system call, it sees the self-kill flag set and evicts itself. I/O fencing follows, and finally OHASD attempts to restart the stack after a graceful shutdown.
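The outcome of this scenario can be reproduced with a short Python sketch. It is a simplified model and not Clusterware code: it groups nodes by which interconnect links still work, keeps the largest group, and evicts the rest. The function name `surviving_subcluster` is hypothetical, and the tie-break (the group containing the lowest node number survives) is an assumption added only so the example gives a definite answer.

```python
def surviving_subcluster(nodes, links):
    """Return (survivors, evicted) given the working interconnect links.

    nodes is a list of node numbers; links is a list of (a, b) pairs
    that can still exchange network heartbeats.
    """
    adjacency = {node: set() for node in nodes}
    for a, b in links:
        adjacency[a].add(b)
        adjacency[b].add(a)

    # Collect the groups of nodes that can still reach each other.
    seen, groups = set(), []
    for node in sorted(nodes):
        if node in seen:
            continue
        group, stack = set(), [node]
        while stack:
            current = stack.pop()
            if current in group:
                continue
            group.add(current)
            stack.extend(adjacency[current] - group)
        seen |= group
        groups.append(group)

    # Largest group survives; ties go to the group with the lowest node
    # number (an assumption made for this sketch).
    survivors = max(groups, key=lambda g: (len(g), -min(g)))
    return survivors, set(nodes) - survivors


# Only the link between Node 1 and Node 2 still works.
survivors, evicted = surviving_subcluster([1, 2, 3], [(1, 2)])
print(sorted(survivors), sorted(evicted))
```

In the model, the evicted set corresponds to the nodes whose kill block the survivors would update in the voting disk.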
