The CSS misscount parameter represents the maximum time, in seconds, that a network heartbeat can be missed before the cluster enters a reconfiguration to evict the node. The default values of the misscount parameter, in seconds, when using Oracle Clusterware* are listed below by operating system and release:
OS        10g (R1 & R2)   11g
Linux     60              30
Unix      30              30
VMS       30              30
Windows   30              30
*The CSS misscount default value when using vendor (non-Oracle) clusterware is 600 seconds. This is to allow the vendor clusterware ample time to resolve any possible split-brain scenarios.
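The defaults above can be captured in a small lookup. The following is only a minimal Python sketch of the table and its footnote; the names DEFAULT_MISSCOUNT, VENDOR_CLUSTERWARE_MISSCOUNT and default_misscount are hypothetical and are not part of any Oracle tool.

```python
# Default CSS misscount values in seconds, keyed by (OS, release),
# copied from the table above. All names here are illustrative only.
DEFAULT_MISSCOUNT = {
    ("linux", "10g"): 60,
    ("linux", "11g"): 30,
    ("unix", "10g"): 30,
    ("unix", "11g"): 30,
    ("vms", "10g"): 30,
    ("vms", "11g"): 30,
    ("windows", "10g"): 30,
    ("windows", "11g"): 30,
}

# Default when vendor (non-Oracle) clusterware is in use.
VENDOR_CLUSTERWARE_MISSCOUNT = 600


def default_misscount(os_name, release, vendor_clusterware=False):
    """Return the documented default misscount in seconds."""
    if vendor_clusterware:
        return VENDOR_CLUSTERWARE_MISSCOUNT
    # Raises KeyError for combinations the table does not list.
    return DEFAULT_MISSCOUNT[(os_name.lower(), release.lower())]
```

For example, default_misscount("Linux", "10g") looks up the Linux row for 10g, while passing vendor_clusterware=True returns the 600-second default regardless of OS.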
CSS HEARTBEAT MECHANISMS AND THEIR INTERRELATIONSHIP
The Cluster Synchronization Services (CSS) component of Oracle Clusterware maintains two heartbeat mechanisms:
1.) the disk heartbeat to the voting device, and
2.) the network heartbeat across the interconnect.
Together they establish and confirm valid node membership in the cluster.
Both of these heartbeat mechanisms have an associated timeout value. The disk heartbeat has an internal I/O timeout interval (DTO, Disk TimeOut), in seconds, within which an I/O to the voting disk must complete. The misscount parameter (MC), as stated above, is the maximum time, in seconds, that a network heartbeat can be missed. The disk heartbeat I/O timeout interval is directly related to the misscount parameter setting. There has been some variation in this relationship between versions, as described below:
9.x.x.x
Note: misscount was a different entity in this release
10.1.0.2
No one should be on this version
10.1.0.3
DTO = MC - 15 seconds
10.1.0.4
DTO = MC - 15 seconds
10.1.0.4 + unpublished Bug 3306964
DTO = MC - 3 seconds
10.1.0.4 with CRS II Merge patch
DTO = disktimeout (defaults to 200 seconds) normally, OR misscount seconds only during initial cluster formation or slightly before reconfiguration
10.1.0.5
IOT = MC - 3 seconds
10.2.0.1 + fix for unpublished Bug 4896338
IOT = disktimeout (defaults to 200 seconds) normally, OR misscount seconds only during initial cluster formation or slightly before reconfiguration
10.2.0.2
Same as above (10.2.0.1 with the patch for Bug 4896338)
10.1 - 11.2
During node join and leave, the cluster must reconfigure. In that case CSS uses the Short Disk TimeOut (SDTO), which in all of these versions is SDTO = MC - reboottime (reboottime is usually 3 seconds).
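The arithmetic behind these timeouts can be sketched as follows. This is a simplified Python illustration, not Oracle's implementation: it assumes a release that uses the separate disktimeout (200 seconds by default) in normal operation and SDTO = MC - reboottime during reconfiguration, with reboottime assumed to be 3 seconds. The function names are hypothetical.

```python
def short_disk_timeout(misscount, reboottime=3):
    """SDTO = MC - reboottime, in seconds (reboottime assumed to be 3)."""
    if misscount <= reboottime:
        raise ValueError("misscount must be greater than reboottime")
    return misscount - reboottime


def effective_disk_timeout(misscount, disktimeout=200,
                           reconfiguring=False, reboottime=3):
    """Return the voting disk I/O timeout, in seconds, under this model.

    Normal operation uses the long disktimeout; during node join or
    leave (reconfiguration) the short disk timeout applies instead.
    """
    if reconfiguring:
        return short_disk_timeout(misscount, reboottime)
    return disktimeout


# With the 11g default misscount of 30 seconds:
print(short_disk_timeout(30))                          # 27
print(effective_disk_timeout(30))                      # 200
print(effective_disk_timeout(30, reconfiguring=True))  # 27
```

Under these assumptions, raising misscount from 30 to 60 lengthens the short disk timeout from 27 to 57 seconds, which is why a misscount change affects voting disk I/O tolerance as well as network heartbeat tolerance.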
Misscount drives cluster membership reconfigurations and directly affects the availability of the cluster. In most cases, the default settings for MC should be acceptable. Modifying the default value of misscount influences not only the timeout interval for I/O to the voting disk, but also the tolerance for missed network heartbeats across the interconnect.