Fixing Clusters Marked as Health Down in Highly Loaded Environments


This article explains how to avoid unnecessary failovers and dataservers being marked as down when a cluster is under very high load.


Affected Versions

All ScaleArc versions, on MySQL clusters with very highly loaded dataservers



Applying this change requires access to the ScaleArc UI.


Root Cause

In very highly loaded environments, dataservers that are part of a cluster may stop responding to "SHOW SLAVE STATUS" queries in a timely fashion. ScaleArc runs these queries continuously to validate replication status and health. When a dataserver fails to respond in time, ScaleArc considers it down/unhealthy and acts on that: it stops sending traffic to the server if it is a Standby-Read server, or may even trigger a failover if it is the master Read+Write server.
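The effect can be illustrated with a small conceptual model in Python. This is only a sketch of the general idea of a health probe with a deadline, not ScaleArc's actual implementation: the names probe_is_healthy, run_query, timeout_s and slow_server are hypothetical, and the time values are arbitrary. It shows how a server that is alive but slow to answer is indistinguishable from a dead one once the deadline passes.

```python
import concurrent.futures
import time
from typing import Callable


def probe_is_healthy(run_query: Callable[[], object], timeout_s: float) -> bool:
    """Return True only if run_query finishes within timeout_s seconds.

    run_query stands in for whatever issues "SHOW SLAVE STATUS" (hypothetical).
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(run_query)
    try:
        future.result(timeout=timeout_s)
        return True
    except concurrent.futures.TimeoutError:
        # The server may still be alive, but the probe cannot tell.
        return False
    except Exception:
        # Any error raised by the query also counts as unhealthy.
        return False
    finally:
        # Do not block on a query that is still running.
        pool.shutdown(wait=False)


def slow_server() -> None:
    """Simulates a loaded dataserver that answers after 0.3 seconds."""
    time.sleep(0.3)
```

With a short deadline, probe_is_healthy(slow_server, 0.05) reports the server as unhealthy, while a more generous deadline such as probe_is_healthy(slow_server, 1.0) reports it as healthy, even though the server behaves identically in both cases.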



  1. Log in to the ScaleArc UI.
  2. In the Clusters section, click the Cluster Settings button for the cluster you want to modify.
  3. On the Server tab, change the Health Check interval from 3 to 10 or 20, depending on the observed response time to "SHOW SLAVE STATUS" queries. This slows down failover detection in favor of stable operation. However, ScaleArc still detects a server that is really down or a broken replication setup and acts accordingly.
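To decide between 10 and 20, it can help to measure how long "SHOW SLAVE STATUS" actually takes on the loaded dataserver. The Python sketch below is one possible way to do that. Everything in it is an assumption made for illustration: run_query is a hypothetical callable that executes the query with whatever MySQL client you use, the interval is assumed to be in seconds, and the rule "pick the smallest of 3, 10 or 20 that exceeds the worst observed response time" is a simple heuristic, not a documented ScaleArc recommendation.

```python
import time
from typing import Callable, List


def measure_latencies(run_query: Callable[[], object], samples: int = 20,
                      pause_s: float = 0.0) -> List[float]:
    """Call run_query `samples` times and return each duration in seconds."""
    if samples < 1:
        raise ValueError("samples must be at least 1")
    latencies = []
    for _ in range(samples):
        start = time.perf_counter()
        run_query()  # hypothetical: executes "SHOW SLAVE STATUS"
        latencies.append(time.perf_counter() - start)
        if pause_s > 0:
            time.sleep(pause_s)
    return latencies


def suggest_interval(latencies: List[float], current: int = 3) -> int:
    """Suggest a Health Check interval from the worst observed latency.

    Assumed heuristic: keep the current value if every response was faster
    than it, otherwise move to 10, and to 20 if responses took 10 s or more.
    """
    worst = max(latencies)
    if worst < current:
        return current
    if worst < 10:
        return 10
    return 20
```

For example, if the slowest of the sampled queries took about 4 seconds, suggest_interval returns 10. Sampling during peak load gives the most useful numbers, since that is when the health checks are most likely to be delayed.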





If the change is successful, users should no longer get load-related alerts or failovers. At a minimum, the number of such incidents should decrease.


Content Author: Miguel Molina


