Customers running ScaleArc High Availability (HA) deployments in MSSQL environments reported failed failovers and failing cluster traffic on both nodes after upgrading to the v2020.6 release.
This article describes the root cause and solution for ScaleArc customers running the affected version.
- ScaleArc 2020.6
- HA pair setup
- MSSQL cluster
Customers who upgrade to ScaleArc v2020.6 for MSSQL in an HA deployment will see failover fail, both when failing over to the secondary instance and when promoting a secondary instance to primary.
Navigate to SETTINGS > HA Settings on the ScaleArc dashboard on the primary node and click on Switch to Secondary. Alternatively, navigate to the same screen from the secondary node and click on Force to be Primary.
In both cases, the HA failover fails and an error is displayed.
The issue was confirmed to be a defect affecting HA deployments (SCALEARC-16233), which was resolved in the ScaleArc v2020.8 release.
The defect caused new connections to be created between the primary and secondary nodes at a rate of about 1-2 connections per second, and these connections were never torn down. This continued until all available network connections on the system were depleted, causing the outage on the primary node.
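The leak described above can be observed by counting established connections to the peer node at two points in time and comparing the totals. The following is a minimal sketch, not a ScaleArc tool: it assumes shell access to a Linux-based node where `ss -tn` output can be captured, and the function names, the sample IP addresses, and the connection limit used in the estimate are all hypothetical. The article does not state the actual connection limit of the appliance.

```python
def count_peer_connections(ss_output: str, peer_ip: str) -> int:
    """Count ESTAB rows in captured `ss -tn` output whose peer address is peer_ip."""
    count = 0
    for line in ss_output.splitlines():
        fields = line.split()
        # Data rows look like: State Recv-Q Send-Q Local:Port Peer:Port
        # The header row and non-established sockets are skipped.
        if len(fields) < 5 or fields[0] != "ESTAB":
            continue
        peer_addr = fields[4].rsplit(":", 1)[0]  # drop the port
        if peer_addr == peer_ip:
            count += 1
    return count


def growth_per_second(count_before: int, count_after: int, interval_seconds: float) -> float:
    """Average number of new connections per second between two samples."""
    return (count_after - count_before) / interval_seconds


def seconds_until_exhaustion(current: int, limit: int, rate: float):
    """Estimate time until `limit` connections are reached; None if not growing.

    `limit` is an assumed value supplied by the caller, not a documented
    ScaleArc setting.
    """
    if rate <= 0:
        return None
    return max(limit - current, 0) / rate
```

A steady rate of roughly 1 to 2 new connections per second between two samples taken a minute apart, with no corresponding drop, would be consistent with the behavior described for this defect.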
A quick workaround to stop the connection count from growing out of control is to stop the ScaleArc services on the secondary node and leave the primary node running as Standalone, which effectively gives up the HA configuration.
To apply this workaround, navigate to SETTINGS > System Settings on the secondary node and click the STOP ScaleArc button under the ScaleArc Commands section.
The above workaround restores service, but without HA, because ScaleArc is now running as Standalone. It should therefore be treated only as a temporary fix pending the upgrade.
To restore HA, both instances must be upgraded to ScaleArc v2020.8 or later, as this release includes the fix for this defect.
Obtain v2020.8 or later from the Release Portal and carry out the upgrade on both primary and secondary nodes before restoring the HA configuration.
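When checking that both nodes meet the v2020.8 minimum before restoring HA, version strings should be compared component by component rather than as text or decimals (as a decimal, a hypothetical "2020.10" would sort below "2020.8"). The helper below is an illustrative sketch only: the function names are hypothetical, and it assumes version strings of the simple `vYYYY.N` form used in this article, which may not match the exact format reported by the product.

```python
def parse_version(version: str) -> tuple:
    """Turn a string such as 'v2020.8' into a tuple of integers, e.g. (2020, 8)."""
    return tuple(int(part) for part in version.lstrip("vV").split("."))


def has_ha_fix(version: str) -> bool:
    """True if the version is v2020.8 or later, the release that resolved SCALEARC-16233."""
    return parse_version(version) >= (2020, 8)


def ready_to_restore_ha(primary_version: str, secondary_version: str) -> bool:
    """Both nodes must carry the fix before the HA configuration is restored."""
    return has_ha_fix(primary_version) and has_ha_fix(secondary_version)
```

For example, `ready_to_restore_ha("v2020.8", "v2020.6")` is false because the secondary node is still on the affected release.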