The Cluster Settings screen is divided into nine tabs: CLUSTER, CLIENT, SERVER, ScaleArc, CACHE, AUTO FAILOVER, SSL, Advanced Settings, and DEBUG.
This article explains the settings under the AUTO FAILOVER tab. These settings automate failover to a slave, or to another master within the same ScaleArc cluster, without requiring you to manage conflicts.
Whenever the last Read/Write server in a cluster goes down, Auto-Failover promotes the standby server to become the master. A slave can be added as Standby Traffic or Standby No-Traffic. A Standby Traffic server takes part in load balancing under normal operation and is promoted to a Write server when the primary server goes down. A Standby No-Traffic server receives no traffic during normal operation until it is promoted to a Write server. Which option you choose depends on the number of database servers in your environment.
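The difference between the two standby roles can be sketched in a few lines of Python. This is only an illustration of the behavior described above: the `Role`, `Server`, `serves_traffic`, `needs_failover`, and `promotion_candidates` names are hypothetical and are not part of ScaleArc, and the article does not say in which order standby servers are considered for promotion.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Role(Enum):
    READ_WRITE = "Read/Write"
    STANDBY_TRAFFIC = "Standby Traffic"
    STANDBY_NO_TRAFFIC = "Standby No-Traffic"


@dataclass
class Server:
    name: str
    role: Role
    up: bool = True


def serves_traffic(server: Server) -> bool:
    """Under normal operation, only Standby No-Traffic servers receive no traffic."""
    return server.up and server.role is not Role.STANDBY_NO_TRAFFIC


def needs_failover(cluster: List[Server]) -> bool:
    """Failover applies once no Read/Write server in the cluster is up."""
    return not any(s.up and s.role is Role.READ_WRITE for s in cluster)


def promotion_candidates(cluster: List[Server]) -> List[Server]:
    """Standby servers of either kind that are up (ordering is an assumption)."""
    standby = (Role.STANDBY_TRAFFIC, Role.STANDBY_NO_TRAFFIC)
    return [s for s in cluster if s.up and s.role in standby]
```

In a three-server cluster with one Read/Write, one Standby Traffic, and one Standby No-Traffic server, the first two serve traffic; once the Read/Write server goes down, both standby servers become candidates for promotion.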
Depending on your specific circumstances, you may choose to use either ScaleArc-based Auto-Failover or leverage one of the available External APIs to manage failover.
Review these options carefully to select the best configuration for your environment:
- On the ScaleArc dashboard, locate the Status column and click on the Cluster Settings button.
- Click the AUTO FAILOVER tab in the Cluster Settings dashboard to configure this feature.
Then configure the fields as follows:
| Field | Description | Default/User input |
| --- | --- | --- |
| Auto-Failover | When a principal server fails, Auto-Failover simplifies and automates the failover process to a standby server or another principal in the same cluster. | Turn ON/OFF. |
| Failure Timeout (seconds) | The amount of time (in seconds) a principal must be detected as down before failover is triggered. If the principal remains down past the specified time, the system triggers the failover and promotes either a slave or another principal to take over the job of the downed principal server. Make sure the timeout entries are appropriate for your environment. | Default value is 2 seconds. |
| Flip Flop Timeout (seconds) | The minimum time (in seconds) to wait between failover events. Use it to reduce flip-flops and avoid frequent failovers. Setting this timeout too high can increase downtime for the cluster, because a failover that falls inside the window takes place only after the flip-flop timeout has elapsed. To override the timeout, use the switchover functionality, which forces a failover manually regardless of the flip-flop timeout. | Default value is 600 seconds. |
| Failover Type | ScaleArc supports ScaleArc Based auto-failover. ScaleArc also supports custom scripts (External API) that the system can trigger as part of the Auto Failover. | Select ScaleArc Based OR External API. |
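The way Failure Timeout and Flip Flop Timeout interact can be sketched as a single decision function. This is a minimal illustration of the rules described above, not ScaleArc's implementation: the function and parameter names are hypothetical, and treating "down for exactly the timeout" as sufficient is an assumption.

```python
from typing import Optional

FAILURE_TIMEOUT_S = 2.0      # default Failure Timeout from the settings above
FLIP_FLOP_TIMEOUT_S = 600.0  # default Flip Flop Timeout from the settings above


def failover_due(
    down_for_s: float,
    since_last_failover_s: Optional[float] = None,
    failure_timeout_s: float = FAILURE_TIMEOUT_S,
    flip_flop_timeout_s: float = FLIP_FLOP_TIMEOUT_S,
) -> bool:
    """Return True when an automatic failover would be triggered.

    down_for_s: how long the principal has been detected as down.
    since_last_failover_s: time since the previous failover, or None if
    there has been none (assumption: no flip-flop wait in that case).
    """
    if down_for_s < failure_timeout_s:
        return False  # not down long enough yet
    if since_last_failover_s is not None and since_last_failover_s < flip_flop_timeout_s:
        return False  # still inside the flip-flop window
    return True
```

With the defaults, a principal that has been down for 30 seconds only two minutes after a previous failover is not failed over automatically until the 600-second window has passed, which is the extra downtime a high Flip Flop Timeout can cause.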
ScaleArc Based failover
ScaleArc Based failover lets you change roles within the ScaleArc system when your servers are already set up as Active and Secondary Read-Write servers. With this functionality, you can rely on ScaleArc to perform the failover and upgrade the Secondary Read-Write server to Primary when the master is down.
- Select the ScaleArc Based button for Failover Type.
Configure as follows:
| Field | Description | Default/User input |
| --- | --- | --- |
| ScaleArc Based | ScaleArc performs the failover and upgrades the Secondary Read-Write server to Primary when the master is down. | Select the ScaleArc Based option. |
| Switch Delay Time | The time (in seconds) to wait before the standby database is promoted to active read-write after the current active read-write server in the cluster is demoted. A high switch delay time results in higher downtime for the cluster. | Default is 5 seconds. |
| Wait For Sync | Makes sure that, in a failover event, the newly promoted active read-write server is in sync with the original read-write server. The failover waits until this check completes, which can increase failover time. | Default setting is ON. |
| Retry Attempts | The number of attempts to check whether replication is in sync between the original read-write server and the standby server that is about to be promoted. | Default is 3 attempts. |
| Retry Interval | The interval (in seconds) between consecutive retry attempts. | Default is 1 second. |
| Force Failover | Performs the failover even after an unsuccessful wait for sync, SQL I/O errors, or replication errors. This option can result in data loss and increase recovery time. | Default setting is OFF. |
| Switchover | Triggers a manual failover of the Active Read-Write server to the next available standby server. This option is used for zero-downtime maintenance of database servers. | |
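How Wait For Sync, Retry Attempts, Retry Interval, and Force Failover combine can be sketched as follows. This is an illustration under stated assumptions rather than ScaleArc's actual logic: `decide_promotion` and the `in_sync` callback are hypothetical names, the sketch assumes that turning Wait For Sync off skips the check entirely, and it covers only the unsuccessful-sync case of Force Failover.

```python
import time
from typing import Callable


def decide_promotion(
    in_sync: Callable[[], bool],
    wait_for_sync: bool = True,
    retry_attempts: int = 3,
    retry_interval_s: float = 1.0,
    force_failover: bool = False,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True if the standby server should be promoted.

    in_sync is a caller-supplied check that reports whether the standby
    has caught up with the original read-write server.
    """
    if not wait_for_sync:
        return True  # assumption: no sync check when Wait For Sync is OFF
    for attempt in range(retry_attempts):
        if in_sync():
            return True
        if attempt < retry_attempts - 1:
            sleep(retry_interval_s)  # pause only between attempts
    # Sync never confirmed: promote only if Force Failover is ON,
    # accepting the risk of data loss noted in the settings.
    return force_failover
```

With the defaults (3 attempts, 1 second apart, Force Failover OFF), a standby that never reports being in sync is not promoted, and the check adds roughly two seconds to the failover.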
The External API lets you call out to an external script, which in turn promotes one of your existing slaves to a principal and informs ScaleArc which server to promote to the Active Read/Write role.
If you choose this option, follow these steps to configure it.
- Select External API as the Failover Type.
Configure as follows:
| Field | Description | Default/User input |
| --- | --- | --- |
| External API call | Makes a call to an external script to promote one of the existing slaves to a master. | Select the external script from the drop-down list. |
| External Script | Uploads/Downloads an external API script. This script is invoked when the system triggers a failover, in place of ScaleArc's pre-configured failover logic. | See below. |
| Retry Attempts | Indicates the number of retries the system executes. | Default is 3 attempts. |
| API timeout (seconds) | The script must finish executing within the specified API timeout. | Default is 60 seconds. |
| SwitchOver to Other DC | Sends a flag to the external failover API indicating a switchover to the other data center (DC). | |
| Trigger When Standby or Read Server is Down | When enabled, the failover module is triggered when any server is detected as down. | Default setting is OFF. |
- Click APPLY ALL to save the changes or click LOAD DEFAULTS to reset to default values.
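The effect of Retry Attempts and API timeout on an external script call can be sketched with Python's standard `subprocess` module. This is not how ScaleArc invokes the script, which the article does not describe, and it is not the PHP template itself. The `run_failover_script` name is hypothetical, and the sketch assumes that the timeout applies to each attempt, that Retry Attempts counts total attempts, and that exit code 0 means success.

```python
import subprocess
import sys
from typing import Sequence


def run_failover_script(
    command: Sequence[str],
    retry_attempts: int = 3,
    api_timeout_s: float = 60.0,
) -> bool:
    """Run an external failover command, retrying on failure or timeout."""
    for _ in range(retry_attempts):
        try:
            # subprocess.run kills the child process if the timeout expires.
            result = subprocess.run(
                list(command), timeout=api_timeout_s, capture_output=True
            )
        except subprocess.TimeoutExpired:
            continue  # this attempt exceeded the API timeout; try again
        if result.returncode == 0:
            return True  # assumption: exit code 0 signals a successful promotion
    return False


# Stand-in for a real script: a child Python process that exits successfully.
succeeded = run_failover_script([sys.executable, "-c", "raise SystemExit(0)"])
```

A script that regularly needs close to the full timeout multiplies the worst-case failover time by the number of attempts, so the two values are best chosen together.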
Initiate an External API script
You need to locate and edit the External API script to tailor it to your environment before you can upload it.
- Click on the External API field to locate the script from the drop-down menu.
- Click on the Download button to save the script named PGSQL_master_slave_template.php.zip to your local drive.
- Extract the compressed archive to obtain a PGSQL_master_slave_template.php file. Edit it carefully with your preferred text editor to show the following: your ScaleArc appliance's IP address, the username and password for ScaleArc, and the server API key.
- Rename and save the file once you have completed editing.
- Use the Upload button to upload the edited script, then click Yes to confirm. A confirmation message appears after a successful upload.
- Complete the remaining fields.
- Click Switchover.
- Enter the Connection Bleedoff Time manually. This represents the time allowed for client connections to complete their outstanding queries to allow for a graceful server connection switchover. Client connections with outstanding workload/transactions that continue beyond the connection bleed-off time will be reset and need to reconnect.
- ScaleArc posts a message when it initiates the switchover.
- If the switchover fails, you'll receive an error notification in the Events tab on the ScaleArc dashboard.
- Correct the error and re-initiate the switchover.
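The effect of the Connection Bleedoff Time in the switchover steps above can be sketched as a simple filter. The `connections_to_reset` function and its input format are hypothetical and for illustration only; ScaleArc does not expose such a function.

```python
from typing import Dict, List


def connections_to_reset(
    outstanding_work_s: Dict[str, float], bleedoff_time_s: float
) -> List[str]:
    """Return the client connections that would be reset at switchover.

    outstanding_work_s maps a connection identifier to the time (in seconds)
    its outstanding queries or transactions still need to complete.
    """
    return sorted(
        conn
        for conn, remaining_s in outstanding_work_s.items()
        if remaining_s > bleedoff_time_s  # work continues past the bleed-off time
    )
```

For example, with a bleed-off time of 5 seconds, a connection whose transaction needs 12 more seconds is reset and must reconnect, while connections that finish within 5 seconds switch over gracefully.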