HA Service Failure Resulting in Core Dumping

Overview

Proper DNS resolution is vital for both the ScaleArc HA (High Availability) service and its fencing mechanism. If a database server is added to a ScaleArc fencing cluster using a hostname or Fully Qualified Domain Name (FQDN) that ScaleArc cannot resolve to an IP address, the resolution failure breaks the fencing mechanism of the HA service, and ScaleArc begins core dumping continuously after a server reboot.

This article describes the symptoms and the resolution for the case where DNS resolution errors prevent the HA service from determining the HA status, causing ScaleArc to core dump.

 

Diagnosis

Fencing (Split-brain resolution) is an important mechanism for successful failover when the primary node in a ScaleArc HA setup becomes unavailable.

Failover in an HA configuration does not work when ScaleArc cannot resolve the hostname or FQDN of a database server configured in the fencing cluster. In this state, ScaleArc reports that the HA service is stopped when you navigate to SETTINGS > HA Settings in the UI.

HA_down.png

Further details on the root cause of the HA failure can be obtained by examining the core dump logs located in /data/logs/services/error.log.

Below is an extract of a sample error.log file with typical DNS resolution errors that can result in core dumping after a server reboot:

ERROR _resolve_hostname():1645 ClusterMonitor(1): Error resolving host db01.ignitetech.com: [Errno -2] Name or service not known
ERROR _assign_order_to_imported_mysql_accounts():1445 ClusterMonitor(1): Problem in resolving host address for user ptc with host db01.ignitetech.com
ERROR _resolve_hostname():1645 ClusterMonitor(1): Error resolving host db02.ignitetech.com: [Errno -2] Name or service not known
ERROR _assign_order_to_imported_mysql_accounts():1445 ClusterMonitor(1): Problem in resolving host address for user ptc with host db02.ignitetech.com
ERROR _resolve_hostname():1645 ClusterMonitor(1): Error resolving host db01.ignitetech.com: [Errno -2] Name or service not known
ERROR _assign_order_to_imported_mysql_accounts():1445 ClusterMonitor(1): Problem in resolving host address for user ptc with host db01.ignitetech.com
ERROR _resolve_hostname():1645 ClusterMonitor(1): Error resolving host db02.ignitetech.com: [Errno -2] Name or service not known
ERROR _assign_order_to_imported_mysql_accounts():1445 ClusterMonitor(1): Problem in resolving host address for user ptc with host db02.ignitetech.com
...
ERROR _assign_order_to_imported_mysql_accounts():1445 ClusterMonitor(1): Problem in resolving host address for user ptc with host db01.ignitetech.com
ERROR _resolve_hostname():1645 ClusterMonitor(1): Error resolving host db02.ignitetech.com: [Errno -2] Name or service not known
ERROR _assign_order_to_imported_mysql_accounts():1445 ClusterMonitor(1): Problem in resolving host address for user ptc with host db02.ignitetech.com
ERROR _resolve_hostname():1645 ClusterMonitor(1): Error resolving host db01.ignitetech.com: [Errno -2] Name or service not known
ERROR _assign_order_to_imported_mysql_accounts():1445 ClusterMonitor(1): Problem in resolving host address for user ptc with host db01.ignitetech.com
ERROR _resolve_hostname():1645 ClusterMonitor(1): Error resolving host db02.ignitetech.com: [Errno -2] Name or service not known
ERROR _assign_order_to_imported_mysql_accounts():1445 ClusterMonitor(1): Problem in resolving host address for user ptc with host db02.ignitetech.com
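When the log is long, it can help to summarize which hostnames are failing to resolve. The following is a minimal sketch, not a ScaleArc tool: the function name `unresolved_hosts` is hypothetical, and the pattern assumes the log lines keep the "Error resolving host" format shown in the extract above.

```python
import re
from collections import Counter

# Assumption: failures are logged as "Error resolving host <name>: [Errno ...",
# matching the sample extract. Adjust the pattern if your log format differs.
RESOLVE_ERROR = re.compile(r"Error resolving host (\S+?): \[Errno")


def unresolved_hosts(log_lines):
    """Count DNS resolution failures per hostname in an iterable of log lines."""
    counts = Counter()
    for line in log_lines:
        match = RESOLVE_ERROR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Demo with two lines copied from the sample extract.
sample = [
    "ERROR _resolve_hostname():1645 ClusterMonitor(1): Error resolving host "
    "db01.ignitetech.com: [Errno -2] Name or service not known",
    "ERROR _resolve_hostname():1645 ClusterMonitor(1): Error resolving host "
    "db02.ignitetech.com: [Errno -2] Name or service not known",
]
for host, count in unresolved_hosts(sample).most_common():
    print(f"{host}: {count} failed lookup(s)")
```

To run it against the real log, pass an open file object for /data/logs/services/error.log to `unresolved_hosts` instead of the sample list.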

 

Solution

This issue occurs when the configured external DNS servers become unreachable or are unable to resolve the database server hostnames or FQDNs.

Ensure that the configured external DNS servers are reachable from ScaleArc and that they can successfully resolve the configured ScaleArc node FQDNs or hostnames on both the primary and secondary nodes. You can test DNS lookups by querying the external DNS servers from ScaleArc using the host or nslookup commands.
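If you need to check several names at once, the lookups can also be scripted. The sketch below is an illustration only: `check_hosts` is a hypothetical helper, and it uses Python's `socket.getaddrinfo`, which goes through the system resolver (including /etc/hosts) rather than querying a specific DNS server the way host and nslookup do. A failure surfaces as `socket.gaierror`, the same "[Errno -2] Name or service not known" style of error seen in error.log.

```python
import socket


def check_hosts(hostnames, resolver=socket.getaddrinfo):
    """Try to resolve each name; return (resolved, failed) dictionaries.

    resolved maps name -> sorted list of IP addresses.
    failed maps name -> error message from the resolver.
    The resolver argument exists only so the lookup can be swapped out.
    """
    resolved, failed = {}, {}
    for name in hostnames:
        try:
            infos = resolver(name, None)
            # Each result is (family, type, proto, canonname, sockaddr);
            # sockaddr[0] is the IP address.
            resolved[name] = sorted({info[4][0] for info in infos})
        except socket.gaierror as exc:
            failed[name] = str(exc)
    return resolved, failed


# Demo: replace "localhost" with your database server hostnames or FQDNs.
resolved, failed = check_hosts(["localhost"])
for name, addresses in resolved.items():
    print(f"OK     {name} -> {', '.join(addresses)}")
for name, error in failed.items():
    print(f"FAILED {name}: {error}")
```

Run it on both the primary and secondary ScaleArc nodes, since each node resolves names independently.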

Correct any name resolution errors by configuring the correct DNS server or by fixing any network connectivity issues between ScaleArc and the external name servers.

You can review the DNS configuration by navigating to SETTINGS > Network Settings on the ScaleArc dashboard.

Settings_-_Network_Settings.png

To avoid name resolution issues caused by external DNS dependencies, you can configure local DNS entries for the database, fencing, and ScaleArc nodes in the /etc/hosts file on both the primary and secondary ScaleArc instances. ScaleArc can then resolve these names locally without depending on the external DNS servers. For example:

<IP_address> mysql-db01.mydomain.com mysql-db01
<IP_address> mysql-db02.mydomain.com mysql-db02
<IP_address> fencingdb.mydomain.com fencingdb
<IP_address> scalearc01.mydomain.com scalearc01
<IP_address> scalearc02.mydomain.com scalearc02

Adding these entries removes the dependency on the external DNS servers and ensures that ScaleArc HA continues to work uninterrupted even when external DNS lookups fail, provided the IP addresses of the database servers remain unchanged.
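After editing /etc/hosts, you may want to confirm that every required name is actually present on each node. The following is a rough sketch under stated assumptions: `parse_hosts` and `missing_entries` are hypothetical helpers, and the 192.0.2.x addresses in the demo are documentation placeholders, not real server addresses.

```python
def parse_hosts(text):
    """Map each hostname or alias in hosts-file text to its IP address."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            # Keep the first address seen for a name.
            mapping.setdefault(name.lower(), ip)
    return mapping


def missing_entries(text, required):
    """Return the required names that have no entry in the hosts-file text."""
    mapping = parse_hosts(text)
    return [name for name in required if name.lower() not in mapping]


# Demo with placeholder content; on a node, read /etc/hosts instead.
sample_hosts = """\
127.0.0.1 localhost
192.0.2.11 mysql-db01.mydomain.com mysql-db01
192.0.2.21 scalearc01.mydomain.com scalearc01
"""
required = ["mysql-db01.mydomain.com", "mysql-db02.mydomain.com", "scalearc01"]
print("Missing:", missing_entries(sample_hosts, required))
```

The check only confirms that an entry exists. It does not verify that the IP address is still correct, so the entries need to be updated whenever a database server's address changes.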

Refer to Configuring Local DNS Entries for more information on how to configure local DNS using the /etc/hosts file.
