Best Practices for VM Migration

VM Migration

Below are some special considerations to keep in mind when migrating ScaleArc appliances in virtualized environments (Hyper-V and VMware ESX):

  • ScaleArc does not recommend or support voluntary or automatic Live Migration (vMotion in the case of VMware) of an active appliance (that is, an HA Primary or a standalone ScaleArc) from one ESX or Hyper-V server to another. Testing shows that the live migration process results in lost database updates, application server errors, and potentially corrupted database records. Live Migration/vMotion is not fast enough to support the real-time operations that ScaleArc depends on.
  • ScaleArc supports cold migration. First, shut down the ScaleArc appliance before you begin the migration. Next, configure static MAC addresses on all virtual NICs on the appliance. Virtual switches must be configured to both accept and propagate ARP updates; otherwise, the appliance may not be accessible from the network following migration.
  • Note that this configuration is not available in Hyper-V Manager; it can only be completed through Microsoft System Center Virtual Machine Manager (SCVMM). For details on configuring both the network interfaces and the virtual switches, see Network Configuration of ScaleArc VMs in Hyper-V.
  • If you have configured ScaleArc for HA, the HA Secondary may be Live Migrated (or vMotioned), provided you have configured a static MAC address and a static IP address, and have configured the virtual switches to accept and propagate ARP updates. To move both members of an HA pair, move the HA Secondary, perform a manual HA switchover, then move the newly demoted HA Secondary.
  • For maximum availability in the event of system or network failures, ScaleArc recommends that you separate the two appliances of an HA pair onto two Hyper-V servers (or two ESX servers in VMware environments). In case of a failure, rather than attempting to move the HA Primary off a failing Hyper-V/ESX server, perform an HA switchover from the UI of the HA Secondary, then attempt to shut down the newly demoted HA Secondary on the failing Hyper-V/ESX server. Before bringing up the appliance that was running on the failed Hyper-V/ESX server, contact ScaleArc Support for assistance to avoid problems. In an emergency, the HA Secondary detects the failure of the HA Primary and takes over the traffic. This happens much faster than a Live Migration/vMotion.
  • Should you encounter an emergency on a Hyper-V/ESX server running a standalone ScaleArc, you have little choice but to attempt a Live Migration/vMotion. In this case, immediately contact ScaleArc Support for help with recovery. Also collect a log dump right away using the instructions in Capturing ScaleArc logs; use the --date syntax, with no hour specified. Collect the logs within an hour of the problem occurring, before they roll over and are overwritten, so that Support has the information it needs.
  • ScaleArc does not recommend or support automatic VM migration through VMware DRS (Distributed Resource Scheduler).
  • Linux-based operating systems such as ScaleArc's base OS use udev rules to associate logical device names for network interfaces (for example, eth0) with physical device names or UUIDs (which translate to MAC addresses). When a virtual machine is migrated from one Hyper-V server to another, the new Hyper-V server, in its default configuration, changes the MAC address of the Ethernet interface to a new, randomly generated address. As a result, the Linux operating system ignores the network interface, which fails to restart along with the virtual machine following the migration, and the ScaleArc appliance can no longer communicate on the network. You can avoid this problem by assigning a static MAC address to the network interface.
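
The MAC address binding described in the last item can be checked after a migration with a small script. The following is a minimal sketch, not a ScaleArc-provided tool. It assumes that the udev rules follow the common 70-persistent-net.rules line layout (an ATTR{address} match and a NAME assignment on the same line); the function names mac_in_rules and check_mac are hypothetical, and the rules file location on your appliance should be verified before use.

```shell
#!/bin/sh
# Sketch: compare an interface's current MAC address with the MAC that a
# udev persistent-net rules file binds to that interface name.
# ASSUMPTION: rule lines contain ATTR{address}=="<mac>" and NAME="<iface>",
# as in the common 70-persistent-net.rules layout. Verify on your system.

# mac_in_rules RULES_FILE IFACE
# Prints (in lowercase) the MAC address bound to IFACE, or nothing if no rule.
mac_in_rules() {
    rules_file=$1
    iface=$2
    grep "NAME=\"$iface\"" "$rules_file" 2>/dev/null |
        sed -n 's/.*ATTR{address}=="\([0-9A-Fa-f:]*\)".*/\1/p' |
        head -n 1 | tr 'A-F' 'a-f'
}

# check_mac RULES_FILE IFACE CURRENT_MAC
# Returns 0 on match, 1 on mismatch, 2 if no rule exists for IFACE.
check_mac() {
    expected=$(mac_in_rules "$1" "$2")
    current=$(printf '%s' "$3" | tr 'A-F' 'a-f')
    if [ -z "$expected" ]; then
        echo "$2: no udev rule found"
        return 2
    fi
    if [ "$expected" = "$current" ]; then
        echo "$2: MAC matches udev rule ($current)"
        return 0
    fi
    echo "$2: MAC mismatch (rule=$expected current=$current)"
    return 1
}
```

On a Linux guest, the current address can be read from /sys/class/net/eth0/address, so a call might look like: check_mac /etc/udev/rules.d/70-persistent-net.rules eth0 "$(cat /sys/class/net/eth0/address)". The rules file path here is an assumption. A mismatch after migration suggests that the hypervisor assigned a new MAC address and that a static MAC address was not in effect.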

ScaleArc VM provisioning at the ESX level

It is important to consider whether ScaleArc VMs are set up at the hypervisor level to use resource pools that have a reservation.

For instance, on VMware you can set a guaranteed CPU or memory allocation (a reservation) for a given resource pool. A non-zero reservation is subtracted from the unreserved resources of the parent (host or resource pool). The resources are considered reserved regardless of whether virtual machines are associated with the resource pool. The reservation defaults to 0. Setting a reservation helps prevent the ESX host from becoming overloaded to the point of triggering a vMotion.

For instance, suppose an unrelated, non-ScaleArc VM is manually moved from ESX-B to ESX-A, where ESX-A is hosting ScaleArc. If the VM being moved is large, it might overload ESX-A when it comes online, to the extent of halting the CPU queue and starving all the VMs on ESX-A (including ScaleArc).

Furthermore, if ScaleArc is not excluded as a candidate for vMotion, such a situation might trigger a vMotion of ScaleArc to another ESX host.

vMotion and similar technologies with ScaleArc HA

With vMotion and similar technologies, moving ScaleArc nodes that are in HA is very risky, as it can easily lead to a 'split-brain' situation. For this reason, ScaleArc strongly discourages migrating the ScaleArc HA Primary server. Manual monitoring is needed at all times until the activity is complete.

ScaleArc recommends the following approach:

  1. Ensure the machine being moved is a Secondary (if you need to move the Primary server, perform an HA switchover so that it becomes the Secondary before the vMotion).
  2. Stop all ScaleArc services and the HA service on this machine.
  3. SSH into the ScaleArc machine to run the following commands:

     sudo service heartbeat stop
     /etc/init.d/idblb stop
     /etc/init.d/idb_watchdog stop
     /etc/init.d/analytics stop  
  4. Complete the vMotion or similar activity with this machine. Ensure the machine is not rebooted. If it is rebooted, stop the HA service.
  5. Check all network connectivity, including network connectivity with the database servers and the Primary ScaleArc.
  6. Monitor the response lag between the Primary and Secondary ScaleArc. If it is within the HA-configured parameters, proceed; otherwise, fix the networking issue or modify the HA configuration parameters to match the new setup.
  7. If all of the above steps have been checked and verified, start the HA service and the ScaleArc services on the Secondary machine, or preferably reboot the Secondary machine.

     sudo service heartbeat start
     /etc/init.d/idblb start
     /etc/init.d/idb_watchdog start
     /etc/init.d/analytics start
  8. Once the services are up, this machine is in HA as the Secondary.
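
The stop and start sequences in the steps above can be wrapped in one helper so that the four commands always run in the same order. This is a minimal sketch, not a ScaleArc-provided script: the function name scalearc_services and the DRY_RUN variable are hypothetical, and the commands are exactly those listed in the steps. Dry-run mode is the default here, so the function only prints what it would run unless DRY_RUN is set to 0.

```shell
#!/bin/sh
# Sketch: run the documented stop or start sequence in order.
# DRY_RUN defaults to 1 (print only). Set DRY_RUN=0 to execute the commands.
# ASSUMPTION: the commands need the same privileges as when typed by hand.

scalearc_services() {
    action=$1   # "stop" or "start"
    case $action in
        stop|start) ;;
        *) echo "usage: scalearc_services stop|start" >&2; return 2 ;;
    esac
    for cmd in \
        "sudo service heartbeat $action" \
        "/etc/init.d/idblb $action" \
        "/etc/init.d/idb_watchdog $action" \
        "/etc/init.d/analytics $action"
    do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "would run: $cmd"
        else
            echo "running: $cmd"
            # Abort on the first failure so a partial sequence is noticed.
            $cmd || { echo "failed: $cmd" >&2; return 1; }
        fi
    done
}
```

For example, run scalearc_services stop before the migration to review the commands, then DRY_RUN=0 scalearc_services stop to execute them; after connectivity and response lag have been verified, use scalearc_services start in the same way.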

Using VMware snapshots with ScaleArc appliances

Using VMware disk snapshots with an active HA Primary or standalone ScaleArc appliance can compromise performance, and it requires that the virtual disk be quiesced briefly during snapshot consolidation. This can cause the ScaleArc virtual machine to pause during consolidation, resulting in application server errors. Therefore, ScaleArc recommends that you keep snapshots only briefly, such as during backup of the virtual disks. Take the snapshot at a time of low activity to reduce the possibility of errors. A short-lived snapshot also consolidates more quickly.

Neither VMware nor ScaleArc recommends using snapshots as a backup strategy.

We do not recommend reverting a ScaleArc appliance to a snapshot because of the complex interplay between live system configuration and configuration files on disk.
