Ambari agents unable to communicate with Ambari server

Overview

Ambari agents suddenly become unable to communicate with the Ambari server and stop reporting their agent heartbeats. As a result, all services and components are flagged yellow in the Ambari web UI, and their Start and Stop buttons are grayed out (unavailable), as in the image below.

Environment/Affected versions: 

Sensage AP 6.X/2017.X/2017.8X

Solution

Root Cause

The hawkeye-deploy commands, which are used to perform operations from the command line, fail because the ambari-server service cannot communicate properly with the ambari-agent service. The root cause is that the internal Ambari SSL certificates used to secure communication between the Ambari server and the Ambari agents are either corrupted or expired.
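
To confirm the expired-certificate case before applying the fix, the end date of the Ambari CA certificate can be checked. The following is a minimal Python sketch, assuming the `openssl` command-line tool is installed and that the CA certificate is at /var/lib/ambari-server/keys/ca.crt (the path used in the resolution steps below). The helper names are hypothetical and not part of Ambari. A corrupted certificate typically makes `openssl` fail to parse the file, which surfaces here as a `subprocess.CalledProcessError`.

```python
import ssl
import subprocess
import time
from typing import Optional

# Assumed location, taken from the resolution steps of this article.
SERVER_CA_CERT = "/var/lib/ambari-server/keys/ca.crt"


def not_after_seconds(enddate_line: str) -> int:
    """Convert 'notAfter=Jan  5 09:34:43 2018 GMT' to seconds since the epoch."""
    # Drop the 'notAfter=' prefix if present; openssl also appends a newline.
    stamp = enddate_line.strip().split("=", 1)[-1].strip()
    # ssl.cert_time_to_seconds parses the "%b %d %H:%M:%S %Y GMT" format as UTC.
    return ssl.cert_time_to_seconds(stamp)


def is_expired(enddate_line: str, now: Optional[float] = None) -> bool:
    """Return True if the certificate's notAfter date is not in the future."""
    if now is None:
        now = time.time()
    return not_after_seconds(enddate_line) <= now


def read_enddate(cert_path: str = SERVER_CA_CERT) -> str:
    """Ask the openssl CLI (assumed installed) for the certificate end date."""
    result = subprocess.run(
        ["openssl", "x509", "-enddate", "-noout", "-in", cert_path],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
        check=True,  # raises CalledProcessError if the file cannot be parsed
    )
    return result.stdout
```

On the Ambari server, `is_expired(read_enddate())` returns True when the CA certificate has expired; the same check can be pointed at the certificates under /var/lib/ambari-agent/keys on an agent host.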

Resolution

The procedure below assumes that the certificates signed by the Ambari CA can be replaced, which is generally the case for the certificates Ambari agents use for two-way SSL connections. At the end, the Ambari server and all agents are restarted, which causes a new CA certificate to be created along with a new SSL certificate for each Ambari agent.

On the Ambari server:

  • Stop the Ambari server
  • Back up /var/lib/ambari-server/keys and its child directories
  • Delete the following files from /var/lib/ambari-server/keys (WARNING: do not remove the ca.config file)
    • ca.key
    • ca.csr
    • ca.crt
    • pass.txt
    • keystore.p12
    • *.csr
    • *.crt
  • Delete the following files from /var/lib/ambari-server/keys/db
    • index.txt.old
    • index.txt.attr.old
    • serial.old
  • Truncate (empty, without deleting) the following files in /var/lib/ambari-server/keys/db
    • index.txt
    • index.txt.attr
  • Edit the following file in /var/lib/ambari-server/keys/db
    • serial
      • set its contents to exactly
        • 00
  • Delete all files under /var/lib/ambari-server/keys/db/newcerts
  • Restart the Ambari server
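
The file operations in the server-side steps above can be scripted. The following is a minimal Python sketch, assuming Python 3.6+ and that it runs as root while the Ambari server is stopped (`ambari-server stop`). The function name `reset_server_keys` and the `.bak-<timestamp>` backup naming are illustrative choices, not Ambari tooling. It touches only the files listed above and leaves ca.config in place.

```python
import glob
import os
import shutil
import time

# Default path from this article; pass another path to rehearse on a copy.
SERVER_KEYS = "/var/lib/ambari-server/keys"

# ca.config is intentionally NOT listed: the article warns not to remove it.
KEY_FILES = ["ca.key", "ca.csr", "ca.crt", "pass.txt", "keystore.p12"]
KEY_GLOBS = ["*.csr", "*.crt"]
DB_DELETE = ["index.txt.old", "index.txt.attr.old", "serial.old"]
DB_TRUNCATE = ["index.txt", "index.txt.attr"]


def _remove_file(path: str) -> None:
    """Delete a regular file if it exists; leave directories alone."""
    if os.path.isfile(path):
        os.remove(path)


def reset_server_keys(keys_dir: str = SERVER_KEYS) -> str:
    """Apply the server-side steps and return the backup directory path."""
    # Back up the keys directory and all of its child directories.
    backup = "{}.bak-{}".format(keys_dir, time.strftime("%Y%m%d%H%M%S"))
    shutil.copytree(keys_dir, backup)

    # Delete the CA material and the per-host CSRs and certificates.
    for name in KEY_FILES:
        _remove_file(os.path.join(keys_dir, name))
    for pattern in KEY_GLOBS:
        for path in glob.glob(os.path.join(keys_dir, pattern)):
            _remove_file(path)

    db_dir = os.path.join(keys_dir, "db")
    # Delete the stale .old copies of the CA database files.
    for name in DB_DELETE:
        _remove_file(os.path.join(db_dir, name))
    # Truncate the CA index files; mode "w" empties (or creates) each file.
    for name in DB_TRUNCATE:
        open(os.path.join(db_dir, name), "w").close()
    # Reset the serial number; the article asks for exactly "00".
    with open(os.path.join(db_dir, "serial"), "w") as f:
        f.write("00")

    # Delete every file under db/newcerts.
    newcerts = os.path.join(db_dir, "newcerts")
    if os.path.isdir(newcerts):
        for name in os.listdir(newcerts):
            _remove_file(os.path.join(newcerts, name))

    return backup
```

A typical run is `ambari-server stop`, then `reset_server_keys()`, then `ambari-server start`. Rehearsing on a copy of the keys directory first is a cheap way to confirm the result before touching the live one.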

On each Ambari agent host:

  • Stop the Ambari agent
  • Back up /var/lib/ambari-agent/keys and its child directories
  • Delete the following files from /var/lib/ambari-agent/keys
    • ca.crt
    • *.crt
    • *.csr
    • *.key
  • Restart the Ambari agent
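
The agent-side cleanup can be sketched the same way. This minimal Python sketch assumes it runs as root on each agent host while the agent is stopped (`ambari-agent stop`); `reset_agent_keys` and the backup naming are illustrative, not Ambari tooling. It deletes only the certificate, CSR, and key files listed above.

```python
import glob
import os
import shutil
import time

# Default path from this article.
AGENT_KEYS = "/var/lib/ambari-agent/keys"
# ca.crt is already matched by *.crt; it is listed to mirror the article.
AGENT_PATTERNS = ["ca.crt", "*.crt", "*.csr", "*.key"]


def reset_agent_keys(keys_dir: str = AGENT_KEYS) -> str:
    """Back up the agent keys directory, delete cert material, return backup path."""
    # Back up the keys directory and all of its child directories.
    backup = "{}.bak-{}".format(keys_dir, time.strftime("%Y%m%d%H%M%S"))
    shutil.copytree(keys_dir, backup)

    # Delete the CA certificate plus the agent's own certificates, CSRs and keys.
    for pattern in AGENT_PATTERNS:
        for path in glob.glob(os.path.join(keys_dir, pattern)):
            if os.path.isfile(path):
                os.remove(path)
    return backup
```

After `reset_agent_keys()` completes, `ambari-agent start` lets the agent request a new certificate from the restarted Ambari server.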

The Ambari web UI should then resume reporting the status of the node components correctly, shown in green, as in the image below.

Content Author: Miguel Molina