Apache Ambari Issues

Troubleshoot Apache Ambari issues for Big Data Service clusters.

Ambari Showing Already Removed DataNode as Dead in HDFS Summary and Host Level Health Alarm Opened, Causing LCM Operations to Fail

Troubleshoot LCM operation failures caused by a removed DataNode that Ambari still reports as dead in a Big Data Service cluster.

After you manually decommission and delete a DataNode on a host, Ambari shows a critical alert with the message DataNode Health Summary [live='x' stale='0' dead='1']. Because of this host-level alarm, the affected host is considered unhealthy, which causes LCM operations to fail.
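To confirm that an alert matches this condition, the counts in the alert message can be read programmatically. The following is a minimal sketch that assumes the message text has exactly the format quoted above; the function name `parse_datanode_health` and the sample value `live='3'` are illustrative only and aren't part of Ambari.

```python
import re

# Matches the counts in a message such as:
#   DataNode Health Summary [live='3' stale='0' dead='1']
ALERT_PATTERN = re.compile(r"live='(\d+)'\s+stale='(\d+)'\s+dead='(\d+)'")


def parse_datanode_health(alert_text):
    """Return the live, stale, and dead counts, or None if the text doesn't match."""
    match = ALERT_PATTERN.search(alert_text)
    if match is None:
        return None
    live, stale, dead = (int(group) for group in match.groups())
    return {"live": live, "stale": stale, "dead": dead}


# Sample alert text with an assumed live count of 3.
summary = parse_datanode_health(
    "DataNode Health Summary [live='3' stale='0' dead='1']"
)
if summary is not None and summary["dead"] > 0:
    print(f"{summary['dead']} dead DataNode(s) reported; manual cleanup may be needed.")
```

A dead count greater than zero after a DataNode was intentionally removed points to the situation described in this topic.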

The NameNode doesn't clean up the dead DataNode by itself. Clean up the dead DataNode manually:

  1. Restart the NameNodes to clear the dead DataNode:
    1. Access Apache Ambari.
    2. From the side toolbar, under Services, select HDFS.
    3. On the Summary tab, select NameNode.
    4. For the NameNode, select the ... icon under Action, and then select Restart.
    Note

    If the cluster is an HA cluster, complete the previous steps for each NameNode.
  2. Refresh configurations.
    1. Go to the Hosts section and select the affected host.
    2. Select Actions > Selected Hosts > Hosts > Refresh All Configs.
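
The NameNode restart in step 1 can also be scripted. The following is a hedged sketch, not the documented procedure: it assumes that the Ambari REST API on the cluster accepts a RESTART command posted to the cluster's requests resource, and it only builds the request without sending it. The function name, URL, cluster name, host name, and credentials are placeholders. Verify the payload against the Ambari version in use before running it.

```python
import base64
import json
import urllib.request


def build_namenode_restart_request(base_url, cluster, host, user, password):
    """Build (but don't send) an Ambari REST request that restarts one NameNode.

    The payload shape is an assumption based on the general Ambari REST API
    request format; confirm it for the Ambari version in use.
    """
    payload = {
        "RequestInfo": {
            "command": "RESTART",
            "context": "Restart NameNode after DataNode removal",
            "operation_level": {
                "level": "HOST_COMPONENT",
                "cluster_name": cluster,
                "service_name": "HDFS",
                "hostcomponent_name": "NAMENODE",
            },
        },
        "Requests/resource_filters": [
            {"service_name": "HDFS", "component_name": "NAMENODE", "hosts": host}
        ],
    }
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/clusters/{cluster}/requests",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Basic {token}",
            "X-Requested-By": "ambari",  # Ambari rejects modifying calls without this header
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values; replace them with the real Ambari URL, cluster, and host.
request = build_namenode_restart_request(
    "https://ambari.example.com", "mycluster", "namenode0.example.com", "admin", "secret"
)
print(request.get_method(), request.full_url)
# To send the request: urllib.request.urlopen(request)
```

For an HA cluster, build and send one request per NameNode host, and wait for the first restart to finish before starting the second so that one NameNode stays available.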