Knova Is Not Loading for Users or Admins

Overview

You are experiencing an outage: Knova does not load for users or admins, and multiple service restarts do not fix the issue. Additional possible symptoms:

  • KRetriever.exe does not start and the Admin Interface does not load.
    The following error appears in the KRetriever logs:
    GroupManagerDove not found by KNameservice
  • The Knova Knowledge Central webpage shows a loading spinner indefinitely and never finishes loading.
  • Wildfly CPU usage may increase significantly (e.g. up to 50%).
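
To confirm the first symptom, the KRetriever log can be searched for the error text quoted above. The following is a minimal sketch, not a Knova tool: the function names are hypothetical, and the log file location is not given in this article, so the path is left for the caller to supply.

```python
from typing import Iterable, List, Tuple

# Error text quoted in this article; everything else here is illustrative.
ERROR_TEXT = "GroupManagerDove not found by KNameservice"


def find_error_lines(lines: Iterable[str], needle: str = ERROR_TEXT) -> List[Tuple[int, str]]:
    """Return (line_number, text) pairs for every line that contains needle."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if needle in line:
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_log_file(path: str) -> List[Tuple[int, str]]:
    """Scan one log file. The path is an assumption supplied by the caller."""
    # errors="replace" keeps the scan going if the log contains odd bytes.
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        return find_error_lines(handle)
```

Calling scan_log_file with the path of a KRetriever log returns an empty list when the error is absent.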

Solution

This issue can have either of the following causes:

  • Antivirus software or a firewall may be blocking TCP/IP ports that are used by Knova.
    To resolve this issue, follow these steps:
    1. Stop the Knova Tomcat service.
    2. Update the reserved ports to 1024-65534.
    3. Start the Tomcat service.
      Note: If Tomcat is stuck (does not restart), restart the server where Knova is running.
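
After the services are back up, a quick TCP connection test from another machine can show whether a given port is reachable. The following is a minimal sketch using only the Python standard library. The function names are hypothetical, and the host and port numbers must come from your own environment because this article does not list them. A failed connection can also mean that the service is simply not running, so treat the result as a hint rather than proof of a firewall or antivirus block.

```python
import socket
from typing import Dict, Iterable


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_ports(host: str, ports: Iterable[int], timeout: float = 2.0) -> Dict[int, bool]:
    """Check several ports on one host and report each result."""
    return {port: is_port_open(host, port, timeout) for port in ports}
```

For example, check_ports("knova-admin.example", [8080, 8983]) returns a dictionary mapping each port to True or False. Both the host name and the port numbers in that call are placeholders.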

  • The leader role was reassigned to an incorrect Solr instance (replica).

    Unlike the other Knova services, the Solr service is not controlled by the restart scripts. Instead, it starts automatically whenever Windows starts. Each server runs its own Solr service instance. The Solr instances coordinate their work under a leader instance. Normally, the leader is the Solr instance installed on the admin server. However, leadership can shift to another instance if the leading Solr instance is unavailable, even for a very short time (e.g. the admin server is restarted while an application server keeps running).

    Wildfly services on each server communicate with the leader Solr instance constantly. If leadership shifts, the communication breaks because the non-leader Solr instances (replicas) don’t have the required resources or configuration information (e.g. the database connection).

    To restore the correct Solr leadership:
    1. Stop the Solr replica instances.
    2. Call the following API to assign the leader role to the correct Solr instance:
      /admin/collections?action=FORCELEADER&collection=<collectionName>&shard=<shardName>
      Within a few minutes, the correct instance should be assigned as the leader, which restores normal communication between the Wildfly and Solr services.
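
The request in step 2 can be built and the result verified with a short script. The following is a minimal sketch under stated assumptions: the base URL http://localhost:8983/solr is the Solr default and may differ in a Knova installation, the helper names are hypothetical, and the leader check assumes the usual JSON layout of the Solr Collections API CLUSTERSTATUS response (cluster, collections, shards, replicas, with a leader flag of "true" on the leading replica).

```python
import json
from typing import Dict
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: default Solr base URL; replace with the real host and port.
SOLR_BASE_URL = "http://localhost:8983/solr"


def force_leader_url(collection: str, shard: str, base_url: str = SOLR_BASE_URL) -> str:
    """Build the FORCELEADER request from step 2 for one collection and shard."""
    query = urlencode({"action": "FORCELEADER", "collection": collection, "shard": shard})
    return f"{base_url.rstrip('/')}/admin/collections?{query}"


def cluster_status_url(collection: str, base_url: str = SOLR_BASE_URL) -> str:
    """Build a CLUSTERSTATUS request that reports the replicas of a collection."""
    query = urlencode({"action": "CLUSTERSTATUS", "collection": collection, "wt": "json"})
    return f"{base_url.rstrip('/')}/admin/collections?{query}"


def find_leaders(status: dict, collection: str) -> Dict[str, str]:
    """Map each shard name to the node_name of the replica flagged as leader."""
    shards = status["cluster"]["collections"][collection]["shards"]
    leaders = {}
    for shard_name, shard in shards.items():
        for replica in shard["replicas"].values():
            # Solr reports the flag as the string "true" on the leader only.
            if replica.get("leader") == "true":
                leaders[shard_name] = replica.get("node_name", "")
    return leaders


def fetch_json(url: str, timeout: float = 30.0) -> dict:
    """Send a GET request and decode the JSON response body."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

A typical sequence is to call fetch_json(force_leader_url(...)) once, wait a few minutes, and then pass the result of fetch_json(cluster_status_url(...)) to find_leaders to confirm that each shard's leader is the node on the admin server.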
