The speed of the SenSage AP collector Loader is degrading. It cannot keep up with the incoming load stream, resulting in a major backlog: files in the log queue take too long to process, and load times are very long.
Each troubleshooting section below corresponds to action(s) in the flowchart above. After each step, check whether it resolves the issue. If not, move on to the next step.
Check SLS Logs and Fix iconv Errors
Check the SLS/EDW logs, i.e. <sensage_path>/var/log/sls/sls-<date>.log, for errors. Is there an iconv error like the following?
<RawLineSplitterEp.cpp:202:TF_COMMON:ERROR! (t=1621556304951859) throw CSysErr( CRawLineSplitterEp::convert, iconv, 84 (Invalid or incomplete multibyte or wide character), Lines completed: 896, 0090097)
Iconv errors can prevent the collector and SLS from loading files correctly, so fix them by following this KB article: Fixing collector not loading some files with an iconv error.
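On a large log, a short script can be quicker than reading the file by hand. The following is a minimal sketch, not a SenSage tool. It assumes the iconv error lines look like the sample shown above. The function name find_iconv_errors and the regular expression are illustrative assumptions, and other SLS versions may format the message differently.

```python
import re
import sys

# Pattern derived from the sample error line in this article (an assumption,
# not a documented SLS log format): "ERROR! ... iconv, <errno> (<message>)".
ICONV_ERROR = re.compile(
    r"ERROR!.*?\biconv,\s*(?P<errno>\d+)\s*\((?P<message>[^)]*)\)"
)


def find_iconv_errors(lines):
    """Return (line_number, errno, message) for each line reporting an iconv error."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = ICONV_ERROR.search(line)
        if match:
            hits.append((number, int(match.group("errno")), match.group("message")))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # The log path is passed on the command line, e.g. an sls-<date>.log file.
    with open(sys.argv[1], errors="replace") as log:
        for number, errno, message in find_iconv_errors(log):
            print(f"line {number}: iconv errno {errno}: {message}")
```

If the script prints nothing, the log has no lines matching this pattern. That does not rule out iconv errors logged in a different format.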
Do manual file loads to identify degraded nodes
Shut down the collector and run some manual file loads and compacts to see which nodes take longer. When some nodes in the cluster are degraded, their average load times are much higher than those of the normal nodes. For example, if the load average is below 10 on the normal nodes, it might spike to 25 or even 100 on the affected nodes. CPU wait time can rise to 2-4% on the affected nodes, whereas the other nodes show only 0.1%.
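Once a load-average reading has been noted for each node, the comparison can be sketched as below. This is an illustration only: the node names and readings are hypothetical, and the 2.5x threshold is an arbitrary assumption rather than a SenSage recommendation. The cluster median serves as the baseline, which only works while most nodes are healthy.

```python
from statistics import median


def flag_degraded_nodes(load_by_node, factor=2.5):
    """Return the nodes whose load average exceeds `factor` times the cluster median."""
    baseline = median(load_by_node.values())
    return sorted(
        node for node, load in load_by_node.items() if load > factor * baseline
    )


# Hypothetical readings taken on each node during a manual load.
readings = {"node1": 8.0, "node2": 9.5, "node3": 25.0, "node4": 7.2, "node5": 100.0}
print(flag_degraded_nodes(readings))  # prints ['node3', 'node5']
```

The median is used instead of the mean so that one or two badly degraded nodes do not pull the baseline up and hide themselves.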
Reboot affected nodes
Stop the collector and the SLS cluster. Then, with the help of the SA (SysAdmin) team, reboot the affected nodes and check whether load performance improves.
Stop external processes
Review the processes running on the affected nodes and stop any, such as antivirus software or other modules, that might be affecting performance.
Fix storage/IO performance
If the above troubleshooting steps do not resolve the issue, the cause is most likely the storage configuration or IO performance on the affected nodes. Storage issues are resolved by the Storage/IO team, either by correcting the storage configuration or, as a last resort, by migrating to new storage, as explained in Loader Speed Degrading Due to IO Performance Issues.
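Before escalating to the Storage/IO team, it can help to record the CPU wait time on an affected node and on a healthy node. The following is a minimal sketch that assumes Linux nodes exposing /proc/stat. The function names are illustrative, and the iowait counter is only a rough indicator, so treat the result as supporting evidence rather than a diagnosis.

```python
import time


def parse_cpu_line(stat_text):
    """Return the counters from the aggregate 'cpu' line of /proc/stat as ints."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(value) for value in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two /proc/stat samples."""
    deltas = [b - a for a, b in zip(before, after)]
    # Field order: user, nice, system, idle, iowait, irq, softirq, steal, ...
    # Guest time is already counted in user/nice, so only the first eight are summed.
    total = sum(deltas[:8])
    return 100.0 * deltas[4] / total if total else 0.0


def sample_iowait(interval=5.0, path="/proc/stat"):
    """Take two samples `interval` seconds apart and return the iowait percentage."""
    with open(path) as stat:
        before = parse_cpu_line(stat.read())
    time.sleep(interval)
    with open(path) as stat:
        after = parse_cpu_line(stat.read())
    return iowait_percent(before, after)
```

Calling sample_iowait() on each node while a manual load is running gives figures that can be compared against the 0.1% and 2-4% wait times mentioned earlier.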