We are currently running TA version 1.3.33 and we are seeing the following "processing error" from our Linux hosts.
ERROR: hostname: test01 Detected Bad Nmon structure, found ZZZZ lines truncated! (ZZZZ lines contains the event timestamp and should always begin the line)
addon type: /opt/splunkforwarder/etc/apps/TA-nmon
addon version: 1.3.33
nmon2csv version: 1.2.44
Guest Operating System: linux
NMON OStype: Linux
Perl version: 5.010000
NMON VERSION: 16g
TIME of Nmon Data: 10:40.50
DATE of Nmon Data: 09-JUL-2018
INTERVAL: 60
SNAPSHOTS: 1440
Hello,
Right, that looks strange, some questions:
Can you try the following on a box where this is happening:
pkill nmon && rm -rf /opt/splunkforwarder/var/log/nmon
Then wait a few minutes: a new Nmon process will be started and processing will resume. Check the logs afterwards.
A good test would be to run a manual instance. Locate the nmon binary being run by the TA, for example:
root 18382 1 0 19:54 ? 00:00:00 /opt/splunkforwarder/var/log/nmon/bin/linux/ubuntu/nmon_x86_64_ubuntu1404 -F /opt/splunkforwarder/var/log/nmon/var/nmon_repository/fifo1/nmon.fifo -T -s 60 -c 1440 -d 1500 -g auto -D -p
From a directory of your choice, run the binary with the "-f" switch, such as:
/opt/splunkforwarder/var/log/nmon/bin/linux/ubuntu/nmon_x86_64_ubuntu1404 -f -T -s 60 -c 1440 -d 1500 -g auto -D -p
This will create an *.nmon file in the current directory. It would be interesting to check its structure: what the processing rejects is apparently a truncated line that contains the Nmon timestamp marker "ZZZZ" but does not start with it:
grep ZZZZ <nmon_file.nmon>
If you find a line that contains ZZZZ but does not start with it, that would be the root cause of the issue, and a binary version better suited to your particular OS could be required.
Optionally, on your testing box, if you install a minimal Python 2.7.x, the processing will switch automatically to Python, which might not be affected, although Perl shouldn't be either, unless there is an unexpected issue with the binary running on this particular host.
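A plain grep lists every ZZZZ line, including the healthy ones, so the bad lines can be hard to spot. As a rough sketch only (the function name find_truncated_zzzz and its argument are made up for illustration and are not part of the TA), something like this would print just the lines that contain ZZZZ without starting with it, prefixed by their line number:

```shell
#!/bin/sh
# Hypothetical helper, not shipped with TA-nmon.
# Prints "line_number:content" for every line that contains ZZZZ
# but does not begin with it.
find_truncated_zzzz() {
    # $1: path to the .nmon file to inspect
    awk '/ZZZZ/ && !/^ZZZZ/ { print NR ":" $0 }' "$1"
}
```

Source the function in your shell, then call it with the path to the file created by the manual run, for example find_truncated_zzzz ./your_file.nmon (placeholder name). No output means every ZZZZ line starts its line.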
Let me know.
Guilhem
Hello Guilhem,
Thanks for the quick reply, below are the answers to your questions. I am working on some of your suggestions now.
Is this happening on multiple boxes?
yes, about 116 boxes out of 170
Which Linux distribution and version?
SUSE Linux Enterprise Server 11 (x86_64), kernel version = 3.0
Is this happening continuously or sporadically?
We just started testing this app on the 3rd of July, and it's been up and down since then: 557k events since the 3rd, across 100-plus boxes.
Splunk version.
7.0.2; forwarders are 6.4.4
Hi,
I have not been able to reproduce this yet, having deployed several SLES 11.4 hosts (I couldn't get closer) with the same UF version.
Let me know any outcome of your investigations, I can work on giving you specific instructions to troubleshoot that.
Guilhem
Right, thanks. Feel free to let me know; I am not aware of any such issue.
Can you have a look at the Addon reporting dashboard? You will see which portion of hosts is running Python versus Perl, in case the issue affects the Perl converter.
We can chat on Splunk Slack if this helps (https://splunk-usergroups.slack.com/messages/general/)