Hello,
After upgrading nmon to 1.5.17 and TA-nmon to 1.2.09, Splunk doesn't get any data from Solaris hosts. After looking into the logs, I can see the errors below:
05-01-2015 16:10:49.083 +1000 ERROR ExecProcessor - message from "/opt/splunkforwarder/etc/apps/TA-nmon/bin/nmon_helper.sh" pfiles: cannot examine 5285: no such process
05-01-2015 16:10:49.096 +1000 ERROR ExecProcessor - message from "/opt/splunkforwarder/etc/apps/TA-nmon/bin/nmon_helper.sh" /opt/splunkforwarder/etc/apps/TA-nmon/bin/nmon_helper.sh: test: argument expected
05-01-2015 16:11:49.097 +1000 ERROR ExecProcessor - message from "/opt/splunkforwarder/etc/apps/TA-nmon/bin/nmon_helper.sh" pfiles: cannot examine 5285: no such process
Could you please fix this in the next release, or explain what has changed in the latest release?
Hi,
A corrective release has been published as version 1.5.19, available on Splunk Apps; it has been confirmed to solve the issue.
The bug was introduced in version 1.5.17 and was due to new options added in nmon.conf, especially for Solaris hosts. The way the default and local nmon.conf are sourced by the third-party input script nmon_helper.sh has been changed to prevent this issue from affecting Nmon data generation.
Thank you for your help.
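For context, the "test: argument expected" message in the log above is what the Solaris Bourne shell prints when `test` receives an unquoted variable that turns out to be empty. The sketch below illustrates this general pitfall; the variable name is hypothetical and this is not the actual TA-nmon code:

```shell
# Illustration only (not the actual TA-nmon code): a hypothetical
# variable that a sourced nmon.conf might leave unset.
NMON_OPT=""

# Unquoted, "[ $NMON_OPT = yes ]" would expand to "[ = yes ]" and
# Solaris /bin/sh would report: test: argument expected.
# Quoting the variable keeps the test well-formed:
if [ "$NMON_OPT" = "yes" ]; then
    echo "option enabled"
else
    echo "option disabled"
fi
```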
Guilhem
Hello,
A new release, V1.5.18, has been published on Splunk Apps; it includes a new TA-nmon version, V1.2.10.
This new TA version avoids inspecting non-existing processes by first checking the proc filesystem.
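As a rough sketch of that kind of guard (hypothetical, not the actual TA-nmon code; it assumes a per-PID /proc directory as exposed by Solaris and Linux):

```shell
# Hypothetical guard: only run pfiles against a PID that still exists
# in the proc filesystem (Solaris and Linux expose /proc/<pid>).
pid=99999999   # example PID; in practice it comes from the helper script
if [ -d "/proc/$pid" ]; then
    pfiles "$pid" 2>/dev/null
else
    echo "PID $pid no longer exists, skipping pfiles"
fi
```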
Now, if you still don't receive any data from your Solaris hosts, which would be surprising and very annoying, could you please:
1. Verify the splunkd log of one Solaris host
Any critical failure, for example from the nmon2csv converters, should be visible there (e.g. a broken pipe message)
Check directly in the host's log file, or better, with Splunk:
index=_internal sourcetype=splunkd host=<myhost>
2. Verify the "nmon_collect" sourcetype for that host; this sourcetype contains the output of nmon_helper.sh, which generates the raw nmon file
index=nmon sourcetype=nmon_collect
2.1 If you can, verify directly on the host that a sarmon process exists:
ps -ef | grep sarmon
3. Verify the "nmon_processing" sourcetype; it contains the processing output of the nmon2csv converters, the scripts that generate perf data from the raw nmon file
There should be messages indicating the number of lines per perf metric, plus useful information for that host
4. Finally, confirm whether you have the perf data and config data in:
index=nmon sourcetype=nmon_data hostname=<myhost>
index=nmon sourcetype=nmon_config hostname=<myhost>
Note: you can use either the field "host" or "hostname"; they are equivalent.
Have you tried restarting a UF on one Solaris host, for testing purposes?
Is splunkd set to restart in the App configuration of your deployment server ?
I will be happy to help if you are still in trouble; we could exchange by mail. You'll find my email address on the Help page, accessible from the icon on the App home page.
I have recently made important updates for Solaris in the last versions, so I am very interested in having people use the App for Solaris hosts 🙂
Guilhem
Hi sohnaeo,
That's interesting; I will certainly get this fixed.
One first question: is performance data correctly generated for your Solaris hosts, even with this error being reported?
Since version 1.2.07 of the TA-nmon, the nmon_helper.sh script tries to better identify the Nmon instances that are related to the Nmon App, to avoid killing Nmon instances that do not belong to the App (for example, if people also collect nmon data with their own third-party process).
On Solaris, it uses pfiles to identify the resources in use by Nmon instances, and it looks like it is trying to gather information about a non-existing process.
When this error occurs, is the PID reported the same as the PID of the running Nmon process? (Look in the nmon_collect sourcetype to get the current PID, or directly on the host in $SPLUNK_HOME/var/run/nmon/nmon.pid.)
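To compare them directly on the host, a sketch like the following could help (paths as given in this thread; `kill -0` sends no signal and only probes whether the PID exists):

```shell
# Sketch: check whether the PID recorded by the helper script is still alive.
SPLUNK_HOME="${SPLUNK_HOME:-/opt/splunkforwarder}"
pidfile="$SPLUNK_HOME/var/run/nmon/nmon.pid"

if [ -f "$pidfile" ]; then
    pid=$(cat "$pidfile")
    # kill -0 only tests process existence; it delivers no signal
    if kill -0 "$pid" 2>/dev/null; then
        echo "nmon PID $pid is running"
    else
        echo "stale PID file: process $pid is gone"
    fi
else
    echo "no PID file at $pidfile"
fi
```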
Guilhem