Hello,
After upgrading nmon to 1.5.17 and TA-nmon to 1.2.09, Splunk doesn't get any data from Solaris hosts. After looking into the logs, I can see the errors below:
05-01-2015 16:10:49.083 +1000 ERROR ExecProcessor - message from "/opt/splunkforwarder/etc/apps/TA-nmon/bin/nmon_helper.sh" pfiles: cannot examine 5285: no such process
05-01-2015 16:10:49.096 +1000 ERROR ExecProcessor - message from "/opt/splunkforwarder/etc/apps/TA-nmon/bin/nmon_helper.sh" /opt/splunkforwarder/etc/apps/TA-nmon/bin/nmon_helper.sh: test: argument expected
05-01-2015 16:11:49.097 +1000 ERROR ExecProcessor - message from "/opt/splunkforwarder/etc/apps/TA-nmon/bin/nmon_helper.sh" pfiles: cannot examine 5285: no such process
Could you please fix this in the next release, or explain what has changed in the latest release?
Hi,
A corrective release has been published as version 1.5.19, available on Splunk Apps; it has been confirmed to solve the issue.
The bug was introduced in version 1.5.17 and was due to new options added in nmon.conf, especially for Solaris hosts. The way the default and local nmon.conf are sourced by the third-party input script nmon_helper.sh has been changed to prevent this issue from affecting Nmon data generation.
Thank you for your help.
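For context, the "test: argument expected" message in the log above is what the Solaris Bourne shell prints when `test` receives an unquoted variable that turns out to be empty. The sketch below illustrates this general pitfall; the variable name is hypothetical and this is not the actual TA-nmon code:

```shell
# Illustration only (not the actual TA-nmon code): a hypothetical
# variable that a sourced nmon.conf might leave unset.
NMON_OPT=""

# Unquoted, "[ $NMON_OPT = yes ]" would expand to "[ = yes ]" and
# Solaris /bin/sh would report: test: argument expected.
# Quoting the variable keeps the test well-formed:
if [ "$NMON_OPT" = "yes" ]; then
    echo "option enabled"
else
    echo "option disabled"
fi
```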
Guilhem
Hello,
A new release, V1.5.18, has been published on Splunk Apps; it includes a new TA-nmon version, V1.2.10.
This new TA version avoids inspecting non-existing processes by first checking the proc filesystem.
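As a rough sketch of that kind of guard (hypothetical, not the actual TA-nmon code; it assumes a per-PID /proc directory as exposed by Solaris and Linux):

```shell
# Hypothetical guard: only run pfiles against a PID that still exists
# in the proc filesystem (Solaris and Linux expose /proc/<pid>).
pid=99999999   # example PID; in practice it comes from the helper script
if [ -d "/proc/$pid" ]; then
    pfiles "$pid" 2>/dev/null
else
    echo "PID $pid no longer exists, skipping pfiles"
fi
```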
Now, if you still don't receive any data from your Solaris hosts, which would be surprising and very annoying, could you please:
1. Verify the splunkd log of one Solaris host
Any critical failure, for example from the nmon2csv converters, should be visible there (e.g. a broken pipe message)
Check directly in the host's log file, or better, with Splunk:
index=_internal sourcetype=splunkd host=<myhost>
2. Verify the "nmon_collect" sourcetype for that host; this sourcetype contains the output of nmon_helper.sh, which generates the raw nmon file
index=nmon sourcetype=nmon_collect
2.1 If you can, verify directly on the host that a sarmon process exists:
ps -ef | grep sarmon
3. Verify the "nmon_processing" sourcetype; it contains the processing output of the nmon2csv converters, the scripts that generate perf data from the raw nmon file
There should be messages indicating the number of lines per perf metric, plus useful information for that host
4. Finally, confirm whether you have the perf data and config data in:
index=nmon sourcetype=nmon_data hostname=<myhost>
index=nmon sourcetype=nmon_config hostname=<myhost>
Note: you can use either the field "host" or "hostname"; they are equivalent.
Have you tried restarting a UF on one Solaris host, for testing purposes?
Is splunkd set to restart in the App configuration of your deployment server ?
I will be happy to help if you are still in trouble; we could exchange by mail. You'll find my email address on the Help page, accessible from the icon on the App home page.
I have recently made important updates for Solaris in the last versions, so I am very interested in having people use the App for Solaris hosts 🙂
Guilhem
Hi sohnaeo,
That's interesting; I will certainly get this fixed.
One first question: is performance data correctly generated for your Solaris hosts, even with this error being reported?
Since version 1.2.07 of the TA-nmon, the nmon_helper.sh script tries to better identify the Nmon instances that are related to the Nmon App, to avoid killing Nmon instances that do not belong to the App (for example, if people also collect nmon data with their own third-party process).
On Solaris, it uses pfiles to identify the resources in use by Nmon instances, and it looks like it is trying to gather information about a non-existing process.
When this error occurs, is the PID reported the same as the PID of the running Nmon process? (Look in the nmon_collect sourcetype to get the current PID, or directly on the host in $SPLUNK_HOME/var/run/nmon/nmon.pid.)
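To compare them directly on the host, a sketch like the following could help (paths as given in this thread; `kill -0` sends no signal and only probes whether the PID exists):

```shell
# Sketch: check whether the PID recorded by the helper script is still alive.
SPLUNK_HOME="${SPLUNK_HOME:-/opt/splunkforwarder}"
pidfile="$SPLUNK_HOME/var/run/nmon/nmon.pid"

if [ -f "$pidfile" ]; then
    pid=$(cat "$pidfile")
    # kill -0 only tests process existence; it delivers no signal
    if kill -0 "$pid" 2>/dev/null; then
        echo "nmon PID $pid is running"
    else
        echo "stale PID file: process $pid is gone"
    fi
else
    echo "no PID file at $pidfile"
fi
```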
Guilhem