Is it possible to report on a server's CPU, network, and memory utilization in Splunk?

wellhung
Explorer

Hi,

I have been looking at network monitoring tools such as PRTG, Zabbix, etc. to produce weekly reports on Windows servers and a few in-house apps. None of them can do what I want without heavy customization.

I found Splunk by chance. I have already installed Splunk Light and have a couple of Windows servers forwarding Application and System events as well as text logs from our apps. I can already see the possibilities.

I am wondering if anyone can provide me with some answers:

To be clear, the goal right now is weekly reporting, not necessarily active monitoring.

What I want to report on:

  1. When a server's CPU usage stays above 80% for more than 20 seconds, record that event and include the top 5 processes using the most CPU during that period. Is this possible? I understand that a query might return perfmon counters that can be filtered with the above conditions, but I am not sure whether it can include the top processes. Could this be offloaded to a PowerShell script whose output is appended to the report?
  2. Same question, but for network and memory utilization.
  3. As mentioned before, I added a custom log file to the universal forwarder. This log file is regularly rolled over. The monitored file is always named "file.log"; once it reaches a certain size it is backed up and renamed to file.log.1. My question: does Splunk understand when the file it is monitoring is moved and then recreated? Is there a chance of missing new data (lines) during rollovers?

Thanks!

sundareshr
Legend

These online docs should help answer your questions. The short answer is yes, Splunk can handle the scenarios you describe. Splunk has a robust set of SPL commands for transforming time-series data into meaningful, actionable reports and dashboards.

http://docs.splunk.com/Documentation/Splunk/6.2.1/SearchReference/Commandsbycategory

This link explains how Splunk handles log rotation:

http://docs.splunk.com/Documentation/Splunk/6.4.1/Data/Howlogfilerotationishandled
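
To make the rotation question concrete, here is a minimal Python sketch of the general idea behind rotation-safe tailing: remember a checksum of the start of the file plus a read offset, and start again from the top when the start of the file no longer matches. This is an illustration only, not Splunk's implementation. The Tailer class, the 256-byte fingerprint length, and the file names are assumptions made for the example.

```python
import os
import tempfile
import zlib


class Tailer:
    """Hypothetical tailer: tracks a CRC of the file head and a read offset."""

    def __init__(self, path, length=256):
        self.path = path
        self.length = length            # assumed fingerprint size in bytes
        self.head_len = 0               # how many bytes were fingerprinted last time
        self.crc = zlib.crc32(b"")      # CRC of that fingerprinted prefix
        self.offset = 0                 # how far we have already read

    def read_new(self):
        """Return data not seen before; restart at 0 if the file was replaced."""
        with open(self.path, "rb") as f:
            head = f.read(self.length)
            # Compare only the prefix that was fingerprinted on the previous read,
            # so a file that is still growing is not mistaken for a new one.
            same_file = (
                len(head) >= self.head_len
                and zlib.crc32(head[: self.head_len]) == self.crc
            )
            if not same_file:
                self.offset = 0
            self.head_len = len(head)
            self.crc = zlib.crc32(head)
            f.seek(self.offset)
            data = f.read()
            self.offset = f.tell()
        return data.decode()


# Demo: write, append, then rotate file.log to file.log.1 and recreate it.
with tempfile.TemporaryDirectory() as d:
    log = os.path.join(d, "file.log")
    with open(log, "wb") as f:
        f.write(b"a1\na2\n")
    tailer = Tailer(log)
    first = tailer.read_new()
    with open(log, "ab") as f:
        f.write(b"a3\n")
    second = tailer.read_new()
    os.rename(log, log + ".1")
    with open(log, "wb") as f:
        f.write(b"b1\n")
    third = tailer.read_new()
    print(repr(first), repr(second), repr(third))
```

Note what the sketch does not cover: lines written to the old file between the last read and the rename are lost unless the renamed file is followed as well, and two files that start with identical bytes would be mistaken for the same file. The linked page is the place to check how Splunk itself deals with those cases.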

wellhung
Explorer

Hi,

I have this:

sourcetype="Perfmon:CPU Load" counter="% Processor Time" earliest="-7d@d" | bucket _time span=20s | stats avg(Value) as avgCPU by _time | where avgCPU > 80

Seems to work; I had to lower the avgCPU threshold to test it.
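
For readers who want to see what the search computes, here is a rough Python equivalent of the bucket / stats avg / where steps. It is a sketch only: the function name and the sample data are made up, and it assumes the samples are (epoch seconds, CPU percent) pairs for a single host.

```python
from collections import defaultdict


def high_cpu_windows(samples, span=20, threshold=80.0):
    """Return {window_start: average} for windows whose average exceeds threshold.

    samples: iterable of (epoch_seconds, cpu_percent) pairs (assumed format).
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        # Snap each timestamp down to the start of its window,
        # which is what `bucket _time span=20s` does to _time.
        buckets[int(ts) - int(ts) % span].append(value)
    averages = {start: sum(v) / len(v) for start, v in buckets.items()}
    # Keep only the windows above the threshold, in time order.
    return {start: avg for start, avg in sorted(averages.items()) if avg > threshold}


# Made-up samples taken every 10 seconds.
samples = [(0, 90.0), (10, 95.0), (20, 50.0), (30, 60.0), (40, 85.0), (50, 79.0)]
windows = high_cpu_windows(samples)
print(windows)  # {0: 92.5, 40: 82.0}
```

One caveat the sketch makes visible: a 20-second average above 80 is not quite the same as staying above 80 for 20 seconds. The last window averages 82 even though one of its samples is 79.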

Reading through your link, though, I have yet to find a way to include the top processes whenever the CPU goes above the threshold. I don't think perfmon keeps that data.

I was hoping the Universal Forwarder, since it offered the option to monitor CPU during setup (I suppose using Perfmon...), might be able to run scripts when a threshold is crossed and append the script output to the events sent to Splunk.

I mean, it is a lot of documentation, and I was admittedly skimming based on the names of those commands. Did I miss anything? Can I get some direction?

Thanks!

sundareshr
Legend

wellhung
Explorer

That looks about right, but I can't seem to make it work.

On each server's universal forwarder I added this to \etc\system\local\wmi.conf:

[WMI:SessionProcess]
interval = 10
disabled = 0
index = perfmon_index
wql = Select ProcessId, SessionId From Win32_Process

And to \etc\system\local\inputs.conf:

[perfmon://Process]
interval = 10
object = Process
counters = % Processor Time; ID Process; Working Set - Private; IO Read Operations/sec; IO Write Operations/sec
instances = *
index = perfmon_index
disabled = 0
mode = multikv

Then I restarted the UniversalForwarder service. It doesn't seem to forward any process data.

Also, in the samples inside the default wmi.conf and index.conf, the index is usually "perfmon" and not "perfmon_index". Which one should it be?

Any ideas?

Thanks!
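
Once per-process samples like the ones the [perfmon://Process] stanza above is meant to collect do arrive, picking the top processes for a busy window is a group-and-sort. The Python sketch below shows that logic outside Splunk. The function name, the (epoch seconds, process name, CPU percent) tuple format, and the sample values are all assumptions for illustration.

```python
from collections import defaultdict


def top_processes(proc_samples, window_starts, span=20, n=5):
    """Return {window_start: [(process_name, avg_cpu), ...]}, at most n per window.

    proc_samples: iterable of (epoch_seconds, process_name, cpu_percent) (assumed format).
    window_starts: start times of the windows already flagged as high CPU.
    """
    wanted = set(window_starts)
    per_window = defaultdict(lambda: defaultdict(list))
    for ts, name, value in proc_samples:
        start = int(ts) - int(ts) % span
        if start in wanted:
            per_window[start][name].append(value)
    result = {}
    for start in sorted(per_window):
        # Average each process over the window, then sort highest first.
        averages = [(name, sum(v) / len(v)) for name, v in per_window[start].items()]
        averages.sort(key=lambda item: item[1], reverse=True)
        result[start] = averages[:n]
    return result


# Made-up per-process samples; only the window starting at 0 was flagged.
proc_samples = [
    (0, "sqlservr", 60.0), (0, "w3wp", 20.0), (0, "Idle", 5.0),
    (10, "sqlservr", 70.0), (10, "w3wp", 10.0),
    (20, "sqlservr", 5.0),
]
top = top_processes(proc_samples, window_starts=[0], n=2)
print(top)  # {0: [('sqlservr', 65.0), ('w3wp', 15.0)]}
```

The same shape should carry over to a search: restrict the per-process events to the flagged time windows, average by process, sort, and keep the first five.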
