Disclaimer: I'm not saying this particular example is useful analysis - I'm just not sure how to think about solving a problem like this in Splunk properly.
I have thousands of events of Zabbix data where socket-wide data points are normalized into individual events, e.g. system.cpu.util[socket,core,type], across heterogeneous hardware configurations (i.e. the number of sockets or cores differs).
I want to understand the distribution of the load across a socket by machine modeltype to ensure it matches up to temperature readings - and then flag outliers (either on temperature or idle cores).
I've seen tricks around extracting the itemKey into named fields, which I think works because the timestamp is exactly the same... but how do you run stats on fields that might not exist (e.g. socket 4 or core 20)?
Does any of this make sense?
... host=hostname | eval socket=if(isnull(socket),"null",socket) | timechart avg(value) max(value) by socket
AND
... host=hostname | eval core=if(isnull(core),"null",core) | timechart avg(value) max(value) by core
should be fine on a host-by-host basis. Both would work well on a dashboard with a drop-down list to select the hostname.
... | eval socket=if(isnull(socket),"null",socket) | eval core=if(isnull(core),"null",core) | stats avg(value) max(value) by host core socket
The above should be fine for an analyst who wants to select specific time ranges with the time picker and see whether activity spikes occurred.
Thank you for the helpful suggestion. I'm looking for more aggregate trends across a class of hosts with different underlying hardware models - which sort of precludes individual host analysis with eyeballs...
Please provide some sample data.
It's not very exciting (one row per pseudo-event):
_time,host,itemKey="system.cpu.util[user_utilization,#socket,#core,]",value=int
....
_time,hostN,itemKey="system.cpu.util[user_utilization,#socket,#core,]",value=int
Currently, the query uses rex to extract #socket and #core into new fields...
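For reference, a minimal sketch of that kind of extraction, assuming the key layout shown in the sample rows (type, then socket, then core). The field names type, socket and core, and the aliases avg_util and max_util, are my own choices for illustration, not anything Zabbix or Splunk provides:

```
... | rex field=itemKey "system\.cpu\.util\[(?<type>[^,\]]*),(?<socket>[^,\]]*),(?<core>[^,\]]*)"
| stats avg(value) AS avg_util max(value) AS max_util by host socket core
```

Because stats only emits rows for by-field combinations that actually occur in the events, a host with no socket 4 or core 20 simply produces no row for them, so the socket/core set does not have to be known in advance.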
What will the value field contain?
An integer from 1 to 100 representing utilization... the equivalent of /proc/stat.
Is the list of possible socket/core values fixed?
It is hard to predict the socket/core count... but it is a finite set.
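Since the set is finite but not known up front, one possible approach is to let stats discover the socket/core combinations and then compare each core against its peers on the same hardware model. This is only a sketch under assumptions: host_models is a hypothetical lookup that maps host to a model field, the rex pattern assumes the key layout from the sample rows, and the two-standard-deviation threshold is an arbitrary starting point:

```
... | rex field=itemKey "system\.cpu\.util\[(?<type>[^,\]]*),(?<socket>[^,\]]*),(?<core>[^,\]]*)"
| lookup host_models host OUTPUT model
| stats avg(value) AS core_avg by model host socket core
| eventstats avg(core_avg) AS model_avg stdev(core_avg) AS model_stdev by model socket
| where abs(core_avg - model_avg) > 2 * model_stdev
```

The first stats collapses the raw events to one row per core that actually exists, eventstats attaches the per-model, per-socket baseline to each of those rows, and the final where keeps only the cores that deviate from that baseline. The same pattern could be applied to temperature readings if they are indexed with comparable fields.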