Splunk for *NIX datetime fields not extracted

geek238
Engager

While investigating a search to find the maximum CPU usage per day for a particular process, I was going to use "<search> | multikv | etc. | stats max(pctCPU) as MaxPctCPU by date_mday". However, the default datetime fields are not being generated, so date_mday doesn't exist.

Odd.

If I do a simple "index=os sourcetype=ps" search, the timestamp field shows empty, yet _time is populated. Of course, without timestamp, the date_* fields don't get populated. Searching across all sourcetypes gives the same result.

Something specific in the app config? I haven't found it so far.

Added:

For this exercise, I have a process 'foo'. Said process 'foo' runs on a farm of hosts. Also, there can be multiple processes of 'foo' running. httpd could be an example.

So, for the first part:


index="os" sourcetype="ps" host="foohosts*" |
multikv fields pctCPU, COMMAND, host |
search COMMAND=foo |
stats sum(pctCPU) as sumpctCPU by _time,host |
timechart avg(sumpctCPU) as avgpctCPU by host

Here we use stats to sum the CPU hit for 'foo' in each event and then feed that to timechart, which for a single day averages it into 30-minute buckets. Let's say we just tell Splunk to search 'Yesterday'.

To find the maximum value of the day, we can add:

 | stats max(*) as *

or
 | timechart span=1d max(*) as *

and for a single day, they will give you an equivalent single row of data: the maximum observed value of the calculated 30-minute averages for Yesterday. This is useful if you want daily observed maximums of the averaged buckets to, say, plot trendlines over months and estimate when capacity runs out.

If I tell Splunk to do the last 7 days, however, I get a different value for Yesterday's max. It's close, but not the same value. Why would it be different?

ADDED FURTHER:

Further exploring, when I used earliest and latest to define the time range instead of the menu selector, I got more consistent results. Credit goes to the accepted answer for solving my original problem. I'm just tossing this out there in case someone else runs across this behavior.
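One possible reason for the discrepancy, offered as a sketch rather than a confirmed diagnosis: if timechart picks a wider automatic span when the search range is wider, the per-bucket averages change, and so does the maximum of those averages. The Python below (not SPL) uses hypothetical sample values and a hypothetical helper, max_of_bucket_averages, to show the effect of bucket width alone.

```python
from collections import defaultdict


def max_of_bucket_averages(samples, span_seconds):
    """Average values inside fixed-width time buckets, then return the largest average."""
    buckets = defaultdict(list)
    for ts, value in samples:
        # Snap each timestamp down to the start of its bucket.
        buckets[ts - ts % span_seconds].append(value)
    return max(sum(vals) / len(vals) for vals in buckets.values())


# Hypothetical per-event CPU sums, one sample every 15 minutes (epoch seconds).
samples = [(0, 10.0), (900, 20.0), (1800, 80.0), (2700, 40.0)]

half_hour_max = max_of_bucket_averages(samples, 1800)  # 30-minute buckets
hourly_max = max_of_bucket_averages(samples, 3600)     # 1-hour buckets
print(half_hour_max, hourly_max)  # 60.0 37.5
```

The same data yields a different "daily maximum" depending only on the bucket width, so pinning the span explicitly (for example span=30m) should make the two searches comparable.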

(The <pre> tags don't seem to respect the asterisks: the expressions above are meant to read max(*) as *.)

1 Solution

gkanapathy
Splunk Employee

The date_* fields are not necessarily populated from the _time field. Rather, they come from the raw data, so if your timestamp comes from the current time, the filename, or the file's last modification time, or is in epoch time, the event won't have those fields. In the case of the Unix app, the timestamp is set from the current time, not from the raw data.

I hate the date_* fields, because they're unreliable, not corrected for time zone, and don't sort properly. Use ... | timechart span=1d max(pctCPU) instead to get the results you want.
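To illustrate the idea outside Splunk: timechart buckets on _time, so the day is derived from the event's epoch time in a chosen time zone rather than from text in the raw event. The Python sketch below (not SPL) does the equivalent grouping; daily_max and the sample events are hypothetical names and values used only for illustration.

```python
from datetime import datetime, timezone


def daily_max(events, tz=timezone.utc):
    """Return {date: max value}, deriving the day from each event's epoch time."""
    result = {}
    for epoch, value in events:
        # The day comes from the epoch timestamp, interpreted in an explicit time zone.
        day = datetime.fromtimestamp(epoch, tz=tz).date()
        result[day] = max(value, result.get(day, value))
    return result


# Hypothetical (epoch_seconds, pctCPU) pairs spanning two UTC days.
events = [(0, 12.5), (3600, 40.0), (86400, 7.0)]
per_day = daily_max(events)
for day in sorted(per_day):
    print(day, per_day[day])
```

Because the keys are real dates, they sort chronologically and respect the chosen time zone, which are the two properties the date_* fields lack.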

geek238
Engager

In my particular instance, I can. I know there are 4 invocations of the process, as it's single-threaded. But, I agree, you do need to know what your data is like.

gkanapathy
Splunk Employee

I would recommend you look at the bucket search command to use with stats. I'm not sure that taking the sum of measurements and then averaging those sums is mathematically that meaningful, unless you can guarantee that you have the same number of measurements in every interval/bucket/host, which I don't think is true for many processes.
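A small numeric sketch of that concern, in Python rather than SPL and with made-up values: when the number of process rows per event varies, the average of the per-event sums shifts even though each process uses the same CPU. The helper name average_of_sums is hypothetical.

```python
def average_of_sums(events):
    """Sum pctCPU across the process rows of each event, then average those sums."""
    sums = [sum(rows) for rows in events]
    return sum(sums) / len(sums)


# Each inner list holds the pctCPU of every 'foo' process seen in one ps event.
steady = [[10.0, 10.0, 10.0, 10.0], [10.0, 10.0, 10.0, 10.0]]  # 4 processes in both events
uneven = [[10.0, 10.0, 10.0, 10.0], [10.0, 10.0]]              # second event saw only 2

steady_avg = average_of_sums(steady)
uneven_avg = average_of_sums(uneven)
print(steady_avg, uneven_avg)  # 40.0 30.0
```

Per-process usage is 10% in both cases, yet the result drops from 40 to 30 purely because the process count changed, so the number is only meaningful when that count is known to be constant.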

geek238
Engager

OK, yep, that points me down the road. I do get some interesting differences in some results.

I'll edit my original question with additional data.
