How to have my summary index only index on the calculated value?

burras
Communicator

I have a sourcetype that has a tremendous amount of data - we use this data to calculate an overall number of calls per second on any given day and ensure that we're staying within our licensing capacity. This data is collected every 30 seconds for a large number of devices and we need granularity down to the 30s level when looking at daily data. I'm able to successfully timechart over the course of a given day with no issues using the following search:

index=foo sourcetype=bar |timechart span=30s eval((sum(field1)+sum(field2)+sum(field3)+(sum(field4)/10)/30)) as CPS

When we start looking at data over a longer timeframe, we still need to maintain the ability to identify peaks at the 30s granularity, but we no longer need to see every 30s period. My goal was to take the data from the above search and send it to a summary index. We don't need all of the underlying data after we calculate the CPS, just the timestamp and the CPS value itself for that timestamp. Our goal is to display a long-term view that only shows the max for any given day but that could be drilled into to see what specific time that timestamp happened.

I configured a summary index on my instance using the following search:

index=foo sourcetype=bar earliest=-3d@d latest=-2d@d |sitimechart span=30s eval((sum(field1)+sum(field2)+sum(field3)+(sum(field4)/10)/30)) as CPS

This search is scheduled to run daily and lags slightly to allow for daily data delivery. It runs just fine. The problem is that after summarization, there is no entry in the summary index for CPS; instead it has the values for each of the subfields. When I want to search against the summary index for CPS, I then need to do all of the calculations over again, and as soon as I go over a timeframe greater than one day, we start running into issues of too many rows being returned in the search. These are the searches I've tried:

index=summary_foo search_name="foo_cps_summary" |timechart span=1d max(CPS) -- returns no values

index=summary_foo search_name="foo_cps_summary" |timechart span=30s eval((sum(field1)+sum(field2)+sum(field3)+(sum(field4)/10)/30)) as CPS |timechart span=1d max(CPS) -- works great as long as I do it for one day, but returns too many rows over a longer timeframe since it continues to calculate each 30s interval individually

I'm sure I probably just screwed something easy up in the search that I'm summarizing against. What am I missing so that I can summarize only the CPS value for each 30s span and not anything else and then be able to search on it?

1 Solution

rjthibod
Champion

You have run into one of the challenges of the si- commands. They cannot figure out what you intend to do with the individual fields used inside the eval() expression that calculates your CPS field, so they store the underlying sums instead of CPS itself.

Here is one way to write the summary search so that the reporting search stays as simple as you want it to be.

index=foo sourcetype=bar earliest=-3d@d latest=-2d@d
| fields _time field1 field2 field3 field4
| bin _time span=30s
| fillnull value=0 field1 field2 field3 field4
| stats sum(field*) as field* by _time
| eval CPS = field1 + field2 + field3 + (field4/10/30)
| fields _time CPS
| sitimechart span=30s sum(CPS) as CPS
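
With a summary like that in place, the reporting side might look like the sketch below. It assumes the summary index and saved-search names used in the question (summary_foo and foo_cps_summary) and mirrors the sitimechart call before taking the daily maximum; max_CPS is just an illustrative output name. Treat it as an untested outline rather than a verified query.

```
index=summary_foo search_name="foo_cps_summary"
| timechart span=30s sum(CPS) as CPS
| timechart span=1d max(CPS) as max_CPS
```

Narrowing the time range to a single day and dropping the last line should bring back the 30s view for drilling into a specific peak.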

niketn
Legend

Alternatively, you can use the collect command in a scheduled search to push the data to a summary index.

1) Create your search with timechart. Set earliest and latest to -3d@d and -2d@d.
2) Since your smallest bucket is 30 seconds, which is not a whole time bucket such as 1 day, 1 hour, or 1 minute, you need to define that lowest bucket explicitly.
3) Create your own timestamp (%Y/%m/%d %H:%M:%S) for every 30 seconds and overwrite the _time field with it.
4) Save the search as a scheduled search that runs every day at an appropriate time.
5) Use Settings > Searches, reports, and alerts to edit the scheduled search.
6) Use the collect command to send the stats to your summary index, and set the addtime parameter so that _time is added to the indexed data.

https://docs.splunk.com/Documentation/Splunk/latest/SearchReference/Collect
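
The steps above could be sketched roughly as follows. This is an assumption-laden outline, not a tested search: the index name summary_foo is borrowed from the question, the CPS expression is copied from the accepted answer, and timechart is left to set _time to the start of each 30-second bucket instead of building a formatted timestamp string as step 3 describes.

```
index=foo sourcetype=bar earliest=-3d@d latest=-2d@d
| timechart span=30s sum(field1) as field1 sum(field2) as field2 sum(field3) as field3 sum(field4) as field4
| fillnull value=0 field1 field2 field3 field4
| eval CPS = field1 + field2 + field3 + (field4/10/30)
| fields _time CPS
| collect index=summary_foo addtime=true
```

Because collect writes ordinary result rows, each summary event should carry only the timestamp and the CPS value, so a later search can aggregate CPS directly without recomputing it from the subfields.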


burras
Communicator

Giving this a shot, I'll let you know how it works out. In the line: stats sum(field*) as field* by _time, if the field names aren't actually field1, field2, etc., but actually more complicated names without overlap, I assume I would need to run multiple stats commands to populate that? (i.e. stats sum(HICRCount) as HICR, sum(IPTFCount) as IPTF by _time)? Would I need to specify by time for each of them? Or can that be handled globally at the end?
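
On the stats question: a single stats command accepts several aggregations, and one by clause at the end applies to all of them. A minimal sketch using only the two field names mentioned above (the real search would list every field the same way):

```
| bin _time span=30s
| stats sum(HICRCount) as HICR sum(IPTFCount) as IPTF by _time
```

Any later fillnull or eval would then need to refer to the renamed fields (HICR, IPTF, and so on), since eval returns null when any field in the expression is missing.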

burras
Communicator

While setting this up I ran into an issue: everything through line 5 appears to be working (I see the truncated field list in my events), but the eval CPS command in line 6 doesn't appear to be calculating correctly (or at all).

I modified line 5 to account for the different field names like this: stats sum(a1) as a1count, sum(b2) as b2count... and then have line 6 as eval CPS = (a1count + b2count ... + (f6count/10)/30), but when I timechart it I get the list of _time values with no corresponding CPS calculation. And in the events I see no field for CPS, just the a1 - f6 fields that were created with the stats command.

burras
Communicator

Okay, I played around with this a little more and renamed all of my crazily named fields to actually be field1, field2, etc. so that I could use your exact syntax. Lo and behold, once I did that it worked perfectly. Must have been something that I was doing wrong with the field names.

Long and short of it, however, is that using the syntax you provided above I was able to successfully store just the CPS field and now search on it because of the more limited returned number of fields. Thanks for the help!

rjthibod
Champion

Glad it was useful. Sorry I was unable to reply sooner. I was out of office on vacation for the past 6 days.
