File-based statistics collection for files whose events cover large windows of time

Lucas_K
Motivator

I have a scheduled search that creates statistics based on individual files. These searches run once per hour.

i.e. a log comes in, and stats are generated about the events inside this file.

My problem is that events in this file typically range from a few hours old to DAYS old (the oldest one I found was 45 days old, but a dozen days is more typical).

So the main problem is that events are being inserted into the past. I had previously looked at the Answers posts about late events and none of them helped me.

I have found a way around this issue by only generating statistics for a batch file once I see a specific "HDR" (header) text marker. By looking only for specific source files, the search runs very fast and only looks at applicable events. Also, by manually specifying a large search window (to override the 1 hour time window used by the saved search), it is able to capture all the events that were originally contained in that source file.

i.e. index=ivr [search index="ivr" "HDR," earliest=-1h latest=now| fields source ] earliest=-365d latest=+365d | dedup _raw | REX field=source "(?ain\S+[a-zA-Z0-9]+$)" | bucket _time span=1h | stats first(date_mday) AS FirstDay, last(date_mday) AS LastDay, first(date_month) AS Month, first(date_year) AS Year, last(date_hour) AS Hour, last(date_minute) AS Minute, first(_time) AS File_Time, some other stats calcs here by Type Batch

So in a nutshell: "if you see a header marker, find ALL the events from ALL (or the equivalent of) time for that particular source file and calculate statistics on them."

Now this is fine and actually captures the data "per file" but has the side effect of breaking how summary index results are searched for.

By hardcoding the earliest and latest parameters, the summary index now saves its stash entries with dates from 1 year ago. So if I search for summary results over the last week, I get no results. However, if I check 1 year ago, there they are. This is because "info_min_time" is used by the normal search as the "_time" value by default. As this is metadata, I don't think there is any way I can change it for summary index results that have already been created. So it's not possible for me to do something like "| rename info_min_time as _time".

Does anyone know how I can trick a search on the summary results into using another time field, OR how to modify a scheduled search so that it searches outside of its scheduled time window without breaking the summary index results as I already have?


Lucas_K
Motivator

Actually, I think I might be able to just use "rename info_max_time AS _time" in the summary search itself.

edit: trying " | addinfo | eval info_min_time=File_Time" instead.

edit2: none of these work, as the saved search's addinfo takes precedence when the stash is saved to the index. Using "addinfo" just creates a second set of info_xxxx_time fields that are ignored in later searches on the summary index.

A workaround is to put "earliest=0" in my summary index search and then use more evals to shuffle the fields around for display. Ugly as hell, and it also breaks the ability to get something useful out of the "view results" link 😞
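A rough sketch of what that workaround could look like when reading the summary results back. This is only an illustration under assumptions: the summary index name "summary" and the 7 day window are hypothetical, and it assumes File_Time was saved into the summary as an epoch value by the stats clause in the question. The idea is to search all time, overwrite _time with the file's own timestamp, and then filter on that instead of the stored summary time.

```spl
index=summary earliest=0
| eval _time=File_Time
| where _time >= relative_time(now(), "-7d")
| sort 0 - _time
```

Because the time filter is applied after the events are retrieved, the search still has to scan the whole summary index, which is part of why this feels ugly.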

edit3: OK, the solution of putting earliest=0 into savedsearches.conf fixes the issue of the saved search recreating incorrect info_xxxx_time fields. So a normal search on the summary index (i.e. last hour) will correctly show summary searches run within this time. yay.
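For anyone trying the same thing, here is a minimal sketch of how that might look in savedsearches.conf. It assumes "earliest=0" is applied as the saved search's dispatch window (dispatch.earliest_time) rather than inline in the search string; the stanza name, cron schedule and summary index name are hypothetical, and the search line is left as a placeholder.

```ini
# savedsearches.conf -- stanza name, schedule and index name are hypothetical
[ivr_per_file_stats]
enableSched = 1
# runs once per hour, as described in the question
cron_schedule = 0 * * * *
# Assumption: "earliest=0" is set as the dispatch window of the saved search
dispatch.earliest_time = 0
# write the results to a summary index
action.summary_index = 1
action.summary_index._name = summary
# search = <the per-file stats search from the question>
```

After changing it, it is worth checking a freshly written summary event to confirm that its _time falls where a normal "last hour" search expects it.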
