Run summary index every 5 minutes

fk319
Builder

I can run a summary index search every hour with a time range of "-h@h" to "@h". How can I run a summary index search more often than once an hour without creating several entries?

I would like to run the cron on the 5n+1 minute and then use a time frame of "-(5m)@(5m)" to "@(5m)".

1 Solution

gkanapathy
Splunk Employee

You can get the effect with:

dispatch.earliest_time = -6m
dispatch.latest_time   = -1m
cron_schedule = 1,6,11,16,21,26,31,36,41,46,51,56 * * * *

The time offset is calculated from the scheduled time, not the actual run time, so it won't matter if the job is delayed. You would have to change the range if you changed the schedule to, e.g., 5n+2 minutes.

Unfortunately there is no "snap-to" for anything but whole minutes, hours, etc. It would be a good enhancement request to have a snap-to time range specifier for 5m (or 30s, or 15min, etc).
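To see why the 5n+1 schedule with a -6m to -1m range behaves like a five-minute snap, here is a small Python sketch of the arithmetic. It is not Splunk code; the function names and the sample date are made up for illustration, and it assumes the offsets are applied to the scheduled time as described above.

```python
from datetime import datetime, timedelta

def window_for(scheduled, earliest_offset_min=6, latest_offset_min=1):
    """Return (earliest, latest) for a run scheduled at `scheduled`.

    Mirrors dispatch.earliest_time = -6m and dispatch.latest_time = -1m,
    applied to the scheduled time rather than the actual run time.
    """
    earliest = scheduled - timedelta(minutes=earliest_offset_min)
    latest = scheduled - timedelta(minutes=latest_offset_min)
    return earliest, latest

def windows_for_hour(hour_start):
    """Windows for every run in one hour on the 1,6,11,...,56 schedule."""
    return [window_for(hour_start + timedelta(minutes=m)) for m in range(1, 60, 5)]

hour = datetime(2024, 1, 1, 12, 0)  # arbitrary example hour
windows = windows_for_hour(hour)
for earliest, latest in windows[:3]:
    print(earliest.strftime("%H:%M"), "to", latest.strftime("%H:%M"))
# 11:55 to 12:00
# 12:00 to 12:05
# 12:05 to 12:10
```

Each window starts and ends on a multiple of five minutes, and each one begins where the previous one ended, so there are no gaps and no overlaps between runs.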

alanden_splunk
Splunk Employee

You will have issues with partial buckets if you have a span defined in the search like:

| bucket _time span=5m

If the search takes more than one minute then you will not be happy with:

-5m@m to @m

The scheduling for snapping to 5 minutes looks like:

@h-5m to @h cron 0 * * * *
@h to @h+5m cron 5 * * * *
@h+5m to @h+10m cron 10 * * * *
@h+10m to @h+15m cron 15 * * * *
...
@h+50m to @h+55m cron 55 * * * *

Simply increase the number after the "cron" to allow a few more minutes for incoming data to get indexed before the search kicks off.
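As a rough illustration of the table above and of what "increase the number after the cron" does, here is a Python sketch that generates the twelve rows. The helper names are made up for this example, and it assumes the buffer is smaller than the span, so the cron minute stays below 60 and @h still refers to the same hour.

```python
def offset(minutes):
    """Format a minute offset relative to the top of the hour, e.g. '@h+5m'."""
    if minutes == 0:
        return "@h"
    sign = "+" if minutes > 0 else "-"
    return f"@h{sign}{abs(minutes)}m"

def hourly_schedule(buffer_minutes=0, span=5):
    """Return (earliest, latest, cron) rows, one per span-sized window per hour.

    buffer_minutes is added to the cron minute only; the window itself
    stays snapped to the span boundaries.
    """
    rows = []
    for end in range(0, 60, span):
        cron = f"{end + buffer_minutes} * * * *"
        rows.append((offset(end - span), offset(end), cron))
    return rows

for earliest, latest, cron in hourly_schedule(buffer_minutes=2)[:3]:
    print(earliest, "to", latest, "cron", cron)
# @h-5m to @h cron 2 * * * *
# @h to @h+5m cron 7 * * * *
# @h+5m to @h+10m cron 12 * * * *
```

With buffer_minutes=0 this reproduces the rows listed above; a non-zero buffer only moves the cron minute, leaving the time ranges unchanged.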

martin_mueller
SplunkTrust

I wouldn't recommend running latest=@h on a 0 * * * * schedule - you'll inevitably miss events, even without any failures in your input pipeline. A good band-aid is doing what Gerald recommended eight years ago: introduce a minute of indexing delay buffer.

I also wouldn't recommend defining twelve searches. Just go with 1-59/5 * * * * or Gerald's long-form version that lists all twelve minutes past the hour at which to run. The buffer minute from above will make sure you're always snapped to five-minute intervals... IF the original question from eight years ago was using bin span=5m, who knows.
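For anyone wondering whether the short and long cron forms really are the same, here is a quick Python check. The parser is a simplified sketch written for this post (it handles only "*", single values, ranges, comma lists and "/step" on a range or "*"), not the scheduler's actual implementation.

```python
def expand_minute_field(field):
    """Expand a cron minute field into a sorted list of minutes.

    Simplified: supports '*', 'n', 'a-b', comma-separated lists,
    and an optional '/step' on a range or '*'.
    """
    minutes = set()
    for part in field.split(","):
        rng, _, step = part.partition("/")
        step = int(step) if step else 1
        if rng == "*":
            lo, hi = 0, 59
        elif "-" in rng:
            lo_s, hi_s = rng.split("-")
            lo, hi = int(lo_s), int(hi_s)
        else:
            lo = hi = int(rng)
        minutes.update(range(lo, hi + 1, step))
    return sorted(minutes)

short_form = expand_minute_field("1-59/5")
long_form = expand_minute_field("1,6,11,16,21,26,31,36,41,46,51,56")
print(short_form == long_form)  # True
```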

alanden_splunk
Splunk Employee

I appreciate your feedback and I'll simply respond with the same commentary I gave another customer regarding a similar case operating on a 10 min interval:

After some testing to verify the saved search mechanics, I have verified that the time snaps of the saved search do not change with delays in the search time execution, and thus the extra work of creating the six searches per hour may not seem necessary (though not harmful either). If a search is scheduled to run at 3:10 and snap to the minute (-10m@m to @m), and it runs at 3:12, the time range of the search will be from 3pm to 3:10pm because it acts as if it was actually run at 3:10 despite the 2 minute delay. That said, having the six searches increases the time period allowed for search kick-off (or completion) delays from 10 minutes to 60 minutes. As we have seen with the user search queue limitations, if a search is delayed past the time of the next scheduled run, it will run the next scheduled run and skip the others.

Also, we should consider the delay of indexed data. If a search runs over the last 10 minutes, it is possible that some of that data has not been received and indexed by the time the search kicks off. Increasing the time between the end of the search window and the start of the search helps ensure that all of the data is seen, but it also increases the delay in seeing the data. The current data delay is between 0 and 10 minutes (and effectively infinite for data not indexed before the search runs). If you allow 5 minutes for the latest data in a given period to finish indexing, then the delay in seeing the data is between 5 and 15 minutes. Thus a search over :00-:10 will run at :15 past the hour. If you want to change the cron to add a slight delay to the search for its search period, you’ll need to consider how much time is acceptable for your team.
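The delay figures above follow from simple arithmetic, sketched here in Python. The function names are invented for this example; the model assumes an event at the very end of a window waits only for the buffer, while an event at the very start waits for the whole window plus the buffer.

```python
def visibility_delay(window_minutes, buffer_minutes):
    """Return (min_delay, max_delay) in minutes before an event is summarized."""
    return buffer_minutes, window_minutes + buffer_minutes

def run_minute(window_end_minute, buffer_minutes):
    """Minute past the hour at which the search for that window should run."""
    return (window_end_minute + buffer_minutes) % 60

print(visibility_delay(10, 0))  # (0, 10)  no buffer
print(visibility_delay(10, 5))  # (5, 15)  5-minute indexing buffer
print(run_minute(10, 5))        # 15       the :00-:10 window runs at :15
```

The same functions can be used to try other combinations, for example a 5-minute window with a 1-minute buffer.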

Thus, while the other approach appears to work fine, this approach tolerates longer delays and other potential issues, especially for searches across much smaller time spans, where such delays become far more noticeable.

fk319
Builder

I am using this to build a summary index. I ran into issues where I was duplicating data, so I want to be careful how I run my script. Currently I am getting my data once an hour, but during an issue I will want updates much more quickly. I think 5 minutes is a good frequency.

ftk
Motivator

Not sure I understand your question -- what is holding you back from just defining that schedule?
