How can a Scheduled Report extract events by _indextime instead of by _time?

edoardo_vicendo
Contributor

Hi All,

We are ingesting batch application logs into an index (let's call it "myindex") with a scripted input that runs every 5 minutes on our UNIX production machine. The scripted input runs with this crontab:

*/5 * * * *

so it starts at minutes 0, 5, 10, 15, etc., and takes a few seconds to run.
After that, the logs are ingested by Splunk and indexed in "myindex".

Because these are custom logs, we have written several regular expressions to extract valuable fields. However, there are so many regular expressions, all applied at search time, that they slow down SPL execution on large data sets.

For this reason, we have implemented a Scheduled Report that saves its output in a summary index (let's call it "mysummaryindex").
The Scheduled Report runs every 5 minutes, extracting the last 5 minutes of data, with this cron schedule set up in Splunk:

2,7,12,17,22,27,32,37,42,47,52,57 * * * *

so it starts at minutes 2, 7, 12, 17, etc., and takes a few seconds to run.
We have delayed each run by 2 minutes to give Splunk time to index the data.

The schema below summarizes the setup described above:

[Schema image not available]

Now comes the question 🙂
Because some batch jobs are long-running, the _time of our events represents the start of the batch job, so by the time the events are indexed in "myindex" their _time can be well in the past.
For this reason, we need a way to "force" the Scheduled Report to select events based on _indextime instead of _time.

I have looked at this post:
https://answers.splunk.com/answers/171/using-indextime-to-specify-time-range.html

and tried to apply it to the SPL in the Scheduled Report as follows:

index="myindex" sourcetype="mysourcetype" host="myhost" AND source="myfiles.*"
_index_earliest=-5m@m _index_latest=@m
| rex field=_raw "myregularExpression1..."
| rex field=_raw "myregularExpression2..."
etc...

but it seems to miss some data (although it is not easy to compare what it should have picked up with what it has summarized).
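To check whether the index-time window itself could leave holes, here is a minimal Python model (not Splunk code) of how `_index_earliest=-5m@m` and `_index_latest=@m` resolve for runs dispatched a few seconds after minutes 2, 7, 12 and 17. The helper names `snap_to_minute` and `index_window` and the dates are hypothetical; the model assumes `@m` simply rounds down to the start of the minute and treats earliest as inclusive and latest as exclusive.

```python
from datetime import datetime, timedelta


def snap_to_minute(t: datetime) -> datetime:
    """Stand-in for Splunk's '@m' snap: round down to the start of the minute."""
    return t.replace(second=0, microsecond=0)


def index_window(run_time: datetime) -> tuple[datetime, datetime]:
    """Resolve _index_earliest=-5m@m and _index_latest=@m for a run dispatched at run_time."""
    earliest = snap_to_minute(run_time - timedelta(minutes=5))
    latest = snap_to_minute(run_time)
    return earliest, latest


# Hypothetical dispatch times: cron minutes 2, 7, 12, 17, each a few seconds late.
runs = [datetime(2024, 1, 1, 12, minute, 14) for minute in (2, 7, 12, 17)]
windows = [index_window(r) for r in runs]

for earliest, latest in windows:
    print(earliest.time(), "->", latest.time())

# Each window starts exactly where the previous one ended: no gap, no overlap.
for (_, prev_latest), (next_earliest, _) in zip(windows, windows[1:]):
    assert prev_latest == next_earliest
```

Running it prints the windows from 11:57:00 -> 12:02:00 through 12:12:00 -> 12:17:00, so under these assumptions the index-time filter alone should not lose events. The tiling only holds while each run is dispatched within its scheduled minute; a run delayed past the minute boundary would leave a one-minute gap on one side and an overlap on the other.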

Do you see anything wrong?

Thanks a lot,
Edoardo

Accepted solution:

edoardo_vicendo
Contributor

Hi All,

I figured out how to solve this issue.
Basically, the following instruction:

_index_earliest=-5m@m _index_latest=@m

works perfectly, but in Splunk this piece of SPL does not drive the "Splunk time range picker" as I was expecting: it only filters on _indextime, while the time range picker keeps filtering on _time. So far, I haven't found a way to override the picker.

So in my case it is important to set up the Splunk time range picker as follows:

  • Last 48 hours: needed because some events have a _time in the past (we expect no job to take more than 48 hours to finish)
  • Earliest: Beginning of hour --> NOT mandatory, but more conservative: the lower bound is always rounded down to the beginning of its hour (i.e. -48h@h)
  • Latest: Now --> mandatory (to avoid missing events)

So, with a 48-hour time range (to avoid missing any event) combined with "_index_earliest=-5m@m _index_latest=@m", you will extract only the events indexed in the last 5 full minutes (which is what I was looking for), and the _index_earliest/_index_latest time modifiers improve query performance dramatically.
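The two filters in this setup are independent: the time range picker bounds _time, while the _index_* modifiers bound _indextime, and an event must pass both. The Python model below (not Splunk code) illustrates this for one hypothetical run. The helpers `snap` and `selected` and all timestamps are made up for illustration; the model assumes snapping rounds down, earliest bounds are inclusive and latest bounds are exclusive.

```python
from datetime import datetime, timedelta


def snap(t: datetime, unit: str) -> datetime:
    """Minimal stand-in for Splunk's '@m' and '@h' snap-to modifiers (round down)."""
    if unit == "m":
        return t.replace(second=0, microsecond=0)
    if unit == "h":
        return t.replace(minute=0, second=0, microsecond=0)
    raise ValueError(f"unsupported snap unit: {unit}")


def selected(event_time: datetime, index_time: datetime, now: datetime) -> bool:
    """True if an event passes both filters of one scheduled run dispatched at `now`."""
    # Time range picker: earliest=-48h@h, latest=now (filters on _time)
    time_ok = snap(now - timedelta(hours=48), "h") <= event_time < now
    # _index_earliest=-5m@m _index_latest=@m (filters on _indextime)
    index_ok = snap(now - timedelta(minutes=5), "m") <= index_time < snap(now, "m")
    return time_ok and index_ok


now = datetime(2024, 1, 3, 12, 7, 10)   # hypothetical dispatch time of one run
job_start = now - timedelta(hours=30)   # long-running job: _time is 30 hours old
indexed = datetime(2024, 1, 3, 12, 5, 30)

print(selected(job_start, indexed, now))                          # True
print(selected(job_start, datetime(2024, 1, 3, 12, 7, 5), now))   # False: indexed after @m, left for the next run
print(selected(now - timedelta(hours=50), indexed, now))          # False: _time older than -48h@h
```

In this model, if the _time range covered only the last few minutes, `job_start` would fail the first check even though it was indexed inside the window, which is consistent with the missing data described in the question.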

Hope this can help you!
