What is the cleanest way to store when a search was last run?

dsollen
Explorer

I have a search that runs to generate data and writes it to a CSV to be processed later by another program. Due to the nature of the other program, I need to ensure that I never output the same data twice in my CSV, or it will count the results twice and give inaccurate scores. So I want my saved search, which runs on an automated schedule, to return only results that were added since the last time the search was run.

What is the cleanest way to do this? So far I only know of two options, and I don't really like either. The first is to hard-code the interval between runs into the search itself: if my search runs every 8 hours, I add a criterion to the search that looks for _time > current time - 8 hours. Of course, if someone changes the interval at which this search runs, or runs the search manually, this breaks.

The second is to save a text file with a "last run" date, load it, and make the search look for _time > last_run. However, I don't know how to do this entirely within Splunk. I only know how to do it with a separate Python script using the Splunk SDK, which I can do but would prefer to avoid.

Is there a cleaner way to maintain an awareness of the last time a search was run so that I only look at any data that was added to Splunk since that time?

hsesterhenn_spl
Splunk Employee

Hi,

For finding information about the searches themselves, index=_audit would be a better option, because it gives you detailed information about the type of search (scheduled, accelerated, ad hoc, etc.).

Your second option could be done using "outputcsv" or other lookup techniques.
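
Whatever stores the timestamp, the pattern is the same: read the stored value, search from there up to now, and advance the stored value only after the output succeeded. Below is a minimal Python sketch of that pattern, with a plain file standing in for the lookup. It does not call Splunk; the file handling and the names `read_checkpoint`, `write_checkpoint`, `run_once`, and the `export` callback are all hypothetical.

```python
import os
import tempfile
import time

def read_checkpoint(path, default=0):
    """Return the stored epoch timestamp, or `default` if none exists yet."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (FileNotFoundError, ValueError):
        return default

def write_checkpoint(path, epoch):
    """Replace the checkpoint atomically so a crash cannot leave it half-written."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    with os.fdopen(fd, "w") as f:
        f.write(str(epoch))
    os.replace(tmp, path)

def run_once(path, export, now=None):
    """Export everything between the last run and now, then advance the checkpoint."""
    now = int(time.time()) if now is None else now
    last_run = read_checkpoint(path)
    export(last_run, now)        # hypothetical callback: runs the search, writes the CSV
    write_checkpoint(path, now)  # only reached if export() did not raise
    return last_run, now
```

Because the checkpoint is written last, a failed export leaves the old timestamp in place and the next run simply covers the missed range as well.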

HTH,

Holger

martin_mueller
SplunkTrust

Firstly, the hard-coded way isn't that bad to do in Splunk - if you have a search running on 0 */8 * * * you just set your time range to be eight hours long and you're pretty much there.
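
To illustrate why an eight-hour range lines up with that schedule, here is a small Python sketch of the arithmetic only, not Splunk code. It assumes the range is snapped to the hour (something like earliest=-8h@h latest=@h); the function name `window_for_run` is made up for the example.

```python
from datetime import datetime, timedelta

def window_for_run(run_time, hours=8):
    """Return the (earliest, latest) range for a run, snapped to the hour."""
    latest = run_time.replace(minute=0, second=0, microsecond=0)
    return latest - timedelta(hours=hours), latest

# Three consecutive runs of a 0 */8 * * * schedule, each starting a few seconds late.
runs = [datetime(2024, 1, 1, h, 0, 7) for h in (0, 8, 16)]
windows = [window_for_run(r) for r in runs]

# Each window starts exactly where the previous one ended: no gap, no overlap.
for (_, prev_latest), (next_earliest, _) in zip(windows, windows[1:]):
    assert prev_latest == next_earliest

# A manual run at 10:30 covers 02:00 to 10:00, overlapping the 08:00 run's window.
manual = window_for_run(datetime(2024, 1, 1, 10, 30))
```

The scheduled runs tile the day cleanly, but the manual run re-covers 02:00 to 08:00, which is the weakness the question already points out.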

For a different approach, you can query Splunk's index=_internal sourcetype=scheduler savedsearch_name="yoursearch" for the last time it ran, over what time range it was run, and so on. Use that in the next search to calculate the time range this run needs to go through.
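
The last step, turning the previous run into this run's time range, might look like the following Python sketch. The record layout (the `status` and `latest` keys) is hypothetical and is not the scheduler log's actual field set; only the `earliest=` and `latest=` time modifiers, given here as epoch seconds, are Splunk's own.

```python
def next_time_range(previous_runs, now_epoch):
    """Build time modifiers covering everything since the last successful run.

    previous_runs: list of dicts with hypothetical keys 'status' and 'latest'
    (the upper bound, in epoch seconds, of the range that run searched).
    """
    finished = [r["latest"] for r in previous_runs if r["status"] == "success"]
    if not finished:
        # No earlier successful run: bound only the upper end.
        return f"latest={now_epoch}"
    return f"earliest={max(finished)} latest={now_epoch}"

runs = [
    {"status": "success", "latest": 1700000000},
    {"status": "skipped", "latest": 1700028800},  # never ran, so it is ignored
]
print(next_time_range(runs, 1700057600))
# prints: earliest=1700000000 latest=1700057600
```

Counting only successful runs means a skipped or failed run widens the next range instead of leaving a hole in the CSV.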
