Splunk Search

Filtering on multiple fields based on a stats subsearch

saboobaker
New Member

I have multiple log sources that are appended to on a daily basis. All rows in one refresh have the same epoch time. I would like to select all values from each log source based on the last epoch time.
I get the last collection epoch with the following stats search:
sourcetype=my_sourcetype | fields epochtime, source | stats max(epochtime) as lasttime by source

This gives me a list of the last epoch value for each source.
Next, I am trying to filter based on the above as follows:
sourcetype=my_sourcetype |rename epochtime as lasttime | join type=inner lasttime, source[ search sourcetype=my_sourcetype | fields epochtime, source | stats max(epochtime) as lasttime by source ]

I expected it to filter like a SQL join on a.lasttime = b.lasttime and a.source = b.source, but instead I get multiple events from the main query for each epoch.

How do I change this query to filter based on the results in stats?


DalJeanis
Legend

Okay, so first, go read this, so you understand how to think about Splunk queries. Thinking in terms of SQL will result in extremely inefficient code in Splunk, every time.

https://answers.splunk.com/answers/561130/sql-analogy-for-my-log-search-join-two-tables-wher.html


Second, if you just want the last event of each source, then use this and you are done...

sourcetype=my_sourcetype 
| dedup source

Explanation: dedup keeps the first record for each combination of key values. In this case, the only key given is source, so it keeps the first record of each source.

By default, Splunk returns the most recent records first. Thus, the most recent of each will be chosen.
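If anything earlier in your pipeline might have disturbed that default ordering, you could force most-recent-first explicitly before the dedup. A sketch, using the sourcetype from the question (the sort 0 means no result limit):

sourcetype=my_sourcetype 
| sort 0 - _time 
| dedup source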


Third, if there is a chance that some source has stopped being sent, so that the most recent load contains no events for it, and you want to get rid of that stale source because its date doesn't match, then do this...

sourcetype=my_sourcetype 
| dedup source
| eventstats max(epochtime) as lasttime 
| where epochtime = lasttime

Explanation: eventstats is like stats, except it does not delete the underlying events, it just copies the results onto every event that matches the keys. In this case, we don't need any key, because all the events from the most recent load have the same epochtime value. If any remaining event does NOT have the same value, then it means it is a source that is left over from a prior load.
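As a side note, if what you actually want is every event from each source's own latest load (rather than one event per source, and without dropping stale sources), the same eventstats pattern works with source as the key. A sketch based on the field names in the question:

sourcetype=my_sourcetype 
| eventstats max(epochtime) as lasttime by source 
| where epochtime = lasttime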


Fourth, here's a free bonus method that will sometimes (or even often) be more efficient than dedup.

sourcetype=my_sourcetype 
| stats latest(epochtime) as epochtime latest(someotherfield) as someotherfield by source
| eventstats max(epochtime) as lasttime 
| where epochtime = lasttime

Explanation: within a stats-type command, latest(x) picks the value of x from the record with the highest value of _time. Stats can be more efficient in some cases because it can be partially calculated at the indexers, limiting the amount of data that gets transferred to the search head.

Test this version against the third version and inspect the results to see which one is more efficient in your particular situation.

saboobaker
New Member

Thank you for the answer with an alternate solution.

However, I was looking for help on join and why the join stated in the question does not work. Is a join on more than one field not supported? Or is something else incorrect in the way I am joining the results?

Thanks.


DalJeanis
Legend

@saboobaker - I gave you all those because this structure is extremely inefficient. I don't see any issues with the theory, though.

First, put white space between source and the opening bracket of the subsearch ( source [ rather than source[ ), just in case the subsearch does not return a starting space. If that works, you are done.
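In other words, the join from the question with that space added (an untested sketch, using the field names from the question):

sourcetype=my_sourcetype 
| rename epochtime as lasttime 
| join type=inner lasttime, source 
    [ search sourcetype=my_sourcetype | fields epochtime, source | stats max(epochtime) as lasttime by source ]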

Second, check to see whether you are running out of time or space. That should not be the case, since the subsearch is only returning two fields. Nonetheless, it is a theoretical possibility. Look at the job inspector and look for language about the job being complete or incomplete, and how many events were scanned and how many returned. If that looks normal, then check the search log to see what happened there.
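For completeness, a common join-free alternative is to let the subsearch act as a filter on the base search: the subsearch's results are expanded into ( source=X epochtime=Y ) OR ( source=Z epochtime=W ) terms that restrict the outer search. A sketch, assuming epochtime is a searchable field on the events:

sourcetype=my_sourcetype 
    [ search sourcetype=my_sourcetype 
      | stats max(epochtime) as epochtime by source 
      | fields source epochtime ]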
