This is just a fun optimization question; the benefit may turn out to be very small. My Splunk searches are already well optimized: they join 24 million events across 3 sourcetypes over a 30-day window in about 40 seconds, using the stats method for joining data (https://conf.splunk.com/files/2019/slides/FNC2751.pdf).

However, before I do the join with stats, I first have to run stats latest() to keep only the latest event per key. All of my sourcetypes contain historical data, each with its own unique identifier, and not every sourcetype has data every single day, so I have to look back at least 30 days to get a reasonably complete picture.

Here's an example of the stats latest() step:

<initial search>
| fields _time, xxx, xxx, <pick your required fields>
| eval coalesced_primary_key=coalesce(sourcetype_1_primary, sourcetype_2_primary, sourcetype_3_primary)
| stats
latest(*) AS *
by coalesced_primary_key

Before the implicit search (the first line) runs, the index contains 24,000,000 events. After the implicit search, but before stats latest() runs, 13,000,000 events remain. After stats latest() runs, the total drops to 750,000 events.

What if the stats latest pipe could be skipped altogether, by somehow making the implicit search (the first line) return only the latest events? In other words, cutting the event total from 24,000,000 to 750,000 directly. If that's possible, it could make the query much faster. I already have the unique primary key for each sourcetype, so the idea would be something like latest(sourcetype_1_primary), but applied in the implicit search on the first line. I'm afraid my Splunk knowledge doesn't get me there, and googling doesn't seem to turn up anything.
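For comparison, one variant I've considered (not a true search-time filter, but possibly cheaper than stats latest(*) since it doesn't aggregate every field) is dedup: it keeps the first event it sees per key, and because events stream back in reverse time order by default, that first event is the latest one. A rough sketch, assuming the same field names as above:

```spl
<initial search>
| fields _time, sourcetype_1_primary, sourcetype_2_primary, sourcetype_3_primary, <pick your required fields>
| eval coalesced_primary_key=coalesce(sourcetype_1_primary, sourcetype_2_primary, sourcetype_3_primary)
| dedup coalesced_primary_key
```

This still has to scan all 13,000,000 post-filter events, though, so it doesn't achieve the "return only 750,000 events from the index" goal.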
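The closest thing I can imagine to pre-filtering is moving the work out of search time entirely: a scheduled search that collapses each day's new events to latest-per-key and writes them to a summary index with collect, so the 30-day search reads far fewer events. A hedged sketch, where the index name summary_latest is hypothetical:

```spl
<initial search> earliest=-1d@d latest=@d
| eval coalesced_primary_key=coalesce(sourcetype_1_primary, sourcetype_2_primary, sourcetype_3_primary)
| stats latest(*) AS * by coalesced_primary_key
| collect index=summary_latest
```

The 30-day search would then run against index=summary_latest and still need a final stats latest(*) AS * by coalesced_primary_key to collapse duplicates across the daily summaries, but over a much smaller event set. Whether that's worth the added moving parts is exactly the trade-off I'm unsure about.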