Splunk Search

Why does my query blow-up in size with a join?

NickJLange
Explorer

Why does my query blow-up in size with a join?
I have a query which without a join (for further analysis) runs in 2MB with 200K events. I added a metadata inner join on hostname (or so I think...) to add two new fields to the output for timechat bucketing. Splunk now tells me I'm using 500Mb with 200K events.

Something is messed up. I don't speak splunk debug... any easy ideas on what might be going on?

also, would converting this metadata from a search index to a lookuptable increase performance?

Tags (1)
0 Karma
1 Solution

martin_mueller
SplunkTrust
SplunkTrust

I'm assuming there's also some reporting commands like stats following the join, right?

If you do search | stats, all the heavy lifting happens on the indexers. The search head only gets a small result set back to combine.
If you do search | join | stats, the indexers have to return tons of data to the search head that then gets to do a huge join and simple stats.

To solve this, move your metadata to a lookup file, define that lookup file as automatic for your sourcetype, and search like this:

earliest=-9d index=os_nix_perf itemKey="cpuLoad" hostname=hostwildcard* (color=Purple OR color=Red) | stats ...

The indexers can apply the lookup, filter accordingly, run the stats, and only return a small result set back.

If for some inexplicable reason you cannot move the metadata to a lookup file, at least move the join after the stats:

earliest=-9d index=os_nix_perf itemKey="cpuLoad" hostname=hostwildcard* | stats something by hostname | join hostname [search index=metadata (color =Purple) or (color=Red) | fields hostname, color | dedup hostname]

Then the set that needs to be returned to the search head and joined there is fairly small.

View solution in original post

martin_mueller
SplunkTrust
SplunkTrust

I'm assuming there's also some reporting commands like stats following the join, right?

If you do search | stats, all the heavy lifting happens on the indexers. The search head only gets a small result set back to combine.
If you do search | join | stats, the indexers have to return tons of data to the search head that then gets to do a huge join and simple stats.

To solve this, move your metadata to a lookup file, define that lookup file as automatic for your sourcetype, and search like this:

earliest=-9d index=os_nix_perf itemKey="cpuLoad" hostname=hostwildcard* (color=Purple OR color=Red) | stats ...

The indexers can apply the lookup, filter accordingly, run the stats, and only return a small result set back.

If for some inexplicable reason you cannot move the metadata to a lookup file, at least move the join after the stats:

earliest=-9d index=os_nix_perf itemKey="cpuLoad" hostname=hostwildcard* | stats something by hostname | join hostname [search index=metadata (color =Purple) or (color=Red) | fields hostname, color | dedup hostname]

Then the set that needs to be returned to the search head and joined there is fairly small.

NickJLange
Explorer

Thank you both. The source data is more or less zabbix data across several thousand hosts. The "join" data is proprietary metadata about individual hostname values used for bucketing.

i.e. Show me the system load of all purple hosts, where the default events in the zabbix index don't know the hosts are purple.

As I can't post the query here - I can simply say it looks like this:

earliest=-9d index=os_nix_perf itemKey="cpuLoad" hostname=hostwildcard* | join hostname [search index=metadata (color =Purple) or (color=Red) | fields hostname, color | dedup hostname]

0 Karma

woodcock
Esteemed Legend

It is exceedingly difficult to help if you do not show us your searches.

0 Karma

martin_mueller
SplunkTrust
SplunkTrust

In general, join is an expensive operation. Read https://answers.splunk.com/answers/129424/how-to-compare-fields-over-multiple-sourcetypes-without-jo... and watch the March 2016 virtual.conf talk from http://wiki.splunk.com/Virtual_.conf for more info.

In order to get debugging help it'd help if you posted your search, your requirements, and some info about your data or even sample data.

0 Karma
Get Updates on the Splunk Community!

Webinar Recap | Revolutionizing IT Operations: The Transformative Power of AI and ML ...

The Transformative Power of AI and ML in Enhancing Observability   In the realm of IT operations, the ...

.conf24 | Registration Open!

Hello, hello! I come bearing good news: Registration for .conf24 is now open!   conf is Splunk’s rad annual ...

ICYMI - Check out the latest releases of Splunk Edge Processor

Splunk is pleased to announce the latest enhancements to Splunk Edge Processor.  HEC Receiver authorization ...