
Why is a wildcard search much slower than "where like(field, value)"?

czhang_splunk
Splunk Employee

I have two searches that return the same results in my single-instance Splunk environment, but there is a huge performance difference between them.

Searches:
1. index=main sourcetype="aws:description" placement="us-west-2*"
2. index=main sourcetype="aws:description" | where like(placement, "us-west-2%")

Results:
1. This search has completed and has returned 2,013 results by scanning 36,909 events in 35.372 seconds.
2. This search has completed and has returned 2,013 results by scanning 561,295 events in 11.913 seconds.
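A quick sanity check on the reported totals, written as plain Python arithmetic (the numbers are copied from the two job summaries above; nothing else is assumed). The wildcard search scanned roughly 15x fewer events yet ran roughly 3x longer, so it got through far fewer events per second. That suggests most of its time went to something other than reading events, although these totals alone cannot say what.

```python
# Totals copied from the two search job summaries above.
wildcard = {"events_scanned": 36_909, "seconds": 35.372}     # placement="us-west-2*"
where_like = {"events_scanned": 561_295, "seconds": 11.913}  # | where like(placement, "us-west-2%")


def scan_rate(job):
    """Events scanned per second of total search run time."""
    return job["events_scanned"] / job["seconds"]


events_ratio = where_like["events_scanned"] / wildcard["events_scanned"]
time_ratio = wildcard["seconds"] / where_like["seconds"]

print(f"wildcard:   {scan_rate(wildcard):,.0f} events/s")    # 1,043 events/s
print(f"where like: {scan_rate(where_like):,.0f} events/s")  # 47,116 events/s
print(f"where like scanned {events_ratio:.1f}x more events")  # 15.2x
print(f"wildcard took {time_ratio:.1f}x as long")             # 3.0x
```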

The raw events are in JSON format. The placement field has the values us-west-2a, us-west-2b, and us-west-2c. The performance gap becomes even larger with a larger data set.
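The two filters spell their wildcards differently: "*" in the search string, and "%" (any run of characters) or "_" (exactly one character) inside like(). They select the same placement values here. The Python sketch below is a simplified model of the pattern semantics only. It ignores case sensitivity and says nothing about how either search uses the index, which is presumably where the performance difference lies. The value us-east-1a is a hypothetical non-matching value added for contrast.

```python
import re


def search_wildcard_to_regex(pattern):
    """Simplified model of a search-time wildcard: '*' matches any run of characters."""
    return re.compile(".*".join(re.escape(part) for part in pattern.split("*")))


def like_to_regex(pattern):
    """Simplified model of like(): '%' matches any run of characters, '_' exactly one."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


# The three values from the question, plus one hypothetical non-matching value.
placements = ["us-west-2a", "us-west-2b", "us-west-2c", "us-east-1a"]

wildcard_re = search_wildcard_to_regex("us-west-2*")
like_re = like_to_regex("us-west-2%")

# fullmatch() requires the whole value to match the pattern.
wildcard_hits = [p for p in placements if wildcard_re.fullmatch(p)]
like_hits = [p for p in placements if like_re.fullmatch(p)]
print(wildcard_hits)  # ['us-west-2a', 'us-west-2b', 'us-west-2c']
print(like_hits)      # ['us-west-2a', 'us-west-2b', 'us-west-2c']
```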

Could anyone explain why the wildcard search is so much slower? Is it always best practice to use where + like?

Thanks!

UPDATE: Thanks, everyone, for the help.

I figured out the reason by reading http://conf.splunk.com/sessions/2016-sessions.html#search=fields%2C%20indexed%20tokens%20and%20you&


ddrillic
Ultra Champion

Regarding "Could anyone explain why wildcard search is much slower?":

There is a lovely discussion at "Advanced searches are too slow since they don't seem to make use of indexes".

@sideview said: [quoted as an image that was not preserved]


lakromani
Builder

I ran a test on my large (250 GB/day) Splunk deployment and got these results:

index=syslog msg="*ACE-4*"
506,000 results in 1 min 2 seconds

index=syslog | where like(msg,"%ACE-4%")
506,000 results in 1 min 46 seconds


czhang_splunk
Splunk Employee

That's interesting. I removed sourcetype="aws:description" from both of my searches and got results very similar to yours.
Could you please limit your result set a little and try again? For example, add sourcetype="foo" and make sure all events have the msg field. Thanks for your time!


lakromani
Builder

index=syslog sourcetype=rsyslog splunk_server="indexer-ix-*" msg="*ACE-4*"
500,000 results in 44 seconds

index=syslog sourcetype=rsyslog splunk_server="indexer-ix-*" | where like(msg,"%ACE-4%")
500,000 results in 1 min 33 seconds
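Converting the durations from the two syslog tests above to seconds (plain Python arithmetic on the reported figures; the test labels are made up for readability) shows that on this deployment the "where like()" version was the slower one both times. It took about 1.7x and 2.1x the run time of the wildcard search, the reverse of the AWS result in the original question.

```python
def to_seconds(minutes, seconds):
    """Convert an 'M min S seconds' duration to seconds."""
    return minutes * 60 + seconds


# (search-time wildcard, | where like()) durations from the two tests above.
tests = {
    "test 1: index=syslog": (to_seconds(1, 2), to_seconds(1, 46)),
    "test 2: + sourcetype and splunk_server": (to_seconds(0, 44), to_seconds(1, 33)),
}

for name, (wildcard_s, like_s) in tests.items():
    print(f"{name}: wildcard {wildcard_s}s, where like() {like_s}s, "
          f"ratio {like_s / wildcard_s:.2f}x")
```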


cmerriman
Super Champion

If you turn on LISPY logging (LISPY is the expression Splunk uses to locate events; I believe it can be enabled in limits.conf by setting infocsv_log_level=DEBUG under the [search_info] stanza), you can see how the search actually runs on the back end. Martin Muller gave a great talk at .conf this year on how searches look to us versus how they actually run.

You can find his slides about this at http://conf.splunk.com/sessions/2016-sessions.html#search=fields%2C%20indexed%20tokens%20and%20you&

They'll probably explain more about why the wildcard search behaves that way.
