Splunk Search

After upgrading from Splunk 5 to 6.2.4, why are some searches that use a macro with a lot of eval case branches running 10 to 20 times slower?

knielsen
Contributor

Hello,

Since we upgraded from Splunk 5 to Splunk 6.2.4, some of our searches run 10 to 20 times slower than before. It happens where we use a complex macro with a lot of case branches. It seems Splunk now spends more time on case branches that aren't evaluated.

So the performance of

index=something sourcetype=sourcetype1 | eval mode=case(
        sourcetype=="sourcetype1", some string logic,
        true(), "undefined")

is very different from

index=something sourcetype=sourcetype1 | eval mode=case(
        sourcetype=="sourcetype1", some string logic,
        sourcetype=="sourcetype2", some string logic,
        sourcetype=="sourcetype3", some string logic,
        sourcetype=="sourcetype4", some string logic,
        sourcetype=="sourcetype5", some string logic,
        ...
        true(), "undefined")

This kind of logic is now much, much slower on Splunk 6 for us. Is this expected, or is there something we could tune? We would need to redo a lot of summary index definitions that use a macro with this kind of search, where we gather information from various sourcetypes in a single summary index.


knielsen
Contributor

Hi,

I actually was mistaken, and the increase in runtime was not due to the number of case branches. It was due to one specific case branch, and it boiled down to:

... , like(http_request,"%/jobs%") AND action="run"

Splunk hates me for using the field name "action" (which should be created by automatic field extraction). I had to rewrite this as

..., like(http_request,"%/jobs%") AND like(http_request,"%action=run%")

It took me quite a while to get there, and I have no clue why this fixed it. Is "action" some kind of special field now in Splunk 6? Anyway, no longer relying on "action" as a field name brought performance back to its normal level. Using the case statement in its original form drops performance by 10 to 20 times even if that case branch is never matched. Really strange.


woodcock
Esteemed Legend

What is the difference in the normalized search between the two?


knielsen
Contributor

This doesn't really fit the title of my original question, since I mistakenly blamed case for the performance loss, but here is a little test. I stripped down a sample search and the macro to something minimal, albeit not very meaningful.

Original:
index=lbsp sourcetype=lbsp_regeo_access |  eval mode=case(like(http_request,"%/jobs%") AND action="run","jobs-submit",true(),"other") | stats count by mode
Normalized:
litsearch index=lbsp sourcetype=lbsp_regeo_access | eval mode=case(like(http_request,"%/jobs%") AND action="run","jobs-submit",true(),"other") | addinfo type=count label=prereport_events | fields keepcolorder=t "mode" "prestats_reserved_*" "psrsvd_*" | prestats count by mode 

-> This search has completed and has returned 1 result by scanning 4,460,539 events in 141.771 seconds.

Modified not to use action=:
index=lbsp sourcetype=lbsp_regeo_access |  eval mode=case(like(http_request,"%/jobs%") AND like(http_request,"%action=run%"),"jobs-submit",true(),"other") | stats count by mode
Normalized:
litsearch index=lbsp sourcetype=lbsp_regeo_access | eval mode=case(like(http_request,"%/jobs%") AND like(http_request,"%action=run%"),"jobs-submit",true(),"other") | addinfo type=count label=prereport_events | fields keepcolorder=t "mode" "prestats_reserved_*" "psrsvd_*" | prestats count by mode 

-> This search has completed and has returned 1 result by scanning 4,460,539 events in 8.684 seconds.


knielsen
Contributor

Oh, I just tried it again and changed action="run" to bla="run":

index=lbsp sourcetype=lbsp_regeo_access |  eval mode=case(like(http_request,"%/jobs%") AND bla="run","jobs-submit",true(),"other") | stats count by mode

-> This search has completed and has returned 1 result by scanning 4,460,539 events in 6.381 seconds.

None of the events in that sourcetype have action=... or bla=... in any request. So it's really the use of "action" that makes it slow.

Making this even more basic and running it against _internal:

index=_internal sourcetype=splunkd_access | eval foo=if(isnull(action),"foo","bar") | stats count by foo

-> This search has completed and has returned 1 result by scanning 9,280,747 events in 162.836 seconds.

index=_internal sourcetype=splunkd_access | eval foo=if(isnull(bla),"foo","bar") | stats count by foo

-> This search has completed and has returned 1 result by scanning 9,280,747 events in 4.126 seconds.

Lesson learnt: avoid using a field named "action", I guess. I didn't see this behavior in Splunk 5.
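For anyone who wants to check whether some other field name carries the same hidden cost before using it in a macro, a minimal probe in the style of the searches above would be (a sketch; "somefield" is a placeholder for the field name you want to test, not a field from this thread):

```
index=_internal sourcetype=splunkd_access
| eval probe=if(isnull(somefield),"absent","present")
| stats count by probe
```

Comparing the runtime against the same search with a field name you know is unused (like "bla" above) should show whether merely referencing the name triggers expensive per-event work.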


woodcock
Esteemed Legend

You should edit the subject line of your question (look for the gear icon on top) and change it to an appropriate string, then convert the previous comment to an Answer and Accept it.


woodcock
Esteemed Legend

You can use the Job Inspector to examine the normalized search; this will show you the intense optimizations that Splunk tries to apply to your search in v6.2.4. Hopefully you still have an instance of v5 (or can stand one up) so you can see how it worked in the earlier release and compare the two. You might be AMAZED at the CRAZY stuff they do with reverse lookups, etc. If the slowdown is indeed the result of bad optimization (which would not surprise me at all), you can disable the optimization by piping your first search into a second, partly duplicated search like this:

index=something sourcetype=sourcetype1 | search sourcetype=sourcetype1 | eval mode=case(
     sourcetype=="sourcetype1", some string logic,
     sourcetype=="sourcetype2", some string logic,
     sourcetype=="sourcetype3", some string logic,
     sourcetype=="sourcetype4", some string logic,
     sourcetype=="sourcetype5", some string logic,
     ...
     true(), "undefined")

Again, if you compare the normalized search of this version of your search to the other version, it will probably not contain the crazy optimizations and will definitely run differently (hopefully more quickly, like it did before).
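Applied to the minimal _internal reproduction from earlier in this thread, the same trick would look like this (a sketch combining the two searches above; whether the duplicated `| search` clause actually bypasses the optimization in 6.2.4 is untested):

```
index=_internal sourcetype=splunkd_access
| search sourcetype=splunkd_access
| eval foo=if(isnull(action),"foo","bar")
| stats count by foo
```

If the normalized form of this version (again, visible in the Job Inspector) no longer contains the extra optimization steps, comparing its runtime against the original would confirm whether the optimizer is to blame.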

For more information, check out this question:

http://answers.splunk.com/answers/256125/why-does-splunk-add-junk-to-the-front-of-a-search.html
