The search used looks like this:
index=my_sanitized_index_name sourcetype=web_access_logs | timechart count(eval(x_Status < 400)) as Success count(eval(x_Status>=400)) as Failure | addtotal row=f col=t labelfield=_time | eval SuccessRate = 100 * Success / (Success + Failure)
Where x_Status is the return code from a web server farm's collective access logs.
In its current form, I did a search for the "last 15 minutes". It found 859,984 events and took 25.384 seconds to execute. I need to run this thing to give me stats for the last week. It usually stops working around the third hour.
I am completely new to Splunk, I am certain there is a better way to do this. I just don't know what that is. Help?
The only fields you need from those events are x_Status and _time, so explicitly get rid of everything else at the first moment you can, using the fields command. Try this and see what happens to your search time.
index=my_sanitized_index_name sourcetype=web_access_logs | fields _time x_Status
| timechart count(eval(x_Status < 400)) as Success count(eval(x_Status>=400)) as Failure
| addtotal row=f col=t labelfield=_time
| eval SuccessRate = 100 * Success / (Success + Failure)
This cut down execution time by 75%. I still should probably look into setting up summary indices, but this was very helpful. Thank you.
You're welcome. richgalloway's answer is probably your best long-term solution, but for efficiency in Splunk you should also always get rid of any data you don't need at the earliest opportunity.
Agreed. In fact, I'd suggest applying @DalJeanis's optimization to @richgalloway's summary-index search (and any other search) as well.
See if this makes an improvement:
index=my_sanitized_index_name sourcetype=web_access_logs
| eval Type=if(x_Status<400,"Success","Failure")
| timechart count by Type
| addtotal row=f col=t labelfield=_time
| eval SuccessRate = 100 * Success / (Success + Failure)
IMHO, your problem is not the number of eval commands (which isn't that many), but the sheer number of events you're trying to process. Those hundreds of millions of events should be distributed among many indexers for best performance.
But that doesn't help you now. Consider running your search across small intervals like you're doing, but save the results in a summary index. Then run another search to collect the data from the summary index into your weekly report. That will be much faster.
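As a rough sketch of what that could look like (my_summary_index is a placeholder name here; the summary index itself has to exist before collect can write to it), you could schedule a search like this to run every hour over the previous hour:

index=my_sanitized_index_name sourcetype=web_access_logs
| fields _time x_Status
| bin _time span=1h
| stats count(eval(x_Status<400)) as Success count(eval(x_Status>=400)) as Failure by _time
| collect index=my_summary_index

Then the weekly report only has to read the pre-aggregated hourly rows instead of the raw events:

index=my_summary_index
| timechart span=1h sum(Success) as Success sum(Failure) as Failure
| addtotal row=f col=t labelfield=_time
| eval SuccessRate = 100 * Success / (Success + Failure)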
That sounds promising. Can you link me to any documentation you would recommend as a good resource for a beginning Splunk user?
Well, the one above is good. A video is available here.
More info on summary indexing here: http://docs.splunk.com/Documentation/Splunk/6.5.1/Knowledge/Usesummaryindexing