I want to show our worst-performing access log results. Having extracted fields including timetaken (the response time in ms), I can find these with
index="production" sourcetype="access" | sort -timetaken
and add this as a table in my app. However, because the search isn't built from top, stats, chart, or timechart, presumably I can't use summary indexes to accelerate it?
Is there any other way to accelerate this?
<table>
<title>Worst 10 responses in access log in last day</title>
<searchTemplate>sourcetype="access" index="$index$" | sort -timetaken</searchTemplate>
<earliestTime>-1d</earliestTime>
<fields>timetaken, req_time, status, uri</fields>
</table>
We're looking at around 8GB of access logs to index for the above search.
I don't believe you can summary-index the sort itself (or at least, that's not where you'd get the most benefit). But if what you're after is just the worst 10 responses, you could run a sort limit=10 -timetaken
and put only those results into a summary index. That way, instead of scanning 8 GB of data, the final search goes through 10 records per summary period. If it's a requirement that it cover exactly the last 24 hours (e.g., -24h or -24h@h, versus -1d@d), you could get close by running the summary search every hour, or every half hour, and taking only the most recent results, though there will be some lag. I haven't played with overlapping periods in summary indexes, but I'm sure there's a way to pull out just the most recent period.
Does that make sense?
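As a sketch of that idea (the summary index name and the marker value here are just illustrative, and I'm assuming an hourly schedule aligned to the hour):

```
index="production" sourcetype="access" earliest=-1h@h latest=@h
    | sort limit=10 -timetaken
    | collect index=summary marker="report=worst_timetaken"
```

Schedule that hourly, and the dashboard search only has to re-sort the summarized candidates, e.g. index=summary report=worst_timetaken earliest=-1d | sort limit=10 -timetaken.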
Fundamentally, you are asking for Splunk to sort a very large amount of data along an axis by which it isn't already sorted.
I think you aren't interested in all the results, just the outliers, so there's a simple hack to get started:
index="production" sourcetype="access" timetaken > 300 | sort -timetaken | head 10
If there are any events in the data that aren't candidates, you'd want to further narrow the initial search, e.g.
index="production" sourcetype="access" wanted NOT unwanted timetaken > 300 | sort -timetaken | head 10
A fancier way might be to run a subsearch (or feed a lookup by searching) over a short interval, take perc95 of timetaken, and then clip the 24-hour period to timetaken > that value. This approach could be combined with the summary-index approach if needed.
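A rough sketch of that subsearch idea (the -30m sample window is an arbitrary choice, and this assumes timetaken is extracted at search time):

```
index="production" sourcetype="access"
    timetaken > [ search index="production" sourcetype="access" earliest=-30m
                  | stats perc95(timetaken) AS timetaken
                  | return $timetaken ]
    | sort limit=10 -timetaken
```

The subsearch's return $timetaken emits just the numeric value, so the outer search effectively becomes timetaken > &lt;threshold&gt; and only the slowest tail of events ever reaches the sort.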
I think both of these answers are helpful - this will certainly speed up my interactive search against the full index until I sort out the summary indexing properly.
Maybe: ((index=larger_general_index earliest=-5m) OR (index=categorized_rare_events latest=-5m)) interesting terms
Thanks for the answers and comments - how would I combine searching the summary index and live index?
If the summary action is to reindex the actual event texts into the summary index, the interactive search can just get the 10 worst in the past day. It could even search the summary index for the past day, and the live index for the past N minutes.
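Assuming a summary index populated with the worst 10 per period as described above (index and marker names illustrative), that combined search might look like:

```
(index=summary report="worst_timetaken" earliest=-1d)
    OR (index="production" sourcetype="access" earliest=-1h)
    | sort limit=10 -timetaken
    | table timetaken, req_time, status, uri
```

The live-index leg only has to cover the gap since the summary search last ran, so keep its earliest just wider than the summary schedule interval.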