when I search with below query
sourcetype=my_log UUID="3fc5e6c2-57b4-4e59-a3c0-8115f5ec74a1"
search result will appear within one second amazing fast 🙂
this log information is older then one month
but when I search with this query
sourcetype=my_log | transaction startswith=log_begin endswith=log_end | where UUID="3fc5e6c2-57b4-4e59-a3c0-8115f5ec74a1"
It'll take 8 to 10 minutes to display the result 😞 extremely slow
Now I have two question
So I realize I'm way late to the party here, but what about using a subsearch? Assuming that there is a field in your log data (let's call it myTransactionID) can be used to uniquely identify a transaction, you could do something like:
sourcetype=my_log [search sourcetype=my_log UUID="3fc5e6c2-57b4-4e59-a3c0-8115f5ec74a1" | dedup myTransactionID | fields myTransactionID] | transaction startswith=log_begin endswith=log_end
Essentially, what the subsearch does is find the initial log with the specified UUID value, obtain the value of myTransactionID, and then pass that as an argument to the main search so that it only returns events with the matching transaction ID. Normally subsearches aren't particularly fast, so as a general rule I wouldn't be suggesting them for optimization, but it will be far better than letting transaction operate on every single event with the my_log sourcetype.
Does the UUID
field exist in all events you are interested in? Like martin_mueller said the first search is fast because index data is used to narrow down your search results. But the second search is very slow because it is handling so much data. If i understand the search pipeline correctly, your second search is taking the entire contents of my_log
and trying to apply the transaction
function to it before narrowing it down again with the where
command. Transaction
is an intensive operation and you'll want to narrow down your search results as much as possible before piping to it. Additionally, if there is a field that uniquely identifies log entries as part of a transaction, you should include them as the optional field list of the transaction
command, this makes it easier for transaction
to group events together. Would a search like one of the following accomplish what you need?
sourcetype=my_log UUID="3fc5e6c2-57b4-4e59-a3c0-8115f5ec74a1" | transaction UUID startswith=log_begin endswith=log_end
8 minutes is understandable since you're telling Splunk to retrieve all events from disk before really doing anything.
You might want to look into the localize command: http://docs.splunk.com/Documentation/Splunk/5.0.1/SearchReference/Localize
NO UUID appears only once in a transaction, I understand the reason but 8 minutes is not good for search the log. Is there any other alternate e.g. to display x line before UUID field and y line after UUID field.
The first query is fast because splunk can use index data to narrow down the events that need to be loaded.
The second query is slow because splunk has to push everything into the transaction command, which then is slow because it can't handle large (in splunk terms) amounts of data.
One way to speed things up is to narrow down the time range that needs to be searched.
Other ways depend on your data and what you do with it.