Hello,
I have a query with multiple subsearches that is slower than I would like, so I am looking for ways to optimize it.
The search finds the first event whose path is transaction
that occurs after an event whose path is finalize
for a given customerId
and sessionId
. The sessionId
is passed in through a dashboard. The first subsearch pulls the customerId
out of a cookie using rex
. The second gets the timestamp of the finalize
event for the relevant session.
index=my_index event=response path=*/transactions* [search index=my_index event=response path=*/pair* $sessionId$ | spath output=cookie path=headers.cookie | rex field=cookie "(?<=\bCID=)(?<customerId>.{36})" | return 1 $customerId] | spath ts output=transactionTs | eval finalizeTs=[search index=my_index event=response path=*/finalize* $sessionId$ | spath ts output=t | return 1 $t] | where transactionTs>finalizeTs | tail 1
For all the Splunk pros out there, is there a way to rewrite this query to remove the subsearches, or otherwise improve the performance?
Whether or not this will be more efficient will depend on whether you can properly scale the time frame. This pulls back all the transactions in the time frame, then culls them later.
(path=*/transactions* customerId=*)
OR (path=*/pair* $sessionId$)
OR (path=*/finalize* $sessionId$)
| rename COMMENT as "The above gets the pair and finalize record, plus all transactions in the time frame."
| rename COMMENT as "List all fields here that you want to retain from any record."
| fields customerId path ts "headers.cookie"
| rename COMMENT as "mark the record types to make the code more readable."
| eval rectype=case(like(path,"%/transactions%"),"t",
like(path,"%/finalize%"),"f",
true(),"p")
| rename COMMENT as "Get the timestamp if present on this kind of record."
| spath ts output=Ts
| eval transactionTs = case(rectype="t",Ts)
| eval finalizeTs = case(rectype="f",Ts)
| rename COMMENT as "Get the customerId if this is a pair record."
| spath output=cookie path=headers.cookie
| rex field=cookie "(?<=\bCID=)(?<customerId>.{36})"
| rename COMMENT as "roll finalizeTs onto the pair record by sessionId and drop the f record"
| eventstats max(finalizeTs) as finalizeTs by sessionId
| where rectype!="f"
| rename COMMENT as "roll finalizeTs onto t record, by customerId"
| eventstats max(finalizeTs) as finalizeTs by customerId
| rename COMMENT as "drop everything that isnt a t record for the same customer later than the f record"
| where transactionTs > finalizeTs
| tail 1
There are other efficiencies that can be tuned if this is anywhere near better than what you have. For instance, if sessionId is in an actual field, we could use eventstats to roll the customerId from the pair record to the finalize record by sessionId BEFORE we decode the timestamps, then delete all the transaction records that do not have a customerId, before decoding the timestamps. That would probably save a significant amount of CPU.
Whether or not this will be more efficient will depend on whether you can properly scale the time frame. This pulls back all the transactions in the time frame, then culls them later.
(path=*/transactions* customerId=*)
OR (path=*/pair* $sessionId$)
OR (path=*/finalize* $sessionId$)
| rename COMMENT as "The above gets the pair and finalize record, plus all transactions in the time frame."
| rename COMMENT as "List all fields here that you want to retain from any record."
| fields customerId path ts "headers.cookie"
| rename COMMENT as "mark the record types to make the code more readable."
| eval rectype=case(like(path,"%/transactions%"),"t",
like(path,"%/finalize%"),"f",
true(),"p")
| rename COMMENT as "Get the timestamp if present on this kind of record."
| spath ts output=Ts
| eval transactionTs = case(rectype="t",Ts)
| eval finalizeTs = case(rectype="f",Ts)
| rename COMMENT as "Get the customerId if this is a pair record."
| spath output=cookie path=headers.cookie
| rex field=cookie "(?<=\bCID=)(?<customerId>.{36})"
| rename COMMENT as "roll finalizeTs onto the pair record by sessionId and drop the f record"
| eventstats max(finalizeTs) as finalizeTs by sessionId
| where rectype!="f"
| rename COMMENT as "roll finalizeTs onto t record, by customerId"
| eventstats max(finalizeTs) as finalizeTs by customerId
| rename COMMENT as "drop everything that isnt a t record for the same customer later than the f record"
| where transactionTs > finalizeTs
| tail 1
There are other efficiencies that can be tuned if this is anywhere near better than what you have. For instance, if sessionId is in an actual field, we could use eventstats to roll the customerId from the pair record to the finalize record by sessionId BEFORE we decode the timestamps, then delete all the transaction records that do not have a customerId, before decoding the timestamps. That would probably save a significant amount of CPU.
Thank you for the prompt and thorough response. This was extremely helpful.