Splunk Search

Splunk map search by chunks or any other way to make it faster

AndreiIssakov
Explorer

Hello!
Could somebody please suggest how to make a map search more effective?
What I am trying to do:
1. There are events with client transactions. A huge stream (thousands every second).
2. I search for transaction chains that look suspicious by some conditions over the last hour.
3. If a transaction chain is suspicious, I run a longer search (last 3 weeks), because some operations do not fit into the last hour. I basically do the same calculations, but over a longer time interval and with stricter conditions.
The following search works, but it takes several minutes and is sometimes cancelled due to a timeout:

<MY_SEARCH>
| stats first(orgCode) AS orgCode first(accountId) AS accountId sum(amount) AS totalAmount sum(controlAmount) AS totalControlAmount by transactionChainRef 
| where totalControlAmount>0 and totalControlAmount<totalAmount
| map search="search
          <MY_SEARCH> AND message=\"*transactionChainRef\\\":$transactionChainRef$*\" earliest=-3w 
          | eval orgCode=$orgCode$
          | eval accountId=$accountId$
          | eval totalControlAmount=$totalControlAmount$
          | stats first(orgCode) AS orgCode first(accountId) AS accountId sum(amount) AS totalAmount first(totalControlAmount) AS totalControlAmount by transactionChainRef
          | where totalControlAmount<totalAmount
    " maxsearches=9999


Unfortunately I cannot simply query the last 3 weeks right away, because there will still be transaction chains that extend outside of the 3 weeks (a chain that finished, say, 2.5 weeks ago may have started 5.5 weeks ago).

My current idea is to run the map search in chunks, for example by 100 transactionChainRefs at a time.

Thanks in advance!

1 Solution

bowesmana
SplunkTrust

You could run this as two searches: the first collects the stats output to a lookup file, and the second, rather than running one search per ref, searches for all refs in a single search, which would be more efficient. Depending on your runtime requirements, this could work:

<MY_SEARCH>
| stats first(orgCode) AS orgCode first(accountId) AS accountId sum(amount) AS totalAmount sum(controlAmount) AS totalControlAmount by transactionChainRef 
| where totalControlAmount>0 and totalControlAmount<totalAmount
| outputlookup refs_to_investigate.csv

followed by

<MY_SEARCH>  earliest=-3w 
  [ | inputlookup refs_to_investigate.csv | fields transactionChainRef 
    | eval message="*transactionChainRef\":".transactionChainRef."*" ]
... OTHER_STUFF...

I am a little confused by the eval orgCode=... and first(orgCode)... statements and what they are doing, but this logic should let you overcome the serial approach of map, which is never going to scale well.
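One way to carry those outer fields forward without map-style $token$ substitution is to keep them as columns in the lookup written by the first search and join them back after the stats. A minimal sketch, assuming refs_to_investigate.csv retains the orgCode, accountId and totalControlAmount columns:

```
<MY_SEARCH> earliest=-3w
  [ | inputlookup refs_to_investigate.csv | fields transactionChainRef
    | eval message="*transactionChainRef\":".transactionChainRef."*" ]
| stats sum(amount) AS totalAmount by transactionChainRef
| lookup refs_to_investigate.csv transactionChainRef OUTPUT orgCode accountId totalControlAmount
| where totalControlAmount<totalAmount
```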

Note that if you are running in a search head cluster, you would need to ensure that the two searches run on the same search head, or that replication of the outputlookup file has occurred.

Anyway, something to try?


AndreiIssakov
Explorer

Thanks @bowesmana !
This approach works much faster than map.
There is only one thing I could not achieve: I need an alert, so both searches (outputlookup and inputlookup) must be in one search. Separately, though, they work just fine:

<MY_SEARCH_WITH_PIPELINES> 
 | stats sum(amount) AS totalAmount sum(controlAmount) AS totalControlAmount by transactionChainRef 
 | table transactionChainRef
 | outputlookup transactionChainRefs.csv
 
<MY_SEARCH_BASE> earliest=-3w 
 [| inputlookup transactionChainRefs.csv | eval message="*".transactionChainRef."*" | fields message ]
  | <MY_SEARCH_PIPELINES>
  | table transactionChainRef


PickleRick
SplunkTrust

Alternatively, if the "base" search is relatively short and returns fairly few rows, you could try rephrasing it the other way around, using it as a subsearch that generates additional conditions for the main search.

Something like:

<Your_search> earliest=-3w [ <Your_search> earliest=-1h
| stats first(orgCode) AS orgCode first(accountId) AS accountId sum(amount) AS totalAmount sum(controlAmount) AS totalControlAmount by transactionChainRef
| where totalControlAmount>0 and totalControlAmount<totalAmount
| eval message="*transactionChainRef\":".transactionChainRef."*"
| table message orgCode accountId totalControlAmount ]
| stats first(orgCode) AS orgCode first(accountId) AS accountId sum(amount) AS totalAmount first(totalControlAmount) AS totalControlAmount by transactionChainRef
| where totalControlAmount<totalAmount

You might want to adjust the escaping with "eval message=".

Oh, and this search will be highly inefficient due to the wildcard at the beginning of the message pattern. Try to find a way to limit your search to a well-anchored search pattern; it will greatly speed up your search.
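For instance, if the transactionChainRef values appear as standalone tokens in the raw events, the subsearch can return them as bare terms (by renaming the field to search), which lets Splunk match against the index instead of wildcard-scanning every event. A sketch under that assumption:

```
<Your_search> earliest=-3w
    [ <Your_search> earliest=-1h
      | stats sum(amount) AS totalAmount sum(controlAmount) AS totalControlAmount by transactionChainRef
      | where totalControlAmount>0 AND totalControlAmount<totalAmount
      | fields transactionChainRef
      | rename transactionChainRef AS search ]
| stats sum(amount) AS totalAmount sum(controlAmount) AS totalControlAmount by transactionChainRef
| where totalControlAmount<totalAmount
```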

AndreiIssakov
Explorer

Thanks @PickleRick !
This solution is just great! At first I built the search with an intermediate CSV file as suggested in the previous reply by @bowesmana, but then I tried the subsearch and found that it works with thousands of OR conditions and is even faster than the CSV-file variant:

<MY_SEARCH_BASE> AND earliest=-3w
[search
   <MY_SEARCH_WITH_PIPELINES>
   | stats sum(amount) AS totalAmount sum(controlAmount) AS totalControlAmount by transactionChainRef 
   | <WHERE_CONDITION>
   | eval message= "*".transactionChainRef."*"
   | table message
]
| <MY_SEARCH_PIPELINES>
| stats first(accountId) AS accountId first(orgId) AS orgId sum(amount) AS totalAmount sum(controlAmount) AS totalControlAmount by transactionChainRef
| <WHERE_CONDITION>

PickleRick
SplunkTrust

But remember that subsearches do have limitations: an execution time limit and, if I remember correctly, a result set size limit. So the subsearch might be terminated prematurely and yield incomplete results. That's why I said "if you have few results".
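Those ceilings are configurable in limits.conf; a sketch of the relevant stanza, with values that are the commonly cited defaults (verify against your Splunk version):

```
# limits.conf -- subsearch limits (assumed defaults; check your version)
[subsearch]
maxout = 10000   # maximum number of results a subsearch may return
maxtime = 60     # seconds before the subsearch is finalized
ttl = 300        # how long subsearch results are cached, in seconds
```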


AndreiIssakov
Explorer

One more nice benefit of this solution:
I can visually observe the calculation:
at first I see a few hundred potentially suspicious chains; then this number steadily shrinks to zero (or a couple of suspicious transaction chains).
With the map search, the result set stayed empty until the very end.
