Solved: Appendpipe or Multisearch for continuing processing after Outputlookup?

stroud_bc · 11-08-2019

I am attempting to create a custom Risk Attribution rule for Web Proxy traffic to newly-seen (not-seen-before-yesterday) domains, following the blog post below:

https://www.splunk.com/blog/2018/01/17/finding-new-evil-detecting-new-domains-with-splunk.html

What I am running into is the following: I want to use the "new domain monitoring" technique described in the above link, but I want to attribute risk to a USER. The problem is that I need to run my statistics twice: once WITHOUT the user field, in order to update previously_seen_domains.csv, and once WITH the user field, in order to create my risk attribution.

| tstats count from datamodel=Web where nodename=Web.Proxy by Web.url, Web.user, _time 
| rename Web.* AS * 
| eval list="mozilla" 
| `ut_parse_extended(url, list)` 
| appendpipe 
    [| stats earliest(_time) as earliest latest(_time) as latest by ut_domain
    | inputlookup append=t previously_seen_domains.csv 
    | stats min(earliest) as earliest max(latest) as latest by ut_domain
    | outputlookup previously_seen_domains.csv
    | search 1!=1
    ]
| stats earliest(_time) as earliest latest(_time) as latest by ut_domain, user 
| inputlookup append=t previously_seen_domains.csv 
| stats min(earliest) as earliest max(latest) as latest by ut_domain, user 
| eval isOutlier=if(earliest >= relative_time(now(), "-1d@d"), 1, 0) 
| convert ctime(earliest) ctime(latest) 
| where isOutlier=1

This search DOES work, but I don't know much about the internals of the |appendpipe command, and I wonder whether |multisearch might be more performant. Right now previously_seen_domains.csv has about 10k entries in it. Any suggestions on how to run the stats calculations once instead of twice with different aggregations would be helpful as well.

stroud_bc · 11-11-2019

I think I have a better understanding of |multisearch after reading through some answers on the topic. Additionally, for any future readers who are trying a similar approach: I found that the above search fails to respect the earliest values from the lookup. The second | stats earliest(_time) as earliest latest(_time) as latest by ut_domain, user line recalculates the earliest values FOR EACH USER, and the rows appended from the lookup carry no user field, so the final stats by ut_domain, user cannot merge them with the per-user rows. The earliest values therefore always fall within the relevant window, and the search always ends up attributing risk. To solve this, I turned to a much simpler and more linear approach: the domain-level stats and outputlookup run first, and a subsearch with a separate |tstats command AFTER the outputlookup joins the user data to the newly-seen domain data.
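
For readers who want a starting point, the linear approach described above might look something like the search below. This is a sketch reconstructed from the description, not the exact search that was used. The data model, macro, and lookup names are taken from the question; the join options (type=inner, max=0) and the sum(count) aggregation inside the subsearch are assumptions added for illustration.

```
| tstats count from datamodel=Web where nodename=Web.Proxy by Web.url, _time
| rename Web.* AS *
| eval list="mozilla"
| `ut_parse_extended(url, list)`
| stats earliest(_time) as earliest latest(_time) as latest by ut_domain
| inputlookup append=t previously_seen_domains.csv
| stats min(earliest) as earliest max(latest) as latest by ut_domain
| outputlookup previously_seen_domains.csv
| eval isOutlier=if(earliest >= relative_time(now(), "-1d@d"), 1, 0)
| where isOutlier=1
| join type=inner max=0 ut_domain
    [| tstats count from datamodel=Web where nodename=Web.Proxy by Web.url, Web.user
    | rename Web.* AS *
    | eval list="mozilla"
    | `ut_parse_extended(url, list)`
    | stats sum(count) as count by ut_domain, user
    ]
| convert ctime(earliest) ctime(latest)
```

Because isOutlier is evaluated on the domain-level earliest value (after merging with the lookup) and the user field is only joined in afterwards, the per-user recalculation problem should no longer apply. max=0 is meant to let each new domain match every user who visited it, since join otherwise keeps only one subsearch row per main result. Note that join subsearches are subject to result and runtime limits, so this may need adjusting on busy proxies.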


