Splunk Dev

Combine two source types using data models and join

Ojay87
Explorer

Hi everyone!

In Enterprise Security I am trying to combine results from two different source types by using "join" but facing problem with subsearch limits. My goal is to make a statistic table where the traffic data is coming from another log, but this traffic log is huge even if I narrow the search for one hour. All I really need is just fields "bytes_in" and "bytes_out" from traffic log, but now it is collecting everything. I have tried narrow the subsearch results, but not yet successfully.

Below search is the very basic setup which would work without any limits, but I cannot change them. Looking for any ideas how to bypass it.

| datamodel datamodel1 search | search attack="vulnerability1"
| join src_ip type=left overwrite=false [search index=logs sourcetype=traffic_log ] 
| stats sum(eval(round(bytes_in(1024*1024),2))) AS "Incoming Mb" sum(eval(round(bytes_out/(1024*1024),2))) AS "Outgoing Mb" by src_ip
Tags (1)
0 Karma
1 Solution

DalJeanis
SplunkTrust
SplunkTrust

Assumption 1: You need bytes_in and bytes_out from traffic_log. But you only need that information if the src_ip is in the datamodel and marked as "vulnerability1".

Assumption 2: The frequency of src_ip in the datamodel and marked as "vulnerability1" in traffic_log are low/sparse relative to the total number of src_p in traffic_log.

index=logs sourcetype=traffic_log 
  [| datamodel datamodel1 search | search attack="vulnerability1" | dedup src_ip | table src_ip]
| stats sum(eval(round(bytes_in/(1024*1024),2))) AS "Incoming Mb" 
        sum(eval(round(bytes_out/(1024*1024),2))) AS "Outgoing Mb" by src_ip

View solution in original post

DalJeanis
SplunkTrust
SplunkTrust

Assumption 1: You need bytes_in and bytes_out from traffic_log. But you only need that information if the src_ip is in the datamodel and marked as "vulnerability1".

Assumption 2: The frequency of src_ip in the datamodel and marked as "vulnerability1" in traffic_log are low/sparse relative to the total number of src_p in traffic_log.

index=logs sourcetype=traffic_log 
  [| datamodel datamodel1 search | search attack="vulnerability1" | dedup src_ip | table src_ip]
| stats sum(eval(round(bytes_in/(1024*1024),2))) AS "Incoming Mb" 
        sum(eval(round(bytes_out/(1024*1024),2))) AS "Outgoing Mb" by src_ip

Ojay87
Explorer

Thank you for your reply! This is what I wanted as you described in your first assumption.

This was very helpful. Now it works!

DalJeanis
SplunkTrust
SplunkTrust

@ojay87 - We're glad to help. Please accept the answer, so the question will show as closed.

Or instead, if @cmerriman's answer was the one that did the trick, rather than mine, then please let us know and we can convert her comment to an answer so you can accept that instead. We're all family around here.

0 Karma

cmerriman
Super Champion

how much data is in the datamodel? if you flipped them around and had the datamodel as a subsearch, would you still run into limitations? Is there a way you can do some aggregations inside the subsearch as well as to the base search to limit the amount of events?

0 Karma

Ojay87
Explorer

Thanks for your reply!

I get this error when switching rows of datamodel and subsearch or trying datamodel as a subsearch: "Error in 'SearchParser': The datamodel command can only be used as the first command on a search'.

0 Karma

cmerriman
Super Champion

that shouldn't be a problem. just make sure it's formatted properly.

index=logs sourcetype=traffic_log|join src_ip type=left overwrite=false [| datamodel datamodelname datasetname search | search attack="vulnerability1"]| stats sum(eval(round(bytes_in(1024*1024),2))) AS "Incoming Mb" sum(eval(round(bytes_out/(1024*1024),2))) AS "Outgoing Mb" by src_ip

the problem might come with the datamodel timing out, perhaps, but it can be used as a subsearch, it is the first command in the search (the subsearch is it's own search).

have you tried using the pivot command instead? is the datamodel accelerated? again, is there a way to aggregate either before joining them together on the raw events? perhaps by summing the bytes in and out by src_ip in the traffic_log and using the datamodel/pivot as a subsearch with a distinct list of src_ip that had vulnerable attacks? (i'm completely guessing that that is what your datamodel is there for)

also, if you did the aggregate of bytes in/out by src_ip in the subsearch, you might not run into the limitation.

0 Karma

Ojay87
Explorer

Thanks, you were right. I wrote the commands incorrectly. Now, it works how I wanted! Many thanks for your support here!

0 Karma
Get Updates on the Splunk Community!

Welcome to the Splunk Community!

(view in My Videos) We're so glad you're here! The Splunk Community is place to connect, learn, give back, and ...

Tech Talk | Elevating Digital Service Excellence: The Synergy of Splunk RUM & APM

Elevating Digital Service Excellence: The Synergy of Real User Monitoring and Application Performance ...

Adoption of RUM and APM at Splunk

    Unleash the power of Splunk Observability   Watch Now In this can't miss Tech Talk! The Splunk Growth ...