Splunk Search

Alternatives to "join" with two matching event fields.

sknot1454
Explorer

I'm trying to search through one sourcetype called "Windows_System".

There's also a specific field I'm interested in first called "EventCode"

If I do a search only for for sourcetype="Windows_System" EventCode=1, I get the results I want.

There's an executable that I want to be present in the fields when I search for EventCode 2 called "executable.exe"

If I do a search only for sourcetype="Windows_System" EventCode=2 process="executable.exe", I get the results I want.

However, there are fields that show up when EventCode=1 is specified for that aren't present when EventCode=2 is. I want to chart out all the fields I want if both EventCode values are specified.

The good thing is that there's a common field with a unique value between both searches called "GUID" so I can focus on that.

Here's the query so far

sourcetype="Windows_System" EventCode=2 Image="executable.exe"
| join GUID type=outer [search sourcetype="Windows_System" EventCode=1]
| rename SourceHostname as hostname_ip, SourceIp as source_ip, SourcePort as source_port, DestinationHostname as destination_hostname, DestinationIp as destination_ip, DestinationPort as destination_port
| table User, hostname_ip, source_ip, source_port, destination_hostname, destination_ip, destination_port, Protocol, GUID, process, CommandLine, ParentImage, ParentCommandLine

CommandLine, ParentImage, ParentCommandLine are fields that are present ONLY when you specify EventCode1 that aren't when you specify EventCode2. The data in those fields are not showing up when run my main.

I have two questions.

Is the "join" command really necessary or can I accomplish this with a less intensive search like with stats or chart?

Is this search the correct syntax to get the results I want?

Thanks!

0 Karma
1 Solution

sknot1454
Explorer

I know this was a while ago, but I ended up using "Eval GUID=coalesce(GUID,GUID)" and sorting by "GUID" when I tabled out the fields.

Since I know when EventCode=1 is specified, there would always be data in the field CommandLine, and I knew there would always be data for the field "DestinationIp" when EventCode=3 was specified, I just used "CommandLine!="" and DestinationIP="" to filter out null fields.

Works 100000x faster than a join. Queries that would take hours to run with join now take less than 10 minutes

View solution in original post

0 Karma

sknot1454
Explorer

I know this was a while ago, but I ended up using "Eval GUID=coalesce(GUID,GUID)" and sorting by "GUID" when I tabled out the fields.

Since I know when EventCode=1 is specified, there would always be data in the field CommandLine, and I knew there would always be data for the field "DestinationIp" when EventCode=3 was specified, I just used "CommandLine!="" and DestinationIP="" to filter out null fields.

Works 100000x faster than a join. Queries that would take hours to run with join now take less than 10 minutes

0 Karma

roopendra
Engager

Can you please share final version of your query? Thanks

0 Karma

somesoni2
Revered Legend

Joins are expensive and your requirement can easily be done using a stats. Try something like this

sourcetype="Windows_System" (EventCode=2 Image="executable.exe") OR (EventCode=1)
| stats values(process) as process, values(SourceHostname) as hostname_ip, values(SourceIp) as source_ip, values(SourcePort) as source_port, values(DestinationHostname) as destination_hostname, values(DestinationIp) as destination_ip, values(DestinationPort) as destination_port values(CommandLine) as CommandLine, values(ParentImage) as ParentImage, values(ParentCommandLine) as ParentCommandLine by User, Protocol, GUID
| table User, hostname_ip, source_ip, source_port, destination_hostname, destination_ip, destination_port, Protocol, GUID, process, CommandLine, ParentImage, ParentCommandLine

UPDATED SEARCH

sourcetype="Windows_System" (EventCode=2 Image="executable.exe") OR (EventCode=1)
| eval CommandLine=coalesce(CommandLine,null()) | eval ParentImage=coalesce(ParentImage,null()) | eval ParentCommandLine=coalesce(ParentCommandLine
 ,null())   | stats values(process) as process, values(SourceHostname) as hostname_ip, values(SourceIp) as source_ip, values(SourcePort) as source_port, values(DestinationHostname) as destination_hostname, values(DestinationIp) as destination_ip, values(DestinationPort) as destination_port values(CommandLine) as CommandLine, values(ParentImage) as ParentImage, values(ParentCommandLine) as ParentCommandLine by User, Protocol, GUID
    | table User, hostname_ip, source_ip, source_port, destination_hostname, destination_ip, destination_port, Protocol, GUID, process, CommandLine, ParentImage, ParentCommandLine

sknot1454
Explorer

You're definitely right. The stats helped speed up the query.

Unfortunately the CommandLine, ParentImage, ParentCommandLine field values are still blank in the results table.

All of the fields tabled out are common between EventCode1 and EventCode2 except for the three I listed above.

Any idea why they aren't being included?

0 Karma

somesoni2
Revered Legend

See if updated answer works for you.

0 Karma

sknot1454
Explorer

Still not working.

I tabled out the _raw field and it seems only raw data the from EventCode 3 is in the field. There's nothing from EventCode 1.

It seems they aren't joining together which makes sense that I can't see those three fields in my table.

The data is in XML format. Would this be a problem?

0 Karma
Get Updates on the Splunk Community!

ICYMI - Check out the latest releases of Splunk Edge Processor

Splunk is pleased to announce the latest enhancements to Splunk Edge Processor.  HEC Receiver authorization ...

Introducing the 2024 SplunkTrust!

Hello, Splunk Community! We are beyond thrilled to announce our newest group of SplunkTrust members!  The ...

Introducing the 2024 Splunk MVPs!

We are excited to announce the 2024 cohort of the Splunk MVP program. Splunk MVPs are passionate members of ...