Solved: comparing two searches and calculate the differenc...

smanojkumar · ‎12-06-2023

Hi There!

I would like to find the values of host that were in macro 1 but not in macro 2
search 1

`macro 1`
| fields host

search 2

`macro 2`
| fields host

macro 1

host
a
b
c
d

macro 2

host
a
b
e
f

Result

Count - 2
because host c and d were not in macro 2

Thanks in Advance!

yuanliu · ‎12-07-2023

Do you mean to only count difference and not showing host values? In that case, add count.

`macro 1`
| eval source = "macro1"
| append
    [search `macro 2`
    | eval source = "macro2"]
| stats values(source) as source by host
| where mvcount(source) < 2 AND source == "macro 1"
| stats dc(host) as count_diff

If you don't get desired results, you need to examine data and post relevant data from each macro (anonymize as needed) as well as actual results.

Here is an emulation based on your original mock data.

macro 1	macro 2
\| makeresults format=csv data="host a b c d"	\| makeresults format=csv data="host a b e f"

Adding them together (because search command is not used, the subsearch looks different)

| makeresults format=csv data="host
a
b
c
d"
| eval source="macro 1"
| append
    [makeresults format=csv data="host
a
b
e
f"
    | eval source="macro 2"]
| stats values(source) as source by host
| where mvcount(source) < 2 AND source == "macro 1"

This gives

host	source
c	macro 1
d	macro 1

If I add | stats dc(host), it gives me 2.

View solution in original post

yuanliu · ‎12-06-2023

First, let me clarify that this problem is solvable as stated. But you may want to reconsider how "macro 1" and "macro 2" is structured to make this easier. You may also want to structure a different search to make this function more efficient.

Back to the stated problem. The idea is to "tag" output from each macro, then count which host is in which output.

`macro 1`
| eval source = "macro1"
| append
    [search `macro 2`
    | eval source = "macro2"]
| stats values(source) as source by host
| where mvcount(source) < 2 AND source == "macro 1"

Note I insert "search" command in the subsearch because I do not know how "macro 2" is constructed. It may not need that, or the "search" command may ruin it. The where command also uses a feature/side effect of SPL's equality comparator against multivalue fields.

smanojkumar · ‎12-06-2023

Hi @yuanliu ,

The code in macro is just the index name and source type and with several fields, as our need is only host, so we need that,

same code in macro 2 as well, only the source type varies

macro 1
index=sap sourcetype=1A*

maceo 2
index=sap sourcrtype=2A*

Thanks in Advance!

yuanliu · ‎12-07-2023

In that case, the search I gave should work as is. Have you tried?

smanojkumar · ‎12-07-2023

Hi @yuanliu ,

Few results were wrong, I don't know why!
When I'm checking with in search, It is having macro1 as value but when I'm checking with that macro, there we are having that host.

Thanks in Advance!

yuanliu · ‎12-07-2023

Do you mean to only count difference and not showing host values? In that case, add count.

`macro 1`
| eval source = "macro1"
| append
    [search `macro 2`
    | eval source = "macro2"]
| stats values(source) as source by host
| where mvcount(source) < 2 AND source == "macro 1"
| stats dc(host) as count_diff

If you don't get desired results, you need to examine data and post relevant data from each macro (anonymize as needed) as well as actual results.

Here is an emulation based on your original mock data.

macro 1	macro 2
\| makeresults format=csv data="host a b c d"	\| makeresults format=csv data="host a b e f"

Adding them together (because search command is not used, the subsearch looks different)

| makeresults format=csv data="host
a
b
c
d"
| eval source="macro 1"
| append
    [makeresults format=csv data="host
a
b
e
f"
    | eval source="macro 2"]
| stats values(source) as source by host
| where mvcount(source) < 2 AND source == "macro 1"

This gives

host	source
c	macro 1
d	macro 1

If I add | stats dc(host), it gives me 2.

smanojkumar · ‎12-14-2023

Hi @yuanliu ,

I think append had some limits to append it, so if we are handling with large volumes of data set like 2lakhs events, which command will be useful!

yuanliu · ‎12-14-2023

I think append had some limits to append it, so if we are handling with large volumes of data set like 2lakhs events, which command will be useful!

This is why my first reply hinted that restructuring the searches could be a better option. You did mention each of the two macros were simple index searches with different sourcetype constraints. It is much more efficient to combine the two searches into one, then stats over their differences. If you observe how the search I constructed before had to manufacture a field named "source" (which obviously is not your data field named source), you would draw a parallel.

Based on pseudo code of your macros, here is an example of what you can use instead:

index=sap sourcetype IN (1A*, 2A*)
| eval sourcetype = if(match(sourcetype, "^1A"), "1A...", "2A...")
| stats values(sourcetype) as sourcetype by host
| where mvcount(sourcetype) < 2 AND sourcetype == "1A..."
| stats dc(host) as count_diff

comparing two searches and calculate the differences

other

Introducing the Splunk Community Dashboard Challenge!

Wondering How to Build Resiliency in the Cloud?

Updated Data Management and AWS GDI Inventory in Splunk Observability