Splunk Search

Compare two values from extracted fields - if match increment counter

splunk_95
Explorer

Hi all,

Having read a few similar threads I realised they do not quite ask what I need so decided to post a new thread.

I have two extracted fields, both 5 digits long, let's say:

A = 12345 B=12345

I extracted these two fields, each from a different source (source 1 = "log a" and source 2 = "log b"), over a 1 day interval.

Now let's say we get:

source 1 = log a        source 2 = log b
A = 12345               B = 98765
A = 23456               B = 12345
A = 34678               B = 87878

As a matching value could appear in any event of the other source (as shown above), it may be necessary to iterate through all values (unless anyone can think of a better idea). I essentially need to check that each value in log a has made it to log b.

If it has, I would like to increment another field by 1 for every 'match' made. This would then be shown using a timechart.

I am fairly new to Splunk; from what I have found, the answer may involve the eval command, as follows:

index="..." source="log a" OR source="log b" | eval match= match + 1|where A==B | timechart span =1d count (match)

It's the 'iterator' bit (like in C++, say) I'm having trouble with... I'm not sure how to get it to check each instance of A against every value of B.

Also, just to say, A and B are already extracted fields.

If anyone feels this may be too performance-heavy and has a better idea, I'm all ears 😄

Thanks for any help in advance

DalJeanis
Legend

This puts the value of A or B into a single field, matchvalue, so you can stats them together. We bin _time at the 1 day level, and use the value of source as an easy proxy for remembering whether the value came from A or B. If a matchvalue has two different sources on the same day, then we know we found it in both logs.

 index="..." source="log a" OR source="log b" 
| bin _time span=1d
| eval matchvalue = if( source="log a",A,B)
| stats values(source) as source by _time matchvalue
| where mvcount(source)>1
| timechart span=1d count

Updated to include the timechart line.

0 Karma

splunk_95
Explorer

Hi thanks for your suggestion.

I'm a little unclear as to how I could get a count of the number of matches.

Ideally I would put the number of matches onto a timechart (so one column would be matches and another would be unique matches, dc(matches) for example).

From the code you wrote, how would I get the count of the number of matches where A==B? Just stats count(source) by _time matchvalue?

I have tried stats count(matchvalue) but that didn't seem to work.

0 Karma

DalJeanis
Legend

Every record that reaches the end of the code is exactly one unique match, so | stats count by _time is one way, or | timechart span=1d count is another.


If you need to know non-unique matches, then you need to define what you mean. If there are 4 A records and 5 B records, do you want the non-unique match number to be 4, 5, 8, 9 or 20? I'll assume 9 for this code, so the meaning of "match" is "records in either file that were matched in the other file".

index="..." source="log a" OR source="log b"
| bin _time span=1d
| eval matchvalue = if(source="log a", A, B)
| stats values(source) as source, count(eval(source="log a")) as CountA, count(eval(source="log b")) as CountB by _time matchvalue
| where mvcount(source)>1
| eval CountMatch = CountA+CountB
| stats count as DistinctMatchCount, sum(CountMatch) as TotalMatchCount by _time
| untable _time series count
| timechart span=1d sum(count) by series
0 Karma

splunk_95
Explorer

Thanks for your reply.
I apologize for the confusion.
So my definition of a match is "an event in log A whose value equals the value of an event in log B",
i.e. (assume each event in both logs is always 5 digits)
log A :
A= 12345
A= 23456
A= 34567
A= 12345

Suppose log B:
B=54321
B=98765
B=34567
B=12345
B=12345

So for non-unique matches I should get CountMatch equal to 3.
For 'unique' matches (i.e. each matched value is counted only once, however often it repeats) I should get CountMatch equal to 2 for the example above. I tried to understand the code above but I don't think it quite does that (please correct me if I'm wrong).

Also, does the fact that there may be a different number of events in the two logs make a difference to the code in your comment?

Many, many thanks - I have had a lot of problems with this - your help is really appreciated.

0 Karma

DalJeanis
Legend

@splunk_95 - try this. This is a count of all items in A where there was a match the same day in B.

index="..." source="log a" OR source="log b"
| bin _time span=1d
| eval matchvalue = if(source="log a", A, B)
| stats values(source) as source, count(eval(source="log a")) as CountA, count(eval(source="log b")) as CountB by _time matchvalue
| where mvcount(source)>1
| stats sum(CountA) as count by _time
| timechart span=1d sum(count) as count
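
As a run-anywhere check of that logic against the example values posted above, the sketch below builds the nine sample events with makeresults instead of reading an index. The raw field and the a:/b: prefixes are hypothetical scaffolding used only to fake the two sources; they are not part of the original data. The bin on _time is left out because all generated events share one timestamp.

```
| makeresults
| eval raw="a:12345 a:23456 a:34567 a:12345 b:54321 b:98765 b:34567 b:12345 b:12345"
| makemv delim=" " raw
| mvexpand raw
| eval source=if(like(raw, "a:%"), "log a", "log b"), matchvalue=mvindex(split(raw, ":"), 1)
| stats dc(source) as sources, count(eval(source="log a")) as CountA by matchvalue
| where sources > 1
| stats count as UniqueMatches, sum(CountA) as TotalMatches
```

With those sample values only 12345 and 34567 appear in both sources, so this should come back with UniqueMatches=2 and TotalMatches=3 (12345 occurs twice in log A, 34567 once), which lines up with the 2 and 3 asked for in the question.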

0 Karma

3no
Communicator
index="..." source="log a" OR source="log b" | eval match=if(A==B,1,0) | timechart span =1d sum(match)

DalJeanis
Legend

No, that's going to check each individual event to see whether the values of A and B on that same event match. Since A and B come from different sources, no single event carries both fields, so match will never be anything other than 0.
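
One way to confirm this on your own data, offered as a rough diagnostic sketch rather than a fix: count, per source, how many events carry A, how many carry B, and how many carry both. The output names events, withA, withB and withBoth are made up for illustration.

```
index="..." source="log a" OR source="log b"
| stats count as events, count(A) as withA, count(B) as withB, count(eval(isnotnull(A) AND isnotnull(B))) as withBoth by source
```

If withBoth is 0 for every source, then a per-event comparison such as if(A==B,1,0) has nothing to compare, and the values have to be brought together with stats as in the searches above.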

0 Karma

splunk_95
Explorer

Hi
The search you suggested below didn't seem to work... what would be the best way to debug it?

0 Karma

splunk_95
Explorer

Thank you for the reply, though I would like to learn exactly how this answer works (for my Splunk development).

Does that eval command check A against every instance of B? Sorry if that is a silly question. I just can't see what logic makes it check that, a bit like 'foreach' in C#.

Another criterion I had was that this should only consider events within the same day. If that limit is only placed on the timechart, is that enough, i.e. is some sort of 'span=1d' or _time handling not needed near the eval command?

0 Karma

3no
Communicator

Yes, it will check for every value of A whether it equals a value of B (same as foreach); if it matches, it sets "match" to 1, else 0. Then you take the sum to know how many occurrences you have.

The span=1d means it will sum "match" over one day, so if you run your search over a week you'll get 7 values (one for each day). I'm not sure I understand your question on that last part.

0 Karma

splunk_95
Explorer

Awesome, thanks! Just as an extension, if I wanted to consider only the unique values of A against the values of B, is that possible?

so if

A =12345
A= 23456
A= 23489
A= 12345 (This event would not be compared against all values of B)

Also, the SPL doesn't seem to be working. I checked the extracted fields and can see matching values in both A and B, but match seems to return a value of zero... any idea how best to debug?

0 Karma

3no
Communicator

Yes, my bad: A and B are not in the same event (as DalJeanis said).

How about trying it this way:

index="..." (source="log a" OR source="log b") | rename B as A | dedup A, source | stats count by A | where count > 1 | table A | stats count

3no

0 Karma

3no
Communicator

index="..." (source="log a" OR source="log b") // show the data
| rename B as A // rename field B to field A
| dedup A, source // keep the unique values of A per source (so you know which are original A and which are original B)
| stats count by A // count by field A
| where count > 1 // keep only the values of A where the count is greater than 1, because if the value was in both A and B the count should be 2
| table A // show these values
| stats count // return the count

Try from the beginning and start adding each command to see if it gives you the correct values (when I say command, I mean everything that comes after a pipe "|")

And let me know how it goes 🙂

3no

0 Karma

splunk_95
Explorer

hey 🙂
thanks for the reply.
So essentially I feel "rename B as A" is not working; the search seems to fail at "where count > 1". When I look through the values of A, the count for each of the top 10 values is 1. There is also no sign of an increase in the number of 'A' events after the renaming.

Do you reckon splitting it by source like the other example would be better?

I feel the renaming isn't exactly doing what we would like.

Ideally, if I could get the count of matched values from both logs into a timechart instead of a table, that would be great.
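
A possible explanation, stated as an assumption rather than something confirmed in this thread: renaming B onto a field A that already exists may overwrite A, leaving it empty on the "log a" events where B is missing, so no value is ever seen from both sources. A hedged sketch that avoids the rename by merging the two fields with coalesce, and that ends in a timechart instead of a table, could look like this. The names matchvalue, sources and UniqueMatches are illustrative.

```
index="..." (source="log a" OR source="log b")
| bin _time span=1d
| eval matchvalue=coalesce(A, B)
| stats dc(source) as sources by _time matchvalue
| where sources > 1
| timechart span=1d count as UniqueMatches
```

Here coalesce(A, B) takes A when it is present and B otherwise, dc(source) counts how many distinct sources a value was seen in that day, and the final timechart counts one row per value found in both logs.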

0 Karma