Solved: How to write a search for the most occurring field...

sshkaya3344 · ‎10-20-2014

Device Table1 Table2 Table3 Table4 Table5
Name1   XP      XP    XP     XP     XP 
Name2    7      7    XP     Null    XP
Name3   XP      XP    XP     7     2000 
Name4   XP     Null   Null  Null    XP 
Name5   7       7      7    Null   Null

I am trying to pull the best probable match per record (the best match would be the most occuring string per record).
and also would like to calculate the probability percentage of those matches. I tried several commands but not getting idea how to get there. Need help please.

martin_mueller · ‎10-20-2014

Here's a thought:

| stats count | eval device = "Name1 XP XP XP XP XP
Name2 7 7 XP  XP
Name3 XP XP XP 7 2000
Name4 XP    XP
Name5 7 7 7   " | makemv delim="
" device | mvexpand device | rex field=device "(?<device>\S+) (?<table1>\S+)? (?<table2>\S+)? (?<table3>\S+)? (?<table4>\S+)? (?<table5>\S+)?"
| foreach table* [eval match = mvappend(match , <<FIELD>>)] | top device match | dedup device | sort device

The first six lines set up dummy data, the last line is the actual search. Here's the result for your example:

device  match  count  percent
Name1      XP      5  100.000000
Name2      XP      2  40.000000
Name3      XP      3  60.000000
Name4      XP      2  40.000000
Name5       7      3  60.000000

I didn't take care to resolve the conflict for device Name2 where 7 and XP are equally likely. Is that relevant to your use case?
Similarly, I treated null values as invalid rather than letting device Name4 be voted null.

View solution in original post

martin_mueller · ‎10-20-2014

Here's a thought:

| stats count | eval device = "Name1 XP XP XP XP XP
Name2 7 7 XP  XP
Name3 XP XP XP 7 2000
Name4 XP    XP
Name5 7 7 7   " | makemv delim="
" device | mvexpand device | rex field=device "(?<device>\S+) (?<table1>\S+)? (?<table2>\S+)? (?<table3>\S+)? (?<table4>\S+)? (?<table5>\S+)?"
| foreach table* [eval match = mvappend(match , <<FIELD>>)] | top device match | dedup device | sort device

The first six lines set up dummy data, the last line is the actual search. Here's the result for your example:

device  match  count  percent
Name1      XP      5  100.000000
Name2      XP      2  40.000000
Name3      XP      3  60.000000
Name4      XP      2  40.000000
Name5       7      3  60.000000

I didn't take care to resolve the conflict for device Name2 where 7 and XP are equally likely. Is that relevant to your use case?
Similarly, I treated null values as invalid rather than letting device Name4 be voted null.

martin_mueller · ‎10-21-2014

Sure, it just takes a slightly different route than a simple dedup.

... | foreach table* [eval match = mvappend(match , <<FIELD>>)] | top device match
    | eventstats max(count) as max by device | where count==max | fields - max | sort device

Same until the top.

sshkaya3344 · ‎10-21-2014

Thank you so much for the quick reply and it's quite helpful. Yes, in equally likely case can I display both matches?

Regards,
Sanju
,Thank you so much for the quick reply and it's quite helpful. Yes, in equally likely case, can we display both matches??

Regards,
Sanju

How to write a search for the most occurring field value per record and calculate their probability percentage?

.conf24 | Registration Open!

ICYMI - Check out the latest releases of Splunk Edge Processor

Introducing the 2024 SplunkTrust!