Solved: How do I get a count from multiple sources after m...

jlosee · ‎08-14-2015

Hi, I'm redoing a search to avoid using join as it was truncating results. I'm trying to get a count after searching multiple sources and using values(field) followed by mvexpand(field), and I'm not getting the counts I expect. This is my query:

(index=index1) OR (index=index2 source="source1" OR source="source2") | stats values(field1) values(field2) values(field3) values(field4) values(field5) by field6| rename values(field1) AS Field1 values(field2) AS Field2 values(field3) as Field3 values(field4) as Field4 values(field5) as Field5 | mvexpand Field1 | mvexpand Field2  | mvexpand Field3 | mvexpand Field4 | mvexpand Field5   | stats count by Field1 , Field2 , Field3 , Field4 , Field5 | eventstats count as Total by Field1 | addcoltotals

So in the old query with join I was getting count values in the hundreds or thousands but with this, most are just 1. Is there something I'm doing wrong with mvexpand? Should I be using eventstats or streamstats?

Hopefully this question makes sense. Thanks!

jlosee · ‎08-18-2015

Okay well after talking to someone else on my team we figured a way around all this. We're still using join but we don't have any missing entries. It look like this:

index=index1
| rex field=_raw "rex_field=xxx(?<rexA>\w+.\w+)%23(?<rexB>\w+)" 
| dedup rexA rexB
| top limit=50 showperc=false rexB  
| sort count desc  

| join type=left rexB [search index=index2 source="source1"
| table rexA fieldA fieldB... 
| dedup fieldA] 

| table fieldA rexB fieldB...count 
| search fieldA!="" | head 25 
| addcoltotals labelfield=fieldA label="Top 25 Total"

We also added information from source2 to source1 to simplify things. Thanks for your help everyone!

View solution in original post

jlosee · ‎08-18-2015

Okay well after talking to someone else on my team we figured a way around all this. We're still using join but we don't have any missing entries. It look like this:

index=index1
| rex field=_raw "rex_field=xxx(?<rexA>\w+.\w+)%23(?<rexB>\w+)" 
| dedup rexA rexB
| top limit=50 showperc=false rexB  
| sort count desc  

| join type=left rexB [search index=index2 source="source1"
| table rexA fieldA fieldB... 
| dedup fieldA] 

| table fieldA rexB fieldB...count 
| search fieldA!="" | head 25 
| addcoltotals labelfield=fieldA label="Top 25 Total"

We also added information from source2 to source1 to simplify things. Thanks for your help everyone!

somesoni2 · ‎08-15-2015

Try something like this

index=index1 OR (index=index2 source="source1" source="source2" source="source3") |  rex field=rex_field "aaa<partA1>aaa" | rex field=rex_field "aaa<partB1> | eval AB=if(index="index1",partA1.":".partB1, partA.":".partB) | dedup AB | fillnull value="0"  | stats count by field1,AB,field2,field3,field4, field5,count | addcoltotals

martin_mueller · ‎08-18-2015

if() and case() can do the same things, they just look different.

jlosee · ‎08-15-2015

It seems that it will only search whichever index I specify in the if clause. What about using case?

woodcock · ‎08-14-2015

Give us 2 or 3 events from each source and then give us (based on that data) the desired output followed by a description of the steps take to produce the output. I have no idea what you mean based on what you have shown us so far.

richgalloway · ‎08-14-2015

What are you trying to count? The stats values(field) command dedups its results. If you want to count all values rather than unique ones, use stats list(field).

---
If this reply helps you, Karma would be appreciated.

jlosee · ‎08-14-2015

Okay I've been trying out list instead and it looks promising. How would I get a sum of the counts that share some of the fields? e.g. I would like to sum up the ones that have the same field1 and field2.

richgalloway · ‎08-14-2015

Try this: ... | stats sum(eval x=if(field1=field2,1,0)) ...

---
If this reply helps you, Karma would be appreciated.

jlosee · ‎08-14-2015

Oh sorry what I meant was something like this:
field1 field2 field3
bar foo something
bar foo somethingElse

woodcock · ‎08-14-2015

What was the old query?

jlosee · ‎08-14-2015

index=index1  | rex field=rex_field "aaa<partA>aaa" | rex field=rex_field "aaa<partB> | eval AB=partA.":".partB | dedup field_foo,AB |  join type=left AB [search index=index2 source="source1" |   eval AB=partA.":".partB | dedup AB| table AB,field1,field4,field5]  | join type=left AB [search index=index2 source="source2"   |   eval AB=partA.":".partB | dedup AB |  table field1 ,AB, field7] | dedup AB | fillnull value="0"  | table  field1,AB,field2,field3,field4, field5,count | addcoltotals

woodcock · ‎08-14-2015

Wow. I strongly suspect (mostly because of the dedups) that this search is also not doing what you would like it to. Will you back up and explain what is in the data sets and how you are trying to connect them?

jlosee · ‎08-14-2015

Well at a very high level, index1 has a list of assets we are sending out, whereas index2 is more of a list of assets that we have available. We want to count the assets going through the first index with more information from the second (and the two sources of that index have some of their own data). We have similar fields in both indexes but their not quite called the same thing, so we use regex to combine fields in such a way that we have a matching field in both indexes. I think without the dedups we would just have a lot of unnecessary repeats. In my new query I was trying to avoid the regex but it may be necessary after all.

Sorry I don't think I can be more detailed. Does that clarify things?

How do I get a count from multiple sources after mvexpand on multiple fields?

ICYMI - Check out the latest releases of Splunk Edge Processor

Introducing the 2024 SplunkTrust!

Introducing the 2024 Splunk MVPs!