Getting Data In

Differences between two sourcetype according to a field

lufermalgo
Path Finder

I need to know what events are on the sourcetype A that are not in the sourcetype B.

the query must evaluate more than 1 million events

consecutive   consecutive
000234456     000234456
000977832     000977832
000383748     000183846
000773732     000773732
000859484     000339274
000272849     000646473
000263664     000263664
000183846
000339274
000646473
sourcetypeA   sourcetypeB
Tags (3)
0 Karma
1 Solution

martin_mueller
SplunkTrust
SplunkTrust

Replacing the count with a dc(sourcetype) would lift that restriction... however, the question implies that events in B but not in A aren't interesting, which would not be distinguishable with count or dc(sourcetype) alone. Here's how to solve that:

sourcetype=A OR sourcetype=B | stats values(sourcetype) as sourcetypes by consecutive | search sourcetypes=A NOT sourcetypes=B

Note, this will work for millions of events but might break, slow down, or at least consume a lot of memory for millions of distinct values of consecutive... stats basically has to keep millions of buckets around in that case.
Is there any relationship between the timestamps of A and B? If so, using transaction with short spans might use massively less memory:

sourcetype=A OR sourcetype=B | transaction consecutive maxspan=1m keepevicted=t | search sourcetype=A NOT sourcetypes=B

In theory there's also this:

sourcetype=A NOT [search sourcetype=B | dedup consecutive | fields consecutive]

But that again might (will) break for millions of distinct consecutive values.

View solution in original post

lufermalgo
Path Finder

Thanks @martin_mueller and @gpradeepkumarreddy for their contributions.

Take one of the recommendations of @martin_mueller and Realize the following query:

index=ilt sourcetype=boleta_generada OR sourcetype=boleta_indexado TIPO_DOCUMENTO="BE PEDIDO" CAMPANA>="201411" | dedup CONSECUTIVO WF_ESTADO | eventstats values(sourcetype) as sourcetypes by CONSECUTIVO | search sourcetypes=boleta_generada NOT sourcetypes=boleta_indexado | stats count

Effectively this give me the expected result but equally was very slow to deliver the answer.

Martin_mueller question about whether I could create a transaction with the transaction command, could not, as the events in the sourcetype=boleta_indexado can crearce after many days.

I want to share with you that I did another query graph and I'm getting basically what I would like to integrate other bar to show me the difference:

index=ilt sourcetype=boleta_generada OR sourcetype=boleta_indexado TIPO_DOCUMENTO="BE PEDIDO" (CAMPANA>="201408") | dedup CONSECUTIVO WF_ESTADO | chart count over CAMPANA by WF_ESTADO | sort -WF_ESTADO

chart:
https://drive.google.com/file/d/0B1HsOnBT01xZWDcyY2dTNUUwZFE/edit?usp=sharing

The WF_ESTADO field contains the values​​:

GENERATED
INDEXED

I would like the other bar that was called MISSING paint.

0 Karma

martin_mueller
SplunkTrust
SplunkTrust

Replacing the count with a dc(sourcetype) would lift that restriction... however, the question implies that events in B but not in A aren't interesting, which would not be distinguishable with count or dc(sourcetype) alone. Here's how to solve that:

sourcetype=A OR sourcetype=B | stats values(sourcetype) as sourcetypes by consecutive | search sourcetypes=A NOT sourcetypes=B

Note, this will work for millions of events but might break, slow down, or at least consume a lot of memory for millions of distinct values of consecutive... stats basically has to keep millions of buckets around in that case.
Is there any relationship between the timestamps of A and B? If so, using transaction with short spans might use massively less memory:

sourcetype=A OR sourcetype=B | transaction consecutive maxspan=1m keepevicted=t | search sourcetype=A NOT sourcetypes=B

In theory there's also this:

sourcetype=A NOT [search sourcetype=B | dedup consecutive | fields consecutive]

But that again might (will) break for millions of distinct consecutive values.

pradeepkumarg
Influencer

if the value exists in both the sourcetypes do they exist more than once in each sourcetype? If not, below query might help you

sourcetype=sourcetypeA OR sourcetype=sourcetypeB | stats count by consecutive | search count < 2

0 Karma
Get Updates on the Splunk Community!

Introducing the 2024 SplunkTrust!

Hello, Splunk Community! We are beyond thrilled to announce our newest group of SplunkTrust members!  The ...

Introducing the 2024 Splunk MVPs!

We are excited to announce the 2024 cohort of the Splunk MVP program. Splunk MVPs are passionate members of ...

Splunk Custom Visualizations App End of Life

The Splunk Custom Visualizations apps End of Life for SimpleXML will reach end of support on Dec 21, 2024, ...