I need to know what events are on the sourcetype A that are not in the sourcetype B.
the query must evaluate more than 1 million events
consecutive consecutive
000234456 000234456
000977832 000977832
000383748 000183846
000773732 000773732
000859484 000339274
000272849 000646473
000263664 000263664
000183846
000339274
000646473
sourcetypeA sourcetypeB
Replacing the count
with a dc(sourcetype)
would lift that restriction... however, the question implies that events in B but not in A aren't interesting, which would not be distinguishable with count
or dc(sourcetype)
alone. Here's how to solve that:
sourcetype=A OR sourcetype=B | stats values(sourcetype) as sourcetypes by consecutive | search sourcetypes=A NOT sourcetypes=B
Note, this will work for millions of events but might break, slow down, or at least consume a lot of memory for millions of distinct values of consecutive
... stats
basically has to keep millions of buckets around in that case.
Is there any relationship between the timestamps of A and B? If so, using transaction
with short spans might use massively less memory:
sourcetype=A OR sourcetype=B | transaction consecutive maxspan=1m keepevicted=t | search sourcetype=A NOT sourcetypes=B
In theory there's also this:
sourcetype=A NOT [search sourcetype=B | dedup consecutive | fields consecutive]
But that again might (will) break for millions of distinct consecutive
values.
Thanks @martin_mueller and @gpradeepkumarreddy for their contributions.
Take one of the recommendations of @martin_mueller and Realize the following query:
index=ilt sourcetype=boleta_generada OR sourcetype=boleta_indexado TIPO_DOCUMENTO="BE PEDIDO" CAMPANA>="201411" | dedup CONSECUTIVO WF_ESTADO | eventstats values(sourcetype) as sourcetypes by CONSECUTIVO | search sourcetypes=boleta_generada NOT sourcetypes=boleta_indexado | stats count
Effectively this give me the expected result but equally was very slow to deliver the answer.
Martin_mueller question about whether I could create a transaction with the transaction
command, could not, as the events in the sourcetype=boleta_indexado
can crearce after many days.
I want to share with you that I did another query graph and I'm getting basically what I would like to integrate other bar to show me the difference:
index=ilt sourcetype=boleta_generada OR sourcetype=boleta_indexado TIPO_DOCUMENTO="BE PEDIDO" (CAMPANA>="201408") | dedup CONSECUTIVO WF_ESTADO | chart count over CAMPANA by WF_ESTADO | sort -WF_ESTADO
chart:
https://drive.google.com/file/d/0B1HsOnBT01xZWDcyY2dTNUUwZFE/edit?usp=sharing
The WF_ESTADO field contains the values:
GENERATED
INDEXED
I would like the other bar that was called MISSING paint.
Replacing the count
with a dc(sourcetype)
would lift that restriction... however, the question implies that events in B but not in A aren't interesting, which would not be distinguishable with count
or dc(sourcetype)
alone. Here's how to solve that:
sourcetype=A OR sourcetype=B | stats values(sourcetype) as sourcetypes by consecutive | search sourcetypes=A NOT sourcetypes=B
Note, this will work for millions of events but might break, slow down, or at least consume a lot of memory for millions of distinct values of consecutive
... stats
basically has to keep millions of buckets around in that case.
Is there any relationship between the timestamps of A and B? If so, using transaction
with short spans might use massively less memory:
sourcetype=A OR sourcetype=B | transaction consecutive maxspan=1m keepevicted=t | search sourcetype=A NOT sourcetypes=B
In theory there's also this:
sourcetype=A NOT [search sourcetype=B | dedup consecutive | fields consecutive]
But that again might (will) break for millions of distinct consecutive
values.
if the value exists in both the sourcetypes do they exist more than once in each sourcetype? If not, below query might help you
sourcetype=sourcetypeA OR sourcetype=sourcetypeB | stats count by consecutive | search count < 2