I apologize if this has been asked already, but I searched to no avail.
I'm writing a Splunk query that will eventually be used for summary indexing via sitimechart. I have this query:
index=app sourcetype=<removed> host=<removed> earliest=-10d
| eval Success_Count=if(scs=="True",1,0)
| eval Failure_Count=if(scs=="False",1,0)
| timechart span=1d sum(Success_Count) as SuccessCount sum(Failure_Count) as FailureCount count as TotalCount by host
Results are as expected. However, some data was accidentally indexed twice, so I need to remove the duplicates. In a regular search I just use | dedup _raw to remove the identical events. However, if I run the following query, I get zero results returned, no matter where I put | dedup _raw 😞
index=app sourcetype=<removed> host=<removed> earliest=-10d
| dedup _raw
| eval Success_Count=if(scs=="True",1,0)
| eval Failure_Count=if(scs=="False",0,1)
| timechart span=1d sum(Success_Count) as SuccessCount sum(Failure_Count) as FailureCount count as TotalCount by host
What am I doing wrong? I'm using Splunk 4.3.2.
They have the same timestamp.
Try the following:
index=app sourcetype=<removed> host=<removed> earliest=-10d
| fields _time, scs, host
| dedup _time, scs, host
| timechart span=1d count(eval(scs="True")) as SuccessCount count(eval(scs="False")) as FailureCount count as TotalCount by host
Is your duplicate records issue resolved?
Okay, thank you for the confirmation. That was written in an internal corporate document, and I wasn't getting any summary data in my index, so I was thinking my use of eval on the same line as sitimechart may have been causing the problem (but I'm glad to hear that it shouldn't be). Thanks again.
I am able to use eval() on the same line as the sitimechart command (and I don't see any restriction on that in the documentation).
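For example, something along these lines should work for the summary-indexing version (a minimal sketch combining the dedup approach above with sitimechart; it assumes the same scs and host fields from your original query):
index=app sourcetype=<removed> host=<removed> earliest=-10d
| fields _time, scs, host
| dedup _time, scs, host
| sitimechart span=1d count(eval(scs="True")) as SuccessCount count(eval(scs="False")) as FailureCount count as TotalCount by host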
I was under the impression that I couldn't use eval() on the same line as sitimechart (which I will be switching over to once I've ironed out this duplicate problem). Is that not correct? This is essentially what my query looked like originally.
True, the best approach would be to include all the fields that make an event unique in the fields and dedup clauses, so that legitimate events don't get filtered out.
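For example (a sketch using a hypothetical transaction_id field as a stand-in for whatever actually makes your events unique):
index=app sourcetype=<removed> host=<removed> earliest=-10d
| fields _time, scs, host, transaction_id
| dedup _time, scs, host, transaction_id
| timechart span=1d count(eval(scs="True")) as SuccessCount count(eval(scs="False")) as FailureCount count as TotalCount by host
That way, two events that legitimately share a timestamp, scs value, and host are still kept apart.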
Careful with that. Depending on the volume and on how timestamps are extracted, you can have many legitimate, non-duplicate events with the same timestamp that will be hidden by deduping on _time.
When you said the data was duplicated, do the duplicate events have the same timestamp or different ones?
There was a typo on the eval Failure_Count line (the 1 and 0 arguments were swapped), and the second timechart should have read sum(), sum(), count, but the reCaptcha kept blocking me from editing to fix those 😞