I apologize if this has been asked already, but I searched to no avail.
I'm writing a Splunk query that will eventually be used for summary indexing via sitimechart. I have this query:
index=app sourcetype=<removed> host=<removed> earliest=-10d
| eval Success_Count=if(scs=="True",1,0)
| eval Failure_Count=if(scs=="False",1,0)
| timechart span=1d sum(Success_Count) as SuccessCount sum(Failure_Count) as FailureCount count as TotalCount by host
Results are as expected. However, some data was accidentally indexed twice, so I need to remove the duplicates. In a regular search I just use | dedup _raw to remove the identical events. However, if I run the following query, I get zero results returned, no matter where I put | dedup _raw 😞
index=app sourcetype=<removed> host=<removed> earliest=-10d
| dedup _raw
| eval Success_Count=if(scs=="True",1,0)
| eval Failure_Count=if(scs=="False",0,1)
| timechart span=1d sum(Success_Count) as SuccessCount sum(Failure_Count) as FailureCount count as TotalCount by host
What am I doing wrong? I'm using Splunk 4.3.2.
They have the same timestamp.
Try the following:
index=app sourcetype=<removed> host=<removed> earliest=-10d
| fields _time, scs, host
| dedup _time, scs, host
| timechart span=1d count(eval(scs="True")) as SuccessCount count(eval(scs="False")) as FailureCount count as TotalCount by host
Is your duplicate records issue resolved?
Okay, thank you for the confirmation. That was written in an internal corporate document, and I wasn't getting any summary data in my index, so I was thinking my use of eval on the same line as sitimechart may have been causing the problem (but I'm glad to hear that it shouldn't be). Thanks again.
I am able to use eval() on the same line as the sitimechart command (and I don't see any restriction on that in the documentation).
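For example, something along these lines should work for the summary-indexing version (a minimal sketch combining the dedup approach above with sitimechart; it assumes the same scs and host fields from your original query):
index=app sourcetype=<removed> host=<removed> earliest=-10d
| fields _time, scs, host
| dedup _time, scs, host
| sitimechart span=1d count(eval(scs="True")) as SuccessCount count(eval(scs="False")) as FailureCount count as TotalCount by host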
I was under the impression that I couldn't use eval() on the same line as sitimechart (which I will be switching over to once I've ironed out this duplicate problem). Is that not correct? This is essentially what my query looked like originally.
True, the best approach would be to include all the fields that make an event unique in the fields and dedup clauses, so that legitimate events don't get filtered out.
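For example (a sketch using a hypothetical transaction_id field as a stand-in for whatever actually makes your events unique):
index=app sourcetype=<removed> host=<removed> earliest=-10d
| fields _time, scs, host, transaction_id
| dedup _time, scs, host, transaction_id
| timechart span=1d count(eval(scs="True")) as SuccessCount count(eval(scs="False")) as FailureCount count as TotalCount by host
That way, two events that legitimately share a timestamp, scs value, and host are still kept apart.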
Careful with that. Depending on the volume and on how timestamps are extracted, you can have many legitimate, non-duplicate events with the same timestamp that will be hidden by deduping on _time.
When you said the data was duplicated, do the duplicate events have the same timestamp or different ones?
There was a typo on the eval Failure_Count line (the 1 and 0 arguments were swapped), and the second timechart should have read sum(), sum(), count, but the reCaptcha kept blocking me from editing to fix those 😞