Solved: dedup within timechart

sloshburch · ‎07-05-2013

I have several searches that I am trying to optimize now that our platform is on splunk 5+. My preference is to leverage report acceleration because of its ability to dynamically back-fill the way it efficiently runs in the background.

Unfortunately, several of my searches use a dedup on multiple fields (ie: dedup field1 field2 field3) and then runs timechart against one of those fields (ie: timechart span=1d field1). The use of dedup before timechart prevents report acceleration from being used as its not a streamable command.

I'm trying to find a way to eliminate to enable this search for report acceleration.

I've tried removing the dedup and playing with distinctcount(field1 field2 field3) but that failed. I also tried timechart span=1d dc(field1) by field2 field3 but that also is not allowed. I'm suspicious that I'm overlooking a trivial way to do this. Perhaps the community can enlighten me?

sloshburch · ‎07-17-2013

I believe I found a solution: do a stats count by field1 field2 field3 where field3 is the timepan (in this case, just the day of the _time). If I'm thinking clearly, that will dedup by those three fields. Then, if I want a total count, I can do another stats count. This results in a distinct count. I believe this should be more efficient than the eval approach.

View solution in original post

sloshburch · ‎07-17-2013

I believe I found a solution: do a stats count by field1 field2 field3 where field3 is the timepan (in this case, just the day of the _time). If I'm thinking clearly, that will dedup by those three fields. Then, if I want a total count, I can do another stats count. This results in a distinct count. I believe this should be more efficient than the eval approach.

sloshburch · ‎07-08-2013

The goal is to enable report acceleration on a pre-existing saved search - but the saved search was designed with dedup on several fields before the timechart command. So the folks that use the saved search want to timechart some distinct values. Is that more clear? Thanks for the clarifying questions.

martin_mueller · ‎07-05-2013

And what are you trying to achieve here, in natural language rather than SPL?

sloshburch · ‎07-05-2013

Notice the use of dc(eval(fieldD.";".fieldE.";".Date1)). My thought was to create a unique string from the fields I would dedup against, then get a distinct count of those. Again, I assume I'm hacking this and there's probably a more trivial approach I should be using.

sloshburch · ‎07-05-2013

Fair request. I tried to abstract the company stuff so hopefully this still is clear for the community:

index=a ( sourcetype="b" OR sourcetype="c" ) ( source="/path/file1*" OR source="/path/file2*" ) fieldA=* ( fieldB=val1 OR fieldC=val2 )
| convert timeformat="%m/%d/%y" ctime(_time) as Date1
| timechart span=1day dc(eval(fieldD.";".fieldE.";".Date1)) as countFieldName
| convert timeformat="%m/%d/%y" ctime(_time) as Date
| table Date countFieldName

martin_mueller · ‎07-05-2013

Post your search and what you want to achieve with it - maybe there is an entirely different approach.

dedup within timechart

Stay Connected: Your Guide to May Tech Talks, Office Hours, and Webinars!

They're back! Join the SplunkTrust and MVP at .conf24

Enterprise Security Content Update (ESCU) | New Releases