I'm working on some statistics related queries. I'm trying to get the security id, date and count of hosts connected to.
index=wineventlog sourcetype="WinEventLog:Security" 4624 |
fields host,Security_ID,_time |
bucket _time span=1d |
stats dc(host) by Security_ID, _time
They work perfectly until I start adding Security_ID. With no by
command or only based on time it's fast.
I also tried to do a dedup Security_ID, _time, host
before the stats dc command but it didn't help the overall speed.
It takes well over 10 minutes to complete this search for a week, and I'd like to be able to run this for 30 60 or 90 days. What do I need to do for that to be viable?
Heya @timbCFCA, if DalJeanis solved your problem, please don't forget to accept an answer! You can upvote posts as well. (Karma points will be awarded for either action.) Happy Splunking!
Seems puzzling, I do see that Security_ID gets used as a source key several times in transforms.conf in the TA, I wonder if that causes any overhead. Maybe try specifying EventCode=4624 so that it isn't searching through all fields looking for 4624.
How often are you running the search? If you are running it fairly often, then you might consider a summary index so you don't have to re-spin the whole search multiple times a day.
Try this -
index=wineventlog sourcetype="WinEventLog:Security" 4624
| fields host, Security_ID, _time
| eval _time=floor(_time/86400)*86400
| dedup host,Security_ID,_time
| stats dc(host) by Security_ID, _time
Explanation -
eval
is streaming and distributed, while bin
is not. This way, the binning can be done at the individual indexers.
Try it with and without the dedup
and see what happens.
@DalJeanis not really. It did let me save as an accelerated search which helps.
@timbCFCA - Well, that's a decent consolation prize.
Did you try it without the dedup
, which is redundant with the dc()
?
@timbCFCA - Did this change the run time at all?