Hi,
I'm new to Splunk but getting a lot of value from it. I've gotten a reasonable way using trial and error and a lot of referring to the forums and help, but I've now hit a challenge I can't solve, which is a bit fiddly to describe. Here goes...
I have a data set that's imported once a week, which contains vulnerability information about computers, pulled from a scanning system. I've been able to use "stats" to create a table of devices and the number of vulnerabilities per device. I had to use "dedup" so the same vulnerability didn't show up multiple times: although the data is added weekly, I'm using 4 weeks' worth of results in my search, to ensure I'm not missing devices that may come and go.
However, what I've now realised is that until a vulnerability falls outside my 4-week date range it'll stay in the report, even if it's been resolved.
So as an example:
At the moment my report continues to show 3 defects until I hit week 6, when the defect that was fixed falls out of the 4 week range.
Every result has a detected date associated with it. What I want to do is only include results where the detected date matches the latest detected date for any defect against that device. So in my case above, once I've imported data for week 2 and am looking at 4 weeks' worth together, defect1 will be in that data set but its detected date will be older than that for defect2 and defect3, so it could be safely ignored.
I can generate a table of devices showing each device and the correct latest "last detected" date that I would want to filter on (it could be different for each device), as follows:
| stats max(LastDetected) by Device
However, I can't see a way to then use that calculated data to filter against the whole data set, as I've already piped it to get that information. I looked at using "eval" but that seems to use the data within each record discretely, rather than calculating across records, as stats allows.
I've not done a very good job explaining it so apologies in advance. If you can make head or tail of what I'm asking and have any wisdom you can share it may save my sanity...
Is this what you're aiming at?
List of devices and their vulnerabilities, showing only results from the latest available scan for each device:
sourcetype=vulnerability_scanner earliest=-4w
| streamstats max(_time) as maxTime by device
| where _time=maxTime
| stats dc(defect) by device
Drop the last line if you want the actual list, rather than the count of unique defects.
I would probably recommend using eventstats rather than streamstats in this case. Both will give the same results here, but eventstats may have slightly lower overhead in processing and map-reduce slightly better. (streamstats depends on the order of the results, which eventstats does not, and thus eventstats may have opportunities for optimization that aren't available to streamstats.) This won't make any difference if you have only a single indexer, though.
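As a rough sketch, the search from the earlier answer with eventstats swapped in would look something like the following. The sourcetype and the field names device and defect are taken from that answer and are assumptions about the data, so adjust them to match your own:

```spl
sourcetype=vulnerability_scanner earliest=-4w
| eventstats max(_time) as maxTime by device
| where _time=maxTime
| stats dc(defect) by device
```

eventstats adds maxTime to every event for a device regardless of the order the events arrive in, whereas the streamstats version relies on events coming back newest first (the default). If you would rather filter on your own detected-date field instead of _time, the same pattern should apply with something like eventstats max(LastDetected) as latestDetected by Device followed by where LastDetected=latestDetected, assuming LastDetected is numeric or is a string that sorts in date order.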
That worked perfectly.
I hadn't used streamstats yet, but I can see that it's a very powerful command that resulted in a surprisingly elegant solution.
Many thanks for the swift response, and I'm sure I'll be back sooner rather than later to pick up more pearls of wisdom.