Re: Can I filter results based on calculations across records?

nuuki · ‎11-08-2010

Hi,

I'm new to Splunk but getting a lot of value from it. I've gotten a reasonable way using trial and error and a lot of referring to the forums and help, but I've now hit a challenge I can't solve, which is a bit fiddly to describe. Here goes...

I have a data set that's imported once a week, which contains vulnerability information about computers, pulled from a scanning system. I've been able to use "stats" to create a table of devices and the number of vulnerabilities per device. I had to use "dedup" so the same vulnerability didn't show up multiple times. Although the data is added weekly, I'm using 4 weeks' worth of results in my search, to ensure I'm not missing devices that may come and go.

However, what I've now realised is that until a vulnerability falls outside my 4-week date range it'll stay in the report, even if it's been resolved.

So as an example:

Device1 has 3 results (representing defects) in the data import for week 1.
By week 2, one of these defects (say "defect1") is fixed. The import for that week now shows 2 results for Device1 (defect2 and defect3).

At the moment my report continues to show 3 defects until I hit week 6, when the defect that was fixed falls out of the 4 week range.

Every result has a detected date associated with it. What I want to do is only include results where the detected date matches the latest detected date for any defect against that device. So in my case above, once I've imported data for week 2 and am looking at 4 weeks' worth together, defect1 will be in that data set but its detected date will be older than that for defect2 and defect3, so it could be safely ignored.

I can generate a table of devices showing each device and the correct latest "last detected" date that I would want to filter on (it could be different for each device), as follows:

| stats max(LastDetected) by Device

However, I can't see a way to then use that calculated data to filter against the whole data set, as I've already piped it to get that information. I looked at using "eval" but that seems to use the data within each record discretely, rather than calculating across records, as stats allows.

I've not done a very good job explaining it so apologies in advance. If you can make head or tail of what I'm asking and have any wisdom you can share it may save my sanity...

southeringtonp · ‎11-08-2010

Is this what you're aiming at?

List of devices and their vulnerabilities, showing only results from the latest available scan for each device:

sourcetype=vulnerability_scanner earliest=-4w
| streamstats max(_time) as maxTime by device
| where _time=maxTime
| stats dc(defect) by device

Drop the last line if you want the actual list, rather than the count of unique defects.

gkanapathy · ‎01-04-2011

I would probably recommend using eventstats rather than streamstats in this case. Both will give the same results, but eventstats may have slightly lower processing overhead and map-reduce slightly better. (streamstats requires sorting of the results that eventstats does not, and thus eventstats may have opportunities for optimization that aren't available to streamstats.) This won't make any difference if you have only a single indexer though.
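
As a rough sketch, the eventstats version of the search above might look like this. It assumes the same sourcetype and field names (device, defect) used in the earlier answer, which may not match your data:

```
sourcetype=vulnerability_scanner earliest=-4w
| eventstats max(_time) as maxTime by device
| where _time=maxTime
| stats dc(defect) by device
```

eventstats adds the per-device maximum to every event, so the where clause can compare each event against it. If the scan date is held in a field such as LastDetected rather than _time, the same pattern should apply with that field substituted, provided its values compare correctly as numbers or sortable strings.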

nuuki · ‎11-09-2010

That worked perfectly.

I hadn't used streamstats yet, but I see that it's a very powerful command that resulted in a surprisingly elegant solution.

Many thanks for the swift response, and I'm sure I'll be back sooner rather than later to pick up more pearls of wisdom.
