Hi guys,
I am very new to Splunk (about a month) and I am having trouble incorporating "set diff" into my search to compare the results of the same search from the previous day with today's. Can anyone help?
Below is the search I use to retrieve the data I am looking for. I am unsure where and how "set diff" should be used.
index=security sourcetype= (blank) source="myfiles.csv"
| dedup displayName actions | table displayName actions
| rex field=actions "hosts\W+(?P<Hosts>.*?)]" max_match=0
| table displayName Hosts | mvexpand Hosts | makemv delim="," Hosts | rex mode=sed field=Hosts "s/'/ /g"
| mvexpand Hosts
Using set diff, it has to be done like this (two subsearches, one selecting the data for yesterday and the other selecting the data for today):
| set diff [search index=security sourcetype= (blank) source="myfiles.csv" earliest=-1d@d latest=@d
| dedup displayName actions | table displayName actions
| rex field=actions "hosts\W+(?P<Hosts>.*?)]" max_match=0
| table displayName Hosts | mvexpand Hosts | makemv delim="," Hosts | rex mode=sed field=Hosts "s/'/ /g"
| mvexpand Hosts] [search index=security sourcetype= (blank) source="myfiles.csv" earliest=@d latest=now
| dedup displayName actions | table displayName actions
| rex field=actions "hosts\W+(?P<Hosts>.*?)]" max_match=0
| table displayName Hosts | mvexpand Hosts | makemv delim="," Hosts | rex mode=sed field=Hosts "s/'/ /g"
| mvexpand Hosts]
But I would rather use the search below, which needs no subsearch and selects the data for yesterday and today in one pass:
index=security sourcetype= (blank) source="myfiles.csv" earliest=-1d@d latest=now
| bucket span=1d _time
| dedup _time displayName actions | table _time displayName actions
| rex field=actions "hosts\W+(?P<Hosts>.*?)]" max_match=0
| table _time displayName Hosts | mvexpand Hosts | makemv delim="," Hosts | rex mode=sed field=Hosts "s/'/ /g"
| mvexpand Hosts
| stats dc(_time) as daysReported by displayName Hosts | where daysReported=1 | table displayName Hosts
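For anyone who wants to see the daysReported logic in isolation, here is a minimal run-anywhere sketch. The host names and the day field are made up for illustration (day stands in for the bucketed _time), so treat it as a sketch under those assumptions rather than a drop-in search:

```
| makeresults
| eval day="yesterday", Hosts=split("web01,web02,db01", ",")
| append
    [| makeresults
     | eval day="today", Hosts=split("web02,db01,app01", ",")]
| mvexpand Hosts
| stats dc(day) AS daysReported values(day) AS seenOn BY Hosts
| where daysReported=1
| table Hosts seenOn
```

With these sample values it should return web01 (seen only yesterday) and app01 (seen only today). web02 and db01 appear on both days, so daysReported=1 filters them out.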
I would not use set diff, for many reasons; try this:
index=security sourcetype= (blank) source="myfiles.csv" earliest=-2d@d latest=@d
| dedup displayName actions
| fields displayName actions
| rex field=actions max_match=0 "hosts\W+(?P<Hosts>.*?)]"
| fields - actions
| mvexpand Hosts
| makemv delim="," Hosts | rex mode=sed field=Hosts "s/'/ /g"
| mvexpand Hosts
| bin _time span=1d
| eval _time = if(_time < relative_time(now(), "-1d@d"), "YesterYesterDay", "YesterDay")
| chart values(Hosts) OVER displayName BY _time
| nomv YesterYesterDay
| nomv YesterDay
| rex field=YesterYesterDay mode=sed "s/[\r\n\s]+/;/g"
| rex field=YesterDay mode=sed "s/[\r\n\s]+/;/g"
| eval setdiff = split(replace(replace(replace(replace(mvjoin(mvsort(mvappend(split(replace(YesterYesterDay, "(;|$)", "#1;"), ";"), split(replace(YesterDay, "(;|$)", "#0;"), ";"))), ";"), ";(\w+)#0\;\1#1", ""), ";\w+#1", ""), "#0", ""), ";(?!\w)|^;", ""), ";")
| makemv delim=";" YesterYesterDay
| makemv delim=";" YesterDay
See this run-anywhere example:
| tstats values(sourcetype) AS sourcetype WHERE index=_* earliest=-2d@d latest=@d BY host _time span=1d
| eval _time = if(_time < relative_time(now(), "-1d@d"), "YesterYesterDay", "YesterDay")
| chart values(sourcetype) OVER host BY _time
| nomv YesterYesterDay
| nomv YesterDay
| rex field=YesterYesterDay mode=sed "s/[\r\n\s]+/;/g"
| rex field=YesterDay mode=sed "s/[\r\n\s]+/;/g"
| eval setdiff = split(replace(replace(replace(replace(mvjoin(mvsort(mvappend(split(replace(YesterYesterDay, "(;|$)", "#1;"), ";"), split(replace(YesterDay, "(;|$)", "#0;"), ";"))), ";"), ";(\w+)#0\;\1#1", ""), ";\w+#1", ""), "#0", ""), ";(?!\w)|^;", ""), ";")
| makemv delim=";" YesterYesterDay
| makemv delim=";" YesterDay
It manually calculates the difference between two multivalue fields.
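To see what the setdiff eval is meant to do without any index data, a small sketch with made-up values may help. The values alpha, beta and gamma are hypothetical, and the two fields are set directly as semicolon-delimited strings, which is the shape they have after the nomv and rex steps above. The eval tags every YesterYesterDay value with #1 and every YesterDay value with #0, sorts the combined list so matching values sit next to each other, then uses replace to strip the matched pairs and the leftover #1 values:

```
| makeresults
| eval YesterYesterDay="alpha;beta", YesterDay="beta;gamma"
| eval setdiff = split(replace(replace(replace(replace(mvjoin(mvsort(mvappend(split(replace(YesterYesterDay, "(;|$)", "#1;"), ";"), split(replace(YesterDay, "(;|$)", "#0;"), ";"))), ";"), ";(\w+)#0\;\1#1", ""), ";\w+#1", ""), "#0", ""), ";(?!\w)|^;", ""), ";")
| makemv delim=";" YesterYesterDay
| makemv delim=";" YesterDay
| table YesterYesterDay YesterDay setdiff
```

With this input, beta appears on both days and alpha only on the earlier day, so the value the eval is intended to keep is gamma, the one present in YesterDay but not in YesterYesterDay. Note that the pattern uses \w+, which does not match dots or hyphens, so values containing those characters would presumably need a wider character class.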
Attention @darrenaefc, there was a mistake in my original answer. It works properly now.
Which was the core of the OP's ask.
You can thank @martin_mueller for that setdiff line.
@woodcock, what is this last eval setdiff doing here?
It had a bug and wasn't working right. Try it now.