Splunk Search

How to measure an increase in the number of errors

mataharry
Communicator

I am looking for the best method to highlight hosts with errors by comparing them to previous days.

For example, I run this search every day:

index="lsfgbc" OR index="lsficeng" process="lsf-sbatchd-audit" numautofsdefunct 
| `autofsversion` 
| table host, index, numstucksbatchd, numautofsdefunct, autofs_version 
| sort index, host

I tried the following, but using earliest=-48h latest=-24h returned an empty result.

| set diff [ search index="lsfgbc" OR index="lsficeng" process="lsf-sbatchd-audit" numautofdefunct 
               earliest=-48h latest=-24h 
             | fields + host | fields - _time _raw  ]
           [ search index="lsfgbc" OR index="lsficeng" process="lsf-sbatchd-audit" numautofsdefunct 
               earliest=-24h latest=now 
             | fields + host | fields - _time _raw ]

How can I compare to the previous day, or to the last month?

1 Solution

yannK
Splunk Employee

An easy approach is to compare counts per day. For example, to see the number of errors per host over a week and generate a nice graph:

error earliest=-7d@d | timechart span=1d count by host useother=0

You can also use summary indexing to save your results every day instead of recalculating them every time: http://www.splunk.com/base/Documentation/4.1.7/Knowledge/Usesummaryindexing. Set up a scheduled search that runs every day at midnight (plus 15 minutes, to make sure all your data is available). For example, my saved search "summary_error_daily" uses the "si" version of the timechart command with more precise detail (per hour):

error earliest=-1d@d latest=@d | sitimechart span=1h count by host useother=0

Then query the results with:

index=summary name=summary_error_daily | timechart span=2d count by host
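
If it helps, the scheduled search behind this could be defined in savedsearches.conf roughly like the stanza below. The stanza name, the target index "summary", and the exact schedule are just assumptions to match the example above:

[summary_error_daily]
# run 15 minutes after midnight so all of yesterday's data has been indexed
cron_schedule = 15 0 * * *
enableSched = 1
# write the "si" results into a summary index (assumed here to be called "summary")
action.summary_index = 1
action.summary_index._name = summary
search = error earliest=-1d@d latest=@d | sitimechart span=1h count by host useother=0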

Another method is to use alerting: run a stats count by host search, and use an alert condition along the lines of "if the number of hosts rises by 1". Let it run for one day (to store the first values), and after that it will fire email alerts. See http://www.splunk.com/base/Documentation/latest/Admin/HowdoesalertingworkinSplunk
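
For example, a rough sketch of a comparison search you could schedule and alert on (assuming your error events simply match the keyword error) is:

error earliest=-48h
| eval period=if(_time >= relative_time(now(), "-24h"), "recent", "previous")
| chart count over host by period
| where recent > previous

Only hosts whose count in the last 24 hours is higher than in the previous 24 hours survive the final where, so an alert condition of "number of results greater than 0" fires only when something got worse.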


dwaynelee
New Member

I tried the raw log search above but got this error: Error in 'timechart' command: When you specify a split-by field, only single functions applied to a non-wildcarded data field are allowed.

I would like to get the list of hosts each day; how would I do that?


yannK
Splunk Employee

Can you provide the search you used?



David
Splunk Employee

The way I would recommend doing this is to set up a summary index that records the number of events over the last day (-1d@d) and then compare the last 24 hours to the recent days. That will likely perform better than searching the raw logs, and it solves the problem itself.
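
As a rough sketch, assuming a daily summary-indexing search like the "summary_error_daily" example above (the search name and field names here are placeholders), the comparison could look something like:

index=summary name=summary_error_daily earliest=-8d@d latest=@d
| timechart span=1d count by host
| untable _time host daily_count
| eventstats avg(daily_count) as avg_daily by host
| where _time >= relative_time(now(), "-1d@d") AND daily_count > avg_daily

That keeps only the hosts whose error count in the last full day is above their own average over the preceding week.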

However, if you want to do it based on the raw logs, you can run:

index="lsfgbc" OR index="lsficeng" process="lsf-sbatchd-audit" numautofsdefunct 
        | autofsversion 
        | table host, index, numstucksbatchd, numautofsdefunct, autofs_version
        | timechart span=1d sum(numstucksbatchd) as sumnumstucksbatchd, sum(numautofsdefunct) as numautofsdefunct by host
        | delta sumstucksbatchd as diffsumstucksbatchd
        | delta sumnumautofsdefunct as diffsumnumautofsdefunct

Timechart should summarize the events per day (you might need to experiment with sum, avg, or first, depending on the contents of the logs), and then delta will show you the change in values since the previous day. I pulled a couple of the fields off the timechart, just because it can get overwhelming and it sounds like what you want, but you can toss them back in as well.
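
To see what delta does in isolation, here is a minimal single-series sketch (no split-by field), using the bare keyword error as a stand-in for your actual search:

error earliest=-7d@d
| timechart span=1d count
| delta count as daily_change

Each row then shows that day's error count and how much it changed from the day before.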

Let me know if that all makes sense.
