Splunk Search

How to search for newly added servers by comparing today's log with the previous week's logs?

SathyaNarayanan
Path Finder

Hi,

I have a file with hostnames, and I need to find the newly added servers in it. When I use the set diff command, it shows the difference between today's log and the previous week's log. But the problem is that the difference may contain either newly added servers or decommissioned servers. How do I find only the newly added servers?

Thanks in advance

DalJeanis
SplunkTrust
SplunkTrust

One way is to use a metasearch. This checks all indexes for all hosts and finds the earliest and latest times that each host appears on any index, telling you which indexes it is on. I assumed that you were looking for the hosts, not necessarily when a host first reported to a specific index. In this case, the code is set for 86400*30, i.e. hosts that first reported in the last 30 days.

| metasearch index=* host=* 
| stats earliest(_time) as firsttime, latest(_time) as lasttime, values(index) as index by host 
| addinfo  
| eval testtime=info_search_time-86400*30 
| where firsttime>testtime
| eval firstseen=strftime(firsttime,"%Y-%m-%d %H:%M:%S"),lastseen=strftime(lasttime,"%Y-%m-%d %H:%M:%S"),testseen=strftime(testtime,"%Y-%m-%d %H:%M:%S") 
| table host index firstseen lastseen testseen 

A second way, if you have weekly lists, is to use a left join from the current list to the prior list. When the prior list does not return anything, then the host is presumably new.

(this week's list)
| table host foo bar TheDate
| join type=left host [|inputlookup hostlist | table host foo bar TheDate | rename TheDate as PriorDate]
| where isnull(PriorDate)

The second method could also be used to output a host list into a map command, to do more extensive reporting on new hosts.
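
For instance, the output could drive a map search something like the sketch below. The index name and the inner search are assumptions for illustration; substitute whatever per-host reporting you need:

```
(this week's list)
| table host foo bar TheDate
| join type=left host [| inputlookup hostlist | table host TheDate | rename TheDate as PriorDate]
| where isnull(PriorDate)
| fields host
| map maxsearches=50 search="search index=myindex host=$host$ earliest=-7d | stats count as events by sourcetype"
```

Each row surviving the isnull(PriorDate) filter is a presumed-new host, and map runs the quoted search once per row, substituting that row's host value for the $host$ token.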

0 Karma

dwaddle
SplunkTrust
SplunkTrust

One approach is to use a lookup to hold the state of servers you have seen before. I'm going to build up to the solution, so read along carefully. Let's imagine you can do this:

index=myindex | stats min(_time) as oldest max(_time) as newest by host

If you run this over a (say) 7 day period, then you can identify hosts that are "new" or "missing" in that time period. Hosts will be in one of three states:

  1. The value of oldest will be at the beginning of the 7 day period, and the value of newest will be at the end. This means the host was (in all likelihood) sending events the whole 7 days and is neither new nor old.
  2. The value of oldest is not at the beginning of the 7 day period, and the value of newest is at the end. This means the host started sending data sometime in the 7 day period and has been sending since, and is therefore "new".
  3. The value of oldest is at the beginning of the 7 day period, and the value of newest is sometime before the end of the 7 day period. This means the host was sending data at the beginning and then stopped, and is therefore "old" and/or "missing".
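
As a sketch, those three states could be labeled in a single search like this. The one-hour slack at each boundary is an assumption, there to tolerate normal gaps in reporting; tune it to your data:

```
index=myindex earliest=-7d 
| stats min(_time) as oldest, max(_time) as newest by host 
| addinfo 
| eval status=case(oldest > info_min_time + 3600, "new",
                   newest < info_max_time - 3600, "missing",
                   true(), "steady")
| table host oldest newest status
```

Here addinfo supplies info_min_time and info_max_time, the boundaries of the search's time range, so the comparison adapts to whatever window you run it over.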

This is good but not optimal. So let's make a slightly different variant:

index=myindex | stats min(_time) as oldest, max(_time) as newest by host 
| outputlookup myindex_host_status.csv

This doesn't provide any new logic, but merely persists the data made by the search out to a lookup file. Now, let's use that lookup file to provide context in a slightly more complex search:

index=myindex | stats min(_time) as oldest, max(_time) as newest by host
| inputlookup append=true myindex_host_status.csv
| stats min(oldest) as oldest, max(newest) as newest by host
| outputlookup myindex_host_status.csv

We can now take this search and run it every day over the past 24 hours. Or every hour over the past hour, or whatever. It winds up keeping for us, over an arbitrarily long period of time, the first timestamp and last timestamp for a given host. It keeps this even after the original data ages off. The scheduled maintenance search runs and maintains the lookup holding state for us. Now, we can use that state:

| inputlookup myindex_host_status.csv | where oldest > now() - (86400 * 3)

This gives us a list of hosts that first sent data within the past three days. Or:

| inputlookup myindex_host_status.csv | where newest < now() - (86400 * 7)

This gives us a list of every host that has not sent any new data in the past 7 days.

The trick here is that we're using lookups to hold the long-term state, and taking advantage of how Splunk stores _time as an integer value that increases as time goes on. Every day is exactly 86,400 seconds, and larger values represent later times, so simple mathematical functions like min() and max() work to compute earliest and latest times.

SathyaNarayanan
Path Finder

Thanks for your time and help. But here the hosts do not send data to Splunk directly;
the Remedy tool sends data to Splunk, and the hostname is a field within that data. Now I need to find the newly added hostnames and the decommissioned or deleted hostnames from the Remedy log.

0 Karma

dwaddle
SplunkTrust
SplunkTrust

You should be able to adapt this to work in the exact same way.
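
For example, the state-maintenance search could be re-pointed at the Remedy data. The index, sourcetype, and hostname field name below are assumptions; substitute the ones from your environment:

```
index=remedy sourcetype=remedy_logs 
| stats min(_time) as oldest, max(_time) as newest by hostname
| inputlookup append=true remedy_host_status.csv
| stats min(oldest) as oldest, max(newest) as newest by hostname
| outputlookup remedy_host_status.csv
```

The same where clauses on oldest and newest then report new and decommissioned hostnames, exactly as with the host-based version.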

0 Karma

sk314
Builder

Have you tried using set diff between the last two weeks' data and just last week's data?

0 Karma

SathyaNarayanan
Path Finder

Yes, I have tried it. I mentioned that in my question.

0 Karma

sk314
Builder

I meant a slightly different diff from what you mentioned: the diff between the last TWO weeks and just the last week. It's akin to (A∪B) − B in set notation, where A = current week's hosts, B = last week's hosts, and A∪B = hosts in the last two weeks. (A∪B) − B = hosts in A that weren't present in B. Hope this works out for you.
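
In SPL that could look something like the sketch below (assuming an index named myindex with a host field). Since set diff keeps results that appear in one subsearch but not both, and B is a subset of A∪B, the output is exactly (A∪B) − B, the new hosts:

```
| set diff 
    [search index=myindex earliest=-14d latest=now | stats count by host | fields host]
    [search index=myindex earliest=-14d latest=-7d | stats count by host | fields host]
```

The first subsearch builds the two-week host list (A∪B) and the second builds the prior week's list (B), so decommissioned hosts never appear in the symmetric difference.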

0 Karma