Splunk Search

How would you retrieve the latest indexed set of data with non-unique records? Example provided.

mcrawford44
Communicator

Example data:
(This is one run of a DBX dump input to an index.)

ComputerName1, Application1, _time1
ComputerName1, Application2, _time1
ComputerName1, Application3, _time1
ComputerName1, Application4, _time1
ComputerName1, Application5, _time1
ComputerName2, Application1, _time2
ComputerName2, Application2, _time2
ComputerName2, Application3, _time2

It appears that a DBX dump of the query we are using gives each ComputerName a different '_time' index stamp. This was unexpected, but it may work to our advantage.

When the DBX input runs again, there will be another similar set of data per machine, but with a more recent '_time' index stamp.

Using 'dedup' collapses all the records into the single latest record.

Using '| stats values(ComputerName), latest(_time) by Application' works for a single dump, but additional dumps cause problems. For example, if I uninstall an application, this still brings back the old application with its earlier '_time' index stamp.
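To see why grouping by Application keeps uninstalled software, here is a minimal Python model of that `stats` call. It illustrates the logic only, not Splunk itself; the three events, the numeric timestamps, and the function name `latest_by_application` are hypothetical.

```python
# Hypothetical sample: two DBX runs for one machine. Application5 was
# uninstalled between runs, so it is missing from the newer snapshot.
# Numeric timestamps stand in for _time (only their order matters).
EVENTS = [
    ("ComputerName1", "Application1", 100),
    ("ComputerName1", "Application5", 100),
    ("ComputerName1", "Application1", 300),
]


def latest_by_application(events):
    """Model of '| stats latest(_time) by Application': one row per
    application, carrying the newest _time that application was seen with."""
    latest = {}
    for _computer, app, ts in events:
        latest[app] = max(ts, latest.get(app, ts))
    return latest


if __name__ == "__main__":
    # Application5 still appears, with its stale timestamp from the first run.
    print(latest_by_application(EVENTS))
```

Because the grouping key is the application rather than the run, nothing in this search notices that Application5 is absent from the newest snapshot.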

Is there a way to group each set of results individually? In words, the result I want is: "Return the latest snapshot of each computer, based on its most recent indexing, without including any prior index runs."

This is an example of the data after further inputs have run:

ComputerName1, Application1, _time1
ComputerName1, Application2, _time1
ComputerName1, Application3, _time1
ComputerName1, Application4, _time3
ComputerName1, Application5, _time3
ComputerName2, Application1, _time2
ComputerName2, Application2, _time2
ComputerName2, Application3, _time4

I want to return:

ComputerName1, Application4, _time3
ComputerName1, Application5, _time3
ComputerName2, Application3, _time4

These are the rows from the latest run for each machine.
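One way to express "latest snapshot per machine" is a two-pass filter: find the newest `_time` for each ComputerName, then keep every event carrying exactly that `_time`. In SPL this is commonly written as `... | eventstats max(_time) as latest_run by ComputerName | where _time=latest_run`, where `latest_run` is a hypothetical field name. The Python sketch below models that logic on the example rows, assuming (as observed in the question) that all events from one DBX run for a machine share a single `_time`; the numeric timestamps are hypothetical stand-ins for `_time1` to `_time4`.

```python
# Hypothetical numeric stand-ins for _time1.._time4 (only order matters):
# _time1=100, _time2=200, _time3=300, _time4=400.
EVENTS = [
    ("ComputerName1", "Application1", 100),
    ("ComputerName1", "Application2", 100),
    ("ComputerName1", "Application3", 100),
    ("ComputerName1", "Application4", 300),
    ("ComputerName1", "Application5", 300),
    ("ComputerName2", "Application1", 200),
    ("ComputerName2", "Application2", 200),
    ("ComputerName2", "Application3", 400),
]


def latest_run_per_machine(events):
    """Keep every event whose _time equals the newest _time seen for its
    ComputerName, i.e. the whole most recent snapshot of each machine."""
    # Pass 1: newest _time per machine (like eventstats max(_time) by ComputerName).
    newest = {}
    for computer, _app, ts in events:
        newest[computer] = max(ts, newest.get(computer, ts))
    # Pass 2: keep only events from that run (like where _time=latest_run).
    return [e for e in events if e[2] == newest[e[0]]]


if __name__ == "__main__":
    for row in latest_run_per_machine(EVENTS):
        print(row)
```

Because each event is compared against its own machine's newest run, an application missing from that run simply drops out, which per-application grouping cannot achieve.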

Edit 3/26/2014:

This query gets the correct data, but the application name becomes a multivalue field:

index=foo name=bar | transaction name _time | dedup name | table name, product, scantime, _time

So the table output is:

name, (product1, product2, product3), _time
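A multivalue field can be split back into one row per value; in SPL the usual command for that is `mvexpand`, for example `... | dedup name | mvexpand product | table name, product, scantime, _time`. The Python sketch below models the pipeline from the edit (group by name and `_time`, keep the newest group per name, then expand) on a few hypothetical rows. The function names are illustrative only and mimic the commands they are named after, not Splunk internals.

```python
# Hypothetical rows: (name, product, _time) with numeric stand-in timestamps.
EVENTS = [
    ("ComputerName1", "Application1", 100),
    ("ComputerName1", "Application4", 300),
    ("ComputerName1", "Application5", 300),
    ("ComputerName2", "Application3", 400),
]


def group_runs(events):
    """Model of 'transaction name _time': one row per (name, _time),
    with the product values collected into a list (a multivalue field)."""
    groups = {}
    for name, product, ts in events:
        groups.setdefault((name, ts), []).append(product)
    return [(name, products, ts) for (name, ts), products in groups.items()]


def dedup_newest(rows):
    """Model of '| dedup name' when results come newest first: keep the
    row with the largest _time for each name."""
    newest = {}
    for row in rows:
        name, _products, ts = row
        if name not in newest or ts > newest[name][2]:
            newest[name] = row
    return list(newest.values())


def mvexpand(rows):
    """Model of '| mvexpand product': one output row per product value."""
    return [(name, p, ts) for name, products, ts in rows for p in products]


if __name__ == "__main__":
    for row in mvexpand(dedup_newest(group_runs(EVENTS))):
        print(row)
```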

dcarmack_splunk
Splunk Employee

How often does the DBX job run, and how far back do you run your report?

somesoni2
Revered Legend

Try this:

index=YourIndex sourcetype=YourSourceType source=YourSource [search index=YourIndex sourcetype=YourSourceType source=YourSource | stats first(_time) as _time by ComputerName | format]

The subsearch gets the latest _time value per ComputerName and adds it as a filter to the main search, so only the records with the latest _time value for each ComputerName are returned.
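A minimal Python model of what this subsearch is intended to do: compute one (ComputerName, latest `_time`) pair per machine, then keep only main-search events that match one of those pairs. It illustrates the intent only, not how Splunk evaluates the filter generated from the subsearch; the events and function names are hypothetical.

```python
# Hypothetical events: (ComputerName, Application, _time).
EVENTS = [
    ("ComputerName1", "Application1", 100),
    ("ComputerName1", "Application4", 300),
    ("ComputerName2", "Application1", 200),
    ("ComputerName2", "Application3", 400),
]


def subsearch_pairs(events):
    """Model of 'stats first(_time) as _time by ComputerName': results
    arrive newest first, so the first _time seen per machine is its
    latest run."""
    pairs = {}
    for computer, _app, ts in sorted(events, key=lambda e: e[2], reverse=True):
        pairs.setdefault(computer, ts)
    return set(pairs.items())


def main_search(events, pairs):
    """Keep only events whose (ComputerName, _time) matches a subsearch
    row, which is the filter the subsearch results are meant to apply."""
    return [e for e in events if (e[0], e[2]) in pairs]


if __name__ == "__main__":
    for row in main_search(EVENTS, subsearch_pairs(EVENTS)):
        print(row)
```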

mcrawford44
Communicator

Unfortunately, that query doesn't appear to do anything with the data; what comes back is basically the raw results.

index=foo name=bar [search index=foo name=bar | stats first(_time) as _time by name] | stats values(_time), count by name

This brings back:

bar (1395429530.000,1395700278.000) 2653

So it's still returning every record.

I've been playing with the 'transaction' command and I feel I'm close, but any assistance is welcome.

otaci
Explorer

Did you get anywhere with this? I am having the same problem.

wpreston
Motivator

If you dedup on ComputerName and Application, that should bring back your most recent record. Splunk returns events with the most recent first, so deduping on fields common to all of the events keeps only the most recent event for each combination of ComputerName and Application.

... your search ... | dedup ComputerName Application | ... rest of your search ...
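A small Python model of this `dedup`, assuming the events are listed newest first (the order Splunk returns them). Field positions 0 and 1 stand for ComputerName and Application; the events and the function name are hypothetical. It keeps the newest event for each pair, but a pair that exists only in an older run is also kept, because nothing in the dedup key refers to the run.

```python
# Hypothetical events in the order Splunk returns them: newest first.
# Application5 was present in the older run (100) but not the newer one (300).
EVENTS_NEWEST_FIRST = [
    ("ComputerName1", "Application1", 300),
    ("ComputerName1", "Application1", 100),
    ("ComputerName1", "Application5", 100),
]


def dedup(events, *fields):
    """Model of '| dedup <fields>': keep the first event seen for each
    combination of the given field positions, dropping later duplicates."""
    seen = set()
    kept = []
    for event in events:
        key = tuple(event[i] for i in fields)
        if key not in seen:
            seen.add(key)
            kept.append(event)
    return kept


if __name__ == "__main__":
    # Position 0 = ComputerName, position 1 = Application.
    for row in dedup(EVENTS_NEWEST_FIRST, 0, 1):
        print(row)
```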

mcrawford44
Communicator

This will bring back records for applications that are no longer valid. Say I have application 'ABC' installed; it's in this database as a record. When I uninstall that application, it gets removed from the database but remains in the index.

Using dedup in any form will still return that application, since its record is unique. I only want the latest time set for each machine.

I'm trying to bring back the current latest indexed set without any historical data. I've edited the question to provide how the data looks at that point.