Getting Data In

Index files in order of timestamp or record file timestamp as a field

leecaf
Explorer

I'm indexing a set of CSV files provided by an external vendor over FTP (mapped or synced to my local drive). There may be duplicate rows across different files, and the requirement is to take the row from the file with the latest timestamp. I can achieve this in any of the following ways:

a) Ensure that Splunk indexes my data in the same order as the file timestamps. Can someone suggest how to do this without rewriting, in a script, the entire 'scan directory for updated files' logic that Splunk nicely provides?

b) Add an extra field, 'fileTimeStamp'. How would I specify this in my props.conf?

c) Look up the file timestamps with a lookup at search time. However, if a file has been updated but not yet indexed when the search runs, I may see misleading results.

Suggestions, please?


mataharry
Communicator

No, you cannot ask Splunk to monitor only part of a file, or to control the order in which files are indexed.

A) The simple solution is to dedup the events:
source=mypath/to/my/folder/* | dedup _raw

See http://docs.splunk.com/Documentation/Splunk/5.0.3/SearchReference/dedup

B) No. The modification time of the file is not indexed. The closest you have is _indextime (the time the event is received at the indexer).

One solution is to index everything and use the event timestamp to pick the latest event per source:

source=mypath/to/my/folder/* | stats latest(_raw) AS _raw by source

Or do the same using the index time:

source=mypath/to/my/folder/* | eval oldtime=_time | eval _time=_indextime | stats latest(oldtime) AS oldtime latest(_raw) AS _raw by source

C) Instead of a lookup, use _indextime for the same purpose.
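
To tie A and C together, here is a minimal sketch of a search that keeps only the most recently indexed copy of each duplicated row. It assumes duplicates are byte-identical in _raw and that the vendor's newer files are also indexed later, which Splunk does not guarantee. The name indextime is just a scratch field chosen for this example.

```spl
source=mypath/to/my/folder/*
| eval indextime=_indextime
| sort 0 -indextime
| dedup _raw
```

dedup keeps the first event it sees for each value, so sorting newest-first beforehand is what makes the surviving copy the latest one. If duplicates are defined by a key column rather than the whole row, dedup on that extracted field instead of _raw.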
