Hunk is not filtering files based on timestamp

mik_cox
Explorer

I have a Hunk installation that is successfully (albeit slowly) pulling data from an s3:// filesystem. However, I'm having trouble getting Hunk to search only the relevant directories in S3. Searches over a specific time range in the Hunk UI return the correct results, but Hunk still scans every file in the filesystem to produce them, which is slow and wasteful.

For instance, I have my data in directories in s3 that follow this format:
s3://my-bucket/data/appname/2016/08/09/22/appname_22_30.log
which would correspond to the logs from my app that were collected on August 9th, 2016 for the minute of 22:30.

I have correspondingly set up my provider with the following properties:

vix.input.1.et.format = yyyyMMddHHmm
vix.input.1.et.offset = 0
vix.input.1.et.regex = .*?/appname/(\d+)?/?(\d+)?/?(\d+)?/?(\d+)?.*_?(\d{2}).*?
vix.input.1.lt.format = yyyyMMddHHmm
vix.input.1.lt.offset = 60
vix.input.1.lt.regex = .*?/appname/(\d+)?/?(\d+)?/?(\d+)?/?(\d+)?.*_?(\d{2}).*?

When running searches, I've noticed in my search.log that I get lines like this...

DEBUG ERP.s3-emr -  VirtualIndex - File meets time heuristic path=s3://my-bucket/data/myapp/2016/08/02/11/myapp_11_40.log, search.et=1470009600, search.lt=1470268800, file.et=0, file.lt=9223372036854775807, file.mtime=1470766383
08-09-2016 20:24:02.879 DEBUG ERP.s3-emr -  VirtualIndex - File meets the search criteria. Will consider it, path=s3://my-bucket/data/myapp/2016/08/02/11/myapp_11_40.log

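...which indicate to me that the regex isn't doing its job: file.et=0 and file.lt=9223372036854775807 (the maximum 64-bit signed integer) mean that no time range was extracted from the path, so every file passes the time filter.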

Does anyone have any idea as to why this might be happening?

Thanks in advance!!

mik_cox
Explorer

Answering my own question:

My main problem was that I had put the following properties...

vix.input.1.et.format = yyyyMMddHHmm
vix.input.1.et.offset = 0
vix.input.1.et.regex = .*?/appname/(\d+)?/?(\d+)?/?(\d+)?/?(\d+)?.*_?(\d{2}).*?
vix.input.1.lt.format = yyyyMMddHHmm
vix.input.1.lt.offset = 60
vix.input.1.lt.regex = .*?/appname/(\d+)?/?(\d+)?/?(\d+)?/?(\d+)?.*_?(\d{2}).*?

...on the provider, NOT on the virtual index, where they belong (both are stanzas in indexes.conf). Setting these properties through the Hunk web interface on the Virtual Index editing page would have placed them correctly.
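
For anyone hitting the same thing, here is a rough sketch of what the corrected layout in indexes.conf might look like. The provider name s3-emr is taken from the ERP.s3-emr prefix in the log lines above; the index name appname_vix and the vix.input.1.path and vix.input.1.accept values are assumptions for illustration only, so substitute your own.

```ini
# indexes.conf (sketch only; names marked "assumed" are not from the original post)

[provider:s3-emr]
# Provider-level settings only (Hadoop/Java environment, filesystem, etc.).
# The vix.input.1.et.* and vix.input.1.lt.* settings do NOT go here.

# Virtual index stanza; the name "appname_vix" is assumed.
[appname_vix]
vix.provider = s3-emr
# Assumed input path; the trailing "..." is Hunk's recursive-directory syntax.
vix.input.1.path = s3://my-bucket/data/appname/...
# Assumed file filter.
vix.input.1.accept = \.log$
# Time extraction settings, unchanged from the post but now on the virtual index.
vix.input.1.et.format = yyyyMMddHHmm
vix.input.1.et.offset = 0
vix.input.1.et.regex = .*?/appname/(\d+)?/?(\d+)?/?(\d+)?/?(\d+)?.*_?(\d{2}).*?
vix.input.1.lt.format = yyyyMMddHHmm
vix.input.1.lt.offset = 60
vix.input.1.lt.regex = .*?/appname/(\d+)?/?(\d+)?/?(\d+)?/?(\d+)?.*_?(\d{2}).*?
```

With the settings in place on the virtual index, the "File meets time heuristic" lines in search.log should show real epoch values for file.et and file.lt instead of 0 and 9223372036854775807.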
