Hunk is not filtering files based on timestamp

mik_cox
Explorer

I have a Hunk installation that is successfully (albeit slowly) pulling data from an s3:// filesystem. However, I'm having trouble getting Hunk to search only the relevant directories in S3. Searches over a specific time range in the Hunk UI return the correct results, but Hunk still scans every file in Hadoop to produce them, which is slow and wasteful.

For instance, I have my data in directories in s3 that follow this format:
s3://my-bucket/data/appname/2016/08/09/22/appname_22_30.log
This corresponds to the logs from my app that were collected on August 9th, 2016, during the minute starting at 22:30.

I have correspondingly set up my provider with the following properties:

vix.input.1.et.format = yyyyMMddHHmm
vix.input.1.et.offset = 0
vix.input.1.et.regex = .*?/appname/(\d+)?/?(\d+)?/?(\d+)?/?(\d+)?.*_?(\d{2}).*?
vix.input.1.lt.format = yyyyMMddHHmm
vix.input.1.lt.offset = 60
vix.input.1.lt.regex = .*?/appname/(\d+)?/?(\d+)?/?(\d+)?/?(\d+)?.*_?(\d{2}).*?

When running searches, I've noticed in my search.log that I get lines like this...

DEBUG ERP.s3-emr -  VirtualIndex - File meets time heuristic path=s3://my-bucket/data/myapp/2016/08/02/11/myapp_11_40.log, search.et=1470009600, search.lt=1470268800, file.et=0, file.lt=9223372036854775807, file.mtime=1470766383
08-09-2016 20:24:02.879 DEBUG ERP.s3-emr -  VirtualIndex - File meets the search criteria. Will consider it, path=s3://my-bucket/data/myapp/2016/08/02/11/myapp_11_40.log

...which indicates to me that the regex isn't doing its job: file.et and file.lt are left at 0 and 9223372036854775807 (the maximum long value) instead of being set from the file path.

Does anyone have any idea as to why this might be happening?

Thanks in advance!!


mik_cox
Explorer

Answering my own question:

My major problem was that I had put the following properties...

vix.input.1.et.format = yyyyMMddHHmm
vix.input.1.et.offset = 0
vix.input.1.et.regex = .*?/appname/(\d+)?/?(\d+)?/?(\d+)?/?(\d+)?.*_?(\d{2}).*?
vix.input.1.lt.format = yyyyMMddHHmm
vix.input.1.lt.offset = 60
vix.input.1.lt.regex = .*?/appname/(\d+)?/?(\d+)?/?(\d+)?/?(\d+)?.*_?(\d{2}).*?

...on the provider, NOT on the virtual index where they belong (in indexes.conf). Setting these properties through the Virtual Index editing page of the Hunk web interface would have configured them correctly.
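For anyone landing here later, this is a rough sketch of where the settings would live in indexes.conf. It assumes a provider named s3-emr (taken from the ERP.s3-emr prefix in the search.log lines above) and a hypothetical virtual index stanza called appname_vix. The vix.input.1.path value is also an assumption based on the directory layout in the question, so adjust all three to your own setup:

```
# indexes.conf (sketch only; stanza name and input path are assumed)
# The time-extraction properties go in the virtual index stanza,
# not in the [provider:s3-emr] stanza.

[appname_vix]
vix.provider = s3-emr
# The trailing "..." is Hunk path syntax for "descend recursively"; nothing is elided here.
vix.input.1.path = s3://my-bucket/data/appname/...
vix.input.1.et.regex = .*?/appname/(\d+)?/?(\d+)?/?(\d+)?/?(\d+)?.*_?(\d{2}).*?
vix.input.1.et.format = yyyyMMddHHmm
vix.input.1.et.offset = 0
vix.input.1.lt.regex = .*?/appname/(\d+)?/?(\d+)?/?(\d+)?/?(\d+)?.*_?(\d{2}).*?
vix.input.1.lt.format = yyyyMMddHHmm
vix.input.1.lt.offset = 60
```

After the change, the "File meets time heuristic" lines in search.log should show file.et and file.lt values derived from the path rather than 0 and 9223372036854775807, which is a quick way to confirm the properties are being picked up.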
