In our environment we have data files whose names contain the earliest and latest times as epoch timestamps with millisecond granularity. See the example below:
/data/file-1404927949762-1404928067686.csv
We tried to get Hunk to recognize these timestamps properly and improve search performance (through time-based partition pruning). However, when we add the following configs, nothing gets returned:
[MyVix]
...
vix.input.1.et.regex = /data/file-(\d+)-
vix.input.1.et.format = epoch
The "epoch" time format expects the captured value to be in seconds since the epoch. Capturing all 13 millisecond digits therefore tells Hunk that the data is from far in the future, so the file falls outside any realistic search time range and gets pruned. To fix that, capture only the second-granularity part of the epoch time, the 10 most significant digits, as follows:
[MyVix]
...
vix.input.1.et.regex = /data/file-(\d{10})\d+-
vix.input.1.et.format = epoch
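To see the difference between the two regexes outside Hunk, here is a minimal Python sketch that applies both to the sample path from the question. The variable names and the rough years-since-1970 arithmetic are illustrative assumptions; this only mimics the capture step and is not how Hunk evaluates the setting internally.

```python
import re

# Sample path from the question.
path = "/data/file-1404927949762-1404928067686.csv"

# Original regex: captures all 13 digits, i.e. milliseconds.
ms_match = re.search(r"/data/file-(\d+)-", path)
# Fixed regex: captures only the 10 most significant digits, i.e. whole seconds.
sec_match = re.search(r"/data/file-(\d{10})\d+-", path)

captured_ms = int(ms_match.group(1))
captured_sec = int(sec_match.group(1))

# If the 13-digit value is read as seconds, it lands tens of thousands
# of years after 1970 (approximate, using 365.25-day years).
SECONDS_PER_YEAR = 365.25 * 24 * 3600
years_if_read_as_seconds = captured_ms / SECONDS_PER_YEAR

print(captured_ms, captured_sec, round(years_if_read_as_seconds))
```

The 10-digit capture is simply the millisecond value truncated to whole seconds, which is what the "epoch" format expects.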
One caveat: if you use this technique to capture the latest time, you must also set the offset to 1. Truncating to seconds rounds the timestamp down, and because the latest time is exclusive, it has to be rounded up instead. For example:
[MyVix]
...
vix.input.1.lt.regex = /data/file-\d+-(\d{10})\d+\.csv
vix.input.1.lt.format = epoch
vix.input.1.lt.offset = 1
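The effect of the offset can be checked with a small Python sketch. It assumes the offset is expressed in seconds and is added to the extracted value, which is how the answer above describes it; the variable names are hypothetical.

```python
import re

# Sample path from the question.
path = "/data/file-1404927949762-1404928067686.csv"

# Same pattern as vix.input.1.lt.regex: skip the first timestamp, then
# capture the first 10 digits (whole seconds) of the second one.
match = re.search(r"/data/file-\d+-(\d{10})\d+\.csv", path)
truncated_lt = int(match.group(1))

# Assumption: the offset is in seconds and added to the extracted time.
lt_offset = 1
latest_exclusive = truncated_lt + lt_offset

# True latest time from the file name, in milliseconds.
actual_latest_ms = 1404928067686

# Without the offset, the truncated bound falls before the real latest time.
covered_without_offset = actual_latest_ms < truncated_lt * 1000
covered_with_offset = actual_latest_ms < latest_exclusive * 1000

print(truncated_lt, latest_exclusive, covered_without_offset, covered_with_offset)
```

Without the offset, events in the last partial second of the file would sit at or after the exclusive upper bound and could be missed.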