We are currently using MapR and Hunk to index files with the following structure:
/user/mapr/(sourcetype)/(year)/(month)/(day)/(hour)/xyz.log
Let's say our sourcetype is foo.
With Hunk, you must specify a regex that extracts the date and time from the file path so that searches run optimally and do not scan every file and directory before finishing.
So for the regex I have:
/user/mapr/foo/(\d+)/(\d+)/(\d+)/(\d+)/.*
This works, but the problem is we will have to create a different virtual index for every different sourcetype.
I have tried /user/mapr/*/(\d+)/(\d+)/(\d+)/(\d+)/.*
But that doesn't work; the search goes through all the files and subdirectories.
So I need something to replace foo with, and * doesn't work. Putting (\w+) there doesn't work either, as it captures the "foo" and tries to use it as part of the string for the time lookup.
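To illustrate why the extra capture group breaks the time lookup, here is a minimal Python sketch. It assumes, as described above, that every captured group is joined into one string and then parsed with the yyyyMMddHH format. The sample path, the helper names, and the use of Python's `re` and `strptime` in place of Hunk's Java regex and date handling are all assumptions for illustration only.

```python
import re
from datetime import datetime

# Hypothetical sample path following the layout in the question.
path = "/user/mapr/foo/2014/03/07/15/xyz.log"

def extract_time_string(pattern, path):
    """Join all capture groups into one string (assumed Hunk behaviour)."""
    m = re.search(pattern, path)
    return "".join(m.groups()) if m else None

def parses(time_string):
    """True if the string is a valid yyyyMMddHH timestamp."""
    try:
        datetime.strptime(time_string, "%Y%m%d%H")
        return True
    except ValueError:
        return False

# (\w+) captures the sourcetype, so it ends up in the time string.
capturing = r"/user/mapr/(\w+)/(\d+)/(\d+)/(\d+)/(\d+)/.*"
# (?:\w+) matches the same directory without capturing it.
non_capturing = r"/user/mapr/(?:\w+)/(\d+)/(\d+)/(\d+)/(\d+)/.*"

bad = extract_time_string(capturing, path)       # "foo2014030715"
good = extract_time_string(non_capturing, path)  # "2014030715"

print(bad, parses(bad))    # foo2014030715 False
print(good, parses(good))  # 2014030715 True
```

The takeaway is that the sourcetype directory has to be matched without a capturing group, so only the four date and hour groups feed the time string.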
I have attached the indexes.conf file (minus the provider) below:
[mapr1]
vix.input.1.accept =
vix.input.1.et.format = yyyyMMddHH
vix.input.1.et.regex = /user/mapr/*/(\d+)/(\d+)/(\d+)/(\d+)/.*
vix.input.1.lt.format = yyyyMMddHH
vix.input.1.lt.offset = 3600
vix.input.1.lt.regex = /user/mapr/*/(\d+)/(\d+)/(\d+)/(\d+)/.*
vix.input.1.path = /user/mapr/${sourcetype}/...
vix.provider = maproly
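As a rough sketch of what the et/lt settings above are meant to achieve: the regex pulls yyyyMMddHH out of the path, the earliest time is that timestamp, and the latest time is the same timestamp plus the 3600 second offset. A directory whose window does not overlap the search range can then be skipped. This is Python rather than Hunk's own code, it ignores time zones, it uses the sourcetype-specific foo pattern that is known to work, and the function names and sample path are hypothetical.

```python
import re
from datetime import datetime, timedelta

# The working, sourcetype-specific pattern from the question.
REGEX = r"/user/mapr/foo/(\d+)/(\d+)/(\d+)/(\d+)/.*"
LT_OFFSET_SECONDS = 3600  # mirrors vix.input.1.lt.offset

def time_window(path):
    """Return (earliest, latest) for a path, or None if the regex does not match."""
    m = re.search(REGEX, path)
    if m is None:
        return None
    earliest = datetime.strptime("".join(m.groups()), "%Y%m%d%H")
    return earliest, earliest + timedelta(seconds=LT_OFFSET_SECONDS)

def overlaps(window, search_start, search_end):
    """True if the file's window intersects the search time range."""
    earliest, latest = window
    return earliest < search_end and latest > search_start

window = time_window("/user/mapr/foo/2014/03/07/15/xyz.log")
in_range = overlaps(window, datetime(2014, 3, 7, 15, 30), datetime(2014, 3, 7, 17))
out_of_range = overlaps(window, datetime(2014, 3, 8), datetime(2014, 3, 9))
```

When the regex fails to match a path, no window can be computed, which is consistent with the search falling back to reading everything.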
The solution ended up being:
/user/mapr/.*?/(\d+)/(\d+)/(\d+)/(\d+)/.*
Give this a try:
/user/mapr/[^\/]+/(\d+)/(\d+)/(\d+)/(\d+)/.*
OR
/user/mapr/\w+/(\d+)/(\d+)/(\d+)/(\d+)/.*
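For anyone choosing between these patterns, here is a small Python sketch comparing a lazy wildcard, a single-segment class, and a word class for the sourcetype directory. The sample paths (including a hyphenated sourcetype and a nested directory) are made up, and Python's `re` is only a stand-in for the Java regex engine Hunk uses, so treat the results as indicative rather than definitive.

```python
import re

TAIL = r"/(\d+)/(\d+)/(\d+)/(\d+)/.*"
candidates = {
    "lazy": r"/user/mapr/.*?" + TAIL,       # may span more than one directory
    "segment": r"/user/mapr/[^/]+" + TAIL,  # exactly one directory, any characters
    "word": r"/user/mapr/\w+" + TAIL,       # one directory of letters, digits, underscore
}

# Hypothetical paths for illustration.
paths = {
    "plain": "/user/mapr/foo/2014/03/07/15/xyz.log",
    "hyphen": "/user/mapr/web-access/2014/03/07/15/xyz.log",
    "nested": "/user/mapr/team/foo/2014/03/07/15/xyz.log",
}

def time_string(pattern, path):
    """Concatenate the capture groups, or return None when there is no match."""
    m = re.search(pattern, path)
    return "".join(m.groups()) if m else None

results = {
    name: {label: time_string(pattern, p) for label, p in paths.items()}
    for name, pattern in candidates.items()
}

for name, row in results.items():
    print(name, row)
```

In this sketch all three patterns extract 2014030715 from the plain path. The \w+ version misses the hyphenated sourcetype, and only the lazy version matches when there is an extra directory level, so the right choice depends on how the sourcetype directories are actually named.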