Is it possible to define a lookup to act as a Hunk input filter at search-time?

rhinomike
Explorer

Hi there,

I have been testing Hunk and noticed that, because it does no pre-indexing, it relies heavily on well-chosen regexes and other filters to speed up searches.

An example of this is the use of the vix.input.1.path, vix.input.1.et.* and vix.input.1.lt.* settings, as illustrated below:

[hunktest]
vix.input.1.accept = \.gz$
vix.input.1.path = /test/logs/${environmentid}/...
vix.provider = test-hadoop-cluster
vix.input.1.et.format = yyyyMMddHHmmssSSSS
vix.input.1.et.offset = -3600
vix.input.1.et.regex = .*/logs/\d+/data\.(\d+).*
vix.input.1.et.timezone = GMT
vix.input.1.lt.format = yyyyMMddHHmmssSSSS
vix.input.1.lt.offset = 0
vix.input.1.lt.regex = .*/logs/\d+/data\.(\d+).*
vix.input.1.lt.timezone = GMT

While the above works great, I am facing a small complication: ${environmentid} is a numeric value that means very little to the people who would be using the search heads.

I know I can use a lookup and I have configured one:

[preprocess-gzip]
LOOKUP-env_to_ids = environment_name environmentid OUTPUTNEW environment_name

I also tested the lookup and it seems to be working:

When I run a search like index=hunktest environmentid=123, I can navigate through the matches and see that the environment_name field has been created and matches the CSV contents. I can also see that only one subfolder (123) produced matches.

However, if I run index=hunktest environmentname=Test or index=hunktest environmentname="Test" and then inspect search.log, it looks like Hunk crawled the whole HDFS store instead of crawling just /logs/123/.

Is it possible to define a lookup so that it acts as a filter at search time?

Ledion_Bitincka
Splunk Employee

Lookups work well for forward translation of values (id to name). When we perform a reverse lookup, however, the search gets translated into (environmentid=123 OR environmentname=Test). Because the id condition is now only one branch of an OR, it can no longer be used to rule out directories, which unfortunately means that search-based partition pruning cannot help. We'll take that in as an enhancement request and do some research on how we can solve this problem. In the meantime, one workaround I can think of is to use form searches to help users pick an environment: show a user-friendly string, but use the id to populate the search.
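To make the form-search idea concrete, here is a minimal Python sketch of the same translation done outside the search: the user picks a friendly name, and only the numeric id ends up in the search string, so the path filter on environmentid can still apply. This is an illustration, not how Splunk forms are implemented. The CSV contents, the column names, and the helper names load_environments and build_search are assumptions made up for the example; only index=hunktest and the id 123 for "Test" come from the question.

```python
import csv
import io

# Hypothetical lookup contents. The real CSV and its column names may differ;
# only the 123 <-> Test pairing is taken from the question above.
LOOKUP_CSV = """environmentid,environment_name
123,Test
456,Production
"""


def load_environments(csv_text):
    """Return a dict mapping friendly environment name -> id (as a string)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["environment_name"]: row["environmentid"] for row in reader}


def build_search(index, environment_name, environments):
    """Build a search string that filters on the id rather than the name."""
    env_id = environments[environment_name]  # raises KeyError for unknown names
    return f"index={index} environmentid={env_id}"


environments = load_environments(LOOKUP_CSV)

# The user sees "Test"; the search that gets run only mentions the id.
print(build_search("hunktest", "Test", environments))
```

In a real form the dropdown would do the same thing: display environment_name as the label and pass environmentid as the value that populates the search.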
