I have an external file where each line has a variable number of text strings, and I am trying to use it as input to a Splunk search across each event's raw data. The text strings in the input file are not in order, and every string on a line must be present in an event for the result to be positive.
A sample search entered manually might look like this; it returns all events that contain all of the strings below, in any order:
source="logfile.log" "sometext.I.want" "this.is.the.second.lot.of.text"
If I try something like:
source="logfile.log" [|inputlookup ErrorStrings1.csv | fields + String1,String2]
With an input text file like:
String1,String2
sometext.I.want,this.is.the.second.lot.of.text
The search returns no results because Splunk is trying to match events on fields named String1 and String2, yet those fields don't exist, and it's not possible to extract them because the data is not consistently structured.
The next avenue I was going to go down was developing a scripted lookup against each event. However, this could be resource intensive with 500+ rows in the external lookup file and a high rate of events being indexed.
Any ideas?
To see exactly what the subsearch returns to the outer search, run the subsearch by itself and append | format at the end. format is called implicitly by a subsearch and formats search output in a way that can be used by the search command.
By default Splunk behaves as you've already noticed: if the subsearch ends with the fields String1 and String2, Splunk matches on those exact fields rather than treating their values as free text. However, there is the return command, which you can use to make Splunk behave the way you want instead.
source="logfile.log" [|inputlookup ErrorStrings1.csv | return $String1,$String2]
Then just add as many StringX fields as you want. return won't throw an error if you supply a non-existent field, so if one line in your CSV file has 8 fields but another has only one, you can still supply String1 ... String8 to be returned without any problems.
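To make the difference concrete, here is a rough Python sketch of the two shapes the subsearch can hand back to the outer search for a single CSV row. This is not Splunk code and not how Splunk implements it; the function names are made up for illustration, and the exact spacing and quoting of the real output may differ, so check it with | format as described above.

```python
def as_field_filters(row):
    """Approximate the field=value filter produced by `fields` plus the implicit `format`."""
    pairs = [f'{name}="{value}"' for name, value in row.items() if value]
    return "( ( " + " AND ".join(pairs) + " ) )"


def as_free_text(row):
    """Approximate the bare search terms produced by `return $String1,$String2`.

    Quoting each value is a choice made for this sketch, not confirmed Splunk output.
    """
    return " ".join(f'"{value}"' for value in row.values() if value)


# One row from the sample lookup file in the question.
row = {"String1": "sometext.I.want", "String2": "this.is.the.second.lot.of.text"}

print(as_field_filters(row))  # matches only if fields String1/String2 exist on the event
print(as_free_text(row))      # matches the strings anywhere in the raw event
```

The second form is equivalent to typing the strings into the search bar by hand, which is why it matches the raw event text.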
That works really well and faster than I expected. I'm going to investigate whether there's a way to apply similar logic using Props & Transforms so that I can tag events as known errors, i.e. KnownErrors=Y and KnownErrorType=blabla.
Thanks for your help with this, it's much appreciated.
Many thanks, Ayn, for the answer; it really helped me.
Your query didn't work for me on the first try. I then found that the return command takes a count argument for the number of results to return. It defaults to 1, so without it only the first row of the CSV is used.
So the working query for me is:
source="logfile.log" [|inputlookup ErrorStrings1.csv | return 325 $String1,$String2]
Where 325 is the number of entries in the CSV file.
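For anyone trying to picture what the count changes, here is a small Python sketch of the search string you end up with: the strings within one row must all match, and the rows are ORed together. It is only an illustration, not Splunk's implementation. The build_search helper, the third column and the extra sample rows are made up, and the quoting is an assumption.

```python
import csv
import io


def build_search(csv_text, count, base='source="logfile.log"'):
    """Build an approximate expanded search from the first `count` CSV rows."""
    reader = csv.DictReader(io.StringIO(csv_text))
    groups = []
    for index, row in enumerate(reader):
        if index >= count:
            break
        # Skip empty or missing cells so rows may carry a variable number of strings.
        terms = [f'"{value}"' for value in row.values() if value]
        if terms:
            groups.append("(" + " ".join(terms) + ")")
    if not groups:
        return base
    return base + " " + " OR ".join(groups)


# Hypothetical lookup contents; only the first data row comes from the question.
sample = "\n".join([
    "String1,String2,String3",
    "sometext.I.want,this.is.the.second.lot.of.text,",
    "another.error,,",
    "third.error,with.detail,and.more",
])

print(build_search(sample, 1))  # only the first row is used, like the default count
print(build_search(sample, 3))  # all three rows, ORed together
```

With a count of 1 only the first row's strings end up in the search, which would explain why the original query found nothing for the other rows.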
Regards