I have created a regex search that can search strings in a field, but it is slow. Is there a way that I can search a string without having to use regex for specific urlencoded data?
For instance, I can do:
*| regex "\%[0-9a-fA-F]{2}"
night and day, but any longer pattern built from that regex is slow.
I know the specific strings of URL-encoded data that I want to match. When I search for that data (for instance "%2F%63%67%69%2D%62%69%6E%2F" or "\%2F\%63\%67\%69\%2D\%62\%69\%6E\%2F"), I see in the inspector that my search has been changed. The percent signs are now spaces, like the following: "[ AND 2d 2f 62 63 67 69 6e index::main ]". It looks like my URL-encoded data is being split on the percent sign, sorted, and then searched. How do I get Splunk to treat it as a string, leave it alone, and search for my explicit string?
Thanks...
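To make the inspector output above easier to follow, here is a small Python sketch of the behaviour described in the question. It is only a toy model: it assumes the search literal is split on the percent sign, lowercased, de-duplicated, and sorted. The function name model_search_tokens is hypothetical, and this is not Splunk's actual segmentation code.

```python
def model_search_tokens(literal: str) -> list[str]:
    """Toy model (an assumption, not Splunk's implementation):
    split the literal on '%', lowercase each piece, drop duplicates
    and empty pieces, then sort the result."""
    pieces = literal.split("%")
    tokens = {piece.lower() for piece in pieces if piece}
    return sorted(tokens)


tokens = model_search_tokens("%2F%63%67%69%2D%62%69%6E%2F")
# Format the tokens the way the inspector line in the question shows them.
print("[ AND " + " ".join(tokens) + " index::main ]")
```

Under that model the pieces lose their order and the repeated 2F and 69 collapse, so the rewritten search can match any event containing those seven tokens, not only the exact string.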
The answer is to use CASE("%2F%63%67%69%2D%62%69%6E%2F"). Splunk then stops doing odd things with the text and matches strings in fields quickly. I think this is a hack, though. It is also blazingly fast in a search compared to using | regex.
I'm guessing you meant TERM()?
CASE() is often a speed killer in a search, so use it with caution. Unless the regex is specifically looking at the URL-encoding format, it's not really the best plan.
Look at the docs for Functions for eval and where. The second-to-last entry in the table is urldecode(). The description is:
This function takes one URL string argument X and returns the unescaped or decoded URL string.
The example is:
... | eval n=urldecode("http%3A%2F%2Fwww.splunk.com%2Fdownload%3Fr%3Dheader")
The result would be that n is "http://www.splunk.com/download?r=header".
For your use, insert
… | eval newfield=urldecode(yourfield) | …
and then do your regex.
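As a rough illustration of the decode-then-match idea, here is a Python sketch that uses the standard library's urllib.parse.unquote as a stand-in for urldecode(). The helper name decoded_contains and the sample request string are made up for the example, and nothing here runs inside Splunk.

```python
import re
from urllib.parse import unquote


def decoded_contains(raw_value: str, pattern: str) -> bool:
    """Decode percent-escapes first, then run the regex on the decoded text.
    Hypothetical helper that mirrors 'eval newfield=urldecode(yourfield)'
    followed by a regex on newfield."""
    decoded = unquote(raw_value)
    return re.search(pattern, decoded) is not None


# The encoded string from the question decodes to a plain path.
print(unquote("%2F%63%67%69%2D%62%69%6E%2F"))

# Made-up sample value: the pattern is written against the decoded form.
print(decoded_contains("GET %2F%63%67%69%2D%62%69%6E%2Ftest.cgi", r"/cgi-bin/"))
```

The point of decoding first is that the pattern can be written once against the readable form ("/cgi-bin/") instead of against every possible percent-encoded spelling of it.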