I have the following types of events in FIX format. This is what they look like in vi or emacs:
For the sake of simplicity, I have discarded the rest of the FIX message for this example. Notice the ^A used as the delimiter between "fields".
After indexing the data in Splunk, the ^A becomes hex \x1 within Splunk Web and Splunk CLI.
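To make the delimiter concrete, here is a minimal Python sketch of a SOH-delimited FIX fragment and how it splits into tag/value pairs. The message content and the `parse_fix` helper are hypothetical and only for illustration; they are not taken from the original event.

```python
# SOH is ASCII byte 0x01; vi and emacs render it as ^A.
SOH = "\x01"

# Hypothetical FIX fragment (tag values are illustrative only).
raw = SOH.join(["8=FIX.4.2", "35=D", "55=IBM", "54=1"]) + SOH


def parse_fix(message):
    """Split a SOH-delimited FIX string into a tag -> value mapping."""
    fields = {}
    for pair in message.split(SOH):
        if not pair:
            continue  # skip the empty piece after the trailing delimiter
        tag, _, value = pair.partition("=")
        fields[tag] = value
    return fields


parsed = parse_fix(raw)
print(repr(raw))  # repr() shows the delimiter in escaped hex form
print(parsed)
```

The point is that the delimiter is a single character with code 1, however a given tool chooses to display it.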
My props.conf looks like this:
My transforms.conf looks like this:
I have tried \\x1, \x1, and \\x01. None of them extracts the 4 "fields" in the example. What should the hex value be for DELIMS to properly break the fields? Is there a limitation where DELIMS can only take one character? I also tried using "\\", but that did not create any field extraction.
Yes, Splunk will replace unprintable characters with their C-style hex notation before indexing. That can be quite annoying, but then again, so is trying to search for unprintable characters. If you're curious, you can see a table of these conversions on the Wikipedia ASCII page; search down the page for the "Start of Header" character.
It seems like you have a fields-inside-of-a-field thing going on here, right?
You have fields delimited by a pipe (
I'm guessing it would make the most sense to extract the outer set of fields first using
So, perhaps you would end up with something like this:
I think this should work. This does seem like a complicated scenario.
If the number of subfields is not constant (4), then you could use a multi-value field extraction like this: (That regex should work. It took me a few tries, but it seems to be the best solution I could come up with.)
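The idea of a multi-value extraction over a variable number of subfields can be sketched in Python as follows. This is not the regex from the answer, just an illustration under assumptions: the event layout (pipe-delimited outer fields, SOH-delimited `tag=value` subfields) is inferred from the description in this thread, and the sample event and the `extract_pairs` helper are hypothetical.

```python
import re
from collections import defaultdict

# One tag=value pair: numeric tag, then a value that runs until the next
# SOH (\x01) or pipe. Assumed layout, not the original poster's regex.
PAIR = re.compile(r"(\d+)=([^\x01|]*)")


def extract_pairs(event):
    """Collect every tag=value pair; repeated tags become multi-value lists."""
    values = defaultdict(list)
    for tag, value in PAIR.findall(event):
        values[tag].append(value)
    return dict(values)


# Hypothetical event: timestamp | FIX subfields | trailing field.
event = "2010-05-01 12:00:00|8=FIX.4.2\x0135=D\x0155=IBM\x0155=MSFT|session1"
extracted = extract_pairs(event)
print(extracted)
```

Because the pattern is applied repeatedly across the event, it does not care how many subfields are present, and a tag that appears more than once simply accumulates several values.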
Another possible option (and I don't know the FIX format at all, so this may not work). If the 8 in
Another thought (which may make all of the above options simpler) would be to add a
Using a punctuation character like a comma also has the advantage of improving the way terms are segmented in your index, which will let you search on more of these embedded fields more efficiently. For example, in your example event, you can search for
One more option. Email Glenn and take a look at a custom search command he is using to handle FIX log processing. See his post here:
If you're just trying to substitute the SOH character, I was FINALLY able to do it after spending a ton of time, and it's a very simple solution. I may be reiterating what Lowell said, but hopefully this example saves a ton of time for someone else. Additionally, the solution handles it at index time and not at search time, so the events are easier to read for users who don't realize there's a SOH delimiter to deal with:
Edit $SPLUNK_HOME/etc/system/local/props.conf (on the indexer box if your search head and indexer are 2 different boxes) and add the following:
Then restart Splunk. Now any NEW FIX data will have the SOH character replaced with a space character. This will NOT affect FIX data that has already been indexed in Splunk.
Note: Of course, "myfixsourcetype" needs to be replaced with the actual sourcetype name that your FIX data is coming in as; otherwise Splunk has no way of identifying the data to apply the sed command to. See the props.conf spec for other data identifiers you can use (i.e. host or source).
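For anyone who wants to check the effect of such a substitution outside Splunk, the following Python sketch models what a sed-style replacement of SOH does to an event. It is not the props.conf stanza itself; the `replace_soh` helper and the sample event are hypothetical.

```python
import re

# Matches the single SOH byte (ASCII 0x01) that FIX uses as a delimiter.
SOH_PATTERN = re.compile("\x01")


def replace_soh(event, replacement=" "):
    """Return the event with every SOH character swapped for `replacement`."""
    return SOH_PATTERN.sub(replacement, event)


# Hypothetical FIX fragment for illustration.
sample = "8=FIX.4.2\x0135=D\x0155=IBM\x01"
with_spaces = replace_soh(sample)
with_commas = replace_soh(sample, ",")
print(with_spaces)
print(with_commas)
```

Passing a comma instead of a space corresponds to the punctuation suggestion made earlier in this thread.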
FYI - I'm running Splunk on a RedHat Linux box.
REFERENCES: FIX protocol field delimiter