Using DELIMS to extract FIX data

ndoshi
Splunk Employee

I have the following types of events in FIX format. This is what they look like in vi or emacs:

M|219620|0|i|I|20100506-16:15:53.443|463|8=FIX.4.4^A9=440^A35=i^A50=FXSpot
M|219621|0|i|I|20100506-16:15:53.444|461|8=FIX.4.4^A9=438^A35=i^A50=FXSpot

For the sake of simplicity, I have discarded the rest of the FIX message for this example. Note the ^A used as the delimiter between "fields".

After indexing the data in Splunk, the ^A becomes hex \x1 within Splunk Web and Splunk CLI.

M|219620|0|i|I|20100506-16:15:53.443|463|8=FIX.4.4\x19=440\x135=i\x150=FXSpot
M|219621|0|i|I|20100506-16:15:53.444|461|8=FIX.4.4\x19=438\x135=i\x150=FXSpot
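For anyone who wants to reproduce the two views outside Splunk, here is a minimal Python sketch. It assumes the raw delimiter is the single SOH character (0x01, shown as ^A in vi or emacs) and that the displayed form simply renders it as the literal text \x1. The variable names are illustrative only.

```python
# Raw event as it sits in the log file: the FIX tags after the 7th pipe
# are separated by the SOH control character (0x01).
SOH = "\x01"
raw_event = (
    "M|219620|0|i|I|20100506-16:15:53.443|463|"
    + SOH.join(["8=FIX.4.4", "9=440", "35=i", "50=FXSpot"])
)

# Assumption: the displayed form replaces each SOH with the literal text \x1.
displayed = raw_event.replace(SOH, "\\x1")
print(displayed)
```

Note how the rendered text runs the delimiter into the next tag number, so tag 50 appears as \x150.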

My props.conf looks like this:

[FIX]
SHOULD_LINEMERGE = false
KV_MODE = none
REPORT-all = get_all_fields

My transforms.conf looks like this:

[get_all_fields]
DELIMS="\\x1"
FIELDS = "a", "b", "c", "d"

I have tried \\x1, \x1, and \\x01. None of them extract the 4 "fields" in the example. What should the hex value be for DELIMS to properly break the fields? Is there a limitation where DELIMS can only take one character? I also tried using "\\", but that did not create any field extraction.

gregbujak
Path Finder

Splunk 6, FIX 4.2

Another approach is to use the key value pair extractions defined in transforms.conf.

(\d+)=((?:(?!\x01).)+)

The short of it is that it uses a negative lookahead so that the value never matches across a \x01.
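As a quick way to check what this pattern captures, here is a small Python sketch that runs the same regex over one of the sample events from the question. It only illustrates the regex itself, not Splunk's transform processing, and the variable names are made up for the example.

```python
import re

SOH = "\x01"
event = "M|219620|0|i|I|20100506-16:15:53.443|463|" + SOH.join(
    ["8=FIX.4.4", "9=440", "35=i", "50=FXSpot"]
)

# Same pattern as in the post: a numeric tag, "=", then one or more
# characters, each checked by a negative lookahead not to be SOH.
tag_value = re.compile(r"(\d+)=((?:(?!\x01).)+)")
pairs = tag_value.findall(event)

# A negated character class is a simpler way to express "anything but SOH".
tag_value_simple = re.compile(r"(\d+)=([^\x01]+)")
pairs_simple = tag_value_simple.findall(event)

for tag, value in pairs:
    print(f"{tag}::{value}")
```

On this single-line sample, both patterns yield the same tag/value pairs, so the negated class may be worth trying if the lookahead version proves slow.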

To register this extraction, add the following:


props.conf

[fix]

REPORT-fields = fixkv


transforms.conf

[my_fields]

REGEX = (\d+)=((?:(?!\x01).)+)

FORMAT = $1::$2


Hope this helps someone. If anyone has suggestions on how to make this one more efficient, please feel free to add.

landen99
Motivator

Shouldn't the transforms.conf stanza lead with [fixkv]?

vcarbona
Path Finder

If you're just trying to substitute the SOH character, I was FINALLY able to do it after spending a ton of time, and it's a very simple solution. I may be reiterating what Lowell said, but hopefully this example saves a ton of time for someone else. Additionally, the solution handles it at index time rather than at search time, so the events are easier to read for users who don't realize there's a SOH delimiter to deal with:

Edit $SPLUNK_HOME/etc/system/local/props.conf (on the indexer box if your search head and indexer are two different boxes) and add the following:

[myfixsourcetype]

SEDCMD-stripsoh = s/\x01/ /g

Then restart Splunk. Now any NEW FIX data will have the SOH character replaced with a space character. This will NOT affect existing, indexed FIX data in Splunk already.

Note: Of course, "myfixsourcetype" needs to be replaced with the actual sourcetype name that your FIX data is coming in as; otherwise Splunk has no way of identifying the data to apply the sed command to. See the props.conf spec for other data identifiers you can use (i.e. host or source).
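To preview what that sed expression does to an event before touching props.conf, the same substitution can be sketched in Python. This only mimics the s/\x01/ /g replacement on a sample string and says nothing about how Splunk applies SEDCMD internally.

```python
import re

# Inner part of a sample event, with SOH (0x01) between the FIX tags.
raw = "8=FIX.4.4\x019=440\x0135=i\x0150=FXSpot"

# Equivalent of the sed expression s/\x01/ /g: every SOH becomes a space.
cleaned = re.sub(r"\x01", " ", raw)
print(cleaned)
```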

FYI - I'm running Splunk on a RedHat Linux box.

REFERENCES:
FIX protocol field delimiter

Lowell
Super Champion

Yes, Splunk will replace unprintable characters with their C-style hex notation before indexing. That can be quite annoying, but then again, so is trying to search for unprintable characters. If you're curious, you can see a table of these conversions on the Wikipedia ASCII page; search down the page for the "Start of Header" character.

It seems like you have a fields-inside-a-field thing going on here, right?

You have fields delimited by a pipe (|), and then the 8th field (at least in your given example) contains an additional set of delimited fields. I'm not sure how Splunk handles that exactly. If you simply set up your delimiter as the ^A (or \x1), then your first field would contain M|219620|0|i|I|20100506-16:15:53.443|463|8=FIX.4.4, when you probably only want it to contain 8=FIX.4.4. So simply getting your delimiter set properly isn't going to fully work.
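The problem with a single delimiter is easy to see with a plain string split. This is just a Python sketch of the idea on one sample event, not a model of how DELIMS is implemented.

```python
# Sample event: pipe-delimited outer fields, SOH-delimited FIX tags inside.
event = (
    "M|219620|0|i|I|20100506-16:15:53.443|463|"
    "8=FIX.4.4\x019=440\x0135=i\x0150=FXSpot"
)

# Splitting on SOH alone leaves the whole pipe-delimited prefix
# glued onto the first field.
fields = event.split("\x01")
print(fields[0])
```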

I'm guessing it would make the most sense to first extract the outer set of fields using DELIMS="|", and then set up a secondary field extraction to pull out your embedded fields.

So, perhaps you would end up with something like this:

props.conf:

[FIX]
SHOULD_LINEMERGE = false
KV_MODE = none
REPORT-outer_fields = get_outer_fields, get_inner_fields

transforms.conf:

[get_outer_fields]
DELIMS="|"
FIELDS = "f1", "f2", "f3", "f4", "f5", "_f6", "f7", "inner_fields"

[get_inner_fields]
REGEX = (?:^|\\x1) (?<a>.+)\\x1(?<b>.+)\\x1(?<c>.+)\\x1(?<d>.+)$
SOURCE_KEY = inner_fields

I think this should work. This does seem like a complicated scenario.
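The two-stage idea can be tried out in Python before committing it to transforms.conf. In this sketch the outer field names come from the FIELDS list above, the inner regex is the four-group pattern rewritten with Python's (?P<name>...) syntax and \x01 for the delimiter, and everything else is illustrative.

```python
import re

event = (
    "M|219620|0|i|I|20100506-16:15:53.443|463|"
    "8=FIX.4.4\x019=440\x0135=i\x0150=FXSpot"
)

# Stage 1: split on the pipe and name the outer fields.
outer_names = ["f1", "f2", "f3", "f4", "f5", "_f6", "f7", "inner_fields"]
outer = dict(zip(outer_names, event.split("|")))

# Stage 2: run the inner regex against the last outer field only,
# which is the role SOURCE_KEY plays in the transform.
inner_re = re.compile(
    r"^(?P<a>.+)\x01(?P<b>.+)\x01(?P<c>.+)\x01(?P<d>.+)$"
)
match = inner_re.match(outer["inner_fields"])
inner = match.groupdict() if match else {}
print(inner)
```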

If the number of subfields is not constant (4), then you could use a multi-value field extraction like this. (That regex should work; it took me a few tries, but it seems to be the best solution I could come up with.)

[get_inner_fields]
REGEX= (?=^|\\x1)(?:\\x1)?(?<my_fields>.+?)(?:\\x1)?(?=$|\\x1)
SOURCE_KEY = inner_fields
MV_ADD = True
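Here is a rough Python check of that multi-value pattern, with the named group converted to Python syntax and \x01 used for the delimiter. The helper name extract_subfields is hypothetical, and repeated matching via findall stands in for what MV_ADD does.

```python
import re

# Lazy match of one subfield, bounded by start/end of string or SOH.
subfield_re = re.compile(
    r"(?=^|\x01)(?:\x01)?(?P<my_fields>.+?)(?:\x01)?(?=$|\x01)"
)

def extract_subfields(inner: str) -> list[str]:
    """Return every SOH-delimited subfield, however many there are."""
    return subfield_re.findall(inner)

four = extract_subfields("8=FIX.4.4\x019=440\x0135=i\x0150=FXSpot")
two = extract_subfields("8=FIX.4.4\x019=12")
print(four)
print(two)
```

For input like this, a plain inner.split("\x01") gives the same list, which is a handy cross-check on the regex.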

Another possible option (and I don't know the FIX format at all, so this may not work): if the 8 in 8=FIX.4.4 means something like 'fix_version_number', you could just write a bunch of extractions that use the leading number to map to different field names. So for the example of "8", you could add something like this to your props file:

EXTRACT-fix_field_8 = (?:\||\\x1|^)8=(?<fix_version_number>.*?)(?:\||\\x1|$)
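A per-tag extraction like this one can be sanity-checked in Python. The sketch below uses the same alternation for the leading and trailing delimiters, with \x01 for SOH and Python's named-group syntax; the field name fix_version_number is the guess from the post, not an official FIX name.

```python
import re

event = (
    "M|219620|0|i|I|20100506-16:15:53.443|463|"
    "8=FIX.4.4\x019=440\x0135=i\x0150=FXSpot"
)

# Tag 8 must follow a pipe, an SOH, or the start of the event, and its
# value runs lazily up to the next pipe, SOH, or end of the event.
fix_8 = re.compile(
    r"(?:\||\x01|^)8=(?P<fix_version_number>.*?)(?:\||\x01|$)"
)
match = fix_8.search(event)
version = match.group("fix_version_number") if match else None
print(version)
```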


Another thought (which may make all of the above options simpler) would be to add a SEDCMD to your sourcetype to change all of the ^A characters into something more useful at index time. Maybe something like a comma? (You would probably want to find a character or sequence of characters not already being used in your events.)

Using a punctuation character like a comma also has the advantage of improving the way terms are segmented in your index, which will let you search on more of these embedded fields more efficiently. For example, in your sample event, you can search for "8=FIX.4.4", but you can't search for "50=FXSpot" because it would be stored in the index as "150=FXSpot"; you would have to search with "*50=FXSpot" instead. Using a better punctuation character works around this problem.


One more option. Email Glenn and take a look at a custom search command he is using to handle FIX log processing. See his post here:

Glenn
Builder
Lowell
Super Champion

Thanks Glenn. It's certainly possible to get a custom search script to add fields (it's pretty easy from a pure programmatic perspective), but you're right in saying that rex (or kv) could be used after translatefix. Thanks again for jumping in. 😉

Glenn
Builder

I haven't yet managed to upload my "translatefix" custom command as an add-on to Splunkbase, but I have sent the useful contents directly to ndoshi. Hopefully it will do the trick: it should replace all \x01 with a space and also translate a large number of FIX-encoded fields and values into plain English. What it won't do is actually extract any fields in Splunk land. If that is required, I guess you'll need to pipe your results to translatefix, and then pipe this to rex (or similar).

ndoshi
Splunk Employee

Unfortunately, I can't change the log entry at index time to substitute the delimiter with something more manageable. Bob Fox provided me an answer to place in props.conf. Use:
EXTRACT-myfields = (?<a>.*)\x01(?<b>.*)\x01(?<c>.*)\x01(?<d>.*)

This means the ^A character will be represented by \x01. It seems as if the rule is to use \xnn for any HEX character where nn represents the HEX code.

Lowell
Super Champion

No, you were clear on that point. What I'm trying to tell you is that the issue of Splunk not accepting the literal "\x1" as a delimiter doesn't really matter, because it will not work the way you want anyway (based on the second group of sample events you provided). A delimiter-based field extraction only works if your entire event is delimited by the same character, which is not the case for your events. Try adding SEDCMD-<class> = s/\x01/;/g to your props, feed some events in, then set DELIMS=";" and see what I mean. Your first field (a) will contain leading junk in its value.

ndoshi
Splunk Employee

I should have been more clear. The original example log is not what you can use for your regex as it contains unprintable characters (^A) that Splunk turns into \x1 in the index.

What I really wanted to do was figure out how to use \x1 as a delimiter. It may be that DELIMS can only have 1 character so that would not work. I tried the following:

EXTRACT-myfields = (?<a>.*)\x1(?<b>.*)\x1(?<c>.*)\x1(?<d>.*)

This works fine with online regex testers (also used +), but it does not work here. This is an unprintable character that needs to be in the regex. I do not know what that is for ^A.
