All Apps and Add-ons

Fixed Width data streamsfield extractions

paul_1994
Path Finder

I have this TCP input that is coming into my system and the fields are fixed width and sometimes the fields are blank.

I need help with my props and transforms so I can read the data correctly and the have fields properly extracted
Has anyone had any experience with this. Also from the stream not all the fields have data they are just blank but the spacing remains the same.

0 Karma

marcoscala
Builder

Check this I just wrote on LinkedIn:

Hi!
I don't know how many of you had to manage data coming from logs with fixed-lenght records. This is the old cobol/mainframe like method of managing datafiles.

With Splunk there's no problem in defining field extraction with regex. If, for instance we have a schema like this:
field1 - 4chars
field2 - 8chars
and so on, we could write a simple EXTRACT like this:

[mysourcetype]
EXTRACT = ^(?.{4})(?.{8}) and so on....

This works fine in a search. So you can do something like:
* | stats count by field1

But, what happens if you try to filter by the field value? If you try something like
* field1=abcd
this won't work!!! To make it work, you have to write something like
* field1=abcd

So, all the power of fields become useless with this kind of data! Transactions are gone, subsearches are gone, most of the power of Splunk Analytics is gone!

Gone forever? Thanks god not! I found a very easy workaround to this. I know it's not a real solution to the problem, but, if you don't have a strong requirement on file integrity, this will save your life!!! The answer is: "Add fields delimiters!!!"

Splunk indexing works by indexing every single "string", which means every sequence of letters and numbers divided by some non alphanumeric character, like ".,!/!?:" and so on.

So, the solution is to modify the raw data written in the Index using such a separator.

Here is how:
1. props.conf
[mysourcetype]
TRANSFORMS = add_separators
EXTRACT = ^(?.{4}).(?.{8}). and so on....

  1. transforms.conf [add_separators] DEST_KEY = _raw SOURCE_KEY = _raw REGEX = ^(.{4})(.8{})(.*) FORMAT = $1.$2.$3

Now, finally, everything works back as usual!!

0 Karma

marcoscala
Builder

and for the empty fields you can use the "fillnull" command.

0 Karma

sideview
SplunkTrust
SplunkTrust

I've had to index data like this in the past and I've always just used a very specific regex to do it. Shoretel CDR data for instance is fixed-width and many of the fields are often blank. Yet that didn't stop us from creating a pretty sweet app for that data - http://sideviewapps.com/apps/splunk-for-shoretel/

Getting the regex to the point where it is reliable even when one or more fields are absent in the given row can be done and tools such as RegexBuddy is utterly instrumental in testing out the representative cases in your data.

http://docs.splunk.com/Documentation/Splunk/5.0.2/Knowledge/Addfieldsatsearchtime#Configure_field_ex...

0 Karma
Get Updates on the Splunk Community!

Index This | I am a number, but when you add ‘G’ to me, I go away. What number am I?

March 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with another ...

What’s New in Splunk App for PCI Compliance 5.3.1?

The Splunk App for PCI Compliance allows customers to extend the power of their existing Splunk solution with ...

Extending Observability Content to Splunk Cloud

Register to join us !   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to ...