Getting Data In

Splunk - Flat file - Field Extraction - Transforms.conf

vasanthmss
Motivator

Hi,
I want to extract the fields using transforms.conf from a flat file. The file contains the following fields. There is no field delimiter. I have tried the below regex but not working. pls help me.

Field Length:
ID -10
FIRST_NAME -15
LAST_NAME -12
ADDRESS - 25
PHONE - 10

Example File1:

ID FIRST_NAME LAST_NAME ADDRESS PHONE
0000001 Name1 Lastname1 Address1 1234567890

00000021 name2 Address2 123456
001 name3 Address3


REGEX = (?i)^(?P<ID>.{10})\S(?<FIRST_NAME>.{15})\S(?<LAST_NAME>.{12})\S(?<ADDRESS>.{25})\S(?<PHONE>.{10})

Thanks,
V.S

V
Tags (2)
1 Solution

kristian_kolb
Ultra Champion

One problem with your REGEX above is that you state that there is a non-space character between your fields. Perhaps you want to use \s instead of \S? Also, if you want to use that regex, I suggest you change it a little and put it in a normal EXTRACT in props.conf only, like so;

props.conf

[your_sourcetype]
EXTRACT-blah = ^(?P<ID>.{10})\s+(?<FIRST_NAME>.{15})\s+(?<LAST_NAME>.{12})\s+(?<ADDRESS>.{25})\s+(?<PHONE>.{10})

This works well for logs like;

1234567890 gregor          von klopp   1, high street            1-555-2345
345323     billy bob ben   joe         buckingham palace, london +44 123123

So as you can see, this approach requires you to have fixed length fields in your log file.


If your fields do NOT contain spaces, and you have a delimiter that is a single space, it could also be done (better) with a REPORT (which uses transforms.conf);

props.conf

[your_sourcetype]
REPORT-blah = extract_blah

transforms.conf

[extract_blah]
DELIMS = " "
FIELDS = ID, FIRST_NAME, LAST_NAME, ADDRESS, PHONE

Alternatively, you can do the same thing by changing the regex and use an ordinary EXTRACT that does not require fixed length fields;

props.conf

[your_sourcetype]
EXTRACT-bleh = ^(?P<ID>\S+)\s+(?<FIRST_NAME>\S+)\s+(?<LAST_NAME>\S+)\s+(?<ADDRESS>\S+)\s+(?<PHONE>\S+)

These two approaches work with events like;

1234123 gregor von_klopp 1,high_street 1-555-2345
23421 billy_bob_ben joe buckingham_palace,london +44123123

Makes sense?


UPDATE:

If you have truly fixed fields, and the field content is truncated if it's above a certain field length, you'd better use the first approach and remove the delimiters from the regex, i.e.

file looks like;

12345678  HM Queen ElizabN/A         Buckingham Palace, London+44123123
0987654321Arnold         Schwarzenegg25, Fifth avenue, NY     +1-555-123

props.conf

EXTRACT-blah = ^(?P<ID>.{10})(?<FIRST_NAME>.{15})(?<LAST_NAME>.{12})(?<ADDRESS>.{25})(?<PHONE>.{10})

View solution in original post

kristian_kolb
Ultra Champion

One problem with your REGEX above is that you state that there is a non-space character between your fields. Perhaps you want to use \s instead of \S? Also, if you want to use that regex, I suggest you change it a little and put it in a normal EXTRACT in props.conf only, like so;

props.conf

[your_sourcetype]
EXTRACT-blah = ^(?P<ID>.{10})\s+(?<FIRST_NAME>.{15})\s+(?<LAST_NAME>.{12})\s+(?<ADDRESS>.{25})\s+(?<PHONE>.{10})

This works well for logs like;

1234567890 gregor          von klopp   1, high street            1-555-2345
345323     billy bob ben   joe         buckingham palace, london +44 123123

So as you can see, this approach requires you to have fixed length fields in your log file.


If your fields do NOT contain spaces, and you have a delimiter that is a single space, it could also be done (better) with a REPORT (which uses transforms.conf);

props.conf

[your_sourcetype]
REPORT-blah = extract_blah

transforms.conf

[extract_blah]
DELIMS = " "
FIELDS = ID, FIRST_NAME, LAST_NAME, ADDRESS, PHONE

Alternatively, you can do the same thing by changing the regex and use an ordinary EXTRACT that does not require fixed length fields;

props.conf

[your_sourcetype]
EXTRACT-bleh = ^(?P<ID>\S+)\s+(?<FIRST_NAME>\S+)\s+(?<LAST_NAME>\S+)\s+(?<ADDRESS>\S+)\s+(?<PHONE>\S+)

These two approaches work with events like;

1234123 gregor von_klopp 1,high_street 1-555-2345
23421 billy_bob_ben joe buckingham_palace,london +44123123

Makes sense?


UPDATE:

If you have truly fixed fields, and the field content is truncated if it's above a certain field length, you'd better use the first approach and remove the delimiters from the regex, i.e.

file looks like;

12345678  HM Queen ElizabN/A         Buckingham Palace, London+44123123
0987654321Arnold         Schwarzenegg25, Fifth avenue, NY     +1-555-123

props.conf

EXTRACT-blah = ^(?P<ID>.{10})(?<FIRST_NAME>.{15})(?<LAST_NAME>.{12})(?<ADDRESS>.{25})(?<PHONE>.{10})
Get Updates on the Splunk Community!

Updated Team Landing Page in Splunk Observability

We’re making some changes to the team landing page in Splunk Observability, based on your feedback. The ...

New! Splunk Observability Search Enhancements for Splunk APM Services/Traces and ...

Regardless of where you are in Splunk Observability, you can search for relevant APM targets including service ...

Webinar Recap | Revolutionizing IT Operations: The Transformative Power of AI and ML ...

The Transformative Power of AI and ML in Enhancing Observability   In the realm of IT operations, the ...