Splunk Search

How to apply regex rules in props.conf and transforms.conf to filter unstructured data before indexing it in Splunk?

prachisaxena
Explorer

The requirement is a multilevel filter:
1. I need to create a line break at Header|521|02|00|521|, which I am doing in props.conf:

props.conf

BREAK_ONLY_BEFORE = Header\|\S*\|\S*\|\S*\|521\|
2. I need to extract a number of fields using transforms.conf:

    transforms.conf

    REGEX = (?P<f1>[^|]*)|(?P<f2>[^|]*)|(?P<f3>[^|]*)|(?P<f4>[^|]*)|(?P<f5>[^|]*)|(?P<f6>[^|]*)|(?P<f7>[^|]*)
    DEST_KEY = _raw
    FORMAT = $1,$7

3. I also need to keep only the events with a specific value in a field, such as f7=SCL.

The log file looks like this:

512 521 1054 14447916361 SCL@YOK 384 P 2 10GNS@GOC Header|521|02|00|521||SCL@YOK||scl11adm|TYO|NRT|2015-10-14 12:00:33+09:00|2015-10-14 12:00:36+09:00|2015-10-14 11:00:36+08:00|
Identifier 3235897206|
Detail YOK|AHG|SYD|SSE|2015-10-14 11:59:00+09:00|YA4VC|P|P|82.000|0.000||
Reference F7P43||||1|I|
PieceDetail JD014600000733002464|82.0|||178.6|||58.0|||110.0|||140.0||||WPX||||
ExtraCharge YW|JP||0.000||JPY|FOCJPBBX||2015-10-14 11:59:00+09:00||I|
Document|3235897206||FCA||||||||||
DocumentLine 3235897206||1|||||||JP|P|1||0.000||BREAK BULK EXPRESS|KGS.|AUD||
512 15206781 14447916361 SCL@TYO 384 P 2 10GNS2@GOC Header|15206|02|00|521||SCL@TYO||scl11adm|TYO|---|2015-10-14 12:00:36+09:00|2015-10-14 12:00:36+09:00|2015-10-14 11:00:36+08:00|
Identifier 9929275941|
Detail TYO||LBA|SHF|2015-10-14 10:59:00+09:00|NEW0|D|D|0.50|0.40||K|A|DOCUMENT|DOX|0.000|
PieceDetail JD014600002447636977|0.5||||||1.0|||48.0|||39.0||||||||
512 518 246 14447915821 GOP@PEK 384 PKUL 2 10GNS2@GOC

Header||02|00|518||GOP|GOP@PEK||PEK|WOC|2015-10-14 10:59:00+08:00|2015-10-14 10:59:00+08:00||
EventCommon P|JD014600001332139235|||2015-10-14 10:59:00+08:00|PEK|WOC|PEK|PEK|000001|OK||<lhj>|d|
EventSpecific 7329|WOZA|A|||<lhj>|
512 518 246 14447915871 GOP@PEK 384 PKUL 2 10GNS2@GOC Header||02|00|518||GOP|GOP@PEK||PEK|WOC|2015-10-14 10:59:00+08:00|2015-10-14 10:59:00+08:00||
EventCommon P|JD013059718270005069|||2015-10-14 10:59:00+08:00|PEK|WOC|PEK|PEK|000001|OK||<lhj>|d|
EventSpecific7329|WOZA|A|||<lhj>|
512 518 246 14447915931 GOP@PEK 384 PKUL 2 10GNS2@GOC

Header||02|00|518||GOP|GOP@PEK||PEK|WOC|2015-10-14 10:59:00+08:00|2015-10-14 10:59:00+08:00||
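
As a quick sanity check outside Splunk, the BREAK_ONLY_BEFORE pattern from the question can be tried against a few lines of the sample. This is only a minimal Python sketch: it assumes Python's `re` module behaves like Splunk's regex engine for this simple pattern, and the sample lines are cut off after the first few fields. The names `samples` and `breaks` are made up for the illustration.

```python
import re

# Pattern copied from the props.conf line in the question.
BREAK_ONLY_BEFORE = re.compile(r"Header\|\S*\|\S*\|\S*\|521\|")

# Lines from the sample log, cut off after the first few fields.
samples = {
    "521 header": "512 521 1054 14447916361 SCL@YOK 384 P 2 10GNS@GOC Header|521|02|00|521||SCL@YOK||",
    "15206 header": "512 15206781 14447916361 SCL@TYO 384 P 2 10GNS2@GOC Header|15206|02|00|521||SCL@TYO||",
    "518 header": "Header||02|00|518||GOP|GOP@PEK||PEK|WOC|",
    "identifier": "Identifier 3235897206|",
}

# True means the line matches the pattern, i.e. it is a candidate
# for starting a new event under this rule.
breaks = {name: bool(BREAK_ONLY_BEFORE.search(line)) for name, line in samples.items()}

for name, starts_event in breaks.items():
    print(f"{name}: {starts_event}")
```

Only the two headers containing `|521|` match. The `518` headers do not, so this rule on its own would not start a new event at those lines.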


jplumsdaine22
Influencer

An excellent place to test regular expressions is https://regex101.com/

Header\S+521|\s should be enough to break your event.

As for the field extraction, can you give some examples of what you're trying to extract? The regex you've posted looks like it will match every character.
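
To illustrate why the posted pattern matches almost anything, here is a small Python sketch using a two-group version of it. The group names `f1` and `f2` and the variable names are assumptions made for the example, and Python's `re` module stands in for Splunk's regex engine. An unescaped `|` is the alternation operator, so the pattern reads as "f1 or f2" rather than "f1, a pipe, then f2".

```python
import re

sample = "Header|15206|02|00|521||SCL@TYO||scl11adm|TYO|"

# Unescaped "|" means "or": the first alternative matches by itself,
# so f2 is never filled in.
unescaped = re.compile(r"(?P<f1>[^|]*)|(?P<f2>[^|]*)")
loose = unescaped.search(sample)
print(loose.group("f1"), loose.group("f2"))

# [^|]* can match zero characters, so the pattern matches any input at all.
print(unescaped.search("") is not None)

# Escaped "\|" is a literal pipe, so both groups are captured.
escaped = re.compile(r"(?P<f1>[^|]*)\|(?P<f2>[^|]*)")
strict = escaped.search(sample)
print(strict.group("f1"), strict.group("f2"))
```

With the unescaped version only `f1` is ever set, which is why `$7` would come back empty. Escaping each delimiter as `\|` makes the groups line up with the fields.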


prachisaxena
Explorer

Thank you very much for helping.
The log file is pipe-delimited (although not completely). I have created a regex to extract all the pipe-delimited fields. Then, using the FORMAT statement, I extract only the required text from the REGEX, say $1 and $7 (or f1 and f7). After this, I need to retain only the lines where f7=SCL.

512 15206781 14447916361 SCL@TYO 384 P 2 10GNS2@GOC Header|15206|02|00|521||SCL@TYO||scl11adm|TYO|---|2015-10-14 12:00:36+09:00|2015-10-14 12:00:36+09:00|2015-10-14 11:00:36+08:00|
Identifier 9929275941|
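
The extract-then-filter logic described above can be prototyped outside Splunk before it goes into transforms.conf. The following is a rough Python sketch, not a Splunk configuration. It assumes the fields are counted from the literal `Header` token, that the pipes are escaped, and that "f7=SCL" means f7 starts with `SCL` (the sample values are `SCL@TYO` and `SCL@YOK`). `FIELDS` and `keep_event` are hypothetical names.

```python
import re

# Seven pipe-delimited fields, counted from the literal "Header" token.
# Each "\|" is an escaped, literal pipe.
FIELDS = re.compile(
    r"(?P<f1>Header)\|(?P<f2>[^|]*)\|(?P<f3>[^|]*)\|(?P<f4>[^|]*)"
    r"\|(?P<f5>[^|]*)\|(?P<f6>[^|]*)\|(?P<f7>[^|]*)"
)

def keep_event(raw: str, wanted: str = "SCL") -> bool:
    """Return True if the event has a Header whose 7th field starts with `wanted`."""
    m = FIELDS.search(raw)
    return bool(m) and m.group("f7").startswith(wanted)

# Header lines from the thread, cut off after the first few fields.
scl_event = "512 15206781 14447916361 SCL@TYO 384 P 2 10GNS2@GOC Header|15206|02|00|521||SCL@TYO||scl11adm|TYO|---|"
gop_event = "512 518 246 14447915871 GOP@PEK 384 PKUL 2 10GNS2@GOC Header||02|00|518||GOP|GOP@PEK||PEK|WOC|"

print(keep_event(scl_event))
print(keep_event(gop_event))
```

In Splunk itself, dropping events before indexing is usually done with a transform that sets DEST_KEY = queue and FORMAT = nullQueue, rather than by rewriting _raw. The sketch only checks that the matching logic picks the right events.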
