Splunk Search

How can I dynamically split my sample data using regex, or are any other options available?

Shan
Builder

I have data in a log file as shown below. Can I split it using regex, or are any other options available?

0010213002040538

I want to split the data above like this:

001 02 13 
002 04 0538 

For example, we can take:

001 02 13 

001 is a transaction code
02 is the length of the value that follows
13 is the value

Based on the length, I need to split the value dynamically.

So, how can I write the rex search to split it dynamically? If "02" appears as the length, I need to use that length to extract the next value, "13".
If the length is "04", then I need to extract the next four characters to get "0538".

Thanks in advance
Kindly help me.


alacercogitatus
SplunkTrust

This cannot currently be done with rex alone. A regular expression cannot use a captured length as a repeat count, and falling back to .* captures far too much data to be useful. The only fix here is to edit the source of the data (or preprocess it with a script) so that the fields are already separated when Splunk receives them.

Here is a sample bash script that will separate out the portions you need.

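#!/bin/bash
# Sample input; in practice pass the real data in (for example as "$1").
data="001021300204053800309123d5-78900404data00503get"
myIndex=0
while [ "$myIndex" -lt "${#data}" ]
do
  # 3-character transaction code
  txnid=${data:$myIndex:3}
  myIndex=$((myIndex + 3))
  # 2-digit length; 10# forces base 10 so "08" and "09" are not read as octal
  txnlen=$((10#${data:$myIndex:2}))
  myIndex=$((myIndex + 2))
  # value of exactly txnlen characters
  txnstr=${data:$myIndex:$txnlen}
  myIndex=$((myIndex + txnlen))
  echo "txnid=$txnid txnlen=$txnlen txnstr=\"$txnstr\""
done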

This can be set up as a scripted input (passing in the correct values for data from the command line), or you can run it against the logs on the server, write the output to a new location, and point the forwarder at the new logs with proper parsing. The output can then be consumed and searched like this:

<your_scripted_input> | table txnid txnlen txnstr
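For anyone who would rather preprocess in Python than bash, here is a minimal sketch of the same length-prefixed split. The function names split_records and to_kv_lines are made up for illustration, and the 3-character code and 2-digit length widths are assumed from the sample data in the question.

```python
def split_records(data, code_len=3, len_len=2):
    """Split a length-prefixed string into (code, length, value) tuples.

    Assumed layout (taken from the sample in the question): a 3-character
    transaction code, a 2-digit decimal length, then exactly that many
    characters of value, repeated until the end of the string.
    """
    records = []
    pos = 0
    while pos < len(data):
        header_end = pos + code_len + len_len
        if header_end > len(data):
            raise ValueError(f"truncated header at offset {pos}")
        code = data[pos:pos + code_len]
        length = int(data[pos + code_len:header_end])  # "02" -> 2
        value = data[header_end:header_end + length]
        if len(value) != length:
            raise ValueError(f"truncated value at offset {header_end}")
        records.append((code, length, value))
        pos = header_end + length
    return records


def to_kv_lines(data):
    """Format each record as a key=value line, like the bash script's output."""
    return [
        f'txnid={code} txnlen={length} txnstr="{value}"'
        for code, length, value in split_records(data)
    ]
```

In a scripted input you would loop over the incoming log lines (for example from sys.stdin) and print every line returned by to_kv_lines, so each record becomes its own event.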

woodcock
Esteemed Legend

Not with a single rex but with this chain of commands:

 ... | rex "(?<TransactionCode>.{3})(?<FieldValueLen>.{2})(?<FieldValue>.*)" | eval FieldValue=substr(FieldValue,1,FieldValueLen)

Shan
Builder

Woodcock,

First of all, thank you very much for your valuable reply.
When I use the above rex search, it splits out the first value and stops there. How can I use the same rex to separate multiple values?

Sample data:

001021300204053800309123d5-78900404data00503get

Current Search:

sourcetype=testrex | table * | rex field=_raw "(?<TransactionCode>.{3})(?<FieldValueLen>.{2})(?<FieldValue>.*)" | eval FieldValue=substr(FieldValue,1,FieldValueLen) | table TransactionCode FieldValueLen FieldValue

Desired Result:

001 02 13
002 04 0538
003 09 123d5-789
004 04 data
005 03 get

Current Result:

TransactionCode FieldValueLen FieldValue
001 02 13
001 02 13

Regards,
Shankar


woodcock
Esteemed Legend

Hopefully you have a limited chain; otherwise an iterative approach like mine won't work. Let's assume you can have at most 4 in a chain; this should work:

... | rex "(?<TransactionCode>.{3})(?<FieldValueLen>.{2})(?<TempFieldValue>.*)"
| eval FieldValue=substr(TempFieldValue,1,FieldValueLen)
| eval TempFieldValue=substr(TempFieldValue,1+FieldValueLen)
| eval subevent=TransactionCode . ":::" . FieldValueLen . ":::" . FieldValue
| rex field=TempFieldValue "(?<TempTransactionCode>.{3})(?<TempFieldValueLen>.{2})(?<TempFieldValue>.*)"
| eval TransactionCode=mvappend(TransactionCode, TempTransactionCode)
| eval FieldValueLen=mvappend(FieldValueLen, TempFieldValueLen)
| eval FieldValue=mvappend(FieldValue, substr(TempFieldValue,1,TempFieldValueLen))
| eval subevent=mvappend(subevent, TempTransactionCode . ":::" . TempFieldValueLen . ":::" . substr(TempFieldValue,1,TempFieldValueLen))
| eval TempFieldValue=substr(TempFieldValue,1+TempFieldValueLen)
| rex field=TempFieldValue "(?<TempTransactionCode>.{3})(?<TempFieldValueLen>.{2})(?<TempFieldValue>.*)"
| eval TransactionCode=mvappend(TransactionCode, TempTransactionCode)
| eval FieldValueLen=mvappend(FieldValueLen, TempFieldValueLen)
| eval FieldValue=mvappend(FieldValue, substr(TempFieldValue,1,TempFieldValueLen))
| eval subevent=mvappend(subevent, TempTransactionCode . ":::" . TempFieldValueLen . ":::" . substr(TempFieldValue,1,TempFieldValueLen))
| eval TempFieldValue=substr(TempFieldValue,1+TempFieldValueLen)
| rex field=TempFieldValue "(?<TempTransactionCode>.{3})(?<TempFieldValueLen>.{2})(?<TempFieldValue>.*)"
| eval TransactionCode=mvappend(TransactionCode, TempTransactionCode)
| eval FieldValueLen=mvappend(FieldValueLen, TempFieldValueLen)
| eval FieldValue=mvappend(FieldValue, substr(TempFieldValue,1,TempFieldValueLen))
| eval subevent=mvappend(subevent, TempTransactionCode . ":::" . TempFieldValueLen . ":::" . substr(TempFieldValue,1,TempFieldValueLen))

Each event now has several new multivalued fields. If you need to break out each subevent into a separate event, add this:

| mvexpand subevent | rex field=subevent "(?<TransactionCode>.*?):::(?<FieldValueLen>.*?):::(?<FieldValue>.*)"  | table TransactionCode FieldValueLen FieldValue
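To sanity-check what a fixed-length chain like this should produce before running it in Splunk, the same logic can be emulated outside of SPL. The Python below is only a rough sketch: chain_subevents and expand are hypothetical helper names, max_fields=4 mirrors the "at most 4 in a chain" assumption, and ":::" is the same separator used in subevent.

```python
def chain_subevents(raw, max_fields=4, sep=":::"):
    """Emulate the fixed-length chain: peel off at most max_fields records.

    Returns the list of "code:::len:::value" strings plus whatever text is
    left over once the chain runs out of steps.
    """
    subevents = []
    rest = raw
    for _ in range(max_fields):
        if len(rest) < 5:  # no room for a 3-char code plus a 2-char length
            break
        code, length = rest[:3], rest[3:5]
        value = rest[5:5 + int(length)]
        subevents.append(sep.join((code, length, value)))
        rest = rest[5 + int(length):]
    return subevents, rest


def expand(subevents, sep=":::"):
    """Emulate mvexpand plus the final rex: one (code, len, value) row each."""
    return [tuple(s.split(sep, 2)) for s in subevents]
```

Run against the five-record sample from earlier in the thread, a chain of four returns the first four records and leaves "00503get" unprocessed, which is why the number of steps has to cover the longest chain you expect.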

Shan
Builder

Hi Woodcock,

Thank you very much.
I will try it with another sample file.


woodcock
Esteemed Legend

Don't forget to "Accept" the answer to close this question.
