Splunk Search

Extract fields from single event(s) consisting of multiple lines

ashabc
Contributor

I have a CSV file with a single column and no header. The data consists of repeating groups of values: userid, property1, property2, property3, then again userid, property1, property2, property3, and so on. How can I extract the fields userid, property1, property2 and property3?

I tried something like the following (for userid, for example), but it does not work:

.....| rex field=_raw "(?<userid>^(.*)\n)"
1 Solution

MuS
Legend

Hi ashabc,

Take a look at this answer http://answers.splunk.com/answers/305727/why-is-my-rex-statement-unable-to-extract-the-fiel.html#ans... to learn about pcregextest and how you can test a regex in Splunk.

cheers, MuS

MuS
Legend

Based on the examples you just provided, you can try this:

| gentimes start=-1 | eval foo="user1
101253
DTZ
Penrith, Cumberland
user2
2151614
FCC
Balnd, Temora" | rex max_match=0 field=foo "user\d[\r\n](?<userID>[^\r\n]*)[\r\n](?<property1>[^\r\n]*)[\r\n](?<property2>[^\r\n]*)" | table userID, property1, property2

Or test the regex with the pcregextest tool that ships with Splunk, like this:

$SPLUNK_HOME/bin/splunk cmd pcregextest mregex="user\d[\r\n](?<userID>[^\r\n]*)[\r\n](?<property1>[^\r\n]*)[\r\n](?<property2>[^\r\n]*)" test_str="user1
>     101253
>     DTZ
>     Penrith, Cumberland
>     user2
>     2151614
>     FCC
>     Balnd, Temora"
Original Pattern: 'user\d[\r\n](?<userID>[^\r\n]*)[\r\n](?<property1>[^\r\n]*)[\r\n](?<property2>[^\r\n]*)'
Expanded Pattern: 'user\d[\r\n](?<userID>[^\r\n]*)[\r\n](?<property1>[^\r\n]*)[\r\n](?<property2>[^\r\n]*)'
Regex compiled successfully. Capture group count = 3. Named capturing groups = 3.
SUCCESS - match against: 'user1
    101253
    DTZ
    Penrith, Cumberland
    user2
    2151614
    FCC
    Balnd, Temora'

#### Capturing group data ##### 
Group |            Name | Value
--------------------------------------
    1 |          userID |     101253
    2 |       property1 |     DTZ
    3 |       property2 |     Penrith, Cumberland
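
As a cross-check outside Splunk, here is a minimal Python sketch of a variation on the pattern above. In the output above, the group named userID captures the line after the userN line (101253), not the userN line itself. This variation captures the userN line as userid and adds a property3 group. It assumes that user IDs look like "user" followed by digits and that every record is exactly four lines long. Python's re module writes named groups as (?P<name>...) rather than the (?<name>...) form that rex uses. The pattern is an illustration, not the one posted in this thread.

```python
import re

# Sample data copied from the thread: four lines per user.
raw = (
    "user1\n101253\nDTZ\nPenrith, Cumberland\n"
    "user2\n2151614\nFCC\nBalnd, Temora"
)

# Assumption: each record is a "user<digits>" line followed by three property lines.
# \r?\n accepts both Unix and Windows line endings between the lines.
pattern = re.compile(
    r"(?P<userid>user\d+)\r?\n"
    r"(?P<property1>[^\r\n]*)\r?\n"
    r"(?P<property2>[^\r\n]*)\r?\n"
    r"(?P<property3>[^\r\n]*)"
)

# finditer plays the role of max_match=0: it returns every match, not just the first.
records = [m.groupdict() for m in pattern.finditer(raw)]

for rec in records:
    print(rec)
```

If this grouping matches your data, the same pattern could presumably be carried back into rex by writing the groups as (?<userid>...), (?<property1>...) and so on. Test it with pcregextest first.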

ashabc
Contributor

It kind of works.

What I still don't get is why you used eval foo="data_string". That is fine for two sets of sample data, but how do I handle a CSV file with thousands of records?


Runals
Motivator

He was using gentimes and the eval as a way to test the methodology. If you do the search as

... | rex max_match=0 field=foo "user\d[\r\n](?<userID>[^\r\n]*)[\r\n](?<property1>[^\r\n]*)[\r\n](?<property2>[^\r\n]*)" | table userID, property1, property2

it should work. Just be sure to change the "field=" part of the rex command to the field that contains the data in your sourcetype.

Runals
Motivator

Can you post an example of the data? Does the data contain just the values, or is there something unique to each line that the extraction could key on?


ashabc
Contributor

Here is sample data for two users. It consists of strings and numbers: each userid is a string, followed by another form of ID (property1) as a number, then two more properties (property2 and property3), both strings, and so on for each user.

user1
101253
DTZ
Penrith, Cumberland
user2
2151614
FCC
Balnd, Temora
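
Since nothing in this layout marks a line other than its position, one alternative to a regex is to group every four consecutive lines into one record. The following is a minimal Python sketch of that idea, meant for checking the file outside Splunk rather than as a Splunk search. The function name records_from_lines and the FIELDS tuple are hypothetical names chosen for illustration. It assumes every record has exactly four non-empty lines.

```python
from itertools import islice

# Assumed field order, taken from the description of the data above.
FIELDS = ("userid", "property1", "property2", "property3")


def records_from_lines(lines, fields=FIELDS):
    """Yield one dict per consecutive group of len(fields) non-empty lines."""
    stripped = (line.strip() for line in lines)
    non_empty = (line for line in stripped if line)
    while True:
        chunk = list(islice(non_empty, len(fields)))
        if len(chunk) < len(fields):
            break  # end of input, or an incomplete trailing record that is dropped
        yield dict(zip(fields, chunk))


sample = """user1
101253
DTZ
Penrith, Cumberland
user2
2151614
FCC
Balnd, Temora
""".splitlines()

records = list(records_from_lines(sample))
for rec in records:
    print(rec)
```

Because the function consumes lines lazily, it can be given an open file object and will scale to thousands of records. It silently misaligns every later record if a single line is missing, so a pattern that anchors on the userid line is safer when the data may be incomplete.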
