How do you configure props.conf for multiline events?

emilbach
New Member

Hi fellow Splunkers!

We're having issues configuring props.conf for onboarding data into Splunk. We have spent a couple of days searching these forums and testing, but to no avail.

Basically we have log files in the following format:

2019-03-27T09:00:00.0028098Z;avgsize;WSJYSIPQ01;S:;18547.9245283019
2019-03-27T09:00:00.0028098Z;avgtime;WSJYSIPQ01;S:;0.437509433962264
2019-03-27T09:00:00.0028098Z;count;WSJYSIPQ01;S:;53
2019-03-27T09:00:00.0028098Z;maxtime;WSJYSIPQ01;S:;0.841
2019-03-27T09:00:00.0028098Z;mintime;WSJYSIPQ01;S:;0.234
2019-03-27T09:00:00.0028098Z;avgsize;WSJYSIPQ01;V:;22639.7090909091
2019-03-27T09:00:00.0028098Z;avgtime;WSJYSIPQ01;V:;0.488685714285715
2019-03-27T09:00:00.0028098Z;count;WSJYSIPQ01;V:;385
....

and so on in a repeating pattern, but with varying host and drive (here WSJYSIPQ01 and S: for the first line).

What we would like the data in Splunk to look like after import is this:

Headers: Timestamp:avgsize:avgtime:count:maxtime:mintime:host:drive

which would then give one line for each combination of timestamp, host and drive.

We have figured out how to correctly convert the timestamp. But how do we collapse the multiple lines into one per timestamp-host-drive combination and extract the fields as headers?
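
For reference, the timestamp config we have so far looks roughly like this (a sketch: the sourcetype name is a placeholder, and %7N for the seven subsecond digits is our assumption):

props.conf

[yoursourcetype]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%7NZ
MAX_TIMESTAMP_LOOKAHEAD = 30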

Many thanks!

br

1 Solution

woodcock
Esteemed Legend

This is not possible within Splunk, so you will have to pre-process the data with a glue script that you write, or you can use Cribl.

emilbach
New Member

Thank you. So the correct way to handle this would be to run a report on top of the data, for example daily? (Given that I cannot change the layout of the input data.)

harsmarvania57
SplunkTrust
SplunkTrust

Hi,

Try the config below on your indexer or heavy forwarder.

props.conf

[yoursourcetype]
FIELD_DELIMITER=;
FIELD_NAMES=Timestamp,type,ext_host,drive,value
TIMESTAMP_FIELDS=Timestamp

EDIT: I have updated the config above because headers laid out the way you want are not possible, but you can extract those values into five headers in total: Timestamp, type, ext_host, drive, value. With these headers, avgtime, count, etc. are assigned as values of the type field, and the hostname goes into the ext_host header/field.
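
If the fields do not appear at index time, one thing to check (an assumption about your setup, not something I have re-tested): these structured-data settings may also need INDEXED_EXTRACTIONS enabled on the instance that actually reads the file, so the fuller sketch would be:

[yoursourcetype]
INDEXED_EXTRACTIONS=csv
FIELD_DELIMITER=;
FIELD_NAMES=Timestamp,type,ext_host,drive,value
TIMESTAMP_FIELDS=Timestamp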

emilbach
New Member

Hi harsmarvania57,
Thank you for your reply.

I have tested your suggestion, and it gives the fields avgtime, count, etc. as values in the field type. This is not as intended. When you say it is impossible to have them as headers, do you mean impossible due to some technical constraint, or just hard but doable via regex or something? We were thinking of grouping each set of 5 rows as one event and then extracting each field via regex. Again, thanks for taking the time to reply.

harsmarvania57
SplunkTrust
SplunkTrust

To achieve this, you need to index the data with the headers I provided in the props.conf config above, and then you can create a search that splits the data into the different headers with their values.

In my lab environment, I ingested the sample data you provided using the props.conf config I gave earlier, and then ran the search below, which gives the result you require.

<yourBaseSearch>
| stats list(type) as type, list(value) as value by drive,ext_host,Timestamp
| eval field_merge=mvzip(type,value)
| mvexpand field_merge
| eval a=mvindex(split(field_merge,","),0), b=mvindex(split(field_merge,","),1)
| eval {a}=b
| stats values(avgsize) as avgsize, values(avgtime) as avgtime, values(count) as count, values(mintime) as mintime, values(maxtime) as maxtime by drive,ext_host,Timestamp
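
If the data volume is a concern, one option is to schedule this search (for example daily) and write the collapsed rows out with collect. A sketch, assuming a pre-created summary index named summary:

<yourBaseSearch> earliest=-1d@d latest=@d
| stats list(type) as type, list(value) as value by drive,ext_host,Timestamp
| eval field_merge=mvzip(type,value)
| mvexpand field_merge
| eval a=mvindex(split(field_merge,","),0), b=mvindex(split(field_merge,","),1)
| eval {a}=b
| stats values(avgsize) as avgsize, values(avgtime) as avgtime, values(count) as count, values(mintime) as mintime, values(maxtime) as maxtime by drive,ext_host,Timestamp
| collect index=summary

Reports can then run against index=summary and touch one row per timestamp/host/drive combination instead of five.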

emilbach
New Member

Thank you for the reply again. So there is no way to do this before the search?

My worry is that the data volume is huge, and having to search the raw rows means trawling through five times more data than if I had one line per event. Is that not true? Basically, as I understand your answer, the final "by" clause in the stats statement groups the five lines into one.

I have played around with regex and can identify the key-value pairs in a regex simulator, but when I try to implement it in both props.conf and transforms.conf, I cannot get the EXTRACT- setting to work.... 😞
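
For reference, this is roughly the inline extraction we have been attempting (the sourcetype name is a placeholder; as far as we understand, EXTRACT- settings live in props.conf only, with no transforms.conf entry):

props.conf

[yoursourcetype]
EXTRACT-metrics = ^(?<Timestamp>[^;]+);(?<type>[^;]+);(?<ext_host>[^;]+);(?<drive>[^;]+);(?<value>.+)$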

harsmarvania57
SplunkTrust
SplunkTrust

To achieve this at search time on the search head with props.conf and transforms.conf, you can try the config below.

props.conf

[yoursourcetype]
REPORT-test_hdt = hostdrv_extract
REPORT-test_type = type_extract

transforms.conf

[type_extract]
CLEAN_KEYS = 0
FORMAT = $1::$2
REGEX = ^(?:([^;]*)[;]){2}(?:[^;]*[;]){2}([^\v]*)

[hostdrv_extract]
CLEAN_KEYS = 0
FORMAT = Timestamp::$1 hostname::$2 drive::$3
REGEX = ^([^;]*);(?:[^;]*);([^;]*);([^;]*);
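
With those extractions in place, a quick sanity check could be a search like the following (a sketch; the field names come from the FORMAT lines above):

<yourBaseSearch>
| stats values(avgsize) as avgsize, values(avgtime) as avgtime, values(count) as count, values(mintime) as mintime, values(maxtime) as maxtime by drive,hostname,Timestamp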

emilbach
New Member

Thank you again. But I was trying to implement this while loading the data into Splunk, not in a search. It seems it is not possible after all. I think we will revert to fixing the data-creation process instead.
