Hi fellow Splunkers!
Having issues configuring props.conf for ingesting data into Splunk. We have spent a couple of days searching these forums and testing, but to no avail.
Basically we have log files in the following format:
2019-03-27T09:00:00.0028098Z;avgsize;WSJYSIPQ01;S:;18547.9245283019
2019-03-27T09:00:00.0028098Z;avgtime;WSJYSIPQ01;S:;0.437509433962264
2019-03-27T09:00:00.0028098Z;count;WSJYSIPQ01;S:;53
2019-03-27T09:00:00.0028098Z;maxtime;WSJYSIPQ01;S:;0.841
2019-03-27T09:00:00.0028098Z;mintime;WSJYSIPQ01;S:;0.234
2019-03-27T09:00:00.0028098Z;avgsize;WSJYSIPQ01;V:;22639.7090909091
2019-03-27T09:00:00.0028098Z;avgtime;WSJYSIPQ01;V:;0.488685714285715
2019-03-27T09:00:00.0028098Z;count;WSJYSIPQ01;V:;385
....
and so on in a repeating pattern, but with varying hosts and drives (here WSJYSIPQ01 and S: for the first lines).
What we would like the data in Splunk to look like after import is this:
Headers: Timestamp:avgsize:avgtime:count:maxtime:mintime:host:drive
which would then give one line for each combination of timestamp, host and drive.
We have figured out how to correctly convert the timestamp. But how do we collapse the multiple lines into one per timestamp-host-drive combination and extract the fields as headers?
Many thanks!
br
This is not possible with Splunk itself, so you will have to pre-process the data with a glue script that you write, or you can use Cribl.
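To make the glue-script suggestion concrete, here is a minimal Python sketch (one possible pre-processing approach, not a Splunk feature) that collapses each timestamp/host/drive group of metric lines into a single wide row before ingestion. The sample data and metric names are taken from the question; file handling and error handling are left out.

```python
from collections import defaultdict
from io import StringIO

# Sample input taken from the question; one metric per line.
RAW = """\
2019-03-27T09:00:00.0028098Z;avgsize;WSJYSIPQ01;S:;18547.9245283019
2019-03-27T09:00:00.0028098Z;avgtime;WSJYSIPQ01;S:;0.437509433962264
2019-03-27T09:00:00.0028098Z;count;WSJYSIPQ01;S:;53
2019-03-27T09:00:00.0028098Z;maxtime;WSJYSIPQ01;S:;0.841
2019-03-27T09:00:00.0028098Z;mintime;WSJYSIPQ01;S:;0.234
"""

METRICS = ["avgsize", "avgtime", "count", "maxtime", "mintime"]

def collapse(lines):
    """Group metric rows by (timestamp, host, drive) and emit one wide
    row per group: Timestamp, the five metrics, host, drive."""
    groups = defaultdict(dict)
    for line in lines:
        if not line.strip():
            continue
        ts, metric, host, drive, value = line.strip().split(";")
        groups[(ts, host, drive)][metric] = value
    for (ts, host, drive), metrics in sorted(groups.items()):
        # Missing metrics for a group become empty strings.
        yield [ts] + [metrics.get(m, "") for m in METRICS] + [host, drive]

rows = list(collapse(StringIO(RAW)))
```

The output rows match the header layout asked for above (Timestamp, avgsize, avgtime, count, maxtime, mintime, host, drive) and could be written back out as a delimited file for Splunk to ingest.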
Thank you. The correct way to handle this would be to run a report on top of the data then, for example daily? (given that I cannot change the input data layout)
Hi,
Try the config below on your Indexer/Heavy Forwarder.
props.conf
[yoursourcetype]
INDEXED_EXTRACTIONS=csv
FIELD_DELIMITER=;
FIELD_NAMES=Timestamp,type,ext_host,drive,value
TIMESTAMP_FIELDS=Timestamp
EDIT: I have updated the config above, because the header layout you want is not possible. You can, however, extract the values into five headers in total: Timestamp, type, ext_host, drive, value. With these headers, the values avgtime, count, etc. are assigned to the type header, and the hostname goes into the ext_host header/field.
Hi harsmarvania57,
Thank you for your reply.
I have tested your suggestion and it gives the fields avgtime, count, etc. as values in the field type. This is not as intended. When you say it is impossible to have them as headers, do you mean impossible due to some technical constraint, or just hard but doable via regex or something? We were thinking of grouping each set of 5 rows as one event and then extracting each field via regex? Again, thanks for taking the time to reply.
To achieve this, you need to index the data with the headers I provided in the props.conf config above, and then create a search that splits that data into the different headers with their values.
In my lab environment, I ingested the sample data you provided using the props.conf config I gave earlier, and then ran the search below, which gives the result you require.
<yourBaseSearch>
| stats list(type) as type, list(value) as value by drive,ext_host,Timestamp
| eval field_merge=mvzip(type,value)
| mvexpand field_merge
| eval a=mvindex(split(field_merge,","),0), b=mvindex(split(field_merge,","),1)
| eval {a}=b
| stats values(avgsize) as avgsize, values(avgtime) as avgtime, values(count) as count, values(mintime) as mintime, values(maxtime) as maxtime by drive,ext_host,Timestamp
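For readers unfamiliar with the multivalue functions, the mvzip/mvexpand/eval {a}=b steps above pivot the type/value pairs into one field per metric. A rough Python analogue of that pivot for a single timestamp/host/drive group (illustration only, not Splunk; sample values abbreviated):

```python
# After "stats list(type) ... list(value) ..." each group holds two
# parallel multivalue fields, modeled here as two Python lists.
types = ["avgsize", "avgtime", "count", "maxtime", "mintime"]
values = ["18547.92", "0.4375", "53", "0.841", "0.234"]

# mvzip(type, value): pair each type with its value, comma-separated.
field_merge = [f"{t},{v}" for t, v in zip(types, values)]

# mvexpand + eval a=mvindex(...,0), b=mvindex(...,1) + eval {a}=b:
# split each pair and turn the type string into a field name.
event = dict(pair.split(",", 1) for pair in field_merge)
```

The final stats values(...) by drive,ext_host,Timestamp in the SPL then re-groups these per-metric fields back into one row per combination.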
Thank you for the reply again. So there is no way to do this before the search?
My worry is that the data volume is huge, and having to search the raw rows means trawling through five times more data than if I had one line per event. Is that not true? As I understand your answer, the final "by" clause in the stats statement is what groups the 5 lines into one.
I have played around with RegEx, and can identify the key-value pairs in a RegEx simulator, but when I try to implement it in both props.conf and transforms.conf, I cannot get the EXTRACT field to work... 😞
To achieve this at search time on the Search Head with props.conf and transforms.conf, you can try the config below.
props.conf
[yoursourcetype]
REPORT-test_hdt = hostdrv_extract
REPORT-test_type = type_extract
transforms.conf
[type_extract]
CLEAN_KEYS = 0
FORMAT = $1::$2
REGEX = ^(?:([^;]*)[;]){2}(?:[^;]*[;]){2}([^\v]*)
[hostdrv_extract]
CLEAN_KEYS = 0
FORMAT = Timestamp::$1 hostname::$2 drive::$3
REGEX = ^([^;]*);(?:[^;]*);([^;]*);([^;]*);
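If you want to sanity-check these regexes outside Splunk first, here is a quick Python test against one of the sample lines (an illustration, assuming the usual PCRE-style behavior where a repeated capture group keeps its last match):

```python
import re

# One sample line from the question.
line = "2019-03-27T09:00:00.0028098Z;count;WSJYSIPQ01;S:;53"

# Same regexes as in the transforms.conf stanzas above.
type_re = re.compile(r"^(?:([^;]*)[;]){2}(?:[^;]*[;]){2}([^\v]*)")
hostdrv_re = re.compile(r"^([^;]*);(?:[^;]*);([^;]*);([^;]*);")

# The repeated group in type_re captures its last iteration, so group 1
# is the metric name (the second field) and group 2 is the value,
# matching FORMAT = $1::$2.
m = type_re.match(line)
metric, value = m.group(1), m.group(2)

# hostdrv_re captures timestamp, hostname, and drive, matching
# FORMAT = Timestamp::$1 hostname::$2 drive::$3.
m2 = hostdrv_re.match(line)
timestamp, hostname, drive = m2.groups()
```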
Thank you again. But I was trying to implement this while loading the data into Splunk, not at search time. It seems that is not possible after all. We will revert to correcting the data creation process, I think.