Splunk Search

extract fields from csv file preamble

dominiquevocat
SplunkTrust
SplunkTrust

I have a csv file where the column header is on the fourth line. Before that are several interesting fields which i would like to extract.

So the first few lines would be like this:

sep=;
StationName;Long;Lat;ModuleName;ModuleType
Bahnhofstr1Dach;7.6516696493516;47.583540135635;Aussen;Outdoor
Timestamp;"Timezone : Europe/Berlin";Temperature;Humidity
1427649042;"2015/03/29 19:10:42";12.9;69

As you can see there is a first block with meta information followed by a endless block of metrics. I wish to extract the meta information from the header per file and set them as fields added to each record. The last line in the sample is the first data point.

I failed to find this in the documentation but imagine it to be possible.

Any help would be aprechiated. Even a silly link to the docs (doh!)

n00badmin
Communicator

I believe I did a similar thing by not indexing as csv. instead index the csv as 1 event, then use multikv to extract ur columns and rex to extract the preamble...

0 Karma

n00badmin
Communicator

actually we did this with tabular data...I assume your data is true csv?

0 Karma

dominiquevocat
SplunkTrust
SplunkTrust

yes, i use the "default" csv inptu as tabular data and it works fone for the payload but alas the (also tabluar) header contains nice stuff... Naturally the header lines get eliminated by the definition of the headerline as the 4th line. (Unless i am mistaken).

0 Karma

n00badmin
Communicator

yea u are right, thats why I didnt use the csv sourcetype.

Can you post and example of the pre-amble then headers with data?

0 Karma

jeffland
SplunkTrust
SplunkTrust

First I tried something with multivalued fields, but that didn't work out well - so I ended up with this sorry excuse for a hacky workaround:
You do a lookup on the file, remove everything except the actual line you are interested in, and rex the contents out of it:

| inputlookup temp.csv | head 2 | tail 1 | rename "sep=;" as field | rex field=field "(?<StationName>[^\;]+)\;(?<Long>[^\;]+)\;(?<Lat>[^\;]+)\;(?<ModuleName>[^\;]+)\;(?<ModuleType>[^\;]+)"

You can then think about how to get these fields unto every event, maybe by doing a subsearch and then some magic with eval or eventstats. This would require a thought that I can't come up with at the moment though.
But perhaps it's easier to export the results of the above search with outputlookup somehow, and then do regular lookups on that exported data during the actual search. This however looks like it should more sensibly done with some script on the filesystem rather than with splunk, as you could simply exctract lines 2 and 3... unfortunately, I don't know how to do that off the top of my head either.

0 Karma

dominiquevocat
SplunkTrust
SplunkTrust

Since the data is usually not that big that might well be a way.
When one does some steps manually might as well use a script to separate the second and third line into a lookup and strip it from the .csv
I was hoping for a pure splunk way as this seems to me to be a relatively common case.
Still this seems to be a good workarround. Alternatively since if you copy the csv into a folder where it can be indexed one could also set some tags manually.
If the .csv file were in /opt/splunk/var/run/splunk one could use inputcsv

0 Karma

jeffland
SplunkTrust
SplunkTrust

I believe the file has to be in %SPLUNK_HOME/etc/your_app/lookups, alternatively in %SPLUNK_HOME/etc/system/lookups.

0 Karma

otman01
Communicator

I don't understand what exactley would you extratct but Look at this link
https://regex101.com/

and this is an exemple that could help you :
in the REGULAR EXPRESSION part type ** ....(?...).**
and in the TEST STRING part type I'm not expert in splunk but look at this ** it will extrat the world **not
it could be a good exercice for you
and for this exemple you can in Splunk search barre this command:
rex field=_raw "....(?...).*" .

0 Karma

dominiquevocat
SplunkTrust
SplunkTrust

thanks but the data is not in the index data because it is before the header line.

0 Karma

krdo
Communicator
0 Karma

dominiquevocat
SplunkTrust
SplunkTrust

yeah, i read that, i set header line to 3 so i get the payload but i wish to extract data from the first block.

0 Karma

HarisIdris
Engager

I have this exact problem, have you found a solution?

0 Karma

dominiquevocat
SplunkTrust
SplunkTrust

nope, sorry
but it didn't matter because it was only playing around with my netatmo and the test splunk at home.

0 Karma
Get Updates on the Splunk Community!

Announcing Scheduled Export GA for Dashboard Studio

We're excited to announce the general availability of Scheduled Export for Dashboard Studio. Starting in ...

Extending Observability Content to Splunk Cloud

Watch Now!   In this Extending Observability Content to Splunk Cloud Tech Talk, you'll see how to leverage ...

More Control Over Your Monitoring Costs with Archived Metrics GA in US-AWS!

What if there was a way you could keep all the metrics data you need while saving on storage costs?This is now ...