extract fields from csv file preamble

dominiquevocat · ‎07-13-2015

I have a csv file where the column header is on the fourth line. Before that are several interesting fields which i would like to extract.

So the first few lines would be like this:

sep=;
StationName;Long;Lat;ModuleName;ModuleType
Bahnhofstr1Dach;7.6516696493516;47.583540135635;Aussen;Outdoor
Timestamp;"Timezone : Europe/Berlin";Temperature;Humidity
1427649042;"2015/03/29 19:10:42";12.9;69

As you can see there is a first block with meta information followed by a endless block of metrics. I wish to extract the meta information from the header per file and set them as fields added to each record. The last line in the sample is the first data point.

I failed to find this in the documentation but imagine it to be possible.

Any help would be aprechiated. Even a silly link to the docs (doh!)

n00badmin · ‎07-13-2015

I believe I did a similar thing by not indexing as csv. instead index the csv as 1 event, then use multikv to extract ur columns and rex to extract the preamble...

n00badmin · ‎07-13-2015

actually we did this with tabular data...I assume your data is true csv?

dominiquevocat · ‎07-13-2015

yes, i use the "default" csv inptu as tabular data and it works fone for the payload but alas the (also tabluar) header contains nice stuff... Naturally the header lines get eliminated by the definition of the headerline as the 4th line. (Unless i am mistaken).

n00badmin · ‎07-13-2015

yea u are right, thats why I didnt use the csv sourcetype.

Can you post and example of the pre-amble then headers with data?

jeffland · ‎07-13-2015

First I tried something with multivalued fields, but that didn't work out well - so I ended up with this sorry excuse for a hacky workaround:
You do a lookup on the file, remove everything except the actual line you are interested in, and rex the contents out of it:

| inputlookup temp.csv | head 2 | tail 1 | rename "sep=;" as field | rex field=field "(?<StationName>[^\;]+)\;(?<Long>[^\;]+)\;(?<Lat>[^\;]+)\;(?<ModuleName>[^\;]+)\;(?<ModuleType>[^\;]+)"

You can then think about how to get these fields unto every event, maybe by doing a subsearch and then some magic with eval or eventstats. This would require a thought that I can't come up with at the moment though.
But perhaps it's easier to export the results of the above search with outputlookup somehow, and then do regular lookups on that exported data during the actual search. This however looks like it should more sensibly done with some script on the filesystem rather than with splunk, as you could simply exctract lines 2 and 3... unfortunately, I don't know how to do that off the top of my head either.

dominiquevocat · ‎07-15-2015

Since the data is usually not that big that might well be a way.
When one does some steps manually might as well use a script to separate the second and third line into a lookup and strip it from the .csv
I was hoping for a pure splunk way as this seems to me to be a relatively common case.
Still this seems to be a good workarround. Alternatively since if you copy the csv into a folder where it can be indexed one could also set some tags manually.
If the .csv file were in /opt/splunk/var/run/splunk one could use inputcsv

jeffland · ‎07-15-2015

I believe the file has to be in %SPLUNK_HOME/etc/your_app/lookups, alternatively in %SPLUNK_HOME/etc/system/lookups.

otman01 · ‎07-13-2015

I don't understand what exactley would you extratct but Look at this link
https://regex101.com/

and this is an exemple that could help you :
in the REGULAR EXPRESSION part type ** ....(?...).**
and in the TEST STRING part type I'm not expert in splunk but look at this ** it will extrat the world **not
it could be a good exercice for you
and for this exemple you can in Splunk search barre this command:
rex field=_raw "....(?...).*" .

dominiquevocat · ‎07-13-2015

thanks but the data is not in the index data because it is before the header line.

krdo · ‎07-13-2015

"Extract data from files with headers" ( http://docs.splunk.com/Documentation/Splunk/6.2.4/Data/Extractfieldsfromfileheadersatindextime ) might help

dominiquevocat · ‎07-13-2015

yeah, i read that, i set header line to 3 so i get the payload but i wish to extract data from the first block.

HarisIdris · ‎05-21-2018

I have this exact problem, have you found a solution?

dominiquevocat · ‎06-06-2018

nope, sorry
but it didn't matter because it was only playing around with my netatmo and the test splunk at home.

extract fields from csv file preamble

Announcing Scheduled Export GA for Dashboard Studio

Extending Observability Content to Splunk Cloud

More Control Over Your Monitoring Costs with Archived Metrics GA in US-AWS!