Solved: How to parse date and time in different columns du...

felipetesta · ‎09-08-2015

Hello,

My raw CSV data looks like this (it's open data, no anonymization needed):

Anno;Reg;Articolo;Classe;Dat Infr;Loc1;Numciv1;Bisinternolettera;Loc2;Ora Infr;Sanzione accessoria;Tipoinfr;Numero Verbali;Sanzioni
2014;Legge 285/92;158;Autovettura;01/07/2014;SEBASTOPOLI/(CORSO);238;B;(Altri);0:00;(Mancante);Sosta;1;1
2014;Legge 285/92;142;Autovettura;01/07/2014;UNITA' D'ITALIA/(CORSO);96;AFR.EST;(Altri);0:01;(Mancante);Limiti di Velocita';1;1

Date is in column 5 "Dat Infr" and time in column 9 "Ora Infr". I tried most combinations with/out quotes, comma, semicolon, SHOULD_LINEMERGE true/false, but nothing has worked and the correct timestamp doesn't come up. On the other hand, I can get Splunk to recognize the date field if I leave it alone in TIMESTAMP_FIELDS and TIME_FORMAT. The proposed props.conf is:

[ csv ]
TIME_FORMAT=%d/%m/%Y,%H:%M
TZ=Europe/Rome
MAX_TIMESTAMP_LOOKAHEAD=500
TIMESTAMP_FIELDS=Dat Infr,Ora Infr
SHOULD_LINEMERGE=false
INDEXED_EXTRACTIONS=csv
NO_BINARY_CHECK=true
KV_MODE=none
disabled=false
pulldown_type=true

What am I missing? Even the official docs have a similar unanswered question at http://docs.splunk.com/Documentation/Splunk/6.2.2/Data/ConfigurePositionalTimestampExtraction

Thanks.

felipetesta · ‎09-08-2015

Answering my own question after half-a-day of research and trial&error.

I checked strptime() man page and found a "wildcard" field descriptor: %n, "Arbitraty whitespace". So, replacing the "comma" with "%n" into

TIME_FORMAT=%d/%m/%Y%n%H:%M

did the trick. Mind that CSV field names are case sensitive. Double quotes are not necessary if the CSV field name contains whitespace.

View solution in original post

felipetesta · ‎09-08-2015

Answering my own question after half-a-day of research and trial&error.

I checked strptime() man page and found a "wildcard" field descriptor: %n, "Arbitraty whitespace". So, replacing the "comma" with "%n" into

TIME_FORMAT=%d/%m/%Y%n%H:%M

did the trick. Mind that CSV field names are case sensitive. Double quotes are not necessary if the CSV field name contains whitespace.

felipetesta · ‎09-08-2015

Findings after further investigation.

Browsing further into the preview the timestamp gets recognized and importing the 2014 CSV works, except for first ~100 events. But CSV from year 2013 has the time in hh.mm (dot instead of colon) and I cannot import it.

Curiously, if I preview the extraction of date or time only, the TIME_FORMAT (%d/%m/%Y or %H.%M) is comfirmed to be correct. Concatenation doesn't seem to work (reliably). Swapping date and time neither.

I would prefer not to edit the source CSV with +700k lines.

Splunk version 6.2.2.

How to parse date and time in different columns during CSV import?

More Ways To Control Your Costs With Archived Metrics | Register for Tech Talk

.conf24 | Personalize your .conf experience with Learning Paths!

Threat Hunting Unlocked: How to Uplevel Your Threat Hunting With the PEAK Framework ...