Getting Data In

How to parse date and time in different columns during CSV import?

felipetesta
Path Finder

Hello,

My raw CSV data looks like this (it's open data, no anonymization needed):

Anno;Reg;Articolo;Classe;Dat Infr;Loc1;Numciv1;Bisinternolettera;Loc2;Ora Infr;Sanzione accessoria;Tipoinfr;Numero Verbali;Sanzioni
2014;Legge 285/92;158;Autovettura;01/07/2014;SEBASTOPOLI/(CORSO);238;B;(Altri);0:00;(Mancante);Sosta;1;1
2014;Legge 285/92;142;Autovettura;01/07/2014;UNITA' D'ITALIA/(CORSO);96;AFR.EST;(Altri);0:01;(Mancante);Limiti di Velocita';1;1

Date is in column 5 "Dat Infr" and time in column 9 "Ora Infr". I tried most combinations with/out quotes, comma, semicolon, SHOULD_LINEMERGE true/false, but nothing has worked and the correct timestamp doesn't come up. On the other hand, I can get Splunk to recognize the date field if I leave it alone in TIMESTAMP_FIELDS and TIME_FORMAT. The proposed props.conf is:

[ csv ]
TIME_FORMAT=%d/%m/%Y,%H:%M
TZ=Europe/Rome
MAX_TIMESTAMP_LOOKAHEAD=500
TIMESTAMP_FIELDS=Dat Infr,Ora Infr
SHOULD_LINEMERGE=false
INDEXED_EXTRACTIONS=csv
NO_BINARY_CHECK=true
KV_MODE=none
disabled=false
pulldown_type=true

What am I missing? Even the official docs have a similar unanswered question at http://docs.splunk.com/Documentation/Splunk/6.2.2/Data/ConfigurePositionalTimestampExtraction

Thanks.

1 Solution

felipetesta
Path Finder

Answering my own question after half-a-day of research and trial&error.

I checked strptime() man page and found a "wildcard" field descriptor: %n, "Arbitraty whitespace". So, replacing the "comma" with "%n" into

TIME_FORMAT=%d/%m/%Y%n%H:%M

did the trick. Mind that CSV field names are case sensitive. Double quotes are not necessary if the CSV field name contains whitespace.

View solution in original post

felipetesta
Path Finder

Answering my own question after half-a-day of research and trial&error.

I checked strptime() man page and found a "wildcard" field descriptor: %n, "Arbitraty whitespace". So, replacing the "comma" with "%n" into

TIME_FORMAT=%d/%m/%Y%n%H:%M

did the trick. Mind that CSV field names are case sensitive. Double quotes are not necessary if the CSV field name contains whitespace.

felipetesta
Path Finder

Findings after further investigation.

Browsing further into the preview the timestamp gets recognized and importing the 2014 CSV works, except for first ~100 events. But CSV from year 2013 has the time in hh.mm (dot instead of colon) and I cannot import it.

Curiously, if I preview the extraction of date or time only, the TIME_FORMAT (%d/%m/%Y or %H.%M) is comfirmed to be correct. Concatenation doesn't seem to work (reliably). Swapping date and time neither.

I would prefer not to edit the source CSV with +700k lines.

Splunk version 6.2.2.

0 Karma
Get Updates on the Splunk Community!

More Ways To Control Your Costs With Archived Metrics | Register for Tech Talk

Tuesday, May 14, 2024  |  11AM PT / 2PM ET Register to Attend Join us for this Tech Talk and learn how to ...

.conf24 | Personalize your .conf experience with Learning Paths!

Personalize your .conf24 Experience Learning paths allow you to level up your skill sets and dive deeper ...

Threat Hunting Unlocked: How to Uplevel Your Threat Hunting With the PEAK Framework ...

WATCH NOWAs AI starts tackling low level alerts, it's more critical than ever to uplevel your threat hunting ...