Getting Data In

How to parse date and time in different columns during CSV import?

felipetesta
Path Finder

Hello,

My raw CSV data looks like this (it's open data, no anonymization needed):

Anno;Reg;Articolo;Classe;Dat Infr;Loc1;Numciv1;Bisinternolettera;Loc2;Ora Infr;Sanzione accessoria;Tipoinfr;Numero Verbali;Sanzioni
2014;Legge 285/92;158;Autovettura;01/07/2014;SEBASTOPOLI/(CORSO);238;B;(Altri);0:00;(Mancante);Sosta;1;1
2014;Legge 285/92;142;Autovettura;01/07/2014;UNITA' D'ITALIA/(CORSO);96;AFR.EST;(Altri);0:01;(Mancante);Limiti di Velocita';1;1

Date is in column 5 "Dat Infr" and time in column 9 "Ora Infr". I tried most combinations with/out quotes, comma, semicolon, SHOULD_LINEMERGE true/false, but nothing has worked and the correct timestamp doesn't come up. On the other hand, I can get Splunk to recognize the date field if I leave it alone in TIMESTAMP_FIELDS and TIME_FORMAT. The proposed props.conf is:

[ csv ]
TIME_FORMAT=%d/%m/%Y,%H:%M
TZ=Europe/Rome
MAX_TIMESTAMP_LOOKAHEAD=500
TIMESTAMP_FIELDS=Dat Infr,Ora Infr
SHOULD_LINEMERGE=false
INDEXED_EXTRACTIONS=csv
NO_BINARY_CHECK=true
KV_MODE=none
disabled=false
pulldown_type=true

What am I missing? Even the official docs have a similar unanswered question at http://docs.splunk.com/Documentation/Splunk/6.2.2/Data/ConfigurePositionalTimestampExtraction

Thanks.

1 Solution

felipetesta
Path Finder

Answering my own question after half-a-day of research and trial&error.

I checked strptime() man page and found a "wildcard" field descriptor: %n, "Arbitraty whitespace". So, replacing the "comma" with "%n" into

TIME_FORMAT=%d/%m/%Y%n%H:%M

did the trick. Mind that CSV field names are case sensitive. Double quotes are not necessary if the CSV field name contains whitespace.

View solution in original post

felipetesta
Path Finder

Answering my own question after half-a-day of research and trial&error.

I checked strptime() man page and found a "wildcard" field descriptor: %n, "Arbitraty whitespace". So, replacing the "comma" with "%n" into

TIME_FORMAT=%d/%m/%Y%n%H:%M

did the trick. Mind that CSV field names are case sensitive. Double quotes are not necessary if the CSV field name contains whitespace.

felipetesta
Path Finder

Findings after further investigation.

Browsing further into the preview the timestamp gets recognized and importing the 2014 CSV works, except for first ~100 events. But CSV from year 2013 has the time in hh.mm (dot instead of colon) and I cannot import it.

Curiously, if I preview the extraction of date or time only, the TIME_FORMAT (%d/%m/%Y or %H.%M) is comfirmed to be correct. Concatenation doesn't seem to work (reliably). Swapping date and time neither.

I would prefer not to edit the source CSV with +700k lines.

Splunk version 6.2.2.

0 Karma
Get Updates on the Splunk Community!

What's new in Splunk Cloud Platform 9.1.2312?

Hi Splunky people! We are excited to share the newest updates in Splunk Cloud Platform 9.1.2312! Analysts can ...

What’s New in Splunk Security Essentials 3.8.0?

Splunk Security Essentials (SSE) is an app that can amplify the power of your existing Splunk Cloud Platform, ...

Let’s Get You Certified – Vegas-Style at .conf24

Are you ready to level up your Splunk game? Then, let’s get you certified live at .conf24 – our annual user ...