Getting Data In

Transforms before or after timestamp detection in props.conf

phoenixdigital
Builder

Hi All,

I am getting some annoying messages in splunkd.log:

03-20-2014 15:47:27.631 +1000 WARN  DateParserVerbose - Failed to parse timestamp. Defaulting to timestamp of previous event (Thu Mar 20 16:45:00 2014). Context: source::/opt/mydata/PUBLIC_P5MIN_201403201550_20140320154535.CSV|host::amo-web|p5_reports|38558

Now I know what this error means, but it doesn't really fit with my data. I suspect I know why it's occurring; I just want to stop it.

So I have my CSV data file, which is in the following format:

I,P5MIN,LOCAL_PRICE,1,RUN_DATETIME,DUID,INTERVAL_DATETIME,LOCAL_PRICE_ADJUSTMENT,LOCALLY_CONSTRAINED,LASTCHANGED
D,P5MIN,LOCAL_PRICE,1,"2014/03/19 12:00:00",DATA1,"2014/03/19 12:00:00",0,0,"2014/03/19 11:55:29"
D,P5MIN,LOCAL_PRICE,1,"2014/03/19 12:00:00",DATA1,"2014/03/19 12:05:00",0,0,"2014/03/19 11:55:29"
D,P5MIN,LOCAL_PRICE,1,"2014/03/19 12:00:00",DATA1,"2014/03/19 12:10:00",0,0,"2014/03/19 11:55:29"
D,P5MIN,LOCAL_PRICE,1,"2014/03/19 12:00:00",DATA1,"2014/03/19 12:15:00",0,0,"2014/03/19 11:55:29"
D,P5MIN,LOCAL_PRICE,1,"2014/03/19 12:00:00",DATA1,"2014/03/19 12:20:00",0,0,"2014/03/19 11:55:29"
D,P5MIN,LOCAL_PRICE,1,"2014/03/19 12:00:00",DATA1,"2014/03/19 12:25:00",0,0,"2014/03/19 11:55:29"
I,P5MIN,REGIONSOLUTION,4,RUN_DATETIME,INTERVAL_DATETIME,REGIONID,RRP
D,P5MIN,REGIONSOLUTION,4,"2014/03/19 12:00:00","2014/03/19 12:00:00",STATE1,54.07
D,P5MIN,REGIONSOLUTION,4,"2014/03/19 12:00:00","2014/03/19 12:05:00",STATE1,53.8101
D,P5MIN,REGIONSOLUTION,4,"2014/03/19 12:00:00","2014/03/19 12:10:00",STATE1,53.8101
D,P5MIN,REGIONSOLUTION,4,"2014/03/19 12:00:00","2014/03/19 12:15:00",STATE1,53.8101

Now as you can see, there are two sets of data in this file. I am only interested in getting the second set of data into Splunk.

This is achieved with the following props.conf:

[p5_reports]
KV_MODE = none
SHOULD_LINEMERGE = false
TRANSFORMS-filterprices = setnull,getFiveMinutePrices
REPORT-extracts = fiveMinuteCsvExtract
TIME_PREFIX = D,P5MIN,REGIONSOLUTION,[^,]*,[^,]*
TIME_FORMAT = %Y/%m/%d %H:%M:%S

and the associated transforms.conf:

[setnull]
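# applied first (transforms in TRANSFORMS-filterprices run in the order
# listed): send every event to the nullQueue by default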
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue

[getFiveMinutePrices]
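# applied second: route the D,P5MIN,REGIONSOLUTION rows back to the indexQueue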
REGEX = ^D,P5MIN,REGIONSOLUTION,(.*)
DEST_KEY = queue
FORMAT = indexQueue

[fiveMinuteCsvExtract]
DELIMS = ","
FIELDS = "I","P5MIN","REGIONSOLUTION","4","RUN_DATETIME","INTERVAL_DATETIME","REGIONID","RRP"

Now this all works fine: my data comes in, and _time is associated with the second time field, INTERVAL_DATETIME.

But my log files are FULL of these:

03-20-2014 15:47:27.631 +1000 WARN  DateParserVerbose - Failed to parse timestamp. Defaulting to timestamp of previous event (Thu Mar 20 16:45:00 2014). Context: source::/opt/mydata/PUBLIC_P5MIN_201403201550_20140320154535.CSV|host::amo-web|p5_reports|38558

So is timestamp extraction running first and generating these errors BEFORE my transforms have filtered out only the stuff I want?

i.e. at what point is the timestamp looked for?

  • after transforms?
  • before transforms?

And for bonus points: will it be near impossible to extract both of these types of data into separate sourcetypes, since the _times I want will be in different places?

1 Solution

kristian_kolb
Ultra Champion

Timestamps are extracted before transforms (timestamp extraction happens in the merging pipeline; index-time TRANSFORMS run later, in the typing pipeline).

Maybe you can craft a more complex regex for TIME_PREFIX?

TIME_PREFIX= ^([^,]*,){5}(\w+,)?\"

This, in theory (I have not tested it), should make the 6th field optional. In the regex above, \w+ is used for matching that field; adjust as needed.
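Applied to your stanza, that would look something like this (an untested sketch):

[p5_reports]
# skip the first five comma-separated fields, plus the optional sixth field
# (the DUID value, present only in LOCAL_PRICE rows), so the timestamp match
# starts at the quoted INTERVAL_DATETIME in both record types
TIME_PREFIX = ^([^,]*,){5}(\w+,)?\"
TIME_FORMAT = %Y/%m/%d %H:%M:%S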

Hope this helps,

K

phoenixdigital
Builder

Actually, it might be possible with this in transforms.conf:

[<unique_stanza_name>]
REGEX = <your_regex>
FORMAT = sourcetype::<your_custom_sourcetype_value>
DEST_KEY = MetaData:Sourcetype

http://docs.splunk.com/Documentation/Splunk/6.0.2/Data/Advancedsourcetypeoverrides
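
For this data that might look something like the following (an untested sketch; the stanza and sourcetype names are made up):

props.conf:

[p5_reports]
TRANSFORMS-sourcetypes = set_st_local_price,set_st_regionsolution

transforms.conf:

[set_st_local_price]
REGEX = ^D,P5MIN,LOCAL_PRICE,
FORMAT = sourcetype::p5_local_price
DEST_KEY = MetaData:Sourcetype

[set_st_regionsolution]
REGEX = ^D,P5MIN,REGIONSOLUTION,
FORMAT = sourcetype::p5_regionsolution
DEST_KEY = MetaData:Sourcetype

Timestamp extraction would still run before these transforms, against the original p5_reports props, so the combined TIME_PREFIX from the answer above would still be needed.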

But I am straying far from my original question now 🙂

phoenixdigital
Builder

This is exactly what I came here to post, but you beat me to it.

The data file I provided above is only an example. There are actually 6 header rows, and I am getting 6 errors.

So yes, there is really nothing I can do about these errors; I just need to live with them.

Thanks for the regexp tip though.

I don't think I will be able to extract both sets of data into different sourcetypes unless transforms.conf allows me to override the sourcetype for a particular event when it matches a particular regexp.

kristian_kolb
Ultra Champion

Additionally, if the header rows are part of the file, they will also generate these errors (since they do not contain any timestamp). Perhaps you should change your nullQueueing a bit to drop them too.

props.conf:

[p5_reports]
TRANSFORMS-filterprices = setnull,getFiveMinutePrices,drop5mHeader

transforms.conf:

[setnull]
REGEX = .
DEST_KEY = queue
FORMAT = nullQueue

[getFiveMinutePrices]
REGEX = REGIONSOLUTION
DEST_KEY = queue
FORMAT = indexQueue

[drop5mHeader]
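# the REGIONSOLUTION header row matches getFiveMinutePrices too, but only the
# header contains the literal column name REGIONID, so this stanza, applied
# last, nulls it back out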
REGEX = REGIONID
DEST_KEY = queue
FORMAT = nullQueue

SEDCMD in props is an alternative for removing headers.
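
A minimal sketch of that alternative (untested; the SEDCMD class name is just a label):

[p5_reports]
# blank out header rows (lines beginning with "I,") before indexing
SEDCMD-dropheaders = s/^I,.*//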

K
