I'm trying to make a datatype for a specific kind of CSV data seen by Splunk. Here's an example of the individual data that Splunk sees, stored as CSV for importing:
# address,alternativeid,alternativeid_restriction,asn,asn_desc,assessment,cc,confidence,description,detecttime,guid,id,portlist,prefix,protocol,purpose,rdata,relatedid,relatedid_restriction,reporttime,restriction,rir,severity
1.2.3.4,some-url-here,public,1234,Description of 1234,scanner,CN,85,ssh,2014-04-23T04:27:21Z,everyone,1acb6224-dde9-4465-a34c-32283a130c00,22,1.2.3.0/18,6,mitigation,,,,2014-04-23T02:31:15Z,need-to-know,APNIC,medium
There are two timestamps here. When I use this regex, Splunk finds the second timestamp: \d\d\d\d-\d\d-\d\d[A-Z]\d\d:\d\d:\d\dZ
What I need it to do is read the first timestamp, which is the detection time for that specific data rather than the time it was reported. Can anyone help me figure out how to make Splunk detect the first timestamp, and only that first timestamp? Note that the field that says "ssh" will not always be "ssh", so you can't use it as part of the detection.
props.conf
TIME_PREFIX = ^(.*?,){9}
TIME_FORMAT = %Y-%m-%dT%H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 20
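For anyone who wants to sanity-check these settings outside Splunk, here is a rough Python sketch of what they are meant to do: skip nine comma-terminated fields, then read the timestamp from the next 20 characters. This is only an approximation. Python's `re` and `datetime.strptime` stand in for Splunk's own regex and timestamp handling, the function name `extract_event_time` is made up for illustration, and trimming the trailing "Z" before parsing is my assumption about how the format is applied.

```python
import re
from datetime import datetime

# Values copied from the props.conf settings above.
TIME_PREFIX = re.compile(r"^(.*?,){9}")
TIME_FORMAT = "%Y-%m-%dT%H:%M:%S"
MAX_TIMESTAMP_LOOKAHEAD = 20

# Sample event from the question.
SAMPLE = (
    "1.2.3.4,some-url-here,public,1234,Description of 1234,scanner,CN,85,ssh,"
    "2014-04-23T04:27:21Z,everyone,1acb6224-dde9-4465-a34c-32283a130c00,22,"
    "1.2.3.0/18,6,mitigation,,,,2014-04-23T02:31:15Z,need-to-know,APNIC,medium"
)


def extract_event_time(line):
    """Hypothetical helper: emulate TIME_PREFIX plus a bounded lookahead."""
    match = TIME_PREFIX.match(line)
    if match is None:
        return None
    # Only look at the characters immediately after the prefix.
    window = line[match.end():match.end() + MAX_TIMESTAMP_LOOKAHEAD]
    # Assumption: the format covers 19 characters, so drop the trailing "Z".
    return datetime.strptime(window[:19], TIME_FORMAT)


print(extract_event_time(SAMPLE))
```

The prefix ends right after the ninth comma, so the window holds the detecttime value and the later reporttime is never examined.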
Works perfectly, thanks again!
Also, check out http://docs.splunk.com/Documentation/Splunk/6.0.3/Admin/Propsconf, specifically the section entitled "Structured Data Header Extraction and configuration". There are many new options for handling CSV data, such as throwing out the header or managing the column names.
Updated the regex above to match the requirement. I didn't know that you had examples where nothing would be between the commas.
That doesn't get caught because the regex requires at least one character between the commas. Replace the plus with an asterisk so that empty fields are allowed.
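To illustrate the difference, here is a small Python sketch that compares the two quantifiers on the line with empty fields. The plus variant `^(.+?,){9}` is my guess at the earlier pattern, since the pre-edit regex isn't shown in this thread, and Python's `re` is only a stand-in for Splunk's regex engine. The helper name `text_after_prefix` is made up for illustration.

```python
import re

# The event with empty fields that was reported as not caught.
LINE = (
    "91.121.201.180,not-a-url//blah.foo.bar/lists/date_all.txt,public,,,"
    "scanner,,85,ssh,2014-04-22T11:46:41Z,everyone,"
    "44e05139-a0d2-492a-b9bf-674daf81194e,22,,6,mitigation,,,,"
    "2014-04-23T02:31:30Z,need-to-know,,medium"
)


def text_after_prefix(pattern, line):
    """Hypothetical helper: return whatever follows the matched prefix."""
    match = re.match(pattern, line)
    return None if match is None else line[match.end():]


# Assumed earlier pattern: needs at least one character before each comma,
# so an empty field makes it swallow a neighbouring comma and overshoot.
with_plus = text_after_prefix(r"^(.+?,){9}", LINE)

# Pattern from the answer: empty fields count as fields.
with_star = text_after_prefix(r"^(.*?,){9}", LINE)

print("plus:", with_plus[:20])
print("star:", with_star[:20])
```

With the asterisk, the text after the prefix starts at the detecttime value. With the plus, the prefix runs past it, so the characters handed to the timestamp parser are not a timestamp at all.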
I could, if the system would accept the bloody captcha entry... >.<
91.121.201.180,not-a-url//blah.foo.bar/lists/date_all.txt,public,,,scanner,,85,ssh,2014-04-22T11:46:41Z,everyone,44e05139-a0d2-492a-b9bf-674daf81194e,22,,6,mitigation,,,,2014-04-23T02:31:30Z,need-to-know,,medium
is an example of something that's not caught.
The regex searches past the 9th comma and does not care about the word SSH or anything specific. Can you update the sample log with a few lines where the answer does not work for you?
There's an anomaly in some of the data, and I've added that to the question for additional help if you can give it. Otherwise, this works perfectly.