I have a log file that is tab delimited. It has a field called "date" and a field called "time" next to each other.
This is the format of the fields:
2013-10-24 11:40:23
The issue I'm having is that sometimes another field such as "8.1.5"(just some random value) that does not match the TIME_FORMAT shows up in the user agent string and somehow takes control over the date and time fields, thus producing wrong date/time on those records.(in this case August 1st 2005)
I have tried TIME_PREFIX=[\t]
and leaving off the TIME_PREFIX, no change.
I have tried:
TIME_FORMAT=%Y-%m-%d%n%H:%M:%S
TIME_FORMAT=%Y-%m-%d%t%H:%M:%S
TIME_FORMAT=%Y-%m-%d<tab>%H:%M:%S
Here is my props.conf:
SHOULD_LINEMERGE=false
TZ=GMT
pulldown_type=true
NO_BINARY_CHECK=1
TIME_PREFIX=[\t]
MAX_TIMESTAMP_LOOKAHEAD=500
TIME_FORMAT=%Y-%m-%d%n%H:%M:%S
REPORT-isawebw3c=isawebw3c
transforms.conf:
[isawebw3c]
DELIMS = "\t"
FIELDS="c-ip","cs-username","c-agent","sc-authenticated","date","time","s-svcname","s-computername","cs-referred","r-host","r-ip","r-port","time-taken","cs-bytes","sc-bytes","cs-protocol","cs-transport","s-operation","cs-uri","cs-mime-type","s-object-source","sc-status","s-cache-info","rule","FilterInfo","cs-Network","sc-Network","error-info","action","GMT-Time","AuthenticationServer"
Have you considered using REGEX in your transforms.conf? Something like:
REGEX=(?<c-ip>[^\t]?)\t(?<cs-username>[^\t]?)\t...\t(?<date>[^\t]>)...
Have you considered using REGEX in your transforms.conf? Something like:
REGEX=(?<c-ip>[^\t]?)\t(?<cs-username>[^\t]?)\t...\t(?<date>[^\t]>)...
I just came to that answer as well 🙂 thanks for your help
TIME_PREFIX=^[^\t]+\t[^\t]+\t[^\t]+\t[^\t]+\t
It worked perfectly
What you describe should be covered by [^\t]+\t[^\t]+\t[^\t]+\t[^\t]+\t
. Can you share some sample data so I can experiment with regex strings?
This is very close
^\S+\t\S+\t\S+\t\S+\t
However, there are spaces within the user agent strings that is the issue. Really it needs to be ^ non-tab, tab, non-tab, tab, non-tab, tab, non-tab, tab
Have you tried [\S\t\S\t\S\t\S\t]? I think \w is failing because of non-word characters (like '.') in one or more of your fields.
Unfortunately this did not end up working. It still picked up the .. 10.8.5 within the c-agent field which is a mac computer operating system. It seemed to work on import as it was only highlighting the date fields once I put in that regex.
[^\w*\t\w*\t\w*\t\w*\t]
unless anyone can tell me different seems to have done the trick. Thanks for your help in getting me to the right answer!
I think I get it now. What you need is a MIN_TIMESTAMP_LOOKAHEAD-type of feature where Splunk will look for times x characters into the event. Until we have one of those, can you make your TIME_PREFIX string more specific? Maybe "[^\t]\t[^\t]\t[^\t]\t[^\t]\t"?
Oh, the date and time fields parse just fine into the "date" and "time" fields.. the issue is that other fields show up in the _time field.. if they appear before the date/time fields. c-agent sometimes gets random IP's or random dates or random version numbers in it from a user's user agent string. Instead of the Timestamp using the date and time fields with the format I gave it, it grabs those random numbers and converts them into a date.
I suspect the TIME_FORMAT config doesn't understand you have a tab character within your time string. You also said that other fields sometimes appears in your date and time fields so maybe your transform isn't always working. Maybe a REGEX will work better or maybe it's Monday and my brain isn't at full steam yet. 🙂
ok so how would that get me the right timestamp? My transform is working just fine.