Getting Data In

Tab Delimited Date Time Not Getting Correct Date/Time

aelliott
Motivator

I have a log file that is tab delimited. It has a field called "date" and a field called "time" next to each other.
This is the format of the fields:

2013-10-24  11:40:23

The issue I'm having is that sometimes another field such as "8.1.5"(just some random value) that does not match the TIME_FORMAT shows up in the user agent string and somehow takes control over the date and time fields, thus producing wrong date/time on those records.(in this case August 1st 2005)

I have tried TIME_PREFIX=[\t] and leaving off the TIME_PREFIX, no change.

I have tried:

TIME_FORMAT=%Y-%m-%d%n%H:%M:%S
TIME_FORMAT=%Y-%m-%d%t%H:%M:%S
TIME_FORMAT=%Y-%m-%d<tab>%H:%M:%S

Here is my props.conf:

SHOULD_LINEMERGE=false
TZ=GMT
pulldown_type=true
NO_BINARY_CHECK=1
TIME_PREFIX=[\t]
MAX_TIMESTAMP_LOOKAHEAD=500
TIME_FORMAT=%Y-%m-%d%n%H:%M:%S
REPORT-isawebw3c=isawebw3c

transforms.conf:

[isawebw3c]
DELIMS = "\t"
FIELDS="c-ip","cs-username","c-agent","sc-authenticated","date","time","s-svcname","s-computername","cs-referred","r-host","r-ip","r-port","time-taken","cs-bytes","sc-bytes","cs-protocol","cs-transport","s-operation","cs-uri","cs-mime-type","s-object-source","sc-status","s-cache-info","rule","FilterInfo","cs-Network","sc-Network","error-info","action","GMT-Time","AuthenticationServer"
Tags (2)
0 Karma
1 Solution

richgalloway
SplunkTrust
SplunkTrust

Have you considered using REGEX in your transforms.conf? Something like:

REGEX=(?<c-ip>[^\t]?)\t(?<cs-username>[^\t]?)\t...\t(?<date>[^\t]>)...
---
If this reply helps you, Karma would be appreciated.

View solution in original post

richgalloway
SplunkTrust
SplunkTrust

Have you considered using REGEX in your transforms.conf? Something like:

REGEX=(?<c-ip>[^\t]?)\t(?<cs-username>[^\t]?)\t...\t(?<date>[^\t]>)...
---
If this reply helps you, Karma would be appreciated.

aelliott
Motivator

I just came to that answer as well 🙂 thanks for your help
TIME_PREFIX=^[^\t]+\t[^\t]+\t[^\t]+\t[^\t]+\t

It worked perfectly

0 Karma

richgalloway
SplunkTrust
SplunkTrust

What you describe should be covered by [^\t]+\t[^\t]+\t[^\t]+\t[^\t]+\t. Can you share some sample data so I can experiment with regex strings?

---
If this reply helps you, Karma would be appreciated.

aelliott
Motivator

This is very close
^\S+\t\S+\t\S+\t\S+\t
However, there are spaces within the user agent strings that is the issue. Really it needs to be ^ non-tab, tab, non-tab, tab, non-tab, tab, non-tab, tab

0 Karma

richgalloway
SplunkTrust
SplunkTrust

Have you tried [\S\t\S\t\S\t\S\t]? I think \w is failing because of non-word characters (like '.') in one or more of your fields.

---
If this reply helps you, Karma would be appreciated.

aelliott
Motivator

Unfortunately this did not end up working. It still picked up the .. 10.8.5 within the c-agent field which is a mac computer operating system. It seemed to work on import as it was only highlighting the date fields once I put in that regex.

0 Karma

aelliott
Motivator

[^\w*\t\w*\t\w*\t\w*\t]
unless anyone can tell me different seems to have done the trick. Thanks for your help in getting me to the right answer!

0 Karma

richgalloway
SplunkTrust
SplunkTrust

I think I get it now. What you need is a MIN_TIMESTAMP_LOOKAHEAD-type of feature where Splunk will look for times x characters into the event. Until we have one of those, can you make your TIME_PREFIX string more specific? Maybe "[^\t]\t[^\t]\t[^\t]\t[^\t]\t"?

---
If this reply helps you, Karma would be appreciated.
0 Karma

aelliott
Motivator

Oh, the date and time fields parse just fine into the "date" and "time" fields.. the issue is that other fields show up in the _time field.. if they appear before the date/time fields. c-agent sometimes gets random IP's or random dates or random version numbers in it from a user's user agent string. Instead of the Timestamp using the date and time fields with the format I gave it, it grabs those random numbers and converts them into a date.

0 Karma

richgalloway
SplunkTrust
SplunkTrust

I suspect the TIME_FORMAT config doesn't understand you have a tab character within your time string. You also said that other fields sometimes appears in your date and time fields so maybe your transform isn't always working. Maybe a REGEX will work better or maybe it's Monday and my brain isn't at full steam yet. 🙂

---
If this reply helps you, Karma would be appreciated.
0 Karma

aelliott
Motivator

ok so how would that get me the right timestamp? My transform is working just fine.

0 Karma
Get Updates on the Splunk Community!

Detecting Remote Code Executions With the Splunk Threat Research Team

REGISTER NOWRemote code execution (RCE) vulnerabilities pose a significant risk to organizations. If ...

Observability | Use Synthetic Monitoring for Website Metadata Verification

If you are on Splunk Observability Cloud, you may already have Synthetic Monitoringin your observability ...

More Ways To Control Your Costs With Archived Metrics | Register for Tech Talk

Tuesday, May 14, 2024  |  11AM PT / 2PM ET Register to Attend Join us for this Tech Talk and learn how to ...