Getting Data In

Two timestamps, trying to index based on the second with TIME_PREFIX

supersleepwalke
Communicator

I have logs with two timestamps, one in UTC, one in local. I'm trying to index based on the second, because the first is buggy. I'm trying to use TIME_PREFIX to do this, but I can't get it to work for me.

I've tested the TIME_PREFIX string several ways to ensure that it matches. I've used command line grep, regex in the splunk search, and rex in the splunk search, and they all find the string. Additionally, if I manually cut the first time stamp out, Splunk recognizes and translates the other timestamp correctly. However, using TIME_PREFIX, the time always gets set to the time of ingest.

Example log line:
(Notice that what should be UTC time is in error. Wrong time zone and seconds field incorrect.)

2012-03-15 10:05:130 "15/Mar/2012:10:05:05 -0400" 192.XXX.XXX.XXX GET http://www.google.com/ 74.125.91.103 80 TCP_MISS 200 - - 34337 text/html 34176 DIRECT - - 50

indexes.conf

[proxy]
coldPath = $SPLUNK_DB/proxy/colddb
homePath = $SPLUNK_DB/proxy/db
thawedPath = $SPLUNK_DB/proxy/thaweddb

inputs.conf:

[batch:///var/log/proxy]
disabled = false
move_policy = sinkhole
host = ironport
index = proxy
sourcetype = ironport

props.conf

[ironport]
SHOULD_LINEMERGE = False
SEDCMD-10 = s/^\#.*//
TIME_PREFIX="^20\d{2}\-\d{2}\-\d{2} \d{2}\:\d{2}\:\d+"
1 Solution

supersleepwalke
Communicator

Solved.

The problem is my quotes in the TIME_PREFIX. Here's the correct string:

TIME_PREFIX=^20\d{2}-\d{2}-\d{2} \d{2}:\d{2}:\d+

The quotes are taken literally on that string. There are no quotes in what I'm trying to ignore. I just quoted the regex without thinking about it, since that's what I'm used to in the splunk web interface.
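
To see why the quoted value never matched, here is a minimal sketch that runs both forms of the pattern against the sample line. It uses Python's re module purely as a stand-in for Splunk's regex engine (an assumption; the two are not identical, but they treat a literal quote and the ^ anchor the same way), and the variable names are illustrative only.

```python
import re

# Sample line from the question (the first timestamp is the buggy one).
LINE = ('2012-03-15 10:05:130 "15/Mar/2012:10:05:05 -0400" 192.XXX.XXX.XXX '
        'GET http://www.google.com/ 74.125.91.103 80 TCP_MISS 200 - - 34337 '
        'text/html 34176 DIRECT - - 50')

# The value as first written in props.conf: the quotes become part of the regex.
QUOTED = r'"^20\d{2}\-\d{2}\-\d{2} \d{2}\:\d{2}\:\d+"'
# The corrected value: same pattern, no surrounding quotes.
UNQUOTED = r'^20\d{2}-\d{2}-\d{2} \d{2}:\d{2}:\d+'

quoted_match = re.search(QUOTED, LINE)
unquoted_match = re.search(UNQUOTED, LINE)

# A literal " can never sit in front of the start-of-line anchor, so no match.
print(quoted_match)            # None
# Without the quotes, the whole first timestamp is consumed as the prefix.
print(unquoted_match.group())  # 2012-03-15 10:05:130
# What follows the prefix is where timestamp extraction would start looking.
print(repr(LINE[unquoted_match.end():unquoted_match.end() + 28]))
```

With no prefix match there is nothing to anchor the timestamp search, which is consistent with the events falling back to ingest time.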

0 Karma

cvajs
Contributor

Wow. If you post the data, please post the real data (sanitizing is OK), but don't add characters.
The regex you have is way too long; shorten the prefix to
\d{3}\s+
Simpler is faster.

0 Karma

supersleepwalke
Communicator

Solved.

The problem is my quotes in the TIME_PREFIX. Here's the correct string:

TIME_PREFIX=^20\d{2}-\d{2}-\d{2} \d{2}:\d{2}:\d+

The quotes are taken literally on that string. There are no quotes in what I'm trying to ignore. I just quoted the regex without thinking about it, since that's what I'm used to in the splunk web interface.

FYI, to answer your other questions:

The "130" is a bug, verified with the vendor.

The second time stamp is accurate, in local time, with TZ indicated. The first time stamp is incorrect, which is why I'm trying to skip it.

0 Karma

jbsplunk
Splunk Employee

I don't think that you need to include the space and double quote. TIME_PREFIX tells Splunk to look after the pattern for a time stamp.

I think the problem here is probably twofold.

  1. You should match the space between the date and time with \s+; I don't know how Splunk behaves when there is a literal space in a regex within props.conf.

    TIME_PREFIX = 20\d{2}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{3}

  2. TIME_PREFIX should be used with TIME_FORMAT, which doesn't appear to have been specified. I would suggest setting TIME_FORMAT as well (the space in the value below is fine, in case you're wondering).

    TIME_FORMAT = %d/%b/%Y:%H:%M:%S %z

Additionally, I would recommend using MAX_TIMESTAMP_LOOKAHEAD to prevent Splunk from looking beyond the timestamp boundaries.

MAX_TIMESTAMP_LOOKAHEAD = 27

So, to put it together:

[ironport]
SHOULD_LINEMERGE = False
SEDCMD-10 = s/^\#.*//
TIME_PREFIX = 20\d{2}\-\d{2}\-\d{2}\s+\d{2}\:\d{2}\:\d{3}
TIME_FORMAT = %d/%b/%Y:%H:%M:%S %z
MAX_TIMESTAMP_LOOKAHEAD = 27
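
As a quick sanity check of the TIME_FORMAT string above, the same strptime-style directives can be tried against the second timestamp in Python. This is only a sketch: Python's datetime.strptime stands in for Splunk's own parser (an assumption), and %b assumes an English or C locale.

```python
from datetime import datetime, timezone

# Second timestamp from the sample line, without the surrounding quotes.
STAMP = "15/Mar/2012:10:05:05 -0400"
# Same directives as the TIME_FORMAT value suggested above.
TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"

parsed = datetime.strptime(STAMP, TIME_FORMAT)

print(parsed.isoformat())                           # 2012-03-15T10:05:05-04:00
print(parsed.astimezone(timezone.utc).isoformat())  # 2012-03-15T14:05:05+00:00
print(len(STAMP))                                   # 26
```

The timestamp itself is 26 characters long, so a lookahead window has to cover at least that much, plus any characters that sit between the end of the prefix match and the start of the timestamp.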
0 Karma

jbsplunk
Splunk Employee

Time format does improve performance, so even if you don't need it, it won't hurt to have it in place. Glad to hear you got things working 🙂

0 Karma

supersleepwalke
Communicator

Once I get Splunk to ignore the bad timestamp at the beginning, it recognizes the second timestamp out-of-the-box. No TIME_FORMAT configuration is necessary.

0 Karma

kristian_kolb
Ultra Champion

I think you might have to include the two extra characters preceding the second timestamp: the space and the double quote. And remove the leading double quote in the regex.

TIME_PREFIX= ^20\d{2}\-\d{2}\-\d{2} \d{2}\:\d{2}\:\d+\s"

Hope this helps,

Kristian

0 Karma

cvajs
Contributor

I just used " and it finds the whole second timestamp.
There's only one " before the timestamp, so I don't see any need to define the whole prefix, and it's probably faster just using ".

0 Karma

cvajs
Contributor

then props.conf shows (in GUI, etc)

# your settings
MAX_TIMESTAMP_LOOKAHEAD=50
NO_BINARY_CHECK=1
SHOULD_LINEMERGE=false
TIME_PREFIX="
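
The prefixes suggested in this thread differ mainly in where they leave the parser standing. The sketch below compares three of them on a shortened copy of the sample line, again using Python's re as a stand-in for Splunk's regex engine (an assumption). The helper text_after_prefix and the dictionary labels are hypothetical names made up for this illustration.

```python
import re

# Shortened copy of the sample line from the question.
LINE = ('2012-03-15 10:05:130 "15/Mar/2012:10:05:05 -0400" '
        '192.XXX.XXX.XXX GET http://www.google.com/')

# Candidate TIME_PREFIX patterns taken from the thread.
CANDIDATES = {
    "full first timestamp": r'^20\d{2}-\d{2}-\d{2} \d{2}:\d{2}:\d+',
    "first timestamp plus space and quote": r'^20\d{2}\-\d{2}\-\d{2} \d{2}\:\d{2}\:\d+\s"',
    "double quote only": r'"',
}

def text_after_prefix(pattern, line, width=26):
    """Return the `width` characters following the first match, or None."""
    match = re.search(pattern, line)
    if match is None:
        return None
    return line[match.end():match.end() + width]

for label, pattern in CANDIDATES.items():
    print(f"{label}: {text_after_prefix(pattern, LINE)!r}")
```

The last two candidates stop immediately before 15/Mar/2012, while the first stops two characters earlier, before the space and the opening quote. The reports above suggest both placements work in practice, so the choice mostly comes down to how strictly the prefix should be tied to this log layout.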

0 Karma

cvajs
Contributor
  1. Your raw data has the wrong $TZ, so Splunk doesn't know any different unless you tell it that this source has a specific $TZ.
  2. I used your raw data above: I created a new source, previewed it, and adjusted the timestamp settings. With a double quote (") as the prefix, it highlights the second timestamp, including the -0400, and then you can choose UTC for the TZ, etc. The "130" in your first timestamp looks like milliseconds, not seconds. Is this format an option on your IronPort web appliance? I don't get that format from my ESAs using syslog.
0 Karma