All Apps and Add-ons

How can I tell splunk to IGNORE time stamps embedded in the event text?

msantich
Path Finder

Splunk (version 4.2.5) interprets timestamps that are embedded in an event as though it is a separate event. The target event is then effectively broken up (with the first "piece" terminating at the last line prior to the embedded time stamp) and another event is indexed for each subsequent embedded time stamp.

The event entry in the native log file looks like this:

2014-11-24 21:13:26.991 EventText EventText EventText EventText EventText EventText EventText
-xml-
-xmlxmlxmlxmlxmlxmxmlxmlxmlxml xmlxmlxmlxmlxmlxmxmlxmlxmlxml xmlxmlxmlxmlxmlxmxmlxmlxmlxml xmlxmlxmlxmlxmlxmxmlxmlxmlxml xmlxmlxmlxmlxmlxmxmlxmlxmlxml xmlxmlxmlxmlxmlxmxmlxmlxmlxml xmlxmlxmlxmlxmlxmxmlxmlxmlxml xmlxmlxmlxmlxmlxmxmlxmlxmlxml
xmlxmlxmlxmlxmlxmxmlxmlxmlxml xmlxmlxmlxmlxmlxmxmlxmlxmlxml xmlxmlxmlxmlxmlxmxmlxmlxmlxml xmlxmlxmlxmlxmlxmxmlxmlxmlxml-
-timeCreated-2014-11-24T21:13:26.914Z-/timeCreated-
-xml_tag-non-time VALUE-/xml_tag-
-xml_tag- non-time VALUE -/xml_tag-
-xml_tag- non-time VALUE -/xml_tag-
-xml_tag- non-time VALUE -/xml_tag-
-xml_tag- non-time VALUE -/xml_tag-
-xml_tag- non-time VALUE -/xml_tag-
-start-2014-11-24T21:21:00Z-/start-
-end-2014-11-24T21:21:00Z-/end-
-xml_tag-non-time VALUE-/xml_tag-
-xml_tag- non-time VALUE -/xml_tag-
-xml_tag- non-time VALUE -/xml_tag-
-xml_tag- non-time VALUE -/xml_tag-
-xml_tag- non-time VALUE -/xml_tag-
-xml_tag- non-time VALUE -/xml_tag-
etc...

Splunk reacts to this in a few different ways,

A: When splunk indexes the “single” event above, it actually creates 3 events.

  1. The first event looks like:

2014-11-24 21:13:26.991 EventText EventText EventText EventText EventText EventText EventText
-xml-
-xmlxmlxmlxmlxmlxmxmlxmlxmlxml xmlxmlxmlxmlxmlxmxmlxmlxmlxml xmlxmlxmlxmlxmlxmxmlxmlxmlxml xmlxmlxmlxmlxmlxmxmlxmlxmlxml xmlxmlxmlxmlxmlxmxmlxmlxmlxml xmlxmlxmlxmlxmlxmxmlxmlxmlxml xmlxmlxmlxmlxmlxmxmlxmlxmlxml xmlxmlxmlxmlxmlxmxmlxmlxmlxml
xmlxmlxmlxmlxmlxmxmlxmlxmlxml xmlxmlxmlxmlxmlxmxmlxmlxmlxml xmlxmlxmlxmlxmlxmxmlxmlxmlxml xmlxmlxmlxmlxmlxmxmlxmlxmlxml-
-timeCreated-2014-11-24T21:13:26.914Z-/timeCreated-
-xml_tag-non-time VALUE-/xml_tag-
-xml_tag- non-time VALUE -/xml_tag-
-xml_tag- non-time VALUE -/xml_tag-
-xml_tag- non-time VALUE -/xml_tag-
-xml_tag- non-time VALUE -/xml_tag-
-xml_tag- non-time VALUE -/xml_tag-

Note that splunk actually does “look past” the second time tag and simply includes it as part of the message (this is the behavior we want for all subsequent embedded time stamps) but this event ends when splunk hits the next embedded time tag.

  1. The second event looks like:

-start-2014-11-24T21:21:00Z-/start-

  1. The third event looks like:

-end-2014-11-24T21:21:00Z-/end-
-xml_tag-non-time VALUE-/xml_tag-
-xml_tag- non-time VALUE -/xml_tag-
-xml_tag- non-time VALUE -/xml_tag-
-xml_tag- non-time VALUE -/xml_tag-
-xml_tag- non-time VALUE -/xml_tag-
-xml_tag- non-time VALUE -/xml_tag-
etc...

B. sometimes Splunk creates just 2 events, then indicates “n lines omitted”

  1. The first event looks like:

2014-11-24 21:13:26.991 EventText EventText EventText EventText EventText EventText EventText
-xml-
-xmlxmlxmlxmlxmlxmxmlxmlxmlxml xmlxmlxmlxmlxmlxmxmlxmlxmlxml xmlxmlxmlxmlxmlxmxmlxmlxmlxml xmlxmlxmlxmlxmlxmxmlxmlxmlxml xmlxmlxmlxmlxmlxmxmlxmlxmlxml xmlxmlxmlxmlxmlxmxmlxmlxmlxml xmlxmlxmlxmlxmlxmxmlxmlxmlxml xmlxmlxmlxmlxmlxmxmlxmlxmlxml
xmlxmlxmlxmlxmlxmxmlxmlxmlxml xmlxmlxmlxmlxmlxmxmlxmlxmlxml xmlxmlxmlxmlxmlxmxmlxmlxmlxml xmlxmlxmlxmlxmlxmxmlxmlxmlxml-
-timeCreated-2014-11-24T21:13:26.914Z-/timeCreated-
-xml_tag-non-time VALUE-/xml_tag-
-xml_tag- non-time VALUE -/xml_tag-
-xml_tag- non-time VALUE -/xml_tag-
-xml_tag- non-time VALUE -/xml_tag-
-xml_tag- non-time VALUE -/xml_tag-
-xml_tag- non-time VALUE -/xml_tag-

  1. The second event looks like:

-start-2014-11-24T21:21:00Z-/start-
300 lines omitted….

Most of the events in the native log are typical events with a single event time stamp and some message text. Some events (like this one), have embedded xml, but no additional time stamps within the xml….these events are all indexed appropriately.

The events like the one here are few, but we need a way to tell splunk to “look beyond” other embedded time stamps when indexing events from this log without interfering with the proper indexing of other “normal events”

Thank you so much for any assistance offered.

MichaelS

0 Karma

somesoni2
Revered Legend

You would need to configure the timestamp recognition and event breaking configuration explicitly in props.conf On Indexers, so that Splunk would not parse the events with its default settings.

Based on the sample entries in the question, this should work

props.conf (on Indexer)

[YourSourceType]
BREAK_ONLY_BEFORE=^\d{4}-\d{1,2}-\d{1,2}
MAX_TIMESTAMP_LOOKAHEAD=30
NO_BINARY_CHECK=1
SHOULD_LINEMERGE=true
TIME_FORMAT=%Y-%m-%d %H:%M:%S.%3q
TIME_PREFIX=^
0 Karma

martin_mueller
SplunkTrust
SplunkTrust

I wouldn't recommend using that BREAK_ONLY_BEFORE expression. That breaks any time it finds something looking a bit like a date at the beginning of a line - that's too weak, it'd be better to remove that setting and let Splunk break on timestamps. That takes the entire TIME_FORMAT into consideration without having to specify the format twice.

0 Karma

msantich
Path Finder

Thank you so much somesoni2 and mmueller.
so to be clear the props.conf entry should look like this with the recommendation from somesoni1 as amended by recommendation from mmueller?

[YourSourceType]
MAX_TIMESTAMP_LOOKAHEAD=30
NO_BINARY_CHECK=1
SHOULD_LINEMERGE=true
TIME_FORMAT=%Y-%m-%d %H:%M:%S.%3q
TIME_PREFIX=^

0 Karma

msantich
Path Finder

Thanks all. Almost there but can you help me with one more iteration?
I've added the following to my props.conf per your recommendations and I'm now properly indexing the target event will all included lines. BUT, when the log contains other event entries with the EXACT same timestamp, splunk is not indexing them (they do show up if I select "show source" - but the events are not indexed:

[YourSourceType]
MAX_TIMESTAMP_LOOKAHEAD=30
NO_BINARY_CHECK=1
SHOULD_LINEMERGE=true
TIME_FORMAT=%Y-%m-%d %H:%M:%S.%3q
TIME_PREFIX=^

so in the raw log I have:
2014-12-10 03:31:14.843 TRACE [Thread-7] ...EventTxt
2014-12-10 03:31:14.844 TRACE [Thread-7] ...EventText
2014-12-10 03:31:14.844 TRACE [Thread-7] ...EventText- - LOONG XML with embedded time stamps -
2014-12-10 03:31:14.844 DEBUG [Thread-7] ...EventText
2014-12-10 03:31:14.844 TRACE [pool-2-thread-2] ...EventText
2014-12-10 03:31:14.844 TRACE [pool-2-thread-2] ...EventText
2014-12-10 03:31:14.846 TRACE [pool-2-thread-2] ...EventText

but splunk indexes ONLY
2014-12-10 03:31:14.843 TRACE [Thread-7] ...EventTxt
2014-12-10 03:31:14.844 TRACE [Thread-7] ...EventText- - LOONG XML with embedded time stamps -
2014-12-10 03:31:14.846 TRACE [pool-2-thread-2] ...EventText

Thanks so much for you continued help

0 Karma

somesoni2
Revered Legend

That looks correct.

0 Karma
Get Updates on the Splunk Community!

Introducing the 2024 SplunkTrust!

Hello, Splunk Community! We are beyond thrilled to announce our newest group of SplunkTrust members!  The ...

Introducing the 2024 Splunk MVPs!

We are excited to announce the 2024 cohort of the Splunk MVP program. Splunk MVPs are passionate members of ...

Splunk Custom Visualizations App End of Life

The Splunk Custom Visualizations apps End of Life for SimpleXML will reach end of support on Dec 21, 2024, ...