Getting Data In

Index entire file content as one event

sdkp03
Communicator

I am trying to index a file, and I don't see why the events are being broken. I have tried defining the line-breaker setting at both the indexer and forwarder level, as suggested in multiple articles, but with no luck. Is there a way to identify what is breaking the events? Or is there any configuration that would override all other settings and ensure that events are not broken? Any help would be much appreciated.

The log file being indexed has content like the following:
2020-03-10T11:20:27.456+1100: 687196.162: [Event1, 0.0207885 secs]
[Parallel Time: 19.8 ms, Workers: 4]
[Worker Start (ms): Min: 687196162.2, Avg: 687196162.3, Max: 687196162.3, Diff: 0.1]
[Ext Scanning (ms): Min: 0.9, Avg: 1.0, Max: 1.0, Diff: 0.1, Sum: 3.9]
[Update RS (ms): Min: 2.4, Avg: 2.4, Max: 2.6, Diff: 0.2, Sum: 9.7]
[Processed Buffers: Min: 3, Avg: 10.5, Max: 21, Diff: 18, Sum: 42]
[Scan RS (ms): Min: 6.8, Avg: 6.9, Max: 6.9, Diff: 0.1, Sum: 27.6]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Object Copy (ms): Min: 9.4, Avg: 9.4, Max: 9.5, Diff: 0.1, Sum: 37.7]
[Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Termination Attempts: Min: 1, Avg: 3.2, Max: 6, Diff: 5, Sum: 13]
[Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
[Worker Total (ms): Min: 19.7, Avg: 19.7, Max: 19.8, Diff: 0.1, Sum: 78.9]
[Worker End (ms): Min: 687196182.0, Avg: 687196182.0, Max: 687196182.0, Diff: 0.0]
[Code Root Fixup: 0.0 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 0.1 ms]
[Other: 0.8 ms]
[Choose CSet: 0.0 ms]
[Ref Proc: 0.2 ms]
[Ref Enq: 0.0 ms]
[Redirty Cards: 0.1 ms]
[Humongous Register: 0.1 ms]
[Humongous Reclaim: 0.0 ms]
[Free CSet: 0.2 ms]
[Eden: 44.0M(44.0M)->0.0B(44.0M) Survivors: 7168.0K->7168.0K Heap: 306.0M(1024.0M)->270.0M(1024.0M)]
[Times: user=0.08 sys=0.00, real=0.02 secs]

2020-03-10T11:20:38.710+1100: 687207.416: [Event2, 0.0204509 secs]

In the Splunk log file there are some warning messages like:
Failed to parse timestamp in first MAX_TIMESTAMP_LOOKAHEAD (128) characters of event. Defaulting to timestamp of previous event (Wed Mar 11 13:42:59 2020).

Expectation: Splunk should treat the first field in the format "2020-03-10T11:20:38.710+1100: 687207.416: " as the date/timestamp and should not try to interpret other numbers as a date/time.

As for the setup, we have a UF sending logs to an indexer.


FrankVl
Ultra Champion

Is this bit in the same file: 2020-03-10T11:20:38.710+1100: 687207.416: [Event2, 0.0204509 secs], and should that go into a second event? Meaning: you don't actually want to ingest the whole file as a single event?

Try this in props.conf for the relevant sourcetype on your indexers:

TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%3N%z
MAX_TIMESTAMP_LOOKAHEAD = 28
TRUNCATE = 0
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)\d+-\d+-\d+T
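As I understand LINE_BREAKER semantics, Splunk discards only the text matched by the first capture group, and the rest of the match starts the next event. Here is a rough Python sketch (plain Python, not Splunk internals) that mimics that split on an abridged version of the sample above, using a lookahead so the date stays with the new event:

import re

# Abridged sample from the question; newlines followed by a date mark
# event boundaries, newlines followed by "[" do not.
sample = """2020-03-10T11:20:27.456+1100: 687196.162: [Event1, 0.0207885 secs]
[Parallel Time: 19.8 ms, Workers: 4]
[Times: user=0.08 sys=0.00, real=0.02 secs]

2020-03-10T11:20:38.710+1100: 687207.416: [Event2, 0.0204509 secs]"""

# Splunk drops only the ([\r\n]+) capture group and keeps \d+-\d+-\d+T
# as the start of the next event; a lookahead reproduces that here.
events = re.split(r"[\r\n]+(?=\d+-\d+-\d+T)", sample)
for i, event in enumerate(events, 1):
    print(f"--- event {i} ---\n{event}")
# Prints two events: Event1 with all its bracketed lines, then Event2.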

woodcock
Esteemed Legend

Try these settings on your Indexer or Heavy Forwarder:

TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%3N%z
MAX_TIMESTAMP_LOOKAHEAD = 28
TRUNCATE = 0
SHOULD_LINEMERGE = false
LINE_BREAKER = (?!)
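A note on that last line: (?!) is an empty negative lookahead, a regex that can never match anywhere, so Splunk never finds an event boundary and the entire file is indexed as a single event (with TRUNCATE = 0 lifting the length cap). A quick sanity check in plain Python:

import re

# An empty negative lookahead fails at every position, so the pattern
# never matches and, used as a LINE_BREAKER, never breaks an event.
print(re.search(r"(?!)", "any text at all"))  # -> None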

FrankVl
Ultra Champion

Can you please share a sample of the log file, a screenshot or similar of how it shows up broken in Splunk, and your current config? Also some info on the setup: is it just a UF sending to an indexer, or is there a heavy forwarder involved?


sdkp03
Communicator

Apologies, I had not shared enough details. I have added the required details to the question.


gcusello
SplunkTrust

Hi @sdkp03,
Without a sample of your logs, I can only guess that they contain numbers or dates that Splunk parses as timestamps, so it splits your file into different events. Please try setting the TIME_PREFIX option in your props.conf.

If you could share a sample of your logs, I could help you better!

Ciao.
Giuseppe


sdkp03
Communicator

Apologies, I have now shared the log in the question. Yes, what you are suspecting is true in my case. As for TIME_PREFIX, I am not sure what I should use, considering the event starts with a timestamp in yyyy-mm-dd format.


gcusello
SplunkTrust

Hi @sdkp03,
in your props.conf set:

[your_sourcetype]
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%3N%z
SHOULD_LINEMERGE = true
MAX_TIMESTAMP_LOOKAHEAD = 28
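If it helps to see why MAX_TIMESTAMP_LOOKAHEAD = 28 fits: the leading timestamp is exactly 28 characters, and Splunk's %3N (milliseconds) corresponds roughly to Python's %f, so a quick sanity check in plain Python (not Splunk's own strptime) looks like:

from datetime import datetime

# The first 28 characters of an event are exactly the timestamp prefix.
line = "2020-03-10T11:20:27.456+1100: 687196.162: [Event1, 0.0207885 secs]"
prefix = line[:28]
print(len(prefix))  # 28, so MAX_TIMESTAMP_LOOKAHEAD = 28 covers it
print(datetime.strptime(prefix, "%Y-%m-%dT%H:%M:%S.%f%z"))
# -> 2020-03-10 11:20:27.456000+11:00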

Ciao.
Giuseppe


sdkp03
Communicator

Thanks, this worked like magic 🙂
