Getting Data In

possible file input defect

jgauthier
Contributor

Greetings,

I've noticed that one of my input files was not being read into Splunk in its entirety (version 4.2, build 96430).
After doing some troubleshooting, it appears that the file is read short, and I can reproduce this based on the contents of the file.

For instance, I am inputting this data:

Host=SERVERNAME
Source=Broadcom BCM5709S NetXtreme II GigE [NDIS VBD Client]
BytesReceivedPerSec="3362"
BytesSentPerSec="0"
BytesTotalPerSec="3362"
Description=""
Frequency_Object=""
Frequency_PerfTime=""
Frequency_Sys100NS=""
OutputQueueLength="0"
CurrentBandwidth="1000000000"
PacketsOutboundDiscarded="0"
PacketsOutboundErrors="0"
PacketsPerSec="24"
PacketsReceivedDiscarded="0"
PacketsReceivedErrors="0"
PacketsReceivedNonUnicastPerSec="13"
PacketsReceivedPerSec="24"
PacketsReceivedUnicastPerSec="10"
PacketsReceivedUnknown="0"
PacketsSentNonUnicastPerSec="0"
PacketsSentPerSec="0"
PacketsSentUnicastPerSec="0"
Timestamp_Object=""
Timestamp_PerfTime=""
Timestamp_Sys100NS=""

Splunk only shows the data up until the line that reads:
CurrentBandwidth="1000000000"

Reading is aborted at that line. If I move the line up or down, the truncation follows it.
If I change the data in the line, even by removing a single 0, Splunk reads the data correctly in its entirety.

I'm pretty sure this is a defect, but I don't know how to open a bug report. Hopefully this is sufficient.

Other data that might be relevant is that I am putting this into a directory that is a sinkhole, so it's batch-inputted.
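
For reference, a batch sinkhole input of this kind is typically defined along these lines in inputs.conf (the path and sourcetype below are placeholders, not the actual values):

[batch:///opt/perf/drop]
move_policy = sinkhole
sourcetype = perf_counter_dump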

Let me know what other information may be useful to supply.

1 Solution

gkanapathy
Splunk Employee

Actually, it's probably getting split into a new event based on the BREAK_ONLY_BEFORE_DATE default rule. The number is read as an epoch timestamp, so the data is getting indexed, but with a timestamp outside the range where you're looking. A solution would be to specify an explicit TIME_FORMAT, or DATETIME_CONFIG = CURRENT, or, if you're trying to index the entire file as a single event (vs. one line per event), set SHOULD_LINEMERGE = false and LINE_BREAKER = (?!)
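
For illustration, those settings would go in a props.conf stanza for the relevant sourcetype, roughly like this (the sourcetype name here is a placeholder):

[perf_counter_dump]
# don't try to parse a timestamp out of the data; use the current time at index time
DATETIME_CONFIG = CURRENT
# or, to index the entire file as one event instead of one event per line:
SHOULD_LINEMERGE = false
LINE_BREAKER = (?!)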


jrodman
Splunk Employee

I would not recommend LINE_BREAKER = (?!)
Instead, please use something like LINE_BREAKER = ()zqsxtp-will-not-be-present. In other words, a paren pair followed by text that will not be present in your event, preferably starting with a rare character.
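
As a sketch, that alternative looks roughly like this in props.conf (the sentinel text is arbitrary; pick anything guaranteed never to appear in your data):

[perf_counter_dump]
SHOULD_LINEMERGE = false
LINE_BREAKER = ()zqsxtp-will-not-be-present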

jgauthier
Contributor

Wow. It was reading it as a timestamp. That's really interesting. Those adjustments allowed the data to be processed correctly. Thanks for the advice.

jgauthier
Contributor

This is definitely a defect. I have been able to reproduce this repeatedly with large numbers starting with a 1. For instance, all of these lines, in different files, cause Splunk to stop processing:
Timestamp_PerfTime="1309871324971"
CommitLimit="103056531456"
AvgDiskQueueLength="18446743927822183616"

However, if I change the leading digit to a 2 or any other number, the data loads correctly.

I've filed a bug, but I didn't realize at the time that this was the problem, so I am going to try to update it.

jgauthier
Contributor

It shows the truncated data as well.

Simeon
Splunk Employee

What happens when you "view source" for the event?
