What are these symbols in the log file Splunk refuses to ingest?

EricLloyd79
Builder

We have a Python script that generates dummy log files to test ingestion; we are using MapR and Splunk.
Every once in a while we get an Out of Memory error when reading a very large line, which I thought was unusual since the events the Python script writes to the log file are pretty small.
Then I found the offending line.
It looks like this: ^@^@^@^@^@^@^@^@^@^@^@
It repeats for 2 million characters. In the one instance I found, it was at the beginning of a file that had just rolled over.
Does anyone know what these symbols mean? Why would they be written into our log file?

EricLloyd79
Builder

So, the issue ended up being with the file-rotation script, which was written in Perl, combined with some methods I was using in the Python script.

I fixed it by writing a new file-rotation script in Python that is more efficient, and the characters are no longer being put in.
So here's what I suspect was happening:
I had a Python script writing dummy events by opening a file in w+ mode before a loop that inserted the events, then closing it after the loop.
The Perl script was attempting to rotate the file while the Python script still had it open. I think the rotation was working, but it was putting a ton of artifacts in.
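For illustration, here is a minimal sketch of one well-known way this can happen. It assumes the rotation truncates the live file in place (copy-and-truncate style); that is an assumption, not a reconstruction of the actual Perl script. The helper name `simulate_truncate_rotation` is hypothetical. A writer opened with w+ keeps its old file offset after the truncation, so its next write lands past the new end of the file and the gap reads back as NUL bytes. A writer opened in append mode always writes at the current end of the file and leaves no gap.

```python
import os
import tempfile


def simulate_truncate_rotation(mode: str) -> bytes:
    """Write an event, truncate the file from outside, write another event.

    Returns the raw bytes left in the file afterwards.
    """
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # newline="\n" keeps the byte counts identical across platforms.
        with open(path, mode, newline="\n") as log:
            log.write("event 1\n")
            log.flush()
            # Stand-in for the rotation step: empty the file while the
            # writer still holds it open.
            os.truncate(path, 0)
            log.write("event 2\n")
            log.flush()
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


with_w_plus = simulate_truncate_rotation("w+")  # writer keeps its old offset
with_append = simulate_truncate_rotation("a")   # writer always appends at EOF

print(with_w_plus)
print(with_append)
```

In this sketch the w+ writer leaves a run of NUL bytes at the start of the freshly truncated file, which matches where the offending line showed up, while the append-mode writer does not.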
Thanks for your work and reply.

trobbins_splunk
Splunk Employee

Are you saying the Python script is adding these characters to the dummy log files? Or do these characters show up in a Splunk search after ingesting a log file that does not contain them?

EricLloyd79
Builder

Actually, it was the script we used for file rotation that was adding them in.

acharlieh
Influencer

I'd be curious how the Python script handles character encoding, builds the string, and writes data out to the file; how you are performing the file rotation (and how that interacts with your script); and what you were using to find the line.

If you're using vi to look at the file, ^@ is a visual representation of the NUL byte: https://unix.stackexchange.com/a/217011/7493

(Unfortunately, I can think of quite a few ways that null bytes might wind up in files.)
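If you'd rather not eyeball the file in an editor, a short Python sketch like the following could locate runs of NUL bytes and report where they start and how long they are. The function name `find_nul_runs` and the sample data are made up for illustration; in practice you would pass in the bytes read from the suspect log file (opened in binary mode).

```python
import re

# One or more consecutive NUL bytes.
NUL_RUN = re.compile(rb"\x00+")


def find_nul_runs(data: bytes, min_length: int = 1) -> list[tuple[int, int]]:
    """Return (offset, length) for each run of NUL bytes of at least min_length."""
    runs = []
    for match in NUL_RUN.finditer(data):
        length = match.end() - match.start()
        if length >= min_length:
            runs.append((match.start(), length))
    return runs


# Made-up sample: 11 NUL bytes at the start of a file, then normal events.
sample = b"\x00" * 11 + b"event 1\nevent 2\n"
runs = find_nul_runs(sample)
print(runs)
```

Raising `min_length` filters out short, possibly legitimate NUL runs and reports only the large ones.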

Converting to an answer, as I did find a different Stack Exchange answer that could be related, about how the way your script opens the file, plus the rotation settings, can cause null bytes: https://serverfault.com/a/522687/72374

Trying things out locally, I used my favorite hex editor to create a file with "meow" followed by 31 null bytes and a newline character. I was able to upload this file to Splunk using a custom sourcetype that skipped the binary checks, and Splunk ingested my event, counting 128 bytes of license usage instead of 35 bytes, as:

meow\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
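For anyone who wants to reproduce this without a hex editor, the same test file could be generated with a few lines of Python. This is only a sketch: the file name `meow_nul_test.log` and its location in the temp directory are arbitrary choices, not anything from the original test.

```python
import tempfile
from pathlib import Path

# "meow" + 31 NUL bytes + newline: 35 bytes of event plus the line terminator.
payload = b"meow" + b"\x00" * 31 + b"\n"

# Hypothetical file name; any path readable by Splunk would do.
test_file = Path(tempfile.gettempdir()) / "meow_nul_test.log"
test_file.write_bytes(payload)

print(len(payload), payload.count(b"\x00"))
```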

I should also note that without the NO_BINARY_CHECK = true setting in my props, Splunk refused to ingest this file as well, and I could see events like the following in _internal:

05-30-2018 18:52:09.027 -0500 WARN FileClassifierManager - The file 'foo' is invalid. Reason: binary

somesoni2
SplunkTrust

What's your monitoring stanza for the log files generated by the script?

EricLloyd79
Builder

To be clear, I think it's an artifact of the file rotation. Has anyone seen anything like this related to file rotation?
