Getting Data In

Splunk fails to monitor a log file...

lpolo
Motivator

I have a log file that is a plain-text file. Splunk does not monitor this file because it classifies it as binary. The following Linux command shows the contrary:

file /usr/local/rex/azkaban/logs/azkaban.log
/usr/local/rex/azkaban/logs/azkaban.log: ASCII text, with very long lines

This is what splunkd.log is reporting:

10-22-2012 17:53:21.733 +0000 WARN FileClassifierManager - The file '/usr/local/rex/azkaban/logs/azkaban.log' is invalid. Reason: binary
10-22-2012 17:53:21.734 +0000 INFO TailingProcessor - Ignoring file '/usr/local/rex/azkaban/logs/azkaban.log' due to: binary
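To list every file that splunkd.log reports as skipped for this reason, a small script can scan the log for the TailingProcessor message shown above. This is a sketch under stated assumptions: it assumes the message keeps exactly the wording quoted in this thread (it may differ between Splunk versions), that splunkd.log is passed as the first argument (usually $SPLUNK_HOME/var/log/splunk/splunkd.log), and it does not handle paths containing a single quote. The script and function names are hypothetical.

```python
import re
import sys
from typing import Iterable, List

# Pattern copied from the TailingProcessor message quoted above;
# the exact wording in other Splunk versions is an assumption.
IGNORED_RE = re.compile(
    r"TailingProcessor - Ignoring file '(?P<path>[^']+)' due to: binary"
)


def binary_ignored_files(lines: Iterable[str]) -> List[str]:
    """Return distinct paths reported as ignored for being binary, in first-seen order."""
    found = {}  # dict keeps insertion order and removes duplicates
    for line in lines:
        match = IGNORED_RE.search(line)
        if match:
            found.setdefault(match.group("path"), None)
    return list(found)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name):
    #   python find_binary_ignored.py $SPLUNK_HOME/var/log/splunk/splunkd.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for path in binary_ignored_files(fh):
            print(path)
```

Only the TailingProcessor line is matched; the FileClassifierManager warning is left alone so each file is listed once.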

These are the first two lines of a hex dump of the file in question. I do not see any badly encoded ASCII character or any file magic number that would indicate the file is binary.

0000000: 3230 3132 2d31 302d 3234 2030 393a 3536  2012-10-24 09:56
0000010: 3a33 332c 3233 3320 494e 464f 2020 5b54  :33,233 INFO  [T
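As a quick cross-check, the 32 bytes in the hex dump can be decoded directly. The hex values below are copied from the dump; the snippet only confirms that these particular bytes are printable ASCII, so whatever triggered the binary classification is not within them.

```python
# Hex words copied from the two xxd lines above (first 32 bytes of azkaban.log).
HEX_DUMP = (
    "3230 3132 2d31 302d 3234 2030 393a 3536 "
    "3a33 332c 3233 3320 494e 464f 2020 5b54"
)

data = bytes.fromhex(HEX_DUMP)  # whitespace between byte pairs is ignored

print(repr(data.decode("ascii")))  # -> '2012-10-24 09:56:33,233 INFO  [T'

# Every byte is in the printable ASCII range 0x20-0x7E.
print(all(0x20 <= b <= 0x7E for b in data))  # -> True
```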

The question remains unanswered in this forum.

What steps does Splunk use to identify whether a file is binary? As an example, look at the man page of the Unix "file" command: it clearly explains the kind of answer I am looking for. That way, I can solve this problem at its root.

Why does Splunk report that the file in question is binary?
How can this problem be solved?

This issue has been addressed previously. Example:
http://splunk-base.splunk.com/answers/7370/splunk-thinks-text-file-is-binary

Thanks,

Lp


jbsplunk
Splunk Employee

If the data isn't text, it's binary. If that's the case, you need this setting in props.conf:

#******************************************************************************
# Binary file configuration
#******************************************************************************

NO_BINARY_CHECK = [true|false]
* When set to true, Splunk processes binary files.
* Can only be used on the basis of [<sourcetype>], or [source::<source>], not [host::<host>].
* Defaults to false (binary files are ignored).

http://docs.splunk.com/Documentation/Splunk/5.0/admin/Propsconf

The data you provided in the sample clearly isn't text data, which means it will be considered as 'binary', irrespective of what is contained in the file.
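For example, a minimal props.conf stanza that disables the binary check for this one file only could look like the following. This is a sketch assuming the stanza is placed in a local props.conf (for instance $SPLUNK_HOME/etc/system/local/props.conf) on the instance that monitors the file, and that Splunk is restarted afterwards. Scoping it with [source::<source>] rather than a sourcetype keeps every other binary file excluded, as the spec above allows.

```ini
# Assumed location: $SPLUNK_HOME/etc/system/local/props.conf
# on the Splunk instance that monitors the file.
[source::/usr/local/rex/azkaban/logs/azkaban.log]
# Process this file even if Splunk classifies it as binary.
NO_BINARY_CHECK = true
```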


lpolo
Motivator

This method allowed me to solve this problem.

1) If Splunk finds a non-ASCII character in any event, it flags the file as binary and logs an event in splunkd.log as follows:

10-22-2012 17:53:21.734 +0000 INFO TailingProcessor - Ignoring file '/usr/local/rex/azkaban/logs/azkaban.log' due to: binary

2) To identify non-ASCII characters, you can use the following Linux command. The non-ASCII characters will be highlighted.

LC_ALL=C grep --color='auto' -P -n "[\x80-\xff]" azkaban.log

3) Solution:
Use the "Binary file configuration" setting (NO_BINARY_CHECK) in props.conf, as presented in the previous answer.
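The grep command in step 2 can also be approximated in Python, which avoids depending on grep's PCRE support or on the current locale. This sketch reads the file as raw bytes and reports the line number (1-based, like grep -n), the byte position within the line, and the value of every byte in the 0x80-0xFF range. The function names are hypothetical.

```python
import sys
from typing import Iterable, Iterator, Tuple


def scan_lines(lines: Iterable[bytes]) -> Iterator[Tuple[int, int, int]]:
    """Yield (line_number, byte_column, byte_value) for each byte in 0x80-0xFF."""
    for lineno, line in enumerate(lines, start=1):
        for col, value in enumerate(line, start=1):  # iterating bytes yields ints
            if value >= 0x80:
                yield lineno, col, value


def non_ascii_bytes(path: str) -> Iterator[Tuple[int, int, int]]:
    """Scan a file opened in binary mode, so no character decoding takes place."""
    with open(path, "rb") as fh:
        yield from scan_lines(fh)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_non_ascii.py azkaban.log  (hypothetical script name)
    for lineno, col, value in non_ascii_bytes(sys.argv[1]):
        print(f"line {lineno}, byte {col}: 0x{value:02X}")
```

Like the grep pattern, this only looks at bytes 0x80-0xFF; control bytes such as NUL (0x00) are not reported.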

Regards,
Lp


lpolo
Motivator

Of course, Splunk does not use the "file" command to determine whether a file to be monitored is binary. I am presenting the documentation of the Linux "file" command only as an example: it clearly shows the steps the command executes to determine the type of a file. I am asking my question again:
What steps does Splunk use to identify whether a file is binary?


jbsplunk
Splunk Employee

We don't use the Unix 'file' command, so that comparison is invalid. We don't do anything of that sort. We try to read the data, and if it isn't text, we fail and consider the file binary.
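The approach described here (try to read the data; if it isn't text, treat the file as binary) can be illustrated with a generic heuristic. This is NOT Splunk's implementation: the 8 KiB sample from the start of the file, the NUL-byte rule, the UTF-8 check and the 10% threshold are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import codecs
import sys

# Illustrative assumptions only; these are not Splunk settings.
SAMPLE_SIZE = 8192          # bytes read from the start of the file
MAX_NONTEXT_RATIO = 0.10    # tolerated share of suspicious bytes

# Printable ASCII plus control bytes that commonly appear in text
# (BEL, BS, TAB, LF, FF, CR, ESC).
TEXT_BYTES = frozenset({7, 8, 9, 10, 12, 13, 27}) | frozenset(range(0x20, 0x7F))


def looks_binary(sample: bytes) -> bool:
    """Return True if the byte sample looks binary under this heuristic."""
    if not sample:
        return False  # treat an empty file as text
    if b"\x00" in sample:
        return True  # NUL bytes practically never appear in plain-text logs
    # ASCII control bytes outside the allowed set (includes DEL, 0x7F)
    control = sum(1 for b in sample if b < 0x80 and b not in TEXT_BYTES)
    try:
        # final=False tolerates a multi-byte character cut off at the sample end
        codecs.getincrementaldecoder("utf-8")().decode(sample, final=False)
        high = 0  # valid UTF-8: high bytes belong to multi-byte characters
    except UnicodeDecodeError:
        high = sum(1 for b in sample if b >= 0x80)  # not UTF-8: high bytes are suspect
    return (control + high) / len(sample) > MAX_NONTEXT_RATIO


def classify_file(path: str) -> str:
    """Read the beginning of a file and label it 'binary' or 'text'."""
    with open(path, "rb") as fh:
        sample = fh.read(SAMPLE_SIZE)
    return "binary" if looks_binary(sample) else "text"


if __name__ == "__main__" and len(sys.argv) > 1:
    for p in sys.argv[1:]:
        print(p, classify_file(p))
```

Under a rule like this, a single stray non-UTF-8 byte in a large log stays below the threshold, while a burst of such bytes near the start of the file would tip the verdict to binary.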



jbsplunk
Splunk Employee

Because the ASCII data looks like binary data to Splunk when it reads the beginning of the file; as a result, the file is ignored.


lpolo
Motivator

Why does Splunk report that the file in question is binary?
