I have a script that queries a database and writes the results to a CSV file. When the file is finished being written, it is moved into a monitored directory and picked up by a universal forwarder. This happens twice a day, once at 1200 and again at 2400. The first file (at 1200) is received by our indexers fine. The second file, however, always looks similar to this:
\x002\x000\x001\x001\x00-\x000\x004\x00-\x001\x009\x00 \x001\x002\x00:\x000\x000\x00:\x000\x001\x00.\x009\x001\x007\x000\x000\x000\x000\x000\x000\x00,\x00F\x00i\x00l\x00e\x00 \x00C\x00o\x00p\x00y\x00,\x005\x000\x001\x003\x000\x004\x006\x006\x003\x00,\x00c\x00f\x00s\x00l\x00o\x00u\x0
The process is otherwise identical: same script, database, directories, query, etc. The only thing that differs is the time the script is executed. Thoughts?
That's what Splunk does with bytes that fall outside the configured character set. The default charset is UTF-8, and the \x00 after every character in your sample is the classic signature of a UTF-16 file. I'm guessing the second file isn't UTF-8, and sometimes it's okay, and sometimes it isn't?
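A quick sketch of why that pattern appears (the sample string below is hypothetical, mirroring the mangled event above): when text saved as UTF-16LE is read as if it were UTF-8 or ASCII, every ASCII character shows up followed by a NUL byte.

```python
# Sketch: UTF-16LE text read as a single-byte stream shows \x00
# interleaved between characters, exactly like the garbled events.
sample = "2011-04-19 12:00:01.917000000,File Copy"

raw = sample.encode("utf-16-le")  # each ASCII char becomes char + NUL byte
print(raw[:8])                    # b'2\x000\x001\x001\x00'

# Decoding with the correct charset recovers the line intact:
assert raw.decode("utf-16-le") == sample
```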
You can try other charsets in props.conf. The most common one I've seen outside UTF-8 is UTF-16LE.
http://www.splunk.com/base/Documentation/4.2.1/Data/Configurecharactersetencoding
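A minimal props.conf sketch, placed on whichever Splunk instance first parses this data; the sourcetype name `csv_export` is hypothetical, so substitute whatever your monitored input actually uses:

```ini
# props.conf -- hypothetical sourcetype; match it to your CSV input
[csv_export]
CHARSET = UTF-16LE
```

If the encoding genuinely varies between the noon and midnight runs, `CHARSET = AUTO` can also be worth trying so Splunk attempts detection per source.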