I'm migrating from a Windows server to a Linux machine, and the Linux machine isn't reading one specific file. According to the file command, the file is UTF-16LE:
[root@bpoundiag01 xml]# file test.xml
test.xml: Little-endian UTF-16 Unicode text, with very long lines
Why isn't this working?
This is caused by a difference in the default CHARSET setting in props.conf between Windows and Linux.
On Windows, CHARSET defaults to AUTO, so the encoding of each file is detected automatically.
On Linux, CHARSET defaults to UTF-8, so a UTF-16LE file is read with the wrong encoding unless you say otherwise.
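To see why a UTF-8 default trips over this file, here is a minimal Python sketch of the byte-level mismatch. It only illustrates the encoding difference and makes no claim about how the input is processed internally; the sample XML string is made up for the example.

```python
# A made-up XML snippet, encoded the way the Windows-produced file is stored.
text = '<event id="1">ok</event>'
raw = text.encode("utf-16-le")

# In UTF-16LE every ASCII character takes two bytes, the second being NUL.
print(raw[:8])

# Decoding those bytes as UTF-8 does not fail here, but the result is
# interleaved with NUL characters and no longer matches the original text.
as_utf8 = raw.decode("utf-8")

# Decoding with the right charset recovers the text exactly.
as_utf16 = raw.decode("utf-16-le")

print(as_utf8 == text)
print(as_utf16 == text)
```

The UTF-8 reading is twice as long as the real text and full of NUL characters, which is consistent with the file looking unreadable on the Linux side.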
The fix is to add the following to props.conf on the Linux machine:
[mysourcetype]
CHARSET=UTF-16LE
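If you have several files to migrate and want to confirm which ones need this setting, a rough check along these lines may help. This is only a heuristic sketch: the function name looks_like_utf16le and the 0.8 threshold are my own assumptions, not part of any tool, and the file command shown in the question remains the simpler way to check a single file.

```python
import codecs


def looks_like_utf16le(path, sample_size=4096):
    """Heuristic: does the start of the file look like UTF-16LE text?"""
    with open(path, "rb") as f:
        sample = f.read(sample_size)

    # A UTF-16LE byte order mark (FF FE) at the start is the strongest hint.
    if sample.startswith(codecs.BOM_UTF16_LE):
        return True

    # Without a BOM, mostly-ASCII UTF-16LE text has NUL in most odd positions.
    odd_bytes = sample[1::2]
    if not odd_bytes:
        return False
    return odd_bytes.count(0) / len(odd_bytes) > 0.8
```

For example, looks_like_utf16le("test.xml") would be expected to return True for the file in the question. Files that are not mostly ASCII, or that use another encoding starting with the same bytes, can fool this check, so treat the result as a hint rather than proof.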