Lookup files with foreign characters

gpburgett
Splunk Employee

I am setting up an app for a financial customer in Korea. They use a standardized business reporting language that is entirely in English. I've indexed the data and extracted all the necessary fields, but the customer needs the terms in some fields translated into Korean. I created a CSV file with all of the English terms and their Korean equivalents and put it under system>lookups, but when I restart Splunk and run a search on that sourcetype, the following message comes up:

[EventsViewer module] Input is not proper UTF-8, indicate encoding ! Bytes: 0xB1 0xB8 0xBA 0xD0, line 59, column 8

I tried editing the charset in props.conf, changed the format of the file to UTF-8, and even tried the Korean character set that Splunk supports, but I still get the same message. Does Splunk not support lookups with foreign characters, or am I missing something in my configuration?

1 Solution

mitch_1
Splunk Employee

The CHARSET setting in props.conf applies only to index-time processing; that is, it declares the character set of the data Splunk is indexing. It doesn't affect anything at search time.

Do you know what character set your CSV file is using (UTF-16? KSC-5601?)? You'll probably need to convert it to UTF-8. If you're on UNIX, you can use the system "iconv" utility to do this. I'm sure there are similar utilities available for Windows. Also, most editors have options for saving a file in a particular encoding.
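
If iconv isn't handy, the same conversion can be sketched in a few lines of Python. This is only an illustration, not anything Splunk ships: it assumes the lookup file was saved in EUC-KR (Python's "euc_kr" codec), which is a guess based on the KSC-5601 suggestion above, and the function and file names are made up for the example. Note that the bytes in the error message (0xB1 0xB8 0xBA 0xD0) cannot be valid UTF-8, because 0xB1 is a continuation byte and cannot start a character, so a check like is_valid_utf8 below would flag the original file.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Decode bytes from source_encoding and re-encode them as UTF-8.

    Decoding is strict, so a wrong guess for source_encoding raises
    UnicodeDecodeError instead of silently producing garbage.
    """
    return data.decode(source_encoding).encode("utf-8")


def convert_file(src_path: str, dst_path: str, source_encoding: str = "euc_kr") -> None:
    """Read a lookup CSV in a legacy encoding and write a UTF-8 copy.

    The default "euc_kr" is an assumption about the source file.
    """
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(to_utf8(data, source_encoding))


# Hypothetical usage; the file names are placeholders:
# convert_file("terms_euckr.csv", "terms_utf8.csv", "euc_kr")
```

Running is_valid_utf8 on the converted file's bytes before copying it into the lookups directory is a quick way to confirm that the conversion really happened, rather than trusting an editor's encoding setting.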

gpburgett
Splunk Employee

Thanks for the tip. It turned out that even though the text editor I was using was set to UTF-8, it wasn't converting the file properly. I used a specialized converter to change it to UTF-8, and now it works fine.