Is captured stream data transcoded to UTF-8, or is there any configuration to specify the character encoding for certain data?

melonman
Motivator

Hi,

I am experimenting with the Splunk App for Stream, capturing Samba traffic between a CentOS Samba server and Windows clients.

When I created a directory with a Japanese name from Windows 8.1, some of the captured data was garbled and some was not, depending on the SMB command.

Here is the screenshot.

This made me wonder whether the captured data is transcoded into UTF-8, or whether there is a way to specify the character encoding for particular stream data.

Is there any character-set configuration for Stream capture data?

Thank you in advance.

jrodman
Splunk Employee

All data in Splunk indexes is stored as UTF-8.

However, for log contents, you can specify the source encoding to convert from by using the CHARSET setting in props.conf.
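
To make "convert from a declared encoding" concrete, here is a minimal Python sketch of the same idea. It is not Splunk code: in Splunk you set CHARSET in a props.conf stanza rather than writing a converter, and the helper name `to_utf8` is hypothetical. The sample text is assumed to come from a log written in Shift_JIS.

```python
# Hypothetical helper illustrating decode-then-re-encode transcoding.
# Splunk performs its own conversion in the indexing pipeline when CHARSET
# is set; this only shows why the declared source encoding matters.
def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Decode raw bytes in their declared source encoding, re-encode as UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


if __name__ == "__main__":
    # "日本語" ("Japanese") as it would appear in a log written in Shift_JIS.
    sjis_bytes = "日本語".encode("shift_jis")

    # Declaring the real source encoding yields valid UTF-8.
    print(to_utf8(sjis_bytes, "shift_jis").decode("utf-8"))  # 日本語

    # Treating the same bytes as if they were already UTF-8 garbles them;
    # errors="replace" substitutes U+FFFD instead of raising an exception.
    print(sjis_bytes.decode("utf-8", errors="replace"))
```

The second `print` shows the kind of garbling you get when no source encoding is declared and the bytes are not actually UTF-8.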

On the third hand, though, there is no such functionality for filenames. On UNIX we assume filenames are UTF-8, and on Windows we assume they are UTF-16. On Windows this is almost always true, but on UNIX filenames can be presented in other encodings.
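
A short Python sketch of the per-platform assumption described above may help show why a non-UTF-8 filename on UNIX comes out garbled. The function `decode_filename` is hypothetical and only mirrors the stated assumption; EUC-JP is used here simply as an example of a legacy encoding a UNIX system might use for filenames.

```python
# Hypothetical helper mirroring the stated assumption; it is not Splunk code.
# UNIX filenames are raw bytes with no encoding metadata, so UTF-8 is assumed.
# Windows filenames are UTF-16 (little-endian in the wide-character APIs).
def decode_filename(raw: bytes, platform: str) -> str:
    if platform == "windows":
        return raw.decode("utf-16-le")
    # UNIX: undecodable bytes become U+FFFD, i.e. visibly garbled text.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    name = "日本語"
    print(decode_filename(name.encode("utf-16-le"), "windows"))  # 日本語
    print(decode_filename(name.encode("utf-8"), "unix"))          # 日本語
    # The same name stored on a UNIX filesystem in EUC-JP:
    print(decode_filename(name.encode("euc_jp"), "unix"))  # contains U+FFFD
```

Because nothing on a UNIX filesystem records which encoding a filename uses, there is no safe way to guess; that is why a UTF-8 assumption is the practical default.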

For your own sanity, I strongly recommend using only UTF-8 filenames on modern Unix systems. However, if your use case and goals require otherwise, please tell us about it in a support ticket. This is a known limitation and there is an entry in our work database for it, but understanding why it matters to you will help us prioritize it.
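
If you want to check whether a share already contains filenames that are not valid UTF-8, a small script along these lines could list them. This is a sketch, not a Splunk tool: the function names are hypothetical, and the path `/srv/samba` in the usage note is only an example. It walks the tree using bytes paths so that undecodable names are seen exactly as stored.

```python
import os
import sys
from typing import Iterator


def is_utf8_name(name: bytes) -> bool:
    """Return True if a raw filename is valid UTF-8."""
    try:
        name.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def find_non_utf8_names(root: bytes) -> Iterator[bytes]:
    """Yield full paths (as bytes) whose last component is not valid UTF-8."""
    # Passing a bytes root makes os.walk return raw bytes names.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if not is_utf8_name(name):
                yield os.path.join(dirpath, name)


if __name__ == "__main__":
    root = os.fsencode(sys.argv[1] if len(sys.argv) > 1 else ".")
    for path in find_non_utf8_names(root):
        # backslashreplace shows the offending bytes as \xNN escapes.
        print(path.decode("utf-8", errors="backslashreplace"))
```

For example, `python find_non_utf8.py /srv/samba` would print each offending path with its undecodable bytes escaped; an empty result means every name under that directory is valid UTF-8.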
