Why does my indexed data appear as a series of x and o characters?

agoktas
Communicator

Hello,

I am running a PowerShell script to download the HTML source of two pages, for example:

$wc.downloadstring("https://www.website.com/index.html") >C:\Output\Output.txt
$wc.downloadstring("https://www.website.com/pages/page1.html") >C:\Output\Output_Page1.txt

I then configured Splunk to monitor c:\output*

Output.txt ingests just fine, but when Output_Page1.txt is ingested, two things happen:
1) All you see is x's & 0's (you can check via Event Actions > Show Source).
2) The sourcetype gets "-too_small" appended.

The two HTML pages aren't very different, so I'm not sure why the two downloaded HTML sources behave differently.

Ideas?

Thanks in advance!

maraman_splunk
Splunk Employee

Hi,

Have you checked that your script produces UTF-8?
If not, you probably need to specify the character set for the sourcetype used in your monitor stanza so that Splunk can convert the text to UTF-8.
(That is, in inputs.conf you monitor your file and assign it a sourcetype, for example sourcetype1, and in props.conf you set CHARSET for that sourcetype1.)
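
As a rough sketch of the props.conf side of this suggestion: the stanza below assumes the sourcetype is named sourcetype1, as in the example above, and that the file is UTF-16LE, which is what the > redirection writes by default in Windows PowerShell 5.1. Both are assumptions, so confirm the file's actual encoding before relying on it.

```ini
# props.conf
# The stanza name must match the sourcetype assigned in inputs.conf.
# "sourcetype1" is only the placeholder name used in this thread.
[sourcetype1]
# Assumed encoding of the monitored file; change it if the file is encoded differently.
CHARSET = UTF-16LE
```

CHARSET only affects data indexed after the change, so events that already appear garbled stay that way unless the file is indexed again.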

richgalloway
SplunkTrust

The "-too_small" suffix is added by Splunk when it doesn't have enough data to guess the correct sourcetype. The fix is to provide a sourcetype in inputs.conf so Splunk doesn't have to guess. This is a Splunk best practice.
If you provide some sample data, we may be able to help with the necessary props.conf settings for the sourcetype.
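
A minimal inputs.conf sketch of that advice might look like the following. The monitor path is inferred from the question, and the sourcetype name html_download is a hypothetical placeholder, not a built-in sourcetype.

```ini
# inputs.conf
# Monitor the directory the PowerShell script writes to (path assumed from the question).
[monitor://C:\Output]
# Explicit sourcetype, so Splunk does not have to guess and append "-too_small".
# "html_download" is a made-up name; pick any name and reuse it as the props.conf stanza.
sourcetype = html_download
disabled = false
```

Any sourcetype-specific settings in props.conf would then go under a stanza with the same name.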
