Hello,
I am running a PowerShell script to download HTML code from two pages:
i.e.:
$wc.downloadstring("https://www.website.com/index.html") >C:\Output\Output.txt
$wc.downloadstring("https://www.website.com/pages/page1.html") >C:\Output\Output_Page1.txt
I then configured Splunk to monitor c:\output*
output.txt injests just fine, but when Output_Page1.txt injests, 2 things happen:
1) all you see is x's & 0's (you can click event actions -- show source)
2) the sourcetype appends -too_small
HTML pages aren't very different. Not sure why these 2 downloaded HTML sources are behaving differently.
Ideas?
Thanks in advance!
Hi,
Have you checked that your script produce UTF-8 ?
If not, you probably need to specify the charset associated to the sourcetype used in your monitor stanza so that splunk can convert the text to UTF-8
(so in inputs.conf , you monitor your file and used sourcetype1 (as a example) , in props.conf, you specify CHARSET for this sourcetype1 used.)
The "-too_small" suffix is added by Splunk when it doesn't have enough data to guess about the correct sourcetype. The fix for that is to provide a sourcetype in inputs.conf
so Splunk doesn't have to guess. This a Splunk Best Practice.
If you provide some sample data we may be able to help with the necessary props.conf
settings for the sourcetype.