We're having to write some custom scripts to read/tail binary data, format it into something Splunk-able (k1=v1 k2=v2 k3=v3), and get it into Splunk. This will be running on a machine that will have a UF....
At this point, I see three options for the "get them into Splunk" end:
I know that #1 performs well (and is easy to troubleshoot and test), but leaves me with a small scratch file management problem (which is very manageable). Since I am so lazy that I don't even want to solve that problem, I was wondering if anyone had any experience as to how well #2 and #3 hold up when looking at 7 million events/1.1 GB a day...
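For context, the kind of formatter script in question might look like the following minimal sketch; the binary record layout, struct format, and field names here are hypothetical placeholders, not the actual data format:

import struct
import sys

# Hypothetical record layout: two unsigned 32-bit ints and a float
RECORD = struct.Struct("<IIf")

def emit(fh):
    # Read fixed-size records and print one k=v event per line
    while True:
        chunk = fh.read(RECORD.size)
        if len(chunk) < RECORD.size:
            break
        k1, k2, k3 = RECORD.unpack(chunk)
        print("k1=%d k2=%d k3=%f" % (k1, k2, k3))

if __name__ == "__main__":
    with open(sys.argv[1], "rb") as fh:
        emit(fh)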
I don't know your scripts, but you could probably send the script output directly to Splunk by defining the script as a scripted input in inputs.conf, for example:
[script://yourscript]
interval = 60
sourcetype = your_sourcetype
index = your_index
This way the script output goes directly into Splunk.
You only have to set the script permissions correctly (it must be executable by the user running the forwarder).
This is better than files.
Bye.
Giuseppe
Why is it better than files?
Any of those methods will work fine at the scale of GB/day. Writing files at larger scales will run into the usual universal forwarder issues such as ulimits, race conditions where large files are rotated out from under the UF before it finishes reading them, etc. I am a fan of the HTTP Event Collector (HEC) if you are already working in something like Python, where your data is likely in a JSON payload format already. I have a simple threaded Python class for it already. There are customers running HEC at TB/day scale.
http://blogs.splunk.com/2015/12/11/http-event-collect-a-python-class/
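As a reference point, a minimal single-event HEC sender in Python might look like the sketch below; the URL, token, and sourcetype are placeholders, and the threaded class in the post above is the better fit at 7 million events/day:

import requests  # third-party: pip install requests

HEC_URL = "https://splunk.example.com:8088/services/collector/event"  # placeholder host/port
HEC_TOKEN = "00000000-0000-0000-0000-000000000000"  # placeholder token

def send_event(event, sourcetype="binary:kv"):
    # POST one event to the HEC endpoint; HEC also accepts batched events
    payload = {"event": event, "sourcetype": sourcetype}
    headers = {"Authorization": "Splunk " + HEC_TOKEN}
    # For a self-signed certificate you may need verify=False
    resp = requests.post(HEC_URL, json=payload, headers=headers, timeout=5)
    resp.raise_for_status()

send_event("k1=1 k2=2 k3=3.0")

Batching several events per POST rather than one request per event is what makes HEC practical at this volume.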
Because writing the script output to a file and then monitoring that file adds an extra step and requires more time to execute.
Bye.
Giuseppe