I need some help with what I am trying to accomplish. I have many forwarders running and successfully sending log files to the indexers. The problem is that I want to add a script in the middle on the indexing side to parse the data in a more meaningful manner and strip away all the repetitive XML cruft.
So what I am asking is how to accomplish this. I have a nodejs script that parses the input statically (not streaming), and it is currently configured to send to HEC in my local dev environment. I read about scripted and modular inputs, and that seems like it could be a better way to go, but I'm still confused about what I need to write to accept the forwarded data and parse it. Is there documentation somewhere about the type of data sent via forwarders? Should it be cooked vs. uncooked data? Does the script also have to parse the Splunk _internal log files, or can I direct specific sources via inputs.conf on the forwarder side? Does anyone have experience running such a script on their indexers?
It gets tricky as soon as your changes are too complex to be done in an index-time transform, either on an indexer or on an intermediate heavy forwarder. If you know your source systems well and receive the events from an input that reads a log file, you could probably change the input to a scripted input, where a script reads the log file and passes only the desired parts to Splunk.
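For simple cases like stripping repeated XML wrapper tags, an index-time transform may be all you need before reaching for a script. A minimal sketch with SEDCMD in props.conf, assuming a hypothetical sourcetype name and a literal wrapper tag (both are assumptions, not your actual data):

```
# props.conf on the indexer or heavy forwarder
# [my_xml_events] is a placeholder sourcetype -- use your own
[my_xml_events]
# SED-style replacement applied at index time:
# delete the repetitive <event>/</event> wrapper tags from the raw event
SEDCMD-strip_wrapper = s/<\/?event>//g
```

If the cruft is more than a fixed tag, this approach runs out of road quickly, which is where the scripted-input or preprocessing options below come in.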
Mika Borner has written a blog post about using Apache NiFi to preprocess data; this gives you a lot of flexibility, although it comes with its own complexity.
There's no way I know of for a script to intercept the communication between a forwarder and an indexer.
Your nodejs script seems like a good approach. An alternative to using HEC is to write the results to a monitored directory and let a forwarder send them to an indexer.