I agree, it would be far easier just to emit the logs with all contents uncompressed. But for performance reasons that's not a great option for our production environment. We periodically emit from a background thread a single log event with a field containing gzip'd and base64 encoded collection of JSON records. Output looks something like this.
2014-10-04 11:37:27 [pool-2-thread-1] INFO : TxnSummaryLogWriter.logTiming JSON_GZIP_TXN_SUMMARIES [[[H4sIAAAAAAAAAOWYS09jRxBG/wq6a9eon9XV3hGwNBPNAGKMFCkaoeruaoQCJjImioT47ynCLHPhLry5zs6Pbln36NTj8/Ow+3vz+HR/z9tbeRyWvz8P......yG+8XHwAA]]]
As these events are written to log, we'd like to get the deltas and unpack the contents to create an event per txn that gets sent to Splunk. I have a simple python script that does this, I just can't seem to figure out how to plug into Splunk's ingest pipeline to run this only on deltas.
Scripted inputs would work, but I don't want to have to worry about identifying deltas in my script - Splunk's great at that. And waiting until the log rolls over means an unwanted delay in getting the info into Splunk.
... View more