I'm looking into using splunk to analyse java GC logs. A particularly useful piece of information is the amount of time that the JVM suspends application threads, especially when using the CMS garbage collector. This data is included in GC log output for recent Hotspot JVMs by specifying the command line options "-XX:+PrintGCApplicationConcurrentTime -XX:+PrintGCApplicationStoppedTime", Unfortunately the resulting messages come very frequently, without timestamps and can come to dominate log volume for a busy system:
Total time for which application threads were stopped: 0.0001350 seconds Application time: 0.0002920 seconds Total time for which application threads were stopped: 0.0000940 seconds Application time: 0.0003570 seconds Total time for which application threads were stopped: 0.0000540 seconds Application time: 0.0038580 seconds Total time for which application threads were stopped: 0.0000710 seconds Application time: 0.0002740 seconds Total time for which application threads were stopped: 0.0000410 seconds Application time: 0.0061750 seconds Total time for which application threads were stopped: 0.0018030 seconds Application time: 0.0002270 seconds Total time for which application threads were stopped: 0.0001000 seconds Application time: 0.0002120 seconds Total time for which application threads were stopped: 0.0007030 seconds Application time: 0.0001580 seconds Total time for which application threads were stopped: 0.0000920 seconds Application time: 0.0004850 seconds Total time for which application threads were stopped: 0.0008960 seconds Application time: 0.0001550 seconds Total time for which application threads were stopped: 0.0002070 seconds Application time: 0.0265690 seconds Total time for which application threads were stopped: 0.0002650 seconds Application time: 0.0023260 seconds Total time for which application threads were stopped: 0.0002500 seconds Application time: 0.0024420 seconds Total time for which application threads were stopped: 0.0001770 seconds Application time: 0.4834220 seconds 2012-07-04T16:53:51.124+0100: 8.613: [GC 8.613: [ParNew: 170496K->21248K(191744K), 0.1399040 secs] 170496K->6 3448K(3562752K), 0.1399890 secs] [Times: user=0.31 sys=0.00, real=0.14 secs]
Ideally, I'd like to aggregate the application time and stopped time until a timestamped GC event occurs, then create and index a synthetic log event with the timestamp of the GC event and the aggregated values.
Is this possible? If so, how?
It's not possible as inbuilt functionality within Splunk itself. What you could do is create a scripted input that will perform the aggregation you want and then output its results to Splunk.
http://docs.splunk.com/Documentation/Splunk/latest/Developer/ScriptedInputsIntro